A Sequential Segmentation and Classification Learning Approach for Skin Lesion Images
Gallazzi, Mirco; Gallo, Ignazio; Corchs, Silvia
2025-01-01
Abstract
This study investigates how the learning order between segmentation and classification tasks influences performance and generalization in medical image analysis. We propose a Sequential Swin Transformer framework that reuses a shared Transformer backbone with alternating task-specific heads to compare two sequential strategies: (i) segmentation followed by classification and (ii) classification followed by segmentation. Unlike conventional multitask or preprocessing-based pipelines, the proposed framework isolates the impact of task ordering on feature transfer under an identical architecture. Evaluated on the HAM10000 skin lesion dataset, the segmentation-then-classification configuration achieves the highest multiclass accuracy (up to 86.9%) while maintaining strong segmentation performance (Jaccard index ≈ 86%). Statistical tests confirm its superiority in accuracy and macro F1 score, whereas Grad-CAM and t-distributed stochastic neighbor embedding (t-SNE) analyses reveal that segmentation-first training yields more lesion-centered attention and a more discriminative latent space. Cross-domain evaluation on gastrointestinal endoscopy images further demonstrates robust segmentation (Jaccard index ≈ 91%) and multiclass accuracy (≈94.5%), confirming the generalizability of the sequential paradigm. Overall, the proposed method provides a theoretically grounded, clinically interpretable, and reproducible alternative to joint multitask learning approaches, enhancing feature transfer and generalization in medical imaging.
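
For readers unfamiliar with the two headline metrics reported above, the following is a minimal illustrative sketch of how a Jaccard index and a macro F1 score are commonly computed. It is not taken from the paper: the function names `jaccard_index` and `macro_f1`, the flat 0/1 mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def jaccard_index(pred_mask, true_mask):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0  # assumed convention for empty input
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(0.0 if denom == 0 else 2 * tp / denom)
    return sum(scores) / len(scores)
```

Because macro F1 weights every class equally, it is more sensitive than plain accuracy to rare classes, which is presumably why both are reported for an imbalanced dataset such as HAM10000.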



