EnGraf-Net: Multiple Granularity Branch Network with Fine-Coarse Graft Grained for Classification Task
La Grassa R.;Gallo I.;Landro N.
2021-01-01
Abstract
Fine-grained classification models focus expressly on the details needed to distinguish highly similar classes, typically in datasets where the intra-class variance is high and the inter-class variance is low. Most of these models rely on part annotations, such as bounding boxes, part locations, or text attributes, to improve classification performance, while others use sophisticated techniques to extract an attention map automatically. We assume that part-based approaches, such as automatic cropping methods, suffer from an incomplete representation of local features, which are fundamental for distinguishing similar objects. While fine-grained classification endeavours to recognize the leaf of a graph, humans recognize an object by also trying to make a semantic association. In this paper, we structure this semantic association as a hierarchy (taxonomy) and use it as a supervised signal in an end-to-end deep neural network model termed EnGraf-Net. Extensive experiments on three well-known datasets, CIFAR-100, CUB-200-2011 and FGVC-Aircraft, demonstrate the superiority of EnGraf-Net over many fine-grained models and show that it is competitive with the most recent best models, without using any cropping technique or manual annotations.