Improved HIV-1 Subtyping Accuracy Using near Full-Length Sequencing: A Comparison of Common Tools
Maggi F.;
2025-01-01
Abstract
The extensive genetic diversity of HIV-1, reflected in the circulation of multiple subtypes and circulating recombinant forms (CRFs), poses significant challenges for accurate subtype classification, especially when sequencing is limited to partial genomic regions. This study evaluated the performance of four commonly used automated subtyping tools (Stanford HIVdb, COMET, REGA, and Geno2pheno) by comparing their outputs with molecular phylogenetic analysis (Mphy), considered the gold standard, on three NGS-derived sequence data sets: protease-reverse transcriptase (PR-RT), pol, and near full-length (NFL). One hundred plasma samples were processed to generate sequences of increasing length, which were analyzed to assess concordance, sensitivity, and specificity. NFL-based Mphy identified a higher proportion of CRFs (51.6%) than PR-RT and pol (44.1%) and enabled the reclassification of 13 samples as more complex CRFs. Automated tools showed good concordance with Mphy for PR-RT and pol, particularly for pure subtypes, whereas concordance decreased considerably for NFL sequences, especially among non-B subtypes and CRFs. Sensitivity varied substantially across tools and subtypes, while specificity remained consistently high. Overall, the findings indicate that whole-genome or NFL sequencing enhances the detection of CRFs and that the accuracy of automated tools is strongly influenced by how complete and up to date their reference databases are.
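As an illustration of the three metrics named in the abstract, the sketch below computes overall concordance and one-vs-rest sensitivity and specificity for a single subtype, treating the phylogenetic call as the reference. It is a minimal sketch, not the study's pipeline: the function names and the toy labels are hypothetical, and the abstract does not state exactly how the authors defined or computed these quantities.

```python
def concordance(reference, predicted):
    """Fraction of samples where the tool's call equals the reference call."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def sensitivity_specificity(reference, predicted, subtype):
    """One-vs-rest sensitivity and specificity for one subtype.

    Returns None for a metric whose denominator is zero.
    """
    tp = fn = fp = tn = 0
    for r, p in zip(reference, predicted):
        if r == subtype and p == subtype:
            tp += 1  # reference and tool both call this subtype
        elif r == subtype:
            fn += 1  # tool missed a true case
        elif p == subtype:
            fp += 1  # tool called this subtype wrongly
        else:
            tn += 1  # both agree it is something else
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Made-up labels for illustration only; these are NOT data from the study.
mphy = ["B", "B", "CRF02_AG", "F1", "B", "CRF02_AG"]  # reference (phylogeny)
tool = ["B", "B", "CRF02_AG", "B", "B", "A1"]         # hypothetical tool output

print(concordance(mphy, tool))
print(sensitivity_specificity(mphy, tool, "B"))
print(sensitivity_specificity(mphy, tool, "CRF02_AG"))
```

In this toy example the tool misses one of two recombinant samples but never calls CRF02_AG wrongly, the same pattern of variable sensitivity alongside high specificity that the abstract reports.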



