dc.contributor.author: Gajdov, Vladimir
dc.contributor.author: Banović Đeri, Bojana
dc.contributor.author: Samojlović, Milena
dc.contributor.author: Lupulović, Diana
dc.contributor.author: Lazić, Gospava
dc.contributor.author: Vidanović, Dejan
dc.contributor.author: Petrović, Tamaš
dc.date.accessioned: 2022-05-12T08:40:58Z
dc.date.available: 2022-05-12T08:40:58Z
dc.date.issued: 2022-04-27
dc.identifier.isbn: 978-86-83115-45-7
dc.identifier.uri: https://repo.niv.ns.ac.rs/xmlui/handle/123456789/493
dc.description.abstract: Many disease-causing viruses are grouped into subtypes, pathotypes, variants or lineages of clinical significance. Most methods for viral genome classification require aligning the input sequence against predefined reference sequences so that algorithms can compare homologous sequence features, which can be computationally expensive. Moreover, highly divergent genome regions may degrade the alignment algorithm's performance. To overcome these obstacles, various machine learning (ML) algorithms have been used for viral genome classification. In this work, an alignment-free artificial intelligence approach was implemented for four tasks: determining avian influenza virus (AIV) subtype from hemagglutinin (HA) and neuraminidase (NA) genomic sequences; differentiating between highly pathogenic and low pathogenic H5 AIV from HA gene sequences; differentiating between West Nile virus (WNV) lineages 1 and 2 from whole genome sequences; and determining SARS-CoV-2 variants from whole genome sequences. One hundred publicly available, randomly chosen, unique HA and NA sequences (both complete and partial coding sequences) were retrieved from NCBI GenBank for each H and N subtype, except for H14 and H15, for which only 47 and 23 sequences, respectively, were available; for WNV and SARS-CoV-2, whole genome sequences were retrieved. For training the ML models, the data were randomly split into training (80%) and test (20%) sets. Accuracy, F1, precision and recall scores were evaluated for all models using a confusion matrix. All models performed the classification task with scores >99%, which suggests that this approach could be applied for accurate classification of viral genome sequences. However, the dataset is relatively small, so further evaluation of these ML models should use more samples and sequences of different lengths. [en_US]
dc.description.sponsorship: This report was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia under the contract for the implementation and funding of research work of NIV-NS in 2022, Contract No. 451-03-68/2022-14/200031. [en_US]
dc.language.iso: other [en_US]
dc.publisher: SVD, Sekcija za zoonoze, Beograd (Srbija) [en_US]
dc.source: Zbornik kratkih sadržaja, XXIV Simpozijum epizootiologa i epidemiologa (XXIV Epizootiološki dani), Subotica [sr]
dc.subject: artificial intelligence [en_US]
dc.subject: machine learning [en_US]
dc.subject: genomic sequences classification [en_US]
dc.title: Primena mašinskog učenja za određivanje podtipova, patotipova i linija različitih virusa korišćenjem sekvenci celih genoma ili gena [en_US]
dc.title.alternative: Machine learning approach for the determination of viral subtypes, pathotypes and lineages by using whole genome or gene sequences [en_US]
dc.type: Article [en_US]
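
The evaluation steps named in the abstract (a random 80%/20% split, then accuracy, precision, recall and F1 derived from a confusion matrix) can be illustrated with a minimal Python sketch. The record does not state which alignment-free features or which ML models were used, so the k-mer counting shown here is only an assumed example of alignment-free featurization, macro-averaging across classes is likewise an assumption, and every function name is hypothetical.

```python
import random
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Assumed alignment-free features: counts of every DNA k-mer, in fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[km] for km in kmers]


def split_80_20(items, seed=0):
    """Randomly split items into training (80%) and test (20%) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def scores(matrix):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is an assumption)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    macro = [sum(c[j] for c in per_class) / n for j in range(3)]
    return {"accuracy": accuracy, "precision": macro[0],
            "recall": macro[1], "f1": macro[2]}
```

In such a sketch, each sequence would be converted with kmer_counts, the labelled feature vectors divided with split_80_20, a classifier of the reader's choice fitted on the training part, and its test-set predictions passed through confusion_matrix and scores.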


This item appears in the following Collection(s)

  • Zbornici (Proceedings)
    Saopštenja sa naučnih skupova (Communications from scientific meetings)