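Application of machine learning for determining subtypes, pathotypes and lineages of different viruses using whole genome or gene sequences (original title: Primena mašinskog učenja za određivanje podtipova, patotipova i linija različitih virusa korišćenjem sekvenci celih genoma ili gena)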

Date
2022-04-27
Author
Gajdov, Vladimir
Banović Đeri, Bojana
Samojlović, Milena
Lupulović, Diana
Lazić, Gospava
Vidanović, Dejan
Petrović, Tamaš
Abstract
Many disease-causing viruses are clustered into subtypes, pathotypes, variants or lineages with clinical significance. Most methods for viral genome classification require aligning the input sequence against predefined reference sequences so that algorithms can compare homologous sequence features, a step that can be computationally expensive. Moreover, highly divergent genome regions may degrade the alignment algorithm's performance. To overcome these obstacles, various machine learning (ML) algorithms have been used for viral genome classification. In this work, an alignment-free artificial intelligence approach was implemented for four tasks: determining avian influenza virus (AIV) subtype from hemagglutinin (HA) and neuraminidase (NA) genomic sequences; differentiating between highly pathogenic and low pathogenic H5 AIV from HA gene sequences; differentiating between West Nile virus (WNV) lineages 1 and 2 from whole genome sequences; and determining SARS-CoV-2 variants from whole genome sequences. One hundred publicly available, randomly chosen, unique HA and NA sequences (both complete and partial coding sequences) were retrieved from NCBI GenBank for each H and N subtype, except for H14 and H15, for which only 47 and 23 sequences, respectively, were available; for WNV and SARS-CoV-2, whole genome sequences were retrieved. For training of the ML models, the data were randomly split into training (80%) and test (20%) sets. Accuracy, F1, precision and recall scores were evaluated for all models by using a confusion matrix. The empirical results showed that all models performed the classification task with scores >99%, which suggests that this approach could be applied for accurate classification of viral genome sequences. However, the dataset is relatively small, so to evaluate these ML models further, more samples and sequences of different lengths should be used.
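The workflow summarized in the abstract (alignment-free features, a random 80/20 split, and scores derived from a confusion matrix) can be illustrated with a minimal sketch. The abstract names neither the sequence features nor the ML algorithms used, so everything specific here is an assumption made for illustration: k-mer frequencies with `K = 3` as the alignment-free representation, a nearest-centroid classifier as a stand-in model, macro-averaging of precision, recall and F1, and synthetic toy sequences in place of GenBank data. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

K = 3  # k-mer length: an assumption, the abstract does not state one
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_vector(seq, k=K):
    """Alignment-free features: relative frequency of each k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1  # skips k-mers with ambiguous bases
    return [counts[m] / total for m in KMERS]

def split_80_20(items, seed=0):
    """Shuffle, then split into training (80%) and test (20%) parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

def fit_centroids(train):
    """Stand-in model (assumption): mean feature vector per class."""
    sums, n = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        n[label] += 1
    return {lab: [v / n[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Return the label of the centroid closest to vec (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def scores(cm, labels):
    """Accuracy plus macro-averaged precision, recall and F1."""
    total = sum(cm.values())
    precs, recs, f1s = [], [], []
    for lab in labels:
        tp = cm[(lab, lab)]
        fp = sum(cm[(o, lab)] for o in labels if o != lab)
        fn = sum(cm[(lab, o)] for o in labels if o != lab)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precs.append(p)
        recs.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(cm[(lab, lab)] for lab in labels) / total,
        "precision": sum(precs) / n,
        "recall": sum(recs) / n,
        "f1": sum(f1s) / n,
    }

def toy_sequences(n_per_class=50, seed=1):
    """Synthetic data only: two classes that differ in base composition."""
    rng = random.Random(seed)
    weights = {"classA": (4, 2, 2, 2), "classB": (2, 2, 4, 2)}
    data = []
    for label, w in weights.items():
        for _ in range(n_per_class):
            seq = "".join(rng.choices("ACGT", weights=w, k=rng.randint(200, 400)))
            data.append((kmer_vector(seq), label))
    return data

if __name__ == "__main__":
    train, test = split_80_20(toy_sequences())
    centroids = fit_centroids(train)
    y_true = [label for _, label in test]
    y_pred = [predict(centroids, vec) for vec, _ in test]
    print(scores(confusion_matrix(y_true, y_pred), sorted(centroids)))
```

Because k-mer frequencies are normalized by sequence length, complete and partial sequences map to vectors of the same size without any alignment step. Scores obtained on the synthetic toy data say nothing about the results reported in the abstract.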
URI
https://repo.niv.ns.ac.rs/xmlui/handle/123456789/493
Collections
  • Zbornici

DSpace software copyright © 2002-2016  DuraSpace