Machine learning approach for the determination of viral subtypes, pathotypes and lineages by using whole genome or gene sequences

Gajdov, Vladimir; Banović Đeri, Bojana; Samojlović, Milena; Lupulović, Diana; Lazić, Gospava; Vidanović, Dejan; Petrović, Tamaš

dc.contributor.author	Gajdov, Vladimir
dc.contributor.author	Banović Đeri, Bojana
dc.contributor.author	Samojlović, Milena
dc.contributor.author	Lupulović, Diana
dc.contributor.author	Lazić, Gospava
dc.contributor.author	Vidanović, Dejan
dc.contributor.author	Petrović, Tamaš
dc.date.accessioned	2022-05-12T08:40:58Z
dc.date.available	2022-05-12T08:40:58Z
dc.date.issued	2022-04-27
dc.identifier.isbn	978-86-83115-45-7
dc.identifier.uri	https://repo.niv.ns.ac.rs/xmlui/handle/123456789/493
dc.description.abstract	Many disease-causing viruses are classified into subtypes, pathotypes, variants or lineages of clinical significance. Most methods for viral genome classification require aligning the input sequence against predefined reference sequences; alignment lets algorithms compare homologous sequence features, but it can be computationally expensive. Moreover, highly divergent genome regions may degrade the alignment algorithm's performance. To overcome these obstacles, various machine learning (ML) algorithms have been used for viral genome classification. In this work, an alignment-free artificial intelligence approach was implemented to (i) determine the avian influenza virus (AIV) subtype from hemagglutinin (HA) and neuraminidase (NA) gene sequences, (ii) differentiate between highly and low pathogenic H5 AIV from HA gene sequences, (iii) differentiate between West Nile virus (WNV) lineages 1 and 2, and (iv) determine SARS-CoV-2 variants, using whole genome sequences for both WNV and SARS-CoV-2. One hundred publicly available, randomly chosen, unique HA and NA coding sequences (both complete and partial) were retrieved from NCBI GenBank for each H and N subtype, except for H14 and H15, for which only 47 and 23 sequences, respectively, were available; for WNV and SARS-CoV-2, whole genome sequences were retrieved. To train the ML models, the data were randomly split into training (80%) and test (20%) sets. Accuracy, F1, precision and recall scores were computed for all models from the confusion matrix. All models performed the classification task with scores >99%, which suggests that this approach could be applied for accurate classification of viral genome sequences.
However, this dataset is relatively small, so to evaluate these ML models further, more samples and sequences of different lengths should be used.	en_US
dc.description.sponsorship	This report was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia under the Contract on the implementation and funding of research work of NIV-NS in 2022, Contract No. 451-03-68/2022-14/200031	en_US
dc.language.iso	other	en_US
dc.publisher	SVD, Section for Zoonoses, Belgrade (Serbia)	en_US
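dc.source	Zbornik kratkih sadržaja, XXIV Simpozijum epizootiologa i epidemiologa (XXIV Epizootiološki dani), Subotica [Book of Abstracts, XXIV Symposium of Epizootiologists and Epidemiologists (XXIV Epizootiological Days), Subotica]	sr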
dc.subject	artificial intelligence	en_US
dc.subject	machine learning	en_US
dc.subject	genomic sequences classification	en_US
dc.title	Primena mašinskog učenja za određivanje podtipova, patotipova i linija različitih virusa korišćenjem sekvenci celih genoma ili gena	en_US
dc.title.alternative	Machine learning approach for the determination of viral subtypes, pathotypes and lineages by using whole genome or gene sequences	en_US
dc.type	Article	en_US
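The abstract does not specify how sequences were encoded or which ML algorithm was used, so the following Python sketch is only an illustration under stated assumptions. It assumes sequences are turned into k-mer frequency profiles (one common alignment-free encoding), uses a nearest-centroid classifier as a hypothetical stand-in for the unnamed ML models, and all helper names (`kmer_profile`, `split_80_20`, `fit_centroids`, `predict`, `confusion_metrics`) are hypothetical. What it does show is the random 80/20 train/test split and how accuracy, precision, recall and F1 follow from a confusion matrix (macro-averaged over classes, which is also an assumption).

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Alignment-free encoding (assumed): relative frequency of every ACGT k-mer."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are counted but ignored below.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def split_80_20(samples, test_fraction=0.2, seed=0):
    """Randomly shuffle (sequence, label) pairs and split them into train/test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train, k=3):
    """Hypothetical stand-in classifier: mean k-mer profile per label."""
    sums, counts = {}, Counter()
    for seq, label in train:
        vec = kmer_profile(seq, k)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, seq, k=3):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    vec = kmer_profile(seq, k)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])))


def confusion_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 from a confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = Counter(zip(y_true, y_pred))  # cm[(true, predicted)] = count
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = cm[(lab, lab)]
        fp = sum(cm[(t, lab)] for t in labels if t != lab)
        fn = sum(cm[(lab, p)] for p in labels if p != lab)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(cm[(lab, lab)] for lab in labels) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy sequences for illustration only, not real viral data.
    data = [("AAAAAAAAAC", "group_a")] * 5 + [("CCCCCCCCCA", "group_b")] * 5
    train, test = split_80_20(data)
    model = fit_centroids(train)
    y_true = [label for _, label in test]
    y_pred = [predict(model, seq) for seq, _ in test]
    print(confusion_metrics(y_true, y_pred))
```

In a real analysis, the k-mer length, the classifier and the averaging scheme would have to match whatever the authors actually used; libraries such as scikit-learn provide well-tested equivalents of the split and metric helpers.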

Files in this item

Name: primgv22.pdf
Size: 1013 KB
Format: PDF

This item appears in the following Collection(s)

Zbornici (Proceedings)
Saopštenja sa naučnih skupova (Conference Communications)