dc.contributor.author | Gajdov, Vladimir | |
dc.contributor.author | Banović Đeri, Bojana | |
dc.contributor.author | Samojlović, Milena | |
dc.contributor.author | Lupulović, Diana | |
dc.contributor.author | Lazić, Gospava | |
dc.contributor.author | Vidanović, Dejan | |
dc.contributor.author | Petrović, Tamaš | |
dc.date.accessioned | 2022-05-12T08:40:58Z | |
dc.date.available | 2022-05-12T08:40:58Z | |
dc.date.issued | 2022-04-27 | |
dc.identifier.isbn | 978-86-83115-45-7 | |
dc.identifier.uri | https://repo.niv.ns.ac.rs/xmlui/handle/123456789/493 | |
dc.description.abstract | Many disease-causing viruses are clustered into subtypes, pathotypes, variants or
lineages with clinical significance. Most methods for viral genome classification
require the alignment of the input sequence against predefined reference sequences
so that algorithms can compare homologous sequence features, a step that can be
computationally expensive. Moreover, highly divergent genome regions may affect
the alignment algorithm’s performance. In order to overcome these obstacles, various
machine learning (ML) algorithms have been used for viral genome classification. In
this work, an alignment-free artificial intelligence approach was implemented for four
tasks: determining avian influenza virus (AIV) subtype from hemagglutinin (HA) and
neuraminidase (NA) gene sequences; differentiating highly pathogenic from low
pathogenic H5 AIV using HA gene sequences; differentiating West Nile virus (WNV)
lineages 1 and 2; and determining SARS-CoV-2 variants, with whole genome
sequences used for both WNV and SARS-CoV-2. From the
NCBI GenBank, one hundred publicly available, randomly chosen, unique HA and NA
coding sequences (both complete and partial) were retrieved for each H and N subtype,
except for H14 and H15, for which only 47 and 23 sequences, respectively, were
available; for WNV and SARS-CoV-2, whole genome sequences were retrieved.
For training of the ML models, the
data were randomly split into training (80%) and test (20%) sets. The accuracy, F1,
precision and recall scores were evaluated for all models by using a confusion matrix.
The empirical results showed that all models performed the classification task with
scores >99%, which suggests that this approach could be applied for accurate
classification of viral genome sequences. However, the dataset is relatively small, so
further evaluation of these ML models should use more samples and sequences of
different lengths. | en_US |
dc.description.sponsorship | This report was funded by the Ministry of Education, Science and Technological
Development of the Republic of Serbia under the contract for the implementation and funding of research work of
NIV-NS in 2022, Contract No: 451-03-68/2022-14/200031 | en_US |
dc.language.iso | other | en_US |
dc.publisher | SVD, Sekcija za zoonoze, Beograd (Srbija) | en_US |
dc.source | Zbornik kratkih sadržaja, XXIV Simpozijum epizootiologa i epidemiologa (XXIV Epizootiološki dani), Subotica | sr |
dc.subject | artificial intelligence | en_US |
dc.subject | machine learning | en_US |
dc.subject | genomic sequences classification | en_US |
dc.title | Primena mašinskog učenja za određivanje podtipova, patotipova i linija različitih virusa korišćenjem sekvenci celih genoma ili gena | en_US |
dc.title.alternative | Machine learning approach for the determination of viral subtypes, pathotypes and lineages by using whole genome or gene sequences | en_US |
dc.type | Article | en_US |
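
The workflow summarised in the abstract (alignment-free features, a random 80%/20% split, and accuracy, precision, recall and F1 derived from a confusion matrix) can be illustrated with a minimal Python sketch. The abstract does not name the features or the ML models used, so everything specific here is an assumption made for illustration: the k-mer frequency features, the k-mer length `K = 3`, the nearest-centroid stand-in classifier, macro averaging of the per-class scores, all function names, and the synthetic toy sequences, which are not real viral data.

```python
import random
from collections import Counter
from itertools import product

K = 3  # hypothetical k-mer length; the abstract does not state the features used
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_vector(seq):
    """Alignment-free features: relative frequency of each k-mer in seq."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1  # skips k-mers with N etc.
    return [counts[k] / total for k in KMERS]


def split_80_20(items, seed=0):
    """Shuffle, then split into 80% training and 20% test data."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


def fit_centroids(train):
    """Stand-in model (nearest centroid), not the models from the study."""
    sums, sizes = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] += 1
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is nearest (squared Euclidean)."""
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[lab]))
    return min(centroids, key=dist)


def confusion_matrix(true, pred, labels):
    """matrix[t][p] = number of samples of class t predicted as class p."""
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(true, pred):
        matrix[t][p] += 1
    return matrix


def scores(matrix, labels):
    """Accuracy plus macro-averaged precision, recall and F1."""
    total = sum(sum(row.values()) for row in matrix.values())
    per_class = []
    for lab in labels:
        tp = matrix[lab][lab]
        predicted = sum(matrix[t][lab] for t in labels)  # column sum
        actual = sum(matrix[lab].values())               # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class.append((p, r, f1))
    n = len(labels)
    return {
        "accuracy": sum(matrix[lab][lab] for lab in labels) / total,
        "precision": sum(x[0] for x in per_class) / n,
        "recall": sum(x[1] for x in per_class) / n,
        "f1": sum(x[2] for x in per_class) / n,
    }


def demo(seed=0):
    """Run the pipeline on synthetic toy sequences (not real viral data)."""
    rng = random.Random(seed)
    labels = ["AT-rich", "GC-rich"]
    weights = {"AT-rich": [4, 1, 1, 4], "GC-rich": [1, 4, 4, 1]}  # A, C, G, T
    data = [(kmer_vector("".join(rng.choices("ACGT", weights[lab], k=200))), lab)
            for lab in labels for _ in range(50)]
    train, test = split_80_20(data, seed)
    centroids = fit_centroids(train)
    pred = [predict(centroids, vec) for vec, _ in test]
    true = [lab for _, lab in test]
    return scores(confusion_matrix(true, pred, labels), labels)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns a dictionary with the four scores for the toy data. With real sequences, the `data` list would instead be built from labelled GenBank records, and the stand-in classifier would be replaced by whichever ML model is being evaluated.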