K-Means Clustering and Analyze of SARS-CoV 2 DNA based on Multiple Encoding Vector and K-Mer Method
Abstract
According to WHO data, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) had affected more than 172.6 million people worldwide by early June 2021. The virus targets the human respiratory system, causing lung infections and even death. It is therefore vital to investigate the kinship among coronavirus sequences in order to limit the spread of the virus. This study groups the sequences with the K-Means clustering method and analyzes them with the Multiple Encoding Vector method. The sequence analysis yields an 18-dimensional multiple encoding vector, which is compared with the K-Mer method based on the translation of DNA codons into amino acids. DNA sequences of SARS-CoV-2 were collected from numerous affected countries for this investigation. The simulation results show that the SARS-CoV-2 DNA sequences form two clusters, with the second cluster containing the most members. The results also show that the method is effective at grouping the data, with a ratio of between-cluster sum of squares to total sum of squares (between_SS / total_SS) of 81.4%.
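The clustering pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sequences are invented toy strings rather than SARS-CoV-2 data, the features are plain nucleotide 2-mer frequencies rather than the 18-dimensional multiple encoding vector or the codon-based K-Mer variant, and the function names (`kmer_frequencies`, `kmeans`, `between_over_total_ss`) and the farthest-first initialization are assumptions made for illustration only. The sketch shows how a between_SS / total_SS ratio of the kind reported above is computed.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Normalized counts of every overlapping k-mer over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k=2, iterations=50):
    """Lloyd-style K-Means with a deterministic farthest-first start (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_vector(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def between_over_total_ss(points, labels, centroids):
    """between_SS / total_SS, where between_SS = total_SS - within_SS."""
    grand_mean = mean_vector(points)
    total_ss = sum(squared_distance(p, grand_mean) for p in points)
    within_ss = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return (total_ss - within_ss) / total_ss


# Toy sequences (hypothetical, not real viral data): three AT-rich, three GC-rich.
sequences = [
    "ATATATTAATAT", "AATTATATATTA", "TATAATATTAAT",
    "GCGCGGCCGCGC", "CGCGCCGGCGCG", "GGCCGCGCGCCG",
]
features = [kmer_frequencies(s, k=2) for s in sequences]
labels, centroids = kmeans(features, k=2)
ratio = between_over_total_ss(features, labels, centroids)
print(labels, f"between_SS / total_SS = {ratio:.1%}")
```

A ratio close to 100% means that most of the variance in the feature vectors is explained by cluster membership, which is how the 81.4% figure in the abstract should be read. On real genomes the feature extraction step would be replaced by the encoding under study, while the clustering and the ratio computation stay the same.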