Algorithms for Calculating DNA Distances Based on the Pair Correlation

Boris Melnikov

Published: Dec 3, 2023

Abstract

We generalize earlier results on comparing algorithms that calculate DNA distances, using pair correlation as the basis for the comparison. We assume that a “good” algorithm for computing the distance between a given pair of DNA sequences returns a value close to some optimal number; we do not know this number in advance, but the approach does not require it. Consequently, the “best” algorithms give badness values close to 0 for the different triangles, and the pair correlation over all triangles of the matrix (their number is of order N^3) is close to 1. A deviation from the optimal value for some pair of genomes (caused, for example, by a larger than usual number of mutations in one of the two species) should lead to approximately the same change in badness for every triangle in which this pair forms a side.

Approximately the same increase in badness therefore appears in each of these triangles. If we then arrange all the resulting triangles in ascending order of badness, we expect two “good” distance calculation algorithms to yield a relatively large value of the pair correlation. Conversely, “bad” algorithms, which do not estimate genome proximity correctly, are very unlikely to give a badness of 0 for a triangle, or to give close badness values when the latter is relatively far from

Therefore, in the transition to a very large number of triangles, the “bad” algorithms should give a small value of the pair correlation coefficient.
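The comparison described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract itself does not state: the badness of a triangle with sides a ≥ b ≥ c is taken to be (a − b)/a, so that it equals 0 when the two longest sides are equal; the pair correlation is taken to be the Pearson coefficient between the badness vectors of two algorithms; and the two 4×4 distance matrices are invented toy data, not results from the article. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def triangle_badness(d_xy, d_yz, d_xz):
    """ASSUMED definition: relative gap between the two longest sides.

    Returns 0.0 when the two longest sides are equal.
    """
    a, b, _ = sorted((d_xy, d_yz, d_xz), reverse=True)
    return 0.0 if a == 0 else (a - b) / a


def all_badness(dist):
    """Badness of every triangle i < j < k of a symmetric distance matrix.

    For N genomes this yields C(N, 3) values, i.e. on the order of N^3.
    """
    n = len(dist)
    return [
        triangle_badness(dist[i][j], dist[j][k], dist[i][k])
        for i, j, k in combinations(range(n), 3)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


# Toy distance matrices for the same four genomes, as if produced by
# two different distance algorithms (illustrative values only).
dist_a = [
    [0.0, 2.0, 5.0, 6.0],
    [2.0, 0.0, 4.0, 6.0],
    [5.0, 4.0, 0.0, 3.0],
    [6.0, 6.0, 3.0, 0.0],
]
dist_b = [
    [0.0, 2.2, 5.1, 5.8],
    [2.2, 0.0, 4.3, 6.1],
    [5.1, 4.3, 0.0, 2.9],
    [5.8, 6.1, 2.9, 0.0],
]

badness_a = all_badness(dist_a)
badness_b = all_badness(dist_b)
r = pearson(badness_a, badness_b)
print("badness (algorithm A):", badness_a)
print("badness (algorithm B):", badness_b)
print("pair correlation:", r)
```

Under these assumptions, two algorithms that agree on which triangles are distorted produce a coefficient near 1, while unrelated badness values push it toward 0. The triangles are enumerated in the same order for both matrices, so the two badness vectors are aligned element by element.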

Issue

Vol. 44 No. 12 (2023): Issue 12

Section

Articles
