Tags: bioinformed/vgraph
Bfx 942 kbj rework ar (#31)

Fix two bugs in the haplotype counting and read depth code used by the dbmatch2 command. These caused incorrect ploidies or read depths (i.e. match annotations) in some cases, but did not alter any matches:

1. The allele depth code did not compute reference read counts correctly when no fully-reference haplotypes were called.
2. The reference haplotype counting was overly permissive in matching reference, so in some cases the reference haplotype count was too high at the expense of "other" haplotypes.

As a side effect of the above fixes, the read counter no longer takes the mean of allele depths across each haplotype. The new version takes the minimum over the entire haplotype instead. As these are both imperfect approximations, there will be cases where one seems to work better than the other. However, the changes to the counts will generally be minor. The choice to switch from mean to minimum was not arbitrary: it is intended to simplify the code and avoid (more) edge cases.
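To illustrate the difference between the two approximations, here is a minimal sketch. It is not the vgraph implementation; the function names and the list-of-depths input are hypothetical, chosen only to show how mean and minimum behave when one allele on a haplotype has much lower depth than the others.

```python
# Hypothetical illustration only: these helpers are NOT part of vgraph.
# Each haplotype is represented as a list of per-allele read depths.

def haplotype_depth_mean(allele_depths):
    """Old-style approximation: average the allele depths on a haplotype."""
    if not allele_depths:
        return None
    return sum(allele_depths) / len(allele_depths)


def haplotype_depth_min(allele_depths):
    """New-style approximation: take the minimum allele depth on a haplotype."""
    if not allele_depths:
        return None
    return min(allele_depths)


if __name__ == '__main__':
    # Two well-covered alleles and one poorly covered allele (made-up numbers).
    depths = [30, 30, 3]
    print('mean:', haplotype_depth_mean(depths))
    print('min: ', haplotype_depth_min(depths))
```

With a single poorly covered allele, the mean still reports a fairly high depth, while the minimum reports the weakest support along the haplotype. Neither is exact, which is consistent with the note above that both are imperfect approximations.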
Matching indels requires a dynamic amount of reference padding to accommodate position uncertainty. Attempting to match with a smaller window results in substantial imprecision and seemingly pathological behavior. The code has therefore been updated to use the full dynamic window, rather than allowing a fixed amount of reference padding. In practice the impact will be more sensible calls and fewer false positives, at the expense of more no-calls due to the larger regions considered.

Some other minor code cleanups were added, including sending all debug output to stderr, and only when requested via the --debug command line option.
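The debug-output behavior described above can be sketched with the standard library as follows. This is an assumed pattern, not vgraph's actual code: only the --debug flag name comes from the notes, while the parser layout and the debug helper are hypothetical.

```python
# Hypothetical sketch of "debug output to stderr, only with --debug".
# Only the --debug flag name is taken from the release notes.
import argparse
import sys


def build_parser():
    """Build a tiny parser with an opt-in --debug flag."""
    parser = argparse.ArgumentParser(prog='example')
    parser.add_argument('--debug', action='store_true',
                        help='write debugging output to stderr')
    return parser


def debug(enabled, message):
    """Write message to stderr if debugging is enabled; report whether it did."""
    if enabled:
        print(message, file=sys.stderr)
        return True
    return False


if __name__ == '__main__':
    args = build_parser().parse_args()
    debug(args.debug, 'debug: starting match')
    print('normal output stays on stdout')
```

Keeping debug text off stdout means that regular output can be piped or redirected without being mixed with diagnostics.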