various stats
2009-Mar-26, Thursday 01:50 am
There's not really a point to the stats that follow. I just spent way too long getting Microsoft SQL Server 2008 installed on my Vista machine, and I want something to show for it. So here is what I found out about my genotyping source file, after I finally got it into a database.
I hadn't really thought about it before, but where there's only one chromosome strand, only one code can be reported. So for the X, Y, and mitochondrial DNA addresses, there will be only a single letter reported instead of the usual pair of letters, at least for males. I don't know how the female dataset is reported for the paired X chromosomes that a woman would have. The "DD", "II", and "DI" numbers below therefore exclude the X, Y, and mitochondrial DNA addresses. A "no call" means that the chip could not produce a reliable answer at the tested location. (edit: I'm trying to confirm at the 23andMe forum that DD and II really do mean deletion and insertion.)
rows of data: 579,581
no call: 1,340 (includes X and Y chromosomes but not mitochondria)
DD deletion: 32
II insertion: 71
DI mixup: 17
mitochondrial no call: 25
mitochondrial deletion: 0
mitochondrial insertion: 1 (at i4000892, position 8286, rCRS 8285)
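For anyone who would rather skip the database install, counts like the ones above could be sketched in a few lines of Python. This is only a rough sketch under assumptions that are mine, not confirmed here: that the raw file is tab-separated with the columns rsid, chromosome, position, genotype; that comment lines start with "#"; that a no call is written as one or more "-" characters; and that the mitochondrial rows are labeled "MT". The function name `tally_genotypes` and the sample rows are made up for illustration.

```python
from collections import Counter

# Assumed chromosome labels for the single-strand addresses (in a male file).
SINGLE_STRAND = {"X", "Y", "MT"}


def tally_genotypes(lines):
    """Count rows, no calls, and D/I codes from raw genotype lines.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed "#" comment lines
        rsid, chrom, position, genotype = line.split("\t")
        counts["rows"] += 1
        is_no_call = set(genotype) == {"-"}  # "-" or "--" (assumed encoding)
        if chrom == "MT":
            # Mitochondrial rows are tallied separately, one letter each.
            if is_no_call:
                counts["mitochondrial no call"] += 1
            elif genotype == "D":
                counts["mitochondrial deletion"] += 1
            elif genotype == "I":
                counts["mitochondrial insertion"] += 1
        elif is_no_call:
            counts["no call"] += 1  # includes X and Y, not mitochondria
        elif chrom not in SINGLE_STRAND:
            # Paired codes are only counted outside X, Y, and MT.
            if genotype == "DD":
                counts["DD deletion"] += 1
            elif genotype == "II":
                counts["II insertion"] += 1
            elif genotype in ("DI", "ID"):
                counts["DI mixup"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, not real data.
    sample = [
        "# rsid\tchromosome\tposition\tgenotype",
        "rsA\t1\t100\tAG",
        "rsB\t1\t200\t--",
        "rsC\t2\t300\tDD",
        "rsD\tMT\t400\tI",
    ]
    for name, value in sorted(tally_genotypes(sample).items()):
        print(f"{name}: {value}")
```

To run it on a real file, something like `tally_genotypes(open("genome.txt"))` should work, where the file name is a placeholder.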
At first I was concerned about the mitochondrial insertion, as that would be a significant discovery. Upon googling it, though, I found a few people who complain about this same location. If 23andMe cannot produce reliable answers for this address, it may eventually be removed from their reporting files. Apparently this address is a known problem point, depending on your haplogroup (maternal line).
While looking up that address, I also found discussions comparing 23andMe to other companies. Some people actually pay for both and then compare result files. One person reported a 99.6% agreement on over 500,000 SNP addresses shared by both companies. That percentage seems low enough to be worrisome, actually. A 0.4% disagreement across 500,000 addresses is still about 2,000 genetic calls where at least one of the two files could be wrong. That'd be fine if we're examining hair color, but not so fine if we're examining a drug response.
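A quick way to check what a 99.6% agreement rate means in absolute terms is to multiply the disagreement rate by the number of shared addresses. The snippet below is just that arithmetic; the function name is made up, and 500,000 is used as a round lower bound since the report said "over 500,000".

```python
def expected_discordant_calls(shared_snps, agreement):
    """Number of shared addresses where the two result files disagree."""
    return round(shared_snps * (1.0 - agreement))


# 0.4% of 500,000 shared SNP addresses
print(expected_discordant_calls(500_000, 0.996))  # prints 2000
```

A disagreement only says that the two files differ at that address; it does not say which company's call is the wrong one.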