can you identify these data fields?
2009-Feb-10, Tuesday 05:49 pm
I want to know the column identifiers for this table of data.
http://www.pnas.org/content/104/28/11694/suppl/DC1#DS
The first two columns are obviously "disease 1" and "disease 2", but I can't figure out the remaining numeric values, N1-N6. I've found the following range of values for each column, if that helps decipher their purpose.
N1: 0 - 86,355.882 (3 decimal places)
N2: 0 - 453.098 (3 decimal places, although the majority of values are just zero)
N3: 0 - 135,775 (integer)
N4: 0 - 135,833 (integer)
N5: 0 - 17,392 (integer)
N6: 0 (integer, all values are zero)
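For anyone who wants to reproduce those ranges, here is a minimal sketch of how they could be computed. It assumes the six numeric columns have already been parsed into tuples; the function name `column_ranges` and the sample rows are made up for illustration and are not values from the real dataset.

```python
def column_ranges(rows):
    """Return a (min, max) pair for each numeric column across all rows."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [(min(col), max(col)) for col in columns]

# Made-up rows in the order N1..N6 (NOT taken from the real data).
sample = [
    (12.5, 0.0, 40, 55, 7, 0),
    (3.25, 1.5, 10, 90, 2, 0),
    (0.0, 0.0, 0, 3, 0, 0),
]
ranges = column_ranges(sample)
for name, (low, high) in zip(["N1", "N2", "N3", "N4", "N5", "N6"], ranges):
    print(f"{name}: {low} - {high}")
```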
The abstract is provided at the link below, with full text (html or pdf) available in a column to the right.
http://www.pnas.org/content/104/28/11694.abstract
Any ideas? Some of their graphs are interesting, but I want to redo one of them in particular to make it easier to examine.
no subject
Date: 2009-Feb-11, Wednesday 11:23 am (UTC)
The table is ordered by descending value of N1. It appears that when two or more tuples have equal values for N1, the second sort key is N3.
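A rough way to test that ordering guess, assuming each row has been reduced to an (N1, N3) pair in file order. The function name is hypothetical, and because the direction of the N3 tie-break is not stated, the sketch takes it as a parameter so both directions can be tried.

```python
def sorted_by_n1_then_n3(rows, n3_descending):
    """True if rows are ordered by descending N1, with ties ordered by N3.

    rows: list of (n1, n3) pairs in file order.
    n3_descending: direction assumed for the N3 tie-break.
    """
    # Negate values so that "descending" becomes a plain ascending comparison.
    keys = [(-n1, -n3 if n3_descending else n3) for n1, n3 in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Made-up pairs (NOT from the real data): two rows tie on N1.
sample = [(9.0, 5), (9.0, 3), (2.0, 8)]
desc_ok = sorted_by_n1_then_n3(sample, n3_descending=True)
asc_ok = sorted_by_n1_then_n3(sample, n3_descending=False)
```

Running it once with each direction over the full table would show which tie-break, if either, holds for every row.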
I hazard a guess that none of these values represents the correlation between the two diseases in the tuple, as the article points out that some disease pairs had strong negative correlation and there are no negative values in the data.
I would have thought that N3 and N4 might be the number of patients in the study with markers for disease 1 and markers for disease 2, but that isn't borne out because the data aren't duplicated where you would expect them to be. Nor do N3-N6 map to their four described phenotypes ("has disease 1 but not disease 2", "has D2 but not D1", "has both D1 and D2" and "has neither D1 nor D2") respectively, because in that case N5 would always be less than or equal to N3 and N4, and sometimes it's greater.
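The comparison described above (whether N5 ever exceeds N3 or N4) can be sketched like this. It assumes each row has been reduced to an (N3, N4, N5) triple; the function name and the sample triples are made up for illustration.

```python
def rows_where_n5_exceeds(rows):
    """Return the (n3, n4, n5) rows in which N5 is greater than N3 or N4."""
    return [(n3, n4, n5) for n3, n4, n5 in rows if n5 > n3 or n5 > n4]

# Made-up triples (NOT from the real data).
sample = [(100, 120, 30), (5, 80, 9), (40, 40, 40)]
violations = rows_where_n5_exceeds(sample)
print(len(violations), "row(s) where N5 exceeds N3 or N4")
```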
A curious puzzle! I am downloading the rest of their data sets over a painfully slow link (wimax + Belgian weather = phail), and will report back if/when I find anything.
no subject
Date: 2009-Feb-11, Wednesday 01:31 pm (UTC)
The last sentence in the article says:
"Availability. Detailed information on estimated disease overlaps for all pairs of disorders mentioned in this study is available as SI."
Which is the dataset that I'm interested in remapping. But what does the data mean?! Argh! Always label your data. :)
no subject
Date: 2009-Feb-11, Wednesday 01:41 pm (UTC)
N1 = measure of cooperation between D1 and D2
N2 = measure of competition between D1 and D2
Which still leaves 3 integers to figure out.