mellowtigger | can you identify these data fields?

I want to know the column identifiers for this table of data.
http://www.pnas.org/content/104/28/11694/suppl/DC1#DS

The first two columns are obviously "disease 1" and "disease 2", but I can't figure out the remaining numeric values, N1-N6. I've found the following range of values for each column, if that helps decipher their purpose.
N1: 0 - 86,355.882 (3 decimal places)
N2: 0 - 453.098 (3 decimal places, although the majority of values are just zero)
N3: 0 - 135,775 (integer)
N4: 0 - 135,833 (integer)
N5: 0 - 17,392 (integer)
N6: 0 (integer, all values are zero)
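For anyone who wants to reproduce these ranges, here is a minimal Python sketch. It assumes the supplementary file is plain comma-separated text with no header row and the two disease names in the first two columns; the function name `column_ranges` and the sample rows are made up for illustration and are not taken from the PNAS data.

```python
import csv
import io

def column_ranges(csv_text, first_numeric_col=2):
    """Return a (min, max, all_integer) tuple for each numeric column."""
    stats = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        values = row[first_numeric_col:]
        if stats is None:
            stats = [[float("inf"), float("-inf"), True] for _ in values]
        for s, raw in zip(stats, values):
            x = float(raw)
            s[0] = min(s[0], x)
            s[1] = max(s[1], x)
            s[2] = s[2] and x.is_integer()
    return [tuple(s) for s in (stats or [])]

# Hypothetical rows, only to show the shape of the input.
sample = "asthma,eczema,12.5,0,40,50,7,0\nflu,gout,3.25,1.5,10,20,30,0\n"
for i, (lo, hi, is_int) in enumerate(column_ranges(sample), start=1):
    print(f"N{i}: {lo:g} - {hi:g} ({'integer' if is_int else 'decimal'})")
```

To run it on the real file, read the file's text and pass it in place of `sample`.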

The abstract is provided at the link below, with full text (html or pdf) available in a column to the right.
http://www.pnas.org/content/104/28/11694.abstract

Any ideas? Some of their graphs are interesting, but I want to redo one of them in particular to make it easier to examine.

From:

maradydd.livejournal.com

Based on the fact that the whole table is comma-separated values, I wonder whether N6 is a separator or line-terminator.

The table is ordered by descending value of N1. It appears that when two or more tuples have equal values for N1, the second sort key is N3.
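A quick way to test that ordering claim, sketched in Python. It assumes the rows have already been parsed into (N1, N3) pairs in file order; `check_order` is a hypothetical helper, and the example pairs are invented.

```python
def check_order(rows):
    """rows: list of (n1, n3) pairs in file order.

    Returns (n1_is_descending, tie_directions), where tie_directions is the
    set of directions ("asc"/"desc") N3 moves in between adjacent rows that
    share the same N1.
    """
    n1_is_descending = all(a[0] >= b[0] for a, b in zip(rows, rows[1:]))
    tie_directions = set()
    for (a1, a3), (b1, b3) in zip(rows, rows[1:]):
        if a1 == b1 and a3 != b3:
            tie_directions.add("asc" if a3 < b3 else "desc")
    return n1_is_descending, tie_directions

# Invented pairs: N1 descends, and the tie on N1 == 5.0 is broken by N3.
print(check_order([(5.0, 9), (5.0, 4), (2.0, 7)]))
```

If N3 really is the second sort key, the returned set should contain a single direction; seeing both "asc" and "desc" would count against it.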

I hazard a guess that none of these values represent correlation between the two diseases in the tuple, as the article points out that some disease pairs had strong negative correlation and there are no negative values in the data.

I would have thought that N3 and N4 might be the number of patients in the study with markers for disease 1 and for disease 2, but that doesn't hold up, because the values aren't repeated where you would expect them to be. Nor do N3-N6 map to their four described phenotypes ("has disease 1 but not disease 2", "has D2 but not D1", "has both D1 and D2" and "has neither D1 nor D2") respectively, because in that case N5 would always be less than or equal to N3 and N4, and sometimes it's greater.
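Both of those checks are easy to script. The Python sketch below assumes rows parsed as (disease 1, disease 2, N3, N4, N5); the helper names and the sample rows are hypothetical, and the second function only encodes the test described above rather than anything stated in the paper.

```python
from collections import defaultdict

def n3_is_per_disease_count(rows):
    """True if every row with the same disease 1 carries the same N3,
    as it would if N3 were a patient count for disease 1 alone."""
    seen = defaultdict(set)
    for d1, _d2, n3, _n4, _n5 in rows:
        seen[d1].add(n3)
    return all(len(values) == 1 for values in seen.values())

def rows_where_n5_exceeds(rows):
    """Rows in which N5 is greater than N3 or N4."""
    return [r for r in rows if r[4] > min(r[2], r[3])]

# Invented rows, only to show the shape of the input.
rows = [
    ("asthma", "eczema", 40, 50, 7),
    ("asthma", "gout", 35, 20, 30),
]
print(n3_is_per_disease_count(rows))  # asthma appears with two different N3 values
print(rows_where_n5_exceeds(rows))
```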

A curious puzzle! I am downloading the rest of their data sets over a painfully slow link (wimax + Belgian weather = phail), and will report back if/when I find anything.

Edited Date: 2009-Feb-11, Wednesday 11:24 am (UTC)

From:

mellowtigger

The main document has a "Data" subsection under "Methods", and it describes what they call a "pentaplet of variables". But the variables they describe don't seem to match the values in the columns. Nowhere else do I find a description of a 5-point set of numbers.

The last sentence in the article says:
"Availability. Detailed information on estimated disease overlaps for all pairs of disorders mentioned in this study is available as SI."

That is the dataset I'm interested in remapping. But what does the data mean?! Argh! Always label your data. :)

From:

mellowtigger

p.s. I think I've figured out two of them?
N1 = measure of cooperation between D1 and D2
N2 = measure of competition between D1 and D2

Which still leaves 3 integers to figure out.
