Mystery Science Theater: 2009
2009-Jun-30, Tuesday 03:22 pm
Just in time for folk going to CONvergence to have trivia for dazzling the natives! And how appropriate that their webpage shows the MST:3K profile. *laugh* Suppose there were a government experiment in which a person born right now was forced to watch television every single day for the rest of their life. Is there already enough Hollywood content to meet the need?
Short answer: Yes, but only if you require the subject to also view non-English films.
I've pondered this matter before, but last week at Bear Coffee I said out loud that I was thinking about downloading the IMDB data and determining for myself whether there was already more video/film content than a person could ever watch. I finally got around to following through on the commitment. :)
It turns out that IMDB is a huge mess that somehow, somehow continues to work. It is a collection of flat text files (yes, that's right), each devoted to a particular set of data. Records do not have key fields (yes, really). The primary field is just the name of the movie, which the end user can enter with any combination of double and single quotes they'd like. (Really, such a headache for a programmer trying to do string processing.) The other two fields in the "running-times" table are supposed to indicate the number of minutes and the number of episodes, respectively. But there is no data validation, so users can enter anything they want, just like a wiki. Minutes might be listed as a whole number ("60"), as minutes and seconds ("20:47"), as a range ("28-29"), or as whatever text notes a person thought would be useful. (Yes, the internet movie so-called database really is this bad.) Oh, and the field that's supposed to say how long a movie is in minutes? That's also where they dump the country-of-origin information. *boggle*
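For anyone wanting to try the same cleanup, here is a minimal Python sketch of how the messy minutes field could be normalized. It is not the OOBasic macro used for this post. The function name `parse_running_time` is made up for illustration, the "Country:" prefix layout is an assumption about how the country information is mixed into the field, and taking the midpoint of a range like "28-29" is just one possible choice.

```python
import re
from typing import Optional, Tuple


def parse_running_time(raw: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (country, minutes) for one raw field value; either may be None."""
    text = raw.strip()
    country = None

    # ASSUMPTION: country info, when present, is a "Country:" prefix.
    m = re.match(r"^([A-Za-z][A-Za-z .\-]*):\s*(.*)$", text)
    if m:
        country, text = m.group(1).strip(), m.group(2)

    # Minutes and seconds, e.g. "20:47".
    m = re.match(r"^(\d+):(\d{1,2})\b", text)
    if m:
        return country, int(m.group(1)) + int(m.group(2)) / 60

    # A range, e.g. "28-29"; the midpoint is an arbitrary choice.
    m = re.match(r"^(\d+)\s*-\s*(\d+)\b", text)
    if m:
        return country, (int(m.group(1)) + int(m.group(2))) / 2

    # A plain whole number of minutes, e.g. "60".
    m = re.match(r"^(\d+)\b", text)
    if m:
        return country, float(m.group(1))

    # Free-text notes and anything else are left uncounted.
    return country, None
```

Values that fall through to the last line correspond to the records that could not be cleaned up well enough to count.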
Nevertheless, I managed to import it to an OpenOffice database. I produced the following stats:
- 890,100 KB memory needed by OpenOffice to import the data using my OOBasic macro
- 466,181 records processed
- 19,926 records that I was unable to clean up well enough to determine the minutes
- 446,255 records left with countable data
- 29,506,703 minutes total
- 491,778 hours total
- 35,127 days total (allowing 14 hours per day for continuous viewing)
- 96 years total (which exceeds average lifespan for both males and females)

If I limit consideration only to entries that either do not specify the country of origin or mention specifically USA, UK, Canada, and Australia, then I assume I'm looking mostly just at the English movies. Those results are as follows:
- 22,427,409 minutes English
- 373,790 hours English
- 26,699 days English (at 14 hours per day)
- 73 years total English

That becomes doable, though it just barely squeaks by within the average lifespan. For Americans, the current average male lives 75 years and the average female 80 years. All of my numbers are underestimates, I should point out. I'm not convinced that my macro for cleaning up the episode-count data did a very good job. It looks like almost everything was counted as a movie with only 1 episode rather than allowing for TV series, which may have had multiple episodes. My totals may be revised upwards if I ever decide to further clean up the awful IMDB table.
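The conversions behind both sets of totals can be sketched in a few lines of Python. This is a reconstruction, not the original macro: the function name `viewing_totals` is hypothetical, and both the 365-day year and the rounding down at each step are assumptions (they happen to reproduce the figures listed above).

```python
def viewing_totals(total_minutes: int, hours_per_day: int = 14) -> dict:
    """Convert a total running time into hours, viewing days, and years."""
    hours = total_minutes // 60      # whole hours, remainder dropped
    days = hours // hours_per_day    # viewing days at 14 hours per day
    years = days // 365              # ASSUMPTION: 365-day years, rounded down
    return {"hours": hours, "days": days, "years": years}


all_content = viewing_totals(29_506_703)
english_only = viewing_totals(22_427_409)
```

Here `all_content` comes out to 491,778 hours, 35,127 days, and 96 years, and `english_only` to 373,790 hours, 26,699 days, and 73 years.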
Where's Joel when you need him?