The Big Picture

Michael Patel
Dec 31, 2020

I documented everything I watched this year (sans YouTube videos on how to do my job). Then the pandemic struck, and I instantly doubled, maybe even tripled (?!), my viewing on Netflix, Hulu, and insert-your-favourite-streaming-service-here. I tracked what I watched in a Google spreadsheet with columns like Title, Platform, Genre, and Score.

Did I learn anything new about myself? Not entirely sure. But I decided to explore the data as a mini quarantine project. Below are some data visualizations annotated with snarky comments. And here’s the code on my GitHub if you are interested.

I watched over 200 titles on several different viewing platforms (Netflix, Hulu, Peacock, etc.) across 7 arbitrarily defined genre categories (Animated, Blockbuster, Comedy, Drama, Horror, Scifi, and Wildcard) with arbitrarily assigned user ratings (-2, -1, 0, +1, +2). Almost 60% of everything I watched was on Netflix. The few theatre data points were all in January, before the U.S. quarantine.
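A platform share like “almost 60% on Netflix” is just a count per platform divided by the total. Here is a minimal sketch, assuming each spreadsheet row becomes a dict keyed by the column names (a CSV export read with `csv.DictReader` would give rows in that shape). The rows are made-up placeholders, not the real 2020 log, and `platform_share` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical placeholder rows using the spreadsheet's columns
# (Title, Platform, Genre, Score); this is NOT the real 2020 watch log.
rows = [
    {"Title": "Title A", "Platform": "Netflix", "Genre": "Comedy", "Score": 1},
    {"Title": "Title B", "Platform": "Hulu", "Genre": "Drama", "Score": 0},
    {"Title": "Title C", "Platform": "Netflix", "Genre": "Horror", "Score": -1},
    {"Title": "Title D", "Platform": "Peacock", "Genre": "Scifi", "Score": 2},
    {"Title": "Title E", "Platform": "Netflix", "Genre": "Drama", "Score": 1},
]


def platform_share(rows):
    """Return each platform's fraction of all logged titles, most-watched first."""
    counts = Counter(row["Platform"] for row in rows)
    total = sum(counts.values())
    return {platform: n / total for platform, n in counts.most_common()}


for platform, share in platform_share(rows).items():
    print(f"{platform}: {share:.0%}")
```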

If you’re like me and get rather bored and distracted easily, then feast your eyes on the same streaming service data visualized differently using a more exciting racing bar chart.
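Under the hood, a racing bar chart is a sequence of frames, and each frame is a cumulative leaderboard up to some point in time. The sketch below builds those frames in plain Python. It assumes a month is recorded for each viewing, which is not one of the spreadsheet columns listed above, and the log entries and the `racing_frames` name are hypothetical. A plotting library would then draw one sorted horizontal bar chart per frame and animate between them.

```python
from collections import Counter

# Hypothetical watch log of (month, platform) pairs. The month field is an
# assumption; the spreadsheet columns named in the post don't include a date.
log = [
    (1, "Theatre"), (1, "Netflix"), (2, "Netflix"), (3, "Hulu"),
    (3, "Netflix"), (4, "Peacock"), (4, "Hulu"), (5, "Netflix"),
]


def racing_frames(log, months):
    """Return (month, ranking) pairs, where ranking lists the cumulative
    count per platform up to and including that month, highest first."""
    running = Counter()
    frames = []
    for month in months:
        running.update(platform for m, platform in log if m == month)
        # most_common() returns a fresh list, so earlier frames stay unchanged.
        frames.append((month, running.most_common()))
    return frames


for month, ranking in racing_frames(log, range(1, 6)):
    print(month, ranking)
```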

Racing bar charts are my favourite chart type at the moment. As such, here is another unnecessary racing bar chart illustrating the different genres I watched. And yes, the genre categories are illogical and incoherent. (What does Wildcard or Blockbuster even mean when hardly anything was released theatrically in 2020?)

Comedy and Drama dominated the genre types. Everything in life is either a comedy or a tragedy. The final genre counts broke down as follows:

Not everything requires deep learning, but deep learning makes everything so much more fun. Hence, I created a text classifier using TensorFlow that takes a feature title as input and attempts to predict its genre type. Here are a few examples from my model, which I proudly call My Garbage Text Classifier. None of these titles were part of the training data.
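The classifier itself was built with TensorFlow, and its architecture isn’t described here, so the sketch below is not that model. As a dependency-free stand-in for the same title-in, genre-out idea, it uses a different technique: a multinomial Naive Bayes classifier over title words with add-one smoothing, in plain Python. The class name, tokenizer, and toy titles are all hypothetical; the toy set is deliberately skewed toward Comedy and Drama.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes over title words with add-one (Laplace) smoothing."""

    def fit(self, titles, genres):
        self.genre_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        for title, genre in zip(titles, genres):
            self.word_counts[genre].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, title):
        total = sum(self.genre_counts.values())
        log_scores = {}
        for genre, n in self.genre_counts.items():
            # Log prior: genres that appear more often in training start ahead.
            score = math.log(n / total)
            words = self.word_counts[genre]
            denom = sum(words.values()) + len(self.vocab)
            for w in tokenize(title):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((words[w] + 1) / denom)
            log_scores[genre] = score
        # Normalize the log scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exps = {g: math.exp(s - top) for g, s in log_scores.items()}
        z = sum(exps.values())
        return {g: e / z for g, e in exps.items()}

    def predict(self, title):
        proba = self.predict_proba(title)
        return max(proba, key=proba.get)


# Hypothetical toy training set, deliberately imbalanced toward Comedy and Drama.
train_titles = [
    "Office Party Chaos", "Road Trip Disaster", "The Wedding Weekend",
    "A Quiet Family Story", "The Last Letter Home",
    "The Haunted House", "Space Station Zero",
]
train_genres = ["Comedy", "Comedy", "Comedy", "Drama", "Drama", "Horror", "Scifi"]

clf = NaiveBayesTitleClassifier().fit(train_titles, train_genres)
print(clf.predict("The Haunted Cabin"))
print(clf.predict("Untitled Project"))  # no known words, so only the prior decides
```

With this toy data, “The Haunted Cabin” only narrowly beats Comedy for Horror, and a title with no familiar words falls straight back to Comedy, the most common training genre. That is the same majority-class pull as the Comedy/Drama skew in my predictions.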

Since the training data distribution was so heavily imbalanced towards Comedy and Drama, the classifier predictions were skewed towards predicting Comedy or Drama…with very high (and dismaying) confidence. Adding more training data for the other genre types or re-evaluating the validation set is where I would start improving My Garbage Text Classifier.
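Besides collecting more data, one common way to counter class imbalance is to weight the training loss by inverse class frequency, so rare genres count for more. This is a swapped-in suggestion, not something the original model used. In Keras, `Model.fit` accepts a `class_weight` dictionary mapping integer class indices to weights, so the genre names below are mapped to indices. The label counts and the index order are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Rare classes get weights above 1 and common classes get weights below 1,
    so each class contributes roughly equally to the training loss."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical, imbalanced label distribution (not the real 2020 counts).
labels = ["Comedy"] * 6 + ["Drama"] * 4 + ["Horror"] + ["Scifi"]
weights = balanced_class_weights(labels)

# Keras expects integer class indices as keys; this label order is hypothetical.
label_index = {label: i for i, label in enumerate(sorted(weights))}
class_weight = {label_index[label]: w for label, w in weights.items()}
print(class_weight)
```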

To dig deeper, I scored everything I watched on an intentionally subjective scale of [-2, -1, 0, +1, +2]: -2 means I strongly disliked something, while +2 means I was very impressed. Confounding factors such as my day-to-day mood were massive influences, since my scores were not aggregated with anyone else’s (small sample size alert!).

Even though I watched many Comedies and Dramas, their scores tended to average out. For some of the other genre categories, I had far more favourable viewing experiences. That may be selection bias: I deliberately sought out titles I expected to enjoy. In other words, a title, thumbnail image, or short description probably caught my attention, and because I chose that title out of thousands of others, I was already inclined to like it. Furthermore, every genre except Comedy and Drama had a very small sample size.
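A genre average only means something next to its sample size. Here is a minimal sketch that reports, for each genre, the count, the mean score, and the standard error of the mean (sample standard deviation divided by the square root of n). Treating the ordinal -2..+2 scale as numbers follows the averaging already done in this post. The (genre, score) pairs and the `genre_summary` name are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def genre_summary(rows):
    """Map each genre to (n, mean score, standard error of the mean).

    The standard error is None for a genre with a single score, since a
    sample standard deviation needs at least two data points."""
    by_genre = defaultdict(list)
    for genre, score in rows:
        by_genre[genre].append(score)
    summary = {}
    for genre, scores in by_genre.items():
        n = len(scores)
        mean = statistics.mean(scores)
        sem = statistics.stdev(scores) / math.sqrt(n) if n > 1 else None
        summary[genre] = (n, mean, sem)
    return summary


# Hypothetical (genre, score) pairs, not the real 2020 ratings.
rows = [
    ("Comedy", 1), ("Comedy", -1), ("Comedy", 0), ("Comedy", 1), ("Comedy", -1),
    ("Horror", 2), ("Horror", 1),
    ("Wildcard", 2),
]

for genre, (n, mean, sem) in genre_summary(rows).items():
    sem_text = f"{sem:.2f}" if sem is not None else "n/a"
    print(f"{genre}: n={n}, mean={mean:+.2f}, SEM={sem_text}")
```

A +2 average from a single Wildcard title carries no spread information at all, which is the small-sample caveat in numeric form.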

To have some more fun, I also plotted the frequency of the first alphanumeric character of each title. Unsurprisingly, titles beginning with “The” produced a high frequency of “T”. Maybe if you’re going to make a movie, give it a title starting with “A”, “B”, “C”, “S”, or “T”. In other words, “The” and “America” are great ways to pseudo-manipulate SEO algorithms in the U.S.
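Counting first characters takes a few lines of Python. In this sketch, “alphanumeric” follows Python’s `str.isalnum()`, which also accepts non-ASCII letters and digits. The `skip_articles` flag is a hypothetical addition: it ignores a leading “The”, “A”, or “An” (the way libraries alphabetize) to check how much of the “T” spike is just “The”. The first four titles come from this post; the rest are placeholders.

```python
from collections import Counter

ARTICLES = ("the ", "a ", "an ")


def first_char(title, skip_articles=False):
    """Return the first alphanumeric character of a title, uppercased, or None."""
    t = title.strip()
    if skip_articles:
        for article in ARTICLES:
            if t.lower().startswith(article):
                t = t[len(article):]
                break
    return next((ch.upper() for ch in t if ch.isalnum()), None)


def first_char_counts(titles, skip_articles=False):
    """Count titles by first alphanumeric character, skipping titles with none."""
    chars = (first_char(t, skip_articles) for t in titles)
    return Counter(c for c in chars if c is not None)


titles = [
    "Alita: Battle Angel", "Citizen Kane", "Den of Thieves", "Zodiac",
    "The Hypothetical One", "The Hypothetical Two", "(Untitled) Placeholder",
]
print(first_char_counts(titles).most_common())
print(first_char_counts(titles, skip_articles=True).most_common())
```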

Lastly, I wanted to explore the data a little further with some qualitative comments. I watched several Paul Thomas Anderson movies, and I still don’t “get” PTA. Other auteurs like Tarantino, I understand, but I was not a fan of PTA’s work. Across five titles, my average score was -0.4.

Meanwhile, for some reason, I watched several Zoey Deutch features and had relatively positive experiences (average score: +0.5). So if this were like basketball, which most of life is, then Zoey Deutch was akin to the Most Improved Player and PTA was the potential of a Joel Embiid-Ben Simmons nucleus.

Another observation I made only after obsessively tracking what I watched this year was how many first halves of movies are better than their second halves. Movies tend to start strong and end weakly. Even the great Citizen Kane fell into this trap. This is not to say that there aren’t movies with fantastic endings (Den of Thieves, Zodiac), but rather that most movies lose steam over the course of their running time. Additionally, there is a corollary which I call the “20-minute syndrome”: movies peak in the first 20 minutes. This is especially true of Comedies, where many of the best jokes and set-ups land in the first 20 minutes before the movie dives into the more plodding plot and subsequent character arc elements.

Still, the weightiest question stemming from this year’s data may be: Does Alita: Battle Angel count as Animated?

Maybe next year I will track my podcast listening habits instead, leaving open the potential for a “Hollywood sequel” of sorts.

