Observing variations in LGBTQ+ movie scores across languages

data cleaning
exploratory data analysis
plot aesthetics

Arindam Baruah


July 14, 2024

1 Introduction

This week’s TidyTuesday dataset covers LGBTQ+ movies released across the globe and the interesting information that we can extract from it. 🏳️‍🌈🏳️‍🌈

More details on this dataset here 👈

2 Methodology

The source of the data and the code used to obtain Figure 1 are described in Section 2.1 to Section 2.4.

2.1 Sourcing the data

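library(tidyverse)  # provides %>%, dplyr, ggplot2 and stringr::str_glue()
library(ggridges)   # provides geom_density_ridges()

lgbtq_movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-06-25/lgbtq_movies.csv')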

2.2 Setting the aesthetic theme


2.3 Data wrangling

# The six languages with the most movies in the dataset
top_langs <- lgbtq_movies %>% group_by(original_language) %>% 
  summarise(Total = n()) %>% arrange(-Total) %>% head(6)

# Keep only movies in those languages
lgbtq_movies <- lgbtq_movies %>% filter(original_language %in% top_langs$original_language)

# Map ISO 639-1 language codes to readable names; any other code is kept as-is
lgbtq_movies <- lgbtq_movies %>% mutate(original_lang = case_when(original_language == "en" ~ "English",
                                                  original_language == "pt" ~ "Portuguese",
                                                  original_language == "ja" ~ "Japanese",
                                                  original_language == "fr" ~ "French",
                                                  original_language == "es" ~ "Spanish",
                                                  original_language == "de" ~ "German",
                                                  .default = original_language))
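The same code-to-name mapping can also be written as a lookup table, which keeps the labels in one place if more languages are added later. The following is a minimal sketch on a made-up data frame (`toy_movies`, `lang_names` and `toy_labelled` are hypothetical names, not part of the original analysis). It also avoids the `.default` argument of `case_when()`, which needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy stand-in for lgbtq_movies with only the column needed here (made-up rows)
toy_movies <- data.frame(
  original_language = c("en", "pt", "ja", "ko"),
  stringsAsFactors = FALSE
)

# Named vector acting as a lookup table: language code -> display name
lang_names <- c(en = "English", pt = "Portuguese", ja = "Japanese",
                fr = "French", es = "Spanish", de = "German")

# Codes missing from the lookup return NA, so coalesce() falls back to the raw code
toy_labelled <- toy_movies %>%
  mutate(original_lang = coalesce(unname(lang_names[original_language]),
                                  original_language))

print(toy_labelled)
```

Here "ko" has no entry in the lookup, so it is carried through unchanged, mirroring the fallback behaviour of the `case_when()` call above.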

2.4 Data Visualisation

lgbtq_colors <- c("#FF0018", "#FFA52C", "#FFFF41", "#008018", "#0000F9", "#86007D", "#8B4513", "#FFD700")

title_text <- "Average vote score distribution by language in LGBTQ+ Movies"

subtitle_text <- "Visualizing the average vote scores across LGBTQ+ categorized movies based on their respective languages reveals <br> an intriguing pattern. The overall distribution of vote scores appears to be <strong><span style='color: darkred;'>bimodal</span></strong>, with noticeable peaks at scores <br> of 0 and 6. This suggests that viewers tend to either strongly dislike these movies or find them to be average. <br> Additionally, it's interesting to note that <strong> English-language </strong> movies exhibit a higher frequency of low scores compared <br> to movies in other languages. This may indicate differing audience preferences or varying production quality <br> across different language groups."

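# li, X_icon and gh are icon strings that must already be defined; their definitions are not shown here
caption_text <- str_glue("{li} Arindam Baruah | {X_icon} @wizsights | {gh} arinbaruah | Source: TidyTuesday | #rstudio #ggplot2")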

pl <- lgbtq_movies %>% ggplot(aes(vote_average,original_lang)) + 
  geom_density_ridges(aes(fill = factor(original_lang)), color = "grey30", linewidth = .25, alpha = .9) +
  scale_fill_manual(values = lgbtq_colors) +
  geom_vline(xintercept = c(0,5,10), linewidth = .3, linetype = "dotted", lineend = "round",alpha = 0.5) +
  labs(x = "Average Vote Score",
       title = title_text,
       subtitle = subtitle_text,
       caption = caption_text)

2.5 Final visualisation

Figure 1: Average vote score distribution by language in LGBTQ+ Movies

Figure 1 illustrates the distribution of average vote scores for LGBTQ+ categorised movies across languages. For the current analysis, only the six languages with the most movies were considered (Section 2.3), each of which has at least 100 movies.
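The selection in Section 2.3 keeps a fixed number of languages. If the intent is instead to keep every language above a minimum number of movies, the filter can be written against the count directly. The following is a rough sketch on made-up data, where `toy_movies`, `min_movies`, `eligible_langs` and `toy_filtered` are hypothetical names and the threshold of 3 stands in for the 100 mentioned above.

```r
library(dplyr)

# Toy stand-in for lgbtq_movies: 5 English, 3 French and 1 Korean movie (made-up)
toy_movies <- data.frame(
  original_language = c(rep("en", 5), rep("fr", 3), "ko"),
  stringsAsFactors = FALSE
)

# Hypothetical threshold for the toy data; the text above uses 100
min_movies <- 3

# Count movies per language and keep the languages that meet the threshold
eligible_langs <- toy_movies %>%
  count(original_language, name = "Total") %>%
  filter(Total >= min_movies)

# Keep only the movies whose language is eligible
toy_filtered <- toy_movies %>%
  semi_join(eligible_langs, by = "original_language")

print(eligible_langs)
```

Unlike `head(6)`, a threshold does not fix the number of languages shown, so the fill palette would need at least as many colours as there are eligible languages.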

So what do we learn from this? 🤔🕵🏻‍♀️

Key takeaways
  • The distribution of average vote scores was bimodal across all languages, with peaks near 0 and 6. 🧑‍💻
  • English-language LGBTQ+ movies showed a higher frequency of low scores than movies in the other languages. 🤯
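Both takeaways are read off the ridgeline plot, and a small per-language summary is one way to put numbers next to them. The following is a minimal sketch on made-up values, not results from the dataset; `toy_scores`, `score_summary`, `share_zero` and `median_nonzero` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for the wrangled data (made-up scores, four movies per language)
toy_scores <- data.frame(
  original_lang = rep(c("English", "French"), each = 4),
  vote_average  = c(0, 0, 6, 7,  0, 6, 6.5, 7),
  stringsAsFactors = FALSE
)

score_summary <- toy_scores %>%
  group_by(original_lang) %>%
  summarise(
    n_movies       = n(),
    # Share of movies sitting in the peak at zero
    share_zero     = mean(vote_average == 0),
    # Typical score once the zero peak is set aside
    median_nonzero = median(vote_average[vote_average > 0]),
    .groups = "drop"
  )

print(score_summary)
```

Separating the share of zero scores from the median of the remaining scores helps tell apart a language with a larger peak at 0 from one whose non-zero scores are themselves lower.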


