We present the results of measuring collocation similarity between Twilight (Meyer 2005) and 50 Shades of Grey (James 2011). 50 Shades began as Twilight fanfiction (Brennan and Large 2014). We use these texts for a case study analyzing the transformative effects of fanfiction on the narratives that fans call “canon”. Tosenberger (2014:17) asserts that “fanfiction is given life by what other spaces don’t allow, it […] fills those spaces with stories for which the canon has neither room nor desire.” Fanfiction is thus a narrative space in which to explore non-normative topics and perspectives. Twilight narrates the romance of a teenage girl and her vampire boyfriend. 50 Shades amplifies the mostly unconsummated sexual tension in Twilight and eliminates the novel’s supernatural elements. In 50 Shades, the male protagonist is dangerous not because he is a vampire, but because of his S/M inclinations. Our challenge is to model and quantify these transformations computationally.
Paris (2016) classifies 50 Shades as “mommy porn”, while Twilight has been called “abstinence porn” (Seifert 2005). In its vocabulary and collocations, Twilight seems like a non-explicit model for 50 Shades. To test this, we made an educated guess and initially selected four terms: “soft”, “hard”, “gaze”, and “stare”. We hypothesize that the words collocating with “soft” and “hard” differ between the texts: in Twilight, Edward’s skin is hard, while in 50 Shades Christian’s penis is hard. We further hypothesize that the subjects and objects of stares and gazes differ between the texts, with looks conveying love or longing in Twilight and sexual desire in 50 Shades. For each occurrence of these terms, we computed the pointwise mutual information (PMI) of the words collocated with it in a 9-token context. PMI compares the observed probability of two words occurring together with the probability expected if they occurred independently of each other (Bouma 2009:3). The window size was based on mean sentence length. A baseline for comparison was computed with the same measure for the YA novels Eleanor & Park (Rowell 2012), The Fault in Our Stars (Green 2012), and Shiver (Stiefvater 2009). We used Linguistic Inquiry & Word Count (LIWC; Pennebaker et al. 2015) to calculate, within the PMI results, the percentage of words related to a specific domain as defined by LIWC’s dictionaries. We then compared the percentage of words associated with the selected terms between the books and compared the LIWC results for the PMI data to the LIWC results for the books as a whole.
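To illustrate the kind of computation described above, the following is a minimal sketch of window-based PMI in Python. It is not our actual pipeline: the function name `window_pmi`, the symmetric window of four tokens on either side of the target (one possible reading of a 9-token context), and the simple relative-frequency probability estimates are all assumptions made for illustration.

```python
import math
from collections import Counter


def window_pmi(tokens, target, half_window=4):
    """Return {collocate: PMI} for words co-occurring with `target`.

    Assumption: the context is `half_window` tokens on each side of the
    target, i.e. 9 tokens including the target when half_window=4.
    """
    n = len(tokens)
    unigram = Counter(tokens)   # word frequencies in the whole text
    cooc = Counter()            # co-occurrence counts with the target

    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1

    scores = {}
    p_x = unigram[target] / n
    for word, joint in cooc.items():
        p_xy = joint / n            # estimated joint probability
        p_y = unigram[word] / n     # estimated marginal probability
        # PMI = log2( p(x, y) / (p(x) * p(y)) )
        scores[word] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy sentence, invented for illustration only
tokens = "his hard skin was cold and her soft voice was warm".split()
scores = window_pmi(tokens, "hard")
```

In this toy example, “skin” falls inside the window around “hard” and receives a score, whereas “soft” lies outside the window and receives none. A threshold on the resulting scores would then be one way to select collocates for further analysis.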
Among the tokens that PMI identified as significantly collocated with the target words, more words from the LIWC category “perception” occur around the term “hard” in 50 Shades than in Twilight (9% vs. 7%). Conversely, Twilight shows more perception terms around “soft” than 50 Shades (15% vs. 12%). Thus, perceptions are more frequently described as “hard” in 50 Shades and more frequently as “soft” in Twilight. In Twilight, more verbs occurred around “soft” (20%) than in 50 Shades, where 13% of the significantly collocated tokens for “soft” were verbs. This suggests that more “soft” actions are taken in Twilight than in 50 Shades, which would fit our hypothesis. In 50 Shades, the word “stare” occurred more frequently near words relating to biological features or processes (9%) than in Twilight (3%). Similarly, “gaze” occurred near words relating to biological processes more often in 50 Shades (7%) than in Twilight (5%). It thus appears that biological processes and parts of the body are often looking and being looked at in both texts, but more frequently in 50 Shades.
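The percentages reported above are shares of collocated tokens that fall into a given dictionary category. A minimal sketch of such a computation follows; it does not use LIWC itself, and the function `category_share` and the word list `TOY_PERCEPTION` are hypothetical stand-ins invented for illustration.

```python
def category_share(words, category_words):
    """Percentage of `words` that belong to `category_words`."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in category_words)
    return 100.0 * hits / len(words)


# Hypothetical toy lexicon; NOT an excerpt from the LIWC dictionaries
TOY_PERCEPTION = {"see", "touch", "cold", "warm", "look"}

# Invented list of collocates for one target term
collocates = ["skin", "cold", "touch", "door"]
share = category_share(collocates, TOY_PERCEPTION)
```

Running the same function over the collocates of each target term and over each book as a whole would yield the two sets of figures that are compared here.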
Our analysis confirms that Twilight is non-explicit: it scores 0.01% in LIWC’s sexuality and swearing categories. In the LIWC categories relating to the social and to perception, the texts score similarly. Our results seem to confirm the hypothesis that Twilight can be regarded as the non-explicit counterpart to 50 Shades. As a next step, we intend to examine the difference in gender-related words in the texts: 4% of words in Twilight and 5% in 50 Shades were male-related, with only 1% female-related words in both. Intuitively this makes sense, as both are first-person narratives by female narrators focused on their male love interests. However, male-related words were less frequent in the PMI results for the selected terms than in the texts as a whole.
By combining PMI and LIWC results, we developed a method to compare collocations of specific words between texts. This method is a step towards digital hermeneutics, the possibility of “interpreting with digital machines” (Romele, Severo, and Furia 2020:73). During our presentation, we will present more detailed results and baseline comparisons, consider possibilities for improving their evaluation, and discuss possible next steps such as the analysis of word embeddings.