A stylometric analysis of Joseph Conrad’s writing
Botha, Lande; Van Zyl, Maryka; Pienaar, Wikus
The language and style in the writings of Joseph Conrad, a multilingual, non-native speaker of English, have been the topic of various studies (Monod, 2005; Peters, 2006; Ophir, 2014; Simmons, 2014). Dowden (1973), Lucas (1991), Stubbs (2005), Moon (2007), Nofal (2013), and Hunter and Smith (2014) have paid a great deal of attention to different aspects of grammar in Conrad’s writing in order to describe his idiosyncratic style. These studies mostly focus on analysing and describing a single text or a small handful of texts in terms of their socio-cultural, political and personal affect. Digital versions of Conrad’s works and digital analysis tools now make it possible to conduct a quantitative study of Conrad’s linguistic style across all of his writing (both fiction and non-fiction), spanning his working life. The aim of this study is to compare, statistically, texts from various genres (novels, novellas, autobiography and notes) and publication periods in order to establish whether a consistent “Conradian” style is maintained across genres and time. A “corpus” of all the published works of Conrad serves as input for Stylo (0.6.0), an R script (Eder & Rybicki, 2011). Stylo produces a cluster tree analysis in which each text is positioned according to its stylistic distance from every other text. This analysis is based on the hundred most frequent words and indicates the extent to which genre and time are factors in Conrad’s lexical choices. The wordlist and keywords functions in WordSmith Tools (6.0) (Scott, 2012) allow for further lexis-based comparison of the texts. For the purposes of a keywords analysis, the texts are grouped into two (or three) corpora based on the first branching in the Stylo-generated cluster tree. It is also possible to move beyond lexis and to study the grammatical aspects of Conrad’s style quantitatively by making use of a part-of-speech-tagged version of the corpus. CLAWS4 (Garside & Smith, 1997) is used to tag the data.
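As a rough illustration of the most-frequent-word comparison described above, the sketch below builds relative-frequency profiles over the most frequent words of a small corpus and computes a Burrows-style Delta distance between every pair of texts. It is written in Python rather than R and does not reproduce Stylo’s own code. The choice of a Delta-type distance is an assumption, since the abstract does not name the distance measure, and the function names and tokenisation rule are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts, n_words=100):
    """texts maps a text name to its raw string.

    Returns the n_words most frequent words of the whole corpus and, for each
    text, the relative frequency of each of those words.
    """
    tokens = {name: tokenize(raw) for name, raw in texts.items()}
    corpus_counts = Counter()
    for toks in tokens.values():
        corpus_counts.update(toks)
    mfw = [word for word, _ in corpus_counts.most_common(n_words)]
    profiles = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        profiles[name] = [counts[word] / len(toks) for word in mfw]
    return mfw, profiles


def burrows_delta(profiles):
    """Mean absolute difference of z-scored word frequencies for each pair of texts."""
    names = list(profiles)
    n_features = len(next(iter(profiles.values())))
    z_scores = {name: [] for name in names}
    for i in range(n_features):
        column = [profiles[name][i] for name in names]
        mu, sd = mean(column), pstdev(column)
        for name in names:
            # A word with identical frequency in every text carries no information.
            z_scores[name].append((profiles[name][i] - mu) / sd if sd else 0.0)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z_scores[a], z_scores[b]))
        for a in names
        for b in names
    }
```

The resulting pairwise distances could then be handed to any hierarchical clustering routine to obtain a tree comparable in kind, though not necessarily in detail, to the one Stylo draws.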
A (Perl) script strips the words from the POS-tags, leaving “texts” consisting entirely of word class designations. These texts serve as input for a cluster tree analysis in Stylo using bigrams, and then trigrams, which allows for comparison of the texts based on grammatical structure. This indicates the extent to which genre and time of publication are factors in the grammatical choices made by Conrad. The POS-“texts” can also be grouped into two (or three) “corpora” based on the first branching in the cluster tree analysis to serve as input for a “keywords” analysis in WordSmith Tools. Such an analysis gives an indication of the word classes involved in grammatical differences between the texts. Most of the CLAWS tags also contain morphological information such as tense, aspect and number, giving a richer picture of the author’s style.
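The tag-stripping and n-gram steps can be sketched as follows. This is a minimal Python illustration, not the script used in the study. It assumes that each token in the tagged corpus has the form word_TAG, separated by whitespace; the function names and the sample sentence are hypothetical.

```python
from collections import Counter


def strip_words(tagged_text, sep="_"):
    """Keep only the POS tag of each word_TAG token (the token format is an assumption)."""
    tags = []
    for token in tagged_text.split():
        _word, found, tag = token.rpartition(sep)
        if found and tag:  # skip tokens that carry no tag
            tags.append(tag)
    return tags


def tag_ngrams(tags, n):
    """Count every run of n consecutive tags (n=2 for bigrams, n=3 for trigrams)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical tagged sentence, for illustration only.
sample = "The_AT sea_NN1 was_VBDZ calm_JJ ._."
tags = strip_words(sample)
bigrams = tag_ngrams(tags, 2)
trigrams = tag_ngrams(tags, 3)
```

Tag n-gram counts of this kind, converted to relative frequencies per text, are the sort of feature on which a cluster analysis of grammatical structure can be run.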
Dowden, W.S. 1973. Joseph Conrad: the imaged style. Style, 7(2):245-248.
Eder, M. & Rybicki, J., 2011, Stylo version 0.6.0, computer software, Poland: University of Kraków.
Garside, R. & Smith, N., 1997, ‘A hybrid grammatical tagger: CLAWS4’, (In Garside, R., Leech, G. & McEnery, A., eds. Corpus annotation: Linguistic information from computer text corpora. London: Longman. p. 102-121).
Hunter, S. & Smith, S. 2014. A network text analysis of Conrad’s Heart of Darkness. English linguistics research, 3(2):39-53.
Lucas, M.A. 1991. Conrad’s adjectival eccentricity. Style, 25(1):123-150.
Monod, S. 2005. Joseph Conrad’s polyglot wordplay. Modern language review, 100:222-234.
Moon, R. 2007. Words, frequencies, and texts (particularly Conrad): a stratified approach. Journal of literary semantics, 36(1):1-33.
Nofal, K.H. 2013. Darkness in Conrad’s Heart of Darkness: a linguistic and stylistic analysis. The Buckingham journal of language and linguistics, 6:77-93.
Ophir, E. 2014. “All our stammerings”: two kinds of inarticulateness in Conrad. A quarterly journal of short articles, notes, and reviews, 27(1):23-27.
Peters, J.G. 2006. The Cambridge introduction to Joseph Conrad. New York, USA: Cambridge University
Scott, M., 2012. WordSmith Tools version 6, computer software, Liverpool: Lexical analysis software.
Simmons, A.H. 2014. Reclaiming Conrad from his editors: the case of An Outcast of the Islands. Conradian, 46(1-2):39-52.
Stubbs, M. 2005. Conrad in the computer: examples of quantitative stylistic methods. Language and literature, 14(1):5-24.