DHASA 2017

The role of computer-mediated analyses to improve validity of linguistic evidence for Forensic Linguistic purposes

van den Berg, Karien

North-West University

The focus of this paper is the role of computer-mediated analyses in addressing concerns about the validity of linguistic data analysis for Forensic Linguistic purposes. Linguists increasingly play an active role in informing court judgements in, for example, authorship attribution cases. Linguistic testimony, however, is not always accepted or admissible as evidence in a court of law owing to, amongst other things, concerns about reliability and the belief that no expertise is required to determine the meaning or intended meaning of a text (Tiersma and Solan, 2002). Following the establishment of the Daubert standard (1993) and its codification in the 2000 amendment to Federal Rule of Evidence 702, the requirements for accepting expert interpretations in court have become more stringent. They apply equally to linguistic evidence and to other scientific measures that inform expert opinion. The basic requirements of these standards are that expert opinion be based on sufficient data and be the product of reliable principles, theories and methods, appropriately and validly applied given the context of the case. This requires validity, established through a process of validation: collecting relevant linguistic evidence by appropriate means to support claims concerning legal issues. Computational developments provide more objective means of gathering such evidence and may greatly strengthen the validity claims that linguistic experts offer for legal scrutiny. This paper aims to demonstrate how a validation argument for Forensic Linguistic evidence may be formulated, and how computer assistance (e.g. web-scraping for ground truth data; Chaski, 2013) can enhance validation procedures. To that end, it explores methods employed or proposed by the leading forensic linguists McMenamin (1993), Shuy (2008) and Chaski (2013). Conley and O’Barr (2005:177) advise forensic linguists to follow a linguistically driven approach to the field, looking at the law from a linguistic viewpoint rather than acting as linguists working for the law.
Three main approaches to forensic linguistic analysis are discernible: forensic computational linguistics, forensic stylistics and stylometric computing. Arguing in favour of the first of these, Chaski calls for similar rigour (based on validity testing) in applying stylistic and stylometric methods “so that reliable methods of forensic authorship identification can be offered to our courts” (2013:372). I therefore present Chaski’s claims in the form of a validation argument as proposed by Kane (2013) and Davies and Elder (2011), emphasising the role of computational methods in strengthening the evidence. This argument is then compared with those presented by Shuy and McMenamin to determine their relative strength. In addition, three South African case studies (Hubbard, 1994; Kotzé, 2007; Grundlingh, 2015) are examined to establish the extent to which local practice aligns with, or differs from, international practice and standards, and meets validity requirements. Recommendations are made for using specific computerised tools to strengthen evidence based on linguistic expertise, for example in authorship attribution cases, so that such evidence is useful and valuable in a court of law. Specific consideration is given to using computer tools to overcome challenges typical of authorship attribution, namely very small data sets and the calculation of accurate error rates.
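The abstract identifies very small data sets and error rates as typical challenges but does not specify which tools are recommended. What follows is a purely illustrative Python sketch. The function-word features, the Manhattan distance, the nearest-centroid attribution and every function name are assumptions, not methods taken from Chaski, Shuy or McMenamin. The sketch shows how leave-one-out testing can estimate an attribution error rate from a handful of known texts, and how a Wilson score interval exposes the uncertainty of that estimate.

```python
"""Hypothetical sketch: leave-one-out error rate for a toy stylometric
attributor, reported with a Wilson score interval.

All names, the feature list and the distance measure are illustrative
assumptions; this is not a validated forensic method.
"""
import math
import re
from collections import Counter

# Assumed feature set: a few common English function words.
FUNCTION_WORDS = ["the", "a", "and", "of", "to", "in", "i", "you", "it", "that"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Manhattan (city-block) distance between two profiles."""
    return sum(abs(x - y) for x, y in zip(p, q))


def attribute(query, known):
    """Nearest-centroid attribution: return the author whose mean profile
    is closest to the query. `known` is a list of (author, text) pairs."""
    by_author = {}
    for author, text in known:
        by_author.setdefault(author, []).append(profile(text))
    q = profile(query)
    best_author, best_dist = None, math.inf
    for author, profiles in by_author.items():
        centroid = [sum(col) / len(profiles) for col in zip(*profiles)]
        d = distance(q, centroid)
        if d < best_dist:
            best_author, best_dist = author, d
    return best_author


def leave_one_out_errors(samples):
    """Hold out each text in turn, attribute it using the remaining texts,
    and count misattributions. An author with a single text can never be
    recovered once that text is held out, which is one way small data
    sets inflate error rates."""
    errors = 0
    for i, (true_author, text) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if attribute(text, rest) != true_author:
            errors += 1
    return errors, len(samples)


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 ~ 95%)."""
    if n == 0:
        return 0.0, 1.0  # no evidence at all: the rate is unconstrained
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Toy "known" texts invented for illustration; not case data.
    samples = [
        ("A", "I think that the plan is fine and I will go to the shop."),
        ("A", "I know that the car is in the garage and I will fetch it."),
        ("B", "You said it was good, and you were right about it."),
        ("B", "You told me it would rain, and it did, as you said."),
    ]
    errors, n = leave_one_out_errors(samples)
    low, high = wilson_interval(errors, n)
    print(f"LOO errors: {errors}/{n}; 95% Wilson interval {low:.2f}-{high:.2f}")
```

Even a perfect score is weak evidence at this scale. Zero errors in 10 held-out texts still gives a Wilson upper bound of about 0.28. The size of the validation set therefore matters as much as the observed error rate.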

Digital Humanities Association of Southern Africa