DHASA2017 – Abstract

The online adaptation of A Dictionary of South African English on Historical Principles.

Van Niekerk, Tim; Le Du, Bridgitte
Dictionary Unit for South African English

The online Dictionary of South African English (DSAE, http://dsae.co.za) is an electronic version of A Dictionary of South African English on Historical Principles (Silva et al., Oxford University Press, 1996). The print edition, produced by the Dictionary Unit for South African English (DSAE) in Grahamstown, South Africa, was the culmination of 25 years’ research, resulting in a 1.7 million-word text with 4600 main entries documenting the development of South African English from its origins in the late 17th Century to 1995. Entries emphasise word history and show etymologies, variant spellings, compounds, derivatives and phrases. In total 14 700 word forms are represented, reflecting diverse borrowings from other South African languages. Notably, the dictionary is rooted in quotation evidence, reproducing 44 000 bibliographically-documented citations.

In July 2014 a pilot online version of the dictionary was published (http://dsae.co.za) with initial funding from the DSAE’s host institution, Rhodes University, to make this reference work available again, as it had been out of print since 2005. At the same time, a free access model was adopted. This was a first step towards a thorough, publicly-available adaptation of the print edition for electronic platforms. In 2015 the DSAE was joined by the University of Hildesheim, Germany, and subsequently by the University of Stellenbosch, South Africa, in an ongoing collaborative project to improve the pilot online edition to meet the needs of modern digital users. The target user group of the print edition was relatively specialised, namely linguists, historians, editors, writers, translators and interested laypersons. This target group has been broadened as part of the electronic adaptation process to include a new audience of non-specialists. Given the length and complexity of some entries, this presents challenges not faced by synchronic dictionaries. For example, trek, n., first used in English contexts in the early 19th Century and subsequently assimilated into General English, lists 11 main senses and is over 5000 words long; some exceptional entries are longer.

The presentation will describe this project, highlighting some key areas of the print-to-digital adaptation in the context of the evolving field of electronic lexicography, both at the level of entry layout components (article microstructure) and navigation features (access structure). To a large extent, the basic expected features of electronic reference works, as well as possibilities for innovative (e.g. visual) strategies for content display, depend on the internal computational representation of the dictionary text. Some examples will be given of how generic or unstructured representations of data categories in the 2014 dictionary dataset (encoded in XML) have had to be refined into increasingly fine-grained structures, to allow greater flexibility in presentation and to facilitate improved navigation features. Additionally, current work includes enriching the dataset with new metadata to transform the dictionary from a static, text-heavy reference work into a database-like tool supporting content filtering and browsing, for example through subject-categorisation of the c. 7400 senses.
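
As a purely illustrative sketch of this kind of refinement (not the project's actual schema or tooling), the Python fragment below splits a flat citation string into separately addressable parts and then filters senses by a subject label. All element names, the `subject` attribute, the citation pattern and the sample data are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are hypothetical
# and do not reproduce the DSAE dataset.
COARSE_XML = """
<entry>
  <hw>example</hw>
  <sense n="1" subject="travel">
    <cit>1900 A. Author Example Title: An invented sentence.</cit>
  </sense>
  <sense n="2" subject="farming">
    <cit>1950 B. Writer Another Title: A second invented sentence.</cit>
  </sense>
</entry>
"""

# Assumed citation layout: year, initial and surname, title, colon, quotation.
CIT_PATTERN = re.compile(
    r"(?P<year>\d{4}) (?P<author>[A-Z]\. \w+) (?P<title>[^:]+): (?P<text>.+)"
)


def refine_citations(root):
    """Split each flat <cit> string into year/author/title/text children."""
    for cit in list(root.iter("cit")):
        match = CIT_PATTERN.fullmatch((cit.text or "").strip())
        if match is None:
            continue  # leave unparseable citations untouched for manual review
        cit.text = None
        for name, value in match.groupdict().items():
            ET.SubElement(cit, name).text = value
    return root


def senses_by_subject(root, subject):
    """Return the numbers of all senses carrying the given subject label."""
    return [s.get("n") for s in root.iter("sense") if s.get("subject") == subject]


root = refine_citations(ET.fromstring(COARSE_XML))
print(ET.tostring(root, encoding="unicode"))
print(senses_by_subject(root, "farming"))
```

Once the year, author and title are separate elements, a display layer can sort or hide them independently, and a subject label on each sense is enough to support simple filtering of the kind described above.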

Adaptive and optional presentation devices are being introduced, allowing users to ‘show-more’ or ‘show-less’ depending on their needs. Browsing and selective query or filtering functionality also make it possible to explore horizontal relationships between entries, which are not traditionally part of print dictionary structure beyond cross-references. The depth and range of content encoded in this dictionary, combined with the possibilities presented by the digital environment, thus allow this reference work to be transformed from the traditional concept of a dictionary as an extended wordlist to a linguistic, cultural and encyclopaedic inventory. Additionally, offering easy access to multidimensional lexicographical data through print-to-digital adaptation requires a corresponding adaptation and supplementation of the ‘front-matter’ and ‘end-matter’ produced for the print edition. This sometimes involves new types of outer texts, namely more context-sensitive and less text-heavy user guidelines, bibliography records, further information about the South African English variety (supported by infographics), and so forth. Past and current work on adapting and supplementing these texts to meet new requirements will be illustrated together with associated topics.

While the primary aim of the talk is to present current work and invite feedback on what is essentially an electronic dictionary publishing project, a further motivation is to draw attention to a structured dataset which may be of wider interest not only to lexicographers but also to computational linguists and Digital Humanities researchers. Although it is a lexicographical database, its diachronic, bibliographically-annotated design gives it historical, cultural and to some extent literary dimensions across a wide historical span. Until now, comparable datasets describing the South African variety of English and the multilingual influences acting on it have not been available, and we hope that, parallel to the DSAE dictionary project, these data may also help optimise collaborative knowledge generation across a number of overlapping fields.