SUBTLEX-PT-BR

SUBTLEX-PT-BR is a 61 million word corpus of conversational Brazilian Portuguese, compiled using subtitle texts.

Availability: Three versions of SUBTLEX-PT-BR are available below:

Unigram - The most basic version, with OLD20 (Orthographic Levenshtein Distance 20, a measure of orthographic neighbourhood density)
Lemmatised and Part of Speech Tagged - Useful for finding the frequencies of lemmas, their related word forms, and their parts of speech
Bigram - Useful for collocation frequency and identifying compounds
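
For readers unfamiliar with OLD20, the following is a minimal Python sketch of how the measure is usually defined: the mean Levenshtein distance from a word to its 20 closest orthographic neighbours in a lexicon. It is an illustration only and is not necessarily the procedure used to produce the values distributed with the corpus. The function names (levenshtein, old_n) and the toy lexicon in the usage note are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest lexicon entries.

    With n=20 this corresponds to OLD20. The target word itself is excluded.
    If the lexicon holds fewer than n other words, all of them are used.
    """
    distances = sorted(
        levenshtein(word, other) for other in set(lexicon) if other != word
    )
    if not distances:
        raise ValueError("lexicon contains no words other than the target")
    nearest = distances[:n]
    return sum(nearest) / len(nearest)
```

As a usage example, with a toy lexicon of "casa", "caso", "cama", "casar", and "gato", calling old_n("casa", lexicon, n=2) averages the two smallest distances. Lower values indicate a denser orthographic neighbourhood. In practice the lexicon would be the full unigram word list, and this pairwise loop would be slow at that scale.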

Citation:
Tang, K. (2012) A 61 Million Word Corpus of Brazilian Portuguese Film Subtitles as a Resource for Linguistic Research. UCL Working Papers in Linguistics 24. [pdf] [bib]

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

SUBTLEX-KR

SUBTLEX-KR is a 90 million word corpus of conversational Korean, compiled using subtitle texts.

Availability: Forthcoming. Please email kevin.tang@hhu.de if you would like to be notified when it becomes available.

Citation:
Tang, K., de Chene, B. (2014, forthcoming) A New Corpus of Colloquial Korean and its Applications. The 14th Laboratory Phonology Conference (LabPhon 14), Tachikawa, Tokyo, Japan. July 2014. [poster]

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

PRAAT TOOLKIT

A selection of Praat scripts written for specific purposes. Available at https://osf.io/8qaxp/. Please see the wiki section for details.

Please email kevin.tang@hhu.de if you have any questions or suggestions.

LINGER TOOLKIT

Linger is a software package for performing reading, listening, and other sentence processing experiments.
Although Linger is an excellent piece of software, its analytical tools (Lingalyzer, Lingrapher, and Subjector) are becoming increasingly outdated. The Linger Toolkit aims to fill this gap. Available at https://osf.io/v59gr/. Please see the wiki section for details.

Please email kevin.tang@hhu.de if you have any questions or suggestions.