SUBTLEX-PT-BR
SUBTLEX-PT-BR is a 61 million word corpus of conversational Brazilian Portuguese, compiled using subtitle texts.
Availability: Three versions of SUBTLEX-PT-BR are available below:
- Unigram - The most basic version, with OLD20 (Orthographic Levenshtein Distance 20, a measure of orthographic neighbourhood density)
- Lemmatised and Part-of-Speech Tagged - Useful for finding the frequencies of lemmas, their related word forms, and their parts of speech
- Bigram - Useful for collocation frequency and identifying compounds
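For readers unfamiliar with OLD20, the following is a minimal sketch of how the measure is usually defined: the mean Levenshtein (edit) distance from a word to its 20 closest neighbours in a lexicon. It is not the script used to build SUBTLEX-PT-BR; the function names `levenshtein` and `old_n` and the toy word list are assumptions made for illustration only.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean edit distance from word to its n closest neighbours in lexicon.

    With n=20 this corresponds to OLD20. If the lexicon has fewer than n
    other words, all of them are used.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon must contain at least one other word")
    return sum(closest) / len(closest)


# Hypothetical toy lexicon; a real calculation would use the full unigram list.
toy_lexicon = ["casa", "caso", "cama", "cara", "mesa"]
old3_casa = old_n("casa", toy_lexicon, n=3)
```

Lower values indicate a denser orthographic neighbourhood. This pairwise approach is quadratic in lexicon size, so a full corpus would likely need a faster edit-distance implementation.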
Citation:
Tang, K. (2012) A 61 Million Word Corpus of Brazilian Portuguese Film Subtitles as a Resource for Linguistic Research. UCL Working Papers in Linguistics 24. [pdf] [bib]
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
SUBTLEX-KR
SUBTLEX-KR is a 90 million word corpus of conversational Korean, compiled using subtitle texts.
Availability: Forthcoming. Please email kevin.tang@hhu.de if you would like to be notified when it becomes available.
Citation:
Tang, K., de Chene, B. (2014, forthcoming) A New Corpus of Colloquial Korean and its Applications. The 14th Laboratory Phonology Conference (LabPhon 14), Tachikawa, Tokyo, Japan. July 2014. [poster]
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
PRAAT TOOLKIT
A selection of Praat scripts, each written for a specific purpose. Available at https://osf.io/8qaxp/ (please see the wiki section for details).
Please email kevin.tang@hhu.de if you have any questions or suggestions.
LINGER TOOLKIT
Linger is a software package for performing reading, listening, and other sentence processing experiments.
Even though Linger is an excellent piece of software, its analysis tools (Lingalyzer, Lingrapher, and Subjector) are becoming increasingly outdated. The Linger Toolkit aims to fill this gap. Available at https://osf.io/v59gr/ (please see the wiki section for details).
Please email kevin.tang@hhu.de if you have any questions or suggestions.