Course title: Corpus linguistics
Course code: PSL210
Course status: Elective
Course leader: Marko Tadić
Course instructor: Marko Tadić
Language of instruction: English
Total hours: 8S
Form of instruction: Seminar
ECTS credits: 4
Course content by topics:
Role and place of corpus linguistics in linguistics; history of corpus-informed linguistic research; “early” corpus linguistics; Chomsky and criticism/defence of corpora; first electronic corpora; corpus as a methodological construct; text collection / corpus; pre-electronic and electronic corpus linguistics; corpus design and compilation; sampling and corpus parameters; types of corpora; results of corpus search; corpus mark-up and annotation (pre-linguistic mark-up, linguistic annotation); statistical methods in corpus linguistics; language technologies (role of corpora).
Learning outcomes at course level:
1. To explain the role and place of corpus linguistics in the broader context of linguistics; 2. To compare the types of corpora, results of corpus searches and corpus annotations; 3. To discuss the basic characteristics of early vs. more recent corpora and pre-electronic vs. electronic corpus linguistics; 4. To critically evaluate criticisms/defences of corpora in linguistics; 5. To assess the role of corpus as a methodological construct; 6. To discuss corpus linguistics and provide competent argumentation.
Learning outcomes at programme level:
IU1 | IU2 | IU3 | IU4 | IU5 | IU6 | IU7 | IU8 |
X | x | x | x |
Reading list:
McEnery, T. / Wilson, A. (1996, 2nd ed. 2002): Corpus Linguistics. Edinburgh: Edinburgh University Press.
Kennedy, G. (1998): Introduction to Corpus Linguistics. London: Longman.
Sinclair, J. Mc. (1991): Corpus, concordance, collocation. Oxford: Oxford University Press.
Biber, D. / Conrad, S. / Reppen, R. (1998): Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Tadić, M. (1997): Računalna obradba hrvatskih korpusa: povijest, stanje i perspektive. Suvremena lingvistika 43-44, pp. 387-394.
Tadić, M. (2003): Jezične tehnologije i hrvatski jezik. Zagreb: Exlibris.
Sampson, G. / McCarthy, D. (2004): Corpus Linguistics: Readings in a Widening Discipline. London-New York: Continuum.
Tognini-Bonelli, E. (2001): Corpus Linguistics at Work. Amsterdam: Benjamins.
Tadić, M. (1996): Računalna obradba hrvatskoga i nacionalni korpus. Suvremena lingvistika 41-42, pp. 603-612.
Tadić, M. (1998): Raspon, opseg i sastav korpusa hrvatskoga jezika. Filologija 30-31, pp. 337-347.
Tadić, M. (2001): Procedures in Building the Croatian-English Parallel Corpus. International Journal of Corpus Linguistics (special edition), pp. 107-123.
Tadić, M. (2002): Building the Croatian National Corpus. In: LREC2002 Proceedings, Las Palmas, 27 May-2 June 2002. Paris-Las Palmas: ELRA, Vol. II, pp. 441-446.
Tadić, M. (2005): The Croatian Lemmatization Server. Southern Journal of Linguistics 29 (1/2), pp. 206-217.
Selected papers from Computational Linguistics, International Journal of Corpus Linguistics and LREC Proceedings.
Assessment of student achievement: course attendance, seminar tasks, exam
Quality assurance mechanism: student survey