Course title:

Corpus linguistics

Course code: PSL210
Course status: Elective
Course leader: Marko Tadić
Course instructor: Marko Tadić
Language of instruction: English
Total hours: 8S
Form of instruction: Seminar
ECTS credits: 4

Course content by topics:

Role and place of corpus linguistics in linguistics; history of corpus-informed linguistic research; “early” corpus linguistics; Chomsky and criticism/defence of corpora; first electronic corpora; corpus as a methodological construct; text collection / corpus; pre-electronic and electronic corpus linguistics; corpus design and compilation; sampling and corpus parameters; types of corpora; results of corpus search; corpus mark-up and annotation (pre-linguistic mark-up, linguistic annotation); statistical methods in corpus linguistics; language technologies (role of corpora).

Learning outcomes at course level:

1. To explain the role and place of corpus linguistics in the broader context of linguistics;
2. To compare the types of corpora, results of corpus searches and corpus annotations;
3. To discuss the basic characteristics of early vs. more recent corpora and pre-electronic vs. electronic linguistics;
4. To critically evaluate criticisms/defences of corpora in linguistics;
5. To assess the role of corpus as a methodological construct;
6. To discuss corpus linguistics and provide competent argumentation.

Learning outcomes at programme level:

IU1 IU2 IU3 IU4 IU5 IU6 IU7 IU8
X x x x

Reading list:

McEnery, T. / Wilson, A. (1996, 2nd ed. 2002): Corpus Linguistics. Edinburgh: Edinburgh University Press.
Kennedy, G. (1998): Introduction to Corpus Linguistics. London: Longman.
Sinclair, J. Mc. (1991): Corpus, concordance, collocation. Oxford: Oxford University Press.
Biber, D. / Conrad, S. / Reppen, R. (1998): Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Tadić, M. (1997): Računalna obradba hrvatskih korpusa: povijest, stanje i perspektive. Suvremena lingvistika 43-44, pp. 387-394.
Tadić, M. (2003): Jezične tehnologije i hrvatski jezik. Zagreb: Exlibris.
Sampson, G. / McCarthy, D. (2004): Corpus Linguistics: Readings in a Widening Discipline. London-New York: Continuum.
Tognini-Bonelli, E. (2001): Corpus Linguistics at Work. Amsterdam: Benjamins.
Tadić, M. (1996): Računalna obradba hrvatskoga i nacionalni korpus. Suvremena lingvistika 41-42, pp. 603-612.
Tadić, M. (1998): Raspon, opseg i sastav korpusa hrvatskoga jezika. Filologija 30-31, pp. 337-347.
Tadić, M. (2001): Procedures in Building the Croatian-English Parallel Corpus. International Journal of Corpus Linguistics (special edition), pp. 107-123.
Tadić, M. (2002): Building the Croatian National Corpus. In: LREC2002 Proceedings, Las Palmas, 27 May-2 June 2002. Paris-Las Palmas: ELRA, Vol. II, pp. 441-446.
Tadić, M. (2005): The Croatian Lemmatization Server. Southern Journal of Linguistics 29 (1/2), pp. 206-217.
Selected papers from Computational Linguistics, International Journal of Corpus Linguistics, and LREC Proceedings.

Assessment of student achievement: course attendance, seminar tasks, exam
Quality assurance mechanism: student survey

Prof. Marko Tadić
Course leader/instructor
Full professor at the University of Zagreb, Faculty of Humanities and Social Sciences, Department of Linguistics. He has been head of the Chair of Algebraic and Computational Linguistics in the same department since 2001 and an associate member of the Croatian Academy of Sciences and Arts since 2008.