Corpora/Korpora

(I drafted this entry before I read about the Utah Court)

I recently ‘attended’ a webinar about how translators can use corpora to investigate their target language.

I’ve been fascinated by corpora since I first encountered the Collins COBUILD English Dictionary when I was teaching English; I think it was in the 1990s. The dictionary was quite a milestone: it used a database of usage examples to show that what people say is not always the same as what language teachers say they say.

I once even tried to learn Python after Mark Liberman suggested it as a good project for Christmas, but I did not get very far. I suspect he has a larger brain than I have.

If you’ve got a free weekend or two, you could do a lot worse than to spend some time messing around with Python and NLTK — there’s even an online book to guide you.

I’ve also been to a (bricks-and-mortar) seminar on corpora, but at that time I did not follow it up by preparing my own corpus.

This is where the eCPD webinar was so helpful: the following evening it got me that far, to a corpus of my own, in half an hour, using free software (BootCaT and AntConc).

In a later post I will give a description of how it works.

With BootCaT you create a corpus of texts from the internet, presumably HTML pages, so it’s not that different from a Google custom search engine, which you can make for yourself.
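As far as I understand it, BootCaT’s basic trick is to take a handful of seed terms, combine them into random tuples, send each tuple to a search engine as a query, and download the pages that come back. The sketch below covers only the query-building step and does not go online. The function name `make_queries`, the default of three terms per query and the legal seed terms are my own illustrative assumptions, not BootCaT’s actual code or settings.

```python
import itertools
import random


def make_queries(seeds, tuple_size=3, n_queries=10, seed=None):
    """Build search queries from random combinations of seed terms.

    Hypothetical helper illustrating a BootCaT-style bootstrapping step;
    it only builds query strings and never contacts a search engine.
    """
    # Every possible combination of `tuple_size` distinct seed terms.
    combos = list(itertools.combinations(seeds, tuple_size))
    rng = random.Random(seed)
    # Pick distinct combinations at random (all of them if there are too few).
    chosen = rng.sample(combos, min(n_queries, len(combos)))
    # Quote each term so multi-word seeds are searched as exact phrases.
    return [" ".join(f'"{term}"' for term in combo) for combo in chosen]


if __name__ == "__main__":
    legal_seeds = ["judgment", "appellant", "respondent", "costs", "appeal allowed"]
    for query in make_queries(legal_seeds, tuple_size=3, n_queries=4, seed=1):
        print(query)
```

Using several seed terms per query pushes the search engine towards pages that really belong to the chosen domain, rather than pages that merely mention one word.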

But with the second program (AntConc in my case) you can analyze the language in a variety of interesting ways.
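Two of AntConc’s basic tools are the Concordance, which lists every hit for a search term with its left and right context (“keyword in context”, or KWIC), and the Word List, which counts how often each word occurs. To give a rough idea of what happens underneath, here is a minimal sketch in plain Python. The crude tokenizer, the function names `tokenize`, `kwic` and `word_list`, and the made-up sample sentences are my own assumptions for illustration, not AntConc’s code.

```python
import re
from collections import Counter

# Deliberately crude tokenizer: runs of letters, optionally with a
# straight apostrophe inside (e.g. "don't"). Curly quotes are not handled.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Return lower-cased word tokens."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def kwic(tokens, keyword, width=4):
    """Keyword-in-context lines: `width` words either side of each hit."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the hits line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def word_list(tokens, top=10):
    """The `top` most frequent words with their counts."""
    return Counter(tokens).most_common(top)


if __name__ == "__main__":
    sample = ("The appeal is dismissed. The appellant must pay the costs of "
              "the appeal. We would allow the appeal in part.")
    toks = tokenize(sample)
    for line in kwic(toks, "appeal"):
        print(line)
    print(word_list(toks, top=3))
```

On a real corpus the same idea shows at a glance which verbs, prepositions and adjectives habitually go with a term, which is exactly what a translator wants to check.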

The webinar was ‘Using Corpora in Translation’, run by eCPD Webinars.

What I don’t know is how useful corpora are for legal translation. The third part of the webinar was on this subject. It was suggested that analyzing legal English in this way would be particularly interesting for legal translators who are not lawyers. I should think it would be interesting for legal translators who are lawyers too, and for lawyers who want to talk about law in English! Most lawyers forget a lot of their law, and they don’t necessarily think about the language of the law in the way a translator needs to.

But translating legal texts from one language and system to another is not the same as writing about a legal system in its original language.

A German lawyer who wants to write legal English could learn a lot from a well-constructed corpus. Areas that suggest themselves are judgments (formal style) and legal correspondence (also formal style). Areas that seem more problematic are contracts, where legal equivalence must be checked and apparently similar clauses cannot be taken over lock, stock and barrel, and any area of law that differs between the source and target legal systems, where an explanation is needed.

A potential source for a corpus of English judgments is www.bailii.org

One might assume that the EU’s bilingual databases and translation memories (TMs) would be useful for all lawyers writing about EU law in various languages. But they are inconsistent.
