Corpora for (legal) translators/Textkörper für (juristische) Übersetzer

Some months ago I intended to write something about my experience of using corpora for translation purposes, especially legal translation. (See the earlier entry and the footnotes by John Kuti there.)

At that time, it appeared the free programs I might have recommended had lost their value for me because they had access to fewer corpora.

Then again, one could get fairly similar results with a Google CSE (custom search engine).

I followed a webinar on the topic last year, and it ended with a contribution by Juliette Scott, a legal translator who is doing a Ph.D. on the subject. She now has a weblog, called Translation & the Law: from words to deeds, which is certainly a good place to find out more.

There was also a blog post by Kevin Lossner in Translation Tribulations, entitled A NIFTY method for legal terminology (I thought NIFTY was a play on NIMBY, but I found out it is the name Juliette Scott gives her method) – Here you will find the links to use if you want to help in Juliette’s research/find out more.

From a real-life seminar in London a couple of years ago I also have a most wonderful and useful book on the subject of corpora: Working with Specialized Language: A practical guide to using corpora by Lynne Bowker and Jennifer Pearson (dated 2002 but still useful in 2012; you can look inside at amazon).

The basic approach to making a corpus of legal texts is to collect them on the internet or from other sources and convert them all into a format readable by the corpus program. This takes a bit of time. It also raises copyright problems unless you just use it for your own purposes. This was the problem with the free software BootCat, which had lost the right to use certain sources from the Web. The free software AntConc is for a later stage of the process.
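The conversion step can be sketched roughly as follows. This is only an illustration using Python's standard library, assuming the collected texts are saved web pages in HTML; it is not how BootCat or AntConc work internally, and the names TextExtractor and html_to_text are made up for the example.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


page = "<html><body><p>The firm advised on the deal.</p><script>x=1</script></body></html>"
print(html_to_text(page))
```

Each page converted this way can then be saved as a plain-text file, which is the kind of input a concordance program expects.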

Here’s an article by Michael Wilkinson: Compiling Corpora for Use as Translation Resources

I did have some rapid success in one field of legal English late last year. I sometimes translate lawyers’ websites and also extracts from directories in which law firms are described in glowing terms. Here’s an example from a firm I have nothing to do with:

CMS Hasche Sigle
Aufbruch in eine neue Zeit – und zwar mit Schwung. Unter dieses Motto könnte man das vergangene Jahr bei CMS stellen. Schon lange gehört die Kanzlei in Hamburg zu den führenden Adressen, jedoch monierten Wettbewerber, CMS sei zu breit aufgestellt, um im Markt wirklich hervorzustechen.
Diese Zeiten gehen zu Ende: V.a. die M&A-Praxis hat zuletzt einen deutlichen Schub erhalten und sorgte für Schlagzeilen, als ein Hamburger Team zusammen mit dem internationalen CMS-Verbund Takeda bei dem €10 Mrd schweren Erwerb von Nycomed beriet. Dies spiegelte sich auch im Markt wider, die Gruppe erntete in diesem Jahr spürbar mehr Lob. Gemeinsam mit Dr. Marc Riede betreute er zudem die HSH bei der Restrukturierung von Hapag-Lloyd.

It’s quite easy to collect this kind of thing in English from UK, USA and other sites and to search it for useful expressions. I might find more ideas for words like betreuen.
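Searching such a collection for expressions amounts to a keyword-in-context (KWIC) search. A minimal sketch in Python, assuming the corpus has already been read into one plain-text string; the function name kwic and the sample sentence are invented for illustration:

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, match, right) context tuples for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Invented sample text standing in for a collected corpus
sample = "The team advised Takeda on the acquisition. She also advised the bank."
for left, word, right in kwic(sample, "advised", width=20):
    print(f"{left:>20} [{word}] {right}")
```

A dedicated concordancer such as AntConc does far more (sorting, collocates, word lists), but the principle of looking at a word with its neighbours on either side is the same.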

But I still have the feeling that a corpus would not help me with most legal translations, because I am not trying to create a text that looks like it was written in English about English law, but one that is clearly about a foreign legal system. If I created a collection of contracts, for example, every potential match of phrase would need to be checked legally to see if it meant the same thing. I have the feeling that I’d love to computerize my vocabulary work, but it would then bypass my own brain and experience.

Language issues in US Supreme Court/”Person” und “persönlich” vor Gericht

The US Supreme Court recently decided a case in which language was discussed on the basis of corpora. The question was about the words person and personal.

The decision was FCC v. AT&T Inc. (PDF file), decided on March 1. This is a slip opinion, which means it has not yet been officially published. It has a headnote, which they call a syllabus.

The situation was that AT&T Inc. claimed that as it was a person (all corporations are persons), it could rely on the right of personal privacy.

Language evidence was presented to show that it does not follow from the noun that the related adjective has the same meaning, particularly in compounds.

In fact, “personal” is often used to mean precisely the opposite of business-related: We speak of personal expenses and business expenses, personal life and work life, personal opinion and a company’s view. Dictionary definitions also suggest that “personal” does not ordinarily relate to artificial “persons” like corporations.

I can’t help feeling that the Supreme Court would have come to this conclusion even without the language evidence. It seems pretty obvious to me. But the definition of person has been expanded in recent years, and at all events the Court of Appeals for the Third Circuit found in favour of AT&T.

We disagree. Adjectives typically reflect the meaning of corresponding nouns, but not always. Sometimes they acquire distinct meanings of their own. The noun “crab” refers variously to a crustacean and a type of apple, while the related adjective “crabbed” can refer to handwriting that is “difficult to read,” Webster’s Third New International Dictionary 527 (2002); “corny” can mean “using familiar and stereotyped formulas believed to appeal to the unsophisticated,” id., at 509, which has little to do with “corn,” id., at 507 (“the seeds of any of the cereal grasses used for food”); and while “crank” is “a part of an axis bent at right angles,” “cranky” can mean “given to fretful fussiness,” id., at 530.

To see what linguistic evidence was presented, you can look at Neal Goldfarb’s amicus curiae brief, which can also be found via his blog.

This amicus brief was filed on behalf of Project On Government Oversight, the Brechner Center for Freedom of Information, and Tax Analysts. The parties have to agree to a filing. The brief lists the dictionaries and other works cited. A partial quote:

The following are the pairings in each corpus that occurred at least ten times, listed in order of their frequency:
COHA: personal life, personal income, personal property, personal interest, personal experience, personal relationship, personal problem, personal reason, personal injury, personal thing, personal appearance, personal contact, personal matter, personal friend, personal power, personal opinion, personal fortune, personal gain, personal history, personal letter, personal use, personal view, personal question, personal tragedy, personal physician, personal attack, personal affair…

The brief relied on three corpora: the Corpus of Historical American English (COHA), the Corpus of Contemporary American English (COCA), and the TIME Magazine Corpus, all of which are the handiwork of Prof. Mark Davies at Brigham Young University. What we did was to search for the string personal [NOUN], in order to find out what words most frequently filled the NOUN slot.
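To give an idea of what such a search does, here is a rough sketch in Python. It is only an approximation and not the method used in the brief: COHA, COCA and the TIME corpus are tagged for part of speech, so the real query matched nouns only, whereas this sketch simply counts whatever word follows personal in plain text. The function name following_words and the sample sentence are invented.

```python
import re
from collections import Counter


def following_words(text, target="personal"):
    """Count the words that immediately follow `target` (no part-of-speech check)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target)


# Invented sample text standing in for a corpus
sample = "Her personal life and personal opinion are private. Personal life matters."
for word, count in following_words(sample).most_common():
    print(word, count)
```

Without part-of-speech tags, adjectives and other words after personal are counted too, and pairs are counted across sentence boundaries, so the output needs checking by eye.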

This decision seems correct and well-founded, but I can’t help wondering whether all judges can be relied on to interpret corpus evidence properly.

Via Mark Liberman on Language Log, who links to other weblogs on the topic.

Palantyping and Stenography

I’ve mentioned Stenography before. Today Jack Schofield shows there is also Palantyping, in answer to this question:

I attended a gathering in Richmond Theatre, at which the then mayor and members of the Greater London Authority were available for public questioning. It was very impressive that their words almost immediately appeared on a screen courtesy of a voice recognition system. How is it done?

Lawsuit, Shmawsuit/Yiddisch

Judge Alex Kozinski and Eugene Volokh on the use of Yiddish in court decisions:

Searching through the LEXIS legal opinions database reveals that “chutzpah” (sometimes also spelled “chutzpa,” “hutzpah,” or “hutzpa”) has appeared in 231 reported court decisions. Curiously, all but eleven of them have been filed since 1980. There are two possible explanations for this. One is that during the last 21 years there has been a dramatic increase in the actual amount of chutzpah in the United States–or at least in the U.S. legal system. This explanation seems possible, but unlikely.

The more likely explanation is that Yiddish is quickly supplanting Latin as the spice in American legal argot. As recently as 1970, a federal court not only felt the need to define “bagels”; it misdefined them, calling them “hard rolls shaped like doughnuts.” All right-thinking people know good bagels are rather soft. (Day-old bagels are rather hard, but right-thinking people do not eat day-olds, even when they are only 10 cents each.) We’ve come a long way since then.

Mind you, the article makes no comparison with the use of Yiddish in US English outside lawsuits.

This is a 1993 article, Lawsuit, Shmawsuit, available online.

(Via Ruth Morris, who writes on Interpreting in legal contexts and Interpreting in the Israel legal system – and has published on the same topic in England and Wales)

Austrian and German texts/Österreichisch und Deutsch

Rechtsanwalt Jens Hänsch, Dresden, compared part of an Austrian judgment he received with its German equivalent. I shamelessly reproduce both:

Was in Deutschland hieße

1. Der Beklagte wird verurteilt, an die Klägerin 1.144,50 Euro zuzüglich Zinsen in Höhe von 9,47 % seit dem 10.04.2006 zu zahlen.
2. Der Beklagte hat die Kosten des Rechtsstreits zu tragen.
3. Das Urteil ist vorläufig vollstreckbar.

heißt Im Namen der Republik wie folgt:

Die beklagte Partei ist schuldig, der klagenden Partei den Betrag von € 1.144,50 samt Zinsen in Höhe von 9,47 % seit 10.04.2006 sowie die Prozesskosten gemäß § 19a RAO zu Handen der Klagsvertreter zu bezahlen, all dies binnen 14 Tagen bei sonstiger Exekution.

At least they didn’t write ‘samt Anhang’!

On this topic, I do wish people asking questions on translators’ mailing lists would say if their text is German, Swiss or Austrian and if their audience is specifically British, American or global.

Such toe is all right now/Nachahmung in der Rechtssprache

Some Germans – lawyers or translators – can write really good legal English but tend to be more Catholic than the Pope (päpstlicher als der Papst) when doing so.

I’m reminded of this by the (new) legal writer’s quote in his latest entry:

“Much bad writing today comes not from the conventional sources of verbal dereliction—sloth, original sin, or native absence of mind—but from stylistic imitation. It is learned, an act of stylistic piety which imitates a single style, the bureaucratic style I have called The Official Style. This bureaucratic style dominates written discourse in our time, and beginning or harried or fearful writers adopt it as protective coloration.”

—Richard A. Lanham, Revising Prose vi (3d ed. 1992).

(This is quoted from Garner’s Usage Tip of the Day, which I don’t receive).

That refers to native English speakers writing English, who have less excuse, of course.

Particular features of this hyperlegalese:

use of said and aforesaid where they add nothing

use of such instead of this/these

Here’s a site that objects to it too (Alabama Legislative Reference Service):

Rule 10. Use of “Such”
Do not use “such” as a substitute for “the,” “that,” “it,” “those,” “them,” or other similar words.
Example: “The (not ‘such’) application shall be in the form the court prescribes.” Use “such” to express “for example” or “of that kind.”

overuse of shall. I quote an example from Butt and Castle on Modern Legal Drafting:

If the Vendor shall within one month of the receipt of such notice give written notice (If the Vendor … gives would suffice)

Here is Todd Bruno of Louisiana State University, quoting Gerald Lebovits:

About said, as in aforesaid, Justice Smith asked whether one would say, “I can do with another piece of that pie, dear. Said pie is the best you’ve ever made.” About same, he asked whether one would say, “I’ve mislaid my car keys. Have you seen same?” About the illiterate such, he asked whether one would say, “Sharon Kay stubbed her toe this afternoon, but such toe is all right now.” About hereinafter called, he asked whether one would say, “You’ll get a kick out of what happened today to my secretary, hereinafter called Cuddles.” About inter alia, he asked, “Why not say, ‘Among other things?’ But, more important, in most instances inter alia is wholly unnecessary in that it supplies information needed only by fools …. So you not only insult your reader’s intelligence but go out of your way to do it in Latin yet!”

See also the Legalese Hall of Shame.

Digital thieves/Die (englische) Sprache des Urheberrechts

The Guardian recently had an article entitled Digital thieves swipe your photos – and profit from them.

Pedantic readers were having none of this theft terminology. Hence yesterday’s technology blog post: What’s the right way to talk about copyright stuff?

The aggrieved reader wrote (in part):

“I only read the heading and subheadings of this. For god’s sake, at least use the correct terminology. The photographs in question simply are not being stolen. They’re being copied. No thieves in existence there, but copiers. Illegal copiers I’m sure (whether it’s a good idea for so many things to be illegal to copy or not is another issue). You’re not helping us nor yourselves by perpetuating this kind of BS. The party who initially has possession of the item in one case no longer has the item, and in the other, does. That’s a big difference. That’s why we have different words with very different meanings to describe the two fundamentally different situations. But you’ve got them mixed up. And helped other people get them mixed up too.”

There is an attempt to fight a rearguard action from the legal point of view, but after all, a bit of polemic must surely be permitted, and the latter would be the better argument.

Comment by the author, Charles Arthur:

@ParkyDR @nickholmes: “A person is guilty of theft if he dishonestly appropriates property belonging to another with the intention of permanently depriving the other of it; and “thief” and “steal” shall be construed accordingly.”

Surely the property here is intellectual property, which courts have construed as existing in the same way that physical property does.

The “permanent deprivation” is of the opportunity to sell it (or prevent it being sold).

The Theft Act says that property ‘includes money and all other property, real or personal, including things in action and other intangible property’ – but the things in action have to be capable of appropriation.

(Dietl: chose in action (einklagbares) Forderungsrecht; obligatorischer Anspruch (der Gegenstand einer Klage sein kann); unkörperlicher Rechtsgegenstand (Wechsel, Sparguthaben, Patente, Urheberrecht, Versicherungspolice, Rente etc))

Comment by AlexC:

As a former copyright lawyer, I think “theft” is *technically* the wrong word. But then most people don’t understand the technical meaning of “theft”, so what does it really matter?

As a matter of general practice, the term “copyright theft” has been around for quite a while – e.g. at the cinema you will see anti-piracy adverts from a group called the Federation Against Copyright Theft (“FACT”).

The legal offence of copyright infringement and the legal offence of theft are so analogous that they fall within the same linguistic term “theft” in piracy-type situations.

Now, for some real fun, we could consider whether the tort of copyright infringement is analogous with the tort of conversion…

Language blogs/Sprachblogs

eduFire has an entry on The Top 21 Language Bloggers on the Web (via languagehat).

This is about learning languages and presenting a multitude of languages, rather than about linguistics, so Language Log isn’t there, for example.

It’s also a bit of a mystery that Tim Ferriss’s blog made the list on account of one sole post: How to Learn (But Not Master) Any Language in 1 Hour (Plus: A Favor).

My favourite tip on language learning from that post is that you get translations in a given language of sentences like ‘The apple is red’ and ‘It is John’s apple’, work out how many obstacles the language presents in comparison with your own (including pronunciation), and then if there are too many obstacles, you just don’t learn the language. This would have saved me a lot of time with Turkish. But on the other hand, it presupposes elements like subject, object, verb and noun cases (why Ferriss puts off learning Russian!).

People love discussing how to learn languages. The roots of machine translation probably lie here.

Words banned in court/Verbotene Wörter im Gerichtssaal

An article of 16 June 2008 by Tresa Baldas in the National Law Journal, Courts Putting Hot-Button Words on Ice, reports that words such as rape and victim are being banned by judges because they prejudice defendants.

A steadily increasing number of courts across the United States are prohibiting witnesses and victims from uttering certain words in front of a jury, banning everything from the words “rape” to “victim” to “crime scene.”

Prosecutors and victims’ rights advocates nationwide claim the courts are going too far in trying to cleanse witness testimony, all to protect a defendant’s right to a fair trial. Concerns and fears over language restrictions have been percolating ever since judges in Nebraska and Missouri last year banned the word “rape” during rape trials.

The article contains many examples.

This relates largely to the Nebraska case reported in July 2007. From Slate:

Nebraska law offers judges broad discretion to ban evidence or language that present the danger of “unfair prejudice, confusion of the issues or misleading the jury.” And it’s not unheard-of for judges to keep certain words out of a courtroom. Words like victim have been increasingly kept out of trials, since they tend to imply that a crime was committed. And as Safi’s lawyer, Clarence Mock, explains, the word rape is just as loaded. “It’s a legal conclusion for a witness to say, ‘I was raped’ or ‘sexually assaulted.’ … That’s for a jury to decide.” His concern is that the word rape so inflames jurors that they decide a case emotionally and not rationally.

I think the judge may have gone too far in this particular case.

In the NLJ article, note in particular the last section on the appeal against the Nebraska decision:

Wendy J. Murphy of the New England School of Law, who is representing a Nebraska rape victim opposing the judge’s barring of the word “rape,” said the major battle facing prosecutors and victims now is fighting judges’ censorship orders.

To date, she said, there has been no federal court ruling on the matter. …

Murphy tried when she appealed the Nebraska judge’s decision to bar a rape victim from using the word rape. She lost the case, and is now appealing to the U.S. Supreme Court. Bowen v. Honorable Jeffre Cheuvront, No. 4:07CV3221 (D. Neb.).

At Language Log, Roger Shuy discusses the matter and adds that witnesses don’t often get to use their own words in any case:

“Using your own words” isn’t all that common in trials I’ve experienced. Among other things, you can’t introduce your own topics, you have to answer the opposing lawyer’s questions according to the form in which they are asked (usually yes/no questions, or worse, tag-questions), and you have to be ready to be interrupted at any time. Testifying requires a witness to learn a new set of communication skills, many of which can seem counterintuitive. Doing this can be daunting for anyone not trained in the special culture of the courtroom.