Using the Web as a language corpus/Economist article

The Economist currently has an article on using the Web as a language corpus. It quotes Language Log, where the whole article is given too (this link should remain functioning).

bq. Search engines, unlike the tools linguists use to analyse standard corpora, do not allow searching for a particular linguistic structure, such as “[Noun phrase] far from [verb phrase]”. This requires indirect searching via samples like “He far from succeeded”. But Philip Resnik, of the University of Maryland, has created a “Linguist’s Search Engine” (LSE) to overcome this. When trying to answer, for example, whether a certain kind of verb is generally used with a direct object, the LSE grabs a chunk of web pages (say a thousand, with perhaps a million words) that each include an example of the verb. The LSE then parses the sample, allowing the linguist to find examples of a given structure, such as the verb without an object. In short, the LSE allows a user to create and analyse a custom-made corpus within minutes.
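To make the idea of indirect searching a little more concrete, here is a minimal Python sketch. It is not how the LSE works: the LSE parses its sample, whereas this only approximates "[noun phrase] far from [verb phrase]" with a regular expression (a pronoun, then "far from", then a word ending in -ed). The sample sentences, the `PATTERN` and the function name `find_far_from_verbs` are all my own inventions for illustration.

```python
import re

# Tiny hand-made sample standing in for downloaded web pages
# (invented sentences, not real corpus data).
SAMPLE = [
    "He far from succeeded in convincing them.",
    "The village is far from the coast.",
    "She far from enjoyed the concert.",
    "They are far from happy.",
]

# Crude approximation of "[noun phrase] far from [verb phrase]":
# a personal pronoun, then "far from", then a word ending in -ed.
# A real tool would parse the sentence instead of pattern-matching.
PATTERN = re.compile(
    r"\b(?:i|you|he|she|it|we|they)\s+far\s+from\s+(\w+ed)\b",
    re.IGNORECASE,
)


def find_far_from_verbs(sentences):
    """Return (verb, sentence) pairs for sentences matching the crude pattern."""
    hits = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            hits.append((match.group(1).lower(), sentence))
    return hits


if __name__ == "__main__":
    for verb, sentence in find_far_from_verbs(SAMPLE):
        print(verb, "<-", sentence)
```

The pattern deliberately skips "far from the coast" and "far from happy", which is exactly the kind of distinction a plain search engine query cannot make.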

Buch: Das österreichische Deutsch im Rahmen der EU, Markhardt

In an earlier entry I mentioned an article and dissertation by Heidemarie Markhardt on Austrian German in the EU. The dissertation has now appeared as a book, or a book based on the dissertation has appeared:

MARKHARDT, Heidemarie (2005): Das Österreichische Deutsch im Rahmen der EU. Peter Lang, Frankfurt am Main 2005 (376 pages). ISBN 3-631-53084-6.

It can be ordered through info@peterlang.com. Here’s the publisher’s page, with details about other books in the same series too:

Heidemarie Markhardt (author)
Das Österreichische Deutsch im Rahmen der EU
Series: Österreichisches Deutsch – Sprache der Gegenwart, edited by Rudolf Muhr and Richard Schrodt. Peter Lang Europäischer Verlag der Wissenschaften, Frankfurt, Berlin, Bern, Bruxelles etc., 2005

ISSN 1618-5714
ISBN 3-631-53084-6

The Bundeskanzleramt site also has a little blurb on the book:

Markhardt book: Das Österreichische Deutsch im Rahmen der EU

Elfriede Jelinek, winner of the 2004 Nobel Prize in Literature, has pointed out several times how greatly her language and that of her Austrian colleagues, such as Peter Handke, Friederike Mayröcker or Ernst Jandl, differs from that of German writers, not least through a different “musicality”. Our English translator Dr. Heidemarie Markhardt has published a fundamental work based on her dissertation: “Das Österreichische Deutsch im Rahmen der EU”. She deals in depth with the Austrian expressions of the German language that found official entry into the EU through what is known as Protocol No. 10. It is not just about Paradeiser (tomatoes, German Tomaten), Erdäpfel (potatoes, German Kartoffeln) or Marillen (apricots, German Aprikosen). Definitely worth reading for Feinspitze (connoisseurs) who want to savour German in all its nuances.

Viking online/Isländischkurs

Thanks to Rainer Langenhan for alerting me to the following:

Icelandic Online Dictionary and Readings

Quote:

bq. Working in partnership with the University of Iceland and a number of other sponsors (including The Andrew Mellon Foundation) the University of Wisconsin Digital Collections group has created the Icelandic Online Dictionary and Readings website. This website also complements the University of Iceland’s Internet course, “Icelandic Online”. Persons interested in learning a bit about Icelandic will appreciate the fact that they have access to the aforementioned course, complete with interactive lessons and exercises. Additionally, the site contains the unabridged content of the 1989 Concise Icelandic-English dictionary and a set of readings in modern Icelandic life, literature and culture. As an extra treat, visitors also have access to a collection of works by the famous Icelandic poet, Jonas Hallgrimsson. Visitors will want to make sure and read some of his well-known poems, including “The Vastness of the Universe” and “The Style of the Times”.

It does take me back to Old Norse (online course) and Bandamanna Saga. I had a quick look at Jonas Hallgrimsson too. I attempted to read The Pipit, but was disappointed to find the audio file was the English version, read in a very mysterious accent that sounded like a cross between Shetland and U.S. (Dick Ringler). It’s all useful stuff, although it teeters on the brink of burlesque:

bq. The events described in this poem appear to go back to about the time of Jónas’s seventh birthday. When his father’s estate was probated a few years later, in October 1816, it included a two-year-old ram and a three-year-old dark grey horse (þrevetru trippi, dökkgráu [1D354]). There is a good chance that these are the animals mentioned in the poem and entrusted to young Jónas’s care. Since the colt will have been born in 1813 and the poem describes it as being in its second year (á annan vetri) when these events occurred, everything points to the autumn and early winter of 1814-5, and this tallies with what is known of the weather at that time: “The autumn was mild, except for a spell of bad weather around Michaelmas [29 September]; then on 7 November the north was hit by a violent storm that damaged hay, buildings, boats, and livestock” (ÁfÍ215).

Spiegel online in English

Spiegel Online has a curiously named section Fishwrap (Want to know what the German papers are saying?). It’s illustrated by a picture of two herring, I think, in a German tabloid-sized newspaper. I presume it’s named after the Guardian’s The Wrap, which is no longer free of charge. I thought this was a wrap-up of the news, but I may be missing something in understanding the reason for these names. Does it mean reading a summary of the news on the paper used to wrap up fish and chips (rather than raw herring)?

Here are a couple of quotes from the current Fishwrap:

bq. Will Iran be next? Will Bush wage war with every “Outpost of Tyranny”? What will happen to trans-Atlantic relations? Why does America celebrate its new president with a pompous event fit for a king or a dictator? Does this all really matter? These are the pressing questions for which German editorialists seek to divine answers on Friday.

I didn’t really start this blog in order to pull other translators’ work to pieces, so maybe I should let the text speak for itself. There are five things I’m not happy with there.

Another characteristic that is striking elsewhere in the article is the use of colloquial English that would be unusual in British or American writing. Even contractions are not common. Here is some more:

bq. Newspaper editors watched, too, and they’re having a field day with his inaugural speech in Friday’s editions. Surprisingly, only one paper predicts disaster, but if George W. Bush thought he would get the kid glove treatment over here, he can just forget it.

Referring to Die Tageszeitung:

bq. “Describing US President George W. Bush as a proselytizing crackpot leader of a superpower bristling with weapons doesn’t really move things forward, it doesn’t solve any problems and though offensive, it’s not especially original anymore,” it writes. “At the same time: Those who don’t want to take seriously this madness, which Bush did his best to show before and after his inauguration speech, are in for a terrible surprise from the US government.”

The translators are obviously briefed to use contractions frequently and to find colloquial expressions, which is particularly hard for non-native speakers. It seems to me we have to look forward to a lot more of this international English. I wanted to say it’s an impoverishment of the language, but I don’t know if that’s the point. It reminds me of flying Lufthansa and reading their two-language magazine. The English there is hard to fault – it doesn’t have the errors seen above – but it follows the German very closely. I associate it with the enclosed air of a plane.

I note that Lufthansa Bordbuch in English is part of an online corpus of translations into English (and the German part is in a corpus too).

Faulty Polish translation of EU constitution / Übersetzungsfehler in polnischer EU-Verfassung

Robin Stocks at Carob reports on errors in the Polish version of the EU constitution, which has already been published in the Official Journal.

Der Standard:

bq. Warsaw – The EU constitution has been translated into Polish very poorly. As a result, ratification could not only be delayed by several months but might even be at risk, the Polish daily “Rzeczpospolita” reported on Wednesday. The errors also affect important mechanisms: for the election of the Committee on Social Policy, for instance, the text refers to a qualified majority of the member states, although the committee is determined by a simple majority. In the first version Poland would have only one vote; in the other, Poland would have gained considerably more influence thanks to a complicated system of calculation.

New copyright contract law in Germany talk/Vortrag zum Urhebervertragsrecht in Saarbrücken

Rainer Langenhan reports the following talk in German on copyright law for translators. Should be of particular interest to literary translators.

On Thursday, 27.01.2005, 16.30 – 18.00, Prof. Dr. Maximilian Herberger will give a guest lecture on “Urhebervertragsrecht für Übersetzer” (copyright contract law for translators) in Building 4, Conference Room 120, at the Universität des Saarlandes.

Book on untranslatable words/Korinthenkacker

Christopher Moore has written a book, In Other Words, about untranslatables (via translation eXchange).

The NPR site lists some of these words/phrases from various languages. It also has an audio link for an interview.

They include ilunga, from the Tshiluba language, which has been done to death in the media.

For German, the word Korinthenkacker is given:

bq. korinthenkacker [core-in-ten-cuck-er] (noun)
A “raisin pooper” — that is, someone so taken up with life’s trivial detail that they spend all day crapping raisins. You can spot these types a mile off — it’s that irritating pen pusher or filing fanatic whose favorite job is tidying up the stationery cupboard.

I don’t think anyone’s ever told Christopher Moore the difference between raisins and currants.

It sounds like a bit of light reading, just listing such a miscellaneous collection. And is this word not chosen because it is amusing, especially to those who haven’t heard it before? It doesn’t seem untranslatable, although the best equivalents are even more colloquial (I suggest a fart in a colander). Collins has fusspot.

Strangely, Lions Club International has the web address www.korinthenkacker.de. And the book Variantenwörterbuch des Deutschen (I’ve already lusted after it, but I have too much else to read) gives the synonym Tüpflischeisser.

The Policeman’s Blog

The Policeman’s Blog (via What’s New on the UK Legal Web?). On closed-circuit TV:

bq. The widespread use of CCTV in British town centres is not without its problems. The first of these is that it allows the police to arrive at situations very early on when people are highly agitated and anxious to make complaints of assault against people who happen to be standing next to them. This sets up a trail of bureaucracy which will ultimately result in the arrest, appearance in court and acquittal of both parties. A large part of my work consists of arresting people who make accusations about other people and subsequently end up getting arrested themselves. If I can generate crime numbers and get them to court I can improve my detection rate, I can also earn a bit of overtime by conducting the relevant enquiries, so it’s all win, win, win: Newtown police detection rate goes up, the participants get the satisfaction of seeing their drunken acquaintances in court and Mrs C. gets a new frock.

Walking the Streets, a traffic warden’s diary:

bq. Between one thing and another this isn’t such a bad way to make a living. The pay is better than being a Supermarket Shelf Stacker and once out of the office, you’re your own person. No time to be bored. Every day as they say, is a whole new ball game.

Website with links to language blogs/Website linkt Sprachblogs

(Amended entry)

There are a number of websites out there like ProZ and Translators Cafe that offer a meeting place for translators. Most of them have two main features of interest: an exchange between translators on questions of translation and business practice, and a job search function.

I noticed yesterday that one such site, Language Forum free translation, has a page listing a large number of recent articles from language weblogs, all posted by a ‘Senior Member’ called rssbot, and it appeared to me that registered and logged-in members could post reactions. In particular, an entry of mine on machine translation had a comment by one tupac, saying he had found something wonderful called Babelfish but misspelling it. You clicked on a title, were then taken to a page where you could make comments, and thence you could click a link to go to the original weblog.

The site describes itself as non-commercial but it obviously increases traffic to its owner(s) – in a similar way to the Belgian site that recently borrowed German blawgs. The site is run by Bernhard Huber.

Actually, at a higher level, the blogs come under this heading:

bq. The news from related translation sites
The last news and other stuff, from other language and translation sites. Les dernières nouvelles et d’autres choses d’autres sites de traduction. –Automatic RSS indexation, the forum is not responsable of the content of this pages–

bq. Misc. , Divers (6 Viewing)
News from various sites about translation and linguistic, nouvelles de sites de traduction et linguistique divers
Sub-Forums: nakedtranslations, Spanish<>English Translators, Xlation Blog, Les coups de langue de la grande rousse, BlogLatin, How to learn Swedish in 1000 difficult lessons, Language Log, Language hat, Logomacy, Lagomduktig

Anyway, I’m pleased to report that Transblawg was removed immediately I wrote to ask for it to be removed. Apparently comments are not allowed, but still, I don’t want my feed exploited on other sites, and the risk of comment somewhere in a forum there would mean I’d feel obliged to keep an eye on it.

I found the site by accident, although the owner had actually written to me in September to ask for a link.

A similar situation arose recently when a Belgian website called Izynews was providing feeds of German legal weblogs. There were discussions on a number of blawgs, for instance Udo and Clemens (in German). The latter refers to ways that work at present to stop RSS feeds from being incorporated against the owner’s will:

bq. Anyone who casts pearls before swine will know what he is doing and has to live with it.

bq. If it does matter to him, he has to fall back on ways of restricting use via RSS/Atom: either with his own technical means or through collective effort, which is where the argument of Die wunderbare Welt von Isotopp leads.

bq. In that respect it seems consistent to me that I break out of frames that embed GALJ, set denials for certain referrers and user agents that steal the content in such a way that the reader can no longer recognise where it came from, or use an original only with a recht.us/amrecht popup.
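For readers wondering what denials on referrers and user agents might look like in practice, here is a minimal Python sketch. It is my own illustration under stated assumptions, not Clemens’s actual setup: the blocklist entries (`aggregator.example`, `examplefeedbot`) are placeholders rather than real sites, and the function names `should_deny` and `status_for` are hypothetical. A real weblog would more likely do this in the web server configuration.

```python
from typing import Optional

# Hypothetical blocklists: placeholder values, not real aggregators.
BLOCKED_REFERRER_PARTS = ("aggregator.example",)
BLOCKED_AGENT_PARTS = ("examplefeedbot",)


def should_deny(referrer: Optional[str], user_agent: Optional[str]) -> bool:
    """Return True if the referrer or user agent contains a blocked fragment."""
    referrer_lc = (referrer or "").lower()
    agent_lc = (user_agent or "").lower()
    if any(part in referrer_lc for part in BLOCKED_REFERRER_PARTS):
        return True
    return any(part in agent_lc for part in BLOCKED_AGENT_PARTS)


def status_for(referrer: Optional[str], user_agent: Optional[str]) -> int:
    """Map the decision to an HTTP status: 403 (Forbidden) or 200 (OK)."""
    return 403 if should_deny(referrer, user_agent) else 200


if __name__ == "__main__":
    # A feed request from the placeholder bot is refused.
    print(status_for(None, "ExampleFeedBot/1.0"))
    # An ordinary browser request with no referrer goes through.
    print(status_for(None, "Mozilla/5.0"))
```

Both headers are easy to fake, so a check like this only deters aggregators that identify themselves honestly.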