Joining html sections / Viele html-Dateien verbinden?

Most statutes can nowadays be found online as complete texts, not just as individual sections you have to click through one at a time. But where only the latter is available (e.g. for older versions or foreign statutes), is there a technical way to combine these separate web pages into one document?

It has been suggested to me that I do this with STAR Transit or Deja Vu translation memory software. Are there alternatives?

If I find a statute online and it is only available in the form of one page per section, is there an easy way to download it and combine the sections into one document? I know there are programs that download whole sites, but how can I join the sections?

It has been suggested to me that I can do this using translation memory software – Deja Vu and STAR Transit should both do it (I have the latter). Is there any other way?
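A minimal sketch of the shell approach suggested in the comments below (download each page with wget, then join the files with “cat”), assuming you can first collect the section URLs, in reading order, into a plain-text list with one URL per line. The list file, the part001.html naming scheme and the function names are hypothetical choices for this example, not anything a particular site provides. The parts are numbered with zero padding because “cat *.html” joins files in the shell's alphabetical order, which would otherwise put section 10 before section 2.

```shell
#!/bin/sh
# Sketch only: fetch one page per section and join them in list order.
# Assumption: LISTFILE holds one section URL per line, in reading order.

# fetch_sections LISTFILE DIR
# Downloads each URL to DIR/part001.html, DIR/part002.html, ...
# Zero-padded names keep the shell's alphabetical glob order equal to list order.
fetch_sections() {
    list=$1
    dir=$2
    mkdir -p "$dir" || return 1
    n=0
    # The extra -n test also keeps a last line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue            # skip blank lines
        n=$((n + 1))
        wget -q -O "$dir/$(printf 'part%03d.html' "$n")" "$url" || return 1
        sleep 1                              # pause between requests
    done < "$list"
}

# join_sections DIR OUTFILE
# Concatenates the numbered parts, in order, into a single file.
join_sections() {
    cat "$1"/part*.html > "$2"
}
```

Usage, assuming the functions are saved as join.sh (also a hypothetical name): `. ./join.sh`, then `fetch_sections urls.txt parts` and `join_sections parts statute.html`. The result is still a single HTML file containing every page's navigation, links and ads; it only saves the downloading and copying.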

13 thoughts on “Joining html sections / Viele html-Dateien verbinden?”

  1. Trados also has a similar function – SDL Trados Glue.

    Some years ago, before such a function existed, I wrote my own VBScript to do the same. Let me know if you want me to dig it out.

    Iwan

    • Thanks, Iwan. I posted this because I asked about it some time ago (on the help list?), and yesterday I received a more-or-less anonymous email asking whether I had succeeded. I had wanted to do this for the old version of the BGB, but I never got round to it because I no longer needed it. But I’ll bear your script in mind if I ever have the problem again.

  2. What I’d do (and have in fact done for the Austrian RIS at some point) is use a few Shell scripts to download the HTML files, use “cat” to join them, and then apply a few regular expressions to get rid of the superfluous markup and such.

    This requires that all the URLs are known in advance (in a list, say), or can be determined easily by incrementing a counter, or similar.

    Would you share the URL with us? That’d make it much easier.

    • Well, as I already said (though not very clearly), I don’t need this urgently myself at the moment, because shortly after I wanted to do it, the project changed. But because I had asked on the Yahoogroups list ‘help’, someone else emailed me yesterday asking whether I knew how to do it.
      I don’t know what Shell scripts are, but I have used freeware programs to download in the past.

      • I see. Well, the “Shell” is basically the command line for your computer, i.e. a text-driven user interface. It’s not point-and-click, but it’s not exactly black magic, either. The UNIX program “cat” was made specifically for the purpose of conCATenating files, so once you have all the files, I imagine “cat *.html > new.html” would be all that’s required: it takes all the HTML files and puts them into new.html.

        Just to clarify, if I sound somewhat condescending, that was not my intention.

        • No, you did not sound condescending. Are you beginning to think you ought to?!
          The problem is that I wanted to try this out for myself, since there seems to be no point in discussing it without trying it, but the project I wanted to use it for no longer requires it. I wanted the old BGB (pre-2001) at
          http://dejure.org/gesetze/0BGB010102
          but obviously not the whole of dejure.org. I can well imagine this is straightforward, but when I tried to set WinHTTrack to do it, I couldn’t. Now I should do something else.
          I suppose the capitalization of Shell is a Germanism? It sounds a bit like the oil company.

          • I’d have used wget instead of WinHTTrack, but that really shouldn’t matter much in the end. The real problem I see is cleaning up the files: getting rid of all the links, ads and whatnot. It can certainly be done, but it would still be a fairly large editing job.

            I suppose “Shell” was a Germanism, yes.

          • Yes, I see the problem. I think I have enough information in case I need it one day, which was all I really wanted. I didn’t know about the Unix option. Joining the files with a translation memory program could be too expensive for someone who doesn’t already use TM, so the unknown questioner should be in a better position now.

  3. Identity theft is also a well-known current term and – like digital photo and copyright theft – is not a misnomer.

    Even Charles Arthur’s retort falls into the narrow trap of the parochial British definition of theft in s. 1(1) of the UK Theft Act 1968.

    The French take a different approach: illicit or furtive snapshots taken without the target’s consent, including in a ‘private place’, are a breach of privacy (‘right of personality’) and an offence on a par with theft, triggering civil sanctions: damages, publication of any judgment and seizure of the offending photo.

    Also, consider who is committing theft, even in the UK, if a heavy – with a dodgy past – doesn’t like his mugshot taken in public and physically presses the photographer into handing over the film, as has happened to tourists in London who have innocently caught such shady spoilsports on camera.

  4. Variety shows sometimes included electrical acts. I wonder whether your lodger owned the apparatus or merely the intellectual property. Buster Keaton’s The Electric House is the classic.

    • Thanks. Why did I expect it would be you who commented on this?

      I couldn’t find any electrical sketches in the OED. There was (inter alia) a definition of sketch dated 1892: ‘“Sketches” – the new name for small or condensed, and in some cases, mutilated stage plays, the acting time of which shall not be more than 40 minutes, and the performers in which shall not be more than six.’ This was in the Daily News.
      Maybe the content was similar to the Buster Keaton one, or maybe electrical appliances were used?

      • I’m now assailed by doubt. Given the rapidly evolving world of useful electrons at the time, were Keaton’s gags inspired by different issues to those of his predecessors?
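For the clean-up step raised in the second comment thread (getting rid of links, ads and other markup after joining the pages), here is a rough sed sketch. It assumes that each page's actual statute text sits between two recognisable lines in the page source; the patterns in the usage example (“content start” / “content end”) are hypothetical and would have to be read off the real pages, and the function names are illustrative only. Regular expressions are not an HTML parser, so this only suits simple, regular page templates.

```shell
#!/bin/sh
# Rough clean-up sketch for the joined HTML. Regular expressions are not an
# HTML parser; this only copes with simple, regular page templates.

# keep_between START END   (stdin -> stdout)
# Prints every range of lines from a line matching START to the next line
# matching END, i.e. once per joined page. START and END are basic sed
# regexes chosen from the real page source; they must not contain "/".
keep_between() {
    sed -n "/$1/,/$2/p"
}

# strip_tags   (stdin -> stdout)
# Deletes tags and simple comments that begin and end on the same line,
# then drops lines left empty or containing only whitespace.
strip_tags() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}
```

Usage, with hypothetical marker patterns: `keep_between 'content start' 'content end' < statute.html | strip_tags > statute.txt`. Tags that span several lines and entities such as &nbsp; survive this pass, so some manual editing would still be needed.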
