Science News Online has an article (December 2003) on stylometry, "Bookish Math: Statistical tests are unraveling knotty literary mysteries". Thanks to Gary Muldoon of the Forensic Linguistics mailing list for the link. Stylometry is ‘the science of measuring literary style’, and the article describes its methods in some detail.
bq. At first glance, it might appear that the way to pinpoint a writer’s style is to study the rarest, most striking features of his or her writing. After all, it’s the unexpected words and the unusual rhetorical flourishes that seem to mark a work as uniquely Shakespearean or Dickensian.
bq. Yet the most venerable, commonly used approach of stylometrists does the opposite: It examines how writers use bread-and-butter words such as “to” and “with.” Although this approach seems counterintuitive, it’s based on sound logic.
For example, when some of the Federalist Papers were analyzed to discover whether they were written by Alexander Hamilton or James Madison, both of whom claimed authorship, about thirty rules were used. One such rule was that Hamilton used the word ‘upon’ about ten times as often as Madison did. A habit like this is harder to imitate than unusual vocabulary. This particular study was done in the early 1960s, and stylometry has developed greatly since then.
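To make the idea of a word-rate rule concrete, here is a minimal sketch in Python that counts how often chosen function words occur per thousand words of a text. The function name `rates_per_thousand`, the marker list and the sample sentences are my own inventions for illustration; this is not the procedure used in the Federalist study, which was a good deal more elaborate.

```python
import re
from collections import Counter

def rates_per_thousand(text, markers=("upon", "to", "with")):
    """Return occurrences per 1,000 words for each marker word (hypothetical helper)."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {m: 0.0 for m in markers}
    counts = Counter(words)
    return {m: 1000.0 * counts[m] / total for m in markers}

# Invented sample sentences, not real Hamilton or Madison text.
sample_a = "Upon reflection, the plan rests upon the consent of the states."
sample_b = "The plan depends on the consent of the states, as we have said."

print(rates_per_thousand(sample_a))
print(rates_per_thousand(sample_b))
```

With real texts one would compare these rates across many documents of known authorship before drawing any conclusion about a disputed one.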
A later technique called principal-components analysis (PCA) is described in detail, with diagrams. It showed that The Royal Book of Oz was not written by L. Frank Baum. There is more, including something about neural networks (which I don’t really understand).
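As a rough illustration of what PCA does with word rates, the sketch below finds the first principal component of a small table (one row per text, one column per word rate) and projects each text onto it. Everything here is an assumption for illustration: the function name, the use of simple power iteration to find the component, and the numbers, which are made up and have nothing to do with the Oz books.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length lists of word rates, one row per text.
    Uses power iteration on the covariance matrix; a sketch, not a robust solver.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from an arbitrary starting vector; in rare cases the
    # start can be orthogonal to the answer and the result will be wrong.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project each centered row onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Invented rates per thousand words for two marker words in four texts.
rates = [[3.0, 1.0], [3.2, 0.9], [0.3, 2.0], [0.2, 2.2]]
direction, scores = first_principal_component(rates)
print(direction, scores)
```

Texts whose scores fall close together along this component have similar word habits, which is the kind of clustering such diagrams display.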
The article has further links, a bibliography, and a list of sources.