
Lee Graham

The Nine Trees Cluster

On March 4, 2006, I found the following very tight cluster of nine tree names in The Origin of Species...


[Figure: the cluster, showing Spruce, Peach, Birch, Apple, Cherry, Maple, Pecan, Orange, and Beech, together with the phrase "They Are Rendered Much More Close". 89 x 40 = 3,560]


The exact location of each tree name is given in the table below.

Tree      Start index (base 0)    Skip
Spruce    685,769                 8,200
Peach     866,146                 16,368
Birch     636,617                 24,599
Apple     702,177                 24,585
Cherry    620,239                 10
Maple     612,050                 24,596
Pecan     620,290                 8,184
Orange    816,969                 16,385
Beech     612,070                 8,197
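
For readers new to the notation: a start index and a skip together pick out an equidistant letter sequence. Take the character at the start index, then every skip-th character after it. The sketch below is only an illustration of that idea, not the search program used for this page. The function name `read_els` and the toy text are made up for the example, and it assumes the text has already been reduced to one plain string of letters.

```python
def read_els(text: str, start: int, skip: int, length: int) -> str:
    """Return `length` characters of `text`, beginning at the 0-based
    index `start` and stepping `skip` characters each time."""
    last = start + (length - 1) * skip
    if length < 1 or start < 0 or not (0 <= last < len(text)):
        raise ValueError("sequence falls outside the text")
    return "".join(text[start + i * skip] for i in range(length))


# Toy text (hypothetical): "APPLE" is hidden at start index 2 with a skip of 3.
toy = "xxAxxPxxPxxLxxExx"
print(read_els(toy, start=2, skip=3, length=5))  # APPLE
```

With the full text loaded, `read_els(text, 620239, 10, 6)` would be the call matching the Cherry row of the table, provided the text is prepared the same way it was for the search.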

The cluster is 89 characters wide and 40 characters high. In it is written "THEY ARE RENDERED MUCH MORE CLOSE". I decided to see if that's truly the case. I checked other large texts of equal or greater length for the same cluster of trees. In each case I trimmed the text down to its first 1,009,229 characters so that it matched the length of The Origin of Species exactly, and found the best cluster of the nine trees listed above, according to the formula below.
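
One way to see where a width and a height come from: if the text is wrapped into rows of a fixed width, every character index maps to a (row, column) cell, and the cluster's dimensions are those of the smallest rectangle containing every letter of every tree name. The sketch below assumes that layout. The names `bounding_box` and `row_width` and the toy numbers are hypothetical; the row width behind the cluster above is not stated on this page, and the sketch ignores any wrap-around at the row edges.

```python
from typing import Iterable, Tuple

Term = Tuple[int, int, int]  # (start index, skip, number of letters)


def bounding_box(terms: Iterable[Term], row_width: int) -> Tuple[int, int]:
    """Return (width, height) of the smallest rectangle that contains
    every letter of every term when the text is wrapped at `row_width`."""
    rows, cols = [], []
    for start, skip, length in terms:
        for i in range(length):
            # Index -> (row, column) in a grid with `row_width` columns.
            row, col = divmod(start + i * skip, row_width)
            rows.append(row)
            cols.append(col)
    if not rows:
        raise ValueError("no terms given")
    return max(cols) - min(cols) + 1, max(rows) - min(rows) + 1


# Toy example (hypothetical numbers): rows of 10 characters, two 3-letter terms.
toy_terms = [(12, 10, 3),   # vertical: cells (1,2), (2,2), (3,2)
             (21, 1, 3)]    # horizontal: cells (2,1), (2,2), (2,3)
print(bounding_box(toy_terms, row_width=10))  # (3, 3)
```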

In order to compare the results from different texts, I devised the following score formula for each cluster:

score = W * H * max(W/(2H), 2H/W)

W * H is the area of the cluster (width times height), and max(W/(2H), 2H/W) is a factor that reflects how far the cluster deviates from an aesthetic 2:1 aspect ratio. The further the enclosing rectangle's aspect ratio deviates from 1:1, the more likely it becomes to contain a given cluster anyway, but I'll post more on that in the near future. This extra factor makes the score approximate what one might intuitively consider a "tight" cluster more closely than the raw area alone does. It penalizes clusters with aspect ratios that are far from 2:1. As you'd expect, a lower score means a better cluster. I didn't break ties based on area, so when I ran the search on The Origin of Species, it found a cluster slightly larger in area (89 x 42), but with the same score as the one shown above.
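
As a quick check on the formula, the sketch below evaluates it with exact rational arithmetic. The function name `cluster_score` is just an illustrative label. Multiplying the formula out gives W^2/2 when W is at least 2H and 2H^2 otherwise, so for a rectangle wider than 2:1 the score depends on the width alone. That is why an 89 x 40 cluster and an 89 x 42 cluster tie.

```python
from fractions import Fraction


def cluster_score(width: int, height: int) -> float:
    """score = W * H * max(W/(2H), 2H/W); a lower score is a better cluster."""
    w, h = Fraction(width), Fraction(height)
    # Fractions keep the arithmetic exact until the final conversion.
    return float(w * h * max(w / (2 * h), (2 * h) / w))


print(cluster_score(89, 40))  # 3960.5
print(cluster_score(89, 42))  # 3960.5
print(cluster_score(91, 49))  # 4802.0
```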

Here are the results of that experiment...

Text                                Author(s)                                   Cluster dim.  Score
The Origin of Species               Charles Darwin                              89 x 42       3,960.5
The Descent of Man                  Charles Darwin                              92 x 42       4,232.0
The History of England              David Hume                                  97 x 42       4,704.5
A Treatise of Human Nature          David Hume                                  91 x 49       4,802.0
War and Peace                       Leo Tolstoy (trans: Louise & Aylmer Maude)  102 x 47      5,202.0
Ulysses                             James Joyce                                 103 x 49      5,304.5
Wealth of Nations                   Adam Smith                                  103 x 51      5,304.5
The Count of Monte Cristo           Alexandre Dumas                             105 x 51      5,512.5
Summa Theologica Part I-II          Thomas Aquinas                              98 x 53       5,618.0
Shakespeare's First Folio/35 Plays  William Shakespeare                         109 x 54      5,940.5
The KJV Bible                       Unknown                                     109 x 56      6,272.0


Not too bad for a cooked list... er... um, I mean a cluster I found near the words "They Are Rendered Much More Close". It's interesting (and meaningful?) to note that the runner-up is also a book by Darwin, and that the book that came in dead last is the KJV Bible.

These results have prompted me to work a bit harder at evaluating the "goodness" of a cluster, so that I can come up with experiments potentially much more impressive than this one. I will post more on that front in the future.
