The Origin Codes webpage was written tongue-in-cheek. I expect it was cheesy enough to be obvious. It was created to show just how easily one can find skip codes, and tight-fitting clusters of related words that seem to predict events that occurred after the original text was published. Provided you have a computer to do the grunt work for you, and a large enough document to process, you can find just about anything if you use a little care.
In fact, though it may look as though I put a great deal of effort into the Origin Codes, it was remarkably simple and required very little effort at all. Not only was the effort on my part minimal, but the computer program was written in such a way that it barely scratched the surface of the superabundance of hidden skip codes in the text. All of the skip codes and clusters in the Origin Codes were found by a program that limited itself to skip distances between 2 and 50,000 letters (so that it would run faster). In a text over a million characters long, this constraint means that only a small fraction of the available letter sequences were even considered when finding the codes and clusters I've presented. Had I made full use of the resources, it's quite likely that better codes and tighter clusters would have been found.
There are several important pieces of information that were deliberately left out in order to exaggerate the "effect" of the codes.
1) A great many of the skip code words shown in the grid images occur an astronomical number of times in the text, so that finding one instance that is clustered tightly with the other words in a cluster is trivial. This is especially true for words that are very short, and for words containing letters that are very common in the English language such as R, S, T, L, N, and E. Consider the following: the text from Darwin's book is 1,009,229 characters long. Using an even harsher skip distance constraint than the one mentioned above (2 to 5,000 letters instead of 2 to 50,000), the word "LEE" can be found 7,309,040 times - that's about seven times the number of letters in the bare text! The number of ways to find skip codes is truly mind-boggling, and makes the task far easier than one might first presume. A reader who fails to grasp the sheer vastness of this search space will likely overestimate the significance of any words and clusters discovered.
As a final illustration of this point, here are two images of the Human Genome Project grid from the Origin Codes. The first is the cluster I presented in the link off the main page. It contains a single instance of "DNA" highlighted in bright red. The second image shows the same grid, but all instances of "DNA" which fit inside have been highlighted. It should be clear from this example that the placement of "DNA" in the original image is more akin to an artistic choice than anything with some deeper meaning. The same is true of longer words, although there are fewer occurrences of them. If one picks a few words, the computer can find the closest-fitting group. Given the number of possibilities, that grouping is often very tight.
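The counting in point 1 is easy to try on any text. What follows is a minimal Python sketch, not the program used for the Origin Codes: the function names are hypothetical, and it assumes that only letters count (spaces and punctuation are stripped) and that words are read forward only. The `expected_count` helper is a rough back-of-envelope estimate that treats every letter as independent, which real English is not. If 1,009,229 is the number of letters actually searched, a three-letter word with skips from 2 to 5,000 has roughly five billion start-and-skip placements to land in, which is why even an unremarkable triple of letters turns up millions of times.

```python
import re
from collections import Counter


def normalize(text):
    """Assumption: only letters matter, so drop everything else and upper-case."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def count_skip_codes(letters, word, min_skip, max_skip):
    """Count forward readings of `word` taken every `skip` letters."""
    # The lookahead lets overlapping readings be counted separately.
    pattern = re.compile("(?=" + re.escape(word.upper()) + ")")
    total = 0
    for skip in range(min_skip, max_skip + 1):
        for offset in range(skip):
            # Every skip-th letter starting at `offset`; a skip code in the
            # full text is an ordinary substring of this strided string.
            strided = letters[offset::skip]
            total += sum(1 for _ in pattern.finditer(strided))
    return total


def expected_count(letters, word, min_skip, max_skip):
    """Rough estimate: number of placements times the product of letter frequencies."""
    freq = Counter(letters)
    chance = 1.0
    for ch in word.upper():
        chance *= freq[ch] / len(letters)
    reach = len(word) - 1
    placements = sum(max(0, len(letters) - reach * skip)
                     for skip in range(min_skip, max_skip + 1))
    return placements * chance


if __name__ == "__main__":
    sample = normalize("Light eases every eye; all seem eager to leave the elm.")
    print(count_skip_codes(sample, "LEE", 2, 10))
    print(round(expected_count(sample, "LEE", 2, 10), 2))
```

Run on a full book with a wide skip range, the count should grow roughly in proportion to both the length of the text and the number of skips allowed, so widening the search makes "hits" more common, not more meaningful.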
2) The second important piece of information that has been left out is that I tell the computer exactly what to look for. I tell it precisely which words to search out, and it finds the smallest clustering of those words that it can. Darwin's text, through the computer program, is not telling you or me anything about events of historical significance that have transpired or will transpire in the outside world (well, the skip codes aren't telling us anything - the actual text is certainly telling us a lot about biology!). The only information the computer is providing is the size and location of the cluster that has been requested, if there is one to be found. Everything else comes from me. It's up to me to determine not only what constitutes an interesting group of words, but exactly what those words are. It's as if I were to scan the first billion digits of pi to find my own birthday. Sure, I might find it, but no significance can be attached to that discovery.
For example: when searching for the Yitzhak Rabin clusters, I had originally tried to cluster the words "Rabin" and "Yitzhak", but the computer found no instances of the latter. This wasn't a serious problem; I simply compensated by finding both "Yigal" and "Amir", and all the other related words. The artificially inflated impression of significance was not diminished, especially since I didn't mention that "Yitzhak" is nowhere to be found. Note also that it's completely up to me to decide which words are "related" to an historical event as well. Instead of looking at a cluster and declaring "Gee, look what the computer found!", a more appropriate comment would be "Gee, look what this guy told the computer to search for!". Sure, it's a lot less exciting, but that's the point.
3) The clusters that have been presented for specific historical events are by no means the only ones present in the text. If I were to use different words to "describe" an event, I'd find different clusters of different sizes, and in different locations. Even if I were to stick with the same group of words, there are other clusters to be found - the computer simply tries to find the smallest one. The fact that a cluster can be found for an event is not so much a matter of luck as it is a matter of perseverance on my part - if no cluster is found for a particular set of words, I need only try another set of words. Eventually a set can be found that will cluster nicely.
Take, for example, the assassination of Yitzhak Rabin. I presented the following two clusters in the Origin Codes...
Amir, Assassin, Rabin, Dead, Mid, East [24x23]

Those are almost certainly not the best clusters I could have found (the second one is quite large). Had I chosen different words, however, I could have presented other clusters. For example, by adding a few more words, I could have built upon the following one...
Or maybe this one...
Or there's this one...
And this one...
And this one too...
I could go on indefinitely here, continually producing variations on this theme, which reminds me...
Another approach would be to find clusters that seem to contradict historical events. Like this one...
Although what I've presented here is by no means a mathematical disproof of the Bible Codes and Torah Codes, I think I've shown that, at the very least, they're a lot less impressive than they first seem. These clusters are everywhere in a large enough text, and with the right computer program, they're easy to find. By picking and choosing which words to cluster, one can find "predictions" of just about anything one likes. Given the enormous number of possible sets of words that one might use to "describe" an event (the different tenses of verbs, plural vs. singular forms, first and last names, nicknames, acronyms, etc.), it shouldn't be surprising that these tight-fitting groups can be found.
For a much better treatment of the issues and problems with Bible and Torah codes, and for a more technical and mathematical debunking, visit Brendan McKay's page on the subject or THIS PAGE from the New Mexicans for Science and Reason. There are two good articles on the CSICOP website regarding this issue as well, HERE and HERE.
I guess I'll end this here, with a few more clusters I found in the text - just for fun ;)
That last one refers to THIS :)
NOTE: For my fellow computer geeks out there: HERE is the Origin of Species text file that contains all these clusters. If anyone would like the code-hunting computer program I wrote, just email me (lee AT stellaralchemy DOT com). I spent very little time on it, so it's not especially user friendly (otherwise I'd just post it up on the web).
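For anyone who wants to experiment right away, a cluster search in the spirit of the one described on this page can be sketched in a few lines of Python. This is not the program used for the Origin Codes, and every name in it is hypothetical. It assumes the text has already been reduced to upper-case letters, reads words forward only, and measures a cluster by the stretch of letter positions it covers rather than by a two-dimensional grid such as the [24x23] one above, which is a deliberate simplification.

```python
import re


def find_skip_codes(letters, word, min_skip, max_skip):
    """Return (first_index, last_index, skip) for each forward reading of word."""
    pattern = re.compile("(?=" + re.escape(word) + ")")
    reach = len(word) - 1
    hits = []
    for skip in range(min_skip, max_skip + 1):
        for offset in range(skip):
            # A skip code is a plain substring of every skip-th letter.
            for match in pattern.finditer(letters[offset::skip]):
                first = offset + match.start() * skip
                hits.append((first, first + reach * skip, skip))
    return hits


def smallest_cluster(letters, words, min_skip, max_skip):
    """Shortest stretch of letters holding one reading of every word, or None."""
    wanted = {w.upper() for w in words}
    occurrences = []
    for word in wanted:
        for first, last, skip in find_skip_codes(letters, word, min_skip, max_skip):
            occurrences.append((first, last, skip, word))
    # Sweep the left edge of the window from the end of the text backwards.
    occurrences.sort(reverse=True)
    nearest = {}  # word -> earliest-ending reading among those swept so far
    best = None
    for first, last, skip, word in occurrences:
        if word not in nearest or last < nearest[word][1]:
            nearest[word] = (first, last, skip)
        if len(nearest) == len(wanted):
            right = max(hit[1] for hit in nearest.values())
            if best is None or right - first < best[1] - best[0]:
                best = (first, right, dict(nearest))
    return best


if __name__ == "__main__":
    toy = "AXBZZZZZZZAXBCXD"
    print(smallest_cluster(toy, ["AB", "CD"], 2, 2))
```

A result of `None` simply means one of the requested words never appears within the skip range, which is the cue to swap in a different word and try again.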