What I'm Doing and How and Why I'm Doing It

It was an accident that I ran across Burton Holmes and his travelogues. On my way back from an antiquarian book fair, I stopped at the small shop the Saratoga (CA) public library runs to sell donated and deaccessioned books. For thirty dollars they had a set of the Holmes lectures (one volume, of ten, missing!) on their table of special items. I opened a volume at random and found myself in Siberia, on the railroad, in 1901; and suddenly I was staring at a marble building on a mud street. I was hooked. I bought the books and decided that somehow I would find a way to put these out where other people could see them.

Within the last couple of years the Web has become a Big Thing. In September 1996, Karen and I registered the Hidden-Knowledge domain with InterNIC, and I chose to put this material up as my first large tryout piece. (Karen, who designs web pages professionally, has other pages that you should also check out!)

There isn't that much text in the lectures; they are mostly photographs, with enough narration to string them together and carry the narrative of the Trip Report. I scanned the text in using an OCR program, which was, as always, a wretched experience. Every OCR package promises great performance; none deliver. I could have saved time by typing the text in myself, by hand, instead of using the OCR software and then editing it back into English. But I have a long-term program of evaluating OCR performance and used this as a benchmark. It was not a successful benchmark, but that's life and the current state of the art.

The photographs were scanned in at 300 dpi, 30-bit color, from an HP scanner into Win95 Photoshop 3.0.5. Within Photoshop I immediately subsampled the pictures down to 150 dpi, to keep the size manageable (the 150 dpi color scans average about a megabyte each), and stored them on my hard disk. There are 137 of them for the Trans-Siberian Railway lecture alone, and this is only a third of volume eight, so you can imagine what the storage requirements would be for the full set of fourteen volumes.
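For anyone who would rather compute than imagine, here is a rough back-of-the-envelope sketch in Python. The 4 x 6 inch photograph size is purely hypothetical, and the extrapolation assumes that every volume holds about three lectures of the same size as this one, which the books may well not bear out; only the 137 scans, the one-megabyte average, and the fourteen volumes come from the figures above.

```python
# Rough storage estimate for the scans. Figures marked ASSUMPTION are
# illustrative guesses, not measurements.

def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan of the given physical size."""
    return round(width_in * dpi), round(height_in * dpi)

def raw_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size, assuming 3 bytes (24-bit color) per pixel."""
    return width_px * height_px * bytes_per_pixel

# ASSUMPTION: a hypothetical 4 x 6 inch photograph.
full = pixel_dimensions(4, 6, 300)   # scan resolution
half = pixel_dimensions(4, 6, 150)   # after subsampling

# Halving the dpi halves each dimension, so the pixel count drops to a quarter.
ratio = raw_size_bytes(*full) / raw_size_bytes(*half)

SCANS_IN_LECTURE = 137      # from the text
AVG_MB_PER_SCAN = 1.0       # "about a megabyte each", from the text
LECTURES_PER_VOLUME = 3     # ASSUMPTION: this lecture is a third of its volume
VOLUMES = 14                # from the text

lecture_mb = SCANS_IN_LECTURE * AVG_MB_PER_SCAN
set_mb = lecture_mb * LECTURES_PER_VOLUME * VOLUMES

print(f"300 dpi: {full}, 150 dpi: {half}, size ratio: {ratio:.0f}x")
print(f"This lecture: about {lecture_mb:.0f} MB")
print(f"Full set (very rough): about {set_mb:.0f} MB")
```

The point is less the final number than the shape of it: each halving of resolution cuts the storage to a quarter, and the full set is still dozens of times the size of one lecture.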

The resulting 150 dpi scans are *surprisingly* crisp and clear. On a large monitor (21-inch, 1280x1024 full color) I can see details that are completely indistinguishable on the printed page, without resorting to magnifiers. It is also clear which of the photographs were altered or marked up for "greater clarity" by the book's production staff (see "Friendly Offices" as a horrible example of "improvement"). I used Photoshop to compensate for inadequate or uneven lighting or exposure, where that was possible. In the case of the picture taken by available light in the dining car, with 1901 technology, there wasn't much I could do; but if you think this version is problematic, you should see the way the original was printed in the book.

All compensation work was done with the 150 dpi basic scans. These were converted to gray-scale, extraneous bits covered up, and the final results saved as highly compressed JPEG files. These expand quite well, with (to me) amazingly good fidelity to the originals. That anything could look so good, considering that it was taken from a hundred-year-old book by a flatbed scanner, is amazing in itself.

For most people, the 150 dpi scans are too large and take too long to download. So I subsampled these again, down to 75 dpi, and these are the ones you see in the main pages. Each 75 dpi scan is linked to the 150 dpi version, so just click on the image you see and you will get a more detailed version to examine.
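The small-picture-links-to-big-picture arrangement is just an image wrapped in an anchor. A minimal sketch of generating that markup in Python follows; the function name and the file names are hypothetical, invented for illustration, and are not the names used on this site.

```python
from html import escape

def linked_thumbnail(thumb_src, full_src, alt_text):
    """Return HTML showing the small image as a link to the large one.

    All three arguments are escaped so that odd characters in a file name
    or caption cannot break the surrounding markup.
    """
    return '<a href="{}"><img src="{}" alt="{}"></a>'.format(
        escape(full_src, quote=True),
        escape(thumb_src, quote=True),
        escape(alt_text, quote=True),
    )

# Hypothetical file names: a 75 dpi image linking to its 150 dpi version.
print(linked_thumbnail("dining-car-75.jpg", "dining-car-150.jpg",
                       "The dining car"))
```

The reader downloads only the small file unless curiosity strikes, which is the whole idea.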

Holmes gave his original photographic slides to UCLA in 1957, the year before he died. David Ziegler, the curator of the Holmes collection there, has had some of the slide originals scanned in, and tells me that the results are stunning. Under the copyright laws, the originals are probably still in copyright; in any case, UCLA controls them. I would very much like to see some of these come out into daylight again, and David and I have had some preliminary discussions of how this might happen. Additionally, some of the slides and photos are available for commercial use from the stock photo company MUSE, in Seattle. I respect their rights and wish them well. In the meantime, from material in the public domain, I present this small sample.

I put this package together to teach people about Holmes, and about railroads, and about Russia at the turn of the century. To some extent I did this to teach myself. I wanted to become more familiar with Photoshop (an excellent product, made by my employer, Adobe Systems, Inc., and sold at shops near you, or by mail order from those awful catalogs that come every day). For many years I have been following the slow progress of OCR packages; I bought a couple of the best and tested them on this text.

And railroads have always fascinated me. For many years I traveled to the World Science Fiction Convention (my major summer holiday) by train, and this has taken me across the US by train several times. (When the convention is overseas, I fly to the general area and then take the train. No fool I.) Some day I will ride across Russia on the Trans-Siberian Railroad. What an amazing project the construction of the railroad must have been! And what do we in the US, in 1996, know about it? (n.b. This is a rhetorical question. The correct answer is "Almost nothing".)

The Web has Links. I am annotating the text (and these essays, too) with links to other sources of information. Where there is a reference to something that is the current equivalent of the 1901 reference, you should follow the link; and try not to be surprised. In other cases I have tossed in what amount to footnotes, interesting observations, maps, and anything else I can find that might make your journey more memorable. I'll continue to add links as I find them, so by all means drop back for additional visits.

Pictures can take a long time loading. Even little ones like the JPEG images we have here will make you wait. I have broken the story rather arbitrarily into thirteen chapters, so that you can see a reasonable lump of information without having to wait for the whole thing to load. The chapter titles simply identify, rather generally, the subject matter of each batch of text and pictures. Holmes did not break the piece up this way; it was published as one long essay. I've tried to maintain approximately the same layouts that the original book used, but this is not always possible in HTML. (Just for comparison, I intend to redo some of these pages in PDF, and put them up on this site for you to see the difference.)

You can get to the pages from the contents page, or from the pages themselves. At the bottom of each chapter is a button that will always take you back to the contents page, and buttons that will take you to the next or the previous chapter. Various annotations can take you to distant URLs, or to short essays and footnotes (some of which then have other URLs to follow for further information). Be an explorer! Try the links and see where they take you!
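The button scheme at the bottom of each chapter can be sketched in a few lines of Python. This is only an illustration of the idea: the page names (`contents.html`, `chapter01.html`, and so on) are assumptions made up for the example, and the first and last chapters are assumed simply to omit the button that would lead nowhere.

```python
CHAPTER_COUNT = 13  # the story is broken into thirteen chapters

def chapter_nav(chapter, total=CHAPTER_COUNT):
    """Return (label, target) pairs for the buttons under a chapter.

    Page names are hypothetical. The contents button always appears;
    "Previous" and "Next" appear only where such a chapter exists.
    """
    if not 1 <= chapter <= total:
        raise ValueError(f"chapter must be between 1 and {total}")
    links = []
    if chapter > 1:
        links.append(("Previous", f"chapter{chapter - 1:02d}.html"))
    links.append(("Contents", "contents.html"))
    if chapter < total:
        links.append(("Next", f"chapter{chapter + 1:02d}.html"))
    return links

for label, target in chapter_nav(7):
    print(f"{label}: {target}")
```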

Special thanks to: The CIA, for their marvelous map of Russia (our tax dollars at work). The good folks at the Saratoga, CA Public Library, without whom I would probably have spent the summer rebuilding my garage. Karen, who looked at me oddly when I suggested this project, but has not yet said anything bad about it (excepting, of course, "Oh, yes, dear, I'm sure that would be an interesting idea...").

Comments? Send me mail: mjward@hidden-knowledge.com will get to me by a circuitous route of diffusion, but get to me it will. I would love to hear what you think of this project.

Michael Ward
San Jose, CA


This page created 1996; most recent update 6 June 2011.