[Genealib] RE: Newspaper indexing
Larry Naukam
lnaukam at mcls.rochester.lib.ny.us
Mon Jul 3 13:45:25 EDT 2006
Karen Miller wrote: Our local paper takes a dim view of digitization or even
presenting full text of ANYTHING on line
Amen! I recently had a chance to talk over the phone with the local Gannett
newspaper librarian, and I came away with the idea that with as many holes
as our clipping file has, it's far better than what they have themselves.
That is: when the afternoon paper (now out of business) moved to the main
building in the 1980s, THEY THREW OUT THE NEWSPAPER LIBRARY!!!!!
Back beyond 1970, they only have limited clips. There is virtually nothing
before 1920, and so on. What we do have, and what is our next digitization
project, is a WPA-produced, 500,000-item 19th century newspaper index.
Goodness knows it's not perfect, but it is incredibly useful. We looked at
digitizing the underlying papers, and Cornell DCAPS came up with the
estimate that it would take 7 petabytes (yes, petabytes - that's 7 million
gigabytes) of storage just for the original scans, let alone any OCR'd
items.
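To put that figure in perspective, here is a rough back-of-the-envelope sketch of the conversion. It assumes decimal (SI) units, where 1 petabyte is 1,000,000 gigabytes; the 500 GB drive size is a hypothetical number picked only for illustration and is not part of the Cornell estimate.

```python
import math

# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def petabytes_to_gigabytes(pb: int) -> int:
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


total_gb = petabytes_to_gigabytes(7)
print(f"{total_gb:,} GB for the original scans")

# Hypothetical drive capacity, for illustration only.
HYPOTHETICAL_DRIVE_GB = 500
drives_needed = math.ceil(total_gb / HYPOTHETICAL_DRIVE_GB)
print(f"{drives_needed:,} drives of {HYPOTHETICAL_DRIVE_GB} GB each, before any backups")
```

Any backup copy would multiply that drive count again.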
Some things that putative digitizers sometimes leave out of the equation:
not only what media are you going to store it on, but how often are you
going to refresh it? Where will you keep the backups? How will you know that
they are being made? If you OCR it, what is an acceptable error rate
(besides zero per cent)?
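On the error rate question, one common way to put a number on it is a character error rate: the edit distance between a hand-checked transcription and the OCR output, divided by the length of the hand-checked text. The sketch below is only an illustration of that idea, not a description of any particular OCR package; the function names and the sample strings are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the hand-checked reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Made-up sample: a hand-checked line versus a hypothetical OCR reading of it.
reference = "newspaper library"
ocr_output = "newsapper library"
rate = character_error_rate(reference, ocr_output)
print(f"character error rate: {rate:.1%}")
```

Whatever threshold you settle on, it helps to measure it against a sample of pages that a person has actually proofread.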