A researcher at Georgetown University in Washington DC, Kalev Leetaru, is adding millions of newly digitized images to the photo-sharing website Flickr, in the framework of a fellowship sponsored by Yahoo!, that is the owner of Flickr.
The project aims to make available up to 14.6 million images on Flickr, all extracted from books, magazines and newspapers published over a 500 year period and digitized by the Internet Archive organization. Currently, the collection uploaded in Flickr comprises about 2.6 million public domain images, extracted and catalogued after the digitization processes.
The digitization activity of libraries, in facts, focuses on text, rather than on images, and makes the book pages available as PDF or text searchable works, while the illustrations are difficult to be search and found. That’s because the OCR program that digitizes the millions of public domain books to converge in the Internet Archive normally discards the illustrations, and this is definitely a pity considering the richness and value of at least some of those ancient images, and in any case considering that most of the images that are in the books, newspapers and catalogues are not available anywhere else in the world.
The code developed by Leetaru recovers those parts of the page which include images: the images are cropped, cleaned up, and uploaded to Flickr along with the text that appears next to them (about 500 words before and 500 after the image), along with detailed description ((title, year of publication, authors, and publisher) and searchable tags that make retrieval very easy.
“Stretching half a millennium, it’s amazing to see the total range of images and how the portrayals of things have changed over time.” said Leetaru to BBC.
Surf the whole collection: