OCR And Orphan Works

As I have written before, I always find it fascinating to read about different scanning projects, especially when they involve scanning old stuff.

Over at the GalleyCat blog, Jason Boog writes about using Optical Character Recognition software to dig through orphan works.

What the heck are “orphan works”? I didn’t know either. According to Wikipedia:

An orphan work is a copyrighted work for which the copyright owner cannot be identified and contacted.

Here’s the project that the GalleyCat editor was working on:

While researching an essay about New York City poets and the Great Depression last year, this GalleyCat editor read through hundreds of pages from 1930s novels, periodicals, and self-published materials that couldn’t leave the New York Public Library.

He used his digital camera to take pictures and then ABBYY FineReader Express to OCR the text.

The results were impressive. Check out the GalleyCat post to see more.

(Photo by p0psicle)





About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

