Weekend At Linux - Going Paperless

I installed my first Linux distribution back in the early 1990s. With my super-slow dialup modem, I downloaded the 20 or so floppy disk images of Slackware, copied them to floppies, and installed it.[1]

Since then, I have used Linux off and on, but haven’t done so in quite a few years. In the early years of DocumentSnap, one of the more popular posts was about how to use the Fujitsu ScanSnap in Linux, which works (apparently) fairly well thanks to the SANE project.

It was with great interest that I came across this post by Nathan Willis over on Linux.com, entitled Weekend Project: Create a Paperless Linux Office.

Nathan takes us through how he uses gscan2pdf to scan documents to searchable PDF.

“That’s where optical character recognition (OCR) comes in. OCR recognizes letterforms in the scanned document image and outputs actual text, which is precisely what we’re after. But rather than run a command-line OCR program on every scanned image and produce a .txt file, it’s better to combine the two into a single document, and hopefully a single step. That’s the purpose of gscan2pdf, a lightweight GUI application that has a built-in SANE scanner interface, an OCR engine, and the ability to write PDF documents that embed the OCRed text and use the scanned image as a background for improved legibility.”

I wasn’t familiar with gscan2pdf, so if you are looking at going paperless using Linux, check out the article.
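
For anyone curious what that scan-then-OCR flow might look like without a GUI, here is a rough Python sketch. It is not how gscan2pdf works internally and it is not taken from Nathan's article. It assumes the SANE scanimage tool and the Tesseract OCR engine are installed, that your scanner backend accepts the --resolution option (not all do), and that your Tesseract build supports the pdf output config. The function names and the scan-0001 file name are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def scan_command(resolution_dpi: int = 300) -> List[str]:
    """Build a scanimage call that writes one TIFF page to stdout.

    Assumption: the scanner backend supports --resolution.
    """
    return ["scanimage", "--format=tiff", "--resolution", str(resolution_dpi)]


def ocr_command(image: Path, output_base: Path, language: str = "eng") -> List[str]:
    """Build a tesseract call; the trailing 'pdf' config requests a searchable PDF."""
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def scan_to_searchable_pdf(output_base: Path) -> Path:
    """Scan a single page and OCR it into <output_base>.pdf."""
    for tool in ("scanimage", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")

    # Append the extension by hand so dots in the base name are preserved.
    image = Path(str(output_base) + ".tiff")
    with image.open("wb") as fh:
        subprocess.run(scan_command(), stdout=fh, check=True)

    # Tesseract adds the .pdf extension to the output base itself.
    subprocess.run(ocr_command(image, output_base), check=True)
    return Path(str(output_base) + ".pdf")
```

With a scanner attached, calling scan_to_searchable_pdf(Path("scan-0001")) should leave a scan-0001.pdf with the page image on top and the recognized text embedded underneath, which is roughly the single-step result the quote above describes.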

How about you, Linux fanatics? What software do you use to go paperless? I’d love to hear about it in the comments.

(Photo by KobraSoft)

[1] I think it was on a 386, but don’t quote me on that.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

3 comments

backyard - November 14, 2011

You won't go back to PDF once you use DjVu. The quality is much better and the files are about half the size. I use djview4, and I can convert a file to PDF if I need to send something to someone.

backyard - November 13, 2011

I've used Linux in the hopes of going paperless. It's a process I've been working on for a while. I'm surprised how much paper we get in our lives. It doesn't seem so bad because it's all put up. gscan2pdf has been my main tool for reaching my paperless goal. The only downside is that its scan cleanup relies on unpaper, which has many switches for controlling cleanup, deskewing, despeckling, and so on, yet gscan2pdf only lets you set a few of them, so you can't use unpaper to its full effect.

I also use TrueCrypt to create an encrypted folder, since some of my documents contain personal info. I usually save in DjVu format since it is very compact and its quality is better than PDF. I also OCR almost everything so that the text can be indexed for searching, and I use Recoll to index the files. I also keep multiple backups of the files, one local and one on Dropbox.

My next goal is to get my books into digital format. I'm doing this with Scan Tailor. You basically use a camera to scan the book in. I don't have many books, which is good, since this method is a bit tedious.