[Question] OCR Software?

I need to extract data from scans of 5,000,000 individual pages. I think I should OCR all the documents, but I don't know whether we should buy software or use an online service. I think software would be less expensive, but I don't know which product would be best. Any recommendations?
 
I've got more recent experience. But not much. I was doing OCR stuff in the 2007-2009 era. But 10 years is a lot of time in the technology world.

Back then, we were using Nuance libraries to integrate OCR into our own applications. But it sounds like Nuance has exited the OCR market.
 
And Caere sold OmniPage to other owners as well. I almost wonder if investigating mobile options would be better, since the idea of ingesting documents using a phone camera seems to be gaining momentum.

--Patrick
 
I had to OCR a bunch of documents for work a few years ago. We just used Acrobat Pro - if your docs are all just plain text with no tables / graphs this does the job well enough.
 
OCR and SDK software is definitely the way to extract data from 5 million pages! It sounds like you're weighing the pros and cons of purchasing software vs. using an online service, and both have their benefits. With software, you can work offline and don't have to worry about internet connectivity, but with an online service, you don't have to worry about storage space, and it's often easier to scale up. If you're in the software market, I'd recommend checking out Smart Engines SDK. It's a solid piece of tech specifically designed for high-volume OCR processing, so it would be perfect for your needs.
 
Holy crap, a spam bot that posts a link that sounds/looks actually sensible and on topic?!
What has the world come to?
 

Dave

Staff member
Might not be a spam bot at all. IP address is unique. For now I’m assuming that this is a well-versed person who stumbled onto the question. Either way, very informative post!
 