Let’s imagine that you are a freelance translator and your customer has asked you to translate a file in PDF format. PDF files usually contain recognized (selectable) text, so counting words in them is not a problem: just copy the text into MS Word and use its built-in word count. So you implicitly agree to the job. But when you receive the PDF file and open it, you realize that it is unrecognized, that is, it consists of scanned images. As you may know, a PDF can combine both recognized text and unrecognized images. Let’s also imagine that, unfortunately, you did not agree with your customer beforehand that scanned jobs are paid on a per-hour basis, and therefore your customer demands that the job be billed on a per-word basis.
So you need to count the words in this PDF file one way or another. How can you do this? There are two ways to count words in an unrecognized PDF file: a free one and a paid one.
Let’s begin with the free method. To count words in an unrecognized PDF file, you first need to recognize it. It is great if you have already bought a good paid OCR program such as ABBYY FineReader, or if you have Adobe Acrobat Professional, which has a built-in OCR engine. But we are reviewing free ways to count words in an unrecognized PDF file, so we need a free OCR tool to recognize it.
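Because a PDF can mix recognized text and scanned images, it may help to check which pages actually need recognition before running any OCR. The following is only a rough sketch, not part of any tool mentioned in this article: it assumes the third-party pypdf package is installed, and the function names and the 20-character threshold are illustrative choices.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based numbers of pages whose extracted text is (almost) empty.

    min_chars is an arbitrary, assumed threshold: a page with fewer
    non-whitespace characters than this is treated as a scanned image.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


def extract_page_texts(pdf_path):
    """Read the existing text layer of every page (requires pypdf)."""
    # Imported here so the helper above works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

For example, pages_needing_ocr(extract_page_texts("job.pdf")) would list the pages that probably contain only scans, where "job.pdf" is a placeholder file name. Pages that are not listed can be counted directly from their text layer.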
After searching for free OCR tools, I chose FreeOCR because it can recognize PDF files. FreeOCR can be downloaded from http://www.paperfile.net/freeocr.exe
After installation (by the way, FreeOCR requires Microsoft’s .NET Framework 2.0 to be installed), run the program. You will see a window like the one in the attached screenshot. To recognize a PDF file, click the Open PDF button, choose your PDF file, choose the OCR language, and then click the OCR button. After recognition, export the resulting text to Word.
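If you prefer a script to a graphical tool, the same open, recognize, and export steps can be sketched in Python. This is not FreeOCR itself: the sketch assumes the free Tesseract OCR engine together with the third-party pytesseract and pdf2image packages (pdf2image also needs Poppler installed), and the function names are hypothetical.

```python
def join_pages(page_texts):
    """Join per-page OCR output into one text, skipping empty pages."""
    return "\n\n".join(t.strip() for t in page_texts if t.strip())


def ocr_pdf(pdf_path, language="eng", dpi=300):
    """Render each PDF page to an image and recognize it with Tesseract.

    language is a Tesseract language code (assumed default: English);
    dpi=300 is an assumed rendering resolution, not a required value.
    """
    # Imported here so join_pages can be used without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(pdf_path, dpi=dpi)
    page_texts = [pytesseract.image_to_string(img, lang=language) for img in images]
    return join_pages(page_texts)
```

Calling ocr_pdf("scan.pdf") with a placeholder file name returns the recognized text as a single string, which can then be saved to a file or pasted into Word. As with FreeOCR, languages other than English have to be installed separately for Tesseract.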
Then get the statistics using the built-in MS Word tool (in MS Word 2007, click Review > Word Count).
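If the recognized text is available as plain text, similar statistics can be approximated without Word. The sketch below rests on its own assumptions: a "word" is taken to be a run of letters or digits, optionally joined by apostrophes or hyphens, which is not necessarily how MS Word counts, so the totals may differ slightly.

```python
import re

# Assumed definition of a word: letters/digits, optionally joined by
# an apostrophe or hyphen (e.g. "customer's", "twenty-two").
WORD_PATTERN = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def count_words(text):
    """Count words according to WORD_PATTERN."""
    return len(WORD_PATTERN.findall(text))


def text_statistics(text):
    """Return word, character, and line counts for a piece of text."""
    return {
        "words": count_words(text),
        "characters_with_spaces": len(text),
        "characters_without_spaces": sum(1 for ch in text if not ch.isspace()),
        "non_empty_lines": sum(1 for line in text.splitlines() if line.strip()),
    }
```

For instance, count_words("The customer's scanned file has twenty-two pages.") gives 7. If the per-word rate depends on the exact figure, it is worth agreeing with the customer on which counting tool is authoritative.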
Please note, however, that the downloaded FreeOCR has only the English OCR language installed. More OCR languages can be found at http://www.paperfile.net/ocr_lang.htm
So, let’s look at the summary of this free method:
You can also submit your file to a free online OCR service at http://www.free-ocr.com/ (OCR is available only for English, German, French, Italian, Dutch, and Spanish). This method has almost the same pros and cons as the previous one, but it poses a greater risk to the confidentiality of your information, and you have to wait while your file is uploaded to the website.
I would like to note that for all the methods considered in this article, the images in your unrecognized PDF file need to be of good quality and resolution to ensure the most accurate word count.
Now let’s consider the paid alternative. There is software developed specifically for counting. As an example, I will consider AnyCount. This program can count words, characters, and lines in more than 36 formats. It can also count words in unrecognized PDF files. To count such a PDF file, you just need to select it, choose a PDF Graphic Recognition language, and click the Count button. The program will recognize your PDF file and count it automatically.
You can evaluate the program by downloading it from https://anycount.com/try-free/
The summary of the paid method of counting unrecognized PDF files is as follows:
So, as you can see, the free option for counting unrecognized PDF files is suitable as a temporary, one-time solution. If you need a quick and extensive word count (or any other statistics, such as character and line counts), it is better to use professional word count software.