Imagine that you are a freelance translator and a customer has asked you to translate a PDF file. PDF files usually contain recognized (selectable) text, so counting the words in them is not a problem: you just copy the text into MS Word and use its built-in word count. So you accept the job without a second thought. But when you receive the PDF file and open it, you discover that it is unrecognized, that is, it consists of scanned images rather than text. (A PDF file can also combine recognized text with unrecognized images.) Suppose also that, unfortunately, you did not agree with your customer that scan jobs are paid by the hour, so the customer insists on paying per word.
So you need to count the words in this PDF file one way or another. How can you do it? There are two approaches: a free one and a paid one.
Let’s begin with the free approach. To count the words in an unrecognized PDF file, you first need to recognize it. That is easy if you have already bought a good commercial OCR program such as Abbyy FineReader, or if you have Adobe Acrobat Professional, which has a built-in OCR engine. But here we are looking at free ways to count an unrecognized PDF file, so we need a free OCR tool to recognize it.
After searching for free OCR tools, I chose FreeOCR because it can recognize PDF files. FreeOCR can be downloaded from http://www.paperfile.net/freeocr.exe
After installing it (note that FreeOCR requires Microsoft .NET Framework 2.0), run the program. You will see a window like the one in the attached screenshot. To recognize a PDF file, click the Open PDF button, select your PDF file, choose the OCR language and then click the OCR button. After recognition, export the resulting text to Word.
Then get the statistics using MS Word’s built-in tool (in MS Word 2007, click Review > Word Count).
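If MS Word is not at hand, a rough count of the exported text can also be obtained with a short script. The Python sketch below is only an illustration and is not part of FreeOCR or MS Word; the function name text_statistics is hypothetical, and its counting rules (a word is any run of non-whitespace characters, while the letters-only variant counts only runs of letters) are assumptions that will not exactly match MS Word’s algorithm. Comparing the two word counts also shows why different tools can report different totals for the same text.

```python
import re


def text_statistics(text: str) -> dict:
    """Rough statistics for plain text exported from an OCR tool.

    Assumption: these rules only approximate a real word counter such as
    MS Word; they are not its actual algorithm.
    """
    return {
        # Rule 1: a word is any run of non-whitespace characters.
        "words": len(text.split()),
        # Rule 2: count only runs of letters, so numbers like "42" are skipped.
        "words_letters_only": len(re.findall(r"[^\W\d_]+", text)),
        # Every character, including spaces and line breaks.
        "characters_with_spaces": len(text),
        # Characters left after removing all whitespace.
        "characters_no_spaces": len(re.sub(r"\s", "", text)),
        # Number of text lines.
        "lines": len(text.splitlines()),
    }


sample = "Invoice No. 42\nTotal: 1 250,00 EUR\n"
print(text_statistics(sample))
```

For this sample the whitespace rule gives 7 words and the letters-only rule gives 4, because "42", "1" and "250,00" count as words only under the first rule.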
Note, however, that FreeOCR as downloaded has only the English OCR language installed. Additional OCR languages are available at http://www.paperfile.net/ocr_lang.htm
So, here is a summary of this free method:
Pros:
1. Free of charge.
Cons:
1. The process is time-consuming.
2. Only the English OCR language is installed, so other languages have to be downloaded from the website.
3. Only a small number of OCR languages are available.
4. The program can recognize only one page at a time, so you have to switch pages manually if your PDF file has more than one page.
5. The recognized text sometimes contains mistakes.
6. The recognized text is not cleared when you open and recognize a new document.
7. Since the software is free, you cannot be 100% sure that it is completely safe for your information.
8. The word count produced by MS Word is not fully accurate.
You can also submit your file to a free online OCR service at http://www.free-ocr.com/ (OCR is available only for English, German, French, Italian, Dutch and Spanish). This method has almost the same pros and cons as the previous one, plus a greater risk to the security of your information, and you have to wait while your file is uploaded to the website.
Note that for all the methods considered in this article, the images in your unrecognized PDF file must be of good quality and resolution to ensure the most accurate word count.
Now let’s consider the paid alternative. There is software developed specifically for counting; as an example, I will take AnyCount 7.0. This program can count words, characters and lines in more than 36 formats, and it can also count words in unrecognized PDF files. To count such a PDF file, simply select it, choose a PDF Graphic Recognition language and click the Count button. The program will recognize your PDF file and count it automatically.
You can evaluate the program by downloading it from http://www.anycount.com/download.html
Here is a summary of the paid way to count unrecognized PDF files:
1. Saves time: you do not need to spend your own time recognizing the PDF file.
2. Supports more than 20 graphic recognition languages.
3. Accurate counting.
4. Supports counting in more than 35 formats, so you can use the program to count most of your files.
5. Developed by Advanced International Translations Ltd., a reliable, well-known company that has been developing software for the translation industry (such as Projetex, Translation Office 3000, etc.) since 2001. Thus you can be sure that your information will not be used by third parties.
So, as you can see, the free way of counting unrecognized PDF files is best suited as a temporary, one-off solution. If you need a quick and extensive word count (or other statistics, such as character and line counts), it is better to use professional word count software.