“This Document Contains Renderable Text” Acrobat 8
Do you ever get this message when trying to OCR a document? This means that the document has already gone through the OCR process, either completely or partially. If you find yourself having trouble editing text and you get this message then follow these steps:
- Open your PDF document and go to File > Save As.
- For ‘Save as Type’, choose TIFF, which is an image format.

- Acrobat will make separate images for each page in the document.
- Next go to File > Create PDF from Multiple Files and choose your TIFF files. Alternatively, you can select all the TIFF files and drag them as a group onto the Adobe Acrobat icon, and Acrobat will ask if you want to combine them (you do).

- Then follow the normal OCR process: [How to Edit a Scanned Document in Acrobat 8]
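For long documents, the first step (turning every page into a TIFF) can also be scripted outside Acrobat. The sketch below is one possible way to do it, assuming Ghostscript is installed and reachable as `gs` (on Windows the executable is usually named `gswin64c`). Ghostscript is not part of the original instructions, and the function names and the `page-%04d.tif` naming scheme are made up for illustration. The resulting TIFF files can then be combined and OCRed in Acrobat as described above.

```python
import subprocess
from pathlib import Path


def build_rasterize_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that writes one TIFF image per PDF page.

    Assumption: the Ghostscript executable is called "gs" on this system.
    """
    # %04d is expanded by Ghostscript to the page number (0001, 0002, ...).
    out_pattern = str(Path(out_dir) / "page-%04d.tif")
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiff24nc",            # 24-bit RGB TIFF output
        f"-r{dpi}",                     # scan-like resolution for OCR
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def rasterize(pdf_path, out_dir, dpi=300):
    """Run Ghostscript and return the TIFF files it produced, in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_rasterize_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.tif"))
```

Calling `rasterize("exhibit.pdf", "tiffs")` (both names are placeholders) should leave one image per page in the `tiffs` folder. Because every page becomes a plain picture, any renderable text on it is flattened into the image along with everything else.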




How can I implement the Save As option in VB.Net? I need to save the PDF files to PostScript so finishing commands can be added before sending the file to the printer.
Hi Donna,
I’m afraid VB.net is a bit outside the realm of our expertise. Perhaps you could try:
The developer’s resource:
http://partners.adobe.com/
or the Adobe Forums:
http://www.adobe.com/support/forums/
Good luck.
Wow that’s a horrible kludge.
What are you thinking of? There is no reason, from a user perspective, that this makes any sense. The page I am trying to recognize, in an 800 page document, has nothing on it but a scanned image and an Acrobat footer with the page number. I am asking to recognize JUST THIS PAGE, and yet Acrobat refuses to cooperate.
Yuck. So you want me to disassemble this entire file and reassemble it? How about fixing your bug instead?
Thanks.
In your case I would look at how the document was originally scanned in. Open the scanned PDF file, go to File > Properties, and look for the PDF Producer. My guess is that it will say something other than Adobe Acrobat, probably the name of your scanner. If so, the PDF was created by third-party software and may not work correctly with Acrobat. Adobe has two recommended methods for scanning.
1. Go to File > Create PDF > From Scanner > choose your scanner and click Scan or
2. Scan to an image format with your scanner software and then convert to PDF.
Using either of the above methods will not produce the “Renderable Text” error.
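If there are many files to check, the PDF Producer can also be read without opening each one in Acrobat. Below is a rough sketch using only the Python standard library. It is a naive scan, not a real PDF parser: it assumes the file is unencrypted and that the Producer entry is stored as a plain literal string, so it will miss entries stored as hex or UTF-16 strings, entries containing unescaped nested parentheses, or entries held in compressed objects. The function names are made up for this example.

```python
import re
from typing import Optional

# Matches a literal-string entry such as: /Producer (Some Scanner Software 1.0)
# Backslash-escaped characters inside the parentheses are allowed.
_PRODUCER_RE = re.compile(rb"/Producer\s*\(((?:\\.|[^\\()])*)\)")


def find_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the first /Producer literal string found in the raw bytes, or None."""
    match = _PRODUCER_RE.search(pdf_bytes)
    if match is None:
        return None
    # Undo the backslash escapes PDF uses for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
    return raw.decode("latin-1")


def read_producer(path: str) -> Optional[str]:
    """Convenience wrapper: read a file from disk and scan it for the producer."""
    with open(path, "rb") as handle:
        return find_producer(handle.read())
```

If `read_producer("scan.pdf")` (a placeholder file name) returns nothing, that only means this simple scan did not find the entry. File > Properties in Acrobat remains the reliable check.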
Hope this helps,
Mitch
Please note that we are not affiliated with Adobe Systems.
@D. Peterson: I can see how this could be a bit frustrating, but it would be unnecessary to “disassemble the entire file and reassemble” it. You could extract the single page, use the above steps to save as a TIF and run OCR on that, and then delete the existing page and insert the new one. Just as easy on an 800 page PDF as a 10 page PDF.
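For anyone comfortable with a command line, the single-page route can be sketched as two commands: one that turns just the problem page into a TIFF, and one that OCRs that image into a searchable one-page PDF. This is only an illustration under several assumptions: Ghostscript (`gs`) and the Tesseract OCR engine (`tesseract`) are installed, and Tesseract is swapped in for Acrobat's own OCR. Neither tool is mentioned in the original steps, and the function names are hypothetical. Deleting the old page and inserting the new one still happens in Acrobat.

```python
import subprocess


def build_single_page_commands(pdf_path, page, work_stem, dpi=300, lang="eng"):
    """Return [rasterize_cmd, ocr_cmd] for one page (page numbers start at 1).

    Assumptions: executables named "gs" and "tesseract" are on the PATH.
    work_stem is a file name without extension, e.g. "page57".
    """
    tiff = f"{work_stem}.tif"
    rasterize_cmd = [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiff24nc",
        f"-r{dpi}",
        f"-dFirstPage={page}",          # restrict output to a single page
        f"-dLastPage={page}",
        f"-sOutputFile={tiff}",
        str(pdf_path),
    ]
    # "pdf" asks Tesseract to write <work_stem>.pdf with a searchable text layer.
    ocr_cmd = ["tesseract", tiff, work_stem, "-l", lang, "pdf"]
    return [rasterize_cmd, ocr_cmd]


def ocr_single_page(pdf_path, page, work_stem, dpi=300, lang="eng"):
    """Run both commands and return the name of the searchable one-page PDF."""
    for cmd in build_single_page_commands(pdf_path, page, work_stem, dpi, lang):
        subprocess.run(cmd, check=True)
    return f"{work_stem}.pdf"
```

For example, `ocr_single_page("big.pdf", 57, "page57")` (placeholder names) would be expected to leave a `page57.pdf` that can replace page 57 of the original, however long the document is.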
One little bit of renderable text at the bottom of each page makes it impossible to OCR the thing! Frankly, the TIFF workaround is terrible. It’s difficult for me to think of a more tedious solution. Why can’t Acrobat simply IGNORE the renderable text?!? For the kind of money we paid for this program, I expect better solutions than “convert the entire document into TIFF and then import it back into Acrobat!” Honestly, this has been a problem for years and years. Please fix this problem.
Or, if ignoring the renderable text is somehow difficult for Adobe to do, how about a function within Acrobat that converts the entire document into bitmapped form? In essence, it would “flatten” the entire document (renderable text and all) into a bitmapped form. It would accomplish *within* Acrobat what the silly TIFF-export/import workaround accomplishes. For the user, this single extra step wouldn’t be a big deal.
Let me add that Adobe has often described this issue in support forums as if it were a user problem. In essence, they have said, “The stupid user is trying to OCR a document that doesn’t need it!” (e.g., see http://acrobatsupport.com/document-contains-renderable-text) But please understand that we really do get it. The document really does need to be OCRed. It’s just that the OCR is prevented by a little bit of rendered text that someone has added somewhere to the document (e.g., a little notice at the bottom of the page). Don’t write us off as idiots. This error does not ONLY come when someone is trying to OCR a document that has already been OCRed.
ederosia,
We agree with you - Acrobat’s OCR function is far from perfect. Currently our best workaround is the PDF to TIFF to PDF option.
Please note that we are not affiliated with Adobe, we merely offer our suggestions to the Acrobat community as a free service.
Mitch
My mistake. I thought you were affiliated with Adobe because of the URL and your use of the Acrobat logo.
Can you please tell me what you think of this blog entry, written by an Adobe employee? It’s at http://blogs.adobe.com/acrolaw/2007/06/acrobat_81_update_fix_for_render.html The author describes a fix made by Adobe to this whole problem. However, I’ve tried the steps the writer recommends, and it didn’t solve the problem for me. Furthermore, I’ve read the Adobe Knowledge Base Article to which the author refers, and it doesn’t even refer to the fix he described. But, as I say, he seems at least semi-affiliated with Adobe. Can you comment on whether Adobe really has fixed this problem?
Although the blog author works for Adobe, his posts aren’t really official recommendations.
The 8.1.1 update addresses OCR, but only with regard to Asian language fonts:
http://www.adobe.com/support/downloads/detail.jsp?ftpID=3796
There’s also an 8.1.2 patch out there. You may want to apply that. No guarantees on OCR improvement.
http://www.adobe.com/support/downloads/detail.jsp?ftpID=3849
Best of luck,
Mitch
Crop the page to remove the renderable text, then Acrobat renders the OCR.
I’ve worked with a lot of legal exhibits that have gone to court and come out of court with what some refer to as Court Branding. At the top of every page is blue text that identifies the document and page number. It is that bit of text that interferes with the OCR process. We are talking about thousands of documents (really) that need to be searchable. The simple solution is to delete the text on each and every page, sometimes 300 or 400 pages. There has got to be a better way. Tonight, I came across a similar problem with Bates numbers digitally stamped at the bottom of the page. I could not delete that. Cropping 600 pages (tonight’s document) is time consuming, and believe me, this was a conglomeration of many different documents, in different sizes, portrait and landscape. Cropping would have cut off text that needed to be searched. I guarantee you this is only the beginning; there will be many more of these types of situations. I don’t know how these digital stamps are generated, and I would like to know an easier way than those mentioned previously to get rid of them.
Just an FYI, Nitro PDF has an option to insert Bates numbers, but obviously you guys need the opposite ability.
Any software developers want to make some money? Here’s a great idea: a simple utility that removes them and does nothing else. If 1,000 people paid $50 for such a utility (and believe me, software customized for the legal industry is expen$$$ive), that would earn you fifty thousand.
I might just have to program this myself. So what’s the best programming language for this task?