OCR - where is the recognized text?
wubuer
Beginner Joined: 25 Nov 15 Status: Offline Points: 1
Posted: 25 Nov 15 at 7:56AM
Thanks, it helps a lot.
vladob
Beginner Joined: 13 Jan 12 Status: Offline Points: 4
Many thanks for your valuable help.
It works.
Have a nice day.
V.
AndrewC
Moderator Group Joined: 08 Dec 10 Location: Geelong, Aust Status: Offline Points: 841
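OCR text is often inserted as an invisible text object: you cannot see it on the page, but you can extract it with the GetPageText text extraction function in QPL.
// Load the OCR'd PDF; the second argument is the password (empty here).
int ret = QP.LoadFromFile("ocred.pdf", "");
// Extract the text of the selected page with extraction option 3.
string s = QP.GetPageText(3); // you can also try option 7 or 8.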
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524
Hi Vladimir!
I'm not sure I understand your question correctly, but here goes. Take a scanned invoice as an example. It is first scanned to PDF as an image. You can open this PDF with QuickPDF, change its properties and so on, but text extraction isn't possible.
OCR tools can then go through this PDF and produce readable text content from the "image PDF". The image itself stays the same, but the OCR tool additionally inserts real text content. Now you can extract this text with QuickPDF, and things like full-text search become possible.
With QuickPDF you can also determine whether a PDF has been OCR'd, because text extraction has an option to include font names. OCR fonts are very special fonts, and most of the time the font name contains "ocr" as well. The other way to recognize an OCR'd PDF is to check whether the number of inserted images equals the page count and whether the images have the same dimensions as the pages.
I hope I could help a little, and perhaps you now have further ideas ;-)
Cheers and welcome here,
Ingo
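The two checks described in this post could be sketched roughly as follows. This is only an illustration in plain C# with no QuickPDF calls: the PageInfo and ImageInfo records, the OcrHeuristics class, and the 1-unit tolerance are assumptions made up for this sketch, and the page sizes, image sizes, and font names would first have to be read from the PDF by whatever means you have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical containers for this sketch; they are not QuickPDF types.
public record PageInfo(double Width, double Height);
public record ImageInfo(double Width, double Height);

public static class OcrHeuristics
{
    // Check 1: at least one font name contains "ocr" (case-insensitive).
    public static bool HasOcrFont(IEnumerable<string> fontNames) =>
        fontNames.Any(n => n != null &&
            n.IndexOf("ocr", StringComparison.OrdinalIgnoreCase) >= 0);

    // Check 2: one image per page, each image matching its page size.
    // Assumes images[i] is the image placed on pages[i] and that both
    // sizes are expressed in the same units.
    public static bool LooksLikeScannedPages(
        IReadOnlyList<PageInfo> pages,
        IReadOnlyList<ImageInfo> images,
        double tolerance = 1.0)
    {
        if (pages.Count == 0 || pages.Count != images.Count) return false;
        for (int i = 0; i < pages.Count; i++)
        {
            if (Math.Abs(pages[i].Width - images[i].Width) > tolerance) return false;
            if (Math.Abs(pages[i].Height - images[i].Height) > tolerance) return false;
        }
        return true;
    }
}
```

Neither check is conclusive on its own: the font-name test only works when the OCR tool happens to name its font that way, and a scanned PDF that was never OCR'd also passes the image test, so the two are probably best used together.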
vladob
Beginner Joined: 13 Jan 12 Status: Offline Points: 4
Hi all
I have the following question: when you ask OCR software to read a picture PDF (scanned pictures in a PDF), the OCR engine injects the recognized text into the PDF file. Can you let me know where? I mean, how can I access that recognized text with QuickPDF?
Many thanks
Vladimir
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.