Is it possible to extract text from this PDF?
bart_bender
Beginner Joined: 04 Oct 11 Location: Spain Status: Offline Points: 17 |
Posted: 15 Nov 11 at 12:45PM
Hello,
I'm trying to extract text from the following PDF document, Tiff6.pdf (download: http://www.megaupload.com/?d=4WP3ZVO0). The GetPageText method always returns an empty string. The security flag value for 5 (Content Copying or Extraction) is 6 (Allowed). Is it possible to extract the text? If I open the document with Acrobat Reader, I can save the text from the document. I'm using version 8.12 of the DLL with VB.NET.
Thanks in advance,
Best regards
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Bart!
What's the name of the PDF? Yes, it's tiff...pdf! I don't want to wait out the seconds before the Megaupload download starts, but I think it's not possible to extract text because the content was a TIFF that was converted to a PDF document (so it's still an image). That's the main problem. There are OCR tools that can read the text from the embedded image and add it to the PDF.
Cheers, Ingo
AndrewC
Moderator Group Joined: 08 Dec 10 Location: Geelong, Aust Status: Offline Points: 841 |
The text extraction is working pretty well on this PDF with 8.13 beta 2. I am getting the correct string results from this PDF.
This file is secured with a master password, but that is not a problem for QPL 8.11. If you were using QPL 7.xx, you would need to call QP.SetAdvancePassword(""); before QP.LoadFromFile().
Can you send me the source code you are using for text extraction?
Andrew.
Ingo
Hi again!
Because Andrew said it's working, I used Megaupload
and waited the 45 seconds :-(
And yes... he's right, I can extract the text content, too.
I tried QP 7.26 and it works without SetAdvancePassword.
Just load, decrypt and extract, and it works.
Cheers, Ingo
bart_bender
Hello Again,
Andrew, Ingo, thanks for your help. I'm using version 8.12, Andrew. This is a sample of the code that I'm using.

Private Function pGetPDFContent(ByVal documento As MemoryStream, Optional ByVal Password As String = "") As String
    ' Rewind the stream, then load the PDF from its bytes
    documento.Seek(0, SeekOrigin.Begin)
    Dim docid As Integer = Qp.LoadFromString(documento.GetBuffer, Password)
    docid = Qp.SelectedDocument
    Return pGetPDFContent(docid)
End Function

Private Function pGetPDFContent(ByVal docId As Integer, Optional ByVal CloseDoc As Boolean = True) As String
    Qp.SelectDocument(docId)
    Dim salida As String = ""
    ' Append the text of every page, undoing any page rotation first
    For i = 1 To Qp.PageCount
        Qp.SelectPage(i)
        Qp.RotatePage(-Qp.PageRotation)
        salida &= Qp.GetPageText(0)
    Next
    If CloseDoc Then
        Qp.RemoveDocument(docId)
    End If
    Return salida
End Function
AndrewC
1. Can you check the value of docid to make sure it is not 0?
2. You shouldn't need the QP.RotatePage call.
I suspect that the document is not being loaded correctly. If the document is loaded correctly, then QP.PageCount should be 121.
Andrew.
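
The two checks suggested here could be sketched roughly as follows. This is only an illustration: DescribeLoad is a hypothetical helper, not part of Quick PDF Library. It assumes you pass in the value returned by the load call and the value of Qp.PageCount yourself, and it treats 0 as failure only because that is the check described in this post.

```vbnet
Imports System

Module LoadDiagnostics
    ' Hypothetical helper, not a library call.
    ' loadResult: value returned by the load call (0 is treated as failure here).
    ' pageCount: value reported by Qp.PageCount after loading.
    ' expectedPages: the page count you expect (121 for the file in this thread).
    Function DescribeLoad(ByVal loadResult As Integer, ByVal pageCount As Integer,
                          ByVal expectedPages As Integer) As String
        If loadResult = 0 Then
            Return "Load failed: the returned value is 0."
        End If
        If pageCount <> expectedPages Then
            Return "Loaded, but the page count is " & pageCount.ToString() &
                   " instead of " & expectedPages.ToString() & "."
        End If
        Return "Loaded correctly with " & pageCount.ToString() & " pages."
    End Function

    Sub Main()
        ' Example values only; in real code they would come from the library.
        Console.WriteLine(DescribeLoad(1, 121, 121))
    End Sub
End Module
```

If both checks pass and the text is still empty, the loading step is probably not the cause.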
bart_bender
Yes, the document loads correctly and the page count is 121, but
GetPageText returns "" with or without rotation.
AndrewC
GetPageText(0) uses a very simple algorithm to extract text and is not suitable for all documents.
Can you change the code to GetPageText(3) and see if the text is extracted correctly? We have just re-added GetPageText(1), which uses the more complex text extraction options but only outputs the raw text strings, similar to option 0. This will be included in the 8.13 final release, which is due very soon.
Andrew.
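
One way to try several extraction options in turn is sketched below. It is a minimal illustration, not library code: FirstNonEmptyText is a hypothetical helper, and the getPageText delegate stands in for a call such as Qp.GetPageText(option) on the currently selected page. The stub in Main is invented for the demo and only imitates an option 0 that returns nothing.

```vbnet
Imports System

Module ExtractionFallback
    ' Hypothetical helper: tries each extraction option in order and
    ' returns the first non-empty result, or "" if every option is empty.
    Function FirstNonEmptyText(ByVal getPageText As Func(Of Integer, String),
                               ByVal options As Integer()) As String
        For Each extractOption As Integer In options
            Dim pageText As String = getPageText(extractOption)
            If Not String.IsNullOrEmpty(pageText) Then
                Return pageText
            End If
        Next
        Return ""
    End Function

    Sub Main()
        ' Invented stub: option 0 yields nothing, any other option yields placeholder text.
        Dim stub As Func(Of Integer, String) =
            Function(opt As Integer) If(opt = 0, "", "text from option " & opt.ToString())
        Console.WriteLine(FirstNonEmptyText(stub, New Integer() {0, 3}))
    End Sub
End Module
```

In the loop from the earlier post, the delegate could be something like Function(opt) Qp.GetPageText(opt), called after Qp.SelectPage(i).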
bart_bender
Thanks Andrew
AndrewC
Most of the professional PDF tools such as Acrobat Pro, Nitro Professional, Foxit Phantom have inbuilt OCR engines that create searchable text based on the OCR results.
Andrew.
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.