

ExtractFilePageText problem

oyo@nois.no (Beginner, joined 06 Jun 12)
Posted: 30 Aug 16 at 9:44AM
Hi
I'm having a problem extracting the text from the linked PDF:
http://download.nois.no/isycad/outgoing/debenu/EKOZ-AK-M-85892-001-01.zip
The result looks something like this (strange characters):
[sample output not preserved]

Does anyone know what the problem might be?
 
Regards
Øyvind Knappskog Olsen
Norconsult Informasjonssystemer AS

mLipok (Senior Member, joined 23 Apr 14, Poland, Zabrze)
Posted: 30 Aug 16 at 2:41PM
Post a code snippet showing how you are trying to do that.

Here you can find a description of how to test my examples:
http://www.quickpdf.org/forum/forum_posts.asp?TID=2932&PID=12600&title=drawcapturedpagematrix-matrix-howto#12600

oyo@nois.no (Beginner, joined 06 Jun 12)
Posted: 30 Aug 16 at 4:58PM
Hi

Here is my code. It works fine for all other PDF files. Maybe there is something wrong with the PDF, or maybe it is read-protected... The Adobe version is 1.5.
 

// Assumes using System; using System.IO; and the PDFLibrary wrapper class,
// with result, licenseKey and pdfFile defined elsewhere.
dllFile = result + "DebenuPDFLibraryDLL1115.dll";

if (File.Exists(dllFile))
{
    PDFLibrary qp = new PDFLibrary(dllFile);

    // A new blank document is created at this point in memory
    int docID = qp.NewDocument();

    // Unlock the library
    int res = qp.UnlockKey(licenseKey);
    string li = qp.LicenseInfo();
    int lec = qp.LastErrorCode();

    // Check to see if the library has been successfully unlocked
    if (qp.Unlocked() == 1)
    {
        // Load the document that you want to extract text from into memory
        qp.LoadFromFile(pdfFile, "");
        int iNumPages = qp.PageCount();

        // Traverse all pages and collect their text
        string documentText = "";
        for (int nPage = 1; nPage <= iNumPages; nPage++)
        {
            string pageText = qp.ExtractFilePageText(pdfFile, "", nPage, 3);
            // Append each page's text (previously pageText was discarded)
            documentText += pageText + Environment.NewLine;
        }
    }
}
 
Regards
Øyvind

Ingo (Moderator Group, joined 29 Oct 05)
Posted: 30 Aug 16 at 10:40PM
The PDF version is 1.4.
There are no security settings; everything is allowed.
No passwords... nothing.
It's only web-optimized (linearized), that's all.
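The version and web-optimization findings above can be double-checked without loading the file into the library. The following is a minimal sketch using only the .NET base class library; the class and method names (`PdfHeaderCheck`, `ReadHead`, `HeaderVersion`, `LooksLinearized`) are hypothetical. It reads the first 1024 bytes, parses the `%PDF-x.y` header and looks for the `/Linearized` key, which the PDF specification requires to sit within the first 1024 bytes of a linearized file. It does not inspect security settings.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: cheap header checks without a PDF library.
public static class PdfHeaderCheck
{
    const int HeadSize = 1024;

    // Reads up to the first 1024 bytes of the file.
    public static byte[] ReadHead(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buf = new byte[HeadSize];
            int total = 0, n;
            while (total < buf.Length && (n = fs.Read(buf, total, buf.Length - total)) > 0)
                total += n;
            Array.Resize(ref buf, total);
            return buf;
        }
    }

    // Returns "x.y" from the "%PDF-x.y" header, or null if no header is found.
    public static string HeaderVersion(byte[] head)
    {
        string text = Encoding.ASCII.GetString(head, 0, Math.Min(head.Length, HeadSize));
        Match m = Regex.Match(text, @"%PDF-(\d\.\d)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // A linearized ("web optimized") file keeps its linearization dictionary
    // in the first 1024 bytes, so a substring test there is a usable hint.
    public static bool LooksLinearized(byte[] head)
    {
        string text = Encoding.ASCII.GetString(head, 0, Math.Min(head.Length, HeadSize));
        return text.Contains("/Linearized");
    }
}
```

Usage: `byte[] head = PdfHeaderCheck.ReadHead(pdfFile);` then inspect `HeaderVersion(head)` and `LooksLinearized(head)`. PDF 1.4 and later files may also override the header version with a `/Version` entry in the document catalog, which this sketch does not read, so different tools may report different versions for the same file.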
My extractions give a similar result to Øyvind's.
Could it have something to do with the fonts, the character codes used, or code pages?

Does anybody have a more detailed analysis? Come on! ;-)
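If the fonts are the cause, for example a font with a custom encoding and no usable Unicode (ToUnicode) mapping, extractors often return raw character codes that show up as control characters or Private Use Area code points. The sketch below is a rough heuristic for flagging such output from `ExtractFilePageText`; the `ExtractionCheck` class and the 0.3 threshold are hypothetical assumptions, not part of Quick PDF Library. It cannot catch garbage that happens to map to ordinary letters.

```csharp
using System;

// Hypothetical heuristic: how much of the extracted text looks like raw codes?
public static class ExtractionCheck
{
    // Fraction of non-whitespace characters that are control characters,
    // Private Use Area code points (U+E000..U+F8FF) or U+FFFD.
    public static double SuspiciousRatio(string text)
    {
        int total = 0, suspicious = 0;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c)) continue; // line breaks and tabs are fine
            total++;
            if (char.IsControl(c) || (c >= '\uE000' && c <= '\uF8FF') || c == '\uFFFD')
                suspicious++;
        }
        return total == 0 ? 0.0 : (double)suspicious / total;
    }

    // Assumed threshold; tune it against files known to extract correctly.
    public static bool LooksGarbled(string text, double threshold = 0.3)
        => SuspiciousRatio(text) >= threshold;
}
```

Usage: call `ExtractionCheck.LooksGarbled(pageText)` inside the page loop and log or skip pages that trip it, rather than indexing unreadable text.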

Cheers and welcome here,
Ingo

mLipok (Senior Member, joined 23 Apr 14, Poland, Zabrze)
Posted: 30 Aug 16 at 11:40PM
Hello, Ingo.
I'm a little busy, as I'm working on several projects as an AutoIt MVP.
I'll try to look at this in the next few days.

oyo@nois.no (Beginner, joined 06 Jun 12)
Posted: 05 Sep 16 at 11:11AM
Hi
 
Thanks for your interest in my problem.
Have you had a chance to look at it?
 
Regards
Øyvind

mLipok (Senior Member, joined 23 Apr 14, Poland, Zabrze)
Posted: 06 Sep 16 at 12:35AM
Try using:

$oQP.SelectPage($iPage_idx)
$sDocumentText &= $oQP.GetPageText(8) & @CRLF

Btw, I tested it with DebenuPDFLibraryAX1311.dll, and I see the same problem.

Regards,
mLipok

oyo@nois.no (Beginner, joined 06 Jun 12)
Posted: 06 Sep 16 at 9:53AM
Hi

I tried what you suggested but it didn't work.

Ingo mentioned that the PDF is web-optimized. Could that be the problem?

Could there be any other problems with the file?

Regards
Øyvind

mLipok (Senior Member, joined 23 Apr 14, Poland, Zabrze)
Posted: 06 Sep 16 at 4:10PM
I know it doesn't work (I said that I see the same problem). That was only an unrelated remark about using SelectPage... GetPageText.

Sorry for my English...

I can't say whether this is related to what Ingo mentioned. I just don't know: I'm a normal user like you, not a PDF technology expert.
Sorry...

Try emailing Debenu Support <support@debenu.com>.

oyo@nois.no (Beginner, joined 06 Jun 12)
Posted: 08 Sep 16 at 8:16AM
OK. Thank you.

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.