
ExtractFilePageText problem

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3380
Printed Date: 28 Sep 24 at 10:57PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: ExtractFilePageText problem
Posted By: oyo@nois.no
Subject: ExtractFilePageText problem
Date Posted: 30 Aug 16 at 9:44AM
Hi
I'm having a problem extracting the text from the linked PDF.
http://download.nois.no/isycad/outgoing/debenu/EKOZ-AK-M-85892-001-01.zip
The result is full of strange characters.

Does anyone know what the problem might be?
 
Regards
Øyvind Knappskog Olsen
Norconsult Informasjonssystemer AS
 



Replies:
Posted By: mLipok
Date Posted: 30 Aug 16 at 2:41PM
Post a code snippet showing how you are trying to do that.



-------------
Here you can find a description of how to test my examples:
http://www.quickpdf.org/forum/forum_posts.asp?TID=2932&PID=12600&title=drawcapturedpagematrix-matrix-howto#12600


Posted By: oyo@nois.no
Date Posted: 30 Aug 16 at 4:58PM
Hi
 
Here is my code. It works fine for all other PDF files. Maybe there is something wrong with the PDF, or maybe it is read-protected... The Adobe version is 1.5.
 

using System;
using System.IO;

dllFile = result + "DebenuPDFLibraryDLL1115.dll";

if (File.Exists(dllFile))
{
    PDFLibrary qp = new PDFLibrary(dllFile);

    // A new blank document is created at this point in memory
    int docID = qp.NewDocument();

    // Unlock the library
    int res = qp.UnlockKey(licenseKey);
    string li = qp.LicenseInfo();
    int lec = qp.LastErrorCode();

    // Check to see if the library has been successfully unlocked
    if (qp.Unlocked() == 1)
    {
        // Load the document (LoadFromFile returns 1 on success) to get its page count
        if (qp.LoadFromFile(pdfFile, "") == 1)
        {
            int iNumPages = qp.PageCount();
            string documentText = "";

            // Traverse all pages
            for (int nPage = 1; nPage <= iNumPages; nPage++)
            {
                // ExtractFilePageText reads the page directly from the file on disk
                string pageText = qp.ExtractFilePageText(pdfFile, "", nPage, 3);

                // Append each page's text to the document text
                documentText += pageText + Environment.NewLine;
            }
        }
    }
}
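Garbled output is still a non-empty string, so it can be useful to flag it in code before trusting the result. The helper below is a rough heuristic and is not part of Debenu Quick PDF Library; the class name, method name and the 30% threshold are arbitrary assumptions. It only counts control characters, private-use code points and the U+FFFD replacement character among the non-whitespace characters, so garbage made of ordinary-looking Latin letters will slip through.

```csharp
using System.Globalization;

// Hypothetical helper (not a Debenu API): flags extracted text that looks
// like raw glyph codes rather than readable text.
public static class ExtractionCheck
{
    // Returns true when more than `threshold` of the non-whitespace
    // characters are control characters, private-use code points,
    // or the U+FFFD replacement character.
    public static bool LooksGarbled(string text, double threshold = 0.3)
    {
        if (string.IsNullOrEmpty(text)) return false;

        int considered = 0;
        int suspicious = 0;
        foreach (char c in text)
        {
            // Tabs and line breaks are control characters too, so skip whitespace first
            if (char.IsWhiteSpace(c)) continue;
            considered++;

            UnicodeCategory cat = char.GetUnicodeCategory(c);
            if (cat == UnicodeCategory.Control ||
                cat == UnicodeCategory.PrivateUse ||
                c == '\uFFFD')
            {
                suspicious++;
            }
        }
        return considered > 0 && (double)suspicious / considered > threshold;
    }
}
```

Inside the page loop, `ExtractionCheck.LooksGarbled(pageText)` could be used to log which pages need a closer look.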
 
Regards
Øyvind


Posted By: Ingo
Date Posted: 30 Aug 16 at 10:40PM
The Adobe version is 1.4.
There are no security settings; everything is allowed.
No passwords... nothing.
It is only web optimized, that's all.
My extractions give a similar result to Oyvind's.
Could it have something to do with the fonts, the character codes used, code pages...?
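Ingo's guess points at a common cause. Text in a PDF content stream is stored as character codes that index into a font, not as Unicode; an extractor recovers the Unicode text from the font's /ToUnicode CMap or its /Encoding. Subset fonts often number their glyphs arbitrarily, so when that mapping is missing or unusable, an extractor may fall back to the raw codes. The toy model below only illustrates that effect with one-byte codes; it is not Debenu code, and the class and method names are made up.

```csharp
using System.Collections.Generic;
using System.Text;

// Illustration only: a toy model of turning content-stream character
// codes into Unicode. Real PDFs use the font's /ToUnicode CMap or /Encoding.
public static class ToUnicodeDemo
{
    // Decodes one-byte character codes. A code found in `toUnicode`
    // becomes the mapped character; otherwise the raw code value is
    // emitted unchanged, which shows up as "strange characters".
    public static string Decode(byte[] codes, IDictionary<byte, char> toUnicode)
    {
        var sb = new StringBuilder();
        foreach (byte code in codes)
        {
            sb.Append(toUnicode != null && toUnicode.TryGetValue(code, out char ch)
                ? ch
                : (char)code);
        }
        return sb.ToString();
    }
}
```

With the map {1: 'H', 2: 'e', 3: 'l', 4: 'o'}, the codes {1, 2, 3, 3, 4} decode to "Hello"; without it, the same codes come out as the control characters U+0001 to U+0004, which display as boxes or odd symbols.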

Does anybody have a more detailed analysis? Come on! ;-)

Cheers and welcome here,
Ingo






Posted By: mLipok
Date Posted: 30 Aug 16 at 11:40PM
Hello, Ingo.
I'm a little busy, as I'm working on several projects as an AutoIt MVP.
I'll try to look at this in the next few days.





Posted By: oyo@nois.no
Date Posted: 05 Sep 16 at 11:11AM
Hi
 
Thanks for your interest in my problem.
Have you had a chance to look at it?
 
Regards
Øyvind


Posted By: mLipok
Date Posted: 06 Sep 16 at 12:35AM
Try using:

$oQP.SelectPage($iPage_idx)
$sDocumentText &= $oQP.GetPageText(8) & @CRLF

By the way, I tested it with DebenuPDFLibraryAX1311.dll and I see the same problem.

Regards,
mLipok




Posted By: oyo@nois.no
Date Posted: 06 Sep 16 at 9:53AM
Hi

I tried what you suggested but it didn't work.

Ingo mentioned that the PDF is web optimized. Could that be the problem?

Could there be any other problems with the file?
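Whether a file is "web optimized" (linearized) can be checked directly. Linearization reorders the file's objects and adds hint data so a viewer can show the first page before the whole file has downloaded; it does not change the fonts or content streams, so on its own it is unlikely to explain garbled extraction. The helper below is a hypothetical quick check, not a Debenu function: ISO 32000 requires the linearization parameter dictionary to lie entirely within the first 1024 bytes of the file, so it looks for the `/Linearized` key there.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not a Debenu API): quick test for a linearized PDF.
public static class LinearizationCheck
{
    public static bool IsLinearized(string path)
    {
        byte[] buffer = new byte[1024];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read until the buffer is full or the file ends
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
        }
        // Latin-1 maps every byte to exactly one char, so binary data
        // in the header cannot break the decoding
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(buffer, 0, read);
        return head.Contains("/Linearized");
    }
}
```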

Regards
Øyvind


Posted By: mLipok
Date Posted: 06 Sep 16 at 4:10PM
I know that it does not work (I said that I see the same problem). It was only an unrelated remark about using SelectPage... GetPageText.

Sorry for my English.

I cannot say whether this is related to what Ingo mentioned. I just don't know, as I'm a normal user like you and not a PDF technology expert.
Sorry...

Try emailing Debenu Support: <support@debenu.com>





Posted By: oyo@nois.no
Date Posted: 08 Sep 16 at 8:16AM
OK. Thank you.


