
Problem with GetPageText

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1739
Printed Date: 19 Apr 25 at 6:11AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Problem with GetPageText
Posted By: lopezik
Subject: Problem with GetPageText
Date Posted: 15 Feb 11 at 11:20PM
Hello to all forum users!

First of all, sorry for my bad English.

I have a problem with the library's functions for extracting text from a PDF file.
The GetPageText function is not working as it should.
For one of my files it does not return all the characters. How can that be?
I also tried ExtractFilePageText, without success.

I'm using:
Quick PDF Library 7.23
Delphi XE
Windows XP 32bit

For testing, I uploaded a sample PDF file here: http://uploading.com/files/fe47f7cf/pdf_test.pdf/

Please help.
Thanks for any tips.



Replies:
Posted By: Ingo
Date Posted: 16 Feb 11 at 1:48PM
Hi Lopezik!

This behavior depends on the fonts and/or character set used in the PDF.
You should search the forum threads here for strings like "unicode" or "utf8" or ...
We have Unicode samples here, too.
With those you should get all characters.
Without Unicode I got only "Ró y Wiatrów 7" from your PDF.

Cheers and welcome here,
Ingo



Posted By: Dimitry
Date Posted: 16 Feb 11 at 2:52PM
The problem was reproduced and fixed.
Please check the next Quick PDF Library versions.


-------------
Regards,
Dmitry


Posted By: lopezik
Date Posted: 16 Feb 11 at 10:38PM
Thank you for your replies, but unfortunately the problem persists.
I used the following code, but it does not help.

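procedure TForm1.Button1Click(Sender: TObject);
var
  QP: TQuickPDF;
  S: AnsiString;
  FS: TFileStream;
  UTF8BOM: AnsiString;
begin
  QP := TQuickPDF.Create;
  try
    QP.UnlockKey('');
    QP.LoadFromFile('c:\pdf_test.pdf');
    // Extract the page text into S
    S := QP.GetPageText(0);
    FS := TFileStream.Create('c:\pdf_test.txt', fmCreate);
    try
      // Write a UTF-8 byte order mark, then the extracted bytes as-is.
      // The BOM is only correct if S really holds UTF-8 encoded text.
      UTF8BOM := #$EF#$BB#$BF;
      FS.Write(UTF8BOM[1], Length(UTF8BOM));
      if Length(S) > 0 then
        FS.Write(S[1], Length(S));
    finally
      // Release the stream even if a write fails
      FS.Free;
    end;
  finally
    QP.Free;
  end;
end;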
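
A side note on the file-writing part of the code above: a UTF-8 BOM is only meaningful if the bytes that follow are really UTF-8. The following is a minimal sketch, not a fix for the extraction problem itself. It assumes the extracted text is available as a Delphi Unicode string. The sample text and the output file name are hypothetical stand-ins. It uses only the RTL (TStringList and TEncoding.UTF8) and no Quick PDF Library call.

```pascal
program SaveUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Saves Text to FileName encoded as UTF-8.
// TStringList.SaveToFile writes the encoding's preamble (the BOM) first.
procedure SaveTextAsUtf8(const Text: string; const FileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := Text;
    Lines.SaveToFile(FileName, TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end;

begin
  // Hypothetical sample text standing in for the library's output;
  // #$00F3 is "o" with acute, #$017C is "z" with a dot above.
  SaveTextAsUtf8('R'#$00F3#$017C'y Wiatr'#$00F3'w 7', 'pdf_test.txt');
end.
```

If the character is already missing from the string returned by the library, no encoding step on the output side can bring it back, so this only rules out the file-writing code as the cause.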


A newer version (Quick PDF Library 7.24 Beta 2) does not help either.

One character is still missing: 'z' with a dot above (ż).

Do you have any ideas?


Posted By: Ingo
Date Posted: 16 Feb 11 at 10:42PM
Hi!

Yes, the problem still exists in the current version.
Dimitry wrote that the bug has been fixed, so future releases
won't have this problem.
You will need to wait for the next release, 7.25.

Cheers, Ingo


Posted By: lopezik
Date Posted: 16 Feb 11 at 11:05PM
Hi

We are waiting impatiently. This is a very big problem for us.
When can we expect the first beta of version 7.25?



