
Extracting text problem

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1085
Printed Date: 01 Jul 24 at 5:22AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Extracting text problem
Posted By: RobertN
Subject: Extracting text problem
Date Posted: 13 May 09 at 3:04PM

I have created a simple form in Excel with cells that contain '@VariableName'. I print it to PDF and then open the PDF using QuickPDF and Delphi.
I want to scan the PDF for all text of the form '@somevariablename', get the font size, coordinates, etc., and then convert each occurrence into a form field.
The purpose is to create a PDF form filler whose results I can save.
 
I tried GetPageText(3), but the results don't contain any readable text. If I try a PDF with form fields, the text is extracted properly.

How do I extract this text?
 
Thank you,
Robert



Replies:
Posted By: Ingo
Date Posted: 14 May 09 at 1:48AM
Hi Robert!

In your case I think the "@some..." content will be single strings/words,
so it should be better to use GetPageText(4).

Perhaps you could send me a sample of your files, and then I'll try to extract the strings with "@some..."?

ingo  [ dot ]  schmoekel  ( at )  ewetel  [ dot ]  net

Cheers, Ingo
 


Posted By: RobertN
Date Posted: 14 May 09 at 8:39AM
Hi Ingo,
 
Here is a sample PDF file with the "@sometext" in it:
http://www.mediafire.com/file/wzmmoznzuwf/Temperature_Transmitter_Template.pdf
It was generated using Excel and printed to PDF via PrimoPDF.
I have tried a few other printer drivers, but the result was the same.

I tried GetPageText() with 0, 1, 2, 3, and 4, but all with the same result.
I can open it in Acrobat Reader and extract the text without a problem.
 
 
Thank you,
Robert


Posted By: Ingo
Date Posted: 14 May 09 at 8:51AM
Hi!

I would be careful about the version of PrimoPDF. It uses the Ghostscript library, and QuickPDF still has problems extracting text from files made with older Ghostscript versions (before 8.15). Your PDF was made with PrimoPDF and Ghostscript version 8.50, so this is okay. Looking in the extracted text I can find many variables beginning with "@", so I think it is basically working.
Adobe Reader (8.1) and Foxit (3.0) can't find "@sometext" either.
Is there anything special about how you add "@sometext" to the content?
How do you do this?
Do you have any code parts for us to check here?

Cheers, Ingo



Posted By: RobertN
Date Posted: 14 May 09 at 9:47AM
Here is essentially what I'm doing in Delphi 7.
 
 
procedure TForm1.Button2Click(Sender: TObject);
var
  oDoc: TQuickPDF0713;
  sTemp, sFilename: string;
begin
  sFilename := 'c:\Temperature_Transmitter_Template.pdf';
  oDoc := TQuickPDF0713.Create;
  try
    // The code treats a return value of 1 as "license key accepted"
    if oDoc.UnlockKey('...') = 1 then
    begin
      // Likewise, 1 is treated as "PDF loaded successfully"
      if oDoc.LoadFromFile(sFilename) = 1 then
      begin
        sTemp := oDoc.GetPageText(0);
        ShowMessage(sTemp); // this returns an empty string
      end
      else
        ShowMessage('invalid PDF');
    end
    else
      ShowMessage('Invalid KEY');
  finally
    FreeAndNil(oDoc);
  end;
end;
The output is blank for GetPageText() with 0 and 1.
For 2, I get the text coordinates, etc. in CSV format.
For 3 and 4, I get the same as for 2, but all the text is garbled.
Do I need to convert it?
 
Sample output:
"UBTAOI+Arial",#000000,6.71,60.1272,118.3487,295.7056,118.3487,295.7056,124.6588,60.1272,124.6588,"())*++,-../*, )0 +-)0)+*.("
 
Thanks,
Robert
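
For readers who want to pull the font size, coordinates, and text out of a line like the sample output above, here is a minimal Delphi sketch. It assumes every line has 12 comma-separated fields, as the single sample suggests (font name, color, font size, eight coordinate values, then the quoted text); that layout is inferred from the sample and is not confirmed anywhere in this thread. SplitCsvLine is a hypothetical helper, not part of QuickPDF, and its treatment of a doubled quote inside a quoted field is likewise an assumption.

```pascal
program ParseTextLine;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper (not part of QuickPDF): splits one CSV line into
// fields. Commas inside double quotes are kept as part of the field.
// Assumption: a doubled quote ("") inside a quoted field means one quote.
procedure SplitCsvLine(const Line: string; Fields: TStrings);
var
  i: Integer;
  InQuotes: Boolean;
  Current: string;
begin
  Fields.Clear;
  Current := '';
  InQuotes := False;
  i := 1;
  while i <= Length(Line) do
  begin
    if Line[i] = '"' then
    begin
      if InQuotes and (i < Length(Line)) and (Line[i + 1] = '"') then
      begin
        Current := Current + '"'; // escaped quote inside a quoted field
        Inc(i);
      end
      else
        InQuotes := not InQuotes; // opening or closing quote
    end
    else if (Line[i] = ',') and not InQuotes then
    begin
      Fields.Add(Current);        // field separator outside quotes
      Current := '';
    end
    else
      Current := Current + Line[i];
    Inc(i);
  end;
  Fields.Add(Current);            // last field has no trailing comma
end;

var
  Fields: TStringList;
  Sample, Value: string;
begin
  // The sample line posted above, copied verbatim
  Sample := '"UBTAOI+Arial",#000000,6.71,60.1272,118.3487,295.7056,' +
            '118.3487,295.7056,124.6588,60.1272,124.6588,' +
            '"())*++,-../*, )0 +-)0)+*.("';
  Fields := TStringList.Create;
  try
    SplitCsvLine(Sample, Fields);
    // Assumed layout: 12 fields per line, text in the last one
    if Fields.Count = 12 then
    begin
      WriteLn('Font:         ', Fields[0]);
      WriteLn('Color:        ', Fields[1]);
      WriteLn('Size:         ', Fields[2]);
      WriteLn('First corner: ', Fields[3], ', ', Fields[4]);
      Value := Fields[11];
      WriteLn('Text:         ', Value);
      // Placeholders in the Excel form start with '@'
      if (Value <> '') and (Value[1] = '@') then
        WriteLn('Placeholder found: ', Value);
    end
    else
      WriteLn('Unexpected field count: ', Fields.Count);
  finally
    Fields.Free;
  end;
end.
```

With the garbled sample the '@' check does not match, since the extracted text itself is unreadable. The check only becomes useful once the extraction returns readable text.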


Posted By: Ingo
Date Posted: 14 May 09 at 10:01AM
I've sent an email about this case to Debenu ... ;-)

Cheers, Ingo


Posted By: deabrew
Date Posted: 14 May 09 at 5:53PM
Hello Robert, Ingo,

I'd like to confirm that Ingo has notified me, and that we will address this issue in a future version (fairly shortly).

Regards, Karl.


Posted By: RobertN
Date Posted: 14 May 09 at 8:17PM

I just recreated the PDF sample using the DoPDF print driver instead of PrimoPDF, and QuickPDF now detects the text correctly.

Thank you again for the quick responses.


Posted By: deabrew
Date Posted: 14 May 09 at 8:34PM
Excellent. Note that we have also added support for this functionality in the next build (7.14) of QPL.

Cheers, -Karl


