Debenu Quick PDF Library - PDF SDK Community Forum : Extract text

http://www.quickpdf.org/forum/

http://www.quickpdf.org/forum/extract-text_topic423_post1890.html#1890
Author: tren
Subject: 423
Posted: 30 May 06 at 2:39AM

Hi There,

I'm having a few issues with GetPageText(4), the one that returns each word and its quads. Several of the "words" still contain spaces in them, or they repeat themselves constantly. This issue doesn't happen if I extract a single line with GetPageText(3).

Here is some example output:

By Line:
"EOFGEO+Palatino-Roman",#000000,12.29,119.3814,705.3093,492.3365,705.3093,492.3365,717.7753,119.3814,717.7753,"nature, and thereby - or so he thought - freedom. Later, Bentham"

By Word:
"EOFGEO+Palatino-Roman",#000000,12.29,119.3814,705.3093,157.6965,705.3093,157.6965,717.7753,119.3814,717.7753,"naturnature,"
"EOFGEO+Palatino-Roman",#000000,12.29,162.4776,705.3093,229.2728,705.3093,229.2728,717.7753,162.4776,717.7753,"and therthereby"
"EOFGEO+Palatino-Roman",#000000,12.29,234.0539,705.3093,240.1997,705.3093,240.1997,717.7753,234.0539,717.7753,"-"
"EOFGEO+Palatino-Roman",#000000,12.29,244.9807,705.3093,256.5469,705.3093,256.5469,717.7753,244.9807,717.7753,"or"
"EOFGEO+Palatino-Roman",#000000,12.29,261.3279,705.3093,273.2506,705.3093,273.2506,717.7753,261.3279,717.7753,"so"
"EOFGEO+Palatino-Roman",#000000,12.29,278.0317,705.3093,291.0730,705.3093,291.0730,717.7753,278.0317,717.7753,"he"
"EOFGEO+Palatino-Roman",#000000,12.29,295.8541,705.3093,339.1324,705.3093,339.1324,717.7753,295.8541,717.7753,"thought"
"EOFGEO+Palatino-Roman",#000000,12.29,343.9135,705.3093,492.3365,705.3093,492.3365,717.7753,343.9135,717.7753,"- frfreedom. LaterLater, Bentham"
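
A minimal sketch of how records like the ones above could be checked mechanically. It assumes every record has exactly eleven comma-separated fields (font, colour, size, eight coordinates) before the quoted text, as in the sample output. `ExtractTextField` and the `Samples` array are hypothetical names for illustration, not part of Quick PDF Library.

```pascal
program CheckWords;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns the text field (the 12th comma-separated field) of one
// record, without its surrounding double quotes.
// Assumption: the first 11 fields never contain an unquoted comma.
function ExtractTextField(const Rec: string): string;
var
  i, Commas: Integer;
  InQuotes: Boolean;
begin
  Result := '';
  Commas := 0;
  InQuotes := False;
  for i := 1 to Length(Rec) do
  begin
    if Rec[i] = '"' then
      InQuotes := not InQuotes
    else if (Rec[i] = ',') and (not InQuotes) then
    begin
      Inc(Commas);
      if Commas = 11 then
      begin
        // Everything after the 11th separator is the text field.
        Result := Copy(Rec, i + 1, MaxInt);
        Break;
      end;
    end;
  end;
  // Strip the enclosing quotes, if present.
  if (Length(Result) >= 2) and (Result[1] = '"') and
     (Result[Length(Result)] = '"') then
    Result := Copy(Result, 2, Length(Result) - 2);
end;

const
  // Two records copied from the "By Word" output above.
  Samples: array[0..1] of string = (
    '"EOFGEO+Palatino-Roman",#000000,12.29,244.9807,705.3093,256.5469,705.3093,256.5469,717.7753,244.9807,717.7753,"or"',
    '"EOFGEO+Palatino-Roman",#000000,12.29,162.4776,705.3093,229.2728,705.3093,229.2728,717.7753,162.4776,717.7753,"and therthereby"'
  );

var
  k: Integer;
  W: string;
begin
  for k := Low(Samples) to High(Samples) do
  begin
    W := ExtractTextField(Samples[k]);
    // A "word" that still contains a space is suspect.
    if Pos(' ', W) > 0 then
      WriteLn('suspect: ', W)
    else
      WriteLn('ok: ', W);
  end;
end.
```

This only flags "words" that still contain a space. Doubled fragments without a space, such as "naturnature,", would slip through and would still need a comparison against the by-line output.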

Is this a known issue? I'm tempted to do string processing and compare the two outputs but would prefer not to. Any guidance appreciated.

http://www.quickpdf.org/forum/extract-text_topic423_post1889.html#1889
Author: Ingo
Subject: 423
Posted: 30 May 06 at 2:24AM

"...why did you write QP.Free two times?..."

Hi Quicker!

I did it to prevent memory problems.
After every 100 pages I free the object and start fresh, so I can extract any document.

Best regards,
Ingo

http://www.quickpdf.org/forum/extract-text_topic423_post1888.html#1888
Author: Ingo
Subject: 423
Posted: 30 May 06 at 2:21AM

Hi Quicker!

It's the code here in the thread.

Best regards,
Ingo

http://www.quickpdf.org/forum/extract-text_topic423_post1886.html#1886
Author: Quicker
Subject: 423
Posted: 30 May 06 at 12:58AM

Ingo wrote:

Hi Ulrich!

I've written already to you...
A last idea:
What about CombineLayers before extraction?

Best regards,
Ingo

Ingo,
please write your solution (what you wrote to Ulrich) here...

http://www.quickpdf.org/forum/extract-text_topic423_post1885.html#1885
Author: Quicker
Subject: 423
Posted: 30 May 06 at 12:56AM

ukobsa wrote:

Hi Ingo,

here's the code I use (based on code of one of your former postings)

greetings,
Ulrich

Hi Ulrich,
why did you write QP.Free two times?

http://www.quickpdf.org/forum/extract-text_topic423_post1884.html#1884
Author: Ingo
Subject: 423
Posted: 29 May 06 at 3:25PM

Hi Ulrich!

I've written already to you...
A last idea:
What about CombineLayers before extraction?

Best regards,
Ingo

http://www.quickpdf.org/forum/extract-text_topic423_post1883.html#1883
Author: ukobsa
Subject: 423
Posted: 29 May 06 at 9:50AM

Hi Ingo,

thanks for your help, but unfortunately it doesn't work. It still cannot extract the word 'Test'. It only extracts the additional information:

"BAAAAA+TimesNewRomanPSMT",#000000,12.00,56.7000,776.6920,77.4240,776.6920,77.4240,784.7920,56.7000,784.7920,""

Also, when I save the file and reload it before extracting, it cannot extract anything (that's why I have commented those lines out in the code below).

here's the code I use (based on code of one of your former postings)

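FName := 'c:\temp\test4.pdf';
QP := TiSEDQuickPDF.Create;
try
    QP.UnlockKey('');
    // Open the PDF in direct access mode
    dafh := QP.DAOpenFile(FName, '');
    // Disabled: with save-and-reload, nothing is extracted at all
    //QP.SaveToFile(FName);
    //dafh := QP.DAOpenFile(FName, '');
    x := QP.DAGetPageCount(dafh);
    STR := '';

    // Extracted text goes to <FName>_ex2.txt
    AssignFile(cf, FName + '_ex2.txt');
    Rewrite(cf);

    i1 := 1;
    pc := 0;  // pages processed since the last re-create

    for i := 1 to x do
    begin
      dapr := QP.DAFindPage(dafh, i);
      // Option 3: one record per line of text
      STR := QP.DAExtractPageText(dafh, dapr, 3);
      WriteLn(cf, Trim(STR));
      pc := pc + 1;
      // Every 100 pages: free the object, create a new one and reopen the file
      if (pc = 100) then
      begin
        pc := 0;
        QP.DACloseFile(dafh);
        QP.Free;
        QP := TiSEDQuickPDF.Create;
        QP.UnlockKey('');
        dafh := QP.DAOpenFile(FName, '');
      end;
    end;
    QP.DACloseFile(dafh);
    CloseFile(cf);
finally
    // Frees whichever instance is current at this point
    QP.Free;
end;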

Do you have any other ideas? From looking at the code, it seems that QuickPDF has problems with this text, where the single letters are referenced objects (?)

I have emailed my test-PDF to you.

greetings,
Ulrich

http://www.quickpdf.org/forum/extract-text_topic423_post1882.html#1882
Author: Ingo
Subject: 423
Posted: 29 May 06 at 7:58AM

Hi Quicker!

I didn't get any files from you.
Put them somewhere online and I'll take a look.
I think what I wrote to Ulrich would help you, too.

Best regards,
Ingo

http://www.quickpdf.org/forum/extract-text_topic423_post1881.html#1881
Author: Ingo
Subject: 423
Posted: 29 May 06 at 7:46AM

Hi Ulrich!

I've tried the same with Word and PDFCreator.
Extraction is possible:
First LoadFromFile
then SaveToFile //only to be sure that the file is readable with quickpdf
again LoadFromFile //the same saved file
then DAExtractPageText //with option 3!!!

Best regards,
Ingo

http://www.quickpdf.org/forum/extract-text_topic423_post1880.html#1880
Author: Quicker
Subject: 423
Posted: 29 May 06 at 7:31AM

Please check accounts on ewetel.net and pdf-analyzer.com