<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : Extract text</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of: Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : Extract text]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Fri, 01 May 2026 05:05:52 +0000</pubDate>
  <lastBuildDate>Tue, 30 May 2006 02:39:40 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=423</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[Extract text : Hi There, I'm having a few...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1890.html#1890</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=276">tren</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 30 May 06 at 2:39AM<br /><br />Hi There,<br /><br />I'm having a few issues with GetPageText(4), the one that returns each word and its quads. Several of the "words" still contain spaces in them, or they repeat themselves constantly. This issue doesn't happen if I extract a single line with GetPageText(3).<br /><br />Here is some example output:<br /><br />By Line:<br />"EOFGEO+Palatino-Roman",#000000,12.29,119.3814,705.3093,492.3365,705.3093,492.3365,717.7753,119.3814,717.7753,"nature, and thereby - or so he thought - freedom. Later, Bentham"<br /><br />By Word:<br />"EOFGEO+Palatino-Roman",#000000,12.29,119.3814,705.3093,157.6965,705.3093,157.6965,717.7753,119.3814,717.7753,"naturnature,"<br />"EOFGEO+Palatino-Roman",#000000,12.29,162.4776,705.3093,229.2728,705.3093,229.2728,717.7753,162.4776,717.7753,"and therthereby"<br />"EOFGEO+Palatino-Roman",#000000,12.29,234.0539,705.3093,240.1997,705.3093,240.1997,717.7753,234.0539,717.7753,"-"<br />"EOFGEO+Palatino-Roman",#000000,12.29,244.9807,705.3093,256.5469,705.3093,256.5469,717.7753,244.9807,717.7753,"or"<br />"EOFGEO+Palatino-Roman",#000000,12.29,261.3279,705.3093,273.2506,705.3093,273.2506,717.7753,261.3279,717.7753,"so"<br />"EOFGEO+Palatino-Roman",#000000,12.29,278.0317,705.3093,291.0730,705.3093,291.0730,717.7753,278.0317,717.7753,"he"<br />"EOFGEO+Palatino-Roman",#000000,12.29,295.8541,705.3093,339.1324,705.3093,339.1324,717.7753,295.8541,717.7753,"thought"<br />"EOFGEO+Palatino-Roman",#000000,12.29,343.9135,705.3093,492.3365,705.3093,492.3365,717.7753,343.9135,717.7753,"- frfreedom. LaterLater, Bentham"<br /><br />Is this a known issue? I'm tempted to do string processing and compare the two outputs but would prefer not to. Any guidance appreciated.]]>
   </description>
   <pubDate>Tue, 30 May 2006 02:39:40 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1890.html#1890</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : "...why did you write QP.Free...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1889.html#1889</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 30 May 06 at 2:24AM<br /><br />"...why did you write QP.Free two times?..."<br /><br />Hi Quicker!<br /><br />I did it to prevent memory problems.<br />Every 100 pages I start over with a fresh instance, so I can extract any document.<br /><br />Best regards,<br />Ingo<br />]]>
   </description>
   <pubDate>Tue, 30 May 2006 02:24:22 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1889.html#1889</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Hi Quicker! It's the code...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1888.html#1888</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 30 May 06 at 2:21AM<br /><br />Hi Quicker!<br /><br />It's the code here in the thread.<br /><br />Best regards,<br />Ingo<br />]]>
   </description>
   <pubDate>Tue, 30 May 2006 02:21:16 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1888.html#1888</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Ingo wrote: Hi Ulrich! I've...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1886.html#1886</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=351">Quicker</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 30 May 06 at 12:58AM<br /><br /><P><table width="99%"><tr><td class="BBquote"><img src="forum_images/quote_box.png" title="Originally posted by Ingo" alt="Originally posted by Ingo" style="vertical-align: text-bottom;" /> <strong>Ingo wrote:</strong><br /><br />Hi Ulrich! <BR><BR>I've written already to you... <BR>A last idea: <BR>What about CombineLayers before extraction? <BR><BR>Best regards, <BR>Ingo <BR></td></tr></table> </P><P>Ingo,<BR>please write your solution (what you wrote to Ulrich) here...</P>]]>
   </description>
   <pubDate>Tue, 30 May 2006 00:58:04 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1886.html#1886</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : ukobsa wrote: Hi Ingo, here's...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1885.html#1885</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=351">Quicker</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 30 May 06 at 12:56AM<br /><br /><P><table width="99%"><tr><td class="BBquote"><img src="forum_images/quote_box.png" title="Originally posted by ukobsa" alt="Originally posted by ukobsa" style="vertical-align: text-bottom;" /> <strong>ukobsa wrote:</strong><br /><br />Hi Ingo, <BR><BR>here's the code I use (based on code of one of your former postings) <BR><BR><BR>greetings, <BR>Ulrich</td></tr></table> </P><P>Hi Ulrich,<BR>why did you write QP.Free two times?</P>]]>
   </description>
   <pubDate>Tue, 30 May 2006 00:56:50 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1885.html#1885</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Hi Ulrich! I've written...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1884.html#1884</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 29 May 06 at 3:25PM<br /><br />Hi Ulrich!<br /><br />I've written already to you...<br />A last idea:<br />What about CombineLayers before extraction?<br /><br />Best regards,<br />Ingo<br />]]>
   </description>
   <pubDate>Mon, 29 May 2006 15:25:54 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1884.html#1884</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Hi Ingo,  thanks for your help...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1883.html#1883</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=370">ukobsa</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 29 May 06 at 9:50AM<br /><br />Hi Ingo,<br /><br />thanks for your help, but unfortunately it doesn't work. It still cannot extract the word 'Test'. It only extracts the additional information:<br /><br />"BAAAAA+TimesNewRomanPSMT",#000000,12.00,56.7000,776.6920,77.4240,776.6920,77.4240,784.7920,56.7000,784.7920,""<br /><br />Also, when I save the file and reload it beforehand, it cannot extract anything (that's why I have commented those lines out in the code below).<br /><br />here's the code I use (based on code of one of your former postings)<br /><br />  FName := 'c:\temp\test4.pdf';<br />  QP := TiSEDQuickPDF.Create;<br />  try<br />&nbsp;&nbsp;&nbsp;&nbsp;QP.UnlockKey('');<br />&nbsp;&nbsp;&nbsp;&nbsp;dafh := QP.DAOpenFile(FName, '');<br />&nbsp;&nbsp;&nbsp;&nbsp;//QP.SaveToFile(FName);<br />&nbsp;&nbsp;&nbsp;&nbsp;//dafh := QP.DAOpenFile(FName, '');<br />&nbsp;&nbsp;&nbsp;&nbsp;x := QP.DAGetPageCount(dafh);<br />&nbsp;&nbsp;&nbsp;&nbsp;STR := '';<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;AssignFile(cf, FName + '_ex2.txt');<br />&nbsp;&nbsp;&nbsp;&nbsp;Rewrite(cf);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;i1 := 1;<br />&nbsp;&nbsp;&nbsp;&nbsp;pc := 0;<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;for i := 1 to x do<br />&nbsp;&nbsp;&nbsp;&nbsp;begin<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dapr := QP.DAFindPage(dafh, i);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;STR := QP.DAExtractPageText(dafh, dapr, 3);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;WriteLn(cf, Trim(STR));<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pc := pc + 1;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (pc = 100) then<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pc := 0;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; QP.DACloseFile(dafh);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; QP.Free;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; QP := TiSEDQuickPDF.Create;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; QP.UnlockKey('');<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dafh := QP.DAOpenFile(FName, '');<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end;<br />&nbsp;&nbsp;&nbsp;&nbsp;end;<br />&nbsp;&nbsp;&nbsp;&nbsp;QP.DACloseFile(dafh);<br />&nbsp;&nbsp;&nbsp;&nbsp;CloseFile(cf);<br />  finally<br />&nbsp;&nbsp;&nbsp;&nbsp;QP.Free;<br />  end;<br /><br />Do you have any other ideas? From looking at the code, it seems that QuickPDF has problems with this text, where the single letters are referenced objects (?)<br /><br />I have emailed my test PDF to you.<br /><br />greetings,<br />Ulrich]]>
   </description>
   <pubDate>Mon, 29 May 2006 09:50:51 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1883.html#1883</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Hi Quicker! I didn't get...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1882.html#1882</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 29 May 06 at 7:58AM<br /><br />Hi Quicker!<br /><br />I didn't get any files from you.<br />Put them somewhere online and I'll take a look.<br />I think what I wrote to Ulrich would help you, too.<br /><br />Best regards,<br />Ingo<br /><br />]]>
   </description>
   <pubDate>Mon, 29 May 2006 07:58:10 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1882.html#1882</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Hi Ulrich! I've done the...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1881.html#1881</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 29 May 06 at 7:46AM<br /><br />Hi Ulrich!<br /><br />I've done the same with Word and the PDFCreator.<br />Extraction is possible:<br />First LoadFromFile<br />then SaveToFile //only to be sure that the file is readable with quickpdf<br />again LoadFromFile //the same saved file<br />then DAExtractPageText //with option 3!!!<br /><br />Best regards,<br />Ingo<br />]]>
   </description>
   <pubDate>Mon, 29 May 2006 07:46:31 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1881.html#1881</guid>
  </item> 
  <item>
   <title><![CDATA[Extract text : Please check accounts on ewetel.net...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text_topic423_post1880.html#1880</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=351">Quicker</a><br /><strong>Subject:</strong> 423<br /><strong>Posted:</strong> 29 May 06 at 7:31AM<br /><br />Please check accounts on ewetel.net and pdf-analyzer.com]]>
   </description>
   <pubDate>Mon, 29 May 2006 07:31:54 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text_topic423_post1880.html#1880</guid>
  </item> 
 </channel>
</rss>