<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : ExtractFilePageText</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of; Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : ExtractFilePageText]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Sun, 05 Apr 2026 13:12:22 +0000</pubDate>
  <lastBuildDate>Wed, 16 Aug 2017 08:18:13 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=3492</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[ExtractFilePageText : Hi Reg, I've made some tests...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13891.html#13891</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 16 Aug 17 at 8:18AM<br /><br />Hi Reg,<br><br>I've run some tests with the PDF.<br>The source is BricsCAD; the file was converted from DWG format.<br>I have the same problems when extracting text.<br>Perhaps a codepage problem?<br>Rendering works, but a few text parts overlap each other.<br>BTW: there's a malformed xref table at the end of the file.<br><br>With Google I found many community posts about problems with BricsCAD's direct PDF export function.<br>Another thing: the encoding is Identity-H, which can be a problem, too.<br>My assessment: you won't get proper text extraction from PDF documents produced by this source. Sorry. Anyway, if you do succeed, please let us know how you did it. Thanks.<br><br>]]>
   </description>
   <pubDate>Wed, 16 Aug 2017 08:18:13 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13891.html#13891</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText : http://elcc.se/download/ExtractFilePageText.zip I...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13890.html#13890</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2970">REGH</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 16 Aug 17 at 6:47AM<br /><br /><a href="http://elcc.se/download/ExtractFilePageText.zip" target="_blank" rel="nofollow">http://elcc.se/download/ExtractFilePageText.zip</a><br>I would like to use option=3 to get the bounding box coordinates for creating links. Since the text returned by this option is gibberish, I also tried option=2 and merging the two results, but I can't find an obvious way to match them so that I end up with bounding box coordinates <b>and</b> readable text...<br><br>]]>
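    <![CDATA[One possible way to pair the two outputs, sketched in Python rather than VB so that it stays short and testable: parse both CSV results, then match each option=3 row to the nearest unused option=2 row that has the same font name. The option=2 field order (X, Y, color, size, font, text) is only inferred from the sample row posted in this thread, and the function names and the tolerances x_tol and y_tol are hypothetical, chosen to fit that one sample. Treat it as a starting point, not a verified method.

```python
import csv


def parse_rows(text):
    """Split CR/LF/CRLF-separated CSV output into lists of fields."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line for line in normalized.split("\n") if line.strip()]
    return list(csv.reader(lines))


def merge_extractions(opt2_text, opt3_text, x_tol=0.5, y_tol=5.0):
    """Pair option=3 bounding boxes with option=2 readable text."""
    # Assumed option=2 field order: X, Y, color, size, font, text.
    readable = [
        {"x": float(f[0]), "y": float(f[1]), "font": f[4], "text": f[5]}
        for f in parse_rows(opt2_text)
    ]
    used = set()
    merged = []
    for f in parse_rows(opt3_text):
        # Option=3 order: font, color, size, X1, Y1 ... X4, Y4, text.
        font = f[0]
        box = [float(v) for v in f[3:11]]
        best, best_dist = None, None
        for i, row in enumerate(readable):
            if i in used or row["font"] != font:
                continue
            dx = abs(row["x"] - box[0])
            dy = abs(row["y"] - box[1])
            if dx > x_tol or dy > y_tol:
                continue
            if best is None or dx + dy < best_dist:
                best, best_dist = i, dx + dy
        if best is not None:
            used.add(best)  # each readable row is matched at most once
            merged.append({"text": readable[best]["text"], "box": box})
    return merged


if __name__ == "__main__":
    # Sample rows from this thread; the option=3 text field is shortened.
    opt2 = ('67.46,526.32,#000000,1.4,"AAAAAA+ArialNarrow",'
            '"ELDU 400V A-Matning, STV AH.N09A, BIOLINJE 1, HE.B34.10.01.11"')
    opt3 = ('"AAAAAA+ArialNarrow",#000000,13.73,67.4646,523.4061,416.9278,'
            '523.4061,416.9278,523.4061,67.4646,523.4061," ? ???"')
    for item in merge_extractions(opt2, opt3):
        print(item["text"], item["box"])
```

Rows that find no partner in the other list (for example text objects that option=2 returned twice) are left out, so the two results no longer need to have the same length. The tolerances would need tuning against real pages.]]>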
   </description>
   <pubDate>Wed, 16 Aug 2017 06:47:59 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13890.html#13890</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText : Option 2 is like option 3 but...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13889.html#13889</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 15 Aug 17 at 7:37PM<br /><br />Option 2 is like option 3 but a bit more accurate in extracting.<div>Don't mix the options. The content each one returns can differ slightly (otherwise having two options would make no sense), and this can lead to near-duplicate content.</div><div>At the top you've used DASetTextExtractionOptions - this works only with the DA functions! Don't mix the two types of functions!</div><div>Your hoster wants my email address - he won't get it ;-)</div><div>If the TTF font is not a common one and is not embedded, this can lead to bad extraction, too.</div>]]>
   </description>
   <pubDate>Tue, 15 Aug 2017 19:37:33 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13889.html#13889</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText : Hi Ingo, The file I'm extracting...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13888.html#13888</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2970">REGH</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 15 Aug 17 at 5:29PM<br /><br />Hi Ingo,<br>The file I'm extracting text from was created from a CAD drawing (with TTF text).<br>When I run the same code on a PDF created from MS Word, there is no problem.<br>Here is my VB code for testing the text extraction:<br><font color="#0000FF">&nbsp;&nbsp;&nbsp; <font color="#000000">QP.UnlockKey (strLicenseKey)</font><br>&nbsp;&nbsp;&nbsp; <font color="#000000">QP.DASetTextExtractionOptions 12, 0</font> <font color="#339900">'Include rotated texts</font><br>&nbsp;&nbsp;&nbsp; <font color="#000000">QP.DASetTextExtractionOptions 8, 1</font> <font color="#339900">'Ignore duplicates</font><br>&nbsp;&nbsp;&nbsp; <font color="#000000">QP.DASetTextExtractionOptions 5, 1</font><font color="#339900"> 'Sort</font><br>&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp; <font color="#000000"><font color="#0000FF">For </font>iOption = 2 <font color="#0000FF">To </font>3 <font color="#0000FF">Step </font>1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strTmpText = QP.ExtractFilePageText("C:\Temp\N09A.pdf", "", 1, iOption)<br>&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; iOutFileNo = FreeFile<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strOutFileName = "C:\Temp\Option=" &amp; iOption &amp; ".txt"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color="#0000FF">Open </font>strOutFileName <font color="#0000FF">For Output As</font> #iOutFileNo<br>&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TextArray = Split(strTmpText, vbCr)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color="#0000FF">For </font>i = 0 <font color="#0000FF">To UBound</font>(TextArray) - 1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color="#0000FF">Print </font>#iOutFileNo, TextArray(i)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color="#0000FF">Next </font>i<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color="#0000FF">Close </font>#iOutFileNo<br>&nbsp;&nbsp;&nbsp; <font color="#0000FF">Next </font>iOption<br><br>One row in each generated text file seems to refer to the same text. For Option=2 it looks like this:<br>67.46,526.32,#000000,1.4,"AAAAAA+ArialNarrow","ELDU 400V A-Matning, STV AH.N09A, BIOLINJE 1, HE.B34.10.01.11"<br><br>And for Option=3 the same text is:<br>"AAAAAA+ArialNarrow",#000000,13.73,67.4646,523.4061,416.9278,523.4061,416.9278,523.4061,67.4646,523.4061," ? ???&nbsp;&nbsp; ?A&nbsp; ?&nbsp; ? ? ??? ?A?&nbsp;&nbsp;&nbsp; A ? ???? ???&nbsp; ???&nbsp; ??&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "<br><br>Option=2 gives me readable text, but Option=3 doesn't.<br>Here's a link to a zip containing the two text files and the PDF used for the test.<br><a href="http://www.filehosting.org/file/details/686981/ExtractFilePageText.zip" target="_blank" rel="nofollow">http://www.filehosting.org/file/details/686981/ExtractFilePageText.zip</a><br><br></font></font>]]>
   </description>
   <pubDate>Tue, 15 Aug 2017 17:29:36 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13888.html#13888</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText : Hi Reg, strange behavior you're...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13887.html#13887</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 14 Aug 17 at 9:22PM<br /><br />Hi Reg,<div><br></div><div>That's strange behavior you're describing.</div><div>In my experience the extract functions are the most stable ones in the library.</div><div>What you should do is:</div><div>Post the relevant code snippet here, so somebody can perhaps spot a problem in your code.</div><div>Upload the PDF you're working with to a free file hoster, so we can try our own extractions and see whether the problem is the PDF itself ;-)</div><div><br></div><div>Cheers and welcome here,</div><div>Ingo</div><div><br></div>]]>
   </description>
   <pubDate>Mon, 14 Aug 2017 21:22:17 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13887.html#13887</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText : Hi! I'm using ExtractFilePageText...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13886.html#13886</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2970">REGH</a><br /><strong>Subject:</strong> 3492<br /><strong>Posted:</strong> 14 Aug 17 at 5:16PM<br /><br />Hi!<br>I'm using <strong>ExtractFilePageText</strong> to try to extract text strings and their bounding box coordinates. But when I use option 3 (<em>Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text</em>) I don't get the text in (human) readable format. None of the options that return readable text give me the bounding box coordinates. Is there a way to work around this?<br><br>I tried making two extractions (option=2 and option=3), putting the results into two different arrays, and then merging them. But when I use option 2 some text objects are read twice, which gives me two arrays with different numbers of entries...<br><br>]]>
   </description>
   <pubDate>Mon, 14 Aug 2017 17:16:03 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext_topic3492_post13886.html#13886</guid>
  </item> 
 </channel>
</rss>