
SetTextExtractionOptions

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: Sample Code
Forum Description: Share Debenu Quick PDF Library sample code with other forum members
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2266
Printed Date: 01 May 24 at 12:46AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: SetTextExtractionOptions
Posted By: steve
Subject: SetTextExtractionOptions
Date Posted: 15 May 12 at 12:32PM
Hi,

Has anyone managed to use SetTextExtractionOptions?

In my scenario I'm trying to extract text from a PDF that uses a "small caps" font effect, e.g.
EXTRACT ME
which is being extracted as

E XTRACT M E

By default, Quick PDF Library seems to use changes in font style or size as a cue for word boundary detection. I was hoping to disable that behaviour by setting:

int setSuccess = _pdfLibrary.SetTextExtractionOptions(1, 1); // returns 1, so the option is being set
string pageText = _pdfLibrary.GetPageText(4);

Here option ID 1 = "Use font information matching when grouping to separate text blocks", and value 1 = "Ignore".



Any suggestions or help much appreciated!

Steve



Replies:
Posted By: steve
Date Posted: 15 Jun 12 at 10:29AM
Support kindly responded with the following. Using (3,1) helped me in the majority of cases:

"I also find that using all three options works pretty well.

  SetTextExtractionOptions(1,1);
  SetTextExtractionOptions(2,1);
  SetTextExtractionOptions(3,1);

You may find that option 6 also improves the results slightly.

  SetTextExtractionOptions(6,1);
"
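For anyone wanting to apply the suggestion from Support in one go, here is a minimal sketch rather than a definitive implementation. It sets options 1, 2, 3 and 6 to value 1 and collects any option for which the call does not return 1 (the success value reported in the first post). The ITextExtractor interface, its parameter names, the ApplyOptions helper and the FakeExtractor class are all hypothetical stand-ins so that the snippet compiles without the SDK; only the two method names and the argument values come from this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two library calls used in this thread.
// A real wrapper would forward to the library object (_pdfLibrary above).
public interface ITextExtractor
{
    int SetTextExtractionOptions(int optionId, int newValue);
    string GetPageText(int extractOptions);
}

public static class ExtractionSetup
{
    // Sets each listed option to 1 and returns the IDs whose call did not
    // return 1 (assumed here to mean "not applied", as in the first post).
    public static List<int> ApplyOptions(ITextExtractor pdf, params int[] optionIds)
    {
        var rejected = new List<int>();
        foreach (int id in optionIds)
        {
            if (pdf.SetTextExtractionOptions(id, 1) != 1)
            {
                rejected.Add(id);
            }
        }
        return rejected;
    }
}

// Fake implementation used only to exercise the helper; it accepts every option.
public sealed class FakeExtractor : ITextExtractor
{
    public readonly Dictionary<int, int> Options = new Dictionary<int, int>();

    public int SetTextExtractionOptions(int optionId, int newValue)
    {
        Options[optionId] = newValue;
        return 1;
    }

    public string GetPageText(int extractOptions) => "(page text placeholder)";
}

public static class Program
{
    public static void Main()
    {
        var pdf = new FakeExtractor();

        // Options 1, 2, 3 and 6 are the ones suggested by Support.
        List<int> rejected = ExtractionSetup.ApplyOptions(pdf, 1, 2, 3, 6);
        if (rejected.Count > 0)
        {
            Console.WriteLine("Options not applied: " + string.Join(", ", rejected));
        }

        // Same argument as in the first post.
        Console.WriteLine(pdf.GetPageText(4));
    }
}
```

Checking each return value separately makes it easier to see which option, if any, the library refused, instead of only looking at the last call.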




