SetTextExtractionOptions
steve
Beginner Joined: 15 May 12 Status: Offline Points: 3 |
Posted: 15 May 12 at 12:32PM
Hi,
Has anyone managed to use SetTextExtractionOptions?

In my scenario I'm trying to extract text from a PDF which has a "small caps" font effect, e.g. EXTRACT ME, which is being extracted as E XTRACT M E.

By default Quick PDF Library seems to use changes in font style / size as a cue for word boundary detection, a feature I was hoping I could disable by setting:

    int setSuccess = _pdfLibrary.SetTextExtractionOptions(1, 1); // returns 1, so the option is being set
    string pageText = _pdfLibrary.GetPageText(4);

where OptionId 1 = "Use Font information matching when grouping to separate text blocks" and the value 1 = "Ignore".

Any suggestions or help much appreciated!

Steve
steve
Beginner Joined: 15 May 12 Status: Offline Points: 3 |
Support kindly responded with the following. Using (3,1) helped me in the majority of cases:
"I also find that using all three options works pretty well.
SetTextExtractionOptions(1,1);
SetTextExtractionOptions(2,1);
SetTextExtractionOptions(3,1);
You may find that option 6 improves the results slightly also.
SetTextExtractionOptions(6,1);"
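For anyone who wants to apply several of these options in one go, here is a minimal sketch in C#. It assumes only what is reported in this thread: that SetTextExtractionOptions takes two ints and returns 1 when the option is set. The helper class and method names (ExtractionHelper, ApplyOptions) are hypothetical and not part of the library; the setter is passed in as a delegate so the sketch does not depend on any particular library class name.

```csharp
using System;
using System.Collections.Generic;

public static class ExtractionHelper
{
    // Hypothetical helper, not part of Quick PDF Library.
    // setOption is expected to wrap SetTextExtractionOptions(optionId, newValue).
    // Each option is set to 1; any option whose call does not return 1
    // (the success value reported in this thread) is collected and returned.
    public static List<int> ApplyOptions(Func<int, int, int> setOption, params int[] optionIds)
    {
        var rejected = new List<int>();
        foreach (int id in optionIds)
        {
            if (setOption(id, 1) != 1)
            {
                rejected.Add(id);
            }
        }
        return rejected;
    }
}
```

Assuming _pdfLibrary is the same object as in the first post, usage might look like `var rejected = ExtractionHelper.ApplyOptions(_pdfLibrary.SetTextExtractionOptions, 1, 2, 3, 6);` followed by the usual `_pdfLibrary.GetPageText(4);`. An empty list would mean every option reported success.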