
How to optimize text extraction?

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: General Discussion
Forum Description: Discussion board for Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=669
Printed Date: 23 Nov 24 at 4:06AM


Topic: How to optimize text extraction?
Posted By: Dmitry
Subject: How to optimize text extraction?
Date Posted: 11 Mar 07 at 3:40AM
Hi all!
I have a question: how can I reduce the execution time of the GetPageText function? It takes about one second per page on average, which is too long for me :-)

If I understand correctly, during text extraction the qPDF library also extracts all images from the page. Would it be faster not to extract the images and save them to the hard drive?



Replies:
Posted By: marian_pascalau
Date Posted: 11 Mar 07 at 1:48PM
Dmitry,
there is only one way to influence text extraction: the Option parameter.
 
As you may know, it accepts five values:
0: contents scan
1: internally the same as 0
2: contents scan, CSV output
3: CSV text collection with rendering (may read the image dictionary)
4: CSV text collection with rendering and word separation
 
For your information, using options 0-2 may bring some improvement.
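
To see how much each Option value actually costs on your own documents, a small timing harness may help. The sketch below is only an illustration in Python: `time_options` and `fake_extract` are hypothetical names, and `fake_extract` is a stand-in that you would replace with your real call to GetPageText for the selected page. It does not use the library itself.

```python
import time
from typing import Callable, Dict, Iterable


def time_options(
    extract: Callable[[int], str],
    options: Iterable[int] = (0, 2, 3, 4),
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time in seconds for each option value."""
    results: Dict[int, float] = {}
    for option in options:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            extract(option)  # the extracted text is ignored, only time matters
            best = min(best, time.perf_counter() - start)
        results[option] = best
    return results


def fake_extract(option: int) -> str:
    # Hypothetical stand-in: replace the body with the real GetPageText call.
    # The sleep only simulates higher options being slower.
    time.sleep(0.001 * (option + 1))
    return f"text for option {option}"


if __name__ == "__main__":
    for option, seconds in time_options(fake_extract).items():
        print(f"Option {option}: {seconds * 1000:.1f} ms")
```

Taking the best of several runs reduces the influence of disk caching and other background activity on the comparison.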


Posted By: Dmitry
Date Posted: 12 Mar 07 at 4:38AM
marian_pascalau, yes I know. But I need exactly parameter 5.


Posted By: Ingo
Date Posted: 12 Mar 07 at 5:00AM
". . .
qPDF library extract also all images from page
. . ."

Hi!

The current library version doesn't extract the images anymore.

Best regards,
Ingo
 


Posted By: marian_pascalau
Date Posted: 12 Mar 07 at 5:39AM
Hi Dmitry, Hi Ingo,
I cannot follow either of you:
Dmitry, what do you mean by parameter 5?
Ingo, is it now working as expected, or is this an error?
 
Marian


Posted By: Ingo
Date Posted: 12 Mar 07 at 7:11AM
Hi Marian!

It's working as expected...
I think this was fixed months ago...
Here's a thread pointing in the same direction:
http://www.quickpdf.org/forum/search_results_posts.asp?SearchID=20070312070924&KW=asachoi

Best regards,
Ingo



Posted By: Dmitry
Date Posted: 13 Mar 07 at 5:22AM
marian_pascalau, sorry, I meant parameter 4

Ingo, please just give me a direct link to the thread.


Posted By: marian_pascalau
Date Posted: 13 Mar 07 at 5:27AM
Dmitry, if you would consider a sponsorship, I will try to optimize the text extraction (Option=4) for you. Otherwise you should use option 2 and split the text with your own program.
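
Splitting the option 2 output yourself could look roughly like the Python sketch below. It rests on assumptions that are not confirmed in this thread: that each output line is one CSV record and that the text is the last field. The sample columns are invented for illustration, and `words_from_csv_text` is a hypothetical helper, so check the real column layout before relying on it.

```python
import csv
import io
from typing import List


def words_from_csv_text(csv_text: str, text_column: int = -1) -> List[str]:
    """Split the text field of every CSV record into whitespace-separated words.

    Assumption: the text is in the last field (text_column=-1) of each record.
    """
    words: List[str] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        words.extend(row[text_column].split())
    return words


# Invented sample data: font, size, x, y, text (not the library's real layout).
sample = (
    '"Arial",12,72.0,700.0,"Hello PDF world"\n'
    '"Arial",12,72.0,680.0,"second line"\n'
)

if __name__ == "__main__":
    print(words_from_csv_text(sample))
```

This only splits on whitespace inside each text block. It does not use position data to detect words that span two blocks, which is presumably part of what option 4 does.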


Posted By: Dmitry
Date Posted: 13 Mar 07 at 6:28AM
marian_pascalau
No, thanks


