
Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Searching a string in an existing PDF file

balane78 (Beginner, joined 13 Feb 12, Paris)
Posted: 15 Feb 12 at 1:58PM
Hi,
Sorry for this newbie question, which will most probably look stupid, but I have been crawling through the documentation since yesterday.
Which function should I use to search for a predefined string inside a PDF file and get the page number?

edvoigt (Senior Member, joined 26 Mar 11, Berlin, Germany)
Posted: 15 Feb 12 at 5:27PM
Hi,

the normal way takes more than one step.

You go through your document page by page. Depending on your goal, you either stop at the first result or build a list of the places where the search string is found.

First get the text of a page with GetPageText. Depending on your needs and on what you know about the PDF you want to search, use the right extract option. On the result of GetPageText you then run the appropriate kind of search (which you have to code yourself in your program), depending on the chosen extract option. Play with the extract options and look at their output. Keep an eye on option 3 or 4.
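The page-by-page loop described above can be sketched in Python. This is only a minimal sketch under assumptions: `find_pages` and its parameters are hypothetical names, and the list `page_texts` stands in for the strings you would get by calling GetPageText once per page. No Quick PDF Library calls are made here.

```python
def find_pages(page_texts, needle, first_only=False, ignore_case=True):
    """Return the 1-based numbers of the pages whose text contains needle.

    page_texts: one extracted string per page, in page order
    (hypothetical input, e.g. the results of GetPageText).
    """
    if ignore_case:
        needle = needle.casefold()
    hits = []
    for number, text in enumerate(page_texts, start=1):
        haystack = text.casefold() if ignore_case else text
        if needle in haystack:
            hits.append(number)
            if first_only:
                break  # stop at the first result
    return hits


pages = ["Introduction and scope", "Invoice total: 42 EUR", "Appendix: invoice terms"]
print(find_pages(pages, "invoice"))                   # [2, 3]
print(find_pages(pages, "invoice", first_only=True))  # [2]
```

A plain substring test like this only works when the extract option delivers running text; with the positional options the text column would have to be pulled out of each row first.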

Important! Because a PDF is by definition not a word-processing data file, text extraction cannot guarantee to detect words as words. In a PDF a word can be drawn letter by letter and in the wrong order. So the text extraction of QuickPDF has a harder job than it seems. But in the normal case (meaning: the text is written with one font, one size and in order, without tricks) you only have problems with words that run from the end of one line to the start of the next. They will possibly come out as two words, but are only one word in your search string.
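To illustrate the "letter by letter and in the wrong order" case: if the extraction gives you fragments together with their positions, you can sort them back into reading order before searching. The sketch below assumes a made-up input format of `(x, y, text)` tuples with y growing upward; the real output of the positional extract options has to be parsed into such tuples first, and `rebuild_lines` is a hypothetical helper name.

```python
def rebuild_lines(fragments, line_tolerance=2.0):
    """Join positioned text fragments into lines in reading order.

    fragments: iterable of (x, y, text) tuples (assumed layout, y grows
    upward). Fragments whose y differs by at most line_tolerance from the
    first fragment of a line are treated as part of that line.
    """
    lines = []  # each entry: [baseline_y, [(x, text), ...]]
    # Top of the page first, i.e. largest y first.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within a line, order the fragments from left to right.
    return ["".join(t for _, t in sorted(parts, key=lambda p: p[0]))
            for _, parts in lines]


drawn = [(30, 700, "s"), (10, 700, "T"), (40, 700, "t"), (20, 700, "e"), (10, 688, "ok")]
print(rebuild_lines(drawn))  # ['Test', 'ok']
```

This handles only simple horizontal text; rotated text, multiple columns or overlapping lines need more work.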

For more information, search the forum for other posts dealing with text extraction and searching for words.


Werner



Edited by edvoigt - 15 Feb 12 at 5:35PM

balane78
Posted: 15 Feb 12 at 6:22PM
OK, thanks.
BTW, I wonder how the Acrobat Reader search tool works.

Ingo (Moderator Group, joined 29 Oct 05)
Posted: 15 Feb 12 at 7:35PM
Acrobat Reader comes with an installation of more than 100 MB...
so it's probably a bit faster ;-)

edvoigt
Posted: 15 Feb 12 at 7:56PM
Hi,

I did the following test: a very small Word text, printed to PDF with PDFCreator. It looks like this:

Test for search-
ing with acro-
bat

For better understanding, this is how it looks inside the PDF:
[(T)-15.8907(e)-2.05734(s)3.21993(t)0.721099( )-3.16695(f)7.49943(o)-6.3339(r)-4.55617( )-3.16695(s)3.21993(e)-2.05734(a)-2.05734(r)-4.55617(c)-2.05726(h)5.7217(-)333]TJ
11.52 TL
T*[(i)0.721099(n)5.7217(g)5.7217( )-27.2782(w)10.7194(i)0.721099(t)0.721099(h)5.7217( )-3.16695(a)-2.05734(c)-2.05734(r)-4.55617(o)-6.3339(-)]TJ
11.4 TL
T*[(b)-6.3339(a)-2.05734(t)0.721161( )]TJ


I have marked the word "Test" in red: it is the first four strings, (T)(e)(s)(t), of the first array.
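To read such arrays more easily, the strings can be pulled out and the kerning numbers dropped. The following is only a rough sketch, not how QuickPDF or Acrobat do it: `tj_text` is a hypothetical helper, it handles plain `( ... )` strings with the escapes `\(`, `\)` and `\\` only, and it assumes the bytes in the strings are directly readable characters, which depends on the font encoding.

```python
import re

# One literal string "( ... )" inside a TJ array; backslash escapes are
# allowed, unescaped nested parentheses are not handled.
_LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)")


def tj_text(tj_array):
    """Concatenate the literal strings of one TJ array, skipping the numbers."""
    pieces = []
    for match in _LITERAL.finditer(tj_array):
        body = match.group(0)[1:-1]  # strip the enclosing parentheses
        # Undo the simple escapes \( \) \\ ; other escapes stay as they are.
        pieces.append(re.sub(r"\\([()\\])", r"\1", body))
    return "".join(pieces)


first_word = "[(T)-15.8907(e)-2.05734(s)3.21993(t)0.721099( )]TJ"
last_line = "[(b)-6.3339(a)-2.05734(t)0.721161( )]TJ"
print(repr(tj_text(first_word)))  # 'Test '
print(repr(tj_text(last_line)))   # 'bat '
```

The `T*` operators between the arrays start a new line, which is why "search-" and "ing" end up in different text runs.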


Try a search for "search-": Acrobat Reader X does not find it! But "searching" is found.

Conclusion: Acrobat does a lot of things to get its search results. It seems to prepare the text by omitting some characters (newline, hyphen + newline). It is a little bit like a compiler ignoring comments and spaces.
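A preparation step of that kind could look like the sketch below. It is a guess at the behaviour observed in the test, not Acrobat's actual algorithm; `normalize` is a hypothetical name, and the sample string is the test text as it appears in the content stream, with `\n` for each line break.

```python
import re


def normalize(text):
    """Prepare extracted text for searching: drop "hyphen + line break"
    and turn every other run of whitespace into a single space."""
    text = re.sub(r"-\s*\n\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


extracted = "Test for search-\ning with acro-\nbat"
prepared = normalize(extracted)
print(prepared)                 # Test for searching with acrobat
print("search-" in prepared)    # False
print("searching" in prepared)  # True
```

The price of this tactic is that a genuine hyphen at the end of a line is lost too, so a compound broken at its own hyphen would be glued together into one word.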

In most cases, it finds what you are searching for.

You see, it depends on the quality of the text extraction, the preparation and the search tactics.

Werner

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.