Extract text next to a Tag field

chrisreed (Team Player; joined 29 Apr 13; Australia)
Posted: 29 Apr 13 at 6:42AM
I am looking to trial QuickPDF to see if it can extract text values from a medical report as follows:
 
Exam Date: 23/11/2011     DOB: 26/03/1966     MRN: C1234567
 
Referring Dr: A, Smith      Sonographer: G Perry
 
etc....
 
The idea is to locate certain predefined Tag Fields in bold and then read the text value next to them.
 
eg. I would use some QuickPDF function to search for the Tag Exam Date: and then read in the text value right next to this (23/11/2011).
 
Is this something that QuickPDF can do and what function would I call?
 
Thanks Chris.

Ingo (Moderator Group; joined 29 Oct 05)
Posted: 29 Apr 13 at 7:30AM
Hi Chris!
 
It looks as if the tag text is always in the same place.
In that case you can use the text extraction functions. They offer
an additional option to extract text together with position data, so
you can pick out exactly the string at the position you want.
Another approach: search with "Pos" (in Delphi) through
the extracted text content of a page for your tags and take
the text that follows each one.
This function you can use for my ideas:
 
Cheers and welcome here,
Ingo
 


Edited by Ingo - 29 Apr 13 at 7:30AM

AndrewC (Moderator Group; joined 08 Dec 10; Geelong, Aust)
Posted: 29 Apr 13 at 7:47AM
Chris,

As Ingo suggests, the easiest option to get working would be to use GetPageText(7) to get the formatted raw text and then do some string searching to find the text you need.  QPL has no such concept as "near to" or "to the right of".

Andrew.
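
The string-search approach suggested above can be sketched roughly as follows. This is a minimal Python sketch rather than library code: it assumes the page text has already been extracted into a plain string (for example via GetPageText(7)), and the TAGS list and the extract_tag_values helper are hypothetical names made up for illustration. It does not check for bold formatting.

```python
import re

# Hypothetical tag list; a real report may use different labels.
TAGS = ["Exam Date:", "DOB:", "MRN:", "Referring Dr:", "Sonographer:"]


def extract_tag_values(page_text, tags=TAGS):
    """Return {tag: value}, where value is the text between a tag
    and the next known tag or the end of the line."""
    # One alternation of all tags, so a value stops where the next tag starts.
    tag_pattern = "|".join(re.escape(t) for t in tags)
    pattern = re.compile(
        r"(?P<tag>" + tag_pattern + r")[ \t]*(?P<value>.*?)[ \t]*"
        r"(?=" + tag_pattern + r"|$)",
        re.MULTILINE,
    )
    return {m.group("tag"): m.group("value") for m in pattern.finditer(page_text)}


# Sample text taken from the report header quoted in this thread.
sample = (
    "Exam Date: 23/11/2011     DOB: 26/03/1966     MRN: C1234567\n"
    "Referring Dr: A, Smith      Sonographer: G Perry\n"
)
values = extract_tag_values(sample)
print(values["Exam Date:"])
```

Stopping each value at the next known tag (or the end of the line) is what lets several tag/value pairs share one line, as in the header above.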

chrisreed
Posted: 29 Apr 13 at 12:23PM
Yes, the header text will always be in the same position, but there are other tags further down in the report which can be anywhere.  I was hoping to search on certain key tag names that are also in bold, so I don't accidentally pick up similar text in the main report.
 
It looks like I can use SetTextExtractionOptions to get the details of the word formats (font, colour, size etc.), but I wonder if that also includes whether the text is bold or not?
 
Anyway thanks for that info.  I will install the trial version and give it a go.
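
One way the bold check might be approached is sketched below in Python. Nothing in this thread confirms that the library reports boldness directly, so the sketch relies on a common heuristic: bold faces usually carry "Bold" in the font name. It also assumes the text has been extracted as CSV rows laid out as font name, colour, size, eight corner coordinates, then the text itself. That column layout, the sample rows, and the bold_text_pieces helper are all assumptions for illustration and should be checked against the library's documentation.

```python
import csv
import io


def bold_text_pieces(csv_text):
    """Yield (text, x, y) for rows whose font name looks bold.

    Assumed row layout (not confirmed by this thread):
    font name, colour, size, x1, y1, x2, y2, x3, y3, x4, y4, text
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 12:
            continue  # skip blank or malformed rows
        font_name, text = row[0], row[-1]
        # Heuristic: treat any font whose name contains "bold" as bold.
        if "bold" in font_name.lower():
            yield text, float(row[3]), float(row[4])


# Made-up sample rows in the assumed layout.
sample_csv = (
    '"Arial-BoldMT",0,10,50,700,110,700,110,712,50,712,"Exam Date:"\n'
    '"ArialMT",0,10,115,700,170,700,170,712,115,712,"23/11/2011"\n'
)
bold = list(bold_text_pieces(sample_csv))
print(bold)
```

The returned coordinates could then be used to look for the non-bold piece of text that starts just to the right of each bold tag on the same line.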
 
 
chrisreed
Posted: 30 Apr 13 at 10:04AM
Well I got it partially working but I have come across a few things I don't understand:
 
If I use the function LoadFromFile to open a PDF, then the HasFontResources function returns "1" (i.e. the PDF document has NOT been scanned in and so has readable text).
 
If I open the same file using DAOpenFile or DAOpenFileReadOnly, it returns "0". Why is this?
 
 
Also is the general idea to use DA Functions only with other DA Functions....
ie. DAOpenFileReadOnly -> DASetTextExtractionOptions -> DAExtractPageText
and LoadFromFile -> SetTextExtractionOptions -> ExtractFilePageText
 
or can we mix and match between the different functions?
 
Chris

Ingo
Posted: 30 Apr 13 at 10:44AM
Hi Chris!
 
Don't mix DA- and non-DA-functions ;-)
DA-functions need less memory, so they
help you avoid memory problems while working
on large documents.
 
Cheers, Ingo
 
chrisreed
Posted: 30 Apr 13 at 10:49AM
Thanks for that Ingo - any idea about the HasFontResources problem?
 
Chris

Ingo
Posted: 30 Apr 13 at 10:56AM
Hi Chris!
 
Which problem?
HasFontResources is a non-DA-function - so don't use it with DAOpen...
A DA-function begins with "DA" ;-)
 
Cheers, Ingo
 
chrisreed
Posted: 30 Apr 13 at 11:02AM
Ah! I see, but there doesn't seem to be an equivalent DAHasFontResources function, so how can I determine if a PDF has been scanned in (ie. an image) or has been created (ie. has text)?
 
Chris