Extract text next to a Tag field
chrisreed
Team Player Joined: 29 Apr 13 Location: Australia Status: Offline Points: 35 |
Posted: 29 Apr 13 at 6:42AM
I am looking to trial QuickPDF to see if it can extract text values from a medical report as follows:
Exam Date: 23/11/2011 DOB: 26/03/1966 MRN: C1234567
Referring Dr: A, Smith Sonographer: G Perry
etc....
The idea is to locate certain predefined Tag Fields in bold and then read the text value next to them.
E.g. I would use some QuickPDF function to search for the tag Exam Date: and then read the text value right next to it (23/11/2011).
Is this something that QuickPDF can do and what function would I call?
Thanks Chris.
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Chris!
This looks as if the tag text is always in the same place. In that case you can use the text extraction functions. They offer an additional option to extract text with position data, so you can determine exactly which string position you want to read.
Another approach: search the extracted text content of a page for your tags with "pos" (in Delphi) and take the text that follows.
This function you can use for my ideas:
Cheers and welcome here, Ingo
Edited by Ingo - 29 Apr 13 at 7:30AM
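The second idea (search the extracted text for a tag and take what follows) could be sketched roughly as below. This is only an illustration in Python rather than Delphi, and it assumes the page text has already been extracted into a plain string by whatever extraction call you use; `extract_tag_values` and `TAGS` are hypothetical names, not part of Quick PDF Library.

```python
# Tag names taken from the sample report in the first post.
TAGS = ["Exam Date:", "DOB:", "MRN:", "Referring Dr:", "Sonographer:"]


def extract_tag_values(page_text, tags=TAGS):
    """Map each tag found in page_text to the text that follows it.

    A value ends at the next known tag or at the end of the line,
    whichever comes first. Only the first occurrence of each tag is used.
    """
    # Record where each tag first occurs (the equivalent of Delphi's Pos).
    hits = []
    for tag in tags:
        pos = page_text.find(tag)
        if pos != -1:
            hits.append((pos, tag))
    hits.sort()

    values = {}
    for i, (pos, tag) in enumerate(hits):
        start = pos + len(tag)
        end = page_text.find("\n", start)
        if end == -1:
            end = len(page_text)
        # Stop early if another tag starts on the same line.
        if i + 1 < len(hits):
            end = min(end, hits[i + 1][0])
        values[tag] = page_text[start:end].strip()
    return values


# Sample text modelled on the report header in the first post.
sample = (
    "Exam Date: 23/11/2011 DOB: 26/03/1966 MRN: C1234567\n"
    "Referring Dr: A, Smith Sonographer: G Perry\n"
)
print(extract_tag_values(sample))
```

A plain substring search like this cannot tell a bold tag from the same words in the body of the report, so it fits best when the tags are known to be unique on the page.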
AndrewC
Moderator Group Joined: 08 Dec 10 Location: Geelong, Aust Status: Offline Points: 841 |
Chris, Andrew.
chrisreed
Team Player Joined: 29 Apr 13 Location: Australia Status: Offline Points: 35 |
Yes the Header text will always be in the same position, but there are other Tags further down in the report which can be anywhere. I was hoping to search on certain key tag names that were also in BOLD format so I don't accidentally choose similar text in the main report.
It looks like I can use SetTextExtractionOptions to get the details of the word formats (Font, Colour, Size etc..) but I wonder if that also includes whether it is bold or not?
Anyway thanks for that info. I will install the trial version and give it a go.
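A rough sketch of the bold-tag idea follows. It assumes the extraction output has already been parsed into text runs that carry a font name, and that a bold run can be recognised by "Bold" appearing in that font name. Both are assumptions: the thread does not confirm what the extraction options return, and the `Run` record and the helper names are hypothetical.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """Hypothetical record for one extracted text run."""
    font_name: str
    text: str


def is_bold(font_name):
    # Heuristic only: many bold fonts carry "Bold" in their name.
    return "bold" in font_name.lower()


def values_after_bold_tags(runs, tag_names):
    """Return {tag: text of the run that follows it} for bold tag runs only."""
    values = {}
    for i, run in enumerate(runs[:-1]):
        label = run.text.strip()
        if label in tag_names and is_bold(run.font_name):
            values[label] = runs[i + 1].text.strip()
    return values


runs = [
    Run("ArialMT", "Exam Date:"),       # same words, not bold: ignored
    Run("ArialMT", "unknown"),
    Run("Arial-BoldMT", "Exam Date:"),  # bold tag: accepted
    Run("ArialMT", "23/11/2011"),
]
print(values_after_bold_tags(runs, {"Exam Date:"}))
```

If the real output splits a tag such as "Exam Date:" into separate words, the runs would need to be joined before matching.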
chrisreed
Team Player Joined: 29 Apr 13 Location: Australia Status: Offline Points: 35 |
Well I got it partially working but I have come across a few things I don't understand:
If I use the function LoadFromFile to open a PDF then the HasFontResources function returns "1" (ie. PDF document has NOT been scanned in and so has readable text).
If I open the same file using DAOpenFile or DAOpenFileReadOnly it returns "0". Why is this?
Also is the general idea to use DA Functions only with other DA Functions....
ie. DAOpenFileReadOnly -> DASetTextExtractionOptions -> DAExtractPageText
and LoadFromFile -> SetTextExtractionOptions -> ExtractFilePageText
or can we mix and match between the different functions?
Chris
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Chris!
Don't mix DA and non-DA functions ;-)
DA functions need less memory, so they help avoid memory problems when working on large documents.
Cheers, Ingo
chrisreed
Team Player Joined: 29 Apr 13 Location: Australia Status: Offline Points: 35 |
Thanks for that Ingo - any idea about the HasFontResources problem?
Chris
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Chris!
Which problem? HasFontResources is a non-DA function, so don't use it with DAOpen... A DA function begins with "DA" ;-)
Cheers, Ingo
chrisreed
Team Player Joined: 29 Apr 13 Location: Australia Status: Offline Points: 35 |
Ah! I see, but there doesn't seem to be an equivalent DAHasFontResources function, so how can I determine if a PDF has been scanned in (ie. an image) or has been created (ie. has text)?
Chris
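One possible workaround, which nobody in this thread has confirmed, is to skip the font check and test the extracted text instead: extract each page with the DA functions already in use and treat the file as scanned when no text comes back. The sketch below covers only that final check. It assumes the per-page text has already been collected into a list, and `looks_scanned` and `min_chars` are hypothetical names.

```python
def looks_scanned(page_texts, min_chars=1):
    """Return True when the pages hold fewer than min_chars visible characters.

    page_texts is a list with one extracted text string per page.
    Whitespace is ignored so that blank lines do not count as text.
    """
    visible = sum(len("".join(text.split())) for text in page_texts)
    return visible < min_chars


print(looks_scanned(["", "  \n"]))                # no visible text on any page
print(looks_scanned(["Exam Date: 23/11/2011"]))   # visible text present
```

This is a heuristic. A scanned file that has been through OCR and carries a hidden text layer would be reported as containing text.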