Debenu Quick PDF Library - PDF SDK Community Forum Homepage

DASetTextExtractionArea with different origin

Cirunz (Beginner; joined 18 Mar 14; Rome)
Posted: 18 Mar 14 at 2:32PM
Hi, I'm trying to extract text from a specific area in a large number of PDF files.
My first approach is to loop over the files, open each one, select the page and extract the text with GetPageText:
//Code to initialize the DLL reference DPDF goes here
int mode = 7;
List<string> foundlines = new List<string>();
for (int i = 0; i < pdffiles.Length; i++)
{
    if (DPDF.LoadFromFile(pdffiles[i], "") != 0)
    {
        if (DPDF.SelectPage(1) != 0) //I'm always searching the first page
        {
            DPDF.SetMeasurementUnits(1); //Millimeters
            DPDF.SetOrigin(1); //Top-left corner

            //field contains the extraction area data
            if (DPDF.SetTextExtractionArea(field.Left, field.Top, field.Width, field.Height) == 1)
            {
                foundlines.Add(DPDF.GetPageText(mode).ToString().Trim());
            }
            DPDF.RemoveDocument(DPDF.SelectedDocument());
        }
        else
        {
            errormessage = "SelectPage: " + pdffiles[i];
            break;
        }
    }
    else
    {
        errormessage = "LoadFromFile: " + pdffiles[i];
        break;
    }
} //Extraction loop ends here

if (string.IsNullOrEmpty(errormessage))
{
    if (foundlines != null && foundlines.Count > 0)
    {
        File.WriteAllLines(@"C:\resultlines.txt", foundlines.ToArray());
        result = true;
    }
}

It works fine, but it's not very fast and it uses a lot of memory.
Worried by these results, I chose to try ExtractFilePageText, to keep CPU and memory usage low.
So I changed the loop above as follows:
int mode = 7;
List<string> foundlines = new List<string>();
DPDF.SetMeasurementUnits(1); //Millimeters
DPDF.SetOrigin(1); //Top-left corner
for (int i = 0; i < pdffiles.Length; i++)
{
    //field contains the extraction area data
    if (DPDF.DASetTextExtractionArea(field.Left, field.Top, field.Width, field.Height) == 1)
    {
        foundlines.Add(DPDF.ExtractFilePageText(pdffiles[i], "", 1, mode).ToString().Trim());
    }
} //Extraction loop ends here

if (foundlines != null && foundlines.Count > 0)
{
    File.WriteAllLines(@"C:\resultlines.txt", foundlines.ToArray());
    result = true;
}

This does not find anything.
There is a simple explanation for this: the documentation says DASetTextExtractionArea is relative to the bottom-left corner of the page, and does not mention any way to make SetOrigin (or SetMeasurementUnits) affect this function.

Is there no way to do so? Can ExtractFilePageText only be used with the default origin?

Thank you.

AndrewC (Moderator Group; joined 08 Dec 10; Geelong, Aust)
Posted: 19 Mar 14 at 10:58AM
Cirunz,
Yes. It is a complex thing to explain. The DA functions do not support SetOrigin, as SetOrigin is not a DA-supported function. You cannot normally mix DA and non-DA functions, as they use different code to process the file. The exception to this rule is that most of the Extract* functions use the DA code and not the non-DA code.

You need to adjust the Y position by calling 

YPos := QP.DAGetPageHeight(dahandle, dapageref) - YPos;

Andrew.
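
For readers landing on this thread, Andrew's adjustment could be sketched in C# roughly as below. This is only a sketch under assumptions: `DaAreaHelper` and `FromTopLeftMm` are hypothetical names, not part of the library; the page height is assumed to be obtained separately from DAGetPageHeight; and the DA functions are assumed to work in points rather than millimeters (the thread only confirms that SetOrigin is ignored, so check the unit handling against the documentation).

```csharp
using System;

// Hypothetical helper, not part of Debenu Quick PDF Library.
public static class DaAreaHelper
{
    // 1 inch = 25.4 mm = 72 points.
    public const double PointsPerMm = 72.0 / 25.4;

    // Converts a rectangle measured in millimeters from the top-left corner
    // into (left, top, width, height) in points measured from the bottom-left
    // corner. pageHeightPt is assumed to be the value of DAGetPageHeight.
    public static (double Left, double Top, double Width, double Height) FromTopLeftMm(
        double leftMm, double topMm, double widthMm, double heightMm, double pageHeightPt)
    {
        double left = leftMm * PointsPerMm;
        // Same flip as in Andrew's line: YPos = PageHeight - YPos.
        double top = pageHeightPt - topMm * PointsPerMm;
        double width = widthMm * PointsPerMm;
        double height = heightMm * PointsPerMm;
        return (left, top, width, height);
    }
}
```

The four returned values would then be passed to DASetTextExtractionArea before calling ExtractFilePageText. Only the Y position is flipped; the left edge and the size are just converted to points.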

Cirunz (Beginner; joined 18 Mar 14; Rome)
Posted: 19 Mar 14 at 11:03AM
Originally posted by AndrewC:

You need to adjust the Y position by calling 

YPos := QP.DAGetPageHeight(dahandle, dapageref) - YPos;

Andrew.

Thank you Andrew, this is really helpful.
I have a mixed scenario, so I will use this function to adjust the coordinates, depending on the case.

Thanks again.
Fabio.