Word’s bounds

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=565
Printed Date: 22 Nov 24 at 6:49PM


Topic: Word’s bounds
Posted By: Dmitry
Subject: Word’s bounds
Date Posted: 17 Nov 06 at 5:28AM
Hi!
Please tell me how to determine the bounds (in pixels) of a specific word.

For example, I can get the bounds of a whole line using the function GetPageText(4):

"PDMFNG+CMR10", {Font}
#000000, {Color}
10.90, {Font Size}
72.0009,628.7132,512.1819,628.7132,512.1819,639.6223,72.0009,639.6223, {Bounds}
"Although there is a long history of empirical research into the deformation behaviour of" {Text line}

But how can I determine the bounds of a single word, 'history' for example?



Replies:
Posted By: ukobsa
Date Posted: 17 Nov 06 at 10:22AM
Hi Dmitry,

As far as I know, if GetPageText cannot split the text into single words, then you can't. Maybe you can make the PDF available so that someone can debug it in the source to see why it can't be split into single words and whether there is a solution.

best regards,
Uli


Posted By: swb1
Date Posted: 17 Nov 06 at 10:43AM

The PDF Specification allows for a virtually unlimited number of ways in which text can be laid out in a document, all the way from one character at a time to a paragraph wrapped down an entire page. This means that it is nearly impossible to know for certain where a word is, or that it even is a word, using text extraction methods.

 

The easiest solution to your problem lies in controlling how the document is created. If you have no control over that you are probably out of luck.



Posted By: Dmitry
Date Posted: 18 Nov 06 at 12:13AM
Thanks to all. But what does Ingo say?


Posted By: Dmitry
Date Posted: 18 Nov 06 at 4:32AM
BTW, I have just developed the following algorithm:


ASentence := Text;      // The extracted text line
AWord := ErrorWord;     // The word that needs to be underlined
WPos := Pos(AWord, ASentence);   // 1-based index of the word, 0 if not found
BefWord := Copy(ASentence, 1, WPos - 1);                 // Text before the word
AfWord := Copy(ASentence, 1, WPos + Length(AWord) - 1);  // Text up to and including the word

BM := TBitmap.Create;   // Off-screen bitmap, used only to measure text widths
try
  BM.Canvas.Font.Name := FontName;
  BM.Canvas.Font.Size := FontSize;

  WordBeginPos := BM.Canvas.TextWidth(BefWord);
  WordEndPos := BM.Canvas.TextWidth(AfWord);
finally
  BM.Free;              // Release the bitmap once the widths are known
end;

LTop := LineBounds.LeftTop.Y;
LLeft := LineBounds.LeftTop.X + 4;

Qp.SetLineColor(1, 0, 0);   // Red
Qp.DrawLine(LLeft + WordBeginPos, LTop, LLeft + WordEndPos, LTop);


But this method gives the wrong begin and end positions for AWord.


Posted By: Ingo
Date Posted: 18 Nov 06 at 2:19PM
Hi Dmitry!

I don't know enough about PDF ;-)
My thoughts are: you have the start point (in pixels) of the string, you have its end point, and you can get the length of the string. I think that with a bit of maths it should be possible to work out the position of the particular word you're searching for...?

Best regards,
Ingo
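
A minimal sketch of the "bit of maths" suggested above, written in Python only because the arithmetic is language-neutral. It assumes every character takes an equal share of the line width, which holds only roughly for a proportional font such as the one in the example, so the result is an estimate rather than the true bounds. The function name estimate_word_span and its parameters are hypothetical and not part of the Quick PDF Library. The left and right values are the first and third numbers of the bounds line quoted in the question, on the assumption that these are the x coordinates of the line's left and right edges.

```python
def estimate_word_span(line_left, line_right, sentence, word):
    """Estimate the horizontal extent of `word` inside `sentence`.

    Assumes (hypothetically) that each character occupies an equal share
    of the line width, so the result is only an approximation.
    Returns (x_start, x_end), or None if the word is not in the sentence.
    """
    start = sentence.find(word)      # 0-based index, -1 if not found
    if start < 0 or not sentence:
        return None
    width = line_right - line_left
    x_start = line_left + width * start / len(sentence)
    x_end = line_left + width * (start + len(word)) / len(sentence)
    return x_start, x_end


# Line text and left/right x values taken from the question above.
line = ("Although there is a long history of empirical research "
        "into the deformation behaviour of")
span = estimate_word_span(72.0009, 512.1819, line, "history")
print(span)
```

The estimate could probably be tightened by replacing the character counts with measured text widths of the prefix and of the whole line, then scaling by the ratio of the known line width to the measured one. That way any mismatch between the measuring font and the PDF's units would cancel out.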


