Tip for extracting Unicode text from PDF files
Rowan
Moderator Group Joined: 10 Jan 09 Status: Offline Points: 398 |
Posted: 29 Nov 10 at 1:12PM |
(I'll preface this tip by saying that we're going to make a change in 2011 that will render this tip outdated pretty quickly, but in the meantime, this should help people who are having trouble extracting Unicode text.)
For all of the text extraction functions there is a sentence that often gets ignored: "The result is encoded using UTF-8 in the Delphi and DLL editions of the library." I've forgotten about this several times myself. The strings returned by the DLL and Delphi editions are UTF-8 strings and as such need to be decoded before you will see the Unicode characters. The reason this issue is so easy to overlook is that if there are no Unicode characters in your string then GetPageText will appear to function completely normally.

If you're using Delphi, one way to handle the UTF-8 string is to write it to a file preceded by the UTF-8 byte order mark, so that text editors decode it correctly:

    var
      QP: TQuickPDF;
      S: AnsiString;
      FS: TFileStream;
      UTF8BOM: AnsiString;
    begin
      QP := TQuickPDF.Create;
      try
        QP.UnlockKey(' license key here ');
        QP.LoadFromFile('license.pdf');
        // GetPageText returns UTF-8 encoded bytes in the Delphi edition
        S := QP.GetPageText(0);
        FS := TFileStream.Create('license.txt', fmCreate);
        try
          // Write the UTF-8 byte order mark so editors detect the encoding
          UTF8BOM := #$EF#$BB#$BF;
          FS.Write(UTF8BOM[1], Length(UTF8BOM));
          if Length(S) > 0 then
            FS.Write(S[1], Length(S));
        finally
          FS.Free;
        end;
      finally
        QP.Free;
      end;
    end;

In a future version of Quick PDF (maybe 7.24) we will move away from 8-bit strings so that the Delphi, DLL and ActiveX editions all use 16-bit strings. This should help avoid a lot of the confusion.

Originally there were only 8-bit strings in QPL. The current situation is a compromise: most of the functions still use 8-bit strings, and some of the functions return Unicode strings. For the ActiveX edition these are UTF-16 strings. For the Delphi and DLL editions we had to keep compatibility, so the solution was to use UTF-8 strings.

Sorry for any confusion that this has caused any of you.

Cheers,
- Rowan.
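The Delphi snippet above writes the UTF-8 bytes straight to a file. If you need the text in memory as a 16-bit string instead, something like the following may work. This is only a minimal sketch, not part of the library: it uses the RTL function UTF8Decode, and a hard-coded byte array (Utf8Bytes, a made-up stand-in) takes the place of the AnsiString that GetPageText would return, so it can be compiled without Quick PDF installed.

```pascal
program DecodeUtf8Sketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Hypothetical sample data: the UTF-8 encoding of "café €".
  // "é" (U+00E9) is C3 A9 and "€" (U+20AC) is E2 82 AC.
  Utf8Bytes: array[0..8] of Byte =
    ($63, $61, $66, $C3, $A9, $20, $E2, $82, $AC);

var
  Raw: AnsiString;   // stands in for the result of QP.GetPageText(0)
  Wide: WideString;  // UTF-16 text after decoding

begin
  // Copy the raw bytes into an 8-bit string without any code page conversion
  SetLength(Raw, Length(Utf8Bytes));
  Move(Utf8Bytes[0], Raw[1], Length(Utf8Bytes));

  // Decode the UTF-8 bytes into a 16-bit string
  Wide := UTF8Decode(Raw);

  WriteLn('Bytes in UTF-8 string: ', Length(Raw));
  WriteLn('UTF-16 code units:     ', Length(Wide));
end.
```

The nine UTF-8 bytes decode to six UTF-16 code units, which is a quick way to confirm that the multi-byte sequences were actually interpreted rather than copied byte for byte. In real code you would replace the byte array with the string returned by the text extraction function.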
hbarclay
Team Player Joined: 29 Oct 05 Location: United States Status: Offline Points: 39 |
I assume there will be overloaded functions for Delphi, so that existing code using AnsiStrings will continue to work and people can make the change to wide strings when they are ready.

Thanks,
Harry
Rowan
Moderator Group Joined: 10 Jan 09 Status: Offline Points: 398 |
Hi Harry,
Yes, we always try to maintain backwards compatibility, so we'll do our best not to make anyone's life difficult in upgrading to the new version when we make the change to wide strings.

Cheers,
- Rowan.