
Debenu Quick PDF Library - PDF SDK Community Forum

Unicode text extraction?

phildick (Beginner, Poland, joined 14 Oct 09)
Posted: 14 Oct 09 at 2:36PM

Hello,

I need a PDF component for Delphi 2009 to extract text from PDF files. I installed and tested QuickPDF, trying both the Delphi 2009 and the ActiveX versions, and both extracted only ASCII text, without any international characters (Polish, in my case).

I am a little disappointed, especially since the feature list (http://www.quickpdflibrary.com/products/quickpdf/features.php) advertises "Full Unicode support".

Is there any way to extract the full text, including all characters?

Best regards,

Bartek


shimax (Beginner, Japan, joined 03 Oct 09)
Posted: 15 Oct 09 at 12:41AM

Hello, Bartek
 
As discussed in …, it seems that Unicode text extraction does not work as expected.
 
In my case, too, Japanese characters are not extracted at all.
 
I contacted support, but in a week I have received no answer beyond an acknowledgement that they got my email. So I suspect that implementing Unicode support is a very difficult task for some reason, or that they are busy with other problems or with developing new features.
 
There seem to be many Unicode-related problems in QuickPDF, not only in text extraction but in other features as well. Regrettably, the claim of full Unicode support does not hold, at least as of version 7.16.

Wheeley (Senior Member, United States, joined 30 Oct 05)
Posted: 15 Oct 09 at 1:35AM

The next release should have better Unicode support. I was told they are removing the ToPDFUnicode function; if they do that, Unicode support must be enhanced in some other way.

Wheeley

Michel_K17 (Newbie, joined 25 Jan 03, www.exp-systems.com)
Posted: 15 Oct 09 at 3:55AM

I have received the same assurances. Debenu have been very good at addressing specific issues as we raise them. On the Unicode front, we can at least now save and merge PDF files with Unicode characters in the path.

Support for Unicode characters in metadata is coming with the next beta (which is what I was waiting for). :)

For text extraction, I don't know.

Michel

Ingo (Moderator Group, joined 29 Oct 05)
Posted: 15 Oct 09 at 9:03AM

Hi All!

QuickPDF is a very complete and extensive library, and Unicode support has to touch nearly all of its modules. So please be a bit patient; I'm pretty sure it's only a matter of time ;-)

Cheers, Ingo

phildick (Beginner, Poland, joined 14 Oct 09)
Posted: 15 Oct 09 at 9:41AM

Hi Ingo,

Maybe so, but I installed the demo modules in Delphi 2009 (which is now fully Unicode), and all the string parameters are declared as AnsiString, not String. Even if that is for backward compatibility, which I completely understand, there could be a "wide string" version of every string routine, as was done in the Windows API years ago. By the way, the last non-Unicode Windows release came out in 2000 (Windows ME), so it has been almost ten years.
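As a generic illustration (plain Python, not QuickPDF or Delphi code), the sketch below round-trips a Polish phrase through 8-bit code pages, which is roughly what can happen when Unicode text is forced into a single-byte string type such as AnsiString. Characters the code page cannot represent are replaced with '?', so what survives depends on the code page. The sample phrase, the helper names `through_ansi` and `lost_chars`, and the choice of code pages are illustrative assumptions; this does not show where QuickPDF itself loses the characters.

```python
# Generic illustration: NOT QuickPDF or Delphi code.
# Round-trips text through an 8-bit ("ANSI") code page, the way a Unicode
# string can lose characters when it is forced into a single-byte string type.
from __future__ import annotations

SAMPLE = "Zażółć gęślą jaźń"  # hypothetical Polish test phrase with diacritics


def through_ansi(text: str, codepage: str) -> str:
    """Encode to a single-byte code page and decode back.

    Characters the code page cannot represent become '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def lost_chars(text: str, codepage: str) -> list[str]:
    """List the characters that did not survive the round trip."""
    # errors="replace" emits exactly one '?' per unmappable character,
    # so the input and output strings stay aligned position by position.
    converted = through_ansi(text, codepage)
    return [orig for orig, new in zip(text, converted) if orig != new]


if __name__ == "__main__":
    for cp in ("ascii", "cp1252", "cp1250"):
        print(f"{cp:7} -> {through_ansi(SAMPLE, cp)!r}  lost: {lost_chars(SAMPLE, cp)}")
```

Windows-1250 (cp1250), the usual ANSI code page on Polish systems, covers every Polish letter, whereas ASCII drops all of them and Windows-1252 keeps only "ó". An 8-bit path can therefore look correct on one machine and lose text on another, which is one reason a true wide-string API is preferable.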

Furthermore, I imported the ActiveX version, in which all parameters are passed as WideString (so it should be fully Unicode), and it produced the same result as before: only ANSI characters in the extracted text.

Best regards,

Bartek

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.