DAExtractPageText problem
dpreznik
Beginner Joined: 03 Dec 10 Status: Offline Points: 6
Posted: 03 Dec 10 at 5:53PM
Dear experts,
I am trying to create an application in C# to extract text from a PDF. I am using the DAExtractPageText() method, but the text it returns is distorted: some characters are missing, and blank spaces are inserted here and there within words.
Could you please tell me if it is possible to fix it?
Thank you very much,
Dmitriy
Paddy
Beginner Joined: 24 Mar 10 Status: Offline Points: 8
Are you using the DLL edition or the ActiveX edition? And also, does your PDF contain any Unicode characters?
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524
Hi Dmitriy!
Try option "0". Is the result the same, or is it better?
Generally speaking, text is extracted in the order in which it was inserted into the page content: first in, first out. Suppose the first word on a page is "ello", and the author only notices this after finishing the page and then inserts an "H" in front of "ello". During extraction, the "H" comes out at the end of the page content, not before "ello".
With option "4" you can extract the text word by word, together with position data. Using this position data you can reconstruct the real text rows on your own. Quick PDF has no built-in support for that.
BTW, a small warning: don't mix DA functions with non-DA functions. This won't work ;-)
Cheers and welcome here,
Ingo
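The row reconstruction Ingo describes could be sketched roughly as follows. This is only an illustration in Python and does not call Quick PDF at all: the `Word` record, the `rebuild_rows` function, and the `row_tolerance` value are hypothetical, and it assumes the word-by-word output has already been parsed into a text plus an x/y position per word.

```python
from collections import namedtuple

# Hypothetical record: one extracted word and the position of its box.
# The real field layout of the library's output is NOT assumed here.
Word = namedtuple("Word", "text x y")


def rebuild_rows(words, row_tolerance=2.0):
    """Group words into rows by y coordinate, then order each row by x."""
    rows = []  # each entry: [reference_y, [Word, ...]]
    # Assumption: y grows upwards (as in PDF user space), so the largest y
    # is the top row. Sorting by -y yields rows from top to bottom.
    for word in sorted(words, key=lambda w: -w.y):
        if rows and abs(rows[-1][0] - word.y) <= row_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append([word.y, [word]])
    return [
        " ".join(w.text for w in sorted(row, key=lambda w: w.x))
        for _, row in rows
    ]


# Words listed in "insertion order", which is not reading order.
words = [
    Word("world", 110.0, 700.0),
    Word("second", 72.0, 686.0),
    Word("Hello", 72.0, 700.0),
    Word("row", 115.0, 686.5),
]
print(rebuild_rows(words))
```

The tolerance has to be tuned per document: too small and slightly misaligned words on one line are split into separate rows, too large and neighbouring rows are merged.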
dpreznik
Beginner Joined: 03 Dec 10 Status: Offline Points: 6
I am using the DLL edition. I am not sure whether my PDF contains Unicode characters.
dpreznik
Beginner Joined: 03 Dec 10 Status: Offline Points: 6
Hi Ingo,
Thank you for your answer. No, it is not better.
Probably that is what happened to me. And I think there is no solution for it that I could use.
Thank you very much for the warning.
May I ask one more question?
I found Quick PDF Lite. Would it support extracting images from a PDF document? I tried it, but I don't yet know how to use its methods, which are different from those in the Professional Quick PDF.
I would use it with C++.
Thank you very much.
Dmitriy
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524
Hi Dmitriy!
You can only extract images that you inserted in the same session.
There is no way to do it with other documents.
Cheers, Ingo
dpreznik
Beginner Joined: 03 Dec 10 Status: Offline Points: 6
Thank you very much for your answer.
Giuseppe
Beginner Joined: 19 Nov 10 Location: Italy Status: Offline Points: 10
Hi, the algorithm is broken, so you must use a workaround: set a deltax and a deltay and rebuild the words yourself...
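One possible reading of this workaround, sketched in Python under stated assumptions: `deltax` is taken to be the largest horizontal gap still counted as "inside a word" and `deltay` the largest baseline difference still counted as "same line". The `Fragment` record (including its `width` field) and the `remake_words` function are hypothetical and are not part of Quick PDF; the sketch only shows the merging logic on already extracted pieces.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical piece of extracted text with its position."""
    text: str
    x: float      # left edge
    y: float      # baseline
    width: float


def remake_words(fragments, deltax=1.5, deltay=2.0):
    """Merge fragments that sit next to each other into whole words."""
    words = []  # each entry is a Fragment that grows as pieces are merged
    for frag in sorted(fragments, key=lambda f: f.x):
        for word in words:
            gap = frag.x - (word.x + word.width)
            same_line = abs(frag.y - word.y) <= deltay
            if same_line and -deltax <= gap <= deltax:
                word.text += frag.text
                word.width = frag.x + frag.width - word.x
                break
        else:
            # No neighbouring word found: this fragment starts a new word.
            words.append(Fragment(frag.text, frag.x, frag.y, frag.width))
    # Reading order: top to bottom (y grows upwards), then left to right.
    words.sort(key=lambda w: (-w.y, w.x))
    return [w.text for w in words]


# "H" was added last in the content stream but sits in front of "ello".
fragments = [
    Fragment("ello", 78.0, 700.0, 20.0),
    Fragment("world", 102.0, 700.0, 26.0),
    Fragment("ne", 72.0, 686.0, 12.0),
    Fragment("xt", 84.5, 686.4, 10.0),
    Fragment("H", 72.0, 700.0, 6.0),
]
print(remake_words(fragments))
```

Suitable values for `deltax` and `deltay` depend on the font size and would need to be found by experiment for each kind of document.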
billycl
Beginner Joined: 23 Feb 11 Status: Offline Points: 1
DAExtractPageText with Options=4 returns
TQuickPDF0723.AddArcToPath(CenterX,
as 1 word. I think that currently only the space character is treated as a delimiter.
Would it be possible (in the future) to define more delimiters, such as "(),.:-"?
I see this result
tquickpdf0723. addarctopath( centerx,
in software which uses Adobe Acrobat Pro (Acrobat = slow).
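Until such an option exists, the extra split could be done after extraction. The following Python sketch is an assumption about the desired behaviour, inferred from the Acrobat-based result quoted above (each delimiter ends a token and stays attached to it). The function name `split_on_delimiters` is hypothetical and nothing here is a Quick PDF call.

```python
def split_on_delimiters(word, delimiters="(),.:-"):
    """Split one extracted word so that every delimiter ends a token."""
    tokens = []
    current = ""
    for ch in word:
        current += ch
        if ch in delimiters:
            tokens.append(current)
            current = ""
    if current:
        # Trailing text that is not followed by a delimiter.
        tokens.append(current)
    return tokens


word = "TQuickPDF0723.AddArcToPath(CenterX,"
tokens = split_on_delimiters(word)
print(tokens)
# Lowercasing is only done here to compare with the result quoted above.
print(" ".join(t.lower() for t in tokens))
```

Note that a fixed delimiter set also splits decimal numbers such as "3.14" and hyphenated words, so it may need adjusting for the documents being processed.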
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.