Extract text from PDF with Layout. |
devMan
Beginner Joined: 02 Apr 08 Location: Luxembourg Status: Offline Points: 14 |
Posted: 02 Apr 08 at 9:48AM |
Hi everybody!
I'm working with Visual Basic 6, and my goal is to extract the text from a PDF file and import it into an Oracle DB. I've found an OCX that gives me the entire text in a string variable, but it does not keep the data separated the way it is in the file. My test file contains values placed in columns, and I need these values separated by a semicolon, for example. So I would like to know whether your library can do this.
Thanks in advance.
P.S.: I can send you a test file if you need it to understand (I don't know if my explanation is clear).
|
chicks
Debenu Quick PDF Library Expert Joined: 29 Oct 05 Location: United States Status: Offline Points: 251 |
Your best bet is probably pdftohtml. Its XML output option provides positional information. You can then do an XSL transform to get the data into your final format. It's worked well for me in the past.
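As a rough sketch of this approach, the Python snippet below stands in for the XSL transform. It assumes the XML came from `pdftohtml -xml` and that each text fragment is a `<text>` element with `top` and `left` attributes. The sample document is hand-written for illustration, not real tool output, and the function name `xml_to_rows` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample in the shape assumed for `pdftohtml -xml` output.
SAMPLE_XML = """<pdf2xml>
  <page number="1" height="1188" width="918">
    <text top="100" left="50" width="40" height="12" font="0">Name</text>
    <text top="100" left="200" width="40" height="12" font="0">Amount</text>
    <text top="120" left="200" width="40" height="12" font="0">12.50</text>
    <text top="120" left="50" width="40" height="12" font="0">Alice</text>
  </page>
</pdf2xml>"""


def xml_to_rows(xml_text):
    """Group text fragments by their `top` value and join each line with ';'."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        lines = defaultdict(list)
        for node in page.iter("text"):
            # itertext() also picks up text nested in inline tags such as <b>.
            content = "".join(node.itertext()).strip()
            lines[int(node.get("top"))].append((int(node.get("left")), content))
        # Top to bottom, then left to right within each line.
        for top in sorted(lines):
            rows.append(";".join(text for _, text in sorted(lines[top])))
    return rows


for row in xml_to_rows(SAMPLE_XML):
    print(row)
```

Grouping on the exact `top` value only works when fragments on one line share the same coordinate; real files may need a small tolerance.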
|
|
devMan
Hi,
Thank you for your answer! My goal is to have the content of the PDF in my app, in a variable, so that I can process it. If I can, I'd prefer not to use file conversion, but I'll keep your solution as a last resort.
|
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi!
I'm wondering... Perhaps I don't understand, but why not use the text extraction functions from QuickPDF? They work page by page, so you can get the text content of each page. With option 3 you get the individual text strings from each page with additional data such as position on the page, font, color, ... I can't imagine that you need more ;-)
Best regards, Ingo
|
devMan
Hi Ingo,
Yes, my question is just whether QuickPDF can extract the text from a PDF that has columns and return the text separated according to the PDF layout. I've checked the iSEDQuickPDF 5.11 Reference Guide.pdf and found 3 functions:
- GetPageLayout
- SetPageLayout
- ExtractFilePageContent
I've seen that there is VB6 example code. I hope one of the examples will help me.
|
Ingo
Hi!
GetPageMode and GetPageLayout only retrieve how the document is displayed when it is opened... That won't help you. For text extraction you can use these functions:
DAExtractPageText
ExtractFilePageText
GetPageText
With option "3" you'll get CSV strings with position data (in pixels) and much more. With this data you can rebuild your PDF layout as a text file.
Best regards, Ingo
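To illustrate the rebuilding step, here is a minimal Python sketch that turns such CSV strings into semicolon-separated rows. The field order (font, color, size, four corner points, then the text) is an assumption that should be checked against the reference guide. The sample data is invented, and the sketch also assumes that larger y values are higher on the page.

```python
import csv
import io

# ASSUMED field layout of one option-3 line (verify against the reference guide):
# font, color, size, x1, y1, x2, y2, x3, y3, x4, y4, text
X_INDEX, Y_INDEX, TEXT_INDEX = 3, 4, 11

# Invented sample data, not real library output.
SAMPLE = '''"Arial","#000000",10,50.0,700.0,90.0,700.0,90.0,710.0,50.0,710.0,"Name"
"Arial","#000000",10,200.0,700.2,240.0,700.2,240.0,710.2,200.0,710.2,"Amount"
"Arial","#000000",10,200.0,680.0,240.0,680.0,240.0,690.0,200.0,690.0,"12.50"
"Arial","#000000",10,50.0,680.0,90.0,680.0,90.0,690.0,50.0,690.0,"Alice"
'''


def rebuild_rows(csv_text, y_tolerance=2.0, top_first=True):
    """Group fragments with similar y into rows; join each row's cells with ';'."""
    fragments = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if len(fields) <= TEXT_INDEX:
            continue  # skip blank or malformed lines
        fragments.append(
            (float(fields[Y_INDEX]), float(fields[X_INDEX]), fields[TEXT_INDEX])
        )
    # Assumes larger y means higher on the page; pass top_first=False otherwise.
    fragments.sort(key=lambda f: f[0], reverse=top_first)
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for y, x, text in fragments:
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Order the cells of each row from left to right.
    return [";".join(t for _, t in sorted(cells)) for _, cells in rows]


for row in rebuild_rows(SAMPLE):
    print(row)
```

The tolerance matters because fragments on the same visual line do not always share exactly the same y value.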
|
devMan
Re-Hi ;)
Thank you very much for your help, Ingo! I'm trying the methods you sent me. Would it be possible for me to upload my test file to show you exactly what I need? Thank you again for your help!
Edit: I've tried your function (DAExtractPageText with DAOpenFile and DAFindPage) with option 3 to get the PDF text, and I get a string like this:
But the text fields (at the end of the lines) contain only spaces or nothing at all, and none of the values from my file... Can you help me?
Edited by devMan - 03 Apr 08 at 9:35AM
|
Ingo
ingo [dot] schmoekel [at] ewetel [dot] net
|
|
devMan
Email sent!
Thank you!
|
Ingo
Hi!
I get the same result... No content! How was the PDF created? Is it only scanned? Anyway, I've tested more than one function, and QP can't extract the text in this case. Sorry.
Best regards, Ingo
|
devMan
Hello,
Oh dear... my test PDF is a part of another PDF file, and I think the person who split the file created a bad file. (With another OCX, I get strings and numbers that are not displayed in the file...) OK, I'll try with a new test file!
Edit: I've taken a new file to test, and now QuickPDF works very well! It's exactly what I'm looking for! With option 3 in the method DAExtractPageText(), it parses each part of my PDF into fields! And with the positions of the fields, I'll be able to select an area in the file... So I think my company will buy a license for your ActiveX!
Edited by devMan - 04 Apr 08 at 2:36AM
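Selecting an area by position could look roughly like the Python sketch below. It assumes the fields have already been parsed into `(x, y, text)` tuples; that tuple layout, the function name `fragments_in_area`, and the sample values are all made up for illustration and are not part of the library.

```python
def fragments_in_area(fragments, left, bottom, right, top):
    """Return the text of every fragment whose (x, y) point lies in the rectangle."""
    return [
        text
        for x, y, text in fragments
        if left <= x <= right and bottom <= y <= top
    ]


# Invented sample fragments: (x, y, text).
FRAGMENTS = [
    (50.0, 700.0, "Invoice"),
    (200.0, 700.0, "2008-04-04"),
    (50.0, 680.0, "Total"),
    (200.0, 680.0, "12.50"),
]

# Pick up only what sits in the lower right part of the sample.
print(fragments_in_area(FRAGMENTS, left=150, bottom=670, right=300, top=690))
```

Only one corner of each fragment is tested here; a stricter version would compare the whole bounding box against the rectangle.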
|
devMan
Can you tell me where I can find the conditions for purchasing a license and all other information about QuickZip, please?
|
|
Ingo
I don't know where you can get "QuickZip" :-)
Perhaps you mean "QuickPDF" ;-) Have a look here: http://www.quickpdf.org/forum/forum_posts.asp?TID=698
Best regards, Ingo
Edited by Ingo - 04 Apr 08 at 5:54AM
|
devMan
Oops
|
|
devMan
Last question (normally):
We are a company of 120 users, and there are 5 developers on my team. Do we need to buy one license for everyone, or more? And then a last technical question: if we scan a paper document with a standard scanner, is it possible that QuickPDF won't extract the text correctly? Do you have any recommendations? Thank you for all your support!
Edited by devMan - 04 Apr 08 at 7:57AM
|
Ingo
Normally, scanned text ends up as an image...
Few scanners can scan in an OCR mode... With those you can do text extraction, too. For your company you need one Enterprise license...
Best regards, Ingo
|
devMan
And how much does this Enterprise license cost?
|
|
Ingo
Sorry... The correct version name is "Site License". It comes with source. Once you have it, you can send me the invoice or one of the smallest files from the source package, and then you'll get a password for the source section to get the latest version. Please keep in mind: we're doing this here because we like to help... we get nothing and we want nothing... one for all and all for one ;-)
We have nothing to do with the iSED team. It still sells the old version 5.11. Here, many talented "pdf-artists" have pushed the version number up to 6.02... and that's not the end.
Best regards,
Ingo
|
|
devMan
OK. If I've understood correctly, the iSED team sells only the license for v5.11 of QuickPDF, and you and your team are developing the new versions. If that's the case, and we want to use your (better) version of QuickPDF, do we have to pay anything? Or must we buy a license for version 5.11 on the iSED site?
|
Ingo
Hi!
Buy an iSED site license... and send me a copy of the invoice as a PDF or one of the smallest source files. Then you'll get access to our latest version... and you don't have to pay anything.
Best regards, Ingo
|
devMan
Ok thank you.
|
|