Text lines assembling in VB6
alinux
Team Player Joined: 09 Dec 08 Location: France Status: Offline Points: 20 |
Posted: 26 Nov 10 at 7:40PM
This is a basic sample that assembles text lines from the result of the GetPageText(4) function; how good the output is depends on the quality of the scan and of the OCR process.
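For reference, here is a minimal sketch of how one record of that result can be pulled apart. It assumes the mode 4 layout as I understand it from the Quick PDF Library documentation (font name, text color, text size, four x/y corner pairs starting at the bottom left, then the quoted text), so check it against your version. The helper name word_fields and the sample record are made up for illustration.

```vb
' Hypothetical helper: returns the y1 coordinate and the text of one
' GetPageText(4) record. Assumed field layout:
' font, color, size, x1, y1, x2, y2, x3, y3, x4, y4, "text"
Private Sub word_fields(ByVal rec As String, ByRef y1 As String, ByRef txt As String)
    Dim parts() As String
    parts = Split(rec, ",")
    y1 = ""
    txt = ""
    If UBound(parts) < 4 Then Exit Sub   'not a complete record
    y1 = parts(4)                        'fifth field: y of the bottom-left corner
    txt = Replace(parts(UBound(parts)), """", "")   'last field, quotes removed
End Sub

Private Sub demo_word_fields()
    Dim y As String, t As String
    'made-up sample record, not real library output
    word_fields "Arial,#000000,10,72,700,100,700,100,712,72,712,""Hello""", y, t
    Debug.Print y, t
End Sub
```

With the sample record above, y ends up holding 700 and t holds Hello. One limitation of splitting on commas: a word that itself contains a comma is cut into several pieces and only the last piece is kept.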
With tables, the OCR engine may detect and "read" the cells by row or by column, and you have no control over which, so you'll need a sort function that orders the page lines array by the y coordinate of each line.

    Private Function full_lines(get_page_text As String) As String
        'page text lines array: (0,N) - y1 or y2 word coordinate, (1,N) - line words
        Dim dmp_pge() As String
        Dim dmp_lns() As String, dmp_wrd() As String
        Dim i As Long, j As Long
        Dim flag_exist As Boolean
        ReDim dmp_pge(1, 0)

        'page words array: one CSV record per word
        dmp_lns = Split(get_page_text, vbCrLf)
        For i = 0 To UBound(dmp_lns)
            If dmp_lns(i) <> "" Then
                'word line array: the fields of one record
                dmp_wrd = Split(dmp_lns(i), ",")
                flag_exist = False
                'look for an existing line with the same y coordinate, newest first
                For j = UBound(dmp_pge, 2) To 0 Step -1
                    If dmp_wrd(4) = dmp_pge(0, j) Then
                        'add next word in the same line
                        If dmp_pge(1, j) <> "" Then
                            dmp_pge(1, j) = dmp_pge(1, j) & " " & dmp_wrd(UBound(dmp_wrd))
                        Else
                            dmp_pge(1, j) = dmp_wrd(UBound(dmp_wrd))
                        End If
                        flag_exist = True
                        Exit For
                    End If
                    DoEvents
                Next
                If Not flag_exist Then
                    'grow the array unless the last slot is still empty
                    If dmp_pge(1, UBound(dmp_pge, 2)) <> "" Then
                        ReDim Preserve dmp_pge(1, UBound(dmp_pge, 2) + 1)
                    End If
                    'add y1 word (line) coordinate & first word of the new line
                    dmp_pge(0, UBound(dmp_pge, 2)) = dmp_wrd(4)
                    dmp_pge(1, UBound(dmp_pge, 2)) = dmp_wrd(UBound(dmp_wrd))
                End If
            End If
            DoEvents
        Next

        'need a sort array function in the case of tables (the OCR engine may
        'identify & "read" the table by column, so you must sort the lines array
        'by the y coordinate - see dmp_pge definition)
        'sort_array dmp_pge

        'join the assembled lines, one per row
        For i = 0 To UBound(dmp_pge, 2)
            If full_lines = "" Then
                full_lines = dmp_pge(1, i)
            Else
                full_lines = full_lines & vbCrLf & dmp_pge(1, i)
            End If
            DoEvents
        Next
        'strip the quotes that surround each word in the CSV output
        full_lines = Replace(full_lines, """", "")
    End Function
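The sample leaves the sort_array call commented out and does not include the routine itself. Here is one possible sketch of it, not the original author's version. It assumes the page origin is at the bottom left (so a larger y means higher on the page) and that the coordinates are written with a period as decimal separator, which is what Val expects regardless of the regional settings.

```vb
' Hypothetical sort_array: orders the columns of a (0 To 1, 0 To N) string
' array by the numeric value of row 0 (the y coordinate), largest first.
' Assumption: origin at the bottom left, so larger y = higher on the page.
' Flip the comparison if your origin is at the top.
Private Sub sort_array(ByRef arr() As String)
    Dim i As Long, j As Long
    Dim key_y As String, key_txt As String

    'insertion sort: stable, and fast enough for the lines of one page
    For i = LBound(arr, 2) + 1 To UBound(arr, 2)
        key_y = arr(0, i)
        key_txt = arr(1, i)
        j = i - 1
        Do While j >= LBound(arr, 2)
            'stop at the first line that is not lower than the current one
            If Val(arr(0, j)) >= Val(key_y) Then Exit Do
            'shift the lower line one slot down the list
            arr(0, j + 1) = arr(0, j)
            arr(1, j + 1) = arr(1, j)
            j = j - 1
        Loop
        arr(0, j + 1) = key_y
        arr(1, j + 1) = key_txt
    Next
End Sub
```

The bounds check and the comparison are kept in separate statements because VB6 evaluates both sides of And, which would read arr(0, -1) on the last pass. Since the sort is stable, lines that share the same y keep the order in which they were assembled.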
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Alinux!
Thanks for this. I think there are many users here looking for this option in QuickPDF. Your sample could be a starting point for options like "keep the original layout in txt, too"...
Thanks for sharing with us!
Cheers, Ingo
Rowan
Moderator Group Joined: 10 Jan 09 Status: Offline Points: 398 |
Great job.
alinux
Thanks guys.
Cheers,
alinux