Debenu Quick PDF Library - PDF SDK Community Forum

Text lines assembling in VB6

alinux
Posted: 26 Nov 10 at 7:40PM
This is a basic sample that assembles text lines from the result of the GetPageText(4) function; the results depend on the quality of the scan and of the OCR process.
In the case of tables, the OCR engine may detect, "read" and process the table by row or by column, and you have no control over which. You will therefore need an array sort function that sorts the page lines array by the y coordinate of each line.


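Private Function full_lines(get_page_text As String) As String

'page text lines array: (0,N) - y1 or y2 word coordinate, (1,N) - line words
Dim dmp_pge() As String
'explicit declarations so the function also compiles under Option Explicit
Dim dmp_lns() As String
Dim dmp_wrd() As String
Dim i As Long, j As Long
Dim flag_exist As Boolean

ReDim dmp_pge(1, 0)

'page words array (one comma-separated record per word)
dmp_lns = Split(get_page_text, vbCrLf)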

For i = 0 To UBound(dmp_lns)
    If dmp_lns(i) <> "" Then

        'fields of one word record
        dmp_wrd = Split(dmp_lns(i), ",")

        flag_exist = False
        For j = UBound(dmp_pge, 2) To 0 Step -1
            If dmp_wrd(4) = dmp_pge(0, j) Then

               'add next word in the same line
                If dmp_pge(1, j) <> "" Then dmp_pge(1, j) = dmp_pge(1, j) & " " & dmp_wrd(UBound(dmp_wrd)) Else dmp_pge(1, j) = dmp_wrd(UBound(dmp_wrd))
                flag_exist = True
                Exit For
            End If
            DoEvents
        Next
        If Not flag_exist Then
            If dmp_pge(1, UBound(dmp_pge, 2)) <> "" Then
                ReDim Preserve dmp_pge(1, UBound(dmp_pge, 2) + 1)
            End If

           'add y1 word(line) coordinate & first word of the new line
            dmp_pge(0, UBound(dmp_pge, 2)) = dmp_wrd(4)
            dmp_pge(1, UBound(dmp_pge, 2)) = dmp_wrd(UBound(dmp_wrd))
        End If
    End If
    DoEvents
Next

'for tables a sort function is needed: the OCR engine may identify & "read" the table by column,
'so the lines array must be sorted by the y coordinate (row 0 of dmp_pge, see its definition above)
'sort_array dmp_pge

For i = 0 To UBound(dmp_pge, 2)
    If full_lines = "" Then full_lines = dmp_pge(1, i) Else full_lines = full_lines & vbCrLf & dmp_pge(1, i)
    DoEvents
Next
full_lines = Replace(full_lines, """", "")

End Function
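
The commented-out sort_array call above refers to a helper that is not part of the sample. What follows is only a minimal sketch of what such a helper might look like, not the author's own routine. It assumes that row 0 of dmp_pge holds the y coordinate as unquoted numeric text with "." as the decimal separator, and that the page origin is at the bottom left, so that a larger y means a line higher on the page. If your origin is at the top left, reverse the comparison.

```vb
'Hypothetical helper (not in the original sample): sorts the columns of a
'(0 To 1, 0 To N) String array by the numeric value of row 0 (the y
'coordinate), largest y first, i.e. top of page first for a bottom-left origin.
Private Sub sort_array(ByRef arr() As String)
    Dim i As Long, j As Long
    Dim key_y As String, key_txt As String

    'insertion sort: stable, so lines with an equal y keep their original order
    For i = LBound(arr, 2) + 1 To UBound(arr, 2)
        key_y = arr(0, i)
        key_txt = arr(1, i)
        j = i - 1
        Do While j >= LBound(arr, 2)
            'Val() always reads "." as the decimal separator, whatever the locale;
            'the bounds test is kept separate because VB6 "And" does not short-circuit
            If Val(arr(0, j)) >= Val(key_y) Then Exit Do
            arr(0, j + 1) = arr(0, j)
            arr(1, j + 1) = arr(1, j)
            j = j - 1
        Loop
        arr(0, j + 1) = key_y
        arr(1, j + 1) = key_txt
    Next
End Sub
```

To use it, uncomment the sort_array dmp_pge line in full_lines. An insertion sort is adequate for the few hundred lines of a typical page; for very large pages a faster sort may be preferable.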

Ingo
Posted: 26 Nov 10 at 8:13PM
Hi Alinux!

Thanks for this.
I think there are many users here looking for this option in QuickPDF.
Your sample could be a starting point for options like "keep the original layout in txt, too"...
Thanks for sharing with us!

Cheers, Ingo

Rowan
Posted: 27 Nov 10 at 9:20AM
Great job.

alinux
Posted: 27 Nov 10 at 11:41AM
Thanks guys.

Cheers,
alinux