
problem in Text Extraction

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=213
Printed Date: 25 Nov 24 at 2:57AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: problem in Text Extraction
Posted By: Gerald manickam
Subject: problem in Text Extraction
Date Posted: 16 Dec 05 at 2:54AM

Hello,

I used Quick PDF to extract the text from a PDF file. For some pages it gives the error "Scan line index out of range".

The code that I used is below.

Language: VB 6.0

OS: Windows 2000 Professional

--------------------------------------------------------
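Dim data As New iSED.QuickPDF
Dim FileHandle As Long, PageReference As Long
Dim ImageListId As Long, ImageListCount As Long
Dim idxPageCnt As Long, i As Long, z As Long
Dim sFullPageArr() As String, arrsplit() As String
Dim arrsplit2 As String

FileHandle = data.DAOpenFile(PdfFileName, Key)
PageReference = data.DAFindPage(FileHandle, 1)
ImageListId = data.DAGetPageImageList(FileHandle, PageReference)
ImageListCount = data.DAGetImageListCount(FileHandle, ImageListId)

For idxPageCnt = 1 To data.PageCount
    data.SelectPage idxPageCnt
    ' The line below raises the error and does not read the file content
    sFullPageArr = Split(data.GetPageText(3), Chr$(13), -1, vbTextCompare)
    ' Include the last line too; empty lines fail the field-count check below
    For i = 0 To UBound(sFullPageArr)
        arrsplit = Split(sFullPageArr(i), ",", -1, vbTextCompare)
        arrsplit2 = ""   ' reset the collected text for every line
        ' This loop treats fields 0-10 as metadata and field 11 onward as the text
        If UBound(arrsplit) >= 11 Then
            ' Rejoin fields 11 onward, since the text itself may contain commas
            For z = 11 To UBound(arrsplit)
                If z = 11 Then
                    arrsplit2 = arrsplit(z)
                Else
                    arrsplit2 = arrsplit2 & "," & arrsplit(z)
                End If
            Next z
        End If
    Next i
Next idxPageCnt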

----------------------------------------------------------

Please help me solve this error.

With Regards,

P.Gerald Manickam
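The inner loop in the code above treats the first eleven comma-separated fields of each GetPageText(3) line as metadata and rejoins everything from field 11 onward as the text, because the text itself may contain commas. The same idea can be sketched in Python with a bounded split. The 11-field layout is an assumption inferred from that loop, and the sample line is made up for illustration.

```python
def text_after_fields(page_text: str, n_fields: int = 11) -> list[str]:
    """Return the text part of each line: everything after the first
    n_fields comma-separated values (layout assumed from the VB loop)."""
    texts = []
    for line in page_text.splitlines():   # handles CR, LF and CR/LF endings
        parts = line.split(",", n_fields)  # at most n_fields + 1 pieces
        if len(parts) > n_fields:          # only lines that have a text part
            texts.append(parts[n_fields])  # commas inside the text survive
    return texts


# Hypothetical line: 11 metadata fields followed by text containing a comma
sample = ",".join(str(k) for k in range(11)) + ",Hello, world"
print(text_after_fields(sample))  # → ['Hello, world']
```

Limiting the split to 11 cuts leaves any commas in the text untouched, so no rejoin step is needed, and splitlines() avoids the stray line-feed characters that a split on Chr$(13) alone would leave if the lines end in CR/LF. Quotes around the text, if present, are kept as-is, just as in the VB loop.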

 




Replies:
Posted By: Ingo
Date Posted: 16 Dec 05 at 3:23AM
Hi!
First you're using the DA functions and then...
Why don't you use DAExtractPageText instead of GetPageText?
Have a try.


-------------
Cheers,
Ingo



Posted By: Gerald manickam
Date Posted: 16 Dec 05 at 5:27AM

Sir,

Now I have used DAExtractPageText(). The same error occurs when the option is 3. I also tried the other options, 1 and 2, with DAExtractPageText().

With those options the function works, but it returns only two coordinates. How can I identify the left, top, right and bottom of a line?

Regards,

P.Gerald Manickam

 

  



Posted By: Ingo
Date Posted: 16 Dec 05 at 5:35AM
Hi Gerald,

You only need two coordinates, because you have the font height in pixels and you know the text length...?


-------------
Cheers,
Ingo
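Ingo's point is that a baseline start point, the font size and the text width are enough to rebuild a box around a line. A rough Python sketch, assuming that (x, y) is the left end of the baseline, that y grows upward as in PDF's default coordinate system, and that ascent and descent are fixed fractions of the font size. The 0.8 and 0.2 defaults are hypothetical; real values come from the font's metrics, and the text width has to be measured separately.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float


def line_box(x: float, y: float, font_size: float, text_width: float,
             ascent: float = 0.8, descent: float = 0.2) -> Box:
    """Approximate the box around a text run from its baseline start point.

    Assumes y grows upward (PDF default user space), so top > bottom, and
    that ascent/descent are fractions of the font size (hypothetical defaults).
    """
    return Box(left=x,
               top=y + ascent * font_size,
               right=x + text_width,
               bottom=y - descent * font_size)


# Baseline starts at (100, 700); 10-unit text that measures 50 units wide
print(line_box(100.0, 700.0, 10.0, 50.0))
# → Box(left=100.0, top=708.0, right=150.0, bottom=698.0)
```

If the coordinates turn out to use a top-left origin instead, the signs on the ascent and descent terms swap.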



Posted By: Gerald manickam
Date Posted: 19 Dec 05 at 12:29AM

Sir,

How can I get the font height in pixels? Below is a sample output line from a page. Which of these values is the pixel value?

283.43,193.96,#FFFFFF,1.00,"BPGDPE+Aur00","According to the findings,"

Regards,

P.Gerald Manickam
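The sample line is ordinary comma-separated data with quoted strings, so a CSV parser keeps the comma inside "According to the findings," intact where a plain split on commas would break it. A Python sketch follows; the field names (x, y, colour, size, font, text) are guesses inferred from this one sample line, not taken from the library documentation. None of the values is labelled as pixels; the coordinates are most likely in PDF points (1/72 inch), but that too is an assumption worth checking.

```python
import csv


def parse_text_line(line: str) -> dict:
    """Split one extracted-text line into named fields.

    The field order (x, y, colour, size, font, text) is inferred from the
    single sample line in this thread, not from the library docs.
    """
    x, y, colour, size, font, text = next(csv.reader([line]))
    return {"x": float(x), "y": float(y), "colour": colour,
            "size": float(size), "font": font, "text": text}


sample_line = '283.43,193.96,#FFFFFF,1.00,"BPGDPE+Aur00","According to the findings,"'
print(parse_text_line(sample_line))
```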



Posted By: Ingo
Date Posted: 19 Dec 05 at 2:08AM
Hi Gerald!

Read the chapter "Fonts" in the QuickPDF documentation.
You can use "FontSize"...

Good luck,
Ingo


-------------
Cheers,
Ingo
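If the size value is in PDF points (1 point = 1/72 inch), turning it into pixels only needs the output resolution: pixels = points * dpi / 72. A small Python sketch; the 96 DPI default is an assumed common screen setting, not something the library provides.

```python
def points_to_pixels(points: float, dpi: float = 96.0) -> float:
    """Convert a length in PDF points (1/72 inch) to pixels at the given DPI.

    96 DPI is only an assumed default (a common Windows screen setting).
    """
    return points * dpi / 72.0


print(points_to_pixels(12.0))       # → 16.0
print(points_to_pixels(12.0, 300))  # → 50.0
```

A size of 1.00, as in the sample output above, would be unusually small for body text, so it is worth confirming what the size field actually represents before converting it.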




Copyright ©2001-2014 Web Wiz Ltd. - http://www.webwiz.co.uk