Debenu Quick PDF Library - PDF SDK Community Forum Homepage

DAExtractPageText losing characters

Mike4ql
Posted: 08 Oct 10 at 11:39AM

I am trying to extract the text from a PDF. Most of it works fine, but occasionally letters are missing from the extracted text. This appears to happen where the PDF writes characters as octal escape codes (such as \036) instead of plain letters.

This is the text which should be produced and is rendered correctly by DARenderPageToString:
Top Line ->  6 fyodor dostoyevsky
Space ->
Next Line -> flowers in a stuffy city apartment, but because everybody is

Here is the content stream extract for the same section:

BT
0 0 0 1 k
/GS0 gs
/T1_0 1 Tf
8.25 0 0 8.25 262.7389 564.0571 Tm
[(\036)-100(\035)-55(\034)-100(\033)-100(\034)-100(\032)-100( )-100(\033)-100(\034)-100(\031)-100(\030)-82(\034)-45(\035)-100(\027)-100(\026)-100(\031)-100(\025)-100(\035)]TJ
8.5 0 0 8.5 83.3622 564.0571 Tm
(\f)Tj
10.5104 0 0 10.25 83.3622 543.058 Tm
[(\023)10(o)10(w)10(e)10(r)10(s)10( )-125(i)10(n)10( )-126(a)10( )-125(s)10(t)10(u)10(f)10(f)10(y)10( )-125(c)10(i)10(t)10(y)10( )-126(a)10(p)10(a)10(r)10(t)10(m)10(e)10(n)10(t)10(,)47( )-126(b)10(u)10(t)10( )-125(b)10(e)10(c)10(a)10(u)10(s)10(e)10( )-125(e)10(v)10(e)10(r)10(y)10(b)10(o)10(d)10(y)10( )-125(i)10(s )]TJ
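
To see what those escapes actually contain, the string operands of the first TJ array can be decoded into raw character codes. The following is a minimal Python sketch, independent of Quick PDF Library; `decode_literal` and `string_operands` are hypothetical helper names, and it assumes only the literal-string escape rules of the PDF specification (line continuations are not handled).

```python
# Escapes allowed in a PDF literal string, besides \ddd octal codes.
SIMPLE_ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12,
                  "(": 40, ")": 41, "\\": 92}


def decode_literal(body):
    """Turn the inside of a (...) literal string into character codes."""
    codes, i = [], 0
    while i < len(body):
        if body[i] != "\\" or i + 1 == len(body):
            codes.append(ord(body[i]))          # ordinary character
            i += 1
        elif body[i + 1] in "01234567":
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            codes.append(int(body[i + 1:j], 8) & 0xFF)  # 1 to 3 octal digits
            i = j
        else:
            # Named escape such as \f; an unknown escape keeps the bare char.
            codes.append(SIMPLE_ESCAPES.get(body[i + 1], ord(body[i + 1])))
            i += 2
    return codes


def string_operands(src):
    """Return the raw bodies of every (...) string in a TJ array."""
    out, i = [], 0
    while i < len(src):
        if src[i] != "(":
            i += 1
            continue
        depth, j = 1, i + 1
        while j < len(src) and depth:
            if src[j] == "\\":
                j += 2                          # skip the escaped character
                continue
            if src[j] == "(":
                depth += 1
            elif src[j] == ")":
                depth -= 1
            j += 1
        out.append(src[i + 1:j - 1])
        i = j
    return out


# The first TJ array from the extract above (the "Top Line").
TOP_LINE = (r"[(\036)-100(\035)-55(\034)-100(\033)-100(\034)-100(\032)-100"
            r"( )-100(\033)-100(\034)-100(\031)-100(\030)-82(\034)-45(\035)"
            r"-100(\027)-100(\026)-100(\031)-100(\025)-100(\035)]")

codes = [c for s in string_operands(TOP_LINE) for c in decode_literal(s)]
print(codes)
```

Every code except the space (32) falls below 32, which suggests these are not ASCII letters written in octal. They look like font-specific codes that only the font's own encoding can translate into glyphs.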

DAExtractPageText (option 3) returns two lines for the Top Line, one an empty string and one containing a space (or perhaps two), and drops the "fl" from the beginning of the Next Line.

Is there any way I can correct this?
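
One possible workaround, if the font carries no usable /ToUnicode map, is a hand-built lookup table applied to the raw character codes. The sketch below is Python, not Quick PDF Library code. The table `T1_0_GUESS` and the helper `to_text` are assumptions made for illustration: the table is inferred purely by lining up the codes in the extract above with the expected text, and it would differ for every embedded font.

```python
# Hypothetical code -> text table for font /T1_0, inferred by aligning the
# character codes in the content stream with the text the page renders.
# It is NOT read from the PDF; a proper fix would use the font's own
# /ToUnicode CMap or /Encoding entry.
T1_0_GUESS = {
    12: "6", 19: "fl", 21: "k", 22: "v", 23: "e", 24: "t", 25: "s",
    26: "r", 27: "d", 28: "o", 29: "y", 30: "f",
}


def to_text(codes, table):
    """Map character codes to text.

    Codes found in the table are replaced, printable ASCII passes through
    unchanged, and anything else becomes U+FFFD so gaps stay visible.
    """
    out = []
    for c in codes:
        if c in table:
            out.append(table[c])
        elif 32 <= c < 127:
            out.append(chr(c))
        else:
            out.append("\ufffd")
    return "".join(out)


# Character codes of the Top Line TJ array (octal escapes already decoded).
top_line = [30, 29, 28, 27, 28, 26, 32, 27, 28,
            25, 24, 28, 29, 23, 22, 25, 21, 29]

# Start of the Next Line: \023 followed by the plain letters "owers".
next_line_start = [19] + [ord(ch) for ch in "owers"]

print(to_text(top_line, T1_0_GUESS))
print(to_text(next_line_start, T1_0_GUESS))
```

With this guessed table the two samples come out as "fyodor dostoyevsky" and "flowers". Code 19 standing for the "fl" ligature would explain why those two letters go missing together.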

Mike4ql
Posted: 12 Oct 10 at 7:15PM
Has nobody else seen this?  
 
It seems to be a fundamental flaw that prevents anyone from using Quick PDF Library to extract text from a PDF like this one.
 
I would be grateful for any suggestions.
 
Mike