
GetPageText trunc the text with unicode char 65533

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2007
Printed Date: 29 Sep 24 at 1:15AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: GetPageText trunc the text with unicode char 65533
Posted By: bart_bender
Subject: GetPageText trunc the text with unicode char 65533
Date Posted: 20 Oct 11 at 2:39PM
Hello,
I'm extracting the text from a PDF document with the GetPageText method, and it returns truncated text:
http://demos.sdm.es/marketing/images/doc.pdf

The last character is 65533 in Unicode, or 63 in ASCII.

I'm using the DLL version in a VB .NET project.

Any ideas?

Thanks in advance
Best regards
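
Editor's note: code point 65533 is U+FFFD, the Unicode replacement character that decoders emit when they hit bytes they cannot interpret. Converting it to ASCII usually yields "?", which is code 63. The short Python sketch below only illustrates that relationship. It does not call the Quick PDF Library, and the sample string and the one-byte cut are assumptions chosen for the demonstration, not the confirmed cause of the problem reported here.

```python
# U+FFFD is the Unicode replacement character.
REPLACEMENT = "\ufffd"
print(ord(REPLACEMENT))  # 65533

# Encoding it to ASCII with substitution gives "?" (code 63).
ascii_bytes = REPLACEMENT.encode("ascii", errors="replace")
print(ascii_bytes[0])  # 63

# Hypothetical example: a UTF-16LE buffer that loses its last byte
# decodes to the intact characters followed by U+FFFD.
cut = "pago".encode("utf-16-le")[:-1]
decoded = cut.decode("utf-16-le", errors="replace")
print(repr(decoded))
```

A trailing 65533 in extracted text is therefore a hint that a byte buffer was cut short or decoded with the wrong encoding somewhere along the way.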



Replies:
Posted By: Ingo
Date Posted: 20 Oct 11 at 10:30PM
Hi Bart!

Where's the problem?
I've extracted your form completely and it's all okay.
I've removed the address ...
I did it with version 7.26:

  FORMA DE PAGO: TRANSFERENCIA
  REF. DE PAGO :                          REF.:        /001164/756661 NIF:A23453445
     9364651          444,24
     8 ESTERIL PL.1,5             160   1,870       299,20 0,967            144,48
        PUNTO VERDE x 100               1,147         1,84                    1,84
   349 FRESA RACION PAK 6           4   5,180        20,72 2,608             10,29
        PUNTO VERDE x 100               3,421         0,14                    0,14
   658 CACAOLAT SYS RAC 6          24   5,180       124,32 2,608             61,73
        PUNTO VERDE x 100               3,421         0,82                    0,82
   TOTAL DE PUNTO VERDE ***         2,80 ***
                                   146,32               72,98
                      04,00%         5,85 07,00%         5,11
             447,04          227,74                           10,96        230,26
     GIRONA                                                           CONTRAVALOR
     CR NACIONAL II KM 7O8,2                                             EN PTA.
     P.P.                                                                 38312



Posted By: bart_bender
Date Posted: 21 Oct 11 at 12:49PM
Hello Ingo,

Thanks for your answer.

I'm using QuickPDFDLL0811.dll.
I got this result with the argument 0:

  EMISOR: GIRONA CR NACIONAL II KM 7O8,2 17181 AIGUAVIVA
  TEL.:972478000                                                      PAG.:   1
                                          YOIGO
  FECHA FACTURA: 02-01-2009               NOMBRE RAZON SOCIAL
  NUM. FACTURA : (  )  0075123534         DIRECCION
  P. SUMINISTRO: 02-01-09                 08006 BARCELONA
  FECHA DE PAGO: 05-04-09                 BARCELONA
  FORMA DE PAGO: TRANSFERENCIA
  REF. DE PAGO :                          REF.:        /001164/756661 NIF:A23453445

     9364651          444,24

     8 ESTERIL PL.1,5             160   1,870       299,20 0,967            144,48
        PUNTO VERDE x 100               1,147         1,84                    1,84
   349 FRESA RACION PAK 6           4   5,180    





Posted By: AndrewC
Date Posted: 24 Oct 11 at 8:03AM
There was a truncating bug in 8.11 that has been fixed in 8.12.  The text is extracting correctly in 8.12 using GetPageText 0, 3 and 7.  8.12 is a free upgrade for owners of 8.11.  The bug was caused by the recent major Unicode changes to the 8.11 version from 7.26.

Interestingly, your PDF is 705mm wide by 998mm high, which seems a bit big for an invoice.  This currently affects the output of 8.12 with option 7.

Andrew.


Posted By: bart_bender
Date Posted: 24 Oct 11 at 2:19PM
Thanks for your help Andrew


Posted By: bart_bender
Date Posted: 24 Oct 11 at 3:07PM
Hello Andrew

I'm testing the new version and I still get truncated text with 8.12, the same as with 8.11.

The size returned by dll.QuickPDFStringResultLength(instanceID) is half the size of the real text.
In version 8.12 it is necessary to change two lines in the SR method of the Visual Basic class:

  Private Function SR(ByVal data As IntPtr) As String
      Dim size As Integer = dll.QuickPDFStringResultLength(instanceID)   ' <-- line 1
      Dim result As Byte() = New Byte(size - 1) {}
      Marshal.Copy(data, result, 0, size)
      Return Encoding.Default.GetString(result)                          ' <-- line 2
  End Function

1 ->>  Dim size As Integer = dll.QuickPDFStringResultLength(instanceID) * 2
2 ->>  Return Encoding.Unicode.GetString(result)

Best Regards
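
Editor's note: the change above makes sense if the reported length counts characters while the buffer holds two bytes per character (UTF-16). The Python sketch below illustrates only that arithmetic. It does not use the Quick PDF DLL: `buffer` and `char_count` are hypothetical stand-ins for the memory behind the returned pointer and for the value of QuickPDFStringResultLength, and the sample text is taken from the invoice output earlier in the thread.

```python
# Sample text; encoded as UTF-16LE it occupies 2 bytes per character.
text = "FORMA DE PAGO: TRANSFERENCIA"
buffer = text.encode("utf-16-le")   # stand-in for the DLL's result buffer
char_count = len(text)              # stand-in for a length given in characters

# Copying only char_count bytes recovers just the first half of the text.
truncated = buffer[:char_count].decode("utf-16-le")

# Copying char_count * 2 bytes recovers all of it.
full = buffer[:char_count * 2].decode("utf-16-le")

print(repr(truncated))
print(repr(full))
```

Under these assumptions the byte count must be twice the character count, and the bytes must be decoded as UTF-16 rather than with the system default code page.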






Posted By: AndrewC
Date Posted: 26 Oct 11 at 3:25AM
Thanks for the fixes.  I will test it and then get these changes into the VB code.  I was doing all my testing with C# and it is working correctly.

The C# code for your reference is:

        private string SR(IntPtr data)
        {
            int size = dll.QuickPDFStringResultLength(instanceID);
            byte[] result = new byte[size * 2];
            Marshal.Copy(data, result, 0, size * 2);
            return Encoding.Unicode.GetString(result);
        }

So in VB it should probably be:

        Private Function SR(ByVal data As IntPtr) As String
            Dim size As Integer = dll.QuickPDFStringResultLength(instanceID)
            Dim result As Byte() = New Byte(size * 2 - 1) {}   ' as per post below
            Marshal.Copy(data, result, 0, size * 2)
            Return Encoding.Unicode.GetString(result)
        End Function



Andrew.


Posted By: bart_bender
Date Posted: 26 Oct 11 at 10:54AM
Hello,
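Change the code line
    Dim result As Byte() = New Byte(size * 2) {}
to
    Dim result As Byte() = New Byte(size * 2 - 1) {}
In VB the number in the array declaration is the upper bound, not the length, so New Byte(size * 2) would allocate one byte too many.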




