
How to get internal structure of the PDF file?

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: General Discussion
Forum Description: Discussion board for Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2095
Printed Date: 22 Nov 24 at 7:27PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: How to get internal structure of the PDF file?
Posted By: saravanan6
Subject: How to get internal structure of the PDF file?
Date Posted: 10 Jan 12 at 5:36AM
Hi All,

    I would like to know whether there is any tool for getting the internal structure of a PDF file in an XML-based form, similar to the Open XML representation used by MS Office 2007.

Please enlighten me on this.


Thanks & Regards,
P.SARAVANAN




Replies:
Posted By: Ingo
Date Posted: 10 Jan 12 at 7:46AM
Hi!
 
You can build such a tool WITH QuickPDF, but QP doesn't offer
ready-made functionality for this purpose.
You should try PDFCosEdit:
http://www.pdftron.com/pdfcosedit/
The demo mode already allows you to browse through the PDF objects.
Please keep in mind that PDF is a presentation format, so the objects
are stored one after the other and are not well structured.
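
To illustrate the point that the objects simply follow one another in the file, here is a minimal Python sketch (not QuickPDF code). It scans raw bytes for indirect object headers of the form "N G obj". The names list_indirect_objects and SAMPLE are made up for this example, and SAMPLE is a hand-written fragment, not a valid PDF. A real file can also hide objects inside compressed object streams, or contain matching bytes inside stream data, so treat this as a rough look and not as a parser.

```python
import re

# Matches the header of an indirect object, e.g. b"12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")

def list_indirect_objects(data: bytes):
    """Return (object number, generation, byte offset) tuples in file order."""
    return [(int(m.group(1)), int(m.group(2)), m.start())
            for m in OBJ_HEADER.finditer(data)]

# Hand-written fragment standing in for a real file (assumption, not a valid PDF).
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)

for num, gen, offset in list_indirect_objects(SAMPLE):
    print(f"object {num} generation {gen} at byte {offset}")
```

The list comes back in file order, which is a flat sequence; any hierarchy only appears once the "N G R" references between the objects are followed.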
 
Cheers and welcome here,
Ingo
 


Posted By: edvoigt
Date Posted: 10 Jan 12 at 9:08AM
Hi,

The structures inside a PDF (dictionaries, arrays, objects) are for the most part not XML. Only some parts (XMP, XFA) are XML. Investigating a PDF is rather complicated, because the structure is only broadly tree-like. To save resources, the same object may be referenced from several different places in the PDF. So the first idea, representing the PDF structure as a tree, does not reflect reality. In truth it is a graph, and therefore it cannot be expressed in XML without transforming the structure (duplicating or inheriting parts).
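
A small Python sketch may make the graph-versus-tree problem concrete. It uses a hypothetical toy model (the OBJECTS table and the to_xml helper are invented for this example and are not a real PDF or QuickPDF API) in which two pages share one font object. To get XML out of it, the shared object is written once in full and replaced by a <ref> stub wherever it appears again, which is one possible form of the transformation mentioned above; duplicating the shared object would be the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model: object number -> (type name, referenced object numbers).
# Objects 3 and 4 (pages) both point at object 5 (a shared font).
OBJECTS = {
    1: ("Catalog", [2]),
    2: ("Pages", [3, 4]),
    3: ("Page", [5]),
    4: ("Page", [5]),
    5: ("Font", []),
}

def to_xml(num, seen=None):
    """Emit object `num` as XML; objects already emitted become <ref> stubs."""
    seen = set() if seen is None else seen
    if num in seen:
        return ET.Element("ref", {"id": str(num)})
    seen.add(num)  # mark before descending, so cycles cannot recurse forever
    kind, children = OBJECTS[num]
    elem = ET.Element(kind, {"id": str(num)})
    for child in children:
        elem.append(to_xml(child, seen))
    return elem

root = to_xml(1)
print(ET.tostring(root, encoding="unicode"))
```

The font ends up fully written under the first page only, and the second page carries just a reference to it. The XML is therefore a tree with cross-links added by convention, not a direct picture of the file.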

So I think there is no such tool as you want, unless you build it yourself. It would be possible with QuickPDF (GetPageContent..., GetObject...), but it is more than a half-hour job, depending on how deep you want to look inside.

To look at the internals there is a free, dialog-based tool, the Enfocus Browser.
Quoting from http://www.enfocus.com/product.php?id=4530 :
To the knowledgeable user, the Enfocus Browser offers functionality to get information on all types of objects (Dictionaries, Arrays, Streams, ...) in the PDF. Starting with the "Info" and "Root" dictionaries, the application resolves all indirect object references while you dig deeper into the data structure. If enabled, relevant parts of the data from a PDF file can even be altered, offering very low-level editing capabilities.

Another program that lets you look inside a PDF (though with a different goal) can be found at http://blog.zeltser.com/post/3235995383/pdf-stream-dumper-malicious-file-analysis . It is made for looking deeper inside, but it is also dialog-based.

Hope it helps,

Werner


