Debenu Quick PDF Library - PDF SDK Community Forum

Exported images >> original file size?

Dave (Beginner, joined 15 Feb 12)
Posted: 15 Feb 12 at 6:27PM
Hi all,

C# user here (I'm using the dll).
I've had a good look around and I can't see anything that describes this problem. If there is, feel free to point me in the right direction!

My PDFs should have only one image on each page (the scanner vendor's app makes image-only PDFs), and I need to extract that image in order to make some changes to it. It gets written to a new PDF much later in the process.

I'm using SaveImageDataToFile, but my test PDF, a two-page file of 43KB, is exporting 11MB images per page.

Interestingly, I created my test PDF with the same library... I know my original source image was a svelte 27KB G4 TIFF!

Is there a way of exporting the image using anything like the same compression the PDF format itself must use?
My alternative is to extract the resolution and dimension data and use some third-party library (a libtiff port for C#, anyone?) to compress my images to a more manageable (network-friendly) size.

Any pointers or ideas most welcome.

Thanks!

Ingo (Moderator Group, joined 29 Oct 05)
Posted: 15 Feb 12 at 7:38PM
Hi Dave!

The original images were inserted into the PDF page with their original image properties, as a snapshot. Extracting the image with the original properties rebuilds the original image, so it's bigger?
You should use the RenderPage functions, which take DPI values... this should result in smaller files.
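Ingo's suggestion rests on simple arithmetic: PDF page sizes are measured in points (72 per inch), so a page side renders to points / 72 × dpi pixels, and the pixel count grows with the square of the DPI. A minimal sketch with hypothetical helper names; the 612 × 792 pt (US Letter) page and 24-bit colour are illustrative assumptions, and real RenderPage output is compressed, so these figures are uncompressed upper bounds.

```csharp
using System;

// Hypothetical helper: pixels along one page side at a given DPI (PDF: 72 points per inch).
static int RenderedPixels(double points, int dpi) => (int)Math.Round(points / 72.0 * dpi);

// Hypothetical helper: uncompressed page size, assuming 24-bit colour (3 bytes per pixel).
static long UncompressedRgbBytes(double widthPt, double heightPt, int dpi) =>
    (long)RenderedPixels(widthPt, dpi) * RenderedPixels(heightPt, dpi) * 3;

// A US Letter page (612 x 792 points) at a few resolutions
foreach (int dpi in new[] { 72, 100, 200 })
{
    Console.WriteLine($"{dpi} dpi: {RenderedPixels(612, dpi)} x {RenderedPixels(792, dpi)} px, " +
                      $"{UncompressedRgbBytes(612, 792, dpi)} bytes uncompressed");
}
```

At 200 dpi this prints 1700 x 2200 px and 11220000 bytes, four times the 100 dpi figure, so halving the DPI roughly quarters the data.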

Cheers and welcome here,
Ingo



Edited by Ingo - 15 Feb 12 at 7:41PM
Dave
Posted: 15 Feb 12 at 8:29PM
Hi Ingo and thanks for the welcome!

Good thinking - I didn't even see the GDI+ functions in there - but I've tried them and I'm still getting files that are well in excess of the original PDF sizes. My guess is that the GDI engine is using LZW compression for TIFF (and I don't blame it: without knowing the color depth, it's the safest 'small' option).
The GDI engine is also clipping the page slightly - printing margins, perhaps?

Hmm... the search goes on ;) 

edvoigt (Senior Member, Berlin, Germany, joined 26 Mar 11)
Posted: 15 Feb 12 at 8:49PM
Hi Dave,

At 43KB your PDF is indeed rather small, so I'd guess the images inside are highly compressible and of high resolution too.

For easy rendering, have a look at this thread and use the right box (CropBox, or TrimBox at most):
http://www.quickpdf.org/forum/size-reduction-and-dpi_topic2146.html

Rendering is no solution, though, if you need the images at their original dimensions.

Cheers,
Werner

Dave
Posted: 15 Feb 12 at 9:04PM
My test image is black and white (most of the guys here will scan in 1-bit), so the images are small. But I will have to allow for the occasional 32-bit color image as well.
I wonder whether the scanning app makes JPG files when it is asked for color? Now that would make my life easy! I'll check!
Thanks for your comment, Werner. I think you're right that rendering won't work!

Best (MfG, i.e. kind regards, Werner),
Dave 

edvoigt
Posted: 16 Feb 12 at 9:56AM
Hi,

If we calculate without compression and header information, one pixel in a b/w image uses one bit, but in a full-colour format every pixel uses four bytes, i.e. 32 bits. That alone is an (uncompressed) factor of 32; add the compression that is lost on top of that, and 43KB grows to approx. 11MB. So it's clear where the size comes from.
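Werner's calculation as code: the uncompressed size is the row length in bytes (rounded up to whole bytes) times the number of rows. The helper name and the 1600 × 2400 pixel image are hypothetical, and real file formats add headers and extra row padding (BMP pads each row to 4 bytes), which this sketch ignores.

```csharp
using System;

// Hypothetical helper: uncompressed size of an image, each row rounded up to whole bytes.
static long UncompressedBytes(int widthPx, int heightPx, int bitsPerPixel)
{
    long bytesPerRow = ((long)widthPx * bitsPerPixel + 7) / 8;
    return bytesPerRow * heightPx;
}

long bw = UncompressedBytes(1600, 2400, 1);     // 1-bit black and white
long rgba = UncompressedBytes(1600, 2400, 32);  // 32-bit colour: 4 bytes per pixel
Console.WriteLine($"1-bit: {bw} bytes, 32-bit: {rgba} bytes, factor {rgba / bw}");
// prints "1-bit: 480000 bytes, 32-bit: 15360000 bytes, factor 32"
```

The factor of 32 comes from bit depth alone; the rest of the gap between a compressed G4 page and an uncompressed colour export is the compression that was thrown away.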

But the question is why.

You should build your tests around finding out more about the image inside. You can query its image type, resolution and size. My guess is that QuickPDF thinks the embedded image is a JPG, but it's only a guess...

Maybe the scanning app draws the image into the PDF and wrongly says it is color?

You may figure something out if you list all the image properties QuickPDF gives you and compare them with the known data (if any) of the source image you used to build your test PDF.

No real help, but a step?

Werner

Dave
Posted: 16 Feb 12 at 2:55PM
Hi Werner - good advice! My background is in document scanning and I can confirm your maths...this is exactly what is happening!
However, this is an interesting problem because I KNOW the image in the PDF is a TIFF - I put it there! ;)

Consider this code:

            // qp: the Quick PDF Library object (created elsewhere)
            qp.UnlockKey("<license>");
            bool _error = false;
            int _id = 0;
            const string TifFile = "C:\\1.tif";
            const string PDFFile = "C:\\1.pdf";
            const string NewImageFile = "C:\\image.tif";
            int _dpiX = 0;
            int _dpiY = 0;

            _id = qp.NewDocument();
            if (_id == 0)
            {
                _error = true;
                return;
            }

            // now, add the image from the temp. location
            _id = qp.AddImageFromFile(TifFile, 1);
            if (_id == 0)
            {
                _error = true;
                return;
            }

            // select this as the current image (only once we know the ID is valid)
            qp.SelectImage(_id);

            // read the image resolution, falling back to 72 dpi if none is stored
            _dpiX = qp.ImageHorizontalResolution();
            if (_dpiX == 0) _dpiX = 72;
            _dpiY = qp.ImageVerticalResolution();
            if (_dpiY == 0) _dpiY = 72;

            // size the page to the image: pixels / dpi = inches, inches * 72 = points
            double ImageWidthInPoints = (double)qp.ImageWidth() / _dpiX * 72.0;
            double ImageHeightInPoints = (double)qp.ImageHeight() / _dpiY * 72.0;

            qp.SetPageDimensions(ImageWidthInPoints, ImageHeightInPoints);
            qp.SetOrigin(1);
            // draw the image over the whole page
            qp.DrawImage(0, 0, ImageWidthInPoints, ImageHeightInPoints);

            if (qp.SaveToFile(PDFFile) != 1)
            {
                _error = true;
                return;
            }

            /* 
             * Now we have a PDF with the TIF in it. 
             *   The resolution is correct so we know QPDF is reading the image correctly
             *   The file exists on the disk and is a little larger than the source TIF. That's
             *   okay because we expect an overhead from the PDF wrapper
             * 
             * Okay, so now reverse the process. Let's extract the same file and see what
             *   happens.
             *   
             * We can make some assumptions: the PDF only has one image, only one page. 
             *   This makes the selection logic easy. In the real world,
             *   we would be passing parameters that change these values.
             */
            
            int _DocRef = qp.LoadFromFile(PDFFile, "");
            if (_DocRef == 0)
            {
                _error = true;
                return;
            }

            if (qp.SelectPage(1) == 0)
            {
                _error = true;
                return;
            }

            int imageList = qp.GetPageImageList(0);
            if (imageList == 0)
            {
                _error = true;
                return;
            }
            int ImageListCounter = qp.GetImageListCount(imageList);
            int FindImages = qp.FindImages();

            // for reasons best known to PDF, my file has 37 items in the FindImages list...
            // so, let's check them *all* for resolution and hope one matches the '200'
            // we know our original TIF had...
            int[] _set = new int[FindImages];
            for (int j = 0; j < FindImages; j++)
            {
                // GetImageID takes a 1-based index into the FindImages results
                int p = qp.GetImageID(j + 1);
                if (p > 0)
                {
                    _set[j] = p;
                }
            }

            // column 0: SelectImage result, column 1: horizontal resolution
            int[,] imageids = new int[FindImages, 2];
            for (int j = 0; j < FindImages; j++)
            {
                imageids[j, 0] = qp.SelectImage(_set[j]);
                imageids[j, 1] = qp.ImageHorizontalResolution();
            }

            // property 400 = image type (Werner's test reports 2 = BMP)
            int ImageItem = qp.GetImageListItemIntProperty(imageList, 1, 400);

            // now we can read (to file) the first image on the current page
            if (qp.SaveImageListItemDataToFile(imageList, 1, 0, NewImageFile) == 0)
            {
                _error = true;
                return;
            }

Now, if you run this, you'll find that I don't get a horizontal resolution from any of the 37 image entries! So where the hell has my image gone?? ;)

I'm confused: the PDF is 40KB, so it MUST contain a well-compressed copy of my TIFF. So why the heck can't I get it out in that format?

Best,
Dave

edvoigt
Posted: 17 Feb 12 at 8:47AM
Hi Dave,

I did my own test, starting with a scan to get a pure b/w TIFF. Then I coded just this (Delphi, but easy to read, I think):

  QP.SetOrigin(0);                                 // bottom-left
  iid := QP.AddImageFromFile('File0001.tif', 0);   // type -1 brings 332KB!
  QP.SelectImage(iid);
  Memo1.Lines.Add(Format('type=%d', [QP.ImageType]));                // type=3=tiff
  Memo1.Lines.Add(Format('h=%d', [QP.ImageHorizontalResolution]));   // 96dpi, ok

  QP.DrawImage(25, 250, w, h);    // w, h: width and height to draw at, set elsewhere
  QP.SaveToFile('FileWithTiff.pdf');

This takes my TIFF, shows me some of its properties and saves a PDF. The PDF's size corresponds to the TIFF's.
Inside, it looks good:
/Subtype /Image
/Width 258
/Height 438
/ColorSpace /DeviceGray
/BitsPerComponent 1
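To compare what the PDF says about an image with the source file, these entries can be read straight from the image dictionary as it appears in a text editor. A rough sketch with a hypothetical helper, assuming an uncompressed dictionary with direct values; it is not a PDF parser, and an indirect value such as /Width 5 0 R would be misread as 5.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects "/Key value" pairs from an image dictionary's text,
// where the value is a name (/DeviceGray) or an integer (438, -1).
static Dictionary<string, string> ReadImageDict(string dict)
{
    var result = new Dictionary<string, string>();
    foreach (Match m in Regex.Matches(dict, @"/(\w+)\s+(/\w+|-?\d+)"))
        result[m.Groups[1].Value] = m.Groups[2].Value;
    return result;
}

// The entries Werner found in his test PDF
var info = ReadImageDict("/Subtype /Image /Width 258 /Height 438 /ColorSpace /DeviceGray /BitsPerComponent 1");
Console.WriteLine($"{info["Width"]} x {info["Height"]}, {info["BitsPerComponent"]} bit, {info["ColorSpace"]}");
// prints "258 x 438, 1 bit, /DeviceGray"
```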

And now the export of our image:

  QP.LoadFromFile('FileWithTiff.pdf', '');
  QP.SelectPage(1);
  lid := QP.GetPageImageList(0);
  Memo1.Lines.Add(Format('n=%d', [QP.GetImageListCount(lid)]));                   // one image found
  Memo1.Lines.Add(Format('t=%d', [QP.GetImageListItemIntProperty(lid, 1, 400)])); // reports 2 = BMP
  QP.SaveImageListItemDataToFile(lid, 1, 0, 'File0001export.tif');

The extracted file differs in size by 15830 - 14652 = 1178 bytes. Why?
A look inside makes it clearer: the original file starts with the two bytes 'II' (TIFF), but the saved image starts with 'BM', a bitmap, not a TIFF.
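Werner's check of the first bytes is easy to automate. A small sketch with hypothetical helper names that recognises a few well-known signatures: TIFF starts with "II*\0" (little-endian) or "MM\0*" (big-endian), BMP with "BM", JPEG with FF D8 FF and PNG with 89 50 4E 47.

```csharp
using System;
using System.IO;

// Hypothetical helper: true if data begins with the given signature bytes.
static bool HasPrefix(byte[] data, params byte[] sig)
{
    if (data.Length < sig.Length) return false;
    for (int i = 0; i < sig.Length; i++)
        if (data[i] != sig[i]) return false;
    return true;
}

// Hypothetical helper: names the image format from the file's first bytes ("magic number").
static string SniffFormat(byte[] head)
{
    if (HasPrefix(head, 0x49, 0x49, 0x2A, 0x00) || HasPrefix(head, 0x4D, 0x4D, 0x00, 0x2A))
        return "TIFF";                                    // "II*\0" or "MM\0*"
    if (HasPrefix(head, 0x42, 0x4D)) return "BMP";        // "BM"
    if (HasPrefix(head, 0xFF, 0xD8, 0xFF)) return "JPEG";
    if (HasPrefix(head, 0x89, 0x50, 0x4E, 0x47)) return "PNG";
    return "unknown";
}

// Hypothetical helper: reads only the first few bytes of a file and identifies it.
static string SniffFile(string path)
{
    using var fs = File.OpenRead(path);
    var head = new byte[8];
    int read = fs.Read(head, 0, head.Length);
    Array.Resize(ref head, read);
    return SniffFormat(head);
}
```

Run against Werner's two files, SniffFile("File0001.tif") should report TIFF and SniffFile("File0001export.tif") BMP, matching what he found by eye.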

So we are on the right track.

The open question is why QPL detects the input image as TIFF, but once it has been put into the PDF, reports it as BMP. In the dictionary above you see only two hints:

/ColorSpace /DeviceGray
/BitsPerComponent 1

From these alone it is a matter of interpretation; all we really know is that it is a pure b/w image. The stream data gives me no clue for deciding reliably between TIFF and BMP, because I'm not familiar enough with the internals.


Werner

Dave
Posted: 17 Feb 12 at 1:55PM
Thanks Werner - yes, I can read Delphi!

So it's not my environment then ;)
Okay, I'll raise this as a question with support now and let them know this thread exists.

Thanks again for your help; it was really useful to know I am not doing something wrong!

Best regards,
Dave


samb (Beginner, joined 09 Feb 11)
Posted: 22 Feb 12 at 5:45PM
FYI, there is a .NET port of LibTiff: http://bitmiracle.com/libtiff/

Confirmed in my environment too, with 8.14b5. I also tried LZW TIFF, PNG, GIF and JPG files, and only the JPG came back in its original format. Using streams instead of files doesn't help either.

I think the issue has to do with the way PDF files handle images, as edvoigt was getting at. PDF files store image data, but not image files. If you open your created PDF in Notepad, you can see your image data after the "stream" keyword. Notice that the data lacks the usual file header bytes ("II" for a TIFF) and starts directly with the image contents. The PDF does record the compression filter ("/Filter /CCITTFaxDecode"), so a reader knows how to decode it for rendering, but it neither knows nor cares that it came from a TIFF.
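samb's finding that only the JPEG survived the round trip fits how PDF filters work: a /DCTDecode stream holds JPEG data and a /JPXDecode stream holds JPEG 2000 data, so those bytes can be saved as image files directly, while CCITT, Flate and LZW streams hold encoded pixels with no file header. A hypothetical lookup sketching that decision for single-filter streams; it is not part of Quick PDF Library.

```csharp
using System;

// Hypothetical helper: what to do with the still-encoded bytes of a PDF image stream,
// based on its /Filter (single filters only; filter arrays are not handled).
static string ExportStrategy(string filter) => filter switch
{
    "/DCTDecode"      => "write as-is: the stream is JPEG data (.jpg)",
    "/JPXDecode"      => "write as-is: the stream is JPEG 2000 data (.jp2)",
    "/CCITTFaxDecode" => "wrap in a TIFF header (Compression 4 if /K < 0, else 3)",
    "/FlateDecode"    => "inflate to raw pixel rows, then encode (e.g. PNG or TIFF)",
    "/LZWDecode"      => "decode to raw pixel rows, then encode (e.g. PNG or TIFF)",
    _                 => "decode first; no direct file equivalent assumed",
};

Console.WriteLine(ExportStrategy("/CCITTFaxDecode"));
```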

So, unfortunately, returning image data is a bit more complicated than just pulling the data from the stream section, and QuickPDF obviously doesn't handle all cases as expected.
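For the CCITT case, one workaround outside the library is to take the still-encoded stream bytes and prepend a minimal TIFF header, which gives back a G4 TIFF without re-encoding. A sketch under several assumptions: the stream uses /CCITTFaxDecode with /K -1 (Group 4); /BlackIs1 is absent or false and there is no /Decode [1 0] array (hence PhotometricInterpretation 0; try 1 if the result comes out inverted); the image is one strip; and you already have the raw, undecoded stream bytes plus /Width, /Height and the DPI. How to obtain those from Quick PDF Library is not shown, and WrapG4AsTiff is a hypothetical name. Tag numbers follow the TIFF 6.0 specification.

```csharp
using System;
using System.IO;

// Hypothetical helper: wraps raw CCITT Group 4 data in a minimal
// single-strip, little-endian TIFF file.
static byte[] WrapG4AsTiff(byte[] g4Data, int width, int height, int dpi)
{
    const int entryCount = 11;
    const int ifdOffset = 8;                                          // IFD right after the 8-byte header
    const int rationalsOffset = ifdOffset + 2 + entryCount * 12 + 4;  // = 146
    const int dataOffset = rationalsOffset + 16;                      // two 8-byte RATIONALs; = 162
    const ushort SHORT = 3, LONG = 4, RATIONAL = 5;                   // TIFF field types

    using var ms = new MemoryStream();
    using var w = new BinaryWriter(ms);                               // BinaryWriter writes little-endian

    // Header: byte order "II", magic number 42, offset of the first IFD
    w.Write((byte)'I'); w.Write((byte)'I');
    w.Write((ushort)42);
    w.Write((uint)ifdOffset);

    // One IFD entry: tag, field type, count (always 1 here), value or offset to value
    void Entry(ushort tag, ushort type, uint value)
    {
        w.Write(tag); w.Write(type); w.Write(1u); w.Write(value);
    }

    w.Write((ushort)entryCount);               // entries must be sorted by tag
    Entry(256, LONG, (uint)width);             // ImageWidth
    Entry(257, LONG, (uint)height);            // ImageLength
    Entry(258, SHORT, 1);                      // BitsPerSample: 1 bit
    Entry(259, SHORT, 4);                      // Compression: 4 = CCITT Group 4
    Entry(262, SHORT, 0);                      // PhotometricInterpretation: 0 = WhiteIsZero
    Entry(273, LONG, dataOffset);              // StripOffsets: where the G4 data starts
    Entry(278, LONG, (uint)height);            // RowsPerStrip: whole image in one strip
    Entry(279, LONG, (uint)g4Data.Length);     // StripByteCounts
    Entry(282, RATIONAL, rationalsOffset);     // XResolution (offset to the rational)
    Entry(283, RATIONAL, rationalsOffset + 8); // YResolution
    Entry(296, SHORT, 2);                      // ResolutionUnit: 2 = inch
    w.Write(0u);                               // offset of next IFD: none

    w.Write((uint)dpi); w.Write(1u);           // XResolution = dpi / 1
    w.Write((uint)dpi); w.Write(1u);           // YResolution = dpi / 1
    w.Write(g4Data);                           // the untouched CCITT G4 stream bytes
    w.Flush();
    return ms.ToArray();
}
```

Because the G4 bytes are copied unchanged, the resulting file should be about the size of the image stream inside the PDF rather than the size of an uncompressed export.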

There is one other issue you may run into with TIFFs. One of a TIFF's properties is RowsPerStrip. Say you have an image 100 pixels tall with RowsPerStrip set to 10. If you add that image to a PDF (with QuickPDF at least; I'm not sure whether this is a universal problem), it will actually add 10 images, each 10 pixels high. To get the image back, you will have to retrieve all 10 strips and merge them together (with libtiff).
I believe that GDI+ on Windows XP sets RowsPerStrip to the full image height by default, whereas on Windows 7 it uses some fixed value (25?).
The only other way around this is to set RowsPerStrip to the full height of the image. Of course, you can't do this directly with GDI+, but LibTiff.Net has an example of how to do it.
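samb's strip arithmetic as code: the number of strips is the image height divided by RowsPerStrip, rounded up (the last strip may be shorter), and once every strip has been decoded to uncompressed 1-bit rows of the same width, merging them is plain concatenation in top-to-bottom order. A sketch with hypothetical helper names, assuming the strips are already decoded (e.g. with LibTiff.Net) and share one row stride.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: strips needed for an image (the last strip may be shorter).
static int StripCount(int height, int rowsPerStrip) => (height + rowsPerStrip - 1) / rowsPerStrip;

// Hypothetical helper: stacks already-decoded strips, top to bottom, into one
// uncompressed buffer. Every strip must consist of whole rows of the same stride.
static byte[] MergeStrips(IReadOnlyList<byte[]> strips, int bytesPerRow, int totalHeight)
{
    using var merged = new MemoryStream(bytesPerRow * totalHeight);
    foreach (byte[] strip in strips)
    {
        if (strip.Length % bytesPerRow != 0)
            throw new ArgumentException("strip is not a whole number of rows");
        merged.Write(strip, 0, strip.Length);
    }
    if (merged.Length != (long)bytesPerRow * totalHeight)
        throw new ArgumentException("strips do not add up to the image height");
    return merged.ToArray();
}

// samb's example: 100 rows with RowsPerStrip = 10 gives 10 strips
Console.WriteLine(StripCount(100, 10));   // prints 10
```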

GDI+ is also going to give you headaches when you try to edit those images (standard Graphics operations won't work on images with indexed pixel formats, such as black and white).

So, because of the quirks of PDF, QuickPDF and GDI+, I've found it easier to pass only bitmap images to QuickPDF and retrieve only bitmaps from it, and let it handle compression (it appears to use Flate for black and white, which is not as small as G4, but not terrible). With .NET it's easy enough to compress a bitmap to send it across the network, then turn it back into a bitmap before putting it into the next PDF. This obviously adds some processing time, but at least it works.


using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

byte[] compressedimagedata;
byte[] imagedata = QuickPDF.SaveImageDataToString(...);
using (MemoryStream imagedatastream = new MemoryStream(imagedata))
using (Image image = Image.FromStream(imagedatastream))
using (MemoryStream compressedimagedatastream = new MemoryStream())
{
     // re-encode losslessly as PNG for the trip across the network
     image.Save(compressedimagedatastream, ImageFormat.Png);
     compressedimagedata = compressedimagedatastream.ToArray();
}
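If the goal is only to shrink the bitmap bytes for the trip across the network, a general-purpose compressor is another option besides PNG. A sketch using GZipStream from System.IO.Compression, a swapped-in technique rather than what samb used; Compress and Decompress are hypothetical helper names. Mostly uniform 1-bit data compresses very well, but a real scan will not shrink as dramatically as an all-zero buffer.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: compresses bytes (e.g. an exported bitmap) before sending them.
static byte[] Compress(byte[] data)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        gzip.Write(data, 0, data.Length);   // disposing the GZipStream flushes the final block
    return output.ToArray();                // ToArray still works on the closed MemoryStream
}

// Hypothetical helper: restores the original bytes on the receiving side.
static byte[] Decompress(byte[] compressed)
{
    using var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
    using var output = new MemoryStream();
    input.CopyTo(output);
    return output.ToArray();
}
```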




Edited by samb - 22 Feb 12 at 5:48PM

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.