Debenu Quick PDF Library - PDF SDK Community Forum

Extract text and images and insert into new PDF

Rowan (Moderator Group)
Posted: 08 Jan 10 at 5:46AM
This Delphi sample code shows you how to extract the text and images from each page of a PDF and insert them at the same locations in a new PDF (one new PDF is saved per source page).

----------------------------
var
  FH: Integer;
  PR: Integer;
  SL: TStringList;
  Data: string;
  Font: string;
  Color: string;
  Size: string;
  X1, Y1, X2, Y2, X3, Y3, X4, Y4: string;
  Text: string;
  X: Integer;
  IL: Integer;
  TextBlockLeft: Double;
  TextBlockTop: Double;
  PageNum: Integer;
  ImageData: string;
  ImageLeft, ImageTop, ImageWidth, ImageHeight: Double;

...

// Open the file in direct access mode and store the file handle
FH := QP.DAOpenFile('Xpod1228090001.pdf', '');

// Loop through all the pages
for PageNum := 1 to QP.DAGetPageCount(FH) do
begin
  // Start a new document
  QP.NewDocument;

  // Specify that images should be compressed
  QP.CompressImages(1);

  // Get a page reference to the current page
  PR := QP.DAFindPage(FH, PageNum);

  // Create a string list to hold the text data
  SL := TStringList.Create;
  try

    // Extract the text from the current page
    SL.Text := QP.DAExtractPageText(FH, PR, 4);

    // Add each block of text to the new document
    for X := 0 to SL.Count - 1 do
    begin
      Data := SL[X];

      Font := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Font) + 1);
      Color := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Color) + 1);
      Size := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Size) + 1);

      X1 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X1) + 1);
      Y1 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y1) + 1);
      X2 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X2) + 1);
      Y2 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y2) + 1);
      X3 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X3) + 1);
      Y3 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y3) + 1);
      X4 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X4) + 1);
      Y4 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y4) + 1);

      Text := Copy(Data, 2, Length(Data) - 2);

      // Replace the utf-8 encoded TM symbol with the
      // PDF WinAnsi character code
      if Pos(#226#132#162, Text) > 0 then
        Text := StringReplace(Text, #226#132#162, #153,
          [rfReplaceAll]);

      // Set the text size
      QP.SetTextSize(StrToFloat(Size));

      // Draw the text, shift up by the font's "descent" value
      QP.DrawText(StrToFloat(X1),
        StrToFloat(Y1) - QP.GetTextDescent,
        Text);
    end;
  finally
    SL.Free;
  end;

  // Find all the images on the page
  IL := QP.DAGetPageImageList(FH, PR);

  // Loop through all the images
  for X := 1 to QP.DAGetImageListCount(FH, IL) do
  begin

    // Read the image data
    ImageData := QP.DAGetImageDataToString(FH, IL, X);

    // Add the image data to the new document
    QP.AddImageFromString(ImageData, 0);

    // Determine the location and size of the image on the page
    ImageLeft := QP.DAGetImageDblProperty(FH, IL, X, 501);
    ImageTop := QP.DAGetImageDblProperty(FH, IL, X, 502);
    ImageWidth := QP.DAGetImageDblProperty(FH, IL, X, 503) -
      QP.DAGetImageDblProperty(FH, IL, X, 501);
    ImageHeight := QP.DAGetImageDblProperty(FH, IL, X, 502) -
      QP.DAGetImageDblProperty(FH, IL, X, 508);

    // Draw the image onto the new document's page
    QP.DrawImage(ImageLeft, ImageTop, ImageWidth, ImageHeight);

  end;  // End image loop

  // Compress the page description commands
  QP.CompressContent;

  // Save the file
  QP.SaveToFile('XPod-' + IntToStr(PageNum) + '.pdf');

  // Remove the document
  QP.RemoveDocument(QP.SelectedDocument);

end;  // End page loop
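
The repeated Copy/Delete pairs above can be folded into a small helper. The following is only a sketch: NextField and Unquote are hypothetical helpers (not part of Quick PDF Library), the sample line is made up to match the field order the code above assumes (font, color, size, eight coordinates, quoted text), and it assumes the library writes numbers with '.' as the decimal separator. Passing an explicit TFormatSettings to StrToFloat keeps the parsing independent of the operating system's locale.

```pascal
program ParseTextBlock;

{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: removes the first comma-separated field
// from Data and returns it.
function NextField(var Data: string): string;
var
  P: Integer;
begin
  P := Pos(',', Data);
  if P = 0 then
  begin
    // No comma left, so the remainder is the last field
    Result := Data;
    Data := '';
  end
  else
  begin
    Result := Copy(Data, 1, P - 1);
    Delete(Data, 1, P);
  end;
end;

// Hypothetical helper: strips one pair of surrounding double quotes, if present.
function Unquote(const S: string): string;
begin
  if (Length(S) >= 2) and (S[1] = '"') and (S[Length(S)] = '"') then
    Result := Copy(S, 2, Length(S) - 2)
  else
    Result := S;
end;

var
  Line, Font, Color, BlockText: string;
  Size: Double;
  Coords: array[1..8] of Double;  // X1, Y1, X2, Y2, X3, Y3, X4, Y4
  I: Integer;
  FS: TFormatSettings;
begin
  // Assumption: numbers in the extracted text use '.' as decimal separator
  FS.DecimalSeparator := '.';
  FS.ThousandSeparator := ',';

  // Made-up sample line in the assumed field order
  Line := 'Helvetica,#000000,12,72.5,700,150.25,700,150.25,712,72.5,712,"Hello, world"';

  Font := NextField(Line);
  Color := NextField(Line);
  Size := StrToFloat(NextField(Line), FS);
  for I := 1 to 8 do
    Coords[I] := StrToFloat(NextField(Line), FS);

  // Whatever remains is the quoted text, which may itself contain commas
  BlockText := Unquote(Line);

  WriteLn('Font:  ', Font);
  WriteLn('Color: ', Color);
  WriteLn('Size:  ', FloatToStr(Size, FS));
  WriteLn('X1,Y1: ', FloatToStr(Coords[1], FS), ', ', FloatToStr(Coords[2], FS));
  WriteLn('Text:  ', BlockText);
end.
```

Because the text is the last field, taking the remainder of the line (rather than splitting on every comma) keeps commas inside the text intact, which is the same approach the original loop takes.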

AndrewC (Moderator Group)
Posted: 16 Dec 10 at 6:30AM
You extract the font and colour from the StringList.  How can we select the font used (SelectFont() ???) for the new page, based on the font name returned from the DAExtractPageText call?

I would like to be able to select an existing font in the document so I can then do some calculations on character widths of the strings returned from the DAExtractPageText call.

I have tried using GetFormFontCount() but I assume this is only for Form fields.  GetFontCount() and GetFontName() ???

HNRSoftware (Senior Member)
Posted: 15 Feb 11 at 1:08AM
Hi Rowan - the code mostly does what you said, but it creates a new 1-page PDF file for each of the pages in the original file.  I tested it on a pretty tough file and it did very well.  My test file uses a very small font, and that might account for some odd crowding of the text in the new document(s).  My guess is that I am seeing a font substitution: pretty similar, but not identical.
 
I am not that far along in my testing yet, but I would assume that this is NOT the proper way to copy pages from one file to another.  It does show text and image extraction quite well, though.
 
Howard