
Extract text and images and insert into new PDF

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: Sample Code
Forum Description: Share Debenu Quick PDF Library sample code with other forum members
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1308


Topic: Extract text and images and insert into new PDF
Posted By: Rowan
Date Posted: 08 Jan 10 at 5:46AM
This Delphi sample code shows how to extract the text and images from each page of a PDF and insert them into a new PDF at the same locations. Each source page is saved as its own one-page PDF.

----------------------------
var
  FH: Integer;
  PR: Integer;
  SL: TStringList;
  Data: string;
  Font: string;
  Color: string;
  Size: string;
  X1, Y1, X2, Y2, X3, Y3, X4, Y4: string;
  Text: string;
  X: Integer;
  IL: Integer;
  TextBlockLeft: Double;
  TextBlockTop: Double;
  PageNum: Integer;
  ImageData: string;
  ImageLeft, ImageTop, ImageWidth, ImageHeight: Double;

...

// Open the file in direct access mode and store the file handle
FH := QP.DAOpenFile('Xpod1228090001.pdf', '');

// Loop through all the pages
for PageNum := 1 to QP.DAGetPageCount(FH) do
begin
  // Start a new document
  QP.NewDocument;

  // Specify that images should be compressed
  QP.CompressImages(1);

  // Get a page reference to the current page
  PR := QP.DAFindPage(FH, PageNum);

  // Create a string list to hold the text data
  SL := TStringList.Create;
  try

    // Extract the text from the current page
    SL.Text := QP.DAExtractPageText(FH, PR, 4);

    // Add each block of text to the new document
    for X := 0 to SL.Count - 1 do
    begin
      Data := SL[X];

      Font := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Font) + 1);
      Color := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Color) + 1);
      Size := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Size) + 1);

      X1 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X1) + 1);
      Y1 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y1) + 1);
      X2 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X2) + 1);
      Y2 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y2) + 1);
      X3 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X3) + 1);
      Y3 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y3) + 1);
      X4 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(X4) + 1);
      Y4 := Copy(Data, 1, Pos(',', Data) - 1);
      Delete(Data, 1, Length(Y4) + 1);

      Text := Copy(Data, 2, Length(Data) - 2);

      // Replace the utf-8 encoded TM symbol with the
      // PDF WinAnsi character code
      if Pos(#226#132#162, Text) > 0 then
        Text := StringReplace(Text, #226#132#162, #153,
          [rfReplaceAll]);

      // Set the text size
      QP.SetTextSize(StrToFloat(Size));

      // Draw the text, shift up by the font's "descent" value
      QP.DrawText(StrToFloat(X1),
        StrToFloat(Y1) - QP.GetTextDescent,
        Text);
    end;
  finally
    SL.Free;
  end;

  // Find all the images on the page
  IL := QP.DAGetPageImageList(FH, PR);

  // Loop through all the images
  for X := 1 to QP.DAGetImageListCount(FH, IL) do
  begin

    // Read the image data
    ImageData := QP.DAGetImageDataToString(FH, IL, X);

    // Add the image data to the new document
    QP.AddImageFromString(ImageData, 0);

    // Determine the location and size of the image on the page
    ImageLeft := QP.DAGetImageDblProperty(FH, IL, X, 501);
    ImageTop := QP.DAGetImageDblProperty(FH, IL, X, 502);
    ImageWidth := QP.DAGetImageDblProperty(FH, IL, X, 503) -
      QP.DAGetImageDblProperty(FH, IL, X, 501);
    ImageHeight := QP.DAGetImageDblProperty(FH, IL, X, 502) -
      QP.DAGetImageDblProperty(FH, IL, X, 508);

    // Draw the image onto the new document's page
    QP.DrawImage(ImageLeft, ImageTop, ImageWidth, ImageHeight);

  end;  // End image loop

  // Compress the page description commands
  QP.CompressContent;

  // Save the file
  QP.SaveToFile('XPod-' + IntToStr(PageNum) + '.pdf');

  // Remove the document
  QP.RemoveDocument(QP.SelectedDocument);

end;  // End page loop
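
The repeated Copy/Delete pairs in the text loop above could be folded into one small helper. The following is only a sketch under stated assumptions: NextField is a hypothetical helper (not part of Quick PDF Library), the sample line is invented to match the field order the code above reads (font, color, size, eight coordinates, quoted text), and the library is assumed to write numbers with a period as the decimal separator. It also assumes Delphi XE or later for TFormatSettings.Create. Passing explicit format settings to StrToFloat avoids conversion errors on systems whose locale uses a comma as the decimal separator.

```pascal
program ParseTextLine;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the text before the first comma in Data
// and removes that text (plus the comma) from Data.
function NextField(var Data: string): string;
var
  P: Integer;
begin
  P := Pos(',', Data);
  if P = 0 then
  begin
    // No comma left: the remainder is the last field
    Result := Data;
    Data := '';
  end
  else
  begin
    Result := Copy(Data, 1, P - 1);
    Delete(Data, 1, P);
  end;
end;

var
  Line, FontName, ColorValue, SizeText, Body: string;
  Coords: array[1..8] of string;
  I: Integer;
  FS: TFormatSettings;
begin
  // Invented sample line; real output may differ in detail
  Line := 'Helvetica,#000000,10.5,72,700,200,700,200,712,72,712,"Hello, world"';

  FontName := NextField(Line);
  ColorValue := NextField(Line);
  SizeText := NextField(Line);

  // X1, Y1, X2, Y2, X3, Y3, X4, Y4
  for I := 1 to 8 do
    Coords[I] := NextField(Line);

  // What remains is the quoted text; strip the surrounding quotes.
  // Commas inside the text are safe because parsing has stopped.
  Body := Copy(Line, 2, Length(Line) - 2);

  // Parse numbers with a fixed '.' separator, independent of the locale
  FS := TFormatSettings.Create;
  FS.DecimalSeparator := '.';

  Writeln(FontName, ' ', ColorValue);
  Writeln(FloatToStr(StrToFloat(SizeText, FS), FS));
  Writeln(FloatToStr(StrToFloat(Coords[1], FS), FS));
  Writeln(Body);  // prints: Hello, world
end.
```

In the sample above, the same idea would replace each Copy/Delete pair with a single NextField(Data) call, and each StrToFloat(...) with the two-argument overload.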




Replies:
Posted By: AndrewC
Date Posted: 16 Dec 10 at 6:30AM
You extract the font and colour from the StringList. How can we select the font used (SelectFont()?) for the new page based on the font name returned from the DAExtractPageText call?

I would like to be able to select an existing font in the document so I can then do some calculations on character widths of the strings returned from the DAExtractPageText call.

I have tried using GetFormFontCount(), but I assume this is only for form fields. Should I use GetFontCount() and GetFontName() instead?



Posted By: HNRSoftware
Date Posted: 15 Feb 11 at 1:08AM
Hi Rowan - the code mostly does what you said, but it creates a new 1-page pdf file for each of the pages in the original file.  I tested it on a pretty tough file and it did very well.  My test file uses a very small font and that might account for some odd crowding of the text in the new document(s).  My guess is that I am seeing a font substitution - pretty similar, but not identical.
 
I am not there in my testing yet, but I would assume that this is NOT the proper way to copy pages from one file to another, but it does show text and image extraction quite well.
 
Howard


