Extract text and images and insert into new PDF
Rowan
Moderator Group Joined: 10 Jan 09 Status: Offline Points: 398 |
Posted: 08 Jan 10 at 5:46AM |
This Delphi sample code shows you how to extract text and images from one PDF and insert them into a new PDF at the same locations.
----------------------------
var
  FH: Integer;
  PR: Integer;
  SL: TStringList;
  Data: string;
  Font: string;
  Color: string;
  Size: string;
  X1, Y1, X2, Y2, X3, Y3, X4, Y4: string;
  Text: string;
  X: Integer;
  IL: Integer;
  TextBlockLeft: Double;
  TextBlockTop: Double;
  PageNum: Integer;
  ImageData: string;
  ImageLeft, ImageTop, ImageWidth, ImageHeight: Double;
...
  // Open the file in direct access mode and store the file handle
  FH := QP.DAOpenFile('Xpod1228090001.pdf', '');

  // Loop through all the pages
  for PageNum := 1 to QP.DAGetPageCount(FH) do
  begin
    // Start a new document
    QP.NewDocument;

    // Specify that images should be compressed
    QP.CompressImages(1);

    // Get a page reference to the current page
    PR := QP.DAFindPage(FH, PageNum);

    // Create a string list to hold the text data
    SL := TStringList.Create;
    try
      // Extract the text from the current page
      SL.Text := QP.DAExtractPageText(FH, PR, 4);

      // Add each block of text to the new document
      for X := 0 to SL.Count - 1 do
      begin
        // Each line holds comma-separated fields followed by the quoted text;
        // peel the fields off the front one at a time
        Data := SL[X];
        Font := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Font) + 1);
        Color := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Color) + 1);
        Size := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Size) + 1);
        X1 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(X1) + 1);
        Y1 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Y1) + 1);
        X2 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(X2) + 1);
        Y2 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Y2) + 1);
        X3 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(X3) + 1);
        Y3 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Y3) + 1);
        X4 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(X4) + 1);
        Y4 := Copy(Data, 1, Pos(',', Data) - 1);
        Delete(Data, 1, Length(Y4) + 1);

        // The remainder is the text; strip the surrounding quotes
        Text := Copy(Data, 2, Length(Data) - 2);

        // Replace the utf-8 encoded TM symbol with the
        // PDF WinAnsi character code
        if Pos(#226#132#162, Text) > 0 then
          Text := StringReplace(Text, #226#132#162, #153, [rfReplaceAll]);

        // Set the text size
        QP.SetTextSize(StrToFloat(Size));

        // Draw the text, shift up by the font's "descent" value
        QP.DrawText(StrToFloat(X1), StrToFloat(Y1) - QP.GetTextDescent, Text);
      end;
    finally
      SL.Free;
    end;

    // Find all the images on the page
    IL := QP.DAGetPageImageList(FH, PR);

    // Loop through all the images
    for X := 1 to QP.DAGetImageListCount(FH, IL) do
    begin
      // Read the image data
      ImageData := QP.DAGetImageDataToString(FH, IL, X);

      // Add the image data to the new document
      QP.AddImageFromString(ImageData, 0);

      // Determine the location and size of the image on the page
      ImageLeft := QP.DAGetImageDblProperty(FH, IL, X, 501);
      ImageTop := QP.DAGetImageDblProperty(FH, IL, X, 502);
      ImageWidth := QP.DAGetImageDblProperty(FH, IL, X, 503) -
        QP.DAGetImageDblProperty(FH, IL, X, 501);
      ImageHeight := QP.DAGetImageDblProperty(FH, IL, X, 502) -
        QP.DAGetImageDblProperty(FH, IL, X, 508);

      // Draw the image onto the new document's page
      QP.DrawImage(ImageLeft, ImageTop, ImageWidth, ImageHeight);
    end; // End image loop

    // Compress the page description commands
    QP.CompressContent;

    // Save the file
    QP.SaveToFile('XPod-' + IntToStr(PageNum) + '.pdf');

    // Remove the document
    QP.RemoveDocument(QP.SelectedDocument);
  end; // End page loop
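The repeated Copy/Delete pairs in the sample could be folded into a small helper. The following is only a sketch: `TakeField`, `ParseTextBlock`, and `TTextBlock` are hypothetical names that are not part of Quick PDF Library, and the sample line is invented for illustration, since the thread does not show real output from DAExtractPageText. It assumes the layout the sample relies on: eleven comma-separated fields (font, color, size, four corner points) followed by the quoted text.

```delphi
program TextBlockDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Font, Color, Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4 (order taken from the sample)
  FieldCount = 11;

type
  // Hypothetical container for one parsed line
  TTextBlock = record
    Fields: array[0..FieldCount - 1] of string;
    Text: string;
  end;

// Removes and returns everything before the first comma in Data.
function TakeField(var Data: string): string;
var
  P: Integer;
begin
  P := Pos(',', Data);
  if P = 0 then
    raise Exception.Create('Expected a comma-separated field');
  Result := Copy(Data, 1, P - 1);
  Delete(Data, 1, P);
end;

// Splits off the first eleven fields only, so commas inside the text survive.
function ParseTextBlock(Line: string): TTextBlock;
var
  I: Integer;
begin
  for I := 0 to FieldCount - 1 do
    Result.Fields[I] := TakeField(Line);
  // Whatever is left is the quoted text; strip the surrounding quotes.
  Result.Text := Copy(Line, 2, Length(Line) - 2);
end;

var
  Block: TTextBlock;
begin
  // Invented sample line; real field formats may differ.
  Block := ParseTextBlock('Arial,0,12,10,20,110,20,110,32,10,32,"Hello, world"');
  WriteLn(Block.Fields[0]); // font field
  WriteLn(Block.Fields[2]); // size field
  WriteLn(Block.Text);      // text with quotes removed
end.
```

Unlike the inline version, the helper raises an exception when a line has fewer fields than expected instead of silently producing empty strings, which may or may not be what you want for malformed input.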
|
AndrewC
Moderator Group Joined: 08 Dec 10 Location: Geelong, Aust Status: Offline Points: 841 |
You extract the font and colour from the StringList. How can we select the font for the new page (with SelectFont()?) based on the font name returned from the DAExtractPageText call?
I would like to be able to select an existing font in the document so I can then do some calculations on character widths of the strings returned from the DAExtractPageText call. I have tried using GetFormFontCount(), but I assume this is only for form fields. Is there something like GetFontCount() and GetFontName()?
|
HNRSoftware
Senior Member Joined: 13 Feb 11 Location: Washington, USA Status: Offline Points: 88 |
Hi Rowan - the code mostly does what you said, but it creates a new one-page PDF file for each page in the original file. I tested it on a pretty tough file and it did very well. My test file uses a very small font, which might account for some odd crowding of the text in the new document(s). My guess is that I am seeing a font substitution: pretty similar, but not identical.
I am not that far along in my testing yet, but I would assume that this is NOT the proper way to copy pages from one file to another. It does show text and image extraction quite well, though.
Howard