Find text by font size?

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2648
Printed Date: 05 May 25 at 6:44AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Find text by font size?
Posted By: Skylla
Subject: Find text by font size?
Date Posted: 21 May 13 at 7:57PM
I am trying to find out how to extract text with a specific font size from a PDF file in C#. For example: search a PDF file, find the 22pt text, and extract it. Is there a way to accomplish this with Quick PDF? Any ideas or sample code? Need help from the gurus! Thank you!



Replies:
Posted By: Ingo
Date Posted: 21 May 13 at 8:12PM
Hi Skylla!

That's easy stuff, so you should succeed on your own ;-)
Take the starter code for beginners:
http://www.quickpdflibrary.com/help/getting-started-activex.php
Insert a LoadFromFile...
Insert a PageCount...
Then create a loop over the PageCount
...and inside it call ExtractFilePageText with option 3.
Here you can read all about option 3, and then you will know how to do it:
http://www.quickpdflibrary.com/help/quickpdf/ExtractFilePageText.php
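
A minimal sketch of the filtering step in the loop described above, in case it helps. It assumes that option 3 returns one CSV line per piece of text, with the font size in the third field and the text in the last field; please confirm the exact column layout on the help page linked above. The class and method names (Option3Csv, SplitCsvLine, TextWithSize) are made up for this example and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class Option3Csv
{
    // Splits one CSV line into fields, honouring double-quoted fields
    // and "" as an escaped quote inside a quoted field.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');
                    i++; // skip the second quote of the escaped pair
                }
                else if (c == '"') inQuotes = false;
                else current.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',') { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Returns the text of every line whose size field rounds to targetPt.
    // ASSUMPTION: the third field is the font size and the last field is the text.
    public static List<string> TextWithSize(string csv, int targetPt)
    {
        var hits = new List<string>();
        string[] lines = csv.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            List<string> f = SplitCsvLine(line);
            if (f.Count < 4) continue; // not a data line in the assumed layout
            double size;
            if (double.TryParse(f[2], NumberStyles.Float, CultureInfo.InvariantCulture, out size)
                && Math.Round(size) == targetPt)
            {
                hits.Add(f[f.Count - 1]);
            }
        }
        return hits;
    }
}
```

Inside the page loop this would be called as Option3Csv.TextWithSize(qp.ExtractFilePageText("aaa.pdf", "", page, 3), 22), where qp is your library instance. Text that contains line breaks inside a quoted field is not handled by this sketch.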

Cheers and welcome here,
Ingo



Posted By: Skylla
Date Posted: 21 May 13 at 9:51PM
Thank you for your good starting points!


Posted By: AndrewC
Date Posted: 23 May 13 at 5:25AM
Skylla,

Here is some sample code that shows how easy it is to extract the text with the new text block functions.

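            // Load the document (an empty string means no password).
            QP.LoadFromFile("99pages.pdf", "");
            for (int i = 1; i <= QP.PageCount(); i++)
            {
                QP.SelectPage(i);
                // Extract the page's text blocks; the call returns a list ID.
                int id = QP.ExtractPageTextBlocks(3);
                for (int w = 1; w <= QP.GetTextBlockCount(id); w++)
                {
                    double size = QP.GetTextBlockFontSize(id, w);

                    // Report only the blocks whose font size rounds to 22pt.
                    if (Math.Round(size) == 22)
                        MessageBox.Show("Page: " + i.ToString() + " Word: " + w.ToString() + " '" + QP.GetTextBlockText(id, w) + "'");
                }
                // Free the text block list before moving to the next page.
                QP.ReleaseTextBlocks(id);
            }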


Andrew.



Posted By: Skylla
Date Posted: 23 May 13 at 8:38AM
Hi Andrew.

Thank you for your sample. I tried that code, and at runtime I got the following results:

id = 1476395009,
Result of qp.GetTextBlockCount(id) = 0
so the body of the loop for (int w = 1; w <= qp.GetTextBlockCount(id); w++) never runs. Do you have any idea what is happening?

        var qp = new PDFLibrary("C:\\DebenuPDFLibraryDLL0914.dll");
        const string licenseKey = "licencekey";
        var result = qp.UnlockKey(licenseKey);
        if (qp.LibraryLoaded())
        {
            if (result == 1)
            {
                qp.LoadFromFile("aaa.pdf", "");
                for (int i = 1; i <= qp.PageCount(); i++)
                {
                    qp.SelectPage(i);
                    int id = qp.ExtractPageTextBlocks(3);
                    for (int w = 1; w <= qp.GetTextBlockCount(id); w++)
                    {
                        double size = qp.GetTextBlockFontSize(id, w);

                        if (Math.Round(size) == 22)
                            Response.Write("Page :" + i.ToString(CultureInfo.InvariantCulture) + " Word:" + w.ToString(CultureInfo.InvariantCulture) + "'" + qp.GetTextBlockText(id, w) + "'" + "<br>");
                    }
                    qp.ReleaseTextBlocks(id);
                }
            }
        }


Posted By: AndrewC
Date Posted: 23 May 13 at 9:01AM
If it returned 0, then it is not finding any text on the page. I would need to see the PDF file before I could make any further comments.

Andrew.


Posted By: Skylla
Date Posted: 23 May 13 at 2:05PM
It's actually a basic PDF that I created for testing.

http://speedy.sh/KMPmW/aaa.pdf

It just contains 24, 23, 22 and 20 pt text. Created with Word and saved as PDF.


Posted By: AndrewC
Date Posted: 24 May 13 at 8:25AM
My code is working correctly with your PDF and returns the 22pt text from both pages.

Is LoadFromFile returning 1 in your case? Does QP.PageCount return 1 or 2? It should be 2 for your PDF. I suspect LoadFromFile is failing, possibly because of a permissions problem. By default QPL always has a single blank page allocated in memory, so if the load fails you would be extracting from that blank page, which could be the reason nothing is being extracted.

You could then try string s = QP.GetPageText(7); MessageBox.Show(s); to make sure the text is actually being extracted.
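
The two checks above could be sketched roughly like this. LoadCheck and Diagnose are hypothetical names for illustration only. The helper takes the two library calls as delegates so that it compiles without the DLL, and it assumes your wrapper's LoadFromFile(string, string) and PageCount() both return int, as the code earlier in this thread suggests.

```csharp
using System;

public static class LoadCheck
{
    // Returns null when the file loaded and has the expected page count,
    // otherwise a short description of what went wrong.
    public static string Diagnose(Func<string, string, int> loadFromFile,
                                  Func<int> pageCount,
                                  string fileName,
                                  int expectedPages)
    {
        // 1 means the file was loaded, per the question asked above.
        int loaded = loadFromFile(fileName, "");
        if (loaded != 1)
        {
            return "LoadFromFile returned " + loaded + " for " + fileName
                 + "; check the path and the file permissions.";
        }

        // A count of 1 for a two-page file suggests only the default
        // blank page is in memory.
        int pages = pageCount();
        if (pages != expectedPages)
        {
            return "Expected " + expectedPages + " pages but PageCount returned " + pages + ".";
        }
        return null;
    }
}
```

With the library instance from the earlier post it would be called as LoadCheck.Diagnose(qp.LoadFromFile, qp.PageCount, "aaa.pdf", 2), and any non-null result written out before the extraction loop starts.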

Andrew.


