Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Find text by font size?

Skylla
Posted: 21 May 13 at 7:57PM
I am trying to find out how to extract text of a specific font size from a PDF file in C#. For example: search a PDF file, find the 22pt text and extract it. Is there a way to accomplish this with Quick PDF? Any ideas or sample code? Need help from the gurus! Thank you!

Ingo (Moderator Group)
Posted: 21 May 13 at 8:12PM
Hi Skylla!

That's easy stuff, so you should succeed on your own ;-)
Start from the getting-started code for beginners:
http://www.quickpdflibrary.com/help/getting-started-activex.php
Insert a LoadFromFile...
Insert a PageCount...
Then create a loop over PageCount
...and inside it call ExtractFilePageText with option 3.
You can read all about option 3 here, and then you'll know what to do:
http://www.quickpdflibrary.com/help/quickpdf/ExtractFilePageText.php

Cheers and welcome here,
Ingo
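
Note on the steps above: once the option 3 string is in hand, picking out the 22pt text is a matter of parsing it. The sketch below assumes (from memory of the Quick PDF documentation, so please check the linked page) that option 3 returns one CSV record per piece of text in the form Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text. The class name FontSizeFilter, its methods and the sample data are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class FontSizeFilter
{
    // Splits one CSV record, honouring double-quoted fields so that a
    // font name such as "Arial,Bold" stays in one piece.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                // A doubled quote inside a quoted field is an escaped quote.
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');
                    i++;
                }
                else
                {
                    inQuotes = !inQuotes;
                }
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Returns the text of every record whose size field is within
    // tolerance of wantedSize. Assumed layout (not verified here):
    // FontName, Color, Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
    public static List<string> TextAtSize(string csv, double wantedSize, double tolerance = 0.5)
    {
        var hits = new List<string>();
        string[] lines = csv.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            List<string> fields = SplitCsvLine(line);
            if (fields.Count < 12) continue; // not a full record

            double size;
            if (!double.TryParse(fields[2], NumberStyles.Float, CultureInfo.InvariantCulture, out size)) continue;

            // Re-join anything past field 11 in case the text held unquoted commas.
            if (Math.Abs(size - wantedSize) <= tolerance)
                hits.Add(string.Join(",", fields.GetRange(11, fields.Count - 11)));
        }
        return hits;
    }

    public static void Main()
    {
        // Invented sample data in the assumed layout; in real use this string
        // would be the result of ExtractFilePageText with option 3.
        string csv =
            "\"Arial,Bold\",#000000,22,72,700,200,700,200,722,72,722,\"Chapter One\"\n" +
            "\"TimesNewRomanPSMT\",#000000,12,72,650,300,650,300,662,72,662,\"Body text\"\n";

        foreach (string text in TextAtSize(csv, 22))
            Console.WriteLine(text);
    }
}
```

The parser treats each line as one record, so text that itself contains line breaks would need extra handling.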

Skylla
Posted: 21 May 13 at 9:51PM
Thank you for your good starting points!

AndrewC (Moderator Group)
Posted: 23 May 13 at 5:25AM
Skylla,

Here is some sample code that shows how easy it is to extract the text with the new text block functions.

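            // Load the document (empty password) and walk every page.
            QP.LoadFromFile("99pages.pdf", "");
            for (int i = 1; i <= QP.PageCount(); i++)
            {
                QP.SelectPage(i);
                // Extract the text blocks of the selected page; id identifies the block list.
                int id = QP.ExtractPageTextBlocks(3);
                for (int w = 1; w <= QP.GetTextBlockCount(id); w++)
                {
                    double size = QP.GetTextBlockFontSize(id, w);

                    // Report every block whose font size rounds to 22pt.
                    if (Math.Round(size) == 22)
                        MessageBox.Show("Page: " + i.ToString() + " Word: " + w.ToString() + " '" + QP.GetTextBlockText(id, w) + "'");
                }
                // Release the block list before moving on to the next page.
                QP.ReleaseTextBlocks(id);
            }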


Andrew.
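
A side note on the size test in the code above: in .NET, Math.Round(double) rounds midpoints to the nearest even number, so Math.Round(size) == 22 accepts anything from 21.5 up to and including 22.5. If a tighter match is wanted, an explicit tolerance is one option. The helper below is only a sketch; the name IsSize and the default tolerance of 0.25 are arbitrary choices for illustration, not anything from the library.

```csharp
using System;

public static class FontSizeMatch
{
    // True when size lies within tolerance of target (inclusive).
    public static bool IsSize(double size, double target, double tolerance = 0.25)
    {
        return Math.Abs(size - target) <= tolerance;
    }

    public static void Main()
    {
        // Compare the rounding test with the tolerance test on a few sizes.
        foreach (double size in new[] { 21.5, 21.96, 22.0, 22.5 })
        {
            bool byRounding = Math.Round(size) == 22;
            bool byTolerance = IsSize(size, 22);
            Console.WriteLine(size + ": rounding=" + byRounding + ", tolerance=" + byTolerance);
        }
    }
}
```

In the loop above, the condition would then read if (FontSizeMatch.IsSize(size, 22)) instead of the Math.Round comparison.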

Skylla
Posted: 23 May 13 at 8:38AM
Hi Andrew.

Thank you for your sample. I tried that code, and at runtime I got the following results:

id = 1476395009,
Result of qp.GetTextBlockCount(id) = 0
so the loop for (int w = 1; w <= qp.GetTextBlockCount(id); w++) never runs. Do you have any idea what is happening?

        var qp = new PDFLibrary("C:\\DebenuPDFLibraryDLL0914.dll");
        const string licenseKey = "licencekey";
        var result = qp.UnlockKey(licenseKey);
        if (qp.LibraryLoaded())
        {
            if (result == 1)
            {
                qp.LoadFromFile("aaa.pdf", "");
                for (int i = 1; i <= qp.PageCount(); i++)
                {
                    qp.SelectPage(i);
                    int id = qp.ExtractPageTextBlocks(3);
                    for (int w = 1; w <= qp.GetTextBlockCount(id); w++)
                    {
                        double size = qp.GetTextBlockFontSize(id, w);

                        if (Math.Round(size) == 22)
                            Response.Write("Page :" + i.ToString(CultureInfo.InvariantCulture) + " Word:" + w.ToString(CultureInfo.InvariantCulture) + "'" + qp.GetTextBlockText(id, w) + "'" + "<br>");
                    }
                    qp.ReleaseTextBlocks(id);
                }
            }
        }

AndrewC (Moderator Group)
Posted: 23 May 13 at 9:01AM
If it returned 0 then it is not finding any text on the page. I would need to see the PDF file before I could make any further comments.

Andrew.

Skylla
Posted: 23 May 13 at 2:05PM
It's actually a basic PDF which I created for testing.

http://speedy.sh/KMPmW/aaa.pdf

It just has 24, 23, 22 and 20 pt text in it. Created with Word, saved as PDF.

AndrewC (Moderator Group)
Posted: 24 May 13 at 8:25AM
My code is working correctly with your PDF and is returning the 22pt font from both pages.

Is LoadFromFile returning 1 in your case? Does QP.PageCount return 1 or 2? It should be 2 for your PDF. It could be a permissions problem. I suspect LoadFromFile is failing. By default QPL always has a single blank page allocated in memory, and that could be the reason nothing is being extracted.

You could then try string s = QP.GetPageText(7); MessageBox.Show(s); to make sure the text is actually being extracted.

Andrew.
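
The checks suggested above can be sketched as a small diagnostic routine. So that it compiles without the DLL, it is written against a hypothetical interface, IPdfSource, which only mirrors the three calls named in this thread (LoadFromFile, PageCount, GetPageText); the real PDFLibrary class does not implement it, so an adapter or a direct copy of the logic would be needed. FailedLoadStub is an invented stand-in, and its return value of 0 for a failed load is an assumption.

```csharp
using System;

// Hypothetical minimal view of the library object (not part of the SDK).
public interface IPdfSource
{
    int LoadFromFile(string fileName, string password);
    int PageCount();
    string GetPageText(int extractOptions);
}

public static class LoadDiagnostics
{
    // Runs the checks in order and reports the first one that fails.
    public static string Describe(IPdfSource qp, string path)
    {
        // 1 means success according to the post above.
        int loaded = qp.LoadFromFile(path, "");
        if (loaded != 1)
            return "LoadFromFile returned " + loaded + " for '" + path + "': check the path and permissions.";

        int pages = qp.PageCount();

        // Option 7 is the one suggested above for a plain-text sanity check.
        string text = qp.GetPageText(7);
        if (string.IsNullOrEmpty(text))
            return "Loaded " + pages + " page(s), but the selected page returned no text.";

        return "Loaded " + pages + " page(s) and text extraction works.";
    }
}

// Invented stand-in that imitates a failed load: one blank page, no text.
public sealed class FailedLoadStub : IPdfSource
{
    public int LoadFromFile(string fileName, string password) { return 0; }
    public int PageCount() { return 1; }
    public string GetPageText(int extractOptions) { return ""; }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LoadDiagnostics.Describe(new FailedLoadStub(), "aaa.pdf"));
    }
}
```

One possible cause that the thread does not confirm: a relative file name such as "aaa.pdf" is resolved against the process working directory, which in a web application is usually not the site folder, so passing a full path is worth trying.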