C# - Extract pages based on a keyword match

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: Sample Code
Forum Description: Share Debenu Quick PDF Library sample code with other forum members
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2325
Printed Date: 22 Nov 24 at 7:11PM


Topic: C# - Extract pages based on a keyword match
Posted By: AndrewC
Date Posted: 02 Jul 12 at 7:03AM

This code iterates through every page in a PDF file. If the text extracted from a page contains the keyword, the page number is added to a comma-separated list. Once all pages have been checked, the matching pages are extracted and saved as a new document.

Of course, you can make the matching more complex to suit your needs.

Andrew.

        string keyword = "garden";

        string extractPages = "";
        int foundCount = 0;

        QP.LoadFromFile("originalfile.pdf", "");

        // Iterate through each page in the document
        for (int page = 1; page <= QP.PageCount(); page++)
        {
            // look for pages that match

            QP.SelectPage(page);
            string TextContent = QP.GetPageText(0);  // Can also use option 8.

            if (TextContent.Contains(keyword))  // we found a page
            {
                if (foundCount != 0)
                    extractPages = extractPages + ",";

                extractPages = extractPages + page.ToString();
                    
                foundCount++;
            } 
        }

        if (foundCount > 0)
        {
            QP.ExtractPageRanges(extractPages);
            QP.SaveToFile("out.pdf");
        }
        else
        {
            MessageBox.Show("Keyword not found");
        }

        // Release the loaded document whether or not anything was extracted
        QP.RemoveDocument(QP.SelectedDocument());
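
As a rough sketch of what "more complex" matching might look like, the helper below matches the keyword as a whole word and ignores case, so "Garden" counts but "gardening" does not. It does not call the library: the class and method names (KeywordPageMatcher, BuildPageList) are made up for illustration, and the list of page texts is assumed to come from QP.GetPageText in a loop like the one above. The string it returns is in the same comma-separated form that the code above passes to QP.ExtractPageRanges.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Debenu Quick PDF Library.
static class KeywordPageMatcher
{
    // Returns a comma-separated list of 1-based page numbers whose text
    // contains the keyword as a whole word, ignoring case.
    public static string BuildPageList(IList<string> pageTexts, string keyword)
    {
        // Escape the keyword so characters such as '.' or '+' match literally.
        var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b",
                                RegexOptions.IgnoreCase);
        var matches = new List<string>();

        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Treat a null page text as empty so IsMatch does not throw.
            if (pattern.IsMatch(pageTexts[i] ?? ""))
                matches.Add((i + 1).ToString());
        }

        // string.Join places the commas, so no "first item" check is needed.
        return string.Join(",", matches);
    }

    static void Main()
    {
        // Sample strings standing in for the text extracted from each page.
        var pages = new List<string>
        {
            "A Garden in spring",
            "Tools for gardening",
            "No match here",
            "the garden, revisited"
        };

        Console.WriteLine(BuildPageList(pages, "garden"));  // prints "1,4"
    }
}
```

An empty result means no page matched, which corresponds to the foundCount == 0 case above. The \b word boundary only behaves as expected when the keyword starts and ends with a letter, digit or underscore.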



