extract text

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: Sample Code
Forum Description: Share Debenu Quick PDF Library sample code with other forum members
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1326
Printed Date: 28 Apr 24 at 12:59AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: extract text
Posted By: kavaler
Subject: extract text
Date Posted: 27 Jan 10 at 2:08PM
Hello,
I scanned a document
and saved it as doc1.pdf.
How can I extract the text from doc1.pdf in Delphi?



Replies:
Posted By: JanN
Date Posted: 27 Jan 10 at 2:31PM
Hi,

Quick PDF Library cannot extract text from image-only PDF files. For that you will need a dedicated OCR tool such as OmniPage or ABBYY.


Posted By: kavaler
Date Posted: 27 Jan 10 at 4:02PM
Can you tell me what I should do for this purpose?


Posted By: JanN
Date Posted: 27 Jan 10 at 4:13PM
If I were you, I would buy OmniPage Professional (a version is available for around $100 online). It can recognize the text in scanned documents and convert them to searchable PDF files or to text files.


Posted By: kavaler
Date Posted: 27 Jan 10 at 4:21PM
How can I use it (OmniPage Professional) from Delphi?


Posted By: JanN
Date Posted: 27 Jan 10 at 4:33PM
Google is your friend... ;)

OmniPage Professional is a standalone product. You can configure it to pick up files from a specified folder and write the converted files to another folder. Then you can work with the resulting files in Delphi.
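The Delphi side of that workflow could look roughly like the sketch below. It assumes OmniPage has been set up to write plain-text (*.txt) results into an output folder; the function name LoadOcrText and the folder passed to it are hypothetical, and only standard SysUtils/Classes routines are used.

```pascal
uses
  SysUtils, Classes;

// Hypothetical helper: reads every *.txt file found in OutputDir
// and returns their contents concatenated into one string.
// Returns an empty string if the folder is missing or has no *.txt files.
function LoadOcrText(const OutputDir: string): string;
var
  SR: TSearchRec;
  FileText: TStringList;
  Dir: string;
begin
  Result := '';
  Dir := IncludeTrailingPathDelimiter(OutputDir);
  if FindFirst(Dir + '*.txt', faAnyFile, SR) = 0 then
  try
    FileText := TStringList.Create;
    try
      repeat
        // Skip directories that happen to match the mask
        if (SR.Attr and faDirectory) = 0 then
        begin
          FileText.LoadFromFile(Dir + SR.Name);
          Result := Result + FileText.Text;
        end;
      until FindNext(SR) <> 0;
    finally
      FileText.Free;
    end;
  finally
    FindClose(SR);
  end;
end;
```

A call such as Memo1.Lines.Text := LoadOcrText('C:\OCR\Out'); would then show the recognized text in a memo; the path is only an example.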


Posted By: Shotgun Tom
Date Posted: 27 Jan 10 at 5:42PM
Be aware that standalone products, like OmniPage, are not redistributable. That means they will work fine if you are just doing this for your own use.

If, however, you are creating a program for others, you'll need to obtain an SDK that may be distributed with your product. The SDKs are quite a bit more expensive. You would be looking at prices that range between $400 and $3000.
 
Tom


Posted By: Ingo
Date Posted: 27 Jan 10 at 9:04PM
Hi Tom!

There's a better (less expensive) method...
kavaler's Delphi solution plus a copy of OmniPage Pro ;-)

Cheers, Ingo


Posted By: dsola
Date Posted: 05 Feb 10 at 12:50PM
Hi,
Try this:
   http://jocr.sourceforge.net/index.html

It's free and the results were satisfactory for me.
If the document is scanned well and uses "normal" fonts, this may be enough.

If you need a working example, just ask.


-------------
registered QuickPDF user


Posted By: kavaler
Date Posted: 06 Feb 10 at 9:48AM
Hello,
Can you tell me how I can use it in Delphi?


Posted By: dsola
Date Posted: 08 Feb 10 at 8:23AM
Hi,
I'll post Delphi code shortly, but for now try this.

This is the content of the test.cmd file:

rem begin
djpeg -grayscale -pnm YourPictureName.jpg YourPictureName.pnm
gocr -i YourPictureName.pnm -o YourExtractedText.txt
rem end

test.cmd, djpeg.exe, gocr.exe and YourPictureName.jpg must all be in the same directory.

With this you can test whether this method satisfies your needs.



Posted By: dsola
Date Posted: 09 Feb 10 at 7:13AM
Here is part of the Delphi code:

// ttt.pbm - input image for OCR (a JPG can be converted to PBM with TFreeBitmap or with djpeg.exe)
// ttt.txt - OCR result
procedure TOIBCaptchaKiller.Do_OCR;
var
  StartupInfo: TStartupInfo;
  ProcessInfo: TProcessInformation;
  cmdLine: array[0..512] of Char;
  lpExitCode: DWORD;
begin
  // CreateProcess may modify the command line, so pass a writable buffer
  StrPCopy(cmdLine, 'gocr -i ttt.pbm -o ttt.txt');
  FillChar(StartupInfo, SizeOf(StartupInfo), 0);
  StartupInfo.cb := SizeOf(StartupInfo);
  StartupInfo.dwFlags := STARTF_USESHOWWINDOW; // required for wShowWindow to take effect
  StartupInfo.wShowWindow := SW_SHOWNORMAL; // or SW_HIDE
  if CreateProcess(nil, cmdLine, nil, nil, False, 0, nil, nil, StartupInfo, ProcessInfo) then
  begin
    // Do not continue until the launched application has exited;
    // keep the UI responsive in the meantime
    GetExitCodeProcess(ProcessInfo.hProcess, lpExitCode);
    while lpExitCode = STILL_ACTIVE do
    begin
      Sleep(100);
      Application.ProcessMessages;
      GetExitCodeProcess(ProcessInfo.hProcess, lpExitCode);
    end;

    CloseHandle(ProcessInfo.hProcess);
    CloseHandle(ProcessInfo.hThread);
  end;
end;
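
For code that does not need to keep a form responsive, the same idea can be sketched with a blocking wait, chaining the djpeg and gocr steps from the test.cmd shown earlier in the thread. This is only a sketch under assumptions: RunAndWait and OcrJpeg are hypothetical names, djpeg.exe and gocr.exe are assumed to be reachable from the working directory, and the command lines are taken as-is from test.cmd.

```pascal
uses
  Windows, SysUtils;

// Hypothetical helper: starts CommandLine hidden and blocks until it exits.
// Returns True if the process could be started; ExitCode receives its exit code.
function RunAndWait(const CommandLine: string; out ExitCode: DWORD): Boolean;
var
  StartupInfo: TStartupInfo;
  ProcessInfo: TProcessInformation;
  Cmd: string;
begin
  ExitCode := 0;
  Cmd := CommandLine;
  UniqueString(Cmd); // CreateProcess may modify the buffer, so use a writable copy
  FillChar(StartupInfo, SizeOf(StartupInfo), 0);
  StartupInfo.cb := SizeOf(StartupInfo);
  StartupInfo.dwFlags := STARTF_USESHOWWINDOW;
  StartupInfo.wShowWindow := SW_HIDE;
  Result := CreateProcess(nil, PChar(Cmd), nil, nil, False, 0, nil, nil,
    StartupInfo, ProcessInfo);
  if not Result then
    Exit;
  try
    WaitForSingleObject(ProcessInfo.hProcess, INFINITE);
    GetExitCodeProcess(ProcessInfo.hProcess, ExitCode);
  finally
    CloseHandle(ProcessInfo.hProcess);
    CloseHandle(ProcessInfo.hThread);
  end;
end;

// Hypothetical helper: converts JpegFile to PNM with djpeg, runs gocr on it,
// and reports success only if TextFile exists afterwards.
function OcrJpeg(const JpegFile, TextFile: string): Boolean;
var
  PnmFile: string;
  Code: DWORD;
begin
  PnmFile := ChangeFileExt(JpegFile, '.pnm');
  SysUtils.DeleteFile(TextFile); // avoid mistaking a stale result for a new one
  Result :=
    RunAndWait(Format('djpeg -grayscale -pnm "%s" "%s"', [JpegFile, PnmFile]), Code) and
    RunAndWait(Format('gocr -i "%s" -o "%s"', [PnmFile, TextFile]), Code) and
    FileExists(TextFile);
end;
```

After a successful OcrJpeg('YourPictureName.jpg', 'YourExtractedText.txt') call, the text file can be loaded with TStringList.LoadFromFile. The blocking wait freezes the calling thread, so in a GUI application the polling loop above or a worker thread is the safer choice.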





Posted By: kavaler
Date Posted: 11 Feb 10 at 10:12AM
Hello,
I can't understand how to use it in Delphi.
This is my e-mail address:
islam261@gmail.com
Please send me some examples of this.


