OCR Jobs Management Portlet

From Gcube Wiki
Revision as of 12:39, 26 September 2011 by Stefanos.tsaklas


This portlet uses the OCR Stateful Web Service to submit OCR jobs, poll their status, and delete them.



Submit an OCR job


To submit a new OCR job, you can use the 'Submit Job' tab of the portlet. The input fields are:

  • Job name (optional): a label that makes the job easy to identify. If you fill in this field, the unique job id will have the job name as its prefix.

  • Execution type: select where the OCR process should be executed: on "gcube" or "glite" worker nodes. For glite, you must also upload a proxy file using the "Upload proxy" form.
  • Type of job: select the granularity of the OCR process. In "Single" mode the input is one pdf file; in "Bulk" mode the input is a zip file containing many pdfs.

  • Language: select the language of the input pdfs. The available options are "English", "Deutch", "French", "Italian", "Dutch", and "Spanish".
  • Input access: select how the input (a pdf file or a .zip with many pdfs) is provided. Select the "Reference" option to give an HTTP/FTP reference such as "http://dl.dropbox.com/u/19792897/NobelAnnounce.pdf" or "ftp://www.di.uoa.gr/NobelAnnounce.pdf", the "CMS Reference" option to give a CMS URI such as "cms://14c1fb40-9116-11e0-90f7-ca34f60d2e2d/c7feb1e0-d4bd-11e0-a12e-fda94ff03821", or the "Upload" option to upload a file from your filesystem using the form.
  • Reference/Upload file: use this field to give the value for the input access you chose previously. For "Reference" and "CMS Reference" a textbox is shown; for "Upload" a form appears for uploading your file.
  • Upload proxy: this field only appears if you have selected "glite" as execution type. Use this form to upload your proxy file for the grid.
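
The constraints between the fields above can be summarized in a small sketch. This is not the portlet's code or the OCR service's API: the class `OcrJobForm`, the function `validate`, and the error messages are hypothetical names chosen for illustration, and the only rules encoded are the ones stated in the field list (glite needs a proxy file; "Reference" takes an http/ftp reference; "CMS Reference" takes a cms URI).

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

# Option values as they are named in the field list above.
EXECUTION_TYPES = {"gcube", "glite"}
JOB_TYPES = {"Single", "Bulk"}
# URI schemes expected for each input-access option ("Upload" takes a file, not a URI).
ACCESS_SCHEMES = {
    "Reference": {"http", "ftp"},
    "CMS Reference": {"cms"},
    "Upload": set(),
}


@dataclass
class OcrJobForm:
    """Hypothetical in-memory model of the 'Submit Job' form."""
    execution_type: str
    job_type: str
    language: str
    input_access: str
    input_value: str                  # reference URI or uploaded file name
    job_name: str = ""                # optional prefix for the job id
    proxy_file: Optional[str] = None  # only needed for glite


def validate(form: OcrJobForm) -> List[str]:
    """Return a list of problems; an empty list means the form is consistent."""
    errors = []
    if form.execution_type not in EXECUTION_TYPES:
        errors.append("unknown execution type")
    elif form.execution_type == "glite" and not form.proxy_file:
        errors.append("glite execution requires a proxy file")
    if form.job_type not in JOB_TYPES:
        errors.append("unknown job type")
    if form.input_access not in ACCESS_SCHEMES:
        errors.append("unknown input access")
    else:
        schemes = ACCESS_SCHEMES[form.input_access]
        if schemes and urlparse(form.input_value).scheme not in schemes:
            errors.append("input does not match the selected input access")
    return errors
```

For instance, a form with execution type "glite" and no proxy file would be reported as inconsistent, which mirrors the portlet showing the "Upload proxy" field only for glite jobs.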


Example: We want to run OCR on a zip file containing many pdfs. We submit an OCR job with the job name "many_pdfs", choose gcube nodes for execution, and set the type to "Bulk". Because we want to provide the zip file over HTTP, we choose "Reference" as the Input access and give the HTTP reference shown below. Since the execution type is not "glite", we don't need to provide a proxy file.


After pressing the "Submit" button, we get the job id, which is the job name we chose plus a unique identifier.
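
As a rough illustration of how such an id could be composed, the sketch below prefixes a random identifier with the job name. The page does not document the real id format, so the function name `make_job_id`, the underscore separator, and the use of a UUID are all assumptions.

```python
import uuid


def make_job_id(job_name: str = "") -> str:
    """Build a job id: optional job name prefix plus a unique identifier.

    Assumption: the separator and the UUID form are illustrative only;
    the actual service may format its ids differently.
    """
    unique = uuid.uuid4().hex
    return f"{job_name}_{unique}" if job_name else unique


job_id = make_job_id("many_pdfs")
print(job_id.startswith("many_pdfs_"))
```

Keeping the job name as a prefix means ids stay unique even when two jobs share the same name, while still being easy to spot in a drop-down list.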


Submit ocr.jpg


Poll the status of an OCR job

You can poll the status of previously submitted OCR jobs by using the "Poll status" tab. Choose a job id from the drop-down list to see the status of that job.


The OCR job may still be running, in which case you can see only basic information about the job:

Poll ocr not completed.jpg


It may have completed execution with an error (e.g. the proxy file was invalid or had expired), in which case you can see the "Error" and "Error Details" messages:

Poll status completed error.jpg


Or it may have completed execution without errors, in which case you can use the "Download" buttons to retrieve output files of the job:


Poll ocr completed.jpg
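
The three outcomes described in this section can be sketched as a small decision function. The `JobStatus` record and its field names are hypothetical and only mirror what the tab displays ("Error", "Error Details", downloadable output files); they are not the service's actual response type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobStatus:
    """Hypothetical snapshot of what the 'Poll status' tab shows for one job."""
    job_id: str
    completed: bool
    error: Optional[str] = None
    error_details: Optional[str] = None
    output_files: List[str] = field(default_factory=list)


def describe(status: JobStatus) -> str:
    """Map a status snapshot to one of the three cases in this section."""
    if not status.completed:
        # Still running: only basic job information is available.
        return "running"
    if status.error is not None:
        # Finished with an error, e.g. an invalid or expired proxy file.
        return f"failed: {status.error} ({status.error_details})"
    # Finished cleanly: output files can be downloaded.
    return "completed: " + ", ".join(status.output_files)
```

A client would call `describe` each time it polls and stop polling once the result is no longer "running".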

Delete an OCR job


If you want an OCR job to stop appearing in your list boxes, you can use the "Delete Job" tab. Choose a job id from the drop-down list and confirm that you want the job to be deleted.

Delete ocr job.jpg