Text Recognition for Your Own Materials

The Transcribus way

Transcribus is a tool made for researchers and archives. It offers text recognition and layout analysis for both handwritten and printed text in several languages. It is currently free to use.

Install the Transcribus software on your own machine. It acts as the client that transfers data to the Transcribus servers, where all the "heavy lifting" with the materials is done.

It might also be useful to create your own account, so that you can find your own materials more easily.

Video example of Transcribus

Start Transcribus software locally

Login

Fig: Transcribus main window

Upload material to the Transcribus server

Compare the usage terms of the service with the terms that apply to your material. If there are no limitations, upload your material to Transcribus.

  • Pick the 'Import' icon from the top toolbar.
  • Select a folder on your machine to upload its images to Transcribus.

Wait for the images to be transferred to the server. Depending on the amount of material and the number of other users on the server, this can take a while or be ready immediately.
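
Since the transfer time depends on the amount of material, it can help to check the folder before importing it. The following is a minimal Python sketch, not part of Transcribus itself: the function name summarize_images and the list of image extensions are assumptions made for illustration, so adjust them to your own material.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what your material uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def summarize_images(folder):
    """Return (file_count, total_bytes) for image files directly inside folder."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


if __name__ == "__main__":
    import sys

    # Use the folder given on the command line, or the current folder.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    count, size = summarize_images(folder)
    print(f"{count} image files, {size / 1_000_000:.1f} MB in total")
```

Running the script with the folder path as its only argument prints the number of image files and their combined size, which gives a rough idea of how much data the import will send.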

Create a collection if needed. Open the collection.

Run the OCR

  • Go to Tools > 'Text Recognition'

At the moment, Transcribus can use either HTR (Handwritten Text Recognition) or basic OCR (Optical Character Recognition).

Pick OCR and click 'Run...'

Wait a moment while the Tesseract background servers perform the text recognition and store the results.

Export the OCR results

On the top menu bar, pick 'Export' and select the desired output formats.

Then click 'OK'. The export will be sent to your email. (TODO: how long does the exporting take...(?))
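
Once the export has arrived, the recognized text can be processed further on your own machine. The sketch below is only an illustration under stated assumptions: it assumes that a plain-text output format was selected, that the export has been unpacked into a local folder, and that the files are UTF-8 encoded. The function name collect_text and the folder layout are hypothetical and not something Transcribus prescribes.

```python
from pathlib import Path


def collect_text(export_dir):
    """Join all .txt files under export_dir (sorted by path) into one string."""
    parts = []
    # rglob also looks into subfolders; sorting keeps the page order stable
    # when file names are numbered consistently (e.g. 001.txt, 002.txt).
    for path in sorted(Path(export_dir).rglob("*.txt")):
        # UTF-8 is an assumption about the exported files.
        parts.append(path.read_text(encoding="utf-8"))
    return "\n".join(parts)
```

For example, collect_text("my_export") would return the text of every page file in that folder as one string, ready for searching or saving to a single file.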
