Name

lstmtraining - Training program for LSTM-based networks.

Synopsis

lstmtraining --continue_from train_output_dir/continue_from_lang.lstm --old_traineddata bestdata_dir/continue_from_lang.traineddata --traineddata train_output_dir/lang/lang.traineddata --max_iterations NNN --debug_interval 0|-1 --train_listfile train_output_dir/lang.training_files.txt --model_output train_output_dir/newlstmmodel

Description

lstmtraining(1) trains LSTM-based networks using a list of lstmf files and a starter traineddata file as the main input. Training from scratch is not recommended for users. Fine-tuning (see the example command in the synopsis above) or replacing a layer can be used instead. Different options apply to different types of training. Read the training documentation (https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html) for details.

Options

--debug_interval

How often to display the alignment. (type:int default:0)

--net_mode

Controls network behavior. (type:int default:192)

--perfect_sample_delay

How many imperfect samples between perfect ones. (type:int default:0)

--max_image_MB

Max memory to use for images. (type:int default:6000)

--append_index

Index in the continue_from network at which to attach the new network defined by net_spec. (type:int default:-1)

--max_iterations

If set, exit after this many iterations. A negative value is interpreted as epochs, 0 means infinite iterations. (type:int default:0)
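The three cases for --max_iterations can be sketched as a small shell function. This is only an illustration of the rule stated above: describe_max_iterations is a hypothetical helper, not part of Tesseract, and it prints a description rather than running any training.

```shell
#!/bin/sh
# describe_max_iterations: hypothetical helper that restates how the
# --max_iterations value is interpreted according to the option text.
describe_max_iterations() {
    n=$1
    if [ "$n" -eq 0 ]; then
        echo "no iteration limit"
    elif [ "$n" -gt 0 ]; then
        echo "stop after $n iterations"
    else
        # A negative value counts epochs: strip the leading minus sign.
        echo "stop after ${n#-} epochs"
    fi
}

describe_max_iterations 0
describe_max_iterations 10000
describe_max_iterations -3
```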

--target_error_rate

Final error rate in percent. (type:double default:0.01)

--weight_range

Range of initial random weights. (type:double default:0.1)

--learning_rate

Weight factor for new deltas. (type:double default:0.001)

--momentum

Decay factor for repeating deltas. (type:double default:0.5)

--adam_beta

Decay factor for repeating deltas squared. (type:double default:0.999)

--stop_training

Just convert the training model to a runtime model. (type:bool default:false)

--convert_to_int

Convert the recognition model to an integer model. (type:bool default:false)
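A minimal sketch of how --stop_training and --convert_to_int might be combined to turn a training checkpoint into a runtime model. The helper build_convert_cmd is hypothetical and only prints the command line instead of running it. The file names are placeholders in the style of the synopsis, and the _checkpoint suffix on the --model_output basename is an assumption about how checkpoints are named.

```shell
#!/bin/sh
# build_convert_cmd: hypothetical dry-run helper, not part of Tesseract.
#   $1 = training checkpoint to convert
#   $2 = starter traineddata used during training
#   $3 = output file for the runtime model
#   $4 = "int" to also request an integer model (optional)
# Paths containing spaces are not handled in this sketch.
build_convert_cmd() {
    cmd="lstmtraining --stop_training --continue_from $1 --traineddata $2 --model_output $3"
    if [ "${4:-}" = "int" ]; then
        cmd="$cmd --convert_to_int"
    fi
    echo "$cmd"
}

# Placeholder paths, assumed for illustration only.
build_convert_cmd train_output_dir/newlstmmodel_checkpoint \
    train_output_dir/lang/lang.traineddata \
    train_output_dir/lang_float.traineddata

build_convert_cmd train_output_dir/newlstmmodel_checkpoint \
    train_output_dir/lang/lang.traineddata \
    train_output_dir/lang_int.traineddata int
```

Printing the command first makes it easy to check the paths before running it by hand.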

--sequential_training

Use the training files sequentially instead of round-robin. (type:bool default:false)

--debug_network

Get info on the distribution of weight values. (type:bool default:false)

--randomly_rotate

Train OSD and randomly turn training samples upside-down. (type:bool default:false)

--net_spec

Network specification. (type:string default:)
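As a rough sketch of replacing the top layers of an existing model, --net_spec can be combined with --append_index and --continue_from. The helper print_replace_layers_cmd is hypothetical and only prints the command. The index 5 and the specification string [Lfx256 O1c111] are illustrative values assumed here, not recommendations: the right index depends on the existing network, and the output size in the specification has to match the character set of the starter traineddata.

```shell
#!/bin/sh
# print_replace_layers_cmd: hypothetical dry-run helper, not part of Tesseract.
#   $1 = existing model to extend      $2 = starter traineddata
#   $3 = layer index to cut at         $4 = network specification to append
#   $5 = list of lstmf training files  $6 = basename for output models
print_replace_layers_cmd() {
    # The specification is single-quoted in the printed command because
    # it contains brackets and a space.
    printf '%s\n' "lstmtraining --continue_from $1 --traineddata $2 --append_index $3 --net_spec '$4' --train_listfile $5 --model_output $6"
}

# All values below are placeholders assumed for illustration.
print_replace_layers_cmd train_output_dir/continue_from_lang.lstm \
    train_output_dir/lang/lang.traineddata \
    5 '[Lfx256 O1c111]' \
    train_output_dir/lang.training_files.txt \
    train_output_dir/newlstmmodel
```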

--continue_from

Existing model to extend. (type:string default:)

--model_output

Basename for output models. (type:string default:lstmtrain)

--train_listfile

File listing training files in lstmf training format. (type:string default:)
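One way to produce such a list file is to collect the lstmf files from a directory, one path per line. The sketch below assumes that layout. make_listfile is a hypothetical helper, and the demo uses empty placeholder files in a temporary directory rather than real training data.

```shell
#!/bin/sh
# make_listfile: hypothetical helper, not part of Tesseract.
#   $1 = directory containing .lstmf files
#   $2 = list file to write
make_listfile() {
    # One path per line, sorted so the order is reproducible.
    find "$1" -type f -name '*.lstmf' | sort > "$2"
}

# Demo with empty placeholder files; real lstmf files would be
# generated beforehand by the training tools.
demo_dir=$(mktemp -d)
touch "$demo_dir/page1.lstmf" "$demo_dir/page2.lstmf" "$demo_dir/notes.txt"
make_listfile "$demo_dir" "$demo_dir/lang.training_files.txt"
cat "$demo_dir/lang.training_files.txt"
```

The resulting file can then be passed to --train_listfile. A second list built the same way from held-out files would serve for --eval_listfile.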

--eval_listfile

File listing eval files in lstmf training format. (type:string default:)

--traineddata

Starter traineddata with combined Dawgs/Unicharset/Recoder for the language model. (type:string default:)

--old_traineddata

When changing the character set, this specifies the traineddata with the old character set that is to be replaced. (type:string default:)

History

lstmtraining(1) was first made available for Tesseract 4.00.00alpha.

Resources

Main web site: https://github.com/tesseract-ocr

Information on training tesseract LSTM: https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html

See Also

tesseract(1)

Copying

Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0

Author

The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

