SHAPECLUSTERING(1)
==================
:doctype: manpage

NAME
----
shapeclustering - shape clustering training for Tesseract

SYNOPSIS
--------
shapeclustering -D 'output_dir'
-U 'unicharset' -O 'mfunicharset'
-F 'font_props' -X 'xheights'
'FILE'...

DESCRIPTION
-----------
shapeclustering(1) takes extracted feature .tr files (generated by
tesseract(1) run in a special mode from box files) and produces a
file *shapetable* and an enhanced unicharset. This program is still
experimental, and is not required (yet) for training Tesseract.
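
As an illustration of how the arguments in the synopsis fit together, the following minimal Python sketch assembles a command line from the documented options. The helper name and every file name in it are hypothetical placeholders, not part of Tesseract, and the sketch only builds the argument list without running anything.

```python
def build_shapeclustering_command(output_dir, unicharset, mfunicharset,
                                  font_props, xheights, tr_files):
    """Return an argument list for shapeclustering(1).

    Hypothetical helper: only the option letters come from this page;
    all paths are supplied by the caller.
    """
    tr_files = list(tr_files)
    if not tr_files:
        raise ValueError("at least one .tr file is required")
    return [
        "shapeclustering",
        "-D", output_dir,    # directory for output files
        "-U", unicharset,    # unicharset from unicharset_extractor(1)
        "-O", mfunicharset,  # output unicharset for combine_tessdata(1)
        "-F", font_props,    # font properties file
        "-X", xheights,      # x-heights file
        *tr_files,           # extracted feature files
    ]


# Placeholder paths, for illustration only.
command = build_shapeclustering_command(
    "out", "unicharset", "mfunicharset",
    "font_properties", "xheights", ["sample.tr"],
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where shapeclustering is installed.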

OPTIONS
-------
-U 'FILE'::
    The unicharset generated by unicharset_extractor(1).

-D 'dir'::
    Directory to write output files to.

-F 'font_properties_file'::
    (Input) Font properties file. Each line has the following form, where every field other than the font name is 0 or 1:

       'font_name' 'italic' 'bold' 'fixed_pitch' 'serif' 'fraktur'

-X 'xheights_file'::
    (Input) x-heights file. Each line has the following form, where 'xheight' is the x-height in pixels of a character drawn at 32pt at 300 dpi. (That is, if x-height + ascenders + descenders = 133, how much of that is the x-height?)

       'font_name' 'xheight'

-O 'FILE'::
    The output unicharset that will be given to combine_tessdata(1).
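
The two input file formats described for -F and -X can be checked before training. The Python sketch below is a minimal illustration, assuming that fields are separated by whitespace, that font names contain no spaces, that blank lines may be skipped, and that 'xheight' is an integer; this page states none of these explicitly. The function names and the sample font name are hypothetical and not part of Tesseract.

```python
FLAG_NAMES = ("italic", "bold", "fixed_pitch", "serif", "fraktur")


def parse_font_properties(text):
    """Parse lines of the form: font_name italic bold fixed_pitch serif fraktur."""
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 1 + len(FLAG_NAMES):
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        name, values = fields[0], fields[1:]
        if any(value not in ("0", "1") for value in values):
            raise ValueError(f"line {lineno}: every flag must be 0 or 1")
        fonts[name] = {flag: value == "1" for flag, value in zip(FLAG_NAMES, values)}
    return fonts


def parse_xheights(text):
    """Parse lines of the form: font_name xheight."""
    heights = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        heights[fields[0]] = int(fields[1])  # assumption: integer pixel value
    return heights


# Made-up sample data, for illustration only.
props = parse_font_properties("examplefont 0 1 0 1 0\n")
heights = parse_xheights("examplefont 60\n")
print(props["examplefont"]["bold"], heights["examplefont"])
```

Rejecting malformed lines up front gives a clearer error than a failure partway through training.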

SEE ALSO
--------
tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1),
unicharset(5)

<https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html>

COPYING
-------
Copyright \(C) Google, 2011
Licensed under the Apache License, Version 2.0

AUTHOR
------
The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-2018).