Releases: scribeocr/scribe.js

v0.8.0

09 Mar 09:39

What's Changed

  • Added scribe CLI command
    • If scribe.js is installed globally (npm i -g scribe.js-ocr), the scribe command can be used to process documents from the command line.
      • For example, scribe recognize analyst_report.png runs OCR on an image and saves the result as a PDF.
    • This feature is still experimental; command names, argument names, and features may change without warning.
  • Added new intermediate data format .scribe for storing and loading document data.
    • Because OCR is computationally expensive, it is often desirable to save results for later use without losing data.
    • By saving results to .scribe files, results can be re-loaded later (e.g. to export with slightly different settings).
      • While several other output formats can be re-loaded later (notably .hocr and .pdf), only .scribe can be re-loaded without any data being lost in the export/import process.
      • .scribe files only contain the text layer; they do not contain embedded images or PDF files.
        • .scribe files can be loaded alongside image/PDF files to restore both image and text data.

Full Changelog: v0.7.4...v0.8.0

v0.7.4

03 Mar 08:08

What's Changed

  • Fixed bug causing crash for certain PDF input documents.
  • Added support for combined bold + italic style (previously only bold or italic could be applied, not both).
  • Added support for underline style.
    • Underlined text is currently detected automatically when importing a text-native PDF or Abbyy XML file.
  • Disabled ligatures by default.
    • To re-enable, set scribe.opt.ligatures to true.
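As a minimal sketch of the ligature setting described above, the snippet below turns ligatures back on. The stand-in object is an assumption made so the snippet runs without the library installed; in a real project, scribe would presumably be the package's default export (assumed here to be imported from 'scribe.js-ocr', the package name given in the v0.8.0 notes).

```javascript
// Stand-in for the real module (assumption, for illustration only).
// In an actual project this would likely be: import scribe from 'scribe.js-ocr';
const scribe = { opt: { ligatures: false } };

// Ligatures are disabled by default as of v0.7.4; opt back in explicitly.
scribe.opt.ligatures = true;
```

Setting the option once, before exporting, should be enough, since scribe.opt is described in these notes as the place for global settings.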

Full Changelog: v0.7.3...v0.7.4

v0.7.3

03 Mar 08:02

What's Changed

  • Updated HTML export to support Node.js.

Full Changelog: v0.7.2...v0.7.3

v0.7.2

20 Feb 04:25

What's Changed

  • Added HTML output format (browser only).
    • This implementation is preliminary and may change substantially in future versions.
  • Standardized fonts and font names.

Full Changelog: v0.7.1...v0.7.2

v0.7.1

09 Feb 19:46

What's Changed

  • Standardized fonts and font names.

Full Changelog: v0.7.0...v0.7.1

v0.7.0

07 Jan 08:38

What's Changed

  • Major rework of PDF export implementation.
    • Writing to PDF is faster and uses less memory.
      • Documents that used to crash due to memory errors now run almost instantly.
    • For many inputs, output PDF file sizes are now much smaller.
  • Fixed memory leaks within OCR module.
  • Misc bug fixes.

Full Changelog: v0.6.1...v0.7.0

v0.6.1

17 Dec 05:25

What's Changed

  • Fixed Node.js support on Windows (#9).
  • Fixed platform-related installation issues (#27, #29).
  • Increased use of workers in Node.js version, enabling much better performance using a single process.

Full Changelog: v0.5.1...v0.6.1

v0.5.1

10 Dec 09:30

What's Changed

  • Fixed bug causing crashes when recognizing certain PDFs using Node.js (#26).
  • Minor updates.

Full Changelog: v0.5.0...v0.5.1

v0.5.0

25 Nov 09:08

What's Changed

  • Added config argument to recognize, which allows arguments to be passed through to Tesseract.js (#22).
  • Added support for parsing PDF text at various orientations (90/180/270 degrees).
  • Minor improvements to OCR quality.
  • Various improvements to imports of HOCR and native PDF text.
  • Added saveAs utility function for saving files.
  • Added opt.kerning option that can be used to enable or disable kerning.

Full Changelog: v0.4.1...v0.5.0

v0.4.1

10 Nov 19:24

What's Changed

  • Implemented parallel processing by default for the Node.js version.
    • To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
  • Non-default behavior for extracting text from PDF files is now handled by setting the properties of scribe.opt.usePDFText.
  • Added Nimbus Mono font (similar to Courier).
  • Improvements to text extraction from PDF files.
  • Improvements to text positioning.
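
The single-worker setting mentioned above only takes effect if it is applied before any other function is called. The sketch below illustrates that ordering. The helper name extractWithSingleWorker and the stand-in object fakeScribe are hypothetical, introduced so the snippet runs without the library installed; the use of an extractText method is an assumption about the library's API rather than something stated in these notes.

```javascript
// Hypothetical helper: forces a single worker, then runs text extraction.
// `scribe` is passed in so the ordering can be shown without the real library.
async function extractWithSingleWorker(scribe, files) {
  // Set the worker count first, before any other scribe function is called.
  scribe.opt.workerN = 1;
  // Assumed API: a method that takes an array of file paths.
  return scribe.extractText(files);
}

// Stand-in object (assumption) that records the worker count seen at call time.
const fakeScribe = {
  opt: { workerN: null },
  seenWorkerN: undefined,
  async extractText(files) {
    this.seenWorkerN = this.opt.workerN;
    return `processed ${files.length} file(s)`;
  },
};

const resultPromise = extractWithSingleWorker(fakeScribe, ['analyst_report.png']);
```

With the real module, the same helper would be called with the imported scribe object in place of fakeScribe.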

Full Changelog: v0.3.1...v0.4.1

Note: This post combines the changes for 0.4.0 and 0.4.1 because 0.4.0 was the latest version for only a few hours.