Releases: scribeocr/scribe.js
v0.8.0
What's Changed
- Added `scribe` CLI command
  - If `scribe.js` is installed globally (`npm i -g scribe.js-ocr`), the `scribe` command can be used to process documents from the command line.
    - For example, `scribe recognize analyst_report.png` runs OCR on an image and saves the result as a PDF.
  - This feature is still experimental and command/argument names and features may change without warning.
- Added new intermediate data format `.scribe` for storing and loading document data.
  - Given OCR is computationally expensive, it is often desirable to save results for later use without losing data.
  - By saving results to `.scribe` files, results can be re-loaded later (e.g. to export with slightly different settings).
    - While several other output formats can be re-loaded later (notably `.hocr` and `.pdf`), only `.scribe` can be re-loaded without any data being lost in the export/import process.
  - `.scribe` files only contain the text layer; they do not contain embedded images or PDF files.
  - `.scribe` files can be loaded alongside image/PDF files to restore both image and text data.
Full Changelog: v0.7.4...v0.8.0
v0.7.4
What's Changed
- Fixed bug causing crash for certain PDF input documents.
- Added support for bold + italic style (previously only bold or italic style)
- Added support for underline style.
  - Underlined text is currently detected automatically when importing a text-native PDF or Abbyy XML file.
- Disabled ligatures by default.
  - To re-enable, set `scribe.opt.ligatures` to `true`.
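
A minimal sketch of re-enabling ligatures for a single task and then restoring the previous value. Only the `scribe.opt.ligatures` property comes from the note above; the `withLigatures` helper and the `ligatureStub` object are hypothetical and stand in for the real module so the snippet runs without the package installed.

```javascript
// Hypothetical helper, not part of scribe.js: turn ligatures on for one
// task, then restore whatever value was set before.
async function withLigatures(scribe, task) {
  const previous = scribe.opt.ligatures;
  scribe.opt.ligatures = true;
  try {
    return await task(scribe);
  } finally {
    // Runs whether the task resolves or throws.
    scribe.opt.ligatures = previous;
  }
}

// Stand-in object (assumption) so the sketch runs without the package.
const ligatureStub = { opt: { ligatures: false } };

// The task only reports the value it observed while running.
const seenDuringTask = withLigatures(ligatureStub, async (s) => s.opt.ligatures);
```

Because the option lives on a shared object, this pattern is only safe when tasks that need different settings do not run concurrently.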
Full Changelog: v0.7.3...v0.7.4
v0.7.3
v0.7.2
What's Changed
- Added HTML output format (browser only).
  - This implementation is still preliminary and may change substantially in future versions.
- Standardized fonts and font names
Full Changelog: v0.7.1...v0.7.2
v0.7.1
v0.7.0
What's Changed
- Major rework of PDF export implementation.
  - Writing to PDF is faster and uses less memory.
    - Documents that used to crash due to memory errors now run almost instantly.
  - For many inputs, output PDF file sizes are now much smaller.
- Fixed memory leaks within OCR module.
- Misc bug fixes.
Full Changelog: v0.6.1...v0.7.0
v0.6.1
v0.5.1
v0.5.0
What's Changed
- Added `config` argument to `recognize`, which allows for passing arguments to Tesseract.js (#22)
- Added support for parsing PDF text at various orientations (90/180/270 degrees).
- Minor improvements to OCR quality.
- Various improvements to imports of HOCR and native PDF text.
- Added `saveAs` utility function for saving files.
- Added `opt.kerning` option that can be used to enable or disable kerning.
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
- Implemented parallel processing by default for Node.js version
  - To restore the previous behavior (1 worker), set `scribe.opt.workerN = 1` before calling any functions.
- Non-default behavior for extracting text from PDF files is now handled by setting the properties of `scribe.opt.usePDFText`.
- Added Nimbus Mono font (similar to Courier)
- Improvements to text extraction from PDF files.
- Improvements to text positioning.
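
The ordering requirement for `scribe.opt.workerN` can be sketched as below: the option is assigned first, and only then is any other function called. The `runWithSingleWorker` helper, the `workerStub` object, and its `doWork` method are hypothetical placeholders so the snippet runs on its own. With the real package, the module object would presumably come from `import scribe from 'scribe.js-ocr'`, which is an assumption based on the package name in the v0.8.0 notes.

```javascript
// Hypothetical helper, not part of scribe.js: force single-worker mode,
// then hand the configured module to the caller's own code.
async function runWithSingleWorker(scribe, task) {
  // Assigned before any scribe function is called, as the note requires.
  scribe.opt.workerN = 1;
  return task(scribe);
}

// Stand-in object (assumption) so the sketch runs without the package.
const workerStub = {
  opt: { workerN: null },
  // Placeholder for whichever scribe function the caller would use.
  async doWork() {
    return `workers=${this.opt.workerN}`;
  },
};

const workerResult = runWithSingleWorker(workerStub, (s) => s.doWork());
workerResult.then((msg) => console.log(msg));
```

Passing the module in as a parameter keeps the option assignment and the first function call in one place, so the setting cannot be applied too late by accident.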
Full Changelog: v0.3.1...v0.4.1
Note: This post combines changes for 0.4.0 and 0.4.1 since the former was only the most recent version for a few hours.