Releases: scribeocr/scribe.js
v0.8.0
What's Changed
- Added `scribe` CLI command.
  - If `scribe.js` is installed globally (`npm i -g scribe.js-ocr`), the `scribe` command can be used to process documents from the command line.
    - For example, `scribe recognize analyst_report.png` runs OCR on an image and saves the result as a PDF.
  - This feature is still experimental and command/argument names and features may change without warning.
- Added new intermediate data format `.scribe` for storing and loading document data.
  - Given OCR is computationally expensive, it is often desirable to save results for later use without losing data.
  - By saving results to `.scribe` files, results can be re-loaded later (e.g. to export with slightly different settings).
    - While several other output formats can be re-loaded later (notably `.hocr` and `.pdf`), only `.scribe` can be re-loaded without any data being lost in the export/import process.
    - `.scribe` files only contain the text layer; they do not contain embedded images or PDF files.
    - `.scribe` files can be loaded alongside image/PDF files to restore both image and text data.
Full Changelog: v0.7.4...v0.8.0
v0.7.4
What's Changed
- Fixed bug causing crash for certain PDF input documents.
- Added support for bold + italic style (previously only bold or italic style)
- Added support for underline style.
  - Underlined text is currently detected automatically when importing a text-native PDF or Abbyy XML file.
- Disabled ligatures by default.
  - To re-enable, set `scribe.opt.ligatures` to `true`.
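
As a minimal sketch of the ligature setting above, the snippet below turns `scribe.opt.ligatures` back on and keeps the previous value so it can be restored. Only the `scribe.opt.ligatures` name comes from the release notes; the `setScribeOptions` helper and the `scribeStub` stand-in object are hypothetical and exist only so the example runs without assuming how the library is imported.

```javascript
// Hypothetical helper: apply option overrides to a scribe.js module object
// and return the previous values so they can be restored later.
// `scribe` is passed in rather than imported, so no import path is assumed.
function setScribeOptions(scribe, overrides) {
  const previous = {};
  for (const [key, value] of Object.entries(overrides)) {
    previous[key] = scribe.opt[key];
    scribe.opt[key] = value;
  }
  return previous;
}

// Stand-in for the real module, mirroring the new default (ligatures off).
const scribeStub = { opt: { ligatures: false } };

// Re-enable ligatures, as described in the release notes.
const before = setScribeOptions(scribeStub, { ligatures: true });
console.log(scribeStub.opt.ligatures, before.ligatures); // true false
```

In real use, the imported `scribe` object would be passed instead of `scribeStub`.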
Full Changelog: v0.7.3...v0.7.4
v0.7.3
v0.7.2
What's Changed
- Added HTML output format (browser only).
  - This implementation is still preliminary and may change substantially in future versions.
- Standardized fonts and font names
Full Changelog: v0.7.1...v0.7.2
v0.7.1
v0.7.0
What's Changed
- Major rework of PDF export implementation.
  - Writing to PDF is faster and uses less memory.
  - Documents that used to crash due to memory errors now run almost instantly.
  - For many inputs, output PDF file sizes are now much smaller.
- Fixed memory leaks within OCR module.
- Misc bug fixes.
Full Changelog: v0.6.1...v0.7.0
v0.6.1
v0.5.1
v0.5.0
What's Changed
- Added `config` argument to `recognize`, which allows for passing arguments to Tesseract.js (#22)
- Added support for parsing PDF text at various orientations (90/180/270 degrees).
- Minor improvements to OCR quality.
- Various improvements to imports of HOCR and native PDF text.
- Added `saveAs` utility function for saving files.
- Added `opt.kerning` option that can be used to enable or disable kerning.
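
The `config` argument to `recognize` could be used roughly as sketched below. This is an illustration under assumptions, not the library's documented API: it assumes `recognize` accepts a single options object and that `config` is a plain key/value object of Tesseract parameters. The `recognizeWithTesseractConfig` helper and the `scribeStub` object are hypothetical; `tessedit_char_whitelist` is a standard Tesseract parameter.

```javascript
// Hypothetical helper: run recognition with extra Tesseract parameters.
// Assumptions (not confirmed by the release notes): `recognize` takes one
// options object, and `config` is a plain key/value object.
async function recognizeWithTesseractConfig(scribe, tesseractParams) {
  return scribe.recognize({ config: { ...tesseractParams } });
}

// Stand-in for the real module; it only records what it was called with.
const calls = [];
const scribeStub = {
  recognize: async (options) => {
    calls.push(options);
    return 'ok';
  },
};

// Restrict the engine's output to digits via a standard Tesseract parameter.
recognizeWithTesseractConfig(scribeStub, { tessedit_char_whitelist: '0123456789' });
console.log(calls[0].config.tessedit_char_whitelist); // 0123456789
```

Passing the module object in as a parameter keeps the sketch independent of how `scribe.js` is imported in a given project.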
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
- Implemented parallel processing by default for Node.js version
  - To restore the previous behavior (1 worker), set `scribe.opt.workerN = 1` before calling any functions.
- Non-default behavior for extracting text from PDF files is now handled by setting the properties of `scribe.opt.usePDFText`.
- Added Nimbus Mono font (similar to Courier)
- Improvements to text extraction from PDF files.
- Improvements to text positioning.
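
The single-worker setting from the parallel-processing item above has to be applied before any other call, which a small wrapper can make explicit. In this sketch only `scribe.opt.workerN` comes from the release notes; the `runWithSingleWorker` wrapper, the task callback, and the `scribeStub` object are hypothetical.

```javascript
// Hypothetical wrapper: force single-worker mode, then run a task.
// The task callback stands in for whatever scribe.js calls the application makes.
async function runWithSingleWorker(scribe, task) {
  scribe.opt.workerN = 1; // set before any scribe.js function is called
  return task(scribe);
}

// Stand-in for the real module, used only to demonstrate the ordering.
const scribeStub = { opt: { workerN: null } };
const seen = [];

// The task observes the worker count that is in effect when it starts.
runWithSingleWorker(scribeStub, async (s) => {
  seen.push(s.opt.workerN);
});
console.log(seen[0]); // 1
```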
Full Changelog: v0.3.1...v0.4.1
Note: This post combines changes for `0.4.0` and `0.4.1` since the former was only the most recent version for a few hours.