Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2022/08/14 04:08:49 UTC
[Bug 8026] New: t/extracttext.t tesseract test fails on some installations
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8026
Bug ID: 8026
Summary: t/extracttext.t tesseract test fails on some installations
Product: Spamassassin
Version: 4.0.0
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Regression Tests
Assignee: dev@spamassassin.apache.org
Reporter: sidney@sidney.com
Target Milestone: Undefined
On my copy of FreeBSD 13.1-RELEASE installed on a VirtualBox VM with tesseract
5.1.0 installed from FreeBSD's pkg repository, test t/extracttext.t
consistently fails because tesseract reads the "XJ" characters in the test jpg
file as "X]J".
Recreating the test file using a font that is more tesseract-friendly seems to
help. Since the test is not intended to test the limits of tesseract's OCR
capabilities, this seems like a proper fix. I've redone the test data using the
TeX Gyre Bonum font, as per the results in https://superuser.com/a/1543382
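
To illustrate why a single misread character is enough to fail the test, here
is a minimal Python sketch. It assumes the test image contains the standard
GTUBE string (suggested by the file name gtube_jpg.eml, but not stated above)
and that the check is an exact substring match; the function name
contains_gtube is hypothetical and is not taken from t/extracttext.t.

```python
# The standard GTUBE test string. Assumption: this is the text rendered
# in the test image; the report only says it contains "XJ".
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def contains_gtube(ocr_text: str) -> bool:
    """Return True if the OCR output contains the exact GTUBE string."""
    return GTUBE in ocr_text


# Output as it would look when OCR reads every character correctly.
clean_output = GTUBE + "\n"

# Output with the misreading described in the report: "XJ" read as "X]J".
misread_output = GTUBE.replace("XJ", "X]J", 1) + "\n"

print(contains_gtube(clean_output))
print(contains_gtube(misread_output))
```

Because the match is exact, the one stray "]" breaks it, which is why test
data that OCR reads reliably matters more than exercising tesseract's limits.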
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 8026] t/extracttext.t tesseract test fails on some installations
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8026
Sidney Markowitz <si...@sidney.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #1 from Sidney Markowitz <si...@sidney.com> ---
As pointed out in another comment in the superuser article linked to in the
previous comment, the font used seems to be less important than font size.
After initial experiments worked on FreeBSD but failed in different ways on
macOS, I found settings that succeed using the available versions of
tesseract on all platforms I tried.
These tests revealed a bug when tesseract is installed in a directory that has
a space in the pathname, but that is a lesser issue. See bug 8027.
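
This comment does not say how the space in the pathname causes the failure.
One common way such bugs arise in general is a command line built as a single
string and then split on whitespace. The Python sketch below shows only that
general pitfall; the install path and file name are hypothetical, and this is
not a description of the SpamAssassin code.

```python
import shlex

# Hypothetical install location containing a space (not from the report).
tesseract_path = "/opt/my tools/bin/tesseract"

# Splitting one command string on whitespace cuts the path in two.
naive_argv = f"{tesseract_path} input.png stdout".split()

# Keeping the arguments as a list preserves the path as one element.
safe_argv = [tesseract_path, "input.png", "stdout"]

# If a single string is unavoidable, quote it so it splits back correctly.
quoted = shlex.join(safe_argv)
round_trip = shlex.split(quoted)

print(naive_argv[0])
print(round_trip[0])
```

Passing the list form to a process launcher avoids the problem without any
quoting at all.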
trunk % svn ci -m "bug 8026 - Update extracttest.t with test data that works
with more versions of tesseract"
Sending MANIFEST
Deleting t/data/spam/extracttext/gtube_jpg.eml
Adding t/data/spam/extracttext/gtube_png.eml
Sending t/extracttext.t
Transmitting file data ...done
Committing transaction...
Committed revision 1903411.
[Bug 8026] t/extracttext.t tesseract test fails on some installations
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8026
Sidney Markowitz <si...@sidney.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sidney@sidney.com
Target Milestone|Undefined |4.0.0