Posted to dev@pdfbox.apache.org by GitBox <gi...@apache.org> on 2020/10/27 10:41:55 UTC

[GitHub] [pdfbox] SchwingSK commented on pull request #89: PDFBOX-5002: fix word detection in PDFTextStripper

SchwingSK commented on pull request #89:
URL: https://github.com/apache/pdfbox/pull/89#issuecomment-717151484


   After testing with 14646 PDFs, I reduced the five-space rule to a single space, since that gives even better results and does not break any additional TestTextStripper tests.
   5 spaces: 965 of 14841 pages had at least one space fixed
   1 space: 1083 of 14841 pages had at least one space fixed
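
   To put the two counts on the same scale, here is a minimal sketch that turns them into percentages of the 14841 pages. The class and method names (SpaceFixStats, percentFixed) are hypothetical and are not part of PDFBox or of this pull request; only the three numbers come from the test run above.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: compares the two rule variants
// using the page counts reported in this comment.
public class SpaceFixStats {

    // Share of pages with at least one fixed space, as a percentage.
    static double percentFixed(int fixedPages, int totalPages) {
        if (totalPages <= 0) {
            throw new IllegalArgumentException("totalPages must be positive");
        }
        return 100.0 * fixedPages / totalPages;
    }

    public static void main(String[] args) {
        int totalPages = 14841;      // pages in the test run
        int fixedFiveSpaces = 965;   // pages fixed with the five-space rule
        int fixedOneSpace = 1083;    // pages fixed with the one-space rule

        // Locale.ROOT keeps the decimal separator a dot on every machine.
        System.out.println(String.format(Locale.ROOT, "5 spaces: %.2f%%",
                percentFixed(fixedFiveSpaces, totalPages)));
        System.out.println(String.format(Locale.ROOT, "1 space:  %.2f%%",
                percentFixed(fixedOneSpace, totalPages)));
        System.out.println("additional pages fixed: "
                + (fixedOneSpace - fixedFiveSpaces));
    }
}
```

   With these inputs the one-space rule fixes 118 more pages than the five-space rule, roughly 7.30% of pages versus 6.50%.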
   

