Posted to dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2022/06/06 18:44:52 UTC

Text extraction performance

All,
  Martin Thoma of pypdf2 has set up some comparison tests on text
extraction: https://github.com/py-pdf/benchmarks

     Cheers,

             Tim

Re: Text extraction performance

Posted by Tilman Hausherr <TH...@t-online.de>.
On 06.06.2022 at 20:44, Tim Allison wrote:
> All,
>    Martin Thoma of pypdf2 has set up some comparison tests on text
> extraction: https://github.com/py-pdf/benchmarks


https://github.com/py-pdf/benchmarks/blob/main/read/results/tika/2201.00069.txt

Maybe it's because of the vertical text; he didn't use the detectAngles
option.

Tilman
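
For readers unfamiliar with the option mentioned above: detectAngles is a
setting on Tika's PDF parser that tells it to account for rotated text. One
way it might be switched on is through a tika-config.xml file. The sketch
below builds such a file with Python's standard library only. The helper name
build_tika_config is hypothetical, and the layout (excluding PDFParser from
DefaultParser, then configuring it with a "bool" param) is an assumption
based on Tika's usual config format. Check it against the Tika version used
in the benchmark.

```python
import xml.etree.ElementTree as ET


def build_tika_config(detect_angles: bool = True) -> str:
    """Return a tika-config.xml string that sets detectAngles on the PDF parser.

    Hypothetical helper for illustration; the XML layout is an assumption
    based on Tika's usual parser-configuration format.
    """
    properties = ET.Element("properties")
    parsers = ET.SubElement(properties, "parsers")

    # Keep the default parsers for every format except PDF.
    default = ET.SubElement(
        parsers, "parser", {"class": "org.apache.tika.parser.DefaultParser"}
    )
    ET.SubElement(
        default, "parser-exclude", {"class": "org.apache.tika.parser.pdf.PDFParser"}
    )

    # Configure the PDF parser separately, with detectAngles set explicitly.
    pdf = ET.SubElement(
        parsers, "parser", {"class": "org.apache.tika.parser.pdf.PDFParser"}
    )
    params = ET.SubElement(pdf, "params")
    param = ET.SubElement(params, "param", {"name": "detectAngles", "type": "bool"})
    param.text = "true" if detect_angles else "false"

    return ET.tostring(properties, encoding="unicode")


if __name__ == "__main__":
    # Print the config so it can be redirected into a file.
    print(build_tika_config())
```

The resulting file would then be handed to Tika (for example, to the server
at startup) before re-running the extraction on the file linked above. This
only tests the guess in this thread; the thread does not show that it
changes the output.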