Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/02/10 18:45:41 UTC

[jira] [Commented] (TIKA-1332) Create "eval" code

    [ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861665#comment-15861665 ] 

Tim Allison commented on TIKA-1332:
-----------------------------------

Some more work is required, but I think tika-eval is getting close to being ready to commit.  

If anyone has a chance to review, code is on my [github fork|https://github.com/tballison/tika/tree/TIKA-1302] and the beginnings of wiki documentation are now up on our [wiki|https://wiki.apache.org/tika/TikaEval].

Thank you!

> Create "eval" code
> ------------------
>
>                 Key: TIKA-1332
>                 URL: https://issues.apache.org/jira/browse/TIKA-1332
>             Project: Tika
>          Issue Type: Sub-task
>          Components: cli, general, server
>            Reporter: Tim Allison
>         Attachments: comparison_reports.xml
>
>
> For this issue, we can start with code to gather statistics on each run (# of exceptions per file type, most common exceptions per file type, number of metadata items, total text extracted, etc.).  We should also be able to compare one run against another.  Going forward, there's plenty of room to improve.
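
The per-run statistics and the run-against-run comparison described in the issue can be sketched roughly as follows. This is a minimal illustration that assumes each run has already been reduced to one record per file (file type, exception class if any, metadata count, text length). `FileResult`, `TypeStats`, `summarize_run` and `compare_runs` are hypothetical names, not part of tika-eval's API, and the sketch uses Python for brevity rather than Tika's Java.

```python
# Hypothetical sketch of per-run statistics and run comparison; NOT the tika-eval API.
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class FileResult:
    """One extracted file from a single run (assumed record shape)."""
    file_type: str                   # e.g. a MIME type such as "application/pdf"
    exception: Optional[str] = None  # exception class name, or None on success
    metadata_count: int = 0          # number of metadata items extracted
    text_length: int = 0             # characters of text extracted


@dataclass
class TypeStats:
    """Aggregated statistics for one file type within one run."""
    files: int = 0
    exceptions: Counter = field(default_factory=Counter)
    metadata_items: int = 0
    text_chars: int = 0

    @property
    def exception_count(self) -> int:
        return sum(self.exceptions.values())

    def most_common_exceptions(self, n: int = 3):
        # Most frequent exception classes for this file type, highest count first.
        return self.exceptions.most_common(n)


def summarize_run(results: Iterable[FileResult]) -> dict[str, TypeStats]:
    """Aggregate one run's per-file results by file type."""
    stats: dict[str, TypeStats] = defaultdict(TypeStats)
    for r in results:
        s = stats[r.file_type]
        s.files += 1
        if r.exception is not None:
            s.exceptions[r.exception] += 1
        s.metadata_items += r.metadata_count
        s.text_chars += r.text_length
    return dict(stats)


def compare_runs(a: dict[str, TypeStats],
                 b: dict[str, TypeStats]) -> dict[str, dict[str, int]]:
    """Per file type, report how run B differs from run A (B minus A).

    A file type missing from one run is treated as having all-zero stats there.
    """
    deltas: dict[str, dict[str, int]] = {}
    for file_type in sorted(set(a) | set(b)):
        sa = a.get(file_type, TypeStats())
        sb = b.get(file_type, TypeStats())
        deltas[file_type] = {
            "exceptions": sb.exception_count - sa.exception_count,
            "metadata_items": sb.metadata_items - sa.metadata_items,
            "text_chars": sb.text_chars - sa.text_chars,
        }
    return deltas
```

Keeping the raw exception `Counter` per file type, rather than just a count, is what makes "most common exceptions per file type" a one-line query; the comparison then only needs the aggregated totals from each run.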



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)