Posted to dev@pdfbox.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2017/11/01 12:48:44 UTC

RE: Running tika-eval on the Rackspace vm

Sorry. Fixed.

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Tuesday, October 31, 2017 6:08 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm

On 31.10.2017 at 20:53, Allison, Timothy B. wrote:
>> It's not possible to rename / remove the files / directories mentioned in part 1 due to not having the permissions.
> Gah.  Sorry.  Tilman, I added you to "collab" and chgrp to collab on /work /data2/docs /data3/batch_runs and /data4/batch_runs.

But the directories themselves don't have "w" rights for the group, so I can't benefit from my membership... (unless I missed something; I haven't done much *nix since the 90s). For example, I can't rename /work/batch-apps/tika_working/logs to /work/batch-apps/tika_working/___logs .
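
Renaming a directory entry needs write (and search) permission on its parent directory, so group membership only helps if that parent has the group "w" bit set. A minimal Java sketch for checking this follows. The class name GroupWriteCheck is made up for illustration, and the code assumes a POSIX file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for illustration; not part of Tika or PDFBox.
class GroupWriteCheck {

    // True if the permission set contains the group "w" bit.
    static boolean groupCanWrite(Set<PosixFilePermission> perms) {
        return perms.contains(PosixFilePermission.GROUP_WRITE);
    }

    // Summarizes group owner and permissions of a directory (POSIX only).
    static String describe(Path dir) throws IOException {
        PosixFileAttributes attrs = Files.readAttributes(dir, PosixFileAttributes.class);
        return dir + " group=" + attrs.group().getName()
                + " perms=" + PosixFilePermissions.toString(attrs.permissions())
                + " groupWritable=" + groupCanWrite(attrs.permissions());
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path target = Paths.get(arg).toAbsolutePath();
            // The parent directory decides whether the entry may be renamed.
            Path parent = target.getParent();
            System.out.println(describe(parent != null ? parent : target));
        }
    }
}
```

After compiling with javac, running "java GroupWriteCheck /work/batch-apps/tika_working/logs" would report on the parent directory /work/batch-apps/tika_working.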

Tilman


>
>> The directory is named batch-apps, not batch_apps.
> Fixed.  Thank you.
>
>> Re the "A" version - is this the "good" version, so I could simply  download tika-app and put it there? Or just build tika with a specific  PDFBox version?
> If the current version of tika-app has the right version of PDFBox for your "before" examples, then yes, you can just download tika-app.jar.  We release less frequently than PDFBox, so it's possible that you'll want to build from scratch with the most recent previous release of PDFBox.
>
> In my mind, A is the "before/baseline" version and B is the 
> SNAPSHOT/RC version.  So, hopefully, B is the "good" one. 😊
>
> Let me know what other problems you encounter.
>
> Cheers,
>
>               Tim
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
>





Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
There's definitely some problem with creating a temp file... I 
inserted this line in dumpXLSX

TempFile.createTempFile("tilman", "txt");

and got an exception.

I also added " -Djava.io.tmpdir=/tmp" to the call but this didn't help.
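
One way to narrow this down is to probe the temp directory that the JVM actually resolves. The sketch below is only a diagnostic aid, not tika-eval or POI code. TmpDirProbe is a hypothetical name, and the "poifiles" subdirectory is an assumption about where POI's default temp file strategy writes. If such a subdirectory was created earlier by another user, /tmp itself could be writable while the POI call still fails, but that is a guess.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical diagnostic helper; not part of tika-eval or POI.
class TmpDirProbe {

    // Returns true if a temp file can be created in dir and deleted again.
    static boolean canCreateTempFile(File dir) {
        try {
            File probe = File.createTempFile("probe", ".tmp", dir);
            return probe.delete();
        } catch (IOException | SecurityException e) {
            System.out.println("Cannot create temp file in " + dir + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // Assumption: POI's default strategy writes below a "poifiles" subdirectory.
        File poi = new File(tmp, "poifiles");
        for (File dir : new File[] {tmp, poi}) {
            System.out.println(dir.getAbsolutePath()
                    + " exists=" + dir.exists()
                    + " writable=" + dir.canWrite()
                    + " tempFileOk=" + (dir.isDirectory() && canCreateTempFile(dir)));
        }
    }
}
```

Running it with the same -Djava.io.tmpdir setting and from the same working directory as the Report command shows which directory is really used.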

Tilman



Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.11.2017 at 16:21, Allison, Timothy B. wrote:
> Great!  Thank you, Tilman!
>
> I updated the wiki based on your feedback.  Let me know if I should add anything else while the experience is fresh.

Please change "Run the PDFParser tests..." into "Build tika-parsers 
separately to make sure that this version is added to the repository and 
will be used by the tika-app build. Run the PDFParser tests...."

This is because building tika-app does not trigger a rebuild of 
tika-parsers.

Tilman


>
> Best,
>
>           Tim
>


RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Great!  Thank you, Tilman!

I updated the wiki based on your feedback.  Let me know if I should add anything else while the experience is fresh.

Best,

         Tim



Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I think I was successful, the report now makes sense, as if Tim had 
created it himself :-) The two issues I just created are related to a 
comparison between 2.0.8 and 2.0.4.

So for that next board report, we can now (additional to the existing 
text) tell that there is now a second committer who can run the tests.

Tilman



Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I've come closer to finding out what's happening. I found out that tika-app 
was running with PDFBox 2.0.7 all the time, regardless of which pdfbox 
version is in the pom.xml.

Apparently, building tika-app uses tika-parsers from the repository 
(instead of building tika-parsers again), which needs 2.0.7. Explicitly 
building tika-parsers before building tika-app helps.

This is new to me; in PDFBox, if one builds the app, all dependencies are 
built as well.
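
A quick way to see which PDFBox version ended up inside a built jar is to read the Maven metadata bundled with it. The sketch below is a rough check under assumptions: BundledVersion is a hypothetical name, and it assumes the jar still contains the META-INF/maven/<groupId>/<artifactId>/pom.properties entries of its dependencies, which may not hold for every packaging setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of Tika or PDFBox.
class BundledVersion {

    // True if the entry is the Maven metadata file for the given artifactId.
    static boolean isPomProperties(String entryName, String artifactId) {
        return entryName.startsWith("META-INF/maven/")
                && entryName.endsWith("/" + artifactId + "/pom.properties");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar, e.g. a locally built tika-app jar
        // args[1]: artifactId to look for, e.g. "pdfbox"
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (isPomProperties(entry.getName(), args[1])) {
                    Properties props = new Properties();
                    try (InputStream in = jar.getInputStream(entry)) {
                        props.load(in);
                    }
                    System.out.println(entry.getName() + " -> " + props.getProperty("version"));
                }
            }
        }
    }
}
```

If nothing is printed, the metadata was not packaged and the version has to be checked some other way.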

Tilman



Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
So it's done:
/work/eval/pdfbox_2_0_4_Vs_2_0_8-SNAPSHOT_reports_03112017

I wonder why the differences are so few, especially in meta where I KNOW 
that there are differences, due to the handling of empty strings with 
BOM. Maybe it is because I skipped the "A" phase and used existing data 
from a 2.0.4 run that I found, or because I use a current tika trunk and 
not the existing binary that was on the server.

I'm thinking of creating a new "A" with 2.0.4 and the current tika trunk 
and then comparing it with the "B" I did.

Tilman




Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
On 03.11.2017 at 21:38, Allison, Timothy B. wrote:
> I'm not sure what you mean by...sorry
>> - "H" is missing, which is identical to "C"


I just meant the steps in https://wiki.apache.org/tika/TikaEvalOnVM

In segment 3, "execute: nohup ./appBatchExecutor.sh &" is missing. Of 
course it is obvious that it has to be done, but I am a perfectionist. 
I'd like to have this documentation for the "me" in a few months when I 
have forgotten what I did over the last few days. Or for the next person.

Thanks for the fixes you did. I wonder why writing to /tmp didn't work - 
it did work from the command line. I've started the command again, I'm 
not sure when I will report about it. I'm a bit exhausted from 
non-software activities :-(

Tilman



RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Tilman,
  Thank you for the toe-stubbing.  I'm sorry that it wasn't easier...

I created a new user with collab permissions and ran through the process.

You are right about the privileges on the tmp directory... POI needs a tmp directory to write xlsx.  I created a tmp directory in /work/eval and added an instruction to set the tmp dir via -Djava.io.tmpdir=tmp

I'm not sure what you mean by...sorry
>- "H" is missing, which is identical to "C"

I updated the permissions on appBatchExecutor.sh

I also added a recommendation to umask g+rw before starting. 

Let me know if I need to fix anything else or if I missed something you've already identified. ☹

Thank you, again.

Best,

        Tim


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I'm almost done... then I got this when doing the last step:


[tilman@cloud-server-02 eval]$ java -jar tika-eval-1.17-SNAPSHOT.jar Report -db pdfboxAvsB
0    [main] INFO  org.apache.tika.eval.reports.Report  - Writing report: All Mimes In A to mimes/all_mimes_A.xlsx
Exception in thread "main" java.io.IOException: Permission denied
         at java.io.UnixFileSystem.createFileExclusively(Native Method)
         at java.io.File.createTempFile(File.java:2024)
         at org.apache.poi.util.DefaultTempFileCreationStrategy.createTempFile(DefaultTempFileCreationStrategy.java:110)
         at org.apache.poi.util.TempFile.createTempFile(TempFile.java:66)
         at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:924)
         at org.apache.tika.eval.reports.Report.dumpXLSX(Report.java:85)
         at org.apache.tika.eval.reports.Report.writeReport(Report.java:64)
         at org.apache.tika.eval.reports.ResultsReporter.execute(ResultsReporter.java:305)
         at org.apache.tika.eval.reports.ResultsReporter.main(ResultsReporter.java:266)
         at org.apache.tika.eval.TikaEvalCLI.handleReport(TikaEvalCLI.java:264)
         at org.apache.tika.eval.TikaEvalCLI.execute(TikaEvalCLI.java:52)
         at org.apache.tika.eval.TikaEvalCLI.main(TikaEvalCLI.java:273)


I changed the source, and now I got the path, it is 
/work/eval/reports/mimes/all_mimes_A.xlsx . The file exists and it is empty.

I tried with a 1.16 version and the same happened.

Then I thought, maybe the file with the permission problem isn't the 
target at all; could this be some temp file / temp directory where I 
don't have permission?

Smaller improvements for the documentation:

- appBatchExecutor.sh should have 775 permission or the documentation 
should have "nohup sh ./appBatchExecutor.sh &"

- "H" is missing, which is identical to "C"

- mention that "pdfboxAvsB" db files are to be removed before starting? 
I had accidentally aborted a run and couldn't restart.


Tilman

memo for me:


java -jar tika-eval-1.17-SNAPSHOT.jar Compare \
    -extractsA /data4/batch_runs/pdfbox_2_0_4 \
    -extractsB /data4/batch_runs/pdfbox_2_0_9-SNAPSHOT1 \
    -db pdfboxAvsB

java -jar tika-eval-1.17-SNAPSHOT.jar Report -db pdfboxAvsB