Posted to dev@pdfbox.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2017/10/23 18:54:18 UTC

Running tika-eval on the Rackspace vm

All,

If anyone would like to join the fun in running tika-eval on the Rackspace vm, I posted this: https://wiki.apache.org/tika/TikaEvalOnVM .  You’ll need access to the vm, of course, but I’m happy to grant that to anyone who wants to chip in and help with regression tests.  There are some areas for improvements in the process and documentation. 😊

Cheers,

                Tim

P.S. For those who used the vm earlier and found it wonky, it was indeed wonky because I had failed to add a swap file.  With that change in place, the vm works quite well.




Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
There's definitely some problem with creating a temp file... I 
inserted this line into dumpXLSX:

TempFile.createTempFile("tilman", "txt");

and got an exception.

I also added " -Djava.io.tmpdir=/tmp" to the call but this didn't help.
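
One way to narrow this down from the shell is to probe a candidate temp directory directly. The sketch below is only an approximation of what the JVM does, not the check POI performs. It assumes a POSIX shell with mktemp available, and tmpdir_writable is a hypothetical helper name.

```sh
# tmpdir_writable DIR: succeed only if a temp file can be created in DIR.
# Hypothetical helper; it approximates, but is not, the JVM's temp-file call.
tmpdir_writable() {
    probe=$(mktemp "$1/probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

good_dir=$(mktemp -d)          # a directory we just created, so it is writable
missing_dir="$good_dir/nope"   # does not exist, so the probe must fail

if tmpdir_writable "$good_dir"; then good_result=writable; else good_result=blocked; fi
if tmpdir_writable "$missing_dir"; then missing_result=writable; else missing_result=blocked; fi

echo "$good_dir: $good_result"
echo "$missing_dir: $missing_result"
```

Running the same probe against /tmp, and against whatever directory is passed via -Djava.io.tmpdir, as the user who starts the report, would show whether either one is being refused.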

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.11.2017 at 16:21, Allison, Timothy B. wrote:
> Great!  Thank you, Tilman!
>
> I updated the wiki based on your feedback.  Let me know if I should add anything else while the experience is fresh.

Please change "Run the PDFParser tests..." into "Build tika-parsers 
separately to make sure that this version is added to the repository and 
will be used by the tika-app build. Run the PDFParser tests...."

This is because building tika-app does not trigger a rebuild of 
tika-parsers.

Tilman


>
> Best,
>
>           Tim


RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Great!  Thank you, Tilman!

I updated the wiki based on your feedback.  Let me know if I should add anything else while the experience is fresh.

Best,

         Tim

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Monday, November 6, 2017 3:00 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I think I was successful; the report now makes sense, as if Tim had 
created it himself :-) The two issues I just created are related to a 
comparison between 2.0.8 and 2.0.4.

So for the next board report, we can now say (in addition to the 
existing text) that there is a second committer who can run the tests.

Tilman

On 05.11.2017 at 22:06, Tilman Hausherr wrote:
> I've come closer to finding out what's happening. I found out that 
> tika-app was running with PDFBox 2.0.7 all the time, regardless of 
> which PDFBox version is in the pom.xml.
>
> Apparently, building tika-app uses tika-parsers from the repository 
> (instead of building tika-parsers again), which needs 2.0.7. 
> Explicitly building tika-parsers before building tika-app helps.
>
> This is new to me; in PDFBox, if one builds the app, all dependencies 
> are built as well.
>
> Tilman


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I've come closer to finding out what's happening. I found out that 
tika-app was running with PDFBox 2.0.7 all the time, regardless of which 
PDFBox version is in the pom.xml.

Apparently, building tika-app uses tika-parsers from the repository 
(instead of building tika-parsers again), which needs 2.0.7. Explicitly 
building tika-parsers before building tika-app helps.

This is new to me; in PDFBox, if one builds the app, all dependencies 
are built as well.
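
The build order described above can be sketched as follows. This assumes a Maven checkout of tika in which tika-parsers and tika-app are modules of the root pom. The run helper and the DRY_RUN variable are hypothetical: by default the commands are only printed, and they are executed only when DRY_RUN=0 is set.

```sh
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Install tika-parsers into the local repository first, so that it is
#    built against the PDFBox version currently set in the pom.xml.
# 2. Then build tika-app, which resolves tika-parsers from that repository.
plan=$(
    run mvn -pl tika-parsers install
    run mvn -pl tika-app install
)
echo "$plan"
```

Maven's -am (also-make) option on the tika-app build is an alternative that rebuilds the modules tika-app depends on in the same invocation. Whether further modules need rebuilding depends on where the PDFBox version is declared, which this thread does not state.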

Tilman

On 04.11.2017 at 14:48, Tilman Hausherr wrote:
> So it's done:
> /work/eval/pdfbox_2_0_4_Vs_2_0_8-SNAPSHOT_reports_03112017
>
> I wonder why the differences are so few, especially in meta where I 
> KNOW that there are differences, due to the handling of empty strings 
> with BOM. Maybe it is because I skipped the "A" phase and used 
> existing data from a 2.0.4 run that I found, or because I use a 
> current tika trunk and not the existing binary that was on the server.
>
> I'm thinking of creating a new "A" using 2.0.4 and the current tika 
> trunk, and then comparing it with the "B" I did.
>
> Tilman


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
So it's done:
/work/eval/pdfbox_2_0_4_Vs_2_0_8-SNAPSHOT_reports_03112017

I wonder why the differences are so few, especially in meta where I KNOW 
that there are differences, due to the handling of empty strings with 
BOM. Maybe it is because I skipped the "A" phase and used existing data 
from a 2.0.4 run that I found, or because I use a current tika trunk and 
not the existing binary that was on the server.

I'm thinking of creating a new "A" using 2.0.4 and the current tika 
trunk, and then comparing it with the "B" I did.

Tilman


On 03.11.2017 at 22:14, Tilman Hausherr wrote:
> On 03.11.2017 at 21:38, Allison, Timothy B. wrote:
>> I'm not sure what you mean by...sorry
>>> - "H" is missing, which is identical to "C"
>
>
> I just meant the steps in https://wiki.apache.org/tika/TikaEvalOnVM
>
> In segment 3, "execute: nohup ./appBatchExecutor.sh &" is missing. Of 
> course it is obvious that it has to be done, but I am a perfectionist. 
> I'd like to have this documentation for the "me" in a few months when 
> I have forgotten what I did over the last few days. Or for the next 
> person.
>
> Thanks for the fixes you did. I wonder why writing to /tmp didn't work 
> - it did work from the command line. I've started the command again; 
> I'm not sure when I will report on it. I'm a bit exhausted from 
> non-software activities :-(
>
> Tilman


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
On 03.11.2017 at 21:38, Allison, Timothy B. wrote:
> I'm not sure what you mean by...sorry
>> - "H" is missing, which is identical to "C"


I just meant the steps in https://wiki.apache.org/tika/TikaEvalOnVM

In segment 3, "execute: nohup ./appBatchExecutor.sh &" is missing. Of 
course it is obvious that it has to be done, but I am a perfectionist. 
I'd like to have this documentation for the "me" in a few months when I 
have forgotten what I did over the last few days. Or for the next person.
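
For the record, the step can be sketched like this. The example runs a stand-in script in a scratch directory rather than the real appBatchExecutor.sh; standInExecutor.sh and run.log are hypothetical names. Invoking the script through sh also works when the execute bit is not set.

```sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-in for appBatchExecutor.sh (hypothetical content).
printf '%s\n' '#!/bin/sh' 'echo "batch run finished"' > standInExecutor.sh
chmod 644 standInExecutor.sh    # deliberately no execute bit

# nohup keeps the job running after logout; "&" puts it in the background.
# Going through "sh" avoids the need for the execute bit.
nohup sh ./standInExecutor.sh > run.log 2>&1 < /dev/null &
job=$!

wait "$job"                     # only for the demo; a real run is left alone
cat run.log
```

Redirecting output to a named log file, instead of relying on the default nohup.out, makes it easier to find the output of a particular run later.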

Thanks for the fixes you did. I wonder why writing to /tmp didn't work - 
it did work from the command line. I've started the command again; I'm 
not sure when I will report on it. I'm a bit exhausted from 
non-software activities :-(

Tilman



RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Tilman,
  Thank you for the toe-stubbing.  I'm sorry that it wasn't easier...

I created a new user with collab permissions and ran through the process.

You are right about the privileges on the tmp directory... POI needs a tmp directory to write xlsx.  I created a tmp directory in /work/eval and added an instruction to set the tmp dir via -Djava.io.tmpdir=tmp.
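
A minimal sketch of that setup, run in a scratch directory standing in for /work/eval. The jar name and -db value are taken from the commands quoted elsewhere in this thread; the command is only printed here, since the jar is not assumed to be present. Note that -D options have to come before -jar: anything after the jar name is passed to the program as an argument rather than to the JVM.

```sh
workdir=$(mktemp -d)            # stand-in for /work/eval
cd "$workdir" || exit 1

mkdir -p tmp                    # project-local temp directory, owned by us

# Relative "tmp" resolves against the directory the JVM is started from.
report_cmd="java -Djava.io.tmpdir=tmp -jar tika-eval-1.17-SNAPSHOT.jar Report -db pdfboxAvsB"
echo "$report_cmd"
```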

I'm not sure what you mean by...sorry
>- "H" is missing, which is identical to "C"

I updated the permissions on appBatchExecutor.sh

I also added a recommendation to umask g+rw before starting. 
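
To illustrate what that recommendation does: the symbolic form umask g+rw removes the group read and write bits from the mask, so files created afterwards are readable and writable by the group. A small sketch in a scratch directory (report.txt is a hypothetical file name); the subshell keeps the changed umask from leaking into the rest of the session.

```sh
scratch=$(mktemp -d)

perms=$(
    umask g+rw                          # allow group read/write on new files
    : > "$scratch/report.txt"           # create an empty file under that umask
    ls -l "$scratch/report.txt" | cut -c1-10
)

group_bits=$(printf '%s' "$perms" | cut -c5-6)
echo "mode: $perms (group: $group_bits)"
```

This only affects files created after the umask is set; files and directories that already exist keep their modes.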

Let me know if I need to fix anything else or if I missed something you've already identified. ☹

Thank you, again.

Best,

        Tim

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Thursday, November 2, 2017 5:47 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I'm almost done... then I got this when doing the last step:


[tilman@cloud-server-02 eval]$ java -jar tika-eval-1.17-SNAPSHOT.jar Report -db pdfboxAvsB
0    [main] INFO  org.apache.tika.eval.reports.Report  - Writing report: All Mimes In A to mimes/all_mimes_A.xlsx
Exception in thread "main" java.io.IOException: Permission denied
         at java.io.UnixFileSystem.createFileExclusively(Native Method)
         at java.io.File.createTempFile(File.java:2024)
         at org.apache.poi.util.DefaultTempFileCreationStrategy.createTempFile(DefaultTempFileCreationStrategy.java:110)
         at org.apache.poi.util.TempFile.createTempFile(TempFile.java:66)
         at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:924)
         at org.apache.tika.eval.reports.Report.dumpXLSX(Report.java:85)
         at org.apache.tika.eval.reports.Report.writeReport(Report.java:64)
         at org.apache.tika.eval.reports.ResultsReporter.execute(ResultsReporter.java:305)
         at org.apache.tika.eval.reports.ResultsReporter.main(ResultsReporter.java:266)
         at org.apache.tika.eval.TikaEvalCLI.handleReport(TikaEvalCLI.java:264)
         at org.apache.tika.eval.TikaEvalCLI.execute(TikaEvalCLI.java:52)
         at org.apache.tika.eval.TikaEvalCLI.main(TikaEvalCLI.java:273)


I changed the source and now I have the path: it is 
/work/eval/reports/mimes/all_mimes_A.xlsx. The file exists and is empty.

I tried with a 1.16 version and the same happened.

Then I thought, maybe the file with the permission problem isn't the 
target at all; could this be some temp file / temp directory where I 
don't have permission?

Smaller improvements for the documentation:

- appBatchExecutor.sh should have 775 permission or the documentation 
should have "nohup sh ./appBatchExecutor.sh &"

- "H" is missing, which is identical to "C"

- mention that "pdfboxAvsB" db files are to be removed before starting? 
I had accidentally aborted a run and couldn't restart.
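
For the last point, one cautious option is to move leftover database files aside instead of deleting them. The sketch below works on dummy files in a scratch directory. It assumes the database files share the pdfboxAvsB prefix, which the thread does not confirm; the dummy file names and the stale_db directory name are hypothetical.

```sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

db=pdfboxAvsB
: > "$db.example1"                      # dummy leftovers (hypothetical names)
: > "$db.example2"

backup="stale_db_$(date +%Y%m%d%H%M%S)"
mkdir "$backup"

moved=0
for f in "$db"*; do
    if [ -e "$f" ]; then                # skip the unexpanded pattern if no match
        mv "$f" "$backup/"
        moved=$((moved + 1))
    fi
done
echo "moved $moved file(s) to $backup"
```

Moving rather than removing keeps the aborted run's data available in case it turns out to be needed.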


Tilman

memo for me:


java -jar tika-eval-1.17-SNAPSHOT.jar Compare \
  -extractsA /data4/batch_runs/pdfbox_2_0_4 \
  -extractsB /data4/batch_runs/pdfbox_2_0_9-SNAPSHOT1 \
  -db pdfboxAvsB

java -jar tika-eval-1.17-SNAPSHOT.jar Report -db pdfboxAvsB


RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Sorry. Fixed.

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Tuesday, October 31, 2017 6:08 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
On 31.10.2017 at 20:53, Allison, Timothy B. wrote:
>> It's not possible to rename / remove the files / directories mentioned in part 1 due to not having the permissions.
> Gah.  Sorry.  Tilman, I added you to "collab" and chgrp to collab on /work /data2/docs /data3/batch_runs and /data4/batch_runs.

But the directories themselves don't have "w" rights for the group, so I 
can't profit from my membership... (unless I missed something; I haven't 
done much *nix since the '90s). For example, I can't rename 
/work/batch-apps/tika_working/logs to 
/work/batch-apps/tika_working/___logs .
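
Renaming or removing an entry needs write permission on the directory that contains it, which is why group membership alone is not enough. A minimal sketch in a scratch tree that mirrors the directory names above; group_can_write is a hypothetical helper that reads the group write bit from ls -ld output.

```sh
scratch=$(mktemp -d)
parent="$scratch/tika_working"
mkdir "$parent"
mkdir "$parent/logs"
chmod 755 "$parent"                     # rwxr-xr-x: group may not write

# Prints yes if the group write bit (6th character of the mode string) is set.
group_can_write() {
    case $(ls -ld "$1" | cut -c6) in
        w) echo yes ;;
        *) echo no ;;
    esac
}

before=$(group_can_write "$parent")
chmod g+w "$parent"                     # rwxrwxr-x: group write bit now set
after=$(group_can_write "$parent")

# Done here as the owner; the g+w bit is what lets other group members do it.
mv "$parent/logs" "$parent/___logs"
echo "group write before: $before, after: $after"
```

On the VM the chmod g+w would have to be applied by the owner of the real directories, or by root.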

Tilman


>
>> The directory is named batch-apps, not batch_apps.
> Fixed.  Thank you.
>
>> Re the "A" version - is this the "good" version, so I could simply  download tika-app and put it there? Or just build tika with a specific  PDFBox version?
> If the current version of tika-app has the right version of PDFBox for your "before" examples, then yes, you can just download tika-app.jar.  We release less frequently than PDFBox, so it's possible that you'll want to build from scratch with the most recent previous release of PDFBox.
>
> In my mind, A is the "before/baseline" version and B is the SNAPSHOT/RC version.  So, hopefully, B is the "good" one. 😊
>
> Let me know what other problems you encounter.
>
> Cheers,
>
>               Tim


RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
> It's not possible to rename / remove the files / directories mentioned in part 1 due to not having the permissions.

Gah.  Sorry.  Tilman, I added you to "collab" and chgrp to collab on /work /data2/docs /data3/batch_runs and /data4/batch_runs.

> The directory is named batch-apps, not batch_apps.
Fixed.  Thank you.

> Re the "A" version - is this the "good" version, so I could simply  download tika-app and put it there? Or just build tika with a specific  PDFBox version?

If the current version of tika-app has the right version of PDFBox for your "before" examples, then yes, you can just download tika-app.jar.  We release less frequently than PDFBox, so it's possible that you'll want to build from scratch with the most recent previous release of PDFBox.

In my mind, A is the "before/baseline" version and B is the SNAPSHOT/RC version.  So, hopefully, B is the "good" one. 😊

Let me know what other problems you encounter.

Cheers,

             Tim



RE: Running tika-eval on the Rackspace vm

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Will fix both.  Thank you!

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Monday, October 30, 2017 4:21 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
It's not possible to rename / remove the files / directories mentioned 
in part 1 due to not having the permissions.

Tilman

On 30.10.2017 at 14:14, Tilman Hausherr wrote:
> I almost had some time today, so I had a look at
> https://wiki.apache.org/tika/TikaEvalOnVM
>
> The directory is named batch-apps, not batch_apps.
>
> Re the "A" version - is this the "good" version, so I could simply 
> download tika-app and put it there? Or just build tika with a specific 
> PDFBox version?
>
> Tilman


Re: Running tika-eval on the Rackspace vm

Posted by Tilman Hausherr <TH...@t-online.de>.
I almost had some time today, so I had a look at
https://wiki.apache.org/tika/TikaEvalOnVM

The directory is named batch-apps, not batch_apps.

Re the "A" version - is this the "good" version, so I could simply 
download tika-app and put it there? Or just build tika with a specific 
PDFBox version?

Tilman

On 23.10.2017 at 20:54, Allison, Timothy B. wrote:
> All,
>
> If anyone would like to join the fun in running tika-eval on the Rackspace vm, I posted this: https://wiki.apache.org/tika/TikaEvalOnVM .  You’ll need access to the vm, of course, but I’m happy to grant that to anyone who wants to chip in and help with regression tests.  There are some areas for improvements in the process and documentation. 😊
>
> Cheers,
>
>                  Tim
>
> P.S. For those who used the vm earlier and found it wonky, it was indeed wonky because I had failed to add a swap file.  With that change in place, the vm works quite well.
>
>
>