Posted to dev@pdfbox.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2015/07/10 13:57:05 UTC
first stack trace report from pdfbox 2.0.0 trunk
All,
I just posted the first stack trace report from my initial, partial batch run against govdocs1 here: https://issues.apache.org/jira/secure/attachment/12744700/pdfbox_reports_2_0_0_20150709.zip
Caveats/Notes
The run yesterday did not include the fixes that were made in PDFBOX-2370 or PDFBOX-2862.
I stopped the batch run early, so it covered only ~50k PDFs.
I forgot to turn on access permission checking, so some of the PDFs in here would normally have been skipped.
I haven't reviewed any of the exceptions. They may be caused by code on the Tika side.
I plan to re-run with the latest trunk on Tuesday. I need to turn back to the actual eval code for a bit. :)
Cheers,
Tim
RE: first stack trace report from pdfbox 2.0.0 trunk
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you!
I think I'll wait until PDFBOX-2883 is resolved, if that's OK. That one looks major.
-----Original Message-----
From: John Hewson [mailto:john@jahewson.com]
Sent: Tuesday, July 14, 2015 8:34 PM
To: dev@pdfbox.apache.org
Subject: Re: first stack trace report from pdfbox 2.0.0 trunk
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org
Re: first stack trace report from pdfbox 2.0.0 trunk
Posted by John Hewson <jo...@jahewson.com>.
> On 14 Jul 2015, at 13:49, Tilman Hausherr <TH...@t-online.de> wrote:
> No, the 029423 file is still throwing an exception :-(
>
OK, I've just fixed this; hopefully it works.
— John
Re: first stack trace report from pdfbox 2.0.0 trunk
Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.07.2015 at 22:35, John Hewson wrote:
> Good news, PDFBOX-2842 is now complete.
No, the 029423 file is still throwing an exception :-(
Tilman
Re: first stack trace report from pdfbox 2.0.0 trunk
Posted by John Hewson <jo...@jahewson.com>.
> On 14 Jul 2015, at 13:20, Tilman Hausherr <TH...@t-online.de> wrote:
>> Or is your point that I should wait until PDFBOX-2842 is completed?
>
> Yes :-)
Good news, PDFBOX-2842 is now complete.
— John
Re: first stack trace report from pdfbox 2.0.0 trunk
Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.07.2015 at 21:37, Allison, Timothy B. wrote:
> Interesting, yes: 781/781172.pdf, 490/490376.pdf and 029/029423.pdf. Are you running your own regression testing against govdocs1?
Yes, from time to time for the last few months.
> Is it duplicated effort for me to do anything with 2.0.0?
Partly yes. The only difference is that I didn't do any text extraction.
> Or is your point that I should wait until PDFBOX-2842 is completed?
Yes :-)
Tilman
RE: first stack trace report from pdfbox 2.0.0 trunk
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Interesting, yes: 781/781172.pdf, 490/490376.pdf and 029/029423.pdf. Are you running your own regression testing against govdocs1? Is it duplicated effort for me to do anything with 2.0.0? Or is your point that I should wait until PDFBOX-2842 is completed?
Thank you!
Best,
Tim
-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de]
Sent: Tuesday, July 14, 2015 12:47 PM
To: dev@pdfbox.apache.org
Subject: Re: first stack trace report from pdfbox 2.0.0 trunk
Hi Tim,
Currently there is at least one known regression, mentioned in
PDFBOX-2842; it applies to 029423 but also to other files.
Tilman
Re: first stack trace report from pdfbox 2.0.0 trunk
Posted by Tilman Hausherr <TH...@t-online.de>.
Hi Tim,
Currently there is at least one known regression, mentioned in
PDFBOX-2842; it applies to 029423 but also to other files.
Tilman