Posted to dev@pdfbox.apache.org by Brian Carrier <ca...@digital-evidence.org> on 2009/03/13 21:20:01 UTC

Broken textextract regression tests

The regression tests started to fail with the r751690/r751664  
checkins (PDFBOX-420: additional CJK support).

Can we modify the automated build system to perform the textextract  
regression test and report any failures?

brian

Re: Broken textextract regression tests

Posted by Brian Carrier <ca...@digital-evidence.org>.

On Mar 13, 2009, at 5:11 PM, Andreas Lehmkühler wrote:

> Brian Carrier wrote:
>> The regression tests started to fail with the r751690/r751664 checkins
>> (PDFBOX-420: additional CJK support).
> Are all tests failing or only some of them? Perhaps we have to create
> new result files for comparing?

Just some of them:

10101-AR.pdf
Acrobat9.pdf
Garcia2003.....
sample_font_solidconvertor.pdf
terms_and_conditions_...
whats_new.pdf

They are all differences in non-ASCII / control characters.  Some  
have chars removed and others have chars added.  I haven't looked at  
the patch to determine why.

If you sync up to before those revisions, can you run the regression  
tests on your system w/out errors?

brian

Re: Broken textextract regression tests

Posted by Andreas Lehmkühler <an...@lehmi.de>.

Brian Carrier wrote:
> The regression tests started to fail with the r751690/r751664 checkins
> (PDFBOX-420: additional CJK support).
Are all tests failing or only some of them? Perhaps we have to create
new result files for comparing?

Andreas Lehmkühler

Re: Broken textextract regression tests

Posted by Brian Carrier <ca...@digital-evidence.org>.

Note that the full regression test suite takes a long time and some  
of the tests are currently broken (outside of the recent text  
extraction failures).

https://issues.apache.org/jira/browse/PDFBOX-394

I haven't run the full test suite in a while, but there are some  
performance tests and I seem to recall that they were slow. The text  
extraction test isn't bad.  It takes about 4 minutes on my system
(including a build and test).


brian

On Mar 17, 2009, at 8:58 AM, Daniel Wilson wrote:

>> We could either change the Hudson configuration to run "ant
>> test dist", or include "test" as a dependency to the "dist" target.
>> I'd prefer the latter option, as that forces more people to run the
>> tests during normal development.
>
> While getting more people testing is definitely good, newer developers may
> struggle with the JUnit setup and be turned off if they can't get a build
> without JUnit working.  I know I struggled with the whole Java build process
> and JUnit when I got started on this.  I was an experienced developer, but
> not a Java developer at the time.
>
> So I would suggest the former option.
>
> Daniel Wilson
>
> On 3/17/09, Jukka Zitting <ju...@gmail.com> wrote:
>>
>> Hi,
>>
>>
>> On Fri, Mar 13, 2009 at 9:20 PM, Brian Carrier
>> <ca...@digital-evidence.org> wrote:
>>> Can we modify the automated build system to perform the textextract
>>> regression test and report any failures?
>>
>>
>> Our Hudson build does an "ant dist" whenever changes are committed to
>> the trunk. We could either change the Hudson configuration to run "ant
>> test dist", or include "test" as a dependency to the "dist" target.
>> I'd prefer the latter option, as that forces more people to run the
>> tests during normal development.
>>
>> In any case, the "test" target doesn't seem to make the build fail
>> even when there are test failures. That should probably also be
>> changed.
>>
>> BR,
>>
>>
>> Jukka Zitting
>>

Re: Broken textextract regression tests

Posted by Daniel Wilson <wi...@gmail.com>.

> We could either change the Hudson configuration to run "ant
> test dist", or include "test" as a dependency to the "dist" target.
> I'd prefer the latter option, as that forces more people to run the
> tests during normal development.

While getting more people testing is definitely good, newer developers may
struggle with the JUnit setup and be turned off if they can't get a build
without JUnit working.  I know I struggled with the whole Java build process
and JUnit when I got started on this.  I was an experienced developer, but
not a Java developer at the time.

So I would suggest the former option.

Daniel Wilson

On 3/17/09, Jukka Zitting <ju...@gmail.com> wrote:
>
> Hi,
>
>
> On Fri, Mar 13, 2009 at 9:20 PM, Brian Carrier
> <ca...@digital-evidence.org> wrote:
> > Can we modify the automated build system to perform the textextract
> > regression test and report any failures?
>
>
> Our Hudson build does an "ant dist" whenever changes are committed to
> the trunk. We could either change the Hudson configuration to run "ant
> test dist", or include "test" as a dependency to the "dist" target.
> I'd prefer the latter option, as that forces more people to run the
> tests during normal development.
>
> In any case, the "test" target doesn't seem to make the build fail
> even when there are test failures. That should probably also be
> changed.
>
> BR,
>
>
> Jukka Zitting
>

Re: Broken textextract regression tests

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Fri, Mar 13, 2009 at 9:20 PM, Brian Carrier
<ca...@digital-evidence.org> wrote:
> Can we modify the automated build system to perform the textextract
> regression test and report any failures?

Our Hudson build does an "ant dist" whenever changes are committed to
the trunk. We could either change the Hudson configuration to run "ant
test dist", or include "test" as a dependency to the "dist" target.
I'd prefer the latter option, as that forces more people to run the
tests during normal development.

In any case, the "test" target doesn't seem to make the build fail
even when there are test failures. That should probably also be
changed.

BR,

Jukka Zitting
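
For reference, the two build changes discussed in this thread could look roughly like the Ant fragment below. This is only a sketch, not the project's actual build.xml: the target names "test" and "dist" come from the messages above, while the project name, the directory properties, the "tests.failed" property, and the test file pattern are hypothetical. Compilation and the JUnit library classpath are omitted.

```xml
<project name="pdfbox-sketch" default="dist" basedir=".">
  <!-- Hypothetical locations; the real build.xml will differ. -->
  <property name="test.classes.dir" value="target/test-classes"/>
  <property name="test.reports.dir" value="target/test-reports"/>

  <target name="test">
    <mkdir dir="${test.reports.dir}"/>
    <!-- failureproperty/errorproperty record that something went wrong
         without aborting the run, so all tests still execute. -->
    <junit failureproperty="tests.failed" errorproperty="tests.failed">
      <classpath>
        <pathelement location="${test.classes.dir}"/>
      </classpath>
      <formatter type="plain"/>
      <batchtest todir="${test.reports.dir}">
        <fileset dir="${test.classes.dir}" includes="**/Test*.class"/>
      </batchtest>
    </junit>
    <!-- Make the build fail once all tests have been run and reported. -->
    <fail if="tests.failed"
          message="Tests failed; see ${test.reports.dir}"/>
  </target>

  <!-- Option two from the thread: "dist" depends on "test". -->
  <target name="dist" depends="test">
    <echo message="packaging steps would go here"/>
  </target>
</project>
```

With the first option (leaving "dist" independent and having Hudson run "ant test dist"), the depends attribute on "dist" would simply be dropped; the fail step in "test" is what makes the build report failures in either case.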