Posted to dev@pdfbox.apache.org by Maruan Sahyoun <sa...@fileaffairs.de> on 2013/12/08 15:43:05 UTC

Building/enhancing a test suite for PDFBox

Hi,

as we are handling and closing issues using PDFs provided by users of the library, what do you think about adding these files to a test suite when they can be used to check the handling of a specific issue?

The benefit would be that we can write tests around these issues to ensure that forthcoming releases are still able to handle these files.

An idea for a naming convention would be something like <issue number>-<short description>, e.g. 1769-invalid_xref.pdf
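
A minimal sketch of how a test harness could split such file names, assuming Java and the layout <issue number>-<short description>.pdf. The class IssueFileName and its methods are hypothetical and not part of PDFBox:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: parses names like "1769-invalid_xref.pdf".
final class IssueFileName {
    // Assumed layout: <issue number>-<short description>.pdf
    private static final Pattern NAME =
            Pattern.compile("(\\d{1,9})-([A-Za-z0-9_]+)\\.pdf");

    private final int issueNumber;
    private final String description;

    private IssueFileName(int issueNumber, String description) {
        this.issueNumber = issueNumber;
        this.description = description;
    }

    /** Returns the parsed name, or empty if the name does not follow the convention. */
    static Optional<IssueFileName> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new IssueFileName(Integer.parseInt(m.group(1)), m.group(2)));
    }

    int issueNumber() {
        return issueNumber;
    }

    String description() {
        return description;
    }
}
```

Returning an empty Optional for non-matching names would let a harness skip or report files that do not follow the convention instead of failing on them.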

WDYT

Maruan Sahyoun


Re: Building/enhancing a test suite for PDFBox

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Yes,

that’s my observation too. In addition, the Bavaria test suite also covers conforming documents, whereas Isartor contains only non-conforming ones (from a PDF/A perspective). So it’s more generic.

Maruan Sahyoun


On 09.12.2013 at 17:12, Guillaume Bailleul <gb...@gmail.com> wrote:

> Hi,
> 
> what is in place for PDF/A validation is too specific: as you said, we
> only expect an error code (as we only validate Isartor files). The Bavaria
> test suite uses a format that handles both conforming and non-conforming
> files; it is IMO a better source of inspiration.
> 
> BR,
> 
> Guillaume


Re: Building/enhancing a test suite for PDFBox

Posted by Guillaume Bailleul <gb...@gmail.com>.
Hi,

what is in place for PDF/A validation is too specific: as you said, we
only expect an error code (as we only validate Isartor files). The Bavaria
test suite uses a format that handles both conforming and non-conforming
files; it is IMO a better source of inspiration.

BR,

Guillaume

On Mon, Dec 9, 2013 at 4:32 PM, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
> Hi,
>
> I fully agree that the target should be to have automated tests. Without that the benefit will be limited. As for error codes/messages, we could reuse/generalize what’s in place for the PDF/A validator. The Bavaria test suite from PDFlib also has a good set of test/result descriptions.
>
> BR
> Maruan Sahyoun

Re: Building/enhancing a test suite for PDFBox

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

I fully agree that the target should be to have automated tests. Without that the benefit will be limited. As for error codes/messages, we could reuse/generalize what’s in place for the PDF/A validator. The Bavaria test suite from PDFlib also has a good set of test/result descriptions.

BR
Maruan Sahyoun

On 09.12.2013 at 16:00, Timo Boehme <ti...@ontochem.com> wrote:

> Hi,
> 
> this would be a valuable resource, especially if the tests can be automated. For that we need to somehow specify the expected result (exception, warning, result document/text) for automated processing. Maybe we should start using error codes?
> 
> 
> Best,
> Timo


Re: Building/enhancing a test suite for PDFBox

Posted by Timo Boehme <ti...@ontochem.com>.
Hi,

this would be a valuable resource, especially if the tests can be
automated. For that we need to somehow specify the expected result
(exception, warning, result document/text) for automated processing.
Maybe we should start using error codes?
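
A minimal sketch of how such an expected result could be described for automated comparison, assuming Java. The class ExpectedResult, its Kind values and the matches method are hypothetical names chosen for illustration, not an existing PDFBox API:

```java
import java.util.Objects;

// Hypothetical descriptor, not part of PDFBox: what a test PDF is expected to produce.
final class ExpectedResult {
    // The three kinds of outcome named above: exception, warning, result text.
    enum Kind { EXCEPTION, WARNING, TEXT }

    private final Kind kind;
    // Assumed meaning: exception class name, warning/error code, or expected text.
    private final String detail;

    ExpectedResult(Kind kind, String detail) {
        this.kind = Objects.requireNonNull(kind, "kind");
        this.detail = Objects.requireNonNull(detail, "detail");
    }

    /** True if the observed outcome has the same kind and the same detail. */
    boolean matches(Kind observedKind, String observedDetail) {
        return kind == observedKind && detail.equals(observedDetail);
    }
}
```

A test runner could then load one such descriptor per file and compare it with what actually happened when the file was processed.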


Best,
Timo



On 08.12.2013 at 15:43, Maruan Sahyoun wrote:
> Hi,
>
> as we are handling and closing issues using PDFs provided by users of the library, what do you think about adding these files to a test suite when they can be used to check the handling of a specific issue?
>
> The benefit would be that we can write tests around these issues to ensure that forthcoming releases are still able to handle these files.
>
> An idea for a naming convention would be something like <issue number>-<short description>, e.g. 1769-invalid_xref.pdf
>
> WDYT
>
> Maruan Sahyoun
>


-- 

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780474
  F: +49 345 4780471
  timo.boehme@ontochem.com

_____________________________________________________________________

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Register court: Stendal
  Registration number: HRB 215461
_____________________________________________________________________


Re: Building/enhancing a test suite for PDFBox

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
> Hi Maruan,
> 
> In my opinion, it is a good idea.
> 
> As these files are independent of the PDFBox version, we could create a
> directory hierarchy in http://svn.apache.org/repos/asf/pdfbox/ (not
> in trunk).
> 

good idea

> As with the PDF/A Isartor test suite files, we could create
> subdirectories to group files by type of error.
> 
> When we made preflight, we created a file expected_errors.txt,
> containing information on the expected error code returned by the API. It
> could be a good idea to have something similar.
> 
> 
> BR,
> 
> Guillaume
> 
> 

in addition to the PDF/A Isartor files, which as far as I understand all contain errors, there could be some conforming ones similar to the Bavaria test suite from PDFlib



Re: Building/enhancing a test suite for PDFBox

Posted by Guillaume Bailleul <gb...@gmail.com>.
Hi Maruan,

In my opinion, it is a good idea.

As these files are independent of the PDFBox version, we could create a
directory hierarchy in http://svn.apache.org/repos/asf/pdfbox/ (not
in trunk).

As with the PDF/A Isartor test suite files, we could create
subdirectories to group files by type of error.

When we made preflight, we created a file expected_errors.txt,
containing information on the expected error code returned by the API. It
could be a good idea to have something similar.
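
A minimal sketch of reading such a file, assuming Java. The actual layout of expected_errors.txt is not described here, so the "file name = error code" line format, the '#' comment lines, the class ExpectedErrors and the code XREF_INVALID used in the usage note are all assumptions made for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader, not the preflight implementation.
final class ExpectedErrors {
    private ExpectedErrors() {
    }

    /**
     * Parses lines of the assumed form "file name = error code".
     * Blank lines and lines starting with '#' are skipped.
     */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> expected = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf('=');
            if (sep <= 0) {
                throw new IllegalArgumentException("Malformed line: " + raw);
            }
            String fileName = line.substring(0, sep).trim();
            String errorCode = line.substring(sep + 1).trim();
            expected.put(fileName, errorCode);
        }
        return expected;
    }
}
```

With this assumed format, a line such as "1769-invalid_xref.pdf = XREF_INVALID" would map the file name to the code a test expects the API to return.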


BR,

Guillaume




On Sun, Dec 8, 2013 at 3:43 PM, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>
> Hi,
>
> as we are handling and closing issues using PDFs provided by users of the library, what do you think about adding these files to a test suite when they can be used to check the handling of a specific issue?
>
> The benefit would be that we can write tests around these issues to ensure that forthcoming releases are still able to handle these files.
>
> An idea for a naming convention would be something like <issue number>-<short description>, e.g. 1769-invalid_xref.pdf
>
> WDYT
>
> Maruan Sahyoun
>