Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/11/28 00:42:44 UTC

[jira] Created: (PDFBOX-391) Remove or replace troublesome test files

Remove or replace troublesome test files
----------------------------------------

                 Key: PDFBOX-391
                 URL: https://issues.apache.org/jira/browse/PDFBOX-391
             Project: PDFBox
          Issue Type: Sub-task
            Reporter: Jukka Zitting
            Priority: Blocker
             Fix For: 0.8.0-incubator


One issue raised by the license review (PDFBOX-366) is the status of the various test PDF files included in the test directory. Many of these don't seem to come with a license that would permit redistribution within an Apache project, so our only option seems to be to remove or replace the files before we can make the first Apache release.

The full list of potentially troublesome test files (I haven't looked at all of these in detail, so some might be OK for us to keep) is:

    $ find test -name '*.pdf'
    test/encryption/encrypted_doc_no_id.pdf
    test/input/10101-AR.pdf
    test/input/601501018.pdf
    test/input/Exolab.pdf
    test/input/FreedomExpressions.pdf
    test/input/Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf
    test/input/Garcia2004_thesis.pdf
    test/input/Hd301212.pdf
    test/input/JavaMail-1.2.pdf
    test/input/Liste732004001452_001_0.pdf_0_.pdf
    test/input/Michel2001__Review_p2_structured.pdf
    test/input/News-Oct-2001-RUS.pdf
    test/input/OLS2000-rsync.pdf
    test/input/OSP_framework.pdf
    test/input/SphericalHomeomorphism.pdf
    test/input/T05140.pdf
    test/input/TEST_SetCharSpacing_Error.pdf
    test/input/amyuni2_05d__pdf1_3_acro4x.pdf
    test/input/authentication.pdf
    test/input/c21-5916 .pdf
    test/input/citi-tr-00-4.ps.gz.pdf
    test/input/connection_pool.pdf
    test/input/cweb.pdf
    test/input/data-000001.pdf
    test/input/defensive_driving_class_schedule.pdf
    test/input/ekb_deutsch.pdf
    test/input/emsv4a4.pdf
    test/input/fdeb.pdf
    test/input/frweb-f-332-18.pdf
    test/input/hexnumberproblem.pdf
    test/input/irs tax guide for small businesses.pdf
    test/input/jose-lugo-test.pdf
    test/input/jun2003.pdf
    test/input/null_thread_bead.pdf
    test/input/ocalc.pdf
    test/input/openoffice-test-document.pdf
    test/input/org.eclipse.platform.doc.isv.pdf
    test/input/pdf_with_lots_of_fields.pdf
    test/input/rc5.pdf
    test/input/reservedparkingsalaryreductionauthorization.pdf
    test/input/ruminations.pdf
    test/input/sampleForSpec.pdf
    test/input/sample_fonts_solidconvertor.pdf
    test/input/sha256.pdf
    test/input/simple-openoffice.pdf
    test/input/surface_interpolation.pdf
    test/input/tech_report.pdf
    test/input/terms_and_conditions_book.pdf
    test/input/test_rotate_270.pdf
    test/input/warp.pdf
    test/input/welcome.pdf
    test/input/whats_new.pdf
    test/input/yaddatest.pdf
    test/pdfparser/genko_oc_shiryo1.pdf
    test/pdfreader/debug.xml.pdf
    test/pdfreader/excel.pdf
    test/pdfreader/ollix_test_2005-03-11_bin.pdf
    test/pdfreader/pdfbox_webpage.pdf

My suggestion is that (in line with PDFBOX-368) we create a new src/test/resources directory where we move all reviewed and accepted test cases. Once all these files have been reviewed, we just drop the remaining ones for which an acceptable license could not be found. It would be nice if replacements could be created for such test cases, but in some cases (special PDF constructs, etc.) that might be a bit troublesome so I guess we'll just need to live with some reduction in test coverage due to this.

For more background, see the discussions at http://markmail.org/message/z7meilylwifef7db and http://markmail.org/message/cuyylr6zqs4fwdiz.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730224#action_12730224 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

With revision 793461 I've removed the encryption test files I forgot yesterday. They are attached to PDFBOX-492 too.



[jira] Resolved: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-391.
---------------------------------------

    Resolution: Fixed

With revision 793349 I've removed all the test files in question. They will be downloaded automatically from PDFBOX-492 if the test cases need them.
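
For readers who want to picture the "download if needed" step, here is a minimal sketch in present-day Java. It is not the mechanism the PDFBox build actually used; the class name TestFileFetcher, the method fetchIfMissing, and any URL passed to it are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not part of PDFBox. */
class TestFileFetcher {

    /**
     * Copies the content behind source to target unless target already exists.
     * Returns true if a copy was made, false if the file was already there.
     */
    static boolean fetchIfMissing(URL source, Path target) throws IOException {
        if (Files.exists(target)) {
            return false; // already fetched earlier, nothing to do
        }
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directory exists
        }
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TestFileFetcher <source-url> <target-file>");
            return;
        }
        boolean copied = fetchIfMissing(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println(copied ? "fetched " + args[1] : "already present: " + args[1]);
    }
}
```

Checking for the target first keeps repeated test runs from fetching the same file again, which matters when the source is an issue tracker attachment.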



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653524#action_12653524 ] 

Jukka Zitting commented on PDFBOX-391:
--------------------------------------

I quickly browsed through the test files, and only the following look like something that I'd feel comfortable redistributing within an Apache project:

    test/input/cweb.pdf
    test/input/data-000001.pdf
    test/input/Liste732004001452_001_0.pdf_0_.pdf
    test/input/openoffice-test-document.pdf
    test/input/sample_fonts_solidconvertor.pdf
    test/input/sampleForSpec.pdf
    test/input/simple-openoffice.pdf
    test/input/yaddatest.pdf
    test/pdfreader/debug.xml.pdf
    test/pdfreader/excel.pdf
    test/pdfreader/ollix_test_2005-03-11_bin.pdf
    test/pdfreader/pdfbox_webpage.pdf

Note that there is a clear distinction between using and redistributing something. We could still come up with a way to use the test suite in our Hudson CI build and individually by each developer, but we probably can't keep the documents in svn and we definitely can't release them as a part of PDFBox.
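
Put differently, the files that would have to leave svn are the full list from the issue description minus the short list above. A small sketch of that set difference follows, using only a handful of the file names from this thread; the class ReviewPartition and its method are hypothetical and were not part of any PDFBox tooling.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of "all files minus accepted files". */
class ReviewPartition {

    /** Returns the entries of all that are not in accepted, keeping the order of all. */
    static Set<String> toRemove(List<String> all, List<String> accepted) {
        Set<String> rest = new LinkedHashSet<>(all);
        rest.removeAll(accepted);
        return rest;
    }

    public static void main(String[] args) {
        // A small excerpt of the lists in this issue, not the complete set.
        List<String> all = List.of(
                "test/input/Exolab.pdf",
                "test/input/JavaMail-1.2.pdf",
                "test/input/cweb.pdf",
                "test/input/yaddatest.pdf");
        List<String> accepted = List.of(
                "test/input/cweb.pdf",
                "test/input/yaddatest.pdf");
        for (String path : toRemove(all, accepted)) {
            System.out.println("no acceptable license found yet: " + path);
        }
    }
}
```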



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671937#action_12671937 ] 

Brian Carrier commented on PDFBOX-391:
--------------------------------------

The trunk now supports a feature for "external" test files to be stored in the input-ext directory. If the test suite finds that directory, it will process its contents:

Sending        build.xml
Sending        src/test/java/org/apache/pdfbox/util/TestTextStripper.java
Transmitting file data ..
Committed revision 742644.

Now we need an automated way to populate the 'input-ext' directory with the files that were removed.
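
The general idea of an optional test directory can be sketched as follows. This is not the code committed to TestTextStripper.java; the class ExternalTestFiles, the method collectPdfs, and the directory paths in main are assumptions for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not the actual TestTextStripper code. */
class ExternalTestFiles {

    /**
     * Returns the PDF files found in the given directories, skipping any
     * directory that does not exist so the suite still runs without it.
     */
    static List<File> collectPdfs(File... dirs) {
        List<File> result = new ArrayList<>();
        for (File dir : dirs) {
            File[] found = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
            if (found == null) {
                // listFiles returns null when dir is missing or is not a directory
                continue;
            }
            Arrays.sort(found); // listFiles gives no ordering guarantee
            result.addAll(Arrays.asList(found));
        }
        return result;
    }

    public static void main(String[] args) {
        // Both paths are assumed locations, chosen only for this example.
        for (File pdf : collectPdfs(new File("test/input"), new File("test/input-ext"))) {
            System.out.println("would test " + pdf.getPath());
        }
    }
}
```

Treating a missing directory as "no extra files" rather than as a failure lets a release build pass with only the redistributable documents, while developers who have populated input-ext get the wider coverage.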



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728672#action_12728672 ] 

Jukka Zitting commented on PDFBOX-391:
--------------------------------------

We could simply attach the tests here in Jira and point developers to get them from here.



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728663#action_12728663 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

Now that the CMap files are on their way to the Maven repository, the last question is where to put the test files that can no longer be kept in svn.
Is it OK to put them on the PDFBox homepage? Or is that too "official"? Alternatively, we could put them on someone's home page on people.a.o, couldn't we?

Any ideas, suggestions, or objections?



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730268#action_12730268 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

From that point of view I agree with you. I'll change that behaviour. But first, some of the smaller tests (FDF and encryption) have to be adjusted. Both expect their files to be there. I'll make that optional too.



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730411#action_12730411 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

I've removed the automatic download, and the input files for TestFDF are now optional. It seems that the encryption test is never called.



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Daniel Wilson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707971#action_12707971 ] 

Daniel Wilson commented on PDFBOX-391:
--------------------------------------

What about files that are attached to issues?  Those, in my opinion, form some of the most valuable test cases.

Additionally, I have received specific permission from the owner to attach a couple of the test files.

I'm seeing our entire rendering test knocked out here.  I understand there can be legal issues, but the quality of our development will surely drop if we can't test an entire area like that.



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730265#action_12730265 ] 

Jukka Zitting commented on PDFBOX-391:
--------------------------------------

Re: automatically downloaded

It would be better if the user had to explicitly request these test files by running "ant get.testfiles" before building the project. If the user didn't do that, then the relevant tests would simply not run.

The licensing of these files is quite unclear, so I'd prefer that people have to explicitly opt in to getting them, rather than having PDFBox download them automatically as part of the normal build process.
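
A minimal sketch of the "tests simply don't run" behaviour, in plain Java without any test framework. The class name, the method name and the default path test/optional are assumptions made for illustration, not the actual PDFBox test code:

```java
import java.io.File;

public class OptionalTests {
    /** True only if dir exists, is a directory, and contains at least one entry. */
    static boolean testFilesPresent(File dir) {
        String[] entries = dir.list(); // null if dir is missing or not a directory
        return entries != null && entries.length > 0;
    }

    public static void main(String[] args) {
        // Hypothetical location that an explicit download step would populate.
        File dir = new File(args.length > 0 ? args[0] : "test/optional");
        if (!testFilesPresent(dir)) {
            // Nothing was fetched, so skip instead of failing the build.
            System.out.println("Skipping: no test files in " + dir);
            return;
        }
        System.out.println("Running tests against " + dir);
    }
}
```

The point of the design is that a missing directory counts as "not requested" rather than as a test failure, so a plain checkout still builds cleanly.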



[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708014#action_12708014 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

As Jukka already stated in his comment, we have to remove the troublesome test files from svn and we can't release them as part of PDFBox, but of course we can still use them in our own test environment. We have to place them somewhere else (perhaps as a zip in the Maven repository?) and modify the build process to fetch these files automatically and use them in our test suite.

I think the question about the files attached to issues is quite a difficult one. There are many of them, and some of the issue creators allow us to redistribute the files by activating the "grant" checkbox. But I'm afraid that some of these people aren't in a position to give us that permission because they aren't the authors of the documents, e.g. PDFBOX-450. In the end we have to double-check the attached docs before we add them to the "official" test cases.





[jira] Commented: (PDFBOX-391) Remove or replace troublesome test files

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730076#action_12730076 ] 

Andreas Lehmkühler commented on PDFBOX-391:
-------------------------------------------

With revision 793340 I've added support for processing an "external" test file directory named input-ext, as is already available for TestTextStripper.
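
Roughly, the idea is to collect the bundled files first and then add whatever is found in the external directory, if it exists at all. The sketch below is only an illustration: the class and method names are made up, and the paths test/input and test/input-ext are assumed locations, not necessarily what the committed code uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TestFileCollector {
    /** Lists the *.pdf entries of dir, sorted by name; empty if dir is missing. */
    static List<File> pdfsIn(File dir) {
        File[] found = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
        if (found == null) {
            return new ArrayList<>(); // directory absent: contribute nothing
        }
        Arrays.sort(found);
        return new ArrayList<>(Arrays.asList(found));
    }

    /** Bundled files first, then the contents of the optional external directory. */
    static List<File> collect(File bundledDir, File externalDir) {
        List<File> all = pdfsIn(bundledDir);
        all.addAll(pdfsIn(externalDir));
        return all;
    }

    public static void main(String[] args) {
        // Assumed directory layout, for illustration only.
        for (File pdf : collect(new File("test/input"), new File("test/input-ext"))) {
            System.out.println(pdf.getPath());
        }
    }
}
```

With this shape the external directory never has to exist, so files with unclear licensing can stay out of svn and out of the release while still being usable locally.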
