Posted to dev@tika.apache.org by Tyler Palsulich <tp...@gmail.com> on 2015/03/11 02:53:58 UTC

Parser test resources

Hi Folks,

This has irked me for a while -- do we need all of the tika-parser test
resources in a flat directory? Can we convert them to standard package
resource paths? Or, do enough parsers have overlapping test resource
dependencies where it makes sense to have them _all_ under one directory?

It would be nice to easily know which files are used for which tests.

Tyler

RE: Parser test resources

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Hi Tyler,
  
This has started to irk me as well, a bit.  I don't think there's much overlap, although there is some.  I think navigating standard package resource paths might be cumbersome even with a good IDE... perhaps start with high-level subdirectories as chm is now doing?


Re: Parser test resources

Posted by Tyler Palsulich <tp...@gmail.com>.
Good points. Maybe it's a good idea to keep the new files organized, like
chm, but leave the old ones where they are? The test-documents directory
has 460 entries right now.

Tyler

On Wed, Mar 11, 2015 at 8:43 AM, Nick Burch <ap...@gagravarr.org> wrote:

> On Tue, 10 Mar 2015, Tyler Palsulich wrote:
>
>> Or, do enough parsers have overlapping test resource dependencies where
>> it makes sense to have them _all_ under one directory?
>>
>
> I believe that most of the test files get used for both detection and
> parsing unit tests.
>
>> It would be nice to easily know which files are used for which tests.
>>
>
> 5 lines of perl should give you that, or fewer if you don't want to be
> able to understand the perl... ;-)
>
> Many, but not all, of the test files are of the form test<filetype>.<ext>
> or test<filetype>_<special type/description>.<ext>, which I find makes it
> fairly easy to spot what files go with what. Would fixing the few files
> not in that format help or hinder, do you think?
>
> Nick
>

Re: Parser test resources

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 10 Mar 2015, Tyler Palsulich wrote:
> Or, do enough parsers have overlapping test resource dependencies where 
> it makes sense to have them _all_ under one directory?

I believe that most of the test files get used for both detection and 
parsing unit tests.

> It would be nice to easily know which files are used for which tests.

5 lines of perl should give you that, or fewer if you don't want to be 
able to understand the perl... ;-)
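
For readers who want to try this, here is a rough sketch of such a script, written in Python rather than Perl. It is not code from this thread: the function name, the "*.java" glob and the word-boundary matching rule are all assumptions, and it only catches file names that appear literally in the test sources.

```python
import re
from collections import defaultdict
from pathlib import Path


def map_resources_to_tests(resource_dir, test_src_dir, pattern="*.java"):
    """Map each resource file name to the test sources that mention it.

    Hypothetical helper: names built at runtime (string concatenation,
    loops over extensions) will not be detected.
    """
    names = sorted(p.name for p in Path(resource_dir).iterdir() if p.is_file())
    usage = defaultdict(list)
    for src in sorted(Path(test_src_dir).rglob(pattern)):
        text = src.read_text(encoding="utf-8", errors="replace")
        for name in names:
            # Lookarounds stop "test.doc" from matching inside "test.docx"
            # or inside a longer name such as "mytest.doc".
            regex = r"(?<![\w-])" + re.escape(name) + r"(?!\w)"
            if re.search(regex, text):
                usage[name].append(str(src))
    # Include unreferenced files too, with an empty list.
    return {name: usage.get(name, []) for name in names}
```

Calling it with the test-documents directory and the test source root (both paths depend on the checkout) returns a dictionary; entries with an empty list are files that no test mentions by literal name.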

Many, but not all, of the test files are of the form test<filetype>.<ext> 
or test<filetype>_<special type/description>.<ext>, which I find makes it 
fairly easy to spot what files go with what. Would fixing the few files 
not in that format help or hinder, do you think?
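
As an illustration only, the files that fall outside that convention could be listed with something like the sketch below. The regular expression is one guess at what the two forms mean (an alphanumeric file type, an optional underscore plus description, a single extension); the names NAME_PATTERN and nonconforming are hypothetical.

```python
import re

# Assumed reading of the convention:
#   test<filetype>.<ext>
#   test<filetype>_<special type/description>.<ext>
NAME_PATTERN = re.compile(
    r"^test(?P<filetype>[A-Za-z0-9]+)"   # e.g. "PDF" in testPDF.pdf
    r"(?:_(?P<description>[^.]+))?"      # optional "_protected" style suffix
    r"\.(?P<ext>[^.]+)$"                 # single extension
)


def nonconforming(names):
    """Return the file names that do not follow the assumed convention."""
    return [name for name in names if not NAME_PATTERN.match(name)]
```

Feeding it the directory listing gives a candidate rename list; files with multiple extensions or no extension are reported as nonconforming under this reading.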

Nick