You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by Giuseppe Totaro <to...@gmail.com> on 2015/03/23 05:39:53 UTC

Re: Review Request 32291: ISATab parsers (preliminary version)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/
-----------------------------------------------------------

(Updated March 23, 2015, 4:39 a.m.)


Review request for tika and Chris Mattmann.


Bugs: TIKA-1580
    https://issues.apache.org/jira/browse/TIKA-1580


Repository: tika


Description
-------

ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).


Diffs
-----

  trunk/tika-bundle/pom.xml 1667912 
  trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1667912 
  trunk/tika-parsers/pom.xml 1667912 
  trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1667912 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 

Diff: https://reviews.apache.org/r/32291/diff/


Testing
-------

Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.


Thanks,

Giuseppe Totaro


Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review77557
-----------------------------------------------------------



trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java
<https://reviews.apache.org/r/32291/#comment125670>

    would be good to note here that the Parser only populates metadata per Tim A.'s comments.


- Chris Mattmann


On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
> 
> (Updated March 23, 2015, 5:04 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Bugs: TIKA-1580
>     https://issues.apache.org/jira/browse/TIKA-1580
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).
> 
> 
> Diffs
> -----
> 
>   trunk/tika-bundle/pom.xml 1668683 
>   trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1668683 
>   trunk/tika-parsers/pom.xml 1668683 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1668683 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32291/diff/
> 
> 
> Testing
> -------
> 
> Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.
> 
> 
> Thanks,
> 
> Giuseppe Totaro
> 
>


Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review78158
-----------------------------------------------------------

Ship it!


Ship It!

- Chris Mattmann


On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
> 
> (Updated March 23, 2015, 5:04 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Bugs: TIKA-1580
>     https://issues.apache.org/jira/browse/TIKA-1580
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).
> 
> 
> Diffs
> -----
> 
>   trunk/tika-bundle/pom.xml 1668683 
>   trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1668683 
>   trunk/tika-parsers/pom.xml 1668683 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1668683 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32291/diff/
> 
> 
> Testing
> -------
> 
> Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.
> 
> 
> Thanks,
> 
> Giuseppe Totaro
> 
>


Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review78159
-----------------------------------------------------------

Ship it!


I have ran this in production and it works awesome! Thanks Giuseppe!

- Chris Mattmann


On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
> 
> (Updated March 23, 2015, 5:04 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Bugs: TIKA-1580
>     https://issues.apache.org/jira/browse/TIKA-1580
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).
> 
> 
> Diffs
> -----
> 
>   trunk/tika-bundle/pom.xml 1668683 
>   trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1668683 
>   trunk/tika-parsers/pom.xml 1668683 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1668683 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32291/diff/
> 
> 
> Testing
> -------
> 
> Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.
> 
> 
> Thanks,
> 
> Giuseppe Totaro
> 
>


Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Tyler Palsulich <tp...@gmail.com>.
You can comment on a specific line by clicking on the line number on the
left (and a window will pop up). :)

Tyler

On Mon, Mar 23, 2015 at 3:52 PM, Tim Allison <ta...@apache.org> wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/#review77453
> -----------------------------------------------------------
>
>
> Apologies, I haven't figured out how to reference specific code snippets.
> First time reviewing.  Mea culpa.
>
> For several of these points, I've either violated or have code that is
> currently in violation of them.
>
> Might want to wrap the table headers in thead and the table in tbody, as
> we do in XSSFExcelExtractorDecorator and jdbc.AbstractDBParser.
>
> Do we need the xhtml.newline()?  We don't do that with other tables.
> Should we add that to other tables?
>
> For the tests, unless you are doing something special with the handler or
> context, shorter to use TikaTest's getXML and then run assertContains on
> the xml.
>
> Should probably add Metadata.CONTENT_LENGTH as a metadata item throughout.
>
>
> In ISATabStudyParser and elsewhere, do we want to suck the entire file
> into memory?  Better to use the iterator (on the unchecked! theory) that
> the iterator does just in time parsing?
> List<CSVRecord> records = csvParser.getRecords();
>
> There's still a pretty big TODO in ISATabInvestigationParser btwn
> startDocument() and endDocument()...I might be looking at an old copy, or
> this is the intended behavior.  If it is the intended behavior, note that
> in the javadocs!
>
> In ISATabInvestigationParser, why not use the CSVParser so that you don't
> have to do this?
> metadata.add(values[0], values[i].replaceAll("(^\")|(\"$)",""));
>
> +1 to Chris's recommendation to create short dummy test files that are
> clean with respect to ASL 2.0.
>
> - Tim Allison
>
>
> On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/32291/
> > -----------------------------------------------------------
> >
> > (Updated March 23, 2015, 5:04 p.m.)
> >
> >
> > Review request for tika and Chris Mattmann.
> >
> >
> > Bugs: TIKA-1580
> >     https://issues.apache.org/jira/browse/TIKA-1580
> >
> >
> > Repository: tika
> >
> >
> > Description
> > -------
> >
> > ISATab parsers. This preliminary solution provides three parsers, one
> for each ISA-Tab filetype (Investigation, Study, Assay).
> >
> >
> > Diffs
> > -----
> >
> >   trunk/tika-bundle/pom.xml 1668683
> >
>  trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
> 1668683
> >   trunk/tika-parsers/pom.xml 1668683
> >
>  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
> 1668683
> >
>  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite
> profiling_NMR spectroscopy.txt PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt
> PRE-CREATION
> >
>  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt
> PRE-CREATION
> >
> > Diff: https://reviews.apache.org/r/32291/diff/
> >
> >
> > Testing
> > -------
> >
> > Tested on sample ISA-Tab files downloaded from
> http://www.isa-tools.org/format/examples/.
> >
> >
> > Thanks,
> >
> > Giuseppe Totaro
> >
> >
>
>

Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Tim Allison <ta...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review77453
-----------------------------------------------------------


Apologies, I haven't figured out how to reference specific code snippets.  First time reviewing.  Mea culpa.

For several of these points, I've either violated or have code that is currently in violation of them.

Might want to wrap the table headers in thead and the table in tbody, as we do in XSSFExcelExtractorDecorator and jdbc.AbstractDBParser.

Do we need the xhtml.newline()?  We don't do that with other tables.  Should we add that to other tables?

For the tests, unless you are doing something special with the handler or context, shorter to use TikaTest's getXML and then run assertContains on the xml. 

Should probably add Metadata.CONTENT_LENGTH as a metadata item throughout.


In ISATabStudyParser and elsewhere, do we want to suck the entire file into memory?  Better to use the iterator (on the unchecked! theory) that the iterator does just in time parsing?
List<CSVRecord> records = csvParser.getRecords();

There's still a pretty big TODO in ISATabInvestigationParser btwn startDocument() and endDocument()...I might be looking at an old copy, or this is the intended behavior.  If it is the intended behavior, note that in the javadocs!

In ISATabInvestigationParser, why not use the CSVParser so that you don't have to do this?
metadata.add(values[0], values[i].replaceAll("(^\")|(\"$)",""));

+1 to Chris's recommendation to create short dummy test files that are clean with respect to ASL 2.0.

- Tim Allison


On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
> 
> (Updated March 23, 2015, 5:04 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Bugs: TIKA-1580
>     https://issues.apache.org/jira/browse/TIKA-1580
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).
> 
> 
> Diffs
> -----
> 
>   trunk/tika-bundle/pom.xml 1668683 
>   trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1668683 
>   trunk/tika-parsers/pom.xml 1668683 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java PRE-CREATION 
>   trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1668683 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
>   trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32291/diff/
> 
> 
> Testing
> -------
> 
> Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.
> 
> 
> Thanks,
> 
> Giuseppe Totaro
> 
>


Re: Review Request 32291: ISATab parsers (preliminary version)

Posted by Giuseppe Totaro <to...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/
-----------------------------------------------------------

(Updated March 23, 2015, 5:04 p.m.)


Review request for tika and Chris Mattmann.


Bugs: TIKA-1580
    https://issues.apache.org/jira/browse/TIKA-1580


Repository: tika


Description
-------

ISATab parsers. This preliminary solution provides three parsers, one for each ISA-Tab filetype (Investigation, Study, Assay).


Diffs (updated)
-----

  trunk/tika-bundle/pom.xml 1668683 
  trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1668683 
  trunk/tika-parsers/pom.xml 1668683 
  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java PRE-CREATION 
  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java PRE-CREATION 
  trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java PRE-CREATION 
  trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1668683 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt PRE-CREATION 

Diff: https://reviews.apache.org/r/32291/diff/


Testing
-------

Tested on sample ISA-Tab files downloaded from http://www.isa-tools.org/format/examples/.


Thanks,

Giuseppe Totaro