Posted to dev@tika.apache.org by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/03/23 18:11:11 UTC

[jira] [Comment Edited] (TIKA-1580) ISA-Tab parsers

    [ https://issues.apache.org/jira/browse/TIKA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376210#comment-14376210 ] 

Giuseppe Totaro edited comment on TIKA-1580 at 3/23/15 5:10 PM:
----------------------------------------------------------------

Hi [~chrismattmann], I apologize for that. I forgot to include the parsers.
I have just updated the patch (now including the parsers) at [https://reviews.apache.org/r/32291/]. The patch is also attached to this issue.
Thanks [~tpalsulich] for your review. The new patch should include what you suggested.
[~chrismattmann] and [~tpalsulich], I am going to create my own sample files using the [ISACreator|http://www.isa-tools.org/software-suite/] tool and then add them to the patch.
Thanks a lot for your feedback.


was (Author: gostep):
Hi [~chrismattmann], I apologize for that. I forgot to include the parsers.
I have just updated the patch at [https://reviews.apache.org/r/32291/]. The patch is also attached to this issue.
Thanks [~tpalsulich] for your review. The new patch should include what you suggested.
[~chrismattmann] and [~tpalsulich], I am going to create my own sample files using the [ISACreator|http://www.isa-tools.org/software-suite/] tool and then add them to the patch.
Thanks a lot for your feedback.

> ISA-Tab parsers
> ---------------
>
>                 Key: TIKA-1580
>                 URL: https://issues.apache.org/jira/browse/TIKA-1580
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: new-parser
>             Fix For: 1.8
>
>         Attachments: TIKA-1580.patch, TIKA-1580.v02.patch
>
>
> We are going to add parsers for ISA-Tab data formats.
> ISA-Tab files are related to [ISA Tools|http://www.isa-tools.org/], which help to manage an increasingly diverse set of life science, environmental, and biomedical experiments that employ one or a combination of technologies.
> The ISA tools are built upon the _Investigation_, _Study_, and _Assay_ tabular format. Therefore, the ISA-Tab data format includes three types of file: Investigation file ({{i_xxxx.txt}}), Study file ({{s_xxxx.txt}}), and Assay file ({{a_xxxx.txt}}). These files are organized in a [top-down hierarchy|http://www.isa-tools.org/format/specification/]: an Investigation file includes one or more Study files, and each Study file includes one or more Assay files.
> Essentially, the Investigation file contains high-level information about the related studies, so it provides only metadata about the ISA-Tab files.
> More details on the file format specification are [available online|http://isatab.sourceforge.net/docs/ISA-TAB_release-candidate-1_v1.0_24nov08.pdf].
> The attached patch provides a preliminary version of the ISA-Tab parsers (there are three parsers, one for each ISA-Tab file type):
> * {{ISATabInvestigationParser.java}}: parses Investigation files. It extracts only metadata.
> * {{ISATabStudyParser.java}}: parses Study files.
> * {{ISATabAssayParser.java}}: parses Assay files.
> The most important improvements still to be made are:
> * Combine these three parsers in order to parse an ISArchive.
> * Provide a better mapping of both study and assay data to XHTML. Currently, {{ISATabStudyParser}} and {{ISATabAssayParser}} provide a naive mapping function relying on [Apache Commons CSV|https://commons.apache.org/proper/commons-csv/].
> Thanks for supporting me on this work [~chrismattmann]. 
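
As a rough illustration of the file naming convention in the description, the sketch below guesses the ISA-Tab file type from a file name. It assumes the prefixes {{i_}}, {{s_}}, and {{a_}} for Investigation, Study, and Assay files and a {{.txt}} extension. The enum {{IsaTabFileType}} and its method {{fromFileName}} are hypothetical names chosen for this example; they are not taken from the attached patch, and the actual parsers may detect file types differently.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of the TIKA-1580 patch): guesses the
 * ISA-Tab file type from the file name prefix.
 */
enum IsaTabFileType {
    INVESTIGATION("i_"), // assumed prefix for Investigation files
    STUDY("s_"),         // assumed prefix for Study files
    ASSAY("a_");         // assumed prefix for Assay files

    private final String prefix;

    IsaTabFileType(String prefix) {
        this.prefix = prefix;
    }

    /**
     * Returns the type whose prefix matches the given file name, or an
     * empty Optional if the name does not follow the assumed convention.
     */
    static Optional<IsaTabFileType> fromFileName(String fileName) {
        if (fileName == null || !fileName.endsWith(".txt")) {
            return Optional.empty();
        }
        for (IsaTabFileType type : values()) {
            if (fileName.startsWith(type.prefix)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

For example, {{IsaTabFileType.fromFileName("s_study.txt")}} returns an Optional holding {{STUDY}}, while a name such as {{readme.txt}} yields an empty Optional. A combined parser for an ISArchive could use a check like this to dispatch each entry to the matching parser, though that design is only a suggestion.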


