Posted to dev@calcite.apache.org by "Edmon Begoli (JIRA)" <ji...@apache.org> on 2017/10/31 12:08:00 UTC

[jira] [Created] (CALCITE-2025) Create adapter(s) for standard bioinformatics database files

Edmon Begoli created CALCITE-2025:
-------------------------------------

             Summary: Create adapter(s) for standard bioinformatics database files
                 Key: CALCITE-2025
                 URL: https://issues.apache.org/jira/browse/CALCITE-2025
             Project: Calcite
          Issue Type: New Feature
            Reporter: Edmon Begoli
            Assignee: Edmon Begoli
            Priority: Minor


Common bioinformatics file formats, used mostly in genomic medicine and life sciences research, are VCF, SAM, and FASTQ/FASTA [1,2,3,4].

They are structured text files with metadata headers, and queries against them are (generally) column oriented.
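
To illustrate the "metadata header plus columns" shape, here is a minimal sketch for VCF only. It assumes the usual VCF layout: meta-information lines start with "##", a single header line starting with "#CHROM" names the columns, and data records are tab-delimited. The class name VcfSketch and its fields are hypothetical and are not part of Calcite; a real adapter would presumably map the column names onto a table's fields and would need proper type handling. The sample record is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Calcite API): splits VCF text into header and rows. */
public class VcfSketch {
  /** Meta-information lines, i.e. those starting with "##". */
  public final List<String> metaLines = new ArrayList<>();
  /** Column names taken from the "#CHROM ..." header line. */
  public final List<String> columns = new ArrayList<>();
  /** Data records, one String[] per line, split on tabs. */
  public final List<String[]> rows = new ArrayList<>();

  public static VcfSketch parse(List<String> lines) {
    VcfSketch vcf = new VcfSketch();
    for (String line : lines) {
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      if (line.startsWith("##")) {
        vcf.metaLines.add(line);
      } else if (line.startsWith("#")) {
        // Header line: drop the leading '#' and split into column names.
        vcf.columns.addAll(Arrays.asList(line.substring(1).split("\t")));
      } else {
        // Data record: the -1 limit keeps trailing empty fields.
        vcf.rows.add(line.split("\t", -1));
      }
    }
    return vcf;
  }

  public static void main(String[] args) {
    // Illustrative sample only, not real data.
    List<String> sample = Arrays.asList(
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "20\t14370\t.\tG\tA\t29\tPASS\tDP=14");
    VcfSketch vcf = parse(sample);
    System.out.println(vcf.columns);
    System.out.println(Arrays.toString(vcf.rows.get(0)));
  }
}
```

SAM has a comparable split (header lines beginning with "@" followed by tab-delimited alignment records), whereas FASTQ/FASTA records span several lines and would need a different reader.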

Having Calcite support for these formats would enable it to serve as the front end for processing a very large body of important data, and would facilitate the integration of these datasets into downstream frameworks that incorporate or use Calcite.

This issue will serve as the parent issue for the individual issues covering each format to be implemented (SAM, VCF, etc.).

1. SAM file format, https://en.wikipedia.org/wiki/SAM_(file_format)
2. VCF file format, https://en.wikipedia.org/wiki/Variant_Call_Format
3. FASTQ file format, https://en.wikipedia.org/wiki/FASTQ_format
4. Other sequence file formats, https://bioinf.comav.upv.es/courses/sequence_analysis/sequence_file_formats.html
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)