Posted to issues@calcite.apache.org by "Lai Zhou (JIRA)" <ji...@apache.org> on 2019/05/09 07:32:00 UTC

[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow

    [ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836162#comment-16836162 ] 

Lai Zhou commented on CALCITE-2040:
-----------------------------------

I think having Arrow as a calling convention could improve performance a lot.

[~julianhyde], do you mean that we need to introduce a new kind of implementation, analogous to the Enumerable implementations, for Filter, Project, Aggregate and TableScan?

I'm just getting familiar with Arrow. I will try to make Arrow a calling convention.
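To make the question concrete: Calcite's Enumerable operators hand rows one at a time to the next operator through an Enumerator, whereas an Arrow calling convention would presumably pass whole columnar batches between Filter, Project and Aggregate. The plain-Java sketch below illustrates that assumption only. ColumnBatch, filterGreaterThan, project and sum are hypothetical names, not Calcite or Arrow APIs, and a real implementation would work on Arrow vectors (for example through a VectorSchemaRoot) rather than long arrays. It evaluates the equivalent of SELECT SUM(amount) FROM t WHERE amount > 20.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of batch-at-a-time operators over a columnar batch.
 * ColumnBatch is NOT Arrow's API; it only stands in for Arrow vectors.
 */
public class ColumnarOperators {

    /** A minimal columnar batch: each column is a primitive array of equal length. */
    static final class ColumnBatch {
        final long[][] columns;
        final int rowCount;

        ColumnBatch(long[][] columns, int rowCount) {
            this.columns = columns;
            this.rowCount = rowCount;
        }
    }

    /** Filter: return a selection vector of row indices where column[col] > threshold. */
    static int[] filterGreaterThan(ColumnBatch batch, int col, long threshold) {
        long[] values = batch.columns[col];
        int[] selection = new int[batch.rowCount];
        int n = 0;
        for (int i = 0; i < batch.rowCount; i++) {
            if (values[i] > threshold) {
                selection[n++] = i;
            }
        }
        return Arrays.copyOf(selection, n);
    }

    /** Project: gather only the requested columns for the selected rows into a new batch. */
    static ColumnBatch project(ColumnBatch batch, int[] selection, int... cols) {
        long[][] out = new long[cols.length][selection.length];
        for (int c = 0; c < cols.length; c++) {
            long[] src = batch.columns[cols[c]];
            for (int i = 0; i < selection.length; i++) {
                out[c][i] = src[selection[i]];
            }
        }
        return new ColumnBatch(out, selection.length);
    }

    /** Aggregate: SUM over one column, a tight loop over contiguous values. */
    static long sum(ColumnBatch batch, int col) {
        long total = 0;
        long[] values = batch.columns[col];
        for (int i = 0; i < batch.rowCount; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // "TableScan" result with two columns: id and amount.
        ColumnBatch scan = new ColumnBatch(new long[][] {
            {1, 2, 3, 4},
            {10, 50, 5, 70}
        }, 4);
        int[] sel = filterGreaterThan(scan, 1, 20);    // amount > 20 selects rows 1 and 3
        ColumnBatch projected = project(scan, sel, 1); // keep only the amount column
        System.out.println(sum(projected, 0));         // prints 120
    }
}
```

The filter emits a selection vector instead of copying rows, so the projection gathers only the columns it actually needs. This is one common design for vectorized engines; it is not necessarily what a Calcite Arrow convention would choose.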

> Create adapter for Apache Arrow
> -------------------------------
>
>                 Key: CALCITE-2040
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2040
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would allow people to execute SQL statements, via JDBC or ODBC, on data stored in Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, say, CSV files using the file adapter: an Arrow data set does not have a URL. (Unless we use Arrow's [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/] format, or use an in-memory file system such as Alluxio.) So we would need to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it would be good to have Arrow as a calling convention, that is, implementations of relational operators such as Filter, Project and Aggregate in addition to just TableScan.
> Lastly, once we have an Arrow convention, if we build adapters for file formats (for instance the bioinformatics formats SAM, VCF and FASTQ discussed in CALCITE-2025), it would make a lot of sense to translate those formats directly into Arrow (applying simple projects and filters first, if applicable). Those adapters would belong better in a "contrib" module of the Arrow project than in Calcite.
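The last point, translating a file format directly into a columnar form while applying simple filters and projects first, can be pictured with the hypothetical sketch below. RowsToColumns and toColumns are invented names, the records are made-up numeric fields rather than real SAM/VCF/FASTQ parsing, and plain long arrays stand in for Arrow vectors (a real adapter would presumably write into something like Arrow's BigIntVector).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: convert row-oriented records (as a file adapter might
 * parse them) into columns, filtering and projecting first so that only the
 * needed values are materialized. Not an Arrow or Calcite API.
 */
public class RowsToColumns {

    /** Build one column per projected field, keeping only rows whose filterField >= minValue. */
    static long[][] toColumns(List<long[]> rows, int filterField, long minValue, int... projectedFields) {
        List<long[]> kept = new ArrayList<>();
        for (long[] row : rows) {
            if (row[filterField] >= minValue) { // filter first: discarded rows are never copied
                kept.add(row);
            }
        }
        long[][] columns = new long[projectedFields.length][kept.size()];
        for (int r = 0; r < kept.size(); r++) {
            long[] row = kept.get(r);
            for (int c = 0; c < projectedFields.length; c++) {
                columns[c][r] = row[projectedFields[c]]; // project: copy only requested fields
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Hypothetical parsed records with three numeric fields each.
        List<long[]> rows = List.of(
            new long[] {100, 30, 12},
            new long[] {200, 10, 40},
            new long[] {300, 45, 7});
        // Keep rows whose field 1 >= 20, and materialize only fields 0 and 2.
        long[][] cols = toColumns(rows, 1, 20, 0, 2);
        System.out.println(Arrays.toString(cols[0]) + " " + Arrays.toString(cols[1])); // prints [100, 300] [12, 7]
    }
}
```

Filtering before allocating the columns means storage is sized for the surviving rows only, which is the benefit of pushing simple filters and projects into the conversion step.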



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)