Posted to dev@phoenix.apache.org by "Chaitanya (JIRA)" <ji...@apache.org> on 2017/05/25 12:59:04 UTC

[jira] [Commented] (PHOENIX-3887) Bulk export of large query result set

    [ https://issues.apache.org/jira/browse/PHOENIX-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024702#comment-16024702 ] 

Chaitanya commented on PHOENIX-3887:
------------------------------------

This looks like it could be supported by an MR job very similar to org.apache.phoenix.mapreduce.CsvBulkLoadTool. I am ready to contribute this feature. Any help regarding the structure of the code or the contribution guidelines would be appreciated.

> Bulk export of large query result set
> -------------------------------------
>
>                 Key: PHOENIX-3887
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3887
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: Chaitanya
>            Priority: Minor
>              Labels: beginner
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Query results with a large number of rows cannot be consumed by a single JDBC connection.
> To export such results as a CSV file, either to a local filesystem or to HDFS, we can connect to Phoenix from Spark or Hive, but there could also be a dedicated tool (or MR job) implemented along the lines of CsvBulkLoadTool or ./psql.py. This is a very common use case for large result sets.
> Similar functionality exists in Postgres via the COPY command and in Redshift via the UNLOAD command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)