Posted to dev@phoenix.apache.org by "Kalyan (JIRA)" <ji...@apache.org> on 2016/08/16 16:19:20 UTC

[jira] [Commented] (PHOENIX-2938) HFile support for SparkSQL DataFrame saves

    [ https://issues.apache.org/jira/browse/PHOENIX-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422971#comment-15422971 ] 

Kalyan commented on PHOENIX-2938:
---------------------------------

Converting a SparkSQL DataFrame into HFiles.

I have added the existing base code to GitHub:

https://github.com/kalyanhadooptraining/phoenix/commit/ce5869e3ae9036a72e123ff2e319ba0a1b59e922

TODO:
1. Clean up the code.
2. Update the comments.
3. Add unit tests.
4. Do a final code review.


Any suggestions are welcome.


> HFile support for SparkSQL DataFrame saves
> ------------------------------------------
>
>                 Key: PHOENIX-2938
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2938
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Chris Tarnas
>            Assignee: Kalyan
>            Priority: Minor
>
> Currently, saving a DataFrame in Spark persists it as upserts. An option to save natively via HFiles, as the MapReduce loader does, would be a great performance improvement for large bulk loads. The current workaround to reduce the load on the RegionServers is to save to CSV from Spark and then load the files via the MapReduce loader.
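HFile-based saves avoid the RegionServer write path because the client side does the work that upserts leave to the servers. Rows are split into cells, cells are grouped by the region that will own them, and each group is sorted in HBase key order so it can be written out as a valid HFile. The sketch below shows only that grouping and sorting step, under simplifying assumptions. It is written in Python and is not Phoenix, Spark, or HBase code, nor the code in the linked commit. The helper names `rows_to_cells` and `partition_for_hfiles`, the plain UTF-8 string encoding, and the single column family are all hypothetical. Real Phoenix row keys and column values use Phoenix's own type encodings.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

# A simplified cell: (row key, column family, column qualifier, value).
# Hypothetical model; real HBase cells also carry a timestamp and a type.
Cell = Tuple[bytes, bytes, bytes, bytes]


def rows_to_cells(rows: Iterable[Dict[str, object]], row_key_col: str,
                  family: bytes) -> List[Cell]:
    """Flatten dict rows into one cell per non-key, non-null column.

    Assumes a single column family and naive str -> UTF-8 encoding.
    """
    cells: List[Cell] = []
    for row in rows:
        key = str(row[row_key_col]).encode()
        for col, val in row.items():
            if col == row_key_col or val is None:
                continue
            cells.append((key, family, col.encode(), str(val).encode()))
    return cells


def partition_for_hfiles(cells: List[Cell],
                         region_start_keys: List[bytes]) -> Dict[int, List[Cell]]:
    """Group cells by target region, then sort each group in HFile order.

    region_start_keys must be sorted ascending, and the first entry should be
    b"" so that every row key falls into some region. Region i covers
    [region_start_keys[i], region_start_keys[i + 1]).
    """
    groups: Dict[int, List[Cell]] = {}
    for cell in cells:
        # Index of the last start key that is <= this row key.
        region = bisect_right(region_start_keys, cell[0]) - 1
        groups.setdefault(region, []).append(cell)
    for group in groups.values():
        # Python compares bytes lexicographically by unsigned byte value,
        # which matches HBase's row-key ordering.
        group.sort(key=lambda c: (c[0], c[1], c[2]))
    return groups


if __name__ == "__main__":
    rows = [
        {"ID": "b", "NAME": "bob"},
        {"ID": "a", "NAME": "ann", "AGE": 30},
        {"ID": "m", "NAME": "max"},
    ]
    # Two hypothetical regions: ["", "k") and ["k", end).
    parts = partition_for_hfiles(rows_to_cells(rows, "ID", b"0"), [b"", b"k"])
    for region, group in sorted(parts.items()):
        print(region, group)
```

In Spark itself, the same idea would more likely be expressed as `repartitionAndSortWithinPartitions` with a partitioner built from the table's region start keys. This mirrors the HBase MapReduce path, where `HFileOutputFormat2.configureIncrementalLoad` sets up a `TotalOrderPartitioner` over the region boundaries. The resulting files are then handed to the RegionServers by the bulk-load tool instead of going through the WAL and memstore.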



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)