Posted to dev@phoenix.apache.org by "Josh Mahonin (JIRA)" <ji...@apache.org> on 2016/01/28 01:27:40 UTC

[jira] [Comment Edited] (PHOENIX-2632) Easier Hive->Phoenix data movement

    [ https://issues.apache.org/jira/browse/PHOENIX-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120500#comment-15120500 ] 

Josh Mahonin edited comment on PHOENIX-2632 at 1/28/16 12:27 AM:
-----------------------------------------------------------------

This looks pretty neat [~rgelhau]

I bet there's a way to take your 'CREATE TABLE IF NOT EXISTS' functionality and wrap it into the existing Spark DataFrame code, which could be made to use the SaveMode.Ignore option [1]. Right now it only supports SaveMode.Overwrite, which assumes the table already exists.

Once that's in, I think the Hive->Phoenix functionality becomes a documentation exercise: show how to set up the Hive table as a DataFrame, then invoke df.save("org.apache.phoenix.spark"...) on it.

[1] http://spark.apache.org/docs/latest/sql-programming-guide.html
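The DataFrame flow described above might look roughly like the following sketch (Spark 1.x era, matching this thread). The database, table, and ZooKeeper names are illustrative placeholders, not values from the issue:

```scala
// Sketch: load a Hive table as a DataFrame, then save it to Phoenix
// via the phoenix-spark data source. Assumes an existing SparkContext
// (sc) with Hive support and a running Phoenix/HBase cluster.
import org.apache.spark.sql.{SQLContext, SaveMode}

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Pull the source data out of Hive (hypothetical table name)
val df = sqlContext.sql("SELECT * FROM hive_db.events")

// Write it to a Phoenix table; "table" and "zkUrl" are the options
// the phoenix-spark integration expects
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "EVENTS")
  .option("zkUrl", "zkhost:2181")
  .save()
```

With SaveMode.Ignore support added as suggested, the .mode(...) line is the only part that would change.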







> Easier Hive->Phoenix data movement
> ----------------------------------
>
>                 Key: PHOENIX-2632
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2632
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Randy Gelhausen
>
> Moving tables or query results from Hive into Phoenix today requires error-prone manual schema re-definition inside HBase storage handler properties. 
> Since Hive and Phoenix support near-equivalent types, it should be easier for users to pick a Hive table and load it (or derived query results) into Phoenix.
> I'm posting this to open a design discussion, but also to submit my own project https://github.com/randerzander/HiveToPhoenix for consideration as an early solution. It creates a Spark DataFrame from a Hive query, uses Phoenix JDBC to "create if not exists" a Phoenix equivalent table, and uses the phoenix-spark artifact to store the DataFrame into Phoenix.
> I'm eager for feedback on whether this is interesting/useful to the Phoenix community.
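The "create if not exists" step the description mentions can be sketched as a plain Phoenix JDBC call; the DDL, table name, and connection string here are illustrative, not taken from the HiveToPhoenix project:

```scala
// Sketch: idempotent table creation through the Phoenix JDBC driver.
// Phoenix's CREATE TABLE IF NOT EXISTS makes the call safe to repeat.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")
try {
  val stmt = conn.createStatement()
  // Hypothetical schema mirroring the Hive source table's columns
  stmt.executeUpdate(
    "CREATE TABLE IF NOT EXISTS EVENTS (" +
      "ID BIGINT NOT NULL PRIMARY KEY, " +
      "NAME VARCHAR)")
  stmt.close()
} finally {
  conn.close()
}
```

In the flow the project describes, this runs before the phoenix-spark save so the target table is guaranteed to exist.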



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)