Posted to dev@phoenix.apache.org by xiaopeng-liao <gi...@git.apache.org> on 2016/08/25 12:20:50 UTC
[GitHub] phoenix pull request #196: [PHOENIX-2648] Add dynamic column support for spa...
GitHub user xiaopeng-liao opened a pull request:
https://github.com/apache/phoenix/pull/196
[PHOENIX-2648] Add dynamic column support for spark integration
It supports both RDD and DataFrame read/write.
Things needing consideration
======
When loading from a DataFrame, the Catalyst data types need to be converted to Phoenix types, e.g.
StringType to VARCHAR, Array<Integer> to INTEGER_ARRAY, etc. The code is under phoenix-spark/src/main/scala/org/apache/phoenix/spark/DataFrameFunctions.scala
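As a rough illustration of the conversion described above, the mapping can be sketched as a lookup from Catalyst type names to Phoenix type names. This is a simplified, hypothetical sketch (the real logic in DataFrameFunctions.scala operates on org.apache.spark.sql.types.DataType values, not strings, and covers more types):

```scala
// Illustrative sketch only: Catalyst type name -> Phoenix type name.
// The actual implementation pattern-matches on Spark DataType instances.
val catalystToPhoenix: Map[String, String] = Map(
  "StringType"             -> "VARCHAR",
  "LongType"               -> "BIGINT",
  "IntegerType"            -> "INTEGER",
  "ArrayType(IntegerType)" -> "INTEGER_ARRAY",
  "ArrayType(StringType)"  -> "VARCHAR_ARRAY"
)

// e.g. a StringType column maps to a VARCHAR dynamic column
val phoenixType = catalystToPhoenix("StringType")
```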
Usage
=======
- **RDD**
**Save**
```
val dataSet = List((1L, "1", 1, 1), (2L, "2", 2, 2), (3L, "3", 3, 3))
sc.parallelize(dataSet)
  .saveToPhoenix(
    "OUTPUT_TEST_TABLE",
    Seq("ID", "COL1", "COL2", "COL4<INTEGER"),
    hbaseConfiguration
  )
```
**Read**
```
val columnNames = Seq("ID", "COL1", "COL2", "COL5<INTEGER")
// Load the results back
val loaded = sc.phoenixTableAsRDD(
  "OUTPUT_TEST_TABLE", columnNames,
  conf = hbaseConfiguration
)
```
- **DataFrame**
**Save**
It infers the data types from the DataFrame and converts them to Phoenix-supported types
```
val dataSet = List((1L, "1", 1, 1, "2"), (2L, "2", 2, 2, "3"), (3L, "3", 3, 3, "4"))
sc.parallelize(dataSet)
  .toDF("ID", "COL1", "COL2", "COL6", "COL7")
  .saveToPhoenix("OUTPUT_TEST_TABLE", zkUrl = Some(quorumAddress))
```
**Read**
```
val df1 = sqlContext.phoenixTableAsDataFrame(
  "OUTPUT_TEST_TABLE",
  Array("ID", "COL1", "COL6<INTEGER", "COL7<VARCHAR"),
  conf = hbaseConfiguration
)
```
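The `COL<DataType` syntax used throughout the examples above (chosen over `COL:DataType`, which conflicted with index syntax, per the last commit) can be illustrated with a small hypothetical helper that splits a column spec into its name and optional type. This is a sketch for illustration only, not the parsing code from this PR:

```scala
// Hypothetical helper: split "COL6<INTEGER" into ("COL6", Some("INTEGER")).
// A plain column name like "ID" has no dynamic type, so it yields ("ID", None).
def parseDynamicColumn(spec: String): (String, Option[String]) =
  spec.split("<", 2) match {
    case Array(name, dtype) => (name, Some(dtype))
    case Array(name)        => (name, None)
  }

val dynamic = parseDynamicColumn("COL6<INTEGER")
val static  = parseDynamicColumn("ID")
```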
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xiaopeng-liao/phoenix phoenix-addsparkdynamic
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/phoenix/pull/196.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #196
----
commit a2dc6101d96333f781ff9e905c47c035f8b89462
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-17T12:13:58Z
add dynamic column support for SPARK rdd
commit 6969287db5ea341bc3876af55f7d0ef3acb035c2
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-18T09:46:38Z
add dynamic column support for reading from PhoenixRDD.
commit 5688b6c90c66b02cc22fcac6e67b9712d7eb660e
Author: xiaopeng-liao <xp...@gmail.com>
Date: 2016-08-19T14:52:27Z
Merge pull request #1 from apache/master
merge in latest changes from phoenix
commit a9b217e55393f613e9ca168faccd93e7626c7324
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:51:34Z
[PHOENIX-2648] add support for dynamic columns for RDD and Dataframe
commit 51190865375397581cbd1d6b960c79be7d727b97
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:52:27Z
Merge branch 'phoenix-addsparkdynamic' of https://github.com/xiaopeng-liao/phoenix into phoenix-addsparkdynamic
commit 6cbd6314782a6eb1a4c69eae25371791e4d64f90
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T13:00:55Z
Remove the configuration for enable dynamic column as it is not used anyway
commit 8602554c875229f376499c082894cc33999f3e7b
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T15:01:29Z
More clean up, remove the configuration for dynamic column
commit d3a4f1575f4b376df32f6d28aeba14270ce58088
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-25T08:44:47Z
[PHOENIX-2648] change dynamic column format from COL:DataType to COL<DataType becaues it conflict with index syntax
----
---