Posted to dev@hbase.apache.org by "Balazs Meszaros (JIRA)" <ji...@apache.org> on 2019/07/22 14:33:00 UTC
[jira] [Resolved] (HBASE-22711) Spark connector doesn't use the given mapping when inserting data
[ https://issues.apache.org/jira/browse/HBASE-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balazs Meszaros resolved HBASE-22711.
-------------------------------------
Resolution: Fixed
Fix Version/s: connector-1.0.1
> Spark connector doesn't use the given mapping when inserting data
> -----------------------------------------------------------------
>
> Key: HBASE-22711
> URL: https://issues.apache.org/jira/browse/HBASE-22711
> Project: HBase
> Issue Type: Bug
> Components: hbase-connectors
> Affects Versions: connector-1.0.0
> Reporter: Balazs Meszaros
> Assignee: Balazs Meszaros
> Priority: Major
> Fix For: connector-1.0.1
>
>
> In some cases a Spark DataFrame cannot be read back with the same mapping it was written with. For example:
> {code:scala}
> import spark.implicits._  // needed for Seq(...).toDS
>
> val sql = spark.sqlContext
> val persons =
>   """[
>     |{"name": "alice", "age": 20, "height": 5, "email": "alice@alice.com"},
>     |{"name": "bob", "age": 23, "height": 6, "email": "bob@bob.com"},
>     |{"name": "carol", "age": 12, "email": "carol@carol.com", "height": 4.11}
>     |]
>   """.stripMargin
> val df = spark.read.json(Seq(persons).toDS)
> df.write
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .save()
> {code}
> It cannot be read back with the same mapping:
> {code:scala}
> val df2 = sql.read
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .load()
> df2.createOrReplaceTempView("tableView")
> val results = sql.sql("SELECT * FROM tableView")
> results.show()
> {code}
> The results:
> {noformat}
> +---+-----+---------+---------------+
> |age| name| height| email|
> +---+-----+---------+---------------+
> | 0|alice| 2.3125|alice@alice.com|
> | 0| bob| 2.375| bob@bob.com|
> | 0|carol|2.2568748|carol@carol.com|
> +---+-----+---------+---------------+
> {noformat}
> Spark stores integer values as longs and floating-point values as doubles, so a column mapped as SHORT is written as 8 bytes, and a column mapped as FLOAT is also written as 8 bytes in HBase:
> {noformat}
> shell> scan 'person'
> alice column=p:age, timestamp=1563450714829, value=\x00\x00\x00\x00\x00\x00\x00\x14
> alice column=p:height, timestamp=1563450714829, value=@\x14\x00\x00\x00\x00\x00\x00
> {noformat}
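The corrupted values above can be reproduced outside Spark and HBase. A minimal sketch (not from the connector's code, assuming the big-endian encoding that HBase's `Bytes` utility uses): write a value as an 8-byte long or double, then read back only the leading 2 or 4 bytes as a short or float, which is what the SHORT/FLOAT read mapping effectively does to the stored 8-byte cells:

```python
import struct

def misread_double_as_float(x: float) -> float:
    """Encode x as an 8-byte big-endian double, then decode only the
    first 4 bytes as a big-endian float (the FLOAT-mapping misread)."""
    return struct.unpack(">f", struct.pack(">d", x)[:4])[0]

def misread_long_as_short(n: int) -> int:
    """Encode n as an 8-byte big-endian long, then decode only the
    first 2 bytes as a big-endian short (the SHORT-mapping misread)."""
    return struct.unpack(">h", struct.pack(">q", n)[:2])[0]

print(misread_long_as_short(20))      # age 20 -> 0 (long 20 starts with \x00\x00)
print(misread_double_as_float(5.0))   # height 5.0 -> 2.3125
print(misread_double_as_float(6.0))   # height 6.0 -> 2.375
print(misread_double_as_float(4.11))  # height 4.11 -> ~2.2568748
```

This matches the cells in the scan above: \x00\x00\x00\x00\x00\x00\x00\x14 is 20 as an 8-byte long (whose first two bytes decode to 0 as a short), and @\x14\x00\x00\x00\x00\x00\x00 is 5.0 as an 8-byte double (whose first four bytes decode to 2.3125 as a float).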
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)