Posted to issues@hbase.apache.org by "Balazs Meszaros (JIRA)" <ji...@apache.org> on 2019/07/18 12:07:00 UTC
[jira] [Created] (HBASE-22711) Spark connector doesn't use the given mapping when inserting data
Balazs Meszaros created HBASE-22711:
---------------------------------------
Summary: Spark connector doesn't use the given mapping when inserting data
Key: HBASE-22711
URL: https://issues.apache.org/jira/browse/HBASE-22711
Project: HBase
Issue Type: Bug
Components: hbase-connectors
Affects Versions: connector-1.0.0
Reporter: Balazs Meszaros
Assignee: Balazs Meszaros
In some cases a Spark DataFrame cannot be read back with the same mapping it was written with. For example:
{code:scala}
val sql = spark.sqlContext
val persons =
  """[
    |{"name": "alice", "age": 20, "height": 5, "email": "alice@alice.com"},
    |{"name": "bob", "age": 23, "height": 6, "email": "bob@bob.com"},
    |{"name": "carol", "age": 12, "email": "carol@carol.com", "height": 4.11}
    |]
  """.stripMargin
val df = spark.read.json(Seq(persons).toDS)
df.write
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
  .option("hbase.table", "person")
  .option("hbase.spark.use.hbasecontext", false)
  .save()
{code}
It cannot be read back with the same mapping:
{code:scala}
val df2 = sql.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
  .option("hbase.table", "person")
  .option("hbase.spark.use.hbasecontext", false)
  .load()
df2.createOrReplaceTempView("tableView")
val results = sql.sql("SELECT * FROM tableView")
results.show()
{code}
The results:
{noformat}
+---+-----+---------+---------------+
|age| name|   height|          email|
+---+-----+---------+---------------+
|  0|alice|   2.3125|alice@alice.com|
|  0|  bob|    2.375|    bob@bob.com|
|  0|carol|2.2568748|carol@carol.com|
+---+-----+---------+---------------+
{noformat}
Spark stores integer values as {{Long}} and floating-point values as {{Double}}, so both the SHORT and FLOAT columns end up as 8-byte values in HBase:
{noformat}
shell> scan 'person'
alice column=p:age, timestamp=1563450714829, value=\x00\x00\x00\x00\x00\x00\x00\x14
alice column=p:height, timestamp=1563450714829, value=@\x14\x00\x00\x00\x00\x00\x00
{noformat}
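The misread values above can be reproduced with plain {{java.nio}}, without HBase at all: decoding only the leading bytes of the 8-byte big-endian encodings that Spark wrote yields exactly the numbers in the DataFrame (the class name below is just for illustration):

{code:java}
import java.nio.ByteBuffer;

public class MappingMismatch {
    public static void main(String[] args) {
        // Spark wrote age 20 as a Long: 00 00 00 00 00 00 00 14.
        byte[] ageBytes = ByteBuffer.allocate(8).putLong(20L).array();
        // The SHORT mapping decodes only the first 2 bytes (00 00) -> 0.
        short age = ByteBuffer.wrap(ageBytes, 0, 2).getShort();

        // Spark wrote height 5.0 as a Double: 40 14 00 00 00 00 00 00.
        byte[] heightBytes = ByteBuffer.allocate(8).putDouble(5.0).array();
        // The FLOAT mapping decodes only the first 4 bytes (40 14 00 00) -> 2.3125.
        float height = ByteBuffer.wrap(heightBytes, 0, 4).getFloat();

        System.out.println("age=" + age + " height=" + height);  // age=0 height=2.3125
    }
}
{code}

This matches the scanned cells: the first two bytes of every stored Long are zero, and the 4-byte prefix of the Double encoding of 5.0 happens to decode as the float 2.3125.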