Posted to issues@hbase.apache.org by "Chetan Khatri (JIRA)" <ji...@apache.org> on 2017/01/26 15:39:24 UTC
[jira] [Created] (HBASE-17547) HBase-Spark Module : TableCatelog
doesn't supports multiple columns from multiple Column families
Chetan Khatri created HBASE-17547:
-------------------------------------
Summary: HBase-Spark Module : TableCatelog doesn't supports multiple columns from multiple Column families
Key: HBASE-17547
URL: https://issues.apache.org/jira/browse/HBASE-17547
Project: HBase
Issue Type: Bug
Components: hbase, spark
Affects Versions: 1.1.8
Reporter: Chetan Khatri
Fix For: 1.1.8
Issue: HBase-Spark Module : TableCatelog doesn't supports multiple columns from multiple Column families
Description:
The DataSource API in the HBase-Spark module fails when a catalog maps more than one column to the same column family.
If the catalog lists multiple columns from one or more column families, saving a DataFrame throws an exception. For example:
def empcatalog = s"""{
|"table":{"namespace":"empschema", "name":"emp"},
|"rowkey":"key",
|"columns":{
|"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
|"city":{"cf":"pdata", "col":"city", "type":"string"},
|"empName":{"cf":"pdata", "col":"name", "type":"string"},
|"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
|"salary":{"cf":"pdata", "col":"salary", "type":"string"}
|}
|}""".stripMargin
Here, city, name, designation, and salary all belong to the pdata column family.
The following exception is thrown while saving the DataFrame to HBase:
java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
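The getColumnFamilies method in HBaseTableCatalog.scala returns one entry per column, so a family shared by several columns appears more than once. createTable then calls addFamily for each entry, and the second call for 'pdata' fails. getColumnFamilies should return each column family only once.
A unit test for this case has been written against HBaseTableCatalog.scala, in the writeCatalog object definition.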
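A minimal sketch of the de-duplication described above, for illustration only. CatalogField, ColumnFamilySketch, and distinctFamilies are hypothetical names and are not the module's real classes; the sketch also assumes that the "rowkey" marker from the catalog above is not a real column family and should be skipped.

```scala
// Hypothetical stand-in for one catalog column entry (not the module's real class).
final case class CatalogField(name: String, cf: String, col: String)

object ColumnFamilySketch {
  // Assumption: "rowkey" marks the row key column, not a real column family.
  val RowKeyFamily = "rowkey"

  // Collect the family of every column, drop the row key marker,
  // and keep each family once, in first-seen order.
  def distinctFamilies(fields: Seq[CatalogField]): Seq[String] =
    fields.map(_.cf).filter(_ != RowKeyFamily).distinct

  // The columns from the empcatalog example in this report.
  val empFields: Seq[CatalogField] = Seq(
    CatalogField("empNumber", "rowkey", "key"),
    CatalogField("city", "pdata", "city"),
    CatalogField("empName", "pdata", "name"),
    CatalogField("jobDesignation", "pdata", "designation"),
    CatalogField("salary", "pdata", "salary")
  )
}
```

With the list de-duplicated, a table-creation loop would add 'pdata' once instead of four times, which avoids the "already exists" check in addFamily.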
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)