Posted to user@hbase.apache.org by 罗辉 <lu...@ifeng.com> on 2016/07/04 02:36:50 UTC

Reply: [Marketing Mail] Re: question about SparkSQL loading hbase tables

Hi Ted, yes, I checked HBase's GitHub and the doc, but there doesn't seem to be an available dependency for the hbase-spark module.
I checked http://mvnrepository.com/search?q=HBase-Spark; no results match.
I also checked https://hbase.apache.org/book.html#spark; there is no hbase-spark dependency (or anything related) there either.
And 
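
For anyone else hitting this: since hbase-spark only exists on the master branch so far, one workaround seems to be building the module from the HBase source tree (mvn clean install -DskipTests) and then depending on the locally installed artifact. A sketch of the POM entry, assuming the master-branch build installs a 2.0.0-SNAPSHOT version:

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-spark</artifactId>
      <!-- the version is whatever your local build of master installs; 2.0.0-SNAPSHOT is an assumption -->
      <version>2.0.0-SNAPSHOT</version>
    </dependency>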

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: June 29, 2016 11:24
To: user@hbase.apache.org
Subject: [Marketing Mail] Re: question about SparkSQL loading hbase tables

There is no HBase release with full support for SparkSQL yet.
For #1, the classes / directories are (master branch):

./hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext

./hbase-spark/src/main/scala/org/apache/spark/sql/datasources/hbase/HBaseTableCatalog.scala

./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/datasources/HBaseSparkConf.scala

For documentation, see HBASE-15473.
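
Roughly, the DataFrame usage looks like the sketch below (pieced together from the master-branch hbase-spark module and the reference guide; the table name "pv_table", column family "cf", the column names and the "5" region count are made up for illustration, so adjust them to your own schema):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

// Catalog mapping the HBase table "pv_table" (column family "cf") to DataFrame columns.
val catalog = s"""{
       |"table":{"namespace":"default", "name":"pv_table"},
       |"rowkey":"key",
       |"columns":{
         |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
         |"col1":{"cf":"cf", "col":"col1", "type":"string"}
       |}
     |}""".stripMargin

val sc = new SparkContext(new SparkConf().setAppName("hbase-spark sketch"))
val sqlContext = new SQLContext(sc)
// Depending on the build you may also need to create an HBaseContext first
// (new HBaseContext(sc, hbaseConf)) so the connector picks up your hbase-site.xml.

// Read: each column named in the catalog becomes a DataFrame column.
val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.hadoop.hbase.spark")
  .load()

// Write: HBaseTableCatalog.newTable asks the data source to create the table
// with the given number of regions if it does not exist yet.
df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.hadoop.hbase.spark")
  .save()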


On Tue, Jun 28, 2016 at 7:13 PM, 罗辉 <lu...@ifeng.com> wrote:

> Hi there
>
>      I am using SparkSQL to read from HBase; however:
>
> 1. I find that some APIs are not available in my dependencies. Where do I
> add them:
>
> org.apache.hadoop.hbase.spark.example.hbasecontext
>
> org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
>
> org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf
>
> 2. Is there complete example code showing how to use SparkSQL to
> read/write from HBase?
>
> The document I referred to is this:
> http://hbase.apache.org/book.html#_sparksql_dataframes. It seems that
> this is a snapshot for 2.0, while I am using HBase 1.2.1 + Spark 1.6.1
> + Hadoop 2.7.1.
>
>
>
> In my app, I want to load the entire HBase table into SparkSQL.
>
> My code:
>
>
>
> import org.apache.spark._
>
> import org.apache.hadoop.hbase._
>
> import org.apache.hadoop.hbase.HBaseConfiguration
>
> import org.apache.hadoop.hbase.spark.example.hbasecontext
>
> import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
>
> import org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf
>
>
>
> object HbaseConnector {
>
>   def main(args: Array[String]) {
>
>     val tableName = args(0)
>
>     val sparkMasterUrlDev = "spark://hadoopmaster:7077"
>
>     val sparkMasterUrlLocal = "local[2]"
>
>
>
>     val sparkConf = new SparkConf()
>       .setAppName("HbaseConnector for table " + tableName)
>       .setMaster(sparkMasterUrlDev)
>       .set("spark.executor.memory", "10g")
>
>     val sc = new SparkContext(sparkConf)
>
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
>     val conf = HBaseConfiguration.create()
>
>     conf.set("hbase.zookeeper.quorum", "z1,z2,z3")
>
>     conf.set("hbase.zookeeper.property.clientPort", "2181")
>
>     conf.set("hbase.rootdir", "hdfs://hadoopmaster:8020/hbase")
>
>     //    val hbaseContext = new HBaseContext(sc, conf)
>
>
>
>     // writeCatalog (the table catalog JSON) and tsSpecified (the timestamp
>     // to read as of) are assumed to be defined elsewhere in the app.
>     val pv = sqlContext.read
>       .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog,
>         HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
>
>       .format("org.apache.hadoop.hbase.spark")
>
>       .load()
>
>     pv.write.saveAsTable(tableName)
>
>
>
>   }
>
>
>
> }
>
>
>
> My POM file is attached as well.
>
>
>
> Thanks for the help.
>
>
>
> San.Luo
>