Posted to user@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2017/06/21 02:21:54 UTC

Re: appendix

You should make HBase a data source (it seems we already have an HBase connector?), create a DataFrame from HBase, and do the join in Spark SQL.
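
A minimal sketch of that approach, assuming Spark 2.x and the Hortonworks shc connector; the catalog JSON, table name, and column names below are illustrative, not from the original thread:

    // Sketch only: assumes the shc connector is on the classpath and
    // that the HBase table stores the expected value under cf1:value.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
    import spark.implicits._

    // Map the HBase table to a DataFrame with columns `id` (row key) and `value`.
    val catalog = """{
      "table":   {"namespace": "default", "name": "my_table"},
      "rowkey":  "key",
      "columns": {
        "id":    {"cf": "rowkey", "col": "key",   "type": "string"},
        "value": {"cf": "cf1",    "col": "value", "type": "string"}
      }
    }"""

    val hbaseDF = spark.read
      .option("catalog", catalog)  // shc's HBaseTableCatalog.tableCatalog key
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    val df = spark.sql("sql....")  // step 1 of the original scenario

    // Equivalent of the row-by-row check: keep rows whose mvcc matches
    // the value stored in HBase under the same id.
    val result = df.join(hbaseDF, Seq("id")).where($"mvcc" === $"value")

Expressed as a join, the lookup becomes part of the query plan instead of one HBase Get per row issued from user code.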

> On 21 Jun 2017, at 10:17 AM, sunerhan1992@sina.com wrote:
> 
> Hello,
> My scenario is like this:
>         1. val df = hivecontext/carboncontext.sql("sql....")
>         2. iterate over the rows, extracting two columns, id and mvcc, and use id as the key to look up the corresponding value in HBase;
>            if mvcc == value, keep the row, else drop it.
> Is there a better way than dataframe.mapPartitions (a sketch of that approach follows after this message), since it causes an extra stage and takes more time?
> I put the two DAGs in the appendix; please check!
> 
> Thanks!!
> sunerhan1992@sina.com <appendix.zip>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
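
For comparison, a minimal sketch of the mapPartitions approach described in the question; the table name, column family, and qualifier here are illustrative assumptions:

    // Sketch of the per-row HBase lookup: one connection per partition,
    // one Get per row. `df` is the DataFrame from step 1 above.
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
    import org.apache.hadoop.hbase.util.Bytes

    val filtered = df.rdd.mapPartitions { rows =>
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("my_table"))
      // Materialize the partition before closing the connection,
      // since the returned iterator is consumed lazily.
      val kept = rows.filter { row =>
        val id   = row.getAs[String]("id")
        val mvcc = row.getAs[String]("mvcc")
        val cell = table.get(new Get(Bytes.toBytes(id)))
          .getValue(Bytes.toBytes("cf1"), Bytes.toBytes("value"))
        cell != null && mvcc == Bytes.toString(cell)  // keep on match, else drop
      }.toList
      table.close()
      conn.close()
      kept.iterator
    }

This issues one HBase RPC per row, which is the per-row cost the question is trying to avoid; the join above pushes that work into Spark SQL instead.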
