Posted to dev@spark.apache.org by "Yan Zhou.sc" <Ya...@huawei.com> on 2015/08/11 10:07:38 UTC

Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

OK. Then the question becomes how to define the boundary between a query engine and built-in processing. If, for instance, Spark DataFrame functionality that involves shuffling is to be supported inside HBase,
in my opinion it would be hard not to call it a query engine. If, on the other hand, only map-side DataFrame operations are to be supported inside HBase, then Astro’s coprocessor already has that capability.
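
To make the distinction concrete, here is a minimal plain-Python sketch, not Astro or HBase code. The data and the function names `map_side_filter` and `shuffle_group_sum` are hypothetical; each inner list stands in for the rows held by one region.

```python
from collections import defaultdict

# Hypothetical data: each inner list stands for the rows of one region,
# as (row_key, group, value) tuples.
regions = [
    [("k1", "a", 3), ("k2", "b", 5)],
    [("k3", "a", 7), ("k4", "b", 1)],
]

def map_side_filter(region_rows, min_value):
    """Map-side op: runs inside one region and needs no data from others."""
    return [row for row in region_rows if row[2] >= min_value]

def shuffle_group_sum(all_regions):
    """Shuffle-style op: rows sharing a group key must be brought together
    from every region before the result can be computed."""
    totals = defaultdict(int)
    for region_rows in all_regions:
        for _key, group, value in region_rows:
            totals[group] += value
    return dict(totals)

# Each region can be filtered on its own, in place.
filtered = [map_side_filter(rows, 4) for rows in regions]

# The aggregation has to see rows from all regions.
grouped = shuffle_group_sum(regions)
```

The filter could run where the data lives, which is the kind of work a coprocessor or custom filter can do; the aggregation needs cross-region data movement, which is where it starts to look like a query engine.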

Again, I still have no full knowledge of HBASE-14181 beyond your description in email, so my opinion above might be skewed as a result.

Regards,

Yan

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: August 11, 2015 15:28
To: Yan Zhou.sc
Cc: Bing Xiao (Bing); dev@spark.apache.org; user@spark.apache.org
Subject: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

HBase will not have a query engine.

It will provide better support to query engines.

Cheers

On Aug 10, 2015, at 11:11 PM, Yan Zhou.sc <Ya...@huawei.com> wrote:
Ted,

I’m in China now and seem to have difficulty accessing the Apache JIRA. Anyway, it appears to me that HBASE-14181<https://issues.apache.org/jira/browse/HBASE-14181> attempts to support Spark DataFrame inside HBase.
If so, my question is whether HBase is intended to have a built-in query engine, or whether it will stay as it is today:
a key-value store with some built-in processing capabilities in the form of coprocessors, custom filters, etc., which allows loosely coupled query engines
to be built on top of it.

Thanks,

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: August 11, 2015 8:54
To: Bing Xiao (Bing)
Cc: dev@spark.apache.org; user@spark.apache.org; Yan Zhou.sc
Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro"

Yan / Bing:
Mind taking a look at HBASE-14181<https://issues.apache.org/jira/browse/HBASE-14181> 'Add Spark DataFrame DataSource to HBase-Spark Module'?

Thanks

On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) <bi...@huawei.com> wrote:
We are happy to announce the availability of the Spark SQL on HBase 1.0.0 release. http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
The main features in this package, dubbed “Astro”, include:

• Systematic and powerful handling of data pruning and intelligent scan, based on a partial evaluation technique

• HBase pushdown capabilities, such as custom filters and coprocessors, to support ultra-low-latency processing

• SQL and DataFrame support

• More SQL capabilities made possible (Secondary index, bloom filter, Primary Key, Bulk load, Update)

• Joins with data from other sources

• Python/Java/Scala support

• Support for the latest Spark 1.4.0 release
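
As a rough illustration of the data pruning mentioned in the first feature, the sketch below selects only the regions whose row-key range can overlap a query's key range. It is plain Python written under simplifying assumptions, not Astro's implementation (the partial evaluation technique is more general than a single range check); `prune_regions` and the sample region boundaries are hypothetical.

```python
def prune_regions(regions, lo, hi):
    """Return names of regions whose [start, end) key range can overlap
    the inclusive query range [lo, hi].

    Each region is (name, start_key, end_key); an end_key of None means
    the region is unbounded above (the last region of the table).
    """
    kept = []
    for name, start, end in regions:
        starts_before_hi = start <= hi
        ends_after_lo = end is None or end > lo
        if starts_before_hi and ends_after_lo:
            kept.append(name)
    return kept

# Hypothetical table split into four regions by row key.
regions = [
    ("r1", "", "g"),
    ("r2", "g", "n"),
    ("r3", "n", "t"),
    ("r4", "t", None),
]

# A predicate such as: rowkey >= 'h' AND rowkey <= 'p'
selected = prune_regions(regions, "h", "p")
```

Only the selected regions would need to be scanned; the others cannot contain a matching row key.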

The tests by the Huawei team and community contributors covered the following areas: bulk load; projection pruning; partition pruning; partial evaluation; code generation; coprocessor; custom filtering; DML; complex filtering on keys and non-keys; join/union with non-HBase data; DataFrame; multi-column-family tests. We will post the test results, including performance tests, in the middle of August.
You are very welcome to try out or deploy the package, and to help improve the integration tests with various combinations of settings, extensive DataFrame tests, complex join/union tests, and extensive performance tests. Please use the “Issues” and “Pull Requests” links on the package homepage if you want to report bugs or request improvements or features.
Special thanks to project owner and technical leader Yan Zhou, the Huawei global team, community contributors, and Databricks. Databricks has provided great assistance from design through release.
“Astro”, the Spark SQL on HBase package, will be useful for ultra-low-latency query and analytics of large-scale data sets in vertical enterprises. We will continue to work with the community to develop new features and improve the code base. Your comments and suggestions are greatly appreciated.

Yan Zhou / Bing Xiao
Huawei Big Data team