Posted to dev@spark.apache.org by Ted Yu <yu...@gmail.com> on 2016/04/19 19:20:09 UTC

Re: RFC: Remove "HBaseTest" from examples?

Corrected typo in subject.

I want to note that the hbase-spark module in HBase is incomplete. Zhan has
several patches pending review.

The hbase-spark module is currently only in the master branch, which will be
released as 2.0. However, the release date for 2.0 is unclear; it is probably
half a year from now.

If we remove the examples now, there will be no release from either project
that shows users how to access HBase.

On Tue, Apr 19, 2016 at 10:15 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> Hey all,
>
> Two reasons why I think we should remove that from the examples:
>
> - HBase now has Spark integration in its own repo, so that really
> should be the template for how to use HBase from Spark, making that
> example less useful, even misleading.
>
> - It brings up a lot of extra dependencies that make the size of the
> Spark distribution grow.
>
> Any reason why we shouldn't drop that example?
>
> --
> Marcelo

Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

'bq.' is used in JIRA to quote what other people have said.

On Tue, Apr 19, 2016 at 10:42 AM, Reynold Xin <rx...@databricks.com> wrote:

> Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian)
> syntax? They are not being rendered in email.
>
>
> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. it's actually in use right now in spite of not being in any upstream
>> HBase release
>>
>> If it is not in upstream, then it is not relevant for discussion on
>> Apache mailing list.
>>
>> On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>>
>>> Alright, if you prefer, I'll say "it's actually in use right now in
>>> spite of not being in any upstream HBase release", and it's more
>>> useful than a single example file in the Spark repo for those who
>>> really want to integrate with HBase.
>>>
>>> Spark's example is really very trivial (just uses one of HBase's input
>>> formats), which makes it not very useful as a blueprint for developing
>>> HBase apps with Spark.
>>>
>>> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
>>> > bq. I wouldn't call it "incomplete".
>>> >
>>> > I would call it incomplete.
>>> >
>>> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>>> > long, float and double', which is a bug fix.
>>> >
>>> > Please exclude the presence of the related module in vendor distros
>>> > from this discussion.
>>> >
>>> > Thanks
>>> >
>>> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
>>> > wrote:
>>> >>
>>> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
>>> >> > I want to note that the hbase-spark module in HBase is incomplete.
>>> >> > Zhan has several patches pending review.
>>> >>
>>> >> I wouldn't call it "incomplete". Lots of functionality is there, which
>>> >> doesn't mean new ones, or more efficient implementations of existing
>>> >> ones, can't be added.
>>> >>
>>> >> > hbase-spark module is currently only in master branch which would be
>>> >> > released as 2.0
>>> >>
>>> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
>>> >> for upstream HBase.
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Reynold Xin <rx...@databricks.com>.

Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian)
syntax? They are not being rendered in email.


On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:

> bq. it's actually in use right now in spite of not being in any upstream
> HBase release
>
> If it is not in upstream, then it is not relevant for discussion on Apache
> mailing list.
>
> On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> Alright, if you prefer, I'll say "it's actually in use right now in
>> spite of not being in any upstream HBase release", and it's more
>> useful than a single example file in the Spark repo for those who
>> really want to integrate with HBase.
>>
>> Spark's example is really very trivial (just uses one of HBase's input
>> formats), which makes it not very useful as a blueprint for developing
>> HBase apps with Spark.
>>
>> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
>> > bq. I wouldn't call it "incomplete".
>> >
>> > I would call it incomplete.
>> >
>> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>> > long, float and double', which is a bug fix.
>> >
>> > Please exclude the presence of the related module in vendor distros
>> > from this discussion.
>> >
>> > Thanks
>> >
>> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
>> > wrote:
>> >>
>> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
>> >> > I want to note that the hbase-spark module in HBase is incomplete.
>> >> > Zhan has several patches pending review.
>> >>
>> >> I wouldn't call it "incomplete". Lots of functionality is there, which
>> >> doesn't mean new ones, or more efficient implementations of existing
>> >> ones, can't be added.
>> >>
>> >> > hbase-spark module is currently only in master branch which would be
>> >> > released as 2.0
>> >>
>> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
>> >> for upstream HBase.
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Tue, Apr 19, 2016 at 11:21 AM, Ted Yu <yu...@gmail.com> wrote:

> Clarification: in my previous email, I was not talking about the
> spark-streaming-flume artifact or the spark-streaming-kafka artifact.
> I was talking about the examples for these projects, such as
> examples/src/main/python/streaming/flume_wordcount.py
>

I understand. And those examples show how to use code that is part of
Spark. HBaseTest just shows how to use a generic Spark API that can be used
to talk to HBase or to anything else that has an InputFormat, so it's much
less useful as an example.

I'd put CassandraTest in that same category, although that particular
example at least shows more functionality than the HBase one.



> On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> The same question can be asked w.r.t. examples for other projects, such
>>> as flume and kafka.
>>>
>>
>> The main difference being that flume and kafka integration are part of
>> Spark itself. HBase integration is not.
>>
>>
>>
>>> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mt...@handybook.com>
>>> wrote:
>>>
>>>> Let's posit that the spark example is much better than what is
>>>> available in HBase. Why is that a reason to keep it within Spark?
>>>>
>>>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yu...@gmail.com> wrote:
>>>>
>>>>> bq. HBase's current support, even if there are bugs or things that
>>>>> still need to be done, is much better than the Spark example
>>>>>
>>>>> In my opinion, a simple example that works is better than a buggy
>>>>> package.
>>>>>
>>>>> I hope before long the hbase-spark module in HBase can arrive at a
>>>>> state which we can advertise as mature - but we're not there yet.
>>>>>
>>>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> You're completely missing my point. I'm saying that HBase's current
>>>>>> support, even if there are bugs or things that still need to be done,
>>>>>> is much better than the Spark example, which is basically a call to
>>>>>> "SparkContext.hadoopRDD".
>>>>>>
>>>>>> Spark's example is not helpful in learning how to build an HBase
>>>>>> application on Spark, and clashes head on with how the HBase
>>>>>> developers think it should be done. That, and because it brings too
>>>>>> many dependencies for something that is not really useful, is why I'm
>>>>>> suggesting removing it.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>>>> >
>>>>>> > I would say the refguide link you provided should not be considered
>>>>>> as
>>>>>> > complete.
>>>>>> >
>>>>>> > Note it is marked as Blocker by Sean B.
>>>>>> >
>>>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <
>>>>>> vanzin@cloudera.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> You're entitled to your own opinions.
>>>>>> >>
>>>>>> >> While you're at it, here's some much better documentation, from the
>>>>>> >> HBase project themselves, than what the Spark example provides:
>>>>>> >> http://hbase.apache.org/book.html#spark
>>>>>> >>
>>>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com>
>>>>>> wrote:
>>>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>>>> upstream
>>>>>> >> > HBase release
>>>>>> >> >
>>>>>> >> > If it is not in upstream, then it is not relevant for discussion
>>>>>> on
>>>>>> >> > Apache
>>>>>> >> > mailing list.
>>>>>> >> >
>>>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <
>>>>>> vanzin@cloudera.com>
>>>>>> >> > wrote:
>>>>>> >> >>
>>>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right
>>>>>> now in
>>>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>>>> >> >> useful than a single example file in the Spark repo for those
>>>>>> who
>>>>>> >> >> really want to integrate with HBase.
>>>>>> >> >>
>>>>>> >> >> Spark's example is really very trivial (just uses one of
>>>>>> HBase's input
>>>>>> >> >> formats), which makes it not very useful as a blueprint for
>>>>>> developing
>>>>>> >> >> HBase apps with Spark.
>>>>>> >> >>
>>>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>>>>>> wrote:
>>>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>>>> >> >> >
>>>>>> >> >> > I would call it incomplete.
>>>>>> >> >> >
>>>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>>>>>> >> >> > integer, long, float and double', which is a bug fix.
>>>>>> >> >> >
>>>>>> >> >> > Please exclude the presence of the related module in vendor
>>>>>> >> >> > distros from this discussion.
>>>>>> >> >> >
>>>>>> >> >> > Thanks
>>>>>> >> >> >
>>>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>>>> >> >> > <va...@cloudera.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >>
>>>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <
>>>>>> yuzhihong@gmail.com>
>>>>>> >> >> >> wrote:
>>>>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>>>>> incomplete.
>>>>>> >> >> >> > Zhan
>>>>>> >> >> >> > has
>>>>>> >> >> >> > several patches pending review.
>>>>>> >> >> >>
>>>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is
>>>>>> there,
>>>>>> >> >> >> which
>>>>>> >> >> >> doesn't mean new ones, or more efficient implementations of
>>>>>> existing
>>>>>> >> >> >> ones, can't be added.
>>>>>> >> >> >>
>>>>>> >> >> >> > hbase-spark module is currently only in master branch
>>>>>> which would
>>>>>> >> >> >> > be
>>>>>> >> >> >> > released as 2.0
>>>>>> >> >> >>
>>>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it
>>>>>> matters
>>>>>> >> >> >> much
>>>>>> >> >> >> for upstream HBase.
>>>>>> >> >> >>
>>>>>> >> >> >> --
>>>>>> >> >> >> Marcelo
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Marcelo
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Marcelo
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Marcelo
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Want to work at Handy? Check out our culture deck and open roles
>>>> <http://www.handy.com/careers>
>>>> Latest news <http://www.handy.com/press> at Handy
>>>> Handy just raised $50m
>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led
>>>> by Fidelity
>>>>
>>>>
>>>
>>
>>
>> --
>> Marcelo
>>
>
>


-- 
Marcelo

Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

Clarification: in my previous email, I was not talking about the
spark-streaming-flume artifact or the spark-streaming-kafka artifact.

I was talking about the examples for these projects, such as
examples/src/main/python/streaming/flume_wordcount.py

On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> The same question can be asked w.r.t. examples for other projects, such
>> as flume and kafka.
>>
>
> The main difference being that flume and kafka integration are part of
> Spark itself. HBase integration is not.
>
>
>
>> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mt...@handybook.com>
>> wrote:
>>
>>> Let's posit that the spark example is much better than what is available
>>> in HBase. Why is that a reason to keep it within Spark?
>>>
>>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>>> bq. HBase's current support, even if there are bugs or things that
>>>> still need to be done, is much better than the Spark example
>>>>
>>>> In my opinion, a simple example that works is better than a buggy
>>>> package.
>>>>
>>>> I hope before long the hbase-spark module in HBase can arrive at a
>>>> state which we can advertise as mature - but we're not there yet.
>>>>
>>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
>>>> wrote:
>>>>
>>>>> You're completely missing my point. I'm saying that HBase's current
>>>>> support, even if there are bugs or things that still need to be done,
>>>>> is much better than the Spark example, which is basically a call to
>>>>> "SparkContext.hadoopRDD".
>>>>>
>>>>> Spark's example is not helpful in learning how to build an HBase
>>>>> application on Spark, and clashes head on with how the HBase
>>>>> developers think it should be done. That, and because it brings too
>>>>> many dependencies for something that is not really useful, is why I'm
>>>>> suggesting removing it.
>>>>>
>>>>>
>>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>>> >
>>>>> > I would say the refguide link you provided should not be considered
>>>>> as
>>>>> > complete.
>>>>> >
>>>>> > Note it is marked as Blocker by Sean B.
>>>>> >
>>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <
>>>>> vanzin@cloudera.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> You're entitled to your own opinions.
>>>>> >>
>>>>> >> While you're at it, here's some much better documentation, from the
>>>>> >> HBase project themselves, than what the Spark example provides:
>>>>> >> http://hbase.apache.org/book.html#spark
>>>>> >>
>>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com>
>>>>> wrote:
>>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>>> upstream
>>>>> >> > HBase release
>>>>> >> >
>>>>> >> > If it is not in upstream, then it is not relevant for discussion
>>>>> on
>>>>> >> > Apache
>>>>> >> > mailing list.
>>>>> >> >
>>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <
>>>>> vanzin@cloudera.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now
>>>>> in
>>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>>> >> >> useful than a single example file in the Spark repo for those who
>>>>> >> >> really want to integrate with HBase.
>>>>> >> >>
>>>>> >> >> Spark's example is really very trivial (just uses one of HBase's
>>>>> input
>>>>> >> >> formats), which makes it not very useful as a blueprint for
>>>>> developing
>>>>> >> >> HBase apps with Spark.
>>>>> >> >>
>>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>>>>> wrote:
>>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>>> >> >> >
>>>>> >> >> > I would call it incomplete.
>>>>> >> >> >
>>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>>>>> >> >> > integer, long, float and double', which is a bug fix.
>>>>> >> >> >
>>>>> >> >> > Please exclude the presence of the related module in vendor
>>>>> >> >> > distros from this discussion.
>>>>> >> >> >
>>>>> >> >> > Thanks
>>>>> >> >> >
>>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>>> >> >> > <va...@cloudera.com>
>>>>> >> >> > wrote:
>>>>> >> >> >>
>>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhihong@gmail.com
>>>>> >
>>>>> >> >> >> wrote:
>>>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>>>> incomplete.
>>>>> >> >> >> > Zhan
>>>>> >> >> >> > has
>>>>> >> >> >> > several patches pending review.
>>>>> >> >> >>
>>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is
>>>>> there,
>>>>> >> >> >> which
>>>>> >> >> >> doesn't mean new ones, or more efficient implementations of
>>>>> existing
>>>>> >> >> >> ones, can't be added.
>>>>> >> >> >>
>>>>> >> >> >> > hbase-spark module is currently only in master branch which
>>>>> would
>>>>> >> >> >> > be
>>>>> >> >> >> > released as 2.0
>>>>> >> >> >>
>>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it
>>>>> matters
>>>>> >> >> >> much
>>>>> >> >> >> for upstream HBase.
>>>>> >> >> >>
>>>>> >> >> >> --
>>>>> >> >> >> Marcelo
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Marcelo
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Marcelo
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
> --
> Marcelo
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yu...@gmail.com> wrote:

> The same question can be asked w.r.t. examples for other projects, such as flume
> and kafka.
>

The main difference being that flume and kafka integration are part of
Spark itself. HBase integration is not.



> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mt...@handybook.com>
> wrote:
>
>> Let's posit that the spark example is much better than what is available
>> in HBase. Why is that a reason to keep it within Spark?
>>
>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> bq. HBase's current support, even if there are bugs or things that
>>> still need to be done, is much better than the Spark example
>>>
>>> In my opinion, a simple example that works is better than a buggy
>>> package.
>>>
>>> I hope before long the hbase-spark module in HBase can arrive at a state
>>> which we can advertise as mature - but we're not there yet.
>>>
>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
>>> wrote:
>>>
>>>> You're completely missing my point. I'm saying that HBase's current
>>>> support, even if there are bugs or things that still need to be done,
>>>> is much better than the Spark example, which is basically a call to
>>>> "SparkContext.hadoopRDD".
>>>>
>>>> Spark's example is not helpful in learning how to build an HBase
>>>> application on Spark, and clashes head on with how the HBase
>>>> developers think it should be done. That, and because it brings too
>>>> many dependencies for something that is not really useful, is why I'm
>>>> suggesting removing it.
>>>>
>>>>
>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>> >
>>>> > I would say the refguide link you provided should not be considered as
>>>> > complete.
>>>> >
>>>> > Note it is marked as Blocker by Sean B.
>>>> >
>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <vanzin@cloudera.com
>>>> >
>>>> > wrote:
>>>> >>
>>>> >> You're entitled to your own opinions.
>>>> >>
>>>> >> While you're at it, here's some much better documentation, from the
>>>> >> HBase project themselves, than what the Spark example provides:
>>>> >> http://hbase.apache.org/book.html#spark
>>>> >>
>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com>
>>>> wrote:
>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>> upstream
>>>> >> > HBase release
>>>> >> >
>>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>>> >> > Apache
>>>> >> > mailing list.
>>>> >> >
>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <
>>>> vanzin@cloudera.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now
>>>> in
>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>> >> >> useful than a single example file in the Spark repo for those who
>>>> >> >> really want to integrate with HBase.
>>>> >> >>
>>>> >> >> Spark's example is really very trivial (just uses one of HBase's
>>>> input
>>>> >> >> formats), which makes it not very useful as a blueprint for
>>>> developing
>>>> >> >> HBase apps with Spark.
>>>> >> >>
>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>>>> wrote:
>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>> >> >> >
>>>> >> >> > I would call it incomplete.
>>>> >> >> >
>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>>>> >> >> > integer, long, float and double', which is a bug fix.
>>>> >> >> >
>>>> >> >> > Please exclude the presence of the related module in vendor
>>>> >> >> > distros from this discussion.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> >
>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>> >> >> > <va...@cloudera.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
>>>> >> >> >> wrote:
>>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>>> incomplete.
>>>> >> >> >> > Zhan
>>>> >> >> >> > has
>>>> >> >> >> > several patches pending review.
>>>> >> >> >>
>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is
>>>> there,
>>>> >> >> >> which
>>>> >> >> >> doesn't mean new ones, or more efficient implementations of
>>>> existing
>>>> >> >> >> ones, can't be added.
>>>> >> >> >>
>>>> >> >> >> > hbase-spark module is currently only in master branch which
>>>> would
>>>> >> >> >> > be
>>>> >> >> >> > released as 2.0
>>>> >> >> >>
>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it
>>>> matters
>>>> >> >> >> much
>>>> >> >> >> for upstream HBase.
>>>> >> >> >>
>>>> >> >> >> --
>>>> >> >> >> Marcelo
>>>> >> >> >
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Marcelo
>>>> >> >
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Marcelo
>>>>
>>>
>>>
>>
>>
>>
>


-- 
Marcelo

Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

The same question can be asked w.r.t. examples for other projects, such as
Flume and Kafka.

On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mt...@handybook.com>
wrote:

> Let's posit that the spark example is much better than what is available
> in HBase. Why is that a reason to keep it within Spark?
>
> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. HBase's current support, even if there are bugs or things that still
>> need to be done, is much better than the Spark example
>>
>> In my opinion, a simple example that works is better than a buggy package.
>>
>> I hope before long the hbase-spark module in HBase can arrive at a state
>> which we can advertise as mature - but we're not there yet.
>>
>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>>
>>> You're completely missing my point. I'm saying that HBase's current
>>> support, even if there are bugs or things that still need to be done,
>>> is much better than the Spark example, which is basically a call to
>>> "SparkContext.hadoopRDD".
>>>
>>> Spark's example is not helpful in learning how to build an HBase
>>> application on Spark, and clashes head on with how the HBase
>>> developers think it should be done. That, and because it brings too
>>> many dependencies for something that is not really useful, is why I'm
>>> suggesting removing it.
>>>
>>>
>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>> >
>>> > I would say the refguide link you provided should not be considered as
>>> > complete.
>>> >
>>> > Note it is marked as Blocker by Sean B.
>>> >
>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
>>> > wrote:
>>> >>
>>> >> You're entitled to your own opinions.
>>> >>
>>> >> While you're at it, here's some much better documentation, from the
>>> >> HBase project themselves, than what the Spark example provides:
>>> >> http://hbase.apache.org/book.html#spark
>>> >>
>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
>>> >> > bq. it's actually in use right now in spite of not being in any
>>> upstream
>>> >> > HBase release
>>> >> >
>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>> >> > Apache
>>> >> > mailing list.
>>> >> >
>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <
>>> vanzin@cloudera.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>> >> >> spite of not being in any upstream HBase release", and it's more
>>> >> >> useful than a single example file in the Spark repo for those who
>>> >> >> really want to integrate with HBase.
>>> >> >>
>>> >> >> Spark's example is really very trivial (just uses one of HBase's
>>> input
>>> >> >> formats), which makes it not very useful as a blueprint for
>>> developing
>>> >> >> HBase apps with Spark.
>>> >> >>
>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>>> wrote:
>>> >> >> > bq. I wouldn't call it "incomplete".
>>> >> >> >
>>> >> >> > I would call it incomplete.
>>> >> >> >
>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>>> >> >> > integer, long, float and double', which is a bug fix.
>>> >> >> >
>>> >> >> > Please exclude the presence of the related module in vendor
>>> >> >> > distros from this discussion.
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> >
>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>> >> >> > <va...@cloudera.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>> incomplete.
>>> >> >> >> > Zhan
>>> >> >> >> > has
>>> >> >> >> > several patches pending review.
>>> >> >> >>
>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>>> >> >> >> which
>>> >> >> >> doesn't mean new ones, or more efficient implementations of
>>> existing
>>> >> >> >> ones, can't be added.
>>> >> >> >>
>>> >> >> >> > hbase-spark module is currently only in master branch which
>>> would
>>> >> >> >> > be
>>> >> >> >> > released as 2.0
>>> >> >> >>
>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>>> >> >> >> much
>>> >> >> >> for upstream HBase.
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Marcelo
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Marcelo
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>
>
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcin Tustin <mt...@handybook.com>.

Let's posit that the Spark example is much better than what is available in
HBase. Why is that a reason to keep it within Spark?

On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. HBase's current support, even if there are bugs or things that still
> need to be done, is much better than the Spark example
>
> In my opinion, a simple example that works is better than a buggy package.
>
> I hope before long the hbase-spark module in HBase can arrive at a state
> which we can advertise as mature - but we're not there yet.
>
> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> You're completely missing my point. I'm saying that HBase's current
>> support, even if there are bugs or things that still need to be done,
>> is much better than the Spark example, which is basically a call to
>> "SparkContext.hadoopRDD".
>>
>> Spark's example is not helpful in learning how to build an HBase
>> application on Spark, and clashes head on with how the HBase
>> developers think it should be done. That, and because it brings too
>> many dependencies for something that is not really useful, is why I'm
>> suggesting removing it.
>>
>>
>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>> >
>> > I would say the refguide link you provided should not be considered as
>> > complete.
>> >
>> > Note it is marked as Blocker by Sean B.
>> >
>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
>> > wrote:
>> >>
>> >> You're entitled to your own opinions.
>> >>
>> >> While you're at it, here's some much better documentation, from the
>> >> HBase project themselves, than what the Spark example provides:
>> >> http://hbase.apache.org/book.html#spark
>> >>
>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
>> >> > bq. it's actually in use right now in spite of not being in any
>> upstream
>> >> > HBase release
>> >> >
>> >> > If it is not in upstream, then it is not relevant for discussion on
>> >> > Apache
>> >> > mailing list.
>> >> >
>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <
>> vanzin@cloudera.com>
>> >> > wrote:
>> >> >>
>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>> >> >> spite of not being in any upstream HBase release", and it's more
>> >> >> useful than a single example file in the Spark repo for those who
>> >> >> really want to integrate with HBase.
>> >> >>
>> >> >> Spark's example is really very trivial (just uses one of HBase's
>> input
>> >> >> formats), which makes it not very useful as a blueprint for
>> developing
>> >> >> HBase apps with Spark.
>> >> >>
>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>> wrote:
>> >> >> > bq. I wouldn't call it "incomplete".
>> >> >> >
>> >> >> > I would call it incomplete.
>> >> >> >
>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>> >> >> > integer, long, float and double', which is a bug fix.
>> >> >> >
>> >> >> > Please exclude the presence of the related module in vendor
>> >> >> > distros from this discussion.
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>> >> >> > <va...@cloudera.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > I want to note that the hbase-spark module in HBase is
>> >> >> >> > incomplete. Zhan has several patches pending review.
>> >> >> >>
>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>> >> >> >> which doesn't mean new ones, or more efficient implementations
>> >> >> >> of existing ones, can't be added.
>> >> >> >>
>> >> >> >> > hbase-spark module is currently only in master branch which
>> >> >> >> > would be released as 2.0
>> >> >> >>
>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>> >> >> >> much for upstream HBase.
>> >> >> >>
>> >> >> >> --
>> >> >> >> Marcelo
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>
>


Re: RFC: Remove "HBaseTest" from examples?

Posted by Sean Busbey <bu...@cloudera.com>.

I'd suggest that the hbase-downstreamer project[1] is a better place
for folks to see these examples. There's already an example for Spark
Streaming that does not rely on any of the new goodness in the
hbase-spark module[2].

Granted, it uses the Spark Java APIs[3], but we'd be glad to have a
Scala-based example if someone wanted to translate.

[1]: https://github.com/saintstack/hbase-downstreamer
[2]: https://github.com/saintstack/hbase-downstreamer#spark-streaming-test-application
[3]: https://s.apache.org/apvQ



On Tue, Apr 19, 2016 at 12:59 PM, Ted Yu <yu...@gmail.com> wrote:
> bq. HBase's current support, even if there are bugs or things that still
> need to be done, is much better than the Spark example
>
> In my opinion, a simple example that works is better than a buggy package.
>
> I hope before long the hbase-spark module in HBase can arrive at a state
> which we can advertise as mature - but we're not there yet.
>
> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>>
>> You're completely missing my point. I'm saying that HBase's current
>> support, even if there are bugs or things that still need to be done,
>> is much better than the Spark example, which is basically a call to
>> "SparkContext.hadoopRDD".
>>
>> Spark's example is not helpful in learning how to build an HBase
>> application on Spark, and clashes head on with how the HBase
>> developers think it should be done. That, and because it brings too
>> many dependencies for something that is not really useful, is why I'm
>> suggesting removing it.
>>
>>
>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>> >
>> > I would say the refguide link you provided should not be considered as
>> > complete.
>> >
>> > Note it is marked as Blocker by Sean B.
>> >
>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
>> > wrote:
>> >>
>> >> You're entitled to your own opinions.
>> >>
>> >> While you're at it, here's some much better documentation, from the
>> >> HBase project themselves, than what the Spark example provides:
>> >> http://hbase.apache.org/book.html#spark
>> >>
>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
>> >> > bq. it's actually in use right now in spite of not being in any
>> >> > upstream
>> >> > HBase release
>> >> >
>> >> > If it is not in upstream, then it is not relevant for discussion on
>> >> > Apache
>> >> > mailing list.
>> >> >
>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin
>> >> > <va...@cloudera.com>
>> >> > wrote:
>> >> >>
>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>> >> >> spite of not being in any upstream HBase release", and it's more
>> >> >> useful than a single example file in the Spark repo for those who
>> >> >> really want to integrate with HBase.
>> >> >>
>> >> >> Spark's example is really very trivial (just uses one of HBase's
>> >> >> input
>> >> >> formats), which makes it not very useful as a blueprint for
>> >> >> developing
>> >> >> HBase apps with Spark.
>> >> >>
>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com>
>> >> >> wrote:
>> >> >> > bq. I wouldn't call it "incomplete".
>> >> >> >
>> >> >> > I would call it incomplete.
>> >> >> >
>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>> >> >> > integer,
>> >> >> > long,
>> >> >> > float and double' which is a bug fix.
>> >> >> >
>> >> >> > Please exclude the presence of the related module in a vendor
>> >> >> > distro from this discussion.
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>> >> >> > <va...@cloudera.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > I want to note that the hbase-spark module in HBase is
>> >> >> >> > incomplete.
>> >> >> >> > Zhan
>> >> >> >> > has
>> >> >> >> > several patches pending review.
>> >> >> >>
>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>> >> >> >> which
>> >> >> >> doesn't mean new ones, or more efficient implementations of
>> >> >> >> existing
>> >> >> >> ones, can't be added.
>> >> >> >>
>> >> >> >> > hbase-spark module is currently only in master branch which
>> >> >> >> > would
>> >> >> >> > be
>> >> >> >> > released as 2.0
>> >> >> >>
>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>> >> >> >> much
>> >> >> >> for upstream HBase.
>> >> >> >>
>> >> >> >> --
>> >> >> >> Marcelo
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
busbey

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

bq. HBase's current support, even if there are bugs or things that still
need to be done, is much better than the Spark example

In my opinion, a simple example that works is better than a buggy package.

I hope before long the hbase-spark module in HBase can arrive at a state
which we can advertise as mature - but we're not there yet.

On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> You're completely missing my point. I'm saying that HBase's current
> support, even if there are bugs or things that still need to be done,
> is much better than the Spark example, which is basically a call to
> "SparkContext.hadoopRDD".
>
> Spark's example is not helpful in learning how to build an HBase
> application on Spark, and clashes head on with how the HBase
> developers think it should be done. That, and because it brings too
> many dependencies for something that is not really useful, is why I'm
> suggesting removing it.
>
>
> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
> > There is an Open JIRA for fixing the documentation: HBASE-15473
> >
> > I would say the refguide link you provided should not be considered as
> > complete.
> >
> > Note it is marked as Blocker by Sean B.
> >
> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
> > wrote:
> >>
> >> You're entitled to your own opinions.
> >>
> >> While you're at it, here's some much better documentation, from the
> >> HBase project themselves, than what the Spark example provides:
> >> http://hbase.apache.org/book.html#spark
> >>
> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
> >> > bq. it's actually in use right now in spite of not being in any
> >> > upstream HBase release
> >> >
> >> > If it is not in upstream, then it is not relevant for discussion on
> >> > Apache mailing list.
> >> >
> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
> >> > wrote:
> >> >>
> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
> >> >> spite of not being in any upstream HBase release", and it's more
> >> >> useful than a single example file in the Spark repo for those who
> >> >> really want to integrate with HBase.
> >> >>
> >> >> Spark's example is really very trivial (just uses one of HBase's
> >> >> input formats), which makes it not very useful as a blueprint for
> >> >> developing HBase apps with Spark.
> >> >>
> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
> >> >> > bq. I wouldn't call it "incomplete".
> >> >> >
> >> >> > I would call it incomplete.
> >> >> >
> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
> >> >> > integer, long, float and double' which is a bug fix.
> >> >> >
> >> >> > Please exclude the presence of the related module in a vendor
> >> >> > distro from this discussion.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
> >> >> > <va...@cloudera.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
> >> >> >> wrote:
> >> >> >> > I want to note that the hbase-spark module in HBase is
> >> >> >> > incomplete. Zhan has several patches pending review.
> >> >> >>
> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
> >> >> >> which doesn't mean new ones, or more efficient implementations
> >> >> >> of existing ones, can't be added.
> >> >> >>
> >> >> >> > hbase-spark module is currently only in master branch which
> >> >> >> > would be released as 2.0
> >> >> >>
> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
> >> >> >> much
> >> >> >> for upstream HBase.
> >> >> >>
> >> >> >> --
> >> >> >> Marcelo
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

You're completely missing my point. I'm saying that HBase's current
support, even if there are bugs or things that still need to be done,
is much better than the Spark example, which is basically a call to
"SparkContext.hadoopRDD".

Spark's example is not helpful in learning how to build an HBase
application on Spark, and clashes head on with how the HBase
developers think it should be done. That, and because it brings too
many dependencies for something that is not really useful, is why I'm
suggesting removing it.


On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yu...@gmail.com> wrote:
> There is an Open JIRA for fixing the documentation: HBASE-15473
>
> I would say the refguide link you provided should not be considered as
> complete.
>
> Note it is marked as Blocker by Sean B.
>
> On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>>
>> You're entitled to your own opinions.
>>
>> While you're at it, here's some much better documentation, from the
>> HBase project themselves, than what the Spark example provides:
>> http://hbase.apache.org/book.html#spark
>>
>> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
>> > bq. it's actually in use right now in spite of not being in any upstream
>> > HBase release
>> >
>> > If it is not in upstream, then it is not relevant for discussion on
>> > Apache
>> > mailing list.
>> >
>> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
>> > wrote:
>> >>
>> >> Alright, if you prefer, I'll say "it's actually in use right now in
>> >> spite of not being in any upstream HBase release", and it's more
>> >> useful than a single example file in the Spark repo for those who
>> >> really want to integrate with HBase.
>> >>
>> >> Spark's example is really very trivial (just uses one of HBase's input
>> >> formats), which makes it not very useful as a blueprint for developing
>> >> HBase apps with Spark.
>> >>
>> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
>> >> > bq. I wouldn't call it "incomplete".
>> >> >
>> >> > I would call it incomplete.
>> >> >
>> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>> >> > long,
>> >> > float and double' which is a bug fix.
>> >> >
>> >> > Please exclude the presence of the related module in a vendor
>> >> > distro from this discussion.
>> >> >
>> >> > Thanks
>> >> >
>> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>> >> > <va...@cloudera.com>
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com>
>> >> >> wrote:
>> >> >> > I want to note that the hbase-spark module in HBase is incomplete.
>> >> >> > Zhan
>> >> >> > has
>> >> >> > several patches pending review.
>> >> >>
>> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>> >> >> which
>> >> >> doesn't mean new ones, or more efficient implementations of existing
>> >> >> ones, can't be added.
>> >> >>
>> >> >> > hbase-spark module is currently only in master branch which would
>> >> >> > be
>> >> >> > released as 2.0
>> >> >>
>> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>> >> >> much
>> >> >> for upstream HBase.
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
Marcelo


Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

There is an Open JIRA for fixing the documentation: HBASE-15473

I would say the refguide link you provided should not be considered as
complete.

Note it is marked as Blocker by Sean B.

On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> You're entitled to your own opinions.
>
> While you're at it, here's some much better documentation, from the
> HBase project themselves, than what the Spark example provides:
> http://hbase.apache.org/book.html#spark
>
> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
> > bq. it's actually in use right now in spite of not being in any upstream
> > HBase release
> >
> > If it is not in upstream, then it is not relevant for discussion on
> > Apache mailing list.
> >
> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
> > wrote:
> >>
> >> Alright, if you prefer, I'll say "it's actually in use right now in
> >> spite of not being in any upstream HBase release", and it's more
> >> useful than a single example file in the Spark repo for those who
> >> really want to integrate with HBase.
> >>
> >> Spark's example is really very trivial (just uses one of HBase's input
> >> formats), which makes it not very useful as a blueprint for developing
> >> HBase apps with Spark.
> >>
> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
> >> > bq. I wouldn't call it "incomplete".
> >> >
> >> > I would call it incomplete.
> >> >
> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
> >> > long,
> >> > float and double' which is a bug fix.
> >> >
> >> > Please exclude the presence of the related module in a vendor
> >> > distro from this discussion.
> >> >
> >> > Thanks
> >> >
> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
> >> > wrote:
> >> >>
> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
> >> >> > I want to note that the hbase-spark module in HBase is incomplete.
> >> >> > Zhan has several patches pending review.
> >> >>
> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
> >> >> which doesn't mean new ones, or more efficient implementations of
> >> >> existing ones, can't be added.
> >> >>
> >> >> > hbase-spark module is currently only in master branch which would
> >> >> > be released as 2.0
> >> >>
> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
> >> >> for upstream HBase.
> >> >>
> >> >> --
> >> >> Marcelo
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

You're entitled to your own opinions.

While you're at it, here's some much better documentation, from the
HBase project themselves, than what the Spark example provides:
http://hbase.apache.org/book.html#spark

On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yu...@gmail.com> wrote:
> bq. it's actually in use right now in spite of not being in any upstream
> HBase release
>
> If it is not in upstream, then it is not relevant for discussion on Apache
> mailing list.
>
> On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>>
>> Alright, if you prefer, I'll say "it's actually in use right now in
>> spite of not being in any upstream HBase release", and it's more
>> useful than a single example file in the Spark repo for those who
>> really want to integrate with HBase.
>>
>> Spark's example is really very trivial (just uses one of HBase's input
>> formats), which makes it not very useful as a blueprint for developing
>> HBase apps with Spark.
>>
>> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
>> > bq. I wouldn't call it "incomplete".
>> >
>> > I would call it incomplete.
>> >
>> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>> > long,
>> > float and double' which is a bug fix.
>> >
>> > Please exclude the presence of the related module in a vendor distro
>> > from this discussion.
>> >
>> > Thanks
>> >
>> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
>> > wrote:
>> >>
>> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
>> >> > I want to note that the hbase-spark module in HBase is incomplete.
>> >> > Zhan
>> >> > has
>> >> > several patches pending review.
>> >>
>> >> I wouldn't call it "incomplete". Lots of functionality is there, which
>> >> doesn't mean new ones, or more efficient implementations of existing
>> >> ones, can't be added.
>> >>
>> >> > hbase-spark module is currently only in master branch which would be
>> >> > released as 2.0
>> >>
>> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
>> >> for upstream HBase.
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
Marcelo


Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

bq. it's actually in use right now in spite of not being in any upstream
HBase release

If it is not in upstream, then it is not relevant for discussion on Apache
mailing list.

On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> Alright, if you prefer, I'll say "it's actually in use right now in
> spite of not being in any upstream HBase release", and it's more
> useful than a single example file in the Spark repo for those who
> really want to integrate with HBase.
>
> Spark's example is really very trivial (just uses one of HBase's input
> formats), which makes it not very useful as a blueprint for developing
> HBase apps with Spark.
>
> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
> > bq. I wouldn't call it "incomplete".
> >
> > I would call it incomplete.
> >
> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
> > long, float and double' which is a bug fix.
> >
> > Please exclude the presence of the related module in a vendor distro
> > from this discussion.
> >
> > Thanks
> >
> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
> > wrote:
> >>
> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
> >> > I want to note that the hbase-spark module in HBase is incomplete.
> >> > Zhan has several patches pending review.
> >>
> >> I wouldn't call it "incomplete". Lots of functionality is there, which
> >> doesn't mean new ones, or more efficient implementations of existing
> >> ones, can't be added.
> >>
> >> > hbase-spark module is currently only in master branch which would be
> >> > released as 2.0
> >>
> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
> >> for upstream HBase.
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

Alright, if you prefer, I'll say "it's actually in use right now in
spite of not being in any upstream HBase release", and it's more
useful than a single example file in the Spark repo for those who
really want to integrate with HBase.

Spark's example is really very trivial (just uses one of HBase's input
formats), which makes it not very useful as a blueprint for developing
HBase apps with Spark.

On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yu...@gmail.com> wrote:
> bq. I wouldn't call it "incomplete".
>
> I would call it incomplete.
>
> Please see HBASE-15333 'Enhance the filter to handle short, integer, long,
> float and double' which is a bug fix.
>
> Please exclude the presence of the related module in a vendor distro from
> this discussion.
>
> Thanks
>
> On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>>
>> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
>> > I want to note that the hbase-spark module in HBase is incomplete. Zhan
>> > has
>> > several patches pending review.
>>
>> I wouldn't call it "incomplete". Lots of functionality is there, which
>> doesn't mean new ones, or more efficient implementations of existing
>> ones, can't be added.
>>
>> > hbase-spark module is currently only in master branch which would be
>> > released as 2.0
>>
>> Just as a side note, it's part of CDH 5.7.0, not that it matters much
>> for upstream HBase.
>>
>> --
>> Marcelo
>
>



-- 
Marcelo


Re: RFC: Remove "HBaseTest" from examples?

Posted by Ted Yu <yu...@gmail.com>.

bq. I wouldn't call it "incomplete".

I would call it incomplete.

Please see HBASE-15333 'Enhance the filter to handle short, integer, long,
float and double' which is a bug fix.

Please exclude the presence of the related module in a vendor distro from
this discussion.

Thanks

On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
> > I want to note that the hbase-spark module in HBase is incomplete. Zhan
> > has several patches pending review.
>
> I wouldn't call it "incomplete". Lots of functionality is there, which
> doesn't mean new ones, or more efficient implementations of existing
> ones, can't be added.
>
> > hbase-spark module is currently only in master branch which would be
> > released as 2.0
>
> Just as a side note, it's part of CDH 5.7.0, not that it matters much
> for upstream HBase.
>
> --
> Marcelo
>

Re: RFC: Remove "HBaseTest" from examples?

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yu...@gmail.com> wrote:
> I want to note that the hbase-spark module in HBase is incomplete. Zhan has
> several patches pending review.

I wouldn't call it "incomplete". Lots of functionality is there, which
doesn't mean new ones, or more efficient implementations of existing
ones, can't be added.

> hbase-spark module is currently only in master branch which would be
> released as 2.0

Just as a side note, it's part of CDH 5.7.0, not that it matters much
for upstream HBase.

-- 
Marcelo
