Posted to dev@hbase.apache.org by Sean Busbey <bu...@apache.org> on 2016/11/16 17:49:20 UTC

[DISCUSS] hbase-spark module in branch-1 and branch-2

Hi folks!

With 2.0 releases coming up, I'd like to revive our prior discussion
on the readiness of the hbase-spark module for downstream users.

We've had a ticket for tracking the milestones set up for inclusion in
branch-1 releases for about 1.5 years:

https://issues.apache.org/jira/browse/HBASE-14160

We still haven't gotten all of the blocker issues completed, AFAIK.

Is anyone interested in volunteering to knock the rest of these out?

If they aren't, shall we plan to leave hbase-spark in master and
revert it from branch-2 once it forks for the HBase 2.0 release line?

This feature isn't a blocker for 2.0; just as we've been planning to
add the hbase-spark module to some 1.y release we can also include it
in a 2.1+ release.

This does appear to be a feature our downstream users could benefit
from, so I'd hate to continue the current situation where no official
releases include it. This is especially true now that we're looking at
ways to handle changes between Spark 1.6 and Spark 2.0 in HBASE-16179.

-
busbey

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Jerry He <je...@gmail.com>.
Hi, Andrew

Stack was talking to me about this area when I met him at the HBase Meetup
last December.
Let me take a shot at HBASE-14375.

Thanks,

Jerry

On Sat, Jan 14, 2017 at 9:22 PM, Andrew Purtell <an...@gmail.com>
wrote:

>
>
> > On Jan 14, 2017, at 9:07 PM, Jerry He <je...@gmail.com> wrote:
> >
> > I think it will be a big disappointment for the community if the
> > hbase-spark module is not going into 2.0.
> > I understand there are still a few blockers, including HBASE-16179.
>
> Patches welcome. :-)
>
>
> > We have it in our distribution, probably in other vendors' as well. It is
> > a little easier for us because we can be flexible on the supported
> > Spark/Scala version combinations and the APIs.
> > But a major release still without a good Spark story for the HBase open
> > source community does not look good.
> >
> > Jerry
> >
> >> On Sat, Jan 14, 2017 at 4:52 PM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >> I agree with Devaraj's assessment w.r.t. hbase-spark module in master
> >> (which is becoming branch-2).
> >>
> >> Cheers
> >>
> >>
> >>
> >> On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com>
> >> wrote:
> >>
> >>> Hi Sean, I did a quick check with someone from the Spark team here and his
> >>> opinion was that the hbase-spark module as it currently stands can be used
> >>> by downstream users to do basic stuff and to try some simple things out,
> >>> etc. The integration is improving.
> >>> I think we should get what we have in 2.0 (which is the default action
> >>> anyways).
> >>> Thanks
> >>> Devaraj
> >>> ________________________________________
> >>> From: Sean Busbey <bu...@apache.org>
> >>> Sent: Wednesday, November 16, 2016 9:49 AM
> >>> To: dev
> >>> Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Andrew Purtell <an...@gmail.com>.

> On Jan 14, 2017, at 9:07 PM, Jerry He <je...@gmail.com> wrote:
> 
> I think it will be a big disappointment for the community if the
> hbase-spark module is not going into 2.0.
> I understand there are still a few blockers, including HBASE-16179.

Patches welcome. :-) 


> We have it in our distribution, probably in other vendors' as well. It is
> a little easier for us because we can be flexible on the supported
> Spark/Scala version combinations and the APIs.
> But a major release still without a good Spark story for the HBase open
> source community does not look good.
> 
> Jerry
> 
>> On Sat, Jan 14, 2017 at 4:52 PM, Ted Yu <yu...@gmail.com> wrote:
>> 
>> I agree with Devaraj's assessment w.r.t. hbase-spark module in master
>> (which is becoming branch-2).
>> 
>> Cheers
>> 
>> 
>> 
>> On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com>
>> wrote:
>> 
>>> Hi Sean, I did a quick check with someone from the Spark team here and his
>>> opinion was that the hbase-spark module as it currently stands can be used
>>> by downstream users to do basic stuff and to try some simple things out,
>>> etc. The integration is improving.
>>> I think we should get what we have in 2.0 (which is the default action
>>> anyways).
>>> Thanks
>>> Devaraj
>>> ________________________________________
>>> From: Sean Busbey <bu...@apache.org>
>>> Sent: Wednesday, November 16, 2016 9:49 AM
>>> To: dev
>>> Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Jerry He <je...@gmail.com>.
I think it will be a big disappointment for the community if the
hbase-spark module is not going into 2.0.
I understand there are still a few blockers, including HBASE-16179.
We have it in our distribution, probably in other vendors' as well. It is
a little easier for us because we can be flexible on the supported
Spark/Scala version combinations and the APIs.
But a major release still without a good Spark story for the HBase open
source community does not look good.

Jerry

On Sat, Jan 14, 2017 at 4:52 PM, Ted Yu <yu...@gmail.com> wrote:

> I agree with Devaraj's assessment w.r.t. hbase-spark module in master
> (which is becoming branch-2).
>
> Cheers
>
>
>
> On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com>
> wrote:
>
> > Hi Sean, I did a quick check with someone from the Spark team here and his
> > opinion was that the hbase-spark module as it currently stands can be used
> > by downstream users to do basic stuff and to try some simple things out,
> > etc. The integration is improving.
> > I think we should get what we have in 2.0 (which is the default action
> > anyways).
> > Thanks
> > Devaraj
> > ________________________________________
> > From: Sean Busbey <bu...@apache.org>
> > Sent: Wednesday, November 16, 2016 9:49 AM
> > To: dev
> > Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Ted Yu <yu...@gmail.com>.
After HBASE-16179 gets reviewed / committed, I should be able to take on
other high priority Spark connector issues.

Cheers

On Wed, Jan 18, 2017 at 12:30 PM, Sean Busbey <bu...@apache.org> wrote:

> I don't doubt that downstream users could "try out" our integration
> using what currently exists in branch-2. However, we already had
> community consensus on what is necessary for our downstream folks to
> have a good experience with a ready-for-production feature. I don't
> see why we should subject them to a lower bar in a branch-2 release
> than we would have in a branch-1 release just because we're starting
> up a new major version.
>
> The work in HBASE-16179 is certainly a blocker given the rising
> popularity of Spark 2.0 (thanks Ted for getting that work under way, I
> hope we get sufficient review bandwidth to get it finished), but it's not
> everything; e.g. we don't have regression checks in place for the
> things that show up in our docs.
>
> -
> busbey
>
> On Sat, Jan 14, 2017 at 4:52 PM, Ted Yu <yu...@gmail.com> wrote:
> > I agree with Devaraj's assessment w.r.t. hbase-spark module in master
> > (which is becoming branch-2).
> >
> > Cheers
> >
> >
> >
> > On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com> wrote:
> >
> >> Hi Sean, I did a quick check with someone from the Spark team here and his
> >> opinion was that the hbase-spark module as it currently stands can be used
> >> by downstream users to do basic stuff and to try some simple things out,
> >> etc. The integration is improving.
> >> I think we should get what we have in 2.0 (which is the default action
> >> anyways).
> >> Thanks
> >> Devaraj
> >> ________________________________________
> >> From: Sean Busbey <bu...@apache.org>
> >> Sent: Wednesday, November 16, 2016 9:49 AM
> >> To: dev
> >> Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Sean Busbey <bu...@apache.org>.
I don't doubt that downstream users could "try out" our integration
using what currently exists in branch-2. However, we already had
community consensus on what is necessary for our downstream folks to
have a good experience with a ready-for-production feature. I don't
see why we should subject them to a lower bar in a branch-2 release
than we would have in a branch-1 release just because we're starting
up a new major version.

The work in HBASE-16179 is certainly a blocker given the rising
popularity of Spark 2.0 (thanks Ted for getting that work under way, I
hope we get sufficient review bandwidth to get it finished), but it's not
everything; e.g. we don't have regression checks in place for the
things that show up in our docs.

-
busbey

On Sat, Jan 14, 2017 at 4:52 PM, Ted Yu <yu...@gmail.com> wrote:
> I agree with Devaraj's assessment w.r.t. hbase-spark module in master
> (which is becoming branch-2).
>
> Cheers
>
>
>
> On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>
>> Hi Sean, I did a quick check with someone from the Spark team here and his
>> opinion was that the hbase-spark module as it currently stands can be used
>> by downstream users to do basic stuff and to try some simple things out,
>> etc. The integration is improving.
>> I think we should get what we have in 2.0 (which is the default action
>> anyways).
>> Thanks
>> Devaraj
>> ________________________________________
>> From: Sean Busbey <bu...@apache.org>
>> Sent: Wednesday, November 16, 2016 9:49 AM
>> To: dev
>> Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Ted Yu <yu...@gmail.com>.
I agree with Devaraj's assessment w.r.t. hbase-spark module in master
(which is becoming branch-2).

Cheers



On Mon, Nov 21, 2016 at 11:46 AM, Devaraj Das <dd...@hortonworks.com> wrote:

> Hi Sean, I did a quick check with someone from the Spark team here and his
> opinion was that the hbase-spark module as it currently stands can be used
> by downstream users to do basic stuff and to try some simple things out,
> etc. The integration is improving.
> I think we should get what we have in 2.0 (which is the default action
> anyways).
> Thanks
> Devaraj
> ________________________________________
> From: Sean Busbey <bu...@apache.org>
> Sent: Wednesday, November 16, 2016 9:49 AM
> To: dev
> Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2

Re: [DISCUSS] hbase-spark module in branch-1 and branch-2

Posted by Devaraj Das <dd...@hortonworks.com>.
Hi Sean, I did a quick check with someone from the Spark team here and his opinion was that the hbase-spark module as it currently stands can be used by downstream users to do basic stuff and to try some simple things out, etc. The integration is improving.
I think we should get what we have in 2.0 (which is the default action anyways).
Thanks
Devaraj
________________________________________
From: Sean Busbey <bu...@apache.org>
Sent: Wednesday, November 16, 2016 9:49 AM
To: dev
Subject: [DISCUSS] hbase-spark module in branch-1 and branch-2
