Posted to dev@hbase.apache.org by "Rose, Joseph" <Jo...@childrens.harvard.edu> on 2015/03/12 21:39:08 UTC

Status of Huawei's 2' Indexing?

Hi,

I’ve been looking over the Jira tickets for the secondary indexing
mechanism Huawei had started to integrate back in 2013 (@see
https://issues.apache.org/jira/browse/HBASE-10222 ). The code was
developed against 0.94 and it seems like a lot of work was done — but then
it suddenly stops (the last update to HBASE-10222, the ticket for the work
that actually adds secondary indexes, was a bit over a year ago. The last
update for the load balancer work was from early last fall.)

Is there work on this that I don’t see?

I understand I can run this using Huawei’s code for 0.94 but I was hoping
for a more recent hbase build. And I’ve tried applying the patches in
HBASE-10222 (hope springs eternal); naturally there were some failures. I
thought I’d ask here before trying to work through the failed hunks — and
ask if you think that’s even a good idea in the first place.

Thanks for your input!


-j

Re: Status of Huawei's 2' Indexing?

Posted by Rajeshbabu Chintaguntla <ch...@gmail.com>.

Hi Rose,

Sorry for the late reply.

bq. Is there work on this that I don’t see?
You can try this [1] to check it out against version 0.98.3 (sorry, not
the very latest). We thought of making it independent of HBase and are
trying to do that whenever we find time (only a few kernel changes are left,
in bulkload, to prepare and load data into the data table and all indexes
together in a single job).

bq. Did I miss the mailing list thread where the architectural
differences were discussed?
You can find the discussion from that time here [2].

By the time I started working on this in HBase, a lot had already been done
in Phoenix indexing that I didn't know about, such as 1) failover handling,
2) data type support, 3) maintaining standard index metadata separately in
catalog tables, 4) expression-based filters in Phoenix, and many more, all
of which are missing in hindex. So we thought of integrating the same
solution into Phoenix first, and were able to do it with minimal changes. To
avoid the complexities of colocation, I raised an improvement in Phoenix
that I hope simplifies many things [3].


[1] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
[2] http://search-hadoop.com/m/L1qeI1U99nd1&subj=Re+Design+review+Secondary+index+support+through+coprocess
[3] https://issues.apache.org/jira/browse/PHOENIX-1734

Thanks,
Rajeshbabu.

On Tue, Mar 17, 2015 at 12:52 AM, Michael Segel <mi...@hotmail.com>
wrote:

> You miss the point.
> Your index is going to be orthogonal to your base table.
> Again, how do you handle joins?
>
> In terms of indexing… you have two ways of building your index.
> 1) In a separate M/R job.
> 2) As each row is inserted, the coprocessor inserts the data into the
> secondary indexes.
>
> More to your point…
>
> Yes there is a delta between when you write your row to the base table and
> when you write your row to your inverted index table.
> The short answer is that time is relative and it doesn’t matter.  Again,
> you’re going to have to think about that issue for a while before it sinks
> in. You’re not dealing with an RTOS problem… so it’s not real time but
> subjective real time.
>
> In terms of writing to two tables… what do you think your relational
> database is doing? ;-)
>
> I suggest you think more about the problem and the more you think about
> the problem, you’ll understand that there are tradeoffs and when you walk
> through the problem you’ll come to the conclusion that you want your index
> table(s) to be orthogonal to the base table.

Re: Status of Huawei's 2' Indexing?

Posted by Michael Segel <mi...@hotmail.com>.

You miss the point. 
Your index is going to be orthogonal to your base table. 
Again, how do you handle joins?

In terms of indexing… you have two ways of building your index.
1) In a separate M/R job.
2) As each row is inserted, the coprocessor inserts the data into the secondary indexes.
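
For illustration, the two build paths above can be sketched with plain Python dictionaries standing in for the base table and the index table. This is a toy model under stated assumptions, not HBase or hindex code: build_index_batch, IndexedTable and its put hook are hypothetical names, and the hook only imitates what a coprocessor would do on each insert.

```python
from collections import defaultdict
from typing import Dict, Set

Index = Dict[str, Set[str]]  # indexed value -> base-table row keys


def build_index_batch(base_table: Dict[str, str]) -> Index:
    """Option 1: scan the whole base table in a separate job and emit index entries."""
    index: Index = defaultdict(set)
    for row_key, value in base_table.items():
        index[value].add(row_key)
    return index


class IndexedTable:
    """Option 2: a write hook (standing in for a coprocessor) indexes each row on insert."""

    def __init__(self) -> None:
        self.base_table: Dict[str, str] = {}
        self.index: Index = defaultdict(set)

    def put(self, row_key: str, value: str) -> None:
        old = self.base_table.get(row_key)
        self.base_table[row_key] = value  # write to the base table first
        # Until the next two statements run, the index trails the base table.
        if old is not None:
            self.index[old].discard(row_key)  # drop the stale index entry
        self.index[value].add(row_key)
```

Both paths end up with the same index contents; they differ in when the work happens and in the short window during which a write-time index trails the base table.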

More to your point… 

Yes there is a delta between when you write your row to the base table and when you write your row to your inverted index table. 
The short answer is that time is relative and it doesn’t matter.  Again, you’re going to have to think about that issue for a while before it sinks in. You’re not dealing with an RTOS problem… so it’s not real time but subjective real time.

In terms of writing to two tables… what do you think your relational database is doing? ;-) 

I suggest you think more about the problem and the more you think about the problem, you’ll understand that there are tradeoffs and when you walk through the problem you’ll come to the conclusion that you want your index table(s) to be orthogonal to the base table. 


> On Mar 16, 2015, at 12:54 PM, lars hofhansl <la...@apache.org> wrote:
> 
> Dude... Relax... Let's keep it cordial, please.
> 
> To the topic:
> Any CS 101 student can implement an eventually consistent index on top of HBase.
> The part that is always missed is: How do you keep it consistent? There you have essentially two choices: (1) every update to an indexed table becomes a distributed transaction or (2) you keep region server local indexes.
> There is nothing wrong with #2. It's good for not-so-selective indexes.
> There is also nothing wrong with #1. This one is good for highly selective indexes (PK, etc.).
> 
> Indexes and joins do not have to be conflated. And maybe your use case is fine with eventually consistent indexes. In that case just write your stuff into two tables and be done with it.
> 
> -- Lars

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by lars hofhansl <la...@apache.org>.

Dude... Relax... Let's keep it cordial, please.

To the topic:
Any CS 101 student can implement an eventually consistent index on top of HBase.
The part that is always missed is: How do you keep it consistent? There you have essentially two choices: (1) every update to an indexed table becomes a distributed transaction or (2) you keep region server local indexes.
There is nothing wrong with #2. It's good for not-so-selective indexes.
There is also nothing wrong with #1. This one is good for highly selective indexes (PK, etc.).

Indexes and joins do not have to be conflated. And maybe your use case is fine with eventually consistent indexes. In that case just write your stuff into two tables and be done with it.
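
To make the trade-off between the two choices concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: Region, region_for, put, query_local and query_global are hypothetical names, regions are picked by a checksum rather than by key range, and no real HBase API is involved. It only shows where the extra write and the extra reads fall.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class Region:
    """Toy stand-in for one region: rows plus an index stored beside them."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}  # row key -> indexed column value
        self.local_index: Dict[str, Set[str]] = defaultdict(set)  # value -> row keys

    def put(self, row_key: str, value: str) -> None:
        # Row and local index entry change in the same place, in one step.
        old = self.rows.get(row_key)
        if old is not None:
            self.local_index[old].discard(row_key)
        self.rows[row_key] = value
        self.local_index[value].add(row_key)


def region_for(regions: List[Region], row_key: str) -> Region:
    # Deterministic routing by row key (hypothetical; real regions are key ranges).
    return regions[zlib.crc32(row_key.encode()) % len(regions)]


def put(regions: List[Region], global_index: Dict[str, Set[str]],
        row_key: str, value: str) -> None:
    region = region_for(regions, row_key)
    old = region.rows.get(row_key)
    region.put(row_key, value)
    # Second write, conceptually on another server: this is the step that
    # needs a distributed transaction if the index must never lag the row.
    if old is not None:
        global_index[old].discard(row_key)
    global_index[value].add(row_key)


def query_local(regions: List[Region], value: str) -> Tuple[List[str], int]:
    """Ask every region's local index; returns (row keys, regions contacted)."""
    hits: List[str] = []
    for region in regions:
        hits.extend(region.local_index.get(value, ()))
    return sorted(hits), len(regions)


def query_global(global_index: Dict[str, Set[str]], value: str) -> Tuple[List[str], int]:
    """One lookup in the global index; returns (row keys, lookups made)."""
    return sorted(global_index.get(value, ())), 1
```

In this model a local-index query always touches every region, which is acceptable when many rows match anyway, while a global-index query is a single lookup paid for by a second, cross-server write on every update.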

-- Lars

      From: Michael Segel <mi...@hotmail.com>
 To: dev@hbase.apache.org 
 Sent: Monday, March 16, 2015 8:14 AM
 Subject: Re: Status of Huawei's 2' Indexing?

You’ll have to excuse Andy. 

He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. 

But I digress… The long story short… 

HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on the region which had two problems. 
1) Complexity in that they wanted to keep the index on the same region server
2) Joins become impossible.  Well, actually not impossible, but incredibly slow when compared to the alternative. 

You really should go back to the email chain. 
Their defense (including Salesforce who was going to push this approach) fell apart when you asked the simple question on how do you handle joins? 

That’s their OOPS moment. Once you start to understand that, and allow the index to be orthogonal to the base table, things start to come together. 

In short, you have a query either against a single table or as part of a join.  You then get the indexes and, assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.) 
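
The single-pass intersection described above can be sketched as follows. This is an illustration only, assuming each index lookup returns a sorted, duplicate-free list of base table row keys; intersect_sorted is a hypothetical helper, not part of HBase, Phoenix or hindex.

```python
from typing import List, Sequence


def intersect_sorted(lists: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Return the row keys present in every sorted, duplicate-free input list."""
    if not lists:
        return []
    positions = [0] * len(lists)  # one forward-only cursor per index result set
    result: List[bytes] = []
    while all(pos < len(lst) for pos, lst in zip(positions, lists)):
        current = [lst[pos] for pos, lst in zip(positions, lists)]
        highest = max(current)
        if all(key == highest for key in current):
            # Every cursor points at the same row key: it satisfies all predicates.
            result.append(highest)
            positions = [pos + 1 for pos in positions]
        else:
            # Advance only the cursors that are behind the largest key seen.
            positions = [
                pos + 1 if key < highest else pos
                for pos, key in zip(positions, current)
            ]
    return result
```

Each cursor only ever moves forward, so the work is bounded by the combined length of the lists, and the loop stops as soon as any one list is exhausted.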

Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job.) 

That’s the 30K view.  There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction. 

To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glom on in the client, back to the server side of things. 

Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors which again, suffers from the lack of thought.  This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But that the implementation done in HBase is a tad lacking in thought. 

So in terms of indexing… 
Longer term picture, there have to be some fixes on the server side of things to allow one to associate an index (allowing for different types) to a base table, yet the implementation of using the index would end up becoming a client.  And by client, it would be an external query engine processor that could/should sit on the cluster. 

But hey! What do I know? 
I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics.  ;-) 

> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
> 
> When I made that remark I was thinking of a recent discussion we had at a
> joint Phoenix and HBase developer meetup. The difference of opinion was
> certainly civilized. (smile) I'm not aware of any specific written
> discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203
> would attract some controversy, but let me be clearer this time than I was
> before that this is just my opinion, FWIW.
> 
> 
> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> Joseph.Rose@childrens.harvard.edu> wrote:
> 
>> I saw that it was added to their project. I’m really not keen on bringing
>> in all the RDBMS apparatus on top of hbase, so I decided to follow other
>> avenues first (like trying to patch 0.98, for better or worse.)
>> 
>> That Phoenix article seems like a good breakdown of the various indexing
>> architectures.
>> 
>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
>> are most of them, it seems) so I didn’t know there were these differences
>> of opinion. Did I miss the mailing list thread where the architectural
>> differences were discussed?
>> 
>> 
>> -j

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by Michael Segel <mi...@hotmail.com>.

Andrew, because 2+ years ago,  Phoenix wasn’t an Apache project. 

At the time, Huawei was releasing their research on it and Salesforce was implementing it. 
I mention the company names because those were the parties involved in the work as well as the discussion. Also those companies are mentioned in a lot of the earlier documentation. 

What pretty much ended those conversations is when I asked “How do you handle table Joins?”. And again since Phoenix was a Salesforce.com <http://salesforce.com/> project at the time, his response was that Phoenix doesn’t do table joins.  (Which they supposedly do now…) 

I would have gone further to mention Informix’s XPS Distributed Relational Database, however the last time I talked about some of the lessons learned from the RDBMS advances done back in the 90’s you seemed to have an issue with it.  Of course there we were talking about coprocessors and I compared it to the extensibility done to RDBMSs and what worked and what didn’t.  The irony is that Mike Olson who was part of Illustra is now at Cloudera. (And Informix eventually got it right.)


It’s very disappointing that this issue has been raised again. Once you talk about table Joins the index is orthogonal to the base table and the argument becomes moot. 
Add to this using a different type of index, or allowing multiple indexes to the base table and you now have the issue of column families all over again, but in spades. Again this makes Huawei’s idea unworkable.

It would even be pointless to try and hold a discussion on what should happen client side and what should happen server side to support indexes. 

My suggestion is that when you think you have an answer, stop, go get a few drinks and spend more time thinking about your answer. 

Later


> On Mar 16, 2015, at 11:41 AM, Andrew Purtell <ap...@apache.org> wrote:
> 
> I don't understand the repeated mention of "Salesforce" in that invective.
> As point of fact the work of adding local mutable indexes to Phoenix was
> done by a contributor from Huawei, who has since moved over to Hortonworks,
> if I'm not mistaken - but not like affiliation matters, it really doesn't.
> 
> As for the rest, well I've had to give up on your like and respect, but I
> picked up the pieces of my life a while back after we had that falling out
> over coprocessors.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by Andrew Purtell <ap...@apache.org>.

I don't understand the repeated mention of "Salesforce" in that invective.
As a point of fact, the work of adding local mutable indexes to Phoenix was
done by a contributor from Huawei, who has since moved over to Hortonworks,
if I'm not mistaken - but not like affiliation matters, it really doesn't.

As for the rest, well I've had to give up on your like and respect, but I
picked up the pieces of my life a while back after we had that falling out
over coprocessors.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Status of Huawei's 2' Indexing?

Posted by Stack <st...@duboce.net>.

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel <mi...@hotmail.com>
wrote:

> You’ll have to excuse Andy.
>
> He’s a bit slow.


...


> I gave up trying to have an intelligent/civilized conversation with Andrew
> because he just couldn’t grasp the basics.  ;-)
>
>
>
Michael:

Quit the insults and ad hominem. Stick to the tech.

St.Ack









>
>
>
> > On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
> >
> > When I made that remark I was thinking of a recent discussion we had at a
> > joint Phoenix and HBase developer meetup. The difference of opinion was
> > certainly civilized. (smile) I'm not aware of any specific written
> > discussion, it may or may not exist. I'm pretty sure a revival of
> HBASE-9203
> > would attract some controversy, but let me be clearer this time than I
> was
> > before that this is just my opinion, FWIW.
> >
> >
> > On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> > Joseph.Rose@childrens.harvard.edu> wrote:
> >
> >> I saw that it was added to their project. I’m really not keen on
> bringing
> >> in all the RDBMS apparatus on top of hbase, so I decided to follow other
> >> avenues first (like trying to patch 0.98, for better or worse.)
> >>
> >> That Phoenix article seems like a good breakdown of the various indexing
> >> architectures.
> >>
> >> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized
> (as
> >> are most of them, it seems) so I didn’t know there were these
> differences
> >> of opinion. Did I miss the mailing list thread where the architectural
> >> differences were discussed?
> >>
> >>
> >> -j
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>

Re: Status of Huawei's 2' Indexing?

Posted by "Rose, Joseph" <Jo...@childrens.harvard.edu>.

Thanks, Wilm. I’ll look for the thread there.

Obviously I didn’t realize there was so much back story: I was asking
about this specific implementation because it seems to be fairly well
thought out and have good commentary in the Jira ticket (HBASE-9203). At
the time I thought it was mostly a dev concern. I think we’ve moved on, as
you pointed out.

I'd be happy to contribute to hbase if I have something to offer. I’m just
starting with this, so let’s see where it takes us.

For those of you joining us late, you can find the continuation here:
http://mail-archives.apache.org/mod_mbox/hbase-user/201503.mbox/%3C550722DA.3040009%40gmail.com%3E


-j


On 3/16/15, 2:09 PM, "Wilm Schumacher" <wi...@gmail.com> wrote:

>Hi Joseph,
>
>I think that you kicked off this discussion, because to implement an
>indexing mechanism for hbase in general is much more complicate than
>your specific problem. The people on this list want to bear every
>possible (or at least A LOT) of applications in mind. A too easy
>mechanism wouldn't fit the needs of most of the users (thus would be
>useless), a more complicate model is harder to maintain and you would
>have to find more coders etc.. Thus with your application question you
>seemed to walked right into a very general discussion.
>
>Furthermore this is a user question, as you do not want to change the
>code of hbase, aren't you ;). I'll try an answer on the general user
>list in a couple of minutes, thus more people can discuss and we can get
>traffic out of this list, okay?
>
>Best wishes
>
>Wilm
>
>Am 16.03.2015 um 18:46 schrieb Rose, Joseph:
>> Alright, let’s see if I can get this discussion back on track.
>>
>> I have a sensibly defined table for patient data; its rowkey is simply
>> lastname:firstname, since it’s convenient for the bulk of my lookups.
>> Unfortunately I also need to efficiently find patients using an ID
>>string,
>> whose literal value is buried in a value field. I’m sure this situation
>>is
>> not foreign to the people on this list.
>>
>> It’s been suggested that I implement 2’ indexes myself — fine. All the
>> research I’ve done seems to end with that suggestion, with the exception
>> of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which
>>seems
>> to incite some discussion here). I’m happy to put this together but I’d
>> rather go with something that has been vetted and has a larger developer
>> community than one (i.e., ME). Besides, I have a full enough plate at
>>the
>> moment that I’d rather not have to do this, too.
>>
>> Are there constructive suggestions regarding how I can proceed with
>>HBase?
>> Right now even a well-vetted local index would be a godsend.
>>
>> Thanks.
>>
>>
>> -j
>>
>>
>> p.s., I’ll refer you to this post for a slightly more detailed rundown
>>of
>> how I plan to do things:
>> 
>> http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467
>>
>>
>> On 3/16/15, 12:18 PM, "Michael Segel" <mi...@hotmail.com> wrote:
>>
>>> Joseph, 
>>>
>>> The issue with Andrew goes back a few years.  His comment about having
>>>a
>>> civilized discussion was a personal dig at me.
>>>
>>>
>>>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>>>> <Jo...@childrens.harvard.edu> wrote:
>>>>
>>>> Michael,
>>>>
>>>> I don’t understand the invective. I’m sure you have something to
>>>> contribute but when bring on this tone the only thing I hear are the
>>>> snide
>>>> comments.
>>>>
>>>>
>>>> -j
>>>>
>>>>
>>>> P.s., I’ll refer you to this:
>>>> 
>>>> https://hbase.apache.org/book.html#_joins
>>>>
>>>>
>>>> On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com>
>>>>wrote:
>>>>
>>>>> You’ll have to excuse Andy.
>>>>>
>>>>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And
>>>>>it
>>>>> was trivial. Just got done last month….
>>>>>
>>>>> But I digress… The long story short…
>>>>>
>>>>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index
>>>>> on
>>>>> the region which had two problems.
>>>>> 1) Complexity in that they wanted to keep the index on the same
>>>>>region
>>>>> server
>>>>> 2) Joins become impossible.  Well, actually not impossible, but
>>>>> incredibly slow when compared to the alternative.
>>>>>
>>>>> You really should go back to the email chain.
>>>>> Their defense (including Salesforce who was going to push this
>>>>> approach)
>>>>> fell apart when you asked the simple question on how do you handle
>>>>> joins?
>>>>>
>>>>> That’s their OOPS moment. Once you start to understand that, then
>>>>> allowing the index to be orthogonal to the base table, things started
>>>>> to
>>>>> come together.
>>>>>
>>>>> In short, you have a query either against a single table, or if
>>>>>you’re
>>>>> doing a join.  You then get the indexes and assuming that you’re only
>>>>> using the AND predicate, its a simple intersection of the index
>>>>>result
>>>>> sets. (Since the result sets are ordered, its relatively trivial to
>>>>> walk
>>>>> through and find the intersections of N Lists in a single pass.)
>>>>>
>>>>>
>>>>> Now you have your result set of base table row keys and you can work
>>>>> with
>>>>> that data. (Either returning the records to the client, or as input
>>>>>to
>>>>> a
>>>>> map/reduce job.
>>>>>
>>>>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>>>>> basic idea, they ran with it. It was really that simple concept that
>>>>> the
>>>>> index would be orthogonal to the base table that got them moving in
>>>>>the
>>>>> right direction.
>>>>>
>>>>>
>>>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>>> However,
>>>>> it seems that some of the Committers are suffering from rectal
>>>>>induced
>>>>> hypoxia. HBASE-12853 was created not just to help solve the issue of
>>>>> ‘hot
>>>>> spotting’ but also to get the Committers to focus on bringing the
>>>>> solutions that they glum on in the client, back to the server side of
>>>>> things. 
>>>>>
>>>>> Unfortunately the last great attempt at fixing things on the server
>>>>> side
>>>>> was the bastardization of coprocessors which again, suffers from the
>>>>> lack
>>>>> of thought.  This isn’t to say that allowing users to extend the
>>>>>server
>>>>> side functionality is wrong. (Because it isn’t.) But that the
>>>>> implementation done in HBase is a tad lacking in thought.
>>>>>
>>>>> So in terms of indexing…
>>>>> Longer term picture, there has to be some fixes on the server side of
>>>>> things to allow one to associate an index (allowing for different
>>>>> types)
>>>>> to a base table, yet the implementation of using the index would end
>>>>>up
>>>>> becoming a client.  And by client, it would be an external query
>>>>>engine
>>>>> processor that could/should sit on the cluster.
>>>>>
>>>>> But hey! What do I know?
>>>>> I gave up trying to have an intelligent/civilized conversation with
>>>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> When I made that remark I was thinking of a recent discussion we had
>>>>>> at
>>>>>> a
>>>>>> joint Phoenix and HBase developer meetup. The difference of opinion
>>>>>> was
>>>>>> certainly civilized. (smile) I'm not aware of any specific written
>>>>>> discussion, it may or may not exist. I'm pretty sure a revival of
>>>>>> HBASE-9203
>>>>>> would attract some controversy, but let me be clearer this time
>>>>>>than I
>>>>>> was
>>>>>> before that this is just my opinion, FWIW.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>>>>
>>>>>>> I saw that it was added to their project. I’m really not keen on
>>>>>>> bringing
>>>>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
>>>>>>> other
>>>>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>>>>>
>>>>>>> That Phoenix article seems like a good breakdown of the various
>>>>>>> indexing
>>>>>>> architectures.
>>>>>>>
>>>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
>>>>>>> civilized
>>>>>>> (as
>>>>>>> are most of them, it seems) so I didn’t know there were these
>>>>>>> differences
>>>>>>> of opinion. Did I miss the mailing list thread where the
>>>>>>> architectural
>>>>>>> differences were discussed?
>>>>>>>
>>>>>>>
>>>>>>> -j
>>>>> The opinions expressed here are mine, while they may reflect a
>>>>> cognitive
>>>>> thought, that is purely accidental.
>>>>> Use at your own risk.
>>>>> Michael Segel
>>>>> michael_segel (AT) hotmail.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>> The opinions expressed here are mine, while they may reflect a
>>>cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>>
>>>
>>>
>>>
>>>
>

Re: Status of Huawei's 2' Indexing?

Posted by Wilm Schumacher <wi...@gmail.com>.

Hi Joseph,

I think you kicked off this discussion because implementing a general
indexing mechanism for hbase is much more complicated than solving
your specific problem. The people on this list want to keep every
possible application (or at least A LOT of them) in mind. A mechanism
that is too simple wouldn't fit the needs of most users (and thus would
be useless), while a more complicated model is harder to maintain and
you would have to find more coders, etc. So with your application
question you seem to have walked right into a very general discussion.

Furthermore, this is a user question, as you do not want to change the
code of hbase, do you? ;) I'll try an answer on the general user
list in a couple of minutes, so more people can discuss and we can get
traffic off this list, okay?

Best wishes

Wilm

Am 16.03.2015 um 18:46 schrieb Rose, Joseph:
> Alright, let’s see if I can get this discussion back on track.
>
> I have a sensibly defined table for patient data; its rowkey is simply
> lastname:firstname, since it’s convenient for the bulk of my lookups.
> Unfortunately I also need to efficiently find patients using an ID string,
> whose literal value is buried in a value field. I’m sure this situation is
> not foreign to the people on this list.
>
> It’s been suggested that I implement 2’ indexes myself — fine. All the
> research I’ve done seems to end with that suggestion, with the exception
> of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems
> to incite some discussion here). I’m happy to put this together but I’d
> rather go with something that has been vetted and has a larger developer
> community than one (i.e., ME). Besides, I have a full enough plate at the
> moment that I’d rather not have to do this, too.
>
> Are there constructive suggestions regarding how I can proceed with HBase?
> Right now even a well-vetted local index would be a godsend.
>
> Thanks.
>
>
> -j
>
>
> p.s., I’ll refer you to this post for a slightly more detailed rundown of
> how I plan to do things:
> http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467
>
>
> On 3/16/15, 12:18 PM, "Michael Segel" <mi...@hotmail.com> wrote:
>
>> Joseph, 
>>
>> The issue with Andrew goes back a few years.  His comment about having a
>> civilized discussion was a personal dig at me.
>>
>>
>>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>>> <Jo...@childrens.harvard.edu> wrote:
>>>
>>> Michael,
>>>
>>> I don’t understand the invective. I’m sure you have something to
>>> contribute but when bring on this tone the only thing I hear are the
>>> snide
>>> comments.
>>>
>>>
>>> -j
>>>
>>>
>>> P.s., I’ll refer you to this:
>>> https://hbase.apache.org/book.html#_joins
>>>
>>>
>>> On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com> wrote:
>>>
>>>> You’ll have to excuse Andy.
>>>>
>>>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>>>> was trivial. Just got done last month….
>>>>
>>>> But I digress… The long story short…
>>>>
>>>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index
>>>> on
>>>> the region which had two problems.
>>>> 1) Complexity in that they wanted to keep the index on the same region
>>>> server
>>>> 2) Joins become impossible.  Well, actually not impossible, but
>>>> incredibly slow when compared to the alternative.
>>>>
>>>> You really should go back to the email chain.
>>>> Their defense (including Salesforce who was going to push this
>>>> approach)
>>>> fell apart when you asked the simple question on how do you handle
>>>> joins?
>>>>
>>>> That’s their OOPS moment. Once you start to understand that, then
>>>> allowing the index to be orthogonal to the base table, things started
>>>> to
>>>> come together. 
>>>>
>>>> In short, you have a query either against a single table, or if you’re
>>>> doing a join.  You then get the indexes and assuming that you’re only
>>>> using the AND predicate, its a simple intersection of the index result
>>>> sets. (Since the result sets are ordered, its relatively trivial to
>>>> walk
>>>> through and find the intersections of N Lists in a single pass.)
>>>>
>>>>
>>>> Now you have your result set of base table row keys and you can work
>>>> with
>>>> that data. (Either returning the records to the client, or as input to
>>>> a
>>>> map/reduce job.
>>>>
>>>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>>>> basic idea, they ran with it. It was really that simple concept that
>>>> the
>>>> index would be orthogonal to the base table that got them moving in the
>>>> right direction.
>>>>
>>>>
>>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>> However,
>>>> it seems that some of the Committers are suffering from rectal induced
>>>> hypoxia. HBASE-12853 was created not just to help solve the issue of
>>>> ‘hot
>>>> spotting’ but also to get the Committers to focus on bringing the
>>>> solutions that they glum on in the client, back to the server side of
>>>> things. 
>>>>
>>>> Unfortunately the last great attempt at fixing things on the server
>>>> side
>>>> was the bastardization of coprocessors which again, suffers from the
>>>> lack
>>>> of thought.  This isn’t to say that allowing users to extend the server
>>>> side functionality is wrong. (Because it isn’t.) But that the
>>>> implementation done in HBase is a tad lacking in thought.
>>>>
>>>> So in terms of indexing…
>>>> Longer term picture, there has to be some fixes on the server side of
>>>> things to allow one to associate an index (allowing for different
>>>> types)
>>>> to a base table, yet the implementation of using the index would end up
>>>> becoming a client.  And by client, it would be an external query engine
>>>> processor that could/should sit on the cluster.
>>>>
>>>> But hey! What do I know?
>>>> I gave up trying to have an intelligent/civilized conversation with
>>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org>
>>>>> wrote:
>>>>>
>>>>> When I made that remark I was thinking of a recent discussion we had
>>>>> at
>>>>> a
>>>>> joint Phoenix and HBase developer meetup. The difference of opinion
>>>>> was
>>>>> certainly civilized. (smile) I'm not aware of any specific written
>>>>> discussion, it may or may not exist. I'm pretty sure a revival of
>>>>> HBASE-9203
>>>>> would attract some controversy, but let me be clearer this time than I
>>>>> was
>>>>> before that this is just my opinion, FWIW.
>>>>>
>>>>>
>>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>>>
>>>>>> I saw that it was added to their project. I’m really not keen on
>>>>>> bringing
>>>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
>>>>>> other
>>>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>>>>
>>>>>> That Phoenix article seems like a good breakdown of the various
>>>>>> indexing
>>>>>> architectures.
>>>>>>
>>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
>>>>>> civilized
>>>>>> (as
>>>>>> are most of them, it seems) so I didn’t know there were these
>>>>>> differences
>>>>>> of opinion. Did I miss the mailing list thread where the
>>>>>> architectural
>>>>>> differences were discussed?
>>>>>>
>>>>>>
>>>>>> -j
>>>> The opinions expressed here are mine, while they may reflect a
>>>> cognitive
>>>> thought, that is purely accidental.
>>>> Use at your own risk.
>>>> Michael Segel
>>>> michael_segel (AT) hotmail.com
>>>>
>>>>
>>>>
>>>>
>>>>
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>>
>>
>>
>>
>>

Re: Status of Huawei's 2' Indexing?

Posted by Andrew Purtell <ap...@apache.org>.

That's patently untrue and pure paranoia. The comment about having a
civilized discussion had nothing to do with you, Michael. Joseph said:

"HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are
most of them, it seems)"


and so I responded as you saw. I was not thinking of you; I swear I never
think of you unless you write in and call me names. Please let these nice
people get back to the topic at hand.



>
> On 3/16/15, 12:18 PM, "Michael Segel" <mi...@hotmail.com> wrote:
>
> >Joseph,
> >
> >The issue with Andrew goes back a few years.  His comment about having a
> >civilized discussion was a personal dig at me.
> >
> >
> >> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
> >><Jo...@childrens.harvard.edu> wrote:
> >>
> >> Michael,
> >>
> >> I don’t understand the invective. I’m sure you have something to
> >> contribute but when bring on this tone the only thing I hear are the
> >>snide
> >> comments.
> >>
> >>
> >> -j
> >>
> >>
> >> P.s., I’ll refer you to this:
> >>
> >> https://hbase.apache.org/book.html#_joins
> >>
> >>
> >> On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com>
> wrote:
> >>
> >>> You’ll have to excuse Andy.
> >>>
> >>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
> >>> was trivial. Just got done last month….
> >>>
> >>> But I digress… The long story short…
> >>>
> >>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index
> >>>on
> >>> the region which had two problems.
> >>> 1) Complexity in that they wanted to keep the index on the same region
> >>> server
> >>> 2) Joins become impossible.  Well, actually not impossible, but
> >>> incredibly slow when compared to the alternative.
> >>>
> >>> You really should go back to the email chain.
> >>> Their defense (including Salesforce who was going to push this
> >>>approach)
> >>> fell apart when you asked the simple question on how do you handle
> >>>joins?
> >>>
> >>> That’s their OOPS moment. Once you start to understand that, then
> >>> allowing the index to be orthogonal to the base table, things started
> >>>to
> >>> come together.
> >>>
> >>> In short, you have a query either against a single table, or if you’re
> >>> doing a join.  You then get the indexes and assuming that you’re only
> >>> using the AND predicate, its a simple intersection of the index result
> >>> sets. (Since the result sets are ordered, its relatively trivial to
> >>>walk
> >>> through and find the intersections of N Lists in a single pass.)
> >>>
> >>>
> >>> Now you have your result set of base table row keys and you can work
> >>>with
> >>> that data. (Either returning the records to the client, or as input to
> >>>a
> >>> map/reduce job.
> >>>
> >>> That’s the 30K view.  There’s more to it, but once Salesforce got the
> >>> basic idea, they ran with it. It was really that simple concept that
> >>>the
> >>> index would be orthogonal to the base table that got them moving in the
> >>> right direction.
> >>>
> >>>
> >>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
> >>>However,
> >>> it seems that some of the Committers are suffering from rectal induced
> >>> hypoxia. HBASE-12853 was created not just to help solve the issue of
> >>>‘hot
> >>> spotting’ but also to get the Committers to focus on bringing the
> >>> solutions that they glum on in the client, back to the server side of
> >>> things.
> >>>
> >>> Unfortunately the last great attempt at fixing things on the server
> >>>side
> >>> was the bastardization of coprocessors which again, suffers from the
> >>>lack
> >>> of thought.  This isn’t to say that allowing users to extend the server
> >>> side functionality is wrong. (Because it isn’t.) But that the
> >>> implementation done in HBase is a tad lacking in thought.
> >>>
> >>> So in terms of indexing…
> >>> Longer term picture, there has to be some fixes on the server side of
> >>> things to allow one to associate an index (allowing for different
> >>>types)
> >>> to a base table, yet the implementation of using the index would end up
> >>> becoming a client.  And by client, it would be an external query engine
> >>> processor that could/should sit on the cluster.
> >>>
> >>> But hey! What do I know?
> >>> I gave up trying to have an intelligent/civilized conversation with
> >>> Andrew because he just couldn’t grasp the basics.  ;-)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org>
> >>>>wrote:
> >>>>
> >>>> When I made that remark I was thinking of a recent discussion we had
> >>>>at
> >>>> a
> >>>> joint Phoenix and HBase developer meetup. The difference of opinion
> >>>>was
> >>>> certainly civilized. (smile) I'm not aware of any specific written
> >>>> discussion, it may or may not exist. I'm pretty sure a revival of
> >>>> HBASE-9203
> >>>> would attract some controversy, but let me be clearer this time than I
> >>>> was
> >>>> before that this is just my opinion, FWIW.
> >>>>
> >>>>
> >>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> >>>> Joseph.Rose@childrens.harvard.edu> wrote:
> >>>>
> >>>>> I saw that it was added to their project. I’m really not keen on
> >>>>> bringing
> >>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
> >>>>> other
> >>>>> avenues first (like trying to patch 0.98, for better or worse.)
> >>>>>
> >>>>> That Phoenix article seems like a good breakdown of the various
> >>>>> indexing
> >>>>> architectures.
> >>>>>
> >>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
> >>>>>civilized
> >>>>> (as
> >>>>> are most of them, it seems) so I didn’t know there were these
> >>>>> differences
> >>>>> of opinion. Did I miss the mailing list thread where the
> >>>>>architectural
> >>>>> differences were discussed?
> >>>>>
> >>>>>
> >>>>> -j
> >>>
> >>> The opinions expressed here are mine, while they may reflect a
> >>>cognitive
> >>> thought, that is purely accidental.
> >>> Use at your own risk.
> >>> Michael Segel
> >>> michael_segel (AT) hotmail.com
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >The opinions expressed here are mine, while they may reflect a cognitive
> >thought, that is purely accidental.
> >Use at your own risk.
> >Michael Segel
> >michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Status of Huawei's 2' Indexing?

Posted by Michael Segel <mi...@hotmail.com>.

Joseph, 

First, I would strongly recommend against using HBase… but since you insist.

Let's start with your row key.

1) REMEMBER HIPAA

2) How are you going to access the base table? 


So if, for example, you’re never going to do a “Get me Mary Smith’s record” lookup but rather “Show me all of the patients who had a positive TB test and cluster them by zip code…”, you may want to consider using a UUID as the row key, since you’re always going to go after the data via an index.

If you want to use a patient’s name, e.g. “last|first”, you will want to take the hash of it.
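
As a rough illustration of the hashed row key idea, here is a minimal sketch in Python rather than HBase client code. The function name, the lower-casing step, and the choice of SHA-256 are assumptions made for the example; none of them is specified above.

```python
import hashlib


def hashed_row_key(last: str, first: str) -> bytes:
    """Build a fixed-length row key from a "last|first" patient name.

    Hypothetical helper: the name is normalized to lower case (an
    assumption) so that "Smith|Mary" and "smith|mary" map to one key.
    """
    natural_key = f"{last}|{first}".lower().encode("utf-8")
    # SHA-256 yields a 32-byte digest; any stable hash would do here.
    return hashlib.sha256(natural_key).digest()


if __name__ == "__main__":
    print(hashed_row_key("Smith", "Mary").hex())
```

Note that a hashed key gives up ordered scans by name, and an unsalted hash of a name can still be guessed by hashing candidate names, so on its own it should not be treated as de-identification.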

Now let's talk about indexing.

First, what’s the use case for the database?
Do you want real time access to specific records? Then you would want to consider using Lucene; however, that would be a bit more heavy lifting.

The simplest index is an inverted table index.
There are two ways to implement it.

One is to use the attribute value as the index row key, with each column containing the RowKey of a matching base table row. Use the RowKey’s value as the column header as well, so that you get your results in sort order.

The other way is to create a single row per record, where the index rowkey is “attribute|RowKey” (the attribute value followed by the base table RowKey) and the only column is the RowKey itself.

This is really skinny table vs. fat table, and of course you could do something in the middle that limits each row to N columns, in which case your result set is a set of rows.

That’s pretty much it. You build your index either via an M/R job or, as you insert a row into the base table, by inserting into the index at the same time.
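
The two layouts can be sketched as follows. This is a minimal in-memory model in Python, not HBase client code: plain dicts and a sorted list stand in for tables, and all function names, the sample data, and the "|" separator handling are assumptions for illustration. In particular it assumes the separator never appears inside an attribute value.

```python
from bisect import bisect_left

SEP = "|"  # assumed never to occur inside an attribute value


def build_wide_index(base_rows, attribute):
    """Fat layout: one index row per attribute value.

    Each column name and column value is a base-table row key.
    """
    index = {}
    for row_key, columns in base_rows.items():
        value = columns.get(attribute)
        if value is not None:
            index.setdefault(value, {})[row_key] = row_key
    return index


def lookup_wide(index, value):
    """Return base row keys for a value, in sorted (column) order."""
    return sorted(index.get(value, {}))


def build_tall_index(base_rows, attribute):
    """Skinny layout: one index row per base row, keyed "value|rowkey".

    Kept as a sorted list to mimic a table ordered by row key.
    """
    entries = []
    for row_key, columns in base_rows.items():
        value = columns.get(attribute)
        if value is not None:
            entries.append((f"{value}{SEP}{row_key}", row_key))
    entries.sort()
    return entries


def lookup_tall(entries, value):
    """Prefix scan: walk the sorted rows that start with "value|"."""
    prefix = f"{value}{SEP}"
    start = bisect_left(entries, (prefix,))
    matches = []
    for index_key, base_key in entries[start:]:
        if not index_key.startswith(prefix):
            break
        matches.append(base_key)
    return matches


if __name__ == "__main__":
    base = {
        "smith|mary": {"zip": "02115"},
        "jones|al": {"zip": "10001"},
        "smith|john": {"zip": "02115"},
    }
    print(lookup_wide(build_wide_index(base, "zip"), "02115"))
    print(lookup_tall(build_tall_index(base, "zip"), "02115"))
```

Both lookups return the same base table row keys. The fat layout answers with a single row read, while the skinny layout answers with a short range scan and keeps every index row small.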




> On Mar 16, 2015, at 12:46 PM, Rose, Joseph <Jo...@childrens.harvard.edu> wrote:
> 
> Alright, let’s see if I can get this discussion back on track.
> 
> I have a sensibly defined table for patient data; its rowkey is simply
> lastname:firstname, since it’s convenient for the bulk of my lookups.
> Unfortunately I also need to efficiently find patients using an ID string,
> whose literal value is buried in a value field. I’m sure this situation is
> not foreign to the people on this list.
> 
> It’s been suggested that I implement 2’ indexes myself — fine. All the
> research I’ve done seems to end with that suggestion, with the exception
> of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems
> to incite some discussion here). I’m happy to put this together but I’d
> rather go with something that has been vetted and has a larger developer
> community than one (i.e., ME). Besides, I have a full enough plate at the
> moment that I’d rather not have to do this, too.
> 
> Are there constructive suggestions regarding how I can proceed with HBase?
> Right now even a well-vetted local index would be a godsend.
> 
> Thanks.
> 
> 
> -j
> 
> 
> p.s., I’ll refer you to this post for a slightly more detailed rundown of
> how I plan to do things:
> http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467
> 
> 
> On 3/16/15, 12:18 PM, "Michael Segel" <mi...@hotmail.com> wrote:
> 
>> Joseph, 
>> 
>> The issue with Andrew goes back a few years.  His comment about having a
>> civilized discussion was a personal dig at me.
>> 
>> 
>>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>>> <Jo...@childrens.harvard.edu> wrote:
>>> 
>>> Michael,
>>> 
>>> I don’t understand the invective. I’m sure you have something to
>>> contribute but when bring on this tone the only thing I hear are the
>>> snide
>>> comments.
>>> 
>>> 
>>> -j
>>> 
>>> 
>>> P.s., I’ll refer you to this:
>>> https://hbase.apache.org/book.html#_joins
>>> 
>>> 
>>> On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com> wrote:
>>> 
>>>> You’ll have to excuse Andy.
>>>> 
>>>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>>>> was trivial. Just got done last month….
>>>> 
>>>> But I digress… The long story short…
>>>> 
>>>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index
>>>> on
>>>> the region which had two problems.
>>>> 1) Complexity in that they wanted to keep the index on the same region
>>>> server
>>>> 2) Joins become impossible.  Well, actually not impossible, but
>>>> incredibly slow when compared to the alternative.
>>>> 
>>>> You really should go back to the email chain.
>>>> Their defense (including Salesforce who was going to push this
>>>> approach)
>>>> fell apart when you asked the simple question on how do you handle
>>>> joins?
>>>> 
>>>> That’s their OOPS moment. Once you start to understand that, then
>>>> allowing the index to be orthogonal to the base table, things started
>>>> to
>>>> come together. 
>>>> 
>>>> In short, you have a query either against a single table, or if you’re
>>>> doing a join.  You then get the indexes and assuming that you’re only
>>>> using the AND predicate, its a simple intersection of the index result
>>>> sets. (Since the result sets are ordered, its relatively trivial to
>>>> walk
>>>> through and find the intersections of N Lists in a single pass.)
>>>> 
>>>> 
>>>> Now you have your result set of base table row keys and you can work
>>>> with
>>>> that data. (Either returning the records to the client, or as input to
>>>> a
>>>> map/reduce job.
>>>> 
>>>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>>>> basic idea, they ran with it. It was really that simple concept that
>>>> the
>>>> index would be orthogonal to the base table that got them moving in the
>>>> right direction.
>>>> 
>>>> 
>>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>> However,
>>>> it seems that some of the Committers are suffering from rectal induced
>>>> hypoxia. HBASE-12853 was created not just to help solve the issue of
>>>> ‘hot
>>>> spotting’ but also to get the Committers to focus on bringing the
>>>> solutions that they glum on in the client, back to the server side of
>>>> things. 
>>>> 
>>>> Unfortunately the last great attempt at fixing things on the server
>>>> side
>>>> was the bastardization of coprocessors which again, suffers from the
>>>> lack
>>>> of thought.  This isn’t to say that allowing users to extend the server
>>>> side functionality is wrong. (Because it isn’t.) But that the
>>>> implementation done in HBase is a tad lacking in thought.
>>>> 
>>>> So in terms of indexing…
>>>> Longer term picture, there has to be some fixes on the server side of
>>>> things to allow one to associate an index (allowing for different
>>>> types)
>>>> to a base table, yet the implementation of using the index would end up
>>>> becoming a client.  And by client, it would be an external query engine
>>>> processor that could/should sit on the cluster.
>>>> 
>>>> But hey! What do I know?
>>>> I gave up trying to have an intelligent/civilized conversation with
>>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org>
>>>>> wrote:
>>>>> 
>>>>> When I made that remark I was thinking of a recent discussion we had
>>>>> at
>>>>> a
>>>>> joint Phoenix and HBase developer meetup. The difference of opinion
>>>>> was
>>>>> certainly civilized. (smile) I'm not aware of any specific written
>>>>> discussion, it may or may not exist. I'm pretty sure a revival of
>>>>> HBASE-9203
>>>>> would attract some controversy, but let me be clearer this time than I
>>>>> was
>>>>> before that this is just my opinion, FWIW.
>>>>> 
>>>>> 
>>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>>> 
>>>>>> I saw that it was added to their project. I’m really not keen on
>>>>>> bringing
>>>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
>>>>>> other
>>>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>>>> 
>>>>>> That Phoenix article seems like a good breakdown of the various
>>>>>> indexing
>>>>>> architectures.
>>>>>> 
>>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
>>>>>> civilized
>>>>>> (as
>>>>>> are most of them, it seems) so I didn’t know there were these
>>>>>> differences
>>>>>> of opinion. Did I miss the mailing list thread where the
>>>>>> architectural
>>>>>> differences were discussed?
>>>>>> 
>>>>>> 
>>>>>> -j
>>>> 
>>>> The opinions expressed here are mine, while they may reflect a
>>>> cognitive
>>>> thought, that is purely accidental.
>>>> Use at your own risk.
>>>> Michael Segel
>>>> michael_segel (AT) hotmail.com
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>> 
>> 
>> 
>> 
>> 
> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by Wilm Schumacher <wi...@gmail.com>.

Damn it. Sorry for the double post; I forgot something.

On 16.03.2015 at 19:37, Wilm Schumacher wrote:
> * First ... MacGyver your own index.
>
> That's not that complicate as it sounds. A very easy idea would be the
> update within the CRUD operations on your data. Within a
>
> Put put =  new Put( Bytes.toBytes( id ) );
> put.add( Bytes.toBytes( "firstname" ) , firstname );
> put.add( Bytes.toBytes( "lastname" ) , lastname );
>
> make an additional
> Put indexPut = new Put( Bytes.toBytes( lastname+":"+firstname ) )
> indexPut.add( Bytes.toBytes( id ) , null );
>
> ...
> <put to tables>
Something like 1.b.) You could implement your index on the client side, as
pictured above. Or you could go with coprocessors. That way your client
code doesn't have to deal with the index: on every put or delete, the
above operations are triggered on the server. This would make the indexing
system more robust, as the clients couldn't corrupt your index by being
killed or whatever else could happen on the client side.

Best wishes

Wilm

Re: Status of Huawei's 2' Indexing?

Posted by Wilm Schumacher <wi...@gmail.com>.

Hi,

This is a cross-post from the dev list. Perhaps more people here have
valuable hints or ideas.

On 16.03.2015 at 18:46, Rose, Joseph wrote:
> Alright, let’s see if I can get this discussion back on track.
>
> I have a sensibly defined table for patient data; its rowkey is simply
> lastname:firstname, since it’s convenient for the bulk of my lookups.
> Unfortunately I also need to efficiently find patients using an ID string,
> whose literal value is buried in a value field. I’m sure this situation is
> not foreign to the people on this list.
>
> It’s been suggested that I implement 2’ indexes myself — fine. All the
> research I’ve done seems to end with that suggestion, with the exception
> of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems
> to incite some discussion here). I’m happy to put this together but I’d
> rather go with something that has been vetted and has a larger developer
> community than one (i.e., ME). Besides, I have a full enough plate at the
> moment that I’d rather not have to do this, too.
>
> Are there constructive suggestions regarding how I can proceed with HBase?
> Right now even a well-vetted local index would be a godsend.

Well, first I have a question. Is "lastname:firstname" a good idea for a
row key? Is a name that specific? I think your row key should be the
ID rather than the names, as it can be made unique (a UUID or whatever).
However, the problem still stands, as the roles are just switched: you
need an index for either the IDs or the names.

The following assumes the ID as row key and lastname:firstname as
index data.

I can imagine 3 solutions:

* First ... MacGyver your own index.

That's not as complicated as it sounds. A very easy approach would be to
update the index within the CRUD operations on your data. Alongside a

Put put =  new Put( Bytes.toBytes( id ) );
put.add( Bytes.toBytes( "firstname" ) , firstname );
put.add( Bytes.toBytes( "lastname" ) , lastname );

make an additional
Put indexPut = new Put( Bytes.toBytes( lastname+":"+firstname ) );
indexPut.add( Bytes.toBytes( id ) , null );

...
<put to tables>

Deleting is practically the same. Just fetch the row for the ID, get the
lastname, firstname combination and kick the ID out of the index.

This way you can just fetch the row "lastname:firstname" and get all
possible IDs as column qualifiers. And that's it ... almost. The "risk"
here is that an HBase write throws an error after the data is added but
before the index is refreshed. Thus you have to write a little more code
to catch "ID not existing" errors.
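To make the bookkeeping above concrete, here is a minimal sketch of the dual-write pattern. It is not HBase client code: two in-memory sorted maps stand in for the data table and the index table, and the class and method names (ClientSideIndexSketch, put, delete, lookupIds) are made up for illustration. The two writes are deliberately separate, as they would be against two real tables, so the lookup filters out index entries whose ID no longer exists.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for a data table plus a hand-maintained index table. */
final class ClientSideIndexSketch {
    // Data table: row key = patient ID, value = {lastname, firstname}.
    private final Map<String, String[]> dataTable = new TreeMap<>();
    // Index table: row key = "lastname:firstname", "column qualifiers" = patient IDs.
    private final Map<String, Set<String>> indexTable = new TreeMap<>();

    private static String indexKey(String lastname, String firstname) {
        return lastname + ":" + firstname;
    }

    /** Writes the data row, then the index entry (two separate writes, not atomic). */
    void put(String id, String lastname, String firstname) {
        String[] old = dataTable.put(id, new String[] {lastname, firstname});
        if (old != null) {
            // The name may have changed: drop the entry under the old index key.
            removeIndexEntry(indexKey(old[0], old[1]), id);
        }
        indexTable.computeIfAbsent(indexKey(lastname, firstname), k -> new TreeSet<>()).add(id);
    }

    /** Deletes the data row, then kicks the ID out of the index. */
    void delete(String id) {
        String[] old = dataTable.remove(id);
        if (old != null) {
            removeIndexEntry(indexKey(old[0], old[1]), id);
        }
    }

    /** Returns all IDs indexed under the name, skipping IDs that no longer exist. */
    Set<String> lookupIds(String lastname, String firstname) {
        Set<String> ids = new TreeSet<>();
        Set<String> indexed =
                indexTable.getOrDefault(indexKey(lastname, firstname), Collections.emptySet());
        for (String id : indexed) {
            if (dataTable.containsKey(id)) {
                ids.add(id);
            }
        }
        return ids;
    }

    private void removeIndexEntry(String key, String id) {
        Set<String> ids = indexTable.get(key);
        if (ids == null) {
            return;
        }
        ids.remove(id);
        if (ids.isEmpty()) {
            indexTable.remove(key);
        }
    }

    public static void main(String[] args) {
        ClientSideIndexSketch tables = new ClientSideIndexSketch();
        tables.put("id-1", "Smith", "Ann");
        tables.put("id-2", "Smith", "Ann");
        System.out.println(tables.lookupIds("Smith", "Ann"));
    }
}
```

Against real tables, each map operation would become a Put, Delete or Get, and a client dying between the two writes is exactly the case the lookup filter guards against.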

Furthermore, you would have to run a small MapReduce job now and then
(perhaps every night or so) which runs through the rows and refreshes the
index, then runs through the index and kicks out entries whose ID is no
longer present, in case you missed something above.
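The logic of such a repair job can be sketched in a few lines. This is only an illustration of the two passes, again with in-memory maps standing in for the tables rather than an actual MapReduce job; IndexRepairSketch and repair are hypothetical names, and the data table is simplified to map each ID straight to its "lastname:firstname" key.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Two-pass reconciliation of a hand-maintained index against the data table. */
final class IndexRepairSketch {
    /**
     * dataTable maps patient ID to "lastname:firstname"; indexTable maps
     * "lastname:firstname" to the set of IDs filed under that name.
     * Returns the number of index entries that were added or removed.
     */
    static int repair(Map<String, String> dataTable, Map<String, Set<String>> indexTable) {
        int fixes = 0;
        // Pass 1: walk the data rows and add any index entry that is missing.
        for (Map.Entry<String, String> row : dataTable.entrySet()) {
            Set<String> ids = indexTable.computeIfAbsent(row.getValue(), k -> new TreeSet<>());
            if (ids.add(row.getKey())) {
                fixes++;
            }
        }
        // Pass 2: walk the index and drop entries whose ID is gone or now carries another name.
        Iterator<Map.Entry<String, Set<String>>> rows = indexTable.entrySet().iterator();
        while (rows.hasNext()) {
            Map.Entry<String, Set<String>> indexRow = rows.next();
            Iterator<String> ids = indexRow.getValue().iterator();
            while (ids.hasNext()) {
                String id = ids.next();
                if (!indexRow.getKey().equals(dataTable.get(id))) {
                    ids.remove();
                    fixes++;
                }
            }
            if (indexRow.getValue().isEmpty()) {
                rows.remove();
            }
        }
        return fixes;
    }

    public static void main(String[] args) {
        Map<String, String> data = new TreeMap<>();
        data.put("id-1", "Smith:Ann");
        Map<String, Set<String>> index = new TreeMap<>();
        index.put("Smith:Ann", new TreeSet<>(Arrays.asList("id-1", "id-9")));
        System.out.println(repair(data, index) + " fixes, index is now " + index);
    }
}
```

In a real job the first pass would be a scan over the data table and the second a scan over the index table, each issuing Puts or Deletes for the differences it finds.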

If you are new to HBase this perhaps sounds a little complicated, but
actually it's simple. If you are interested I could send you some small
snippets directly.

* Second ... Lucene

Lucene is an indexing system out of the box. As I wrote some days ago: with
HBase comes all the fancy Apache/Hadoop stuff. With Lucene you can
implement a search method for your data, e.g. on ... drugs the people
have already had. A fancy feature for your application. And of course you
could search for "firstname=<firstname> AND lastname=<lastname>", which
would fit your need.

However, this introduces a new system which you have to maintain :/.

* Third ... other index systems, e.g. Elasticsearch

Like the second idea, but fancier and more complicated. More points
of failure etc.

If your application does not need a search method, I would go with 1. If
you have to create a search anyway, I would go with 2 or 3, as you can use
the search facility for your indexing problem right away.

Best wishes,

Wilm

Re: Status of Huawei's 2' Indexing?

Posted by "Rose, Joseph" <Jo...@childrens.harvard.edu>.

Alright, let’s see if I can get this discussion back on track.

I have a sensibly defined table for patient data; its rowkey is simply
lastname:firstname, since it’s convenient for the bulk of my lookups.
Unfortunately I also need to efficiently find patients using an ID string,
whose literal value is buried in a value field. I’m sure this situation is
not foreign to the people on this list.

It’s been suggested that I implement 2’ indexes myself — fine. All the
research I’ve done seems to end with that suggestion, with the exception
of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems
to incite some discussion here). I’m happy to put this together but I’d
rather go with something that has been vetted and has a larger developer
community than one (i.e., ME). Besides, I have a full enough plate at the
moment that I’d rather not have to do this, too.

Are there constructive suggestions regarding how I can proceed with HBase?
Right now even a well-vetted local index would be a godsend.

Thanks.


-j


p.s., I’ll refer you to this post for a slightly more detailed rundown of
how I plan to do things:
http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467


On 3/16/15, 12:18 PM, "Michael Segel" <mi...@hotmail.com> wrote:

>Joseph, 
>
>The issue with Andrew goes back a few years.  His comment about having a
>civilized discussion was a personal dig at me.
>
>
>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>><Jo...@childrens.harvard.edu> wrote:
>> 
>> Michael,
>> 
>> I don’t understand the invective. I’m sure you have something to
>> contribute but when bring on this tone the only thing I hear are the
>>snide
>> comments.
>> 
>> 
>> -j
>> 
>> 
>> P.s., I’ll refer you to this:
>> https://hbase.apache.org/book.html#_joins
>> 
>> 
>> On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com> wrote:
>> 
>>> You’ll have to excuse Andy.
>>> 
>>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>>> was trivial. Just got done last month….
>>> 
>>> But I digress… The long story short…
>>> 
>>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index
>>>on
>>> the region which had two problems.
>>> 1) Complexity in that they wanted to keep the index on the same region
>>> server
>>> 2) Joins become impossible.  Well, actually not impossible, but
>>> incredibly slow when compared to the alternative.
>>> 
>>> You really should go back to the email chain.
>>> Their defense (including Salesforce who was going to push this
>>>approach)
>>> fell apart when you asked the simple question on how do you handle
>>>joins?
>>> 
>>> That’s their OOPS moment. Once you start to understand that, then
>>> allowing the index to be orthogonal to the base table, things started
>>>to
>>> come together. 
>>> 
>>> In short, you have a query either against a single table, or if you’re
>>> doing a join.  You then get the indexes and assuming that you’re only
>>> using the AND predicate, its a simple intersection of the index result
>>> sets. (Since the result sets are ordered, its relatively trivial to
>>>walk
>>> through and find the intersections of N Lists in a single pass.)
>>> 
>>> 
>>> Now you have your result set of base table row keys and you can work
>>>with
>>> that data. (Either returning the records to the client, or as input to
>>>a
>>> map/reduce job.
>>> 
>>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>>> basic idea, they ran with it. It was really that simple concept that
>>>the
>>> index would be orthogonal to the base table that got them moving in the
>>> right direction.
>>> 
>>> 
>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>However,
>>> it seems that some of the Committers are suffering from rectal induced
>>> hypoxia. HBASE-12853 was created not just to help solve the issue of
>>>‘hot
>>> spotting’ but also to get the Committers to focus on bringing the
>>> solutions that they glum on in the client, back to the server side of
>>> things. 
>>> 
>>> Unfortunately the last great attempt at fixing things on the server
>>>side
>>> was the bastardization of coprocessors which again, suffers from the
>>>lack
>>> of thought.  This isn’t to say that allowing users to extend the server
>>> side functionality is wrong. (Because it isn’t.) But that the
>>> implementation done in HBase is a tad lacking in thought.
>>> 
>>> So in terms of indexing…
>>> Longer term picture, there has to be some fixes on the server side of
>>> things to allow one to associate an index (allowing for different
>>>types)
>>> to a base table, yet the implementation of using the index would end up
>>> becoming a client.  And by client, it would be an external query engine
>>> processor that could/should sit on the cluster.
>>> 
>>> But hey! What do I know?
>>> I gave up trying to have an intelligent/civilized conversation with
>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org>
>>>>wrote:
>>>> 
>>>> When I made that remark I was thinking of a recent discussion we had
>>>>at
>>>> a
>>>> joint Phoenix and HBase developer meetup. The difference of opinion
>>>>was
>>>> certainly civilized. (smile) I'm not aware of any specific written
>>>> discussion, it may or may not exist. I'm pretty sure a revival of
>>>> HBASE-9203
>>>> would attract some controversy, but let me be clearer this time than I
>>>> was
>>>> before that this is just my opinion, FWIW.
>>>> 
>>>> 
>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>> 
>>>>> I saw that it was added to their project. I’m really not keen on
>>>>> bringing
>>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
>>>>> other
>>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>>> 
>>>>> That Phoenix article seems like a good breakdown of the various
>>>>> indexing
>>>>> architectures.
>>>>> 
>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
>>>>>civilized
>>>>> (as
>>>>> are most of them, it seems) so I didn’t know there were these
>>>>> differences
>>>>> of opinion. Did I miss the mailing list thread where the
>>>>>architectural
>>>>> differences were discussed?
>>>>> 
>>>>> 
>>>>> -j
>>> 
>>> The opinions expressed here are mine, while they may reflect a
>>>cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>
>The opinions expressed here are mine, while they may reflect a cognitive
>thought, that is purely accidental.
>Use at your own risk.
>Michael Segel
>michael_segel (AT) hotmail.com
>
>
>
>
>

Re: Status of Huawei's 2' Indexing?

Posted by "Rose, Joseph" <Jo...@childrens.harvard.edu>.

Michael,

I don’t understand the invective. I’m sure you have something to
contribute but when you bring on this tone the only thing I hear is the
snide comments.


-j


P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins


On 3/16/15, 11:15 AM, "Michael Segel" <mi...@hotmail.com> wrote:

>You’ll have to excuse Andy.
>
>He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>was trivial. Just got done last month….
>
>But I digress… The long story short…
>
>HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
>the region which had two problems.
>1) Complexity in that they wanted to keep the index on the same region
>server
>2) Joins become impossible.  Well, actually not impossible, but
>incredibly slow when compared to the alternative.
>
>You really should go back to the email chain.
>Their defense (including Salesforce who was going to push this approach)
>fell apart when you asked the simple question on how do you handle joins?
>
>That’s their OOPS moment. Once you start to understand that, then
>allowing the index to be orthogonal to the base table, things started to
>come together. 
>
>In short, you have a query either against a single table, or if you’re
>doing a join.  You then get the indexes and assuming that you’re only
>using the AND predicate, its a simple intersection of the index result
>sets. (Since the result sets are ordered, its relatively trivial to walk
>through and find the intersections of N Lists in a single pass.)
>
>
>Now you have your result set of base table row keys and you can work with
>that data. (Either returning the records to the client, or as input to a
>map/reduce job. 
>
>That’s the 30K view.  There’s more to it, but once Salesforce got the
>basic idea, they ran with it. It was really that simple concept that the
>index would be orthogonal to the base table that got them moving in the
>right direction. 
>
>
>To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
>it seems that some of the Committers are suffering from rectal induced
>hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot
>spotting’ but also to get the Committers to focus on bringing the
>solutions that they glum on in the client, back to the server side of
>things. 
>
>Unfortunately the last great attempt at fixing things on the server side
>was the bastardization of coprocessors which again, suffers from the lack
>of thought.  This isn’t to say that allowing users to extend the server
>side functionality is wrong. (Because it isn’t.) But that the
>implementation done in HBase is a tad lacking in thought.
>
>So in terms of indexing…
>Longer term picture, there has to be some fixes on the server side of
>things to allow one to associate an index (allowing for different types)
>to a base table, yet the implementation of using the index would end up
>becoming a client.  And by client, it would be an external query engine
>processor that could/should sit on the cluster.
>
>But hey! What do I know?
>I gave up trying to have an intelligent/civilized conversation with
>Andrew because he just couldn’t grasp the basics.  ;-)
>
>
>
>
>
>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
>> 
>> When I made that remark I was thinking of a recent discussion we had at
>>a
>> joint Phoenix and HBase developer meetup. The difference of opinion was
>> certainly civilized. (smile) I'm not aware of any specific written
>> discussion, it may or may not exist. I'm pretty sure a revival of
>>HBASE-9203
>> would attract some controversy, but let me be clearer this time than I
>>was
>> before that this is just my opinion, FWIW.
>> 
>> 
>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>> Joseph.Rose@childrens.harvard.edu> wrote:
>> 
>>> I saw that it was added to their project. I’m really not keen on
>>>bringing
>>> in all the RDBMS apparatus on top of hbase, so I decided to follow
>>>other
>>> avenues first (like trying to patch 0.98, for better or worse.)
>>> 
>>> That Phoenix article seems like a good breakdown of the various
>>>indexing
>>> architectures.
>>> 
>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized
>>>(as
>>> are most of them, it seems) so I didn’t know there were these
>>>differences
>>> of opinion. Did I miss the mailing list thread where the architectural
>>> differences were discussed?
>>> 
>>> 
>>> -j
>
>The opinions expressed here are mine, while they may reflect a cognitive
>thought, that is purely accidental.
>Use at your own risk.
>Michael Segel
>michael_segel (AT) hotmail.com
>
>
>
>
>

Re: Status of Huawei's 2' Indexing?

Posted by Michael Segel <mi...@hotmail.com>.

Sigh. 
Here we go again… 

1) Complexity?

2) Speed when looking at the indexes in a more general case. 

3) Resources required to do the search become excessive...

... 

Again, your indexes will be orthogonal to the base table. 
If you can’t understand that… then you need to sit back, drink a few cocktails (bourbon, single malts, craft beers, vodka, whatever…) AND THINK ABOUT THE PROBLEM.

I think that’s the biggest issue. You’re not thinking about the problem enough before you put hand to keyboard and bang out crappy code. 

To make it simple… so what you’re saying is that you want to have two indexes… one orthogonal to the base table so you can use it for table joins or when you want faster performance, and the second when you want an advanced filter on a region. (God only knows why you would want that…) 

Seriously? 

Apply KISS and then get back to me.  


> On Mar 16, 2015, at 10:38 AM, Vladimir Rodionov <vl...@gmail.com> wrote:
> 
> There is nothing wrong with co-locating index and data on a same RS. This
> will greatly improve single table search. Joins are evil anyway. Leave them
> to RDBMS Zoo.
> 
> -Vlad
> 
> 
> On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> You’ll have to excuse Andy.
>> 
>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>> was trivial. Just got done last month….
>> 
>> But I digress… The long story short…
>> 
>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
>> the region which had two problems.
>> 1) Complexity in that they wanted to keep the index on the same region
>> server
>> 2) Joins become impossible.  Well, actually not impossible, but incredibly
>> slow when compared to the alternative.
>> 
>> You really should go back to the email chain.
>> Their defense (including Salesforce who was going to push this approach)
>> fell apart when you asked the simple question on how do you handle joins?
>> 
>> That’s their OOPS moment. Once you start to understand that, then allowing
>> the index to be orthogonal to the base table, things started to come
>> together.
>> 
>> In short, you have a query either against a single table, or if you’re
>> doing a join.  You then get the indexes and assuming that you’re only using
>> the AND predicate, its a simple intersection of the index result sets.
>> (Since the result sets are ordered, its relatively trivial to walk through
>> and find the intersections of N Lists in a single pass.)
>> 
>> 
>> Now you have your result set of base table row keys and you can work with
>> that data. (Either returning the records to the client, or as input to a
>> map/reduce job.
>> 
>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>> basic idea, they ran with it. It was really that simple concept that the
>> index would be orthogonal to the base table that got them moving in the
>> right direction.
>> 
>> 
>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
>> it seems that some of the Committers are suffering from rectal induced
>> hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot
>> spotting’ but also to get the Committers to focus on bringing the solutions
>> that they glum on in the client, back to the server side of things.
>> 
>> Unfortunately the last great attempt at fixing things on the server side
>> was the bastardization of coprocessors which again, suffers from the lack
>> of thought.  This isn’t to say that allowing users to extend the server
>> side functionality is wrong. (Because it isn’t.) But that the
>> implementation done in HBase is a tad lacking in thought.
>> 
>> So in terms of indexing…
>> Longer term picture, there has to be some fixes on the server side of
>> things to allow one to associate an index (allowing for different types) to
>> a base table, yet the implementation of using the index would end up
>> becoming a client.  And by client, it would be an external query engine
>> processor that could/should sit on the cluster.
>> 
>> But hey! What do I know?
>> I gave up trying to have an intelligent/civilized conversation with Andrew
>> because he just couldn’t grasp the basics.  ;-)
>> 
>> 
>> 
>> 
>> 
>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
>>> 
>>> When I made that remark I was thinking of a recent discussion we had at a
>>> joint Phoenix and HBase developer meetup. The difference of opinion was
>>> certainly civilized. (smile) I'm not aware of any specific written
>>> discussion, it may or may not exist. I'm pretty sure a revival of
>> HBASE-9203
>>> would attract some controversy, but let me be clearer this time than I
>> was
>>> before that this is just my opinion, FWIW.
>>> 
>>> 
>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>> 
>>>> I saw that it was added to their project. I’m really not keen on
>> bringing
>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow other
>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>> 
>>>> That Phoenix article seems like a good breakdown of the various indexing
>>>> architectures.
>>>> 
>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized
>> (as
>>>> are most of them, it seems) so I didn’t know there were these
>> differences
>>>> of opinion. Did I miss the mailing list thread where the architectural
>>>> differences were discussed?
>>>> 
>>>> 
>>>> -j
>> 
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>> 
>> 
>> 
>> 
>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by Vladimir Rodionov <vl...@gmail.com>.

There is nothing wrong with co-locating index and data on a same RS. This
will greatly improve single table search. Joins are evil anyway. Leave them
to RDBMS Zoo.

-Vlad


On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel <mi...@hotmail.com>
wrote:

> You’ll have to excuse Andy.
>
> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
> was trivial. Just got done last month….
>
> But I digress… The long story short…
>
> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
> the region which had two problems.
> 1) Complexity in that they wanted to keep the index on the same region
> server
> 2) Joins become impossible.  Well, actually not impossible, but incredibly
> slow when compared to the alternative.
>
> You really should go back to the email chain.
> Their defense (including Salesforce who was going to push this approach)
> fell apart when you asked the simple question on how do you handle joins?
>
> That’s their OOPS moment. Once you start to understand that, then allowing
> the index to be orthogonal to the base table, things started to come
> together.
>
> In short, you have a query either against a single table, or if you’re
> doing a join.  You then get the indexes and assuming that you’re only using
> the AND predicate, its a simple intersection of the index result sets.
> (Since the result sets are ordered, its relatively trivial to walk through
> and find the intersections of N Lists in a single pass.)
>
>
> Now you have your result set of base table row keys and you can work with
> that data. (Either returning the records to the client, or as input to a
> map/reduce job.
>
> That’s the 30K view.  There’s more to it, but once Salesforce got the
> basic idea, they ran with it. It was really that simple concept that the
> index would be orthogonal to the base table that got them moving in the
> right direction.
>
>
> To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
> it seems that some of the Committers are suffering from rectal induced
> hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot
> spotting’ but also to get the Committers to focus on bringing the solutions
> that they glum on in the client, back to the server side of things.
>
> Unfortunately the last great attempt at fixing things on the server side
> was the bastardization of coprocessors which again, suffers from the lack
> of thought.  This isn’t to say that allowing users to extend the server
> side functionality is wrong. (Because it isn’t.) But that the
> implementation done in HBase is a tad lacking in thought.
>
> So in terms of indexing…
> Longer term picture, there has to be some fixes on the server side of
> things to allow one to associate an index (allowing for different types) to
> a base table, yet the implementation of using the index would end up
> becoming a client.  And by client, it would be an external query engine
> processor that could/should sit on the cluster.
>
> But hey! What do I know?
> I gave up trying to have an intelligent/civilized conversation with Andrew
> because he just couldn’t grasp the basics.  ;-)
>
>
>
>
>
> > On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
> >
> > When I made that remark I was thinking of a recent discussion we had at a
> > joint Phoenix and HBase developer meetup. The difference of opinion was
> > certainly civilized. (smile) I'm not aware of any specific written
> > discussion, it may or may not exist. I'm pretty sure a revival of
> HBASE-9203
> > would attract some controversy, but let me be clearer this time than I
> was
> > before that this is just my opinion, FWIW.
> >
> >
> > On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> > Joseph.Rose@childrens.harvard.edu> wrote:
> >
> >> I saw that it was added to their project. I’m really not keen on bringing
> >> in all the RDBMS apparatus on top of hbase, so I decided to follow other
> >> avenues first (like trying to patch 0.98, for better or worse.)
> >>
> >> That Phoenix article seems like a good breakdown of the various indexing
> >> architectures.
> >>
> >> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
> >> are most of them, it seems) so I didn’t know there were these differences
> >> of opinion. Did I miss the mailing list thread where the architectural
> >> differences were discussed?
> >>
> >>
> >> -j
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>

Re: Status of Huawei's 2' Indexing?

Posted by Michael Segel <mi...@hotmail.com>.

You’ll have to excuse Andy. 

He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. 

But I digress… The long story short… 

HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on the region which had two problems. 
1) Complexity in that they wanted to keep the index on the same region server
2) Joins become impossible.  Well, actually not impossible, but incredibly slow when compared to the alternative. 

You really should go back to the email chain. 
Their defense (including Salesforce, who was going to push this approach) fell apart when you asked the simple question: how do you handle joins?

That’s their OOPS moment. Once you understand that, and allow the index to be orthogonal to the base table, things start to come together.

In short, you have a query either against a single table or across a join.  You then get the indexes and, assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.)

Now you have your result set of base table row keys and you can work with that data (either returning the records to the client, or as input to a map/reduce job).
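
The single-pass intersection described above can be sketched roughly as follows. This is only an illustration, not code from HBase, Phoenix, or the Huawei patches: the function name intersect_sorted and the sample row keys are made up, and it assumes each index lookup has already produced a list of base table row keys sorted in the same ascending byte order with no duplicates.

```python
from typing import List, Sequence


def intersect_sorted(index_results: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Return the row keys that appear in every input list.

    Assumption (not from the thread): each input list is sorted ascending
    in the same byte order and contains no duplicate row keys.
    """
    if not index_results:
        return []

    # One cursor per index result set; each cursor only ever moves forward,
    # so every list is walked at most once.
    positions = [0] * len(index_results)
    matches: List[bytes] = []

    while all(pos < len(keys) for pos, keys in zip(positions, index_results)):
        current = [keys[pos] for pos, keys in zip(positions, index_results)]
        largest = max(current)
        if all(key == largest for key in current):
            # Every index agrees on this row key, so it satisfies the AND.
            matches.append(largest)
            positions = [pos + 1 for pos in positions]
        else:
            # Advance only the cursors that are behind the largest key.
            positions = [
                pos + 1 if key < largest else pos
                for pos, key in zip(positions, current)
            ]
    return matches


if __name__ == "__main__":
    # Hypothetical result sets from three secondary indexes.
    by_city = [b"row1", b"row3", b"row5"]
    by_status = [b"row1", b"row2", b"row3"]
    by_type = [b"row3", b"row5"]
    print(intersect_sorted([by_city, by_status, by_type]))
```

The surviving row keys would then drive the gets against the base table or feed a map/reduce job, as described above. An OR predicate would need a merge (union) of the lists instead, which this sketch does not cover.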

That’s the 30K view.  There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction. 

To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glom on in the client, back to the server side of things.

Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors which again, suffers from the lack of thought.  This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But that the implementation done in HBase is a tad lacking in thought. 

So in terms of indexing… 
Longer term picture, there have to be some fixes on the server side of things to allow one to associate an index (allowing for different types) to a base table, yet the implementation of using the index would end up becoming a client.  And by client, it would be an external query engine processor that could/should sit on the cluster.

But hey! What do I know? 
I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics.  ;-) 

> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <ap...@apache.org> wrote:
> 
> When I made that remark I was thinking of a recent discussion we had at a
> joint Phoenix and HBase developer meetup. The difference of opinion was
> certainly civilized. (smile) I'm not aware of any specific written
> discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203
> would attract some controversy, but let me be clearer this time than I was
> before that this is just my opinion, FWIW.
> 
> 
> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> Joseph.Rose@childrens.harvard.edu> wrote:
> 
>> I saw that it was added to their project. I’m really not keen on bringing
>> in all the RDBMS apparatus on top of hbase, so I decided to follow other
>> avenues first (like trying to patch 0.98, for better or worse.)
>> 
>> That Phoenix article seems like a good breakdown of the various indexing
>> architectures.
>> 
>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
>> are most of them, it seems) so I didn’t know there were these differences
>> of opinion. Did I miss the mailing list thread where the architectural
>> differences were discussed?
>> 
>> 
>> -j

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com

Re: Status of Huawei's 2' Indexing?

Posted by Andrew Purtell <ap...@apache.org>.

When I made that remark I was thinking of a recent discussion we had at a
joint Phoenix and HBase developer meetup. The difference of opinion was
certainly civilized. (smile) I'm not aware of any specific written
discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203
would attract some controversy, but let me be clearer this time than I was
before that this is just my opinion, FWIW.


On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
Joseph.Rose@childrens.harvard.edu> wrote:

> I saw that it was added to their project. I’m really not keen on bringing
> in all the RDBMS apparatus on top of hbase, so I decided to follow other
> avenues first (like trying to patch 0.98, for better or worse.)
>
> That Phoenix article seems like a good breakdown of the various indexing
> architectures.
>
> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
> are most of them, it seems) so I didn’t know there were these differences
> of opinion. Did I miss the mailing list thread where the architectural
> differences were discussed?
>
>
> -j
>
>
> On 3/12/15, 5:22 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> >There are some substantial architectural differences of opinion among the
> >community on this feature as I understand it, so it's unlikely that JIRA
> >will ever see a commit without a lot more work, if ever.
> >
> >A similar feature was later introduced into Apache Phoenix, which in this
> >context may best be described as an extension package for HBase offering a
> >suite of relational data management features. You may want to check out
> >http://phoenix.apache.org/secondary_indexing.html and
> >https://issues.apache.org/jira/browse/PHOENIX-933 for background.
> >
> >On Thu, Mar 12, 2015 at 1:39 PM, Rose, Joseph <
> >Joseph.Rose@childrens.harvard.edu> wrote:
> >
> >> Hi,
> >>
> >> I’ve been looking over the Jira tickets for the secondary indexing
> >> mechanism Huawei had started to integrate back in 2013 (@see
> >> https://issues.apache.org/jira/browse/HBASE-10222 ). The code was
> >> developed against 0.94 and it seems like a lot of work was done — but
> >>then
> >> it suddenly stops (the last update to HBASE-10222, the ticket for the
> >>work
> >> that actually adds secondary indexes, was a bit over a year ago. The
> >>last
> >> update for the load balancer work was from early last fall.)
> >>
> >> Is there work on this that I don’t see?
> >>
> >> I understand I can run this using Huawei’s code for 0.94 but I was
> >>hoping
> >> for a more recent hbase build. And I’ve tried applying the patches in
> >> HBASE-10222 (hope springs eternal); naturally there were some failures.
> >>I
> >> thought I’d ask here before trying to work through the failed hunks —
> >>and
> >> ask if you think that’s even a good idea in the first place.
> >>
> >> Thanks for your input!
> >>
> >>
> >> -j
> >>
> >>
> >
> >
> >--
> >Best regards,
> >
> >   - Andy
> >
> >Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >(via Tom White)
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Status of Huawei's 2' Indexing?

Posted by "Rose, Joseph" <Jo...@childrens.harvard.edu>.

I saw that it was added to their project. I’m really not keen on bringing
in all the RDBMS apparatus on top of hbase, so I decided to follow other
avenues first (like trying to patch 0.98, for better or worse.)

That Phoenix article seems like a good breakdown of the various indexing
architectures.

HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
are most of them, it seems) so I didn’t know there were these differences
of opinion. Did I miss the mailing list thread where the architectural
differences were discussed?


-j


On 3/12/15, 5:22 PM, "Andrew Purtell" <ap...@apache.org> wrote:

>There are some substantial architectural differences of opinion among the
>community on this feature as I understand it, so it's unlikely that JIRA
>will ever see a commit without a lot more work, if ever.
>
>A similar feature was later introduced into Apache Phoenix, which in this
>context may best be described as an extension package for HBase offering a
>suite of relational data management features. You may want to check out
>http://phoenix.apache.org/secondary_indexing.html and
>https://issues.apache.org/jira/browse/PHOENIX-933 for background.
>
>On Thu, Mar 12, 2015 at 1:39 PM, Rose, Joseph <
>Joseph.Rose@childrens.harvard.edu> wrote:
>
>> Hi,
>>
>> I’ve been looking over the Jira tickets for the secondary indexing
>> mechanism Huawei had started to integrate back in 2013 (@see
>> https://issues.apache.org/jira/browse/HBASE-10222 ). The code was
>> developed against 0.94 and it seems like a lot of work was done — but
>>then
>> it suddenly stops (the last update to HBASE-10222, the ticket for the
>>work
>> that actually adds secondary indexes, was a bit over a year ago. The
>>last
>> update for the load balancer work was from early last fall.)
>>
>> Is there work on this that I don’t see?
>>
>> I understand I can run this using Huawei’s code for 0.94 but I was
>>hoping
>> for a more recent hbase build. And I’ve tried applying the patches in
>> HBASE-10222 (hope springs eternal); naturally there were some failures.
>>I
>> thought I’d ask here before trying to work through the failed hunks —
>>and
>> ask if you think that’s even a good idea in the first place.
>>
>> Thanks for your input!
>>
>>
>> -j
>>
>>
>
>
>-- 
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)

Re: Status of Huawei's 2' Indexing?

Posted by Andrew Purtell <ap...@apache.org>.

There are some substantial architectural differences of opinion among the
community on this feature as I understand it, so it's unlikely that JIRA
will ever see a commit without a lot more work, if ever.

A similar feature was later introduced into Apache Phoenix, which in this
context may best be described as an extension package for HBase offering a
suite of relational data management features. You may want to check out
http://phoenix.apache.org/secondary_indexing.html and
https://issues.apache.org/jira/browse/PHOENIX-933 for background.

On Thu, Mar 12, 2015 at 1:39 PM, Rose, Joseph <
Joseph.Rose@childrens.harvard.edu> wrote:

> Hi,
>
> I’ve been looking over the Jira tickets for the secondary indexing
> mechanism Huawei had started to integrate back in 2013 (@see
> https://issues.apache.org/jira/browse/HBASE-10222 ). The code was
> developed against 0.94 and it seems like a lot of work was done — but then
> it suddenly stops (the last update to HBASE-10222, the ticket for the work
> that actually adds secondary indexes, was a bit over a year ago. The last
> update for the load balancer work was from early last fall.)
>
> Is there work on this that I don’t see?
>
> I understand I can run this using Huawei’s code for 0.94 but I was hoping
> for a more recent hbase build. And I’ve tried applying the patches in
> HBASE-10222 (hope springs eternal); naturally there were some failures. I
> thought I’d ask here before trying to work through the failed hunks — and
> ask if you think that’s even a good idea in the first place.
>
> Thanks for your input!
>
>
> -j
>
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)