Posted to dev@accumulo.apache.org by Donald Miner <dm...@clearedgeit.com> on 2014/04/29 16:20:40 UTC

Writing data from iterators

Bit of a tangent... This came up earlier in the text indexing thread and
below, and is something I've seen come up a couple of other times.

What would it take to make it so something tabletserver-side could write to
Accumulo (in an "acceptable way")? Either via an iterator/constraint or
something new.

-d


On Tue, Apr 29, 2014 at 9:59 AM, Eric Newton <er...@gmail.com> wrote:

> I have taken a quick look at Phoenix.  It's baked into HBase-specific
> features pretty hard.
>
> It uses coprocessors to do things like create index entries.  This is a
> common enough idiom in the HBase community, but not something we've
> supported in Accumulo.  In general, you do not want an Accumulo Iterator or
> Constraint generating data for other tables.
>
> However, a more sophisticated Percolator type implementation (
> https://github.com/keith-turner/Accismus) could support index generation
> and query transactions.
>
> We could probably re-use a lot of it, but it's not going to be as simple as
> changing the classes that talk to the database back-end.
>
> -Eric
>
>
> On Tue, Apr 29, 2014 at 9:21 AM, Kepner, Jeremy - 0553 - MITLL <
> kepner@ll.mit.edu> wrote:
>
> > Hi James,
> >   Can you explain how the SQL layer to HBase works?
> > Regards.  -Jeremy
> >
> > On Apr 29, 2014, at 1:32 AM, James Taylor <ja...@apache.org>
> >  wrote:
> >
> > > Hello,
> > > Would there be any interest in developing a SQL layer on top of
> > > Accumulo? I'm part of the Apache Phoenix project and we've built a
> > > similar system on top of HBase. I wanted to see if there'd be interest
> > > on your end in working with us to generalize our client and provide a
> > > server that would do Accumulo-specific push down in support of a SQL
> > > layer. I suspect there's enough similarity between HBase and Accumulo
> > > that this would be feasible.
> > > Thanks,
> > > James
> >
> >
>



-- 

Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com

Re: Writing data from iterators

Posted by Josh Elser <jo...@gmail.com>.

A close method would definitely help. I think there's also a concern of 
deadlock (I keep waffling on this without just writing some code to 
(dis)prove it).

Consider the following (since we just had the Phoenix question hit the 
list):

You have some IndexUpdatingIterator that writes out new records on MinC.
The data tablet and secondary index tablet end up residing on the same
tabletserver. When you start flushing the data tablet, you try to create
the secondary index records, which will never finish until you complete
the MinC: deadlock. Getting around this would be a fairly big change in
how the tabletserver manages memory and writes -- I can't speak to whether
it's even feasible without reading more code.
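
To make the concern concrete, here is a toy model in plain Java. It is not
Accumulo code. It assumes (an assumption, not something verified against the
tabletserver source) that the server holds incoming writes while a flush is
in progress. ToyTabletServer, writeGate, and flushWithIndexWrite are all
made-up names, and a timeout stands in for what would otherwise be an
indefinite wait.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model only; every name here is hypothetical. */
final class ToyTabletServer {
    // Assumption: a single permit models "writes are held while a flush runs".
    private final Semaphore writeGate = new Semaphore(1);

    /** Tries to write; returns false if the gate stayed closed for the whole timeout. */
    boolean write(long timeoutMillis) {
        try {
            if (!writeGate.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        writeGate.release(); // a real server would buffer the mutation before releasing
        return true;
    }

    /** Simulates a flush during which an index-updating iterator writes to this same server. */
    boolean flushWithIndexWrite(long timeoutMillis) {
        writeGate.acquireUninterruptibly(); // flush starts: incoming writes are held
        try {
            // The iterator's write waits on a gate that only this flush can reopen.
            return write(timeoutMillis);
        } finally {
            writeGate.release(); // flush ends
        }
    }

    public static void main(String[] args) {
        ToyTabletServer server = new ToyTabletServer();
        System.out.println("write outside a flush succeeded: " + server.write(100));
        System.out.println("write from inside a flush succeeded: " + server.flushWithIndexWrite(100));
    }
}
```

Without the timeout the second call would simply never return, which is the
cycle described above. Whether the real tabletserver behaves this way is
exactly the part that still needs code to (dis)prove.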

Maybe it's enough to not hook into it at MinC scope? MajC would have a 
bit of a delay involved for that index update. You would probably want 
to write some local data to know when you updated the index too, so you 
don't repeatedly update it.
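
A rough sketch of the "write some local data so you don't repeatedly update"
idea, again in plain Java rather than against the Accumulo API. The class and
method names are hypothetical, and the marker is an in-memory set here; in
practice it would presumably have to be stored alongside the data so it
survives across compactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: emits an index write for a row only the first time it is seen. */
final class IndexOnceTracker {
    // Stand-in for the "local data" marker: rows whose index entry was already written.
    private final Set<String> indexedRows = new HashSet<>();
    // Stand-in for writes sent to the index table.
    private final List<String> indexWrites = new ArrayList<>();

    /** Called for each row a (simulated) major compaction passes over. */
    void onCompactedRow(String row, String value) {
        // Set.add returns false when the row is already marked, so repeats are skipped.
        if (indexedRows.add(row)) {
            indexWrites.add(value + " -> " + row);
        }
    }

    List<String> indexWrites() {
        return indexWrites;
    }

    public static void main(String[] args) {
        IndexOnceTracker tracker = new IndexOnceTracker();
        for (int pass = 0; pass < 2; pass++) { // two compactions over the same rows
            tracker.onCompactedRow("row1", "apple");
            tracker.onCompactedRow("row2", "pear");
        }
        System.out.println(tracker.indexWrites()); // each row is indexed once
    }
}
```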

My gut still tells me that trying to focus on percolator would be better 
(given that the problem posed is typically analogous to what percolator 
describes). Maybe we can encourage Keith to give a little overview as to 
what the current state is, where he thinks it needs to go, and where in 
the code patches would be good to hit :)

On 4/29/14, 10:27 AM, Donald Miner wrote:
> Just to be clear, I'm not asking "why shouldn't I do this"... I'm asking
> "what can be added feature-wise to Accumulo to support this?" ... because I
> want to do it :)
>
> So, I guess if there was a close method on an iterator that got called when
> it was torn down... that would help?
>
>
> On Tue, Apr 29, 2014 at 10:24 AM, <dl...@comcast.net> wrote:
>
>> One reason that I can think of is that there is not a close() method on
>> the iterator interface. If you had resources open, you won't know when to
>> clean them up.
>>
>>

Re: Writing data from iterators

Posted by Donald Miner <dm...@clearedgeit.com>.

Just to be clear, I'm not asking "why shouldn't I do this"... I'm asking
"what can be added feature-wise to Accumulo to support this?" ... because I
want to do it :)

So, I guess if there was a close method on an iterator that got called when
it was torn down... that would help?
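
Roughly what such a hook could look like, sketched with java.util.Iterator
instead of Accumulo's SortedKeyValueIterator to keep it self-contained.
ClosableIterator, BufferingIterator, and CloseDemo are hypothetical names, and
the premise that the host calls close() on teardown is the feature being asked
for, not something that exists today.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical interface: an iterator the host promises to close on teardown. */
interface ClosableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // no checked exception, to keep the sketch simple
}

/** Buffers derived index entries and only hands them to the sink when closed. */
final class BufferingIterator implements ClosableIterator<String> {
    private final Iterator<String> source;
    private final List<String> sink;                         // stand-in for a remote index table
    private final List<String> pending = new ArrayList<>();  // stand-in for a writer's buffer
    private boolean closed = false;

    BufferingIterator(Iterator<String> source, List<String> sink) {
        this.source = source;
        this.sink = sink;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        String entry = source.next();
        pending.add("idx:" + entry); // buffered; not yet visible in the sink
        return entry;
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is harmless
        }
        sink.addAll(pending); // flush buffered writes on teardown
        pending.clear();
        closed = true;
    }
}

final class CloseDemo {
    /** Plays the host: drains the iterator, then tears it down via try-with-resources. */
    static List<String> run(List<String> entries) {
        List<String> sink = new ArrayList<>();
        try (BufferingIterator it = new BufferingIterator(entries.iterator(), sink)) {
            while (it.hasNext()) {
                it.next();
            }
        }
        return sink;
    }
}
```

Without the close() call the buffered entries never reach the sink, which is
the resource-cleanup problem in a nutshell.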


On Tue, Apr 29, 2014 at 10:24 AM, <dl...@comcast.net> wrote:

> One reason that I can think of is that there is not a close() method on
> the iterator interface. If you had resources open, you won't know when to
> clean them up.
>
>

Re: Writing data from iterators

Posted by dl...@comcast.net.

One reason that I can think of is that there is not a close() method on the iterator interface. If you had resources open, you won't know when to clean them up. 

