Posted to user@accumulo.apache.org by BlackJack76 <ju...@gmail.com> on 2014/04/26 04:13:11 UTC

Write to table from Accumulo iterator

I am trying to figure out the best way to write to the table from inside the
seek method of a class that implements SortedKeyValueIterator.  I originally
tried to create a BatchWriter and just use that to write data.  However, if
the tablet moved during a flush then it would hang.

Any other recommendations on how to write back to the table?  Thanks!



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412.html
Sent from the Users mailing list archive at Nabble.com.

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
I am trying to have them run in parallel instead of running them serially.




Re: Write to table from Accumulo iterator

Posted by David Medinets <da...@gmail.com>.
How about writing to the proxy server from inside an iterator? The
information still stays within the server, but without the concern about
resource leakage. It probably won't be high-performance, given the repeated
serialization and deserialization.


On Sat, Apr 26, 2014 at 9:04 PM, Josh Elser <jo...@gmail.com> wrote:

> I've been thinking about this some more -- resource management in the
> tabletservers is definitely a concern, even if deadlock isn't (I still
> haven't convinced myself one way or another).
>
> Difficulty will definitely arise in trying to make sure you don't leak
> BatchWriters (and the thread it spawns inside which sends mutations to the
> appropriate tserver). Not impossible, but definitely tricky :)
>
>
> On 4/26/14, 2:38 PM, Donald Miner wrote:
>
>> I haven't actually implemented this, just what we've been thinking
>> about. But yeah, the idea is that you write using BatchWriter in the
>> check method of Constraint.
>>
>>
>> On Sat, Apr 26, 2014 at 9:58 AM, BlackJack76 <justin.loy@gmail.com> wrote:
>>
>>     Donald,
>>
>>     Thanks for the response.  Sounds like we are exactly on the same page.
>>
>>     Honestly, I wasn't familiar with Constraints until you mentioned
>>     them.  I did some quick reading about them and still trying to
>>     understand how you are accomplishing this.  This is a very
>>     interesting idea and I appreciate that you shared it.
>>
>>     I am assuming you are writing back to the table from the check
>>     method in your Constraint.  How are you writing back to the table?
>>     Through a BatchWriter?
>>
>>     I also thought about creating some sort of external buffer but agree
>>     that it would be a lot of extra work and book keeping.
>>
>>
>>
>>     --
>>     View this message in context:
>>     http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9424.html
>>     Sent from the Users mailing list archive at Nabble.com.
>>
>>
>>
>>
>> --
>> *
>> *Donald Miner
>>
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com <http://www.clearedgeit.com>
>>
>

Re: Write to table from Accumulo iterator

Posted by Josh Elser <jo...@gmail.com>.
Inlined for clarity

On 4/26/14, 11:05 PM, BlackJack76 wrote:
> Thanks again Josh.
>
> The way I have been approaching it is to create/use/close the BatchWriter
> inside of the seek method when I need it.  Do you see any issues with this
> approach?

It's not terrible, but you will be incurring some extra overhead in this 
approach. The batchwriter is most efficient when you can keep a single 
instance open and just throw many mutations at it. Just make sure to 
close the batchwriter in a finally block, and you shouldn't have any 
problems.
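
As a rough illustration of the create/use/close pattern described above,
here is a minimal, self-contained Java sketch. MutationWriter and
RecordingWriter are hypothetical stand-ins for Accumulo's BatchWriter (they
are not the real client API, and the real close() can throw a checked
exception); the only point is that close() sits in a finally block so it
runs even when a write fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Accumulo's BatchWriter; NOT the real client API.
interface MutationWriter {
    void addMutation(String mutation);
    void close();
}

// Test double that records what was written and whether close() was called.
class RecordingWriter implements MutationWriter {
    final List<String> written = new ArrayList<>();
    boolean closed = false;

    @Override
    public void addMutation(String mutation) {
        if (mutation.isEmpty()) {
            throw new IllegalArgumentException("empty mutation");
        }
        written.add(mutation);
    }

    @Override
    public void close() {
        closed = true;
    }
}

class WriteInSeekSketch {
    // Create/use/close per call: the writer is closed even if a write throws.
    static void writeAll(MutationWriter writer, List<String> mutations) {
        try {
            for (String m : mutations) {
                writer.addMutation(m);
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        writeAll(writer, Arrays.asList("The", "quick", "brown", "fox"));
        System.out.println(writer.written + " closed=" + writer.closed);
    }
}
```

With a real BatchWriter the same shape applies, at the cost of setting up
and tearing down the writer on every call.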

> Call me naive but why don't you know when Accumulo is going to tear down
> your iterator and stop using it?  When I attach an iterator to a scanner,
> isn't it only destroyed after I complete my scan?

You don't know because the SKVI API currently doesn't have any means to 
tell you. Yes, the tabletserver knows when it's about to, but you have 
no way of being told. This gets trickier with some of the work that 
Accumulo is doing under the hood, which I hinted at previously.

Accumulo maintains a buffer between your (Batch)Scanner and the 
tserver(s) it communicates with. For a number of reasons, when that 
buffer fills up, Accumulo notes the last Key that the scan returned, 
tears down your session, and (assuming the client is still there 
requesting more data) re-queues your scan to fetch more data, starting 
where you left off.

For example, if you have a table where each row is a letter in the 
alphabet, and you want to scan over all rows, you would just pass some 
range like (-inf, +inf). Suppose that after you return the letter 'f', 
that buffer fills up, and your scan gets torn down.

Accumulo will restart your scan with a different range than the one you 
originally passed in: (f, +inf), i.e. starting just after 'f'. This 
matters if you start doing "advanced topics" inside iterators that 
manipulate the Keys being returned; however, it is relatively easy to 
work with.
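
The restart behavior described above can be sketched with a small
simulation. This is plain Java with no Accumulo classes: rows are strings
in a TreeSet, batchSize stands in for the server-to-client buffer, and
scanSession/scanAll are hypothetical names. It is a simplified model of the
idea, not how the tablet server is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class ScanRestartSketch {
    // One simulated scan session: returns at most batchSize rows that sort
    // strictly after lastReturned (null means "start from -inf").
    static List<String> scanSession(NavigableSet<String> rows, String lastReturned, int batchSize) {
        NavigableSet<String> remaining =
            (lastReturned == null) ? rows : rows.tailSet(lastReturned, false);
        List<String> batch = new ArrayList<>();
        for (String row : remaining) {
            if (batch.size() == batchSize) {
                break; // buffer full: the session and its iterator end here
            }
            batch.add(row);
        }
        return batch;
    }

    // Client-side view: re-queue the scan from (lastReturned, +inf) until a
    // session comes back empty. restartPoints collects each batch's last row.
    static List<String> scanAll(NavigableSet<String> rows, int batchSize, List<String> restartPoints) {
        List<String> results = new ArrayList<>();
        String lastReturned = null;
        while (true) {
            List<String> batch = scanSession(rows, lastReturned, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            results.addAll(batch);
            lastReturned = batch.get(batch.size() - 1);
            restartPoints.add(lastReturned);
        }
        return results;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (char c = 'a'; c <= 'z'; c++) {
            rows.add(String.valueOf(c));
        }
        List<String> restartPoints = new ArrayList<>();
        List<String> all = scanAll(rows, 6, restartPoints);
        System.out.println(all.size() + " rows, batches ended at " + restartPoints);
    }
}
```

With 26 single-letter rows and a batch size of 6, the first session ends at
'f' and the next one starts from the first row after it, 'g'. Any state an
iterator built up during the first session is gone by then.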

> What I have observed is something similar to the following....
>
> init is called on creation
>
> seek is called where you need to have the first K,V pair at the end of seek
>
> hasTop, getTopKey, and getTopValue are called
>
> next is called as long as hasTop is true
>
> Once hasTop is false, the scan concludes
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9433.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
Thanks again Josh.

The way I have been approaching it is to create/use/close the BatchWriter
inside of the seek method when I need it.  Do you see any issues with this
approach?

Call me naive but why don't you know when Accumulo is going to tear down
your iterator and stop using it?  When I attach an iterator to a scanner,
isn't it only destroyed after I complete my scan?

What I have observed is something similar to the following....

init is called on creation

seek is called where you need to have the first K,V pair at the end of seek

hasTop, getTopKey, and getTopValue are called

next is called as long as hasTop is true

Once hasTop is false, the scan concludes
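
The call order observed above can be written down as a small driver loop.
SimpleIterator below is a hypothetical, simplified mirror of the method
names on SortedKeyValueIterator (the real interface takes a Range, column
families, and an IteratorEnvironment, and also has getTopValue and
deepCopy); the sketch only records which methods are called and in what
order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified mirror of SortedKeyValueIterator's method names.
interface SimpleIterator {
    void init(List<String> sortedRows);
    void seek(String startRow);
    boolean hasTop();
    String getTopKey();
    void next();
}

// Iterates over a sorted list of rows and logs every call it receives.
class LoggingRowIterator implements SimpleIterator {
    final List<String> calls = new ArrayList<>();
    private List<String> rows;
    private int pos;

    @Override
    public void init(List<String> sortedRows) {
        calls.add("init");
        rows = sortedRows;
        pos = 0;
    }

    @Override
    public void seek(String startRow) {
        calls.add("seek");
        pos = 0;
        // Position on the first row >= startRow, so a top entry is ready
        // (if one exists) by the time seek returns.
        while (pos < rows.size() && rows.get(pos).compareTo(startRow) < 0) {
            pos++;
        }
    }

    @Override
    public boolean hasTop() {
        calls.add("hasTop");
        return pos < rows.size();
    }

    @Override
    public String getTopKey() {
        calls.add("getTopKey");
        return rows.get(pos);
    }

    @Override
    public void next() {
        calls.add("next");
        pos++;
    }
}

class LifecycleSketch {
    // Drives the iterator the way the observed sequence describes.
    static List<String> drive(SimpleIterator it, List<String> sortedRows, String startRow) {
        List<String> out = new ArrayList<>();
        it.init(sortedRows);
        it.seek(startRow);
        while (it.hasTop()) {
            out.add(it.getTopKey());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        LoggingRowIterator it = new LoggingRowIterator();
        System.out.println(drive(it, Arrays.asList("a", "b", "c"), "b"));
        System.out.println(it.calls);
    }
}
```

Note that nothing in this loop tells the iterator that the scan is over,
which is why there is no natural place to close a long-lived writer.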




Re: Write to table from Accumulo iterator

Posted by Josh Elser <jo...@gmail.com>.
Sure, if you look at the interface on SortedKeyValueIterator, the only 
"lifecycle" type methods are init and deepCopy. In other words, you have 
some control over when to create a BatchWriter, but you don't know when 
Accumulo is going to tear down that iterator and stop using it.

You could always create/use/close a batchwriter in an iterator without 
issue; however, it'll be difficult to keep a single BatchWriter alive 
for the desired lifecycle.

In practice, Accumulo's lifecycle for a SKVI is either timeout related 
or related to how often the buffer of results between server and client 
fills. Normally, when that buffer fills, Accumulo will tear down the 
scan, and thus your iterator.

On 4/26/14, 10:26 PM, BlackJack76 wrote:
> Josh,
>
> Thank you very much for thinking about this more.  I appreciate your
> feedback.
>
> You are probably much more familiar with Accumulo and the BatchWriters than
> I am.  As long as you open and properly close the BatchWriter in each
> iterator then where do you envision a leak would occur?  I think I am
> missing something.  Again, appreciate your insight.
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9430.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
Josh,

Thank you very much for thinking about this more.  I appreciate your
feedback.

You are probably much more familiar with Accumulo and the BatchWriters than
I am.  As long as you open and properly close the BatchWriter in each
iterator then where do you envision a leak would occur?  I think I am
missing something.  Again, appreciate your insight.




Re: Write to table from Accumulo iterator

Posted by Josh Elser <jo...@gmail.com>.
I've been thinking about this some more -- resource management in the 
tabletservers is definitely a concern, even if deadlock isn't (I still 
haven't convinced myself one way or another).

Difficulty will definitely arise in trying to make sure you don't leak 
BatchWriters (and the thread it spawns inside which sends mutations to 
the appropriate tserver). Not impossible, but definitely tricky :)

On 4/26/14, 2:38 PM, Donald Miner wrote:
> I haven't actually implemented this, just what we've been thinking
> about. But yeah, the idea is that you write using BatchWriter in the
> check method of Constraint.
>
>
> On Sat, Apr 26, 2014 at 9:58 AM, BlackJack76 <justin.loy@gmail.com> wrote:
>
>     Donald,
>
>     Thanks for the response.  Sounds like we are exactly on the same page.
>
>     Honestly, I wasn't familiar with Constraints until you mentioned
>     them.  I did some quick reading about them and still trying to
>     understand how you are accomplishing this.  This is a very
>     interesting idea and I appreciate that you shared it.
>
>     I am assuming you are writing back to the table from the check method in
>     your Constraint.  How are you writing back to the table?  Through a
>     BatchWriter?
>
>     I also thought about creating some sort of external buffer but agree
>     that it would be a lot of extra work and book keeping.
>
>
>
>     --
>     View this message in context:
>     http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9424.html
>     Sent from the Users mailing list archive at Nabble.com.
>
>
>
>
> --
> *
> *Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com <http://www.clearedgeit.com>

Re: Write to table from Accumulo iterator

Posted by Donald Miner <dm...@clearedgeit.com>.
I haven't actually implemented this, just what we've been thinking about.
But yeah, the idea is that you write using BatchWriter in the check method
of Constraint.


On Sat, Apr 26, 2014 at 9:58 AM, BlackJack76 <ju...@gmail.com> wrote:

> Donald,
>
> Thanks for the response.  Sounds like we are exactly on the same page.
>
> Honestly, I wasn't familiar with Constraints until you mentioned them.  I
> did some quick reading about them and still trying to understand how you
> are
> accomplishing this.  This is a very interesting idea and I appreciate that
> you shared it.
>
> I am assuming you are writing back to the table from the check method in
> your Constraint.  How are you writing back to the table?  Through a
> BatchWriter?
>
> I also thought about creating some sort of external buffer but agree that
> it
> would be a lot of extra work and book keeping.
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9424.html
> Sent from the Users mailing list archive at Nabble.com.
>



-- 

Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
Donald,

Thanks for the response.  Sounds like we are exactly on the same page.

Honestly, I wasn't familiar with Constraints until you mentioned them.  I
did some quick reading about them and still trying to understand how you are
accomplishing this.  This is a very interesting idea and I appreciate that
you shared it.

I am assuming you are writing back to the table from the check method in
your Constraint.  How are you writing back to the table?  Through a
BatchWriter?

I also thought about creating some sort of external buffer but agree that it
would be a lot of extra work and book keeping.




Re: Write to table from Accumulo iterator

Posted by Donald Miner <dm...@clearedgeit.com>.
We've been thinking about doing this as well for the exact same use case.

We've been considering doing what you are suggesting from a constraint, not an iterator. This way it gets written to the index right away. Basically, the constraint writes and then returns true on success. We're kind of worried about the performance implications.

We are also considering writing to a local buffer outside of accumulo tablet servers and then flushing periodically with a batchwriter. It's just more work. 
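
A buffer of this kind could be sketched as follows. This is a hypothetical,
size-triggered variant in plain Java: the sink callback stands in for
"write this batch with a BatchWriter", and a timer calling flush() would
give the periodic behavior mentioned above. None of these names come from
Accumulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BufferedFlushSketch {
    private final int maxBuffered;
    // Stand-in for handing a batch of mutations to a real writer.
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    BufferedFlushSketch(int maxBuffered, Consumer<List<String>> sink) {
        this.maxBuffered = maxBuffered;
        this.sink = sink;
    }

    // Buffers one mutation and flushes once the buffer reaches its limit.
    void add(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // Hands a copy of the buffered mutations to the sink, then clears them.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        BufferedFlushSketch buffered = new BufferedFlushSketch(2, batches::add);
        buffered.add("The");
        buffered.add("quick");
        buffered.add("brown");
        buffered.flush(); // final flush for the leftover entry
        System.out.println(batches);
    }
}
```

The extra bookkeeping shows up quickly: anything still in the buffer is lost
if the process dies before the next flush.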

I want to do this because I don't think my clients can keep up with the parsing load and the additional network load, and I'd like to push this work to my cluster. 

> On Apr 26, 2014, at 12:12 AM, BlackJack76 <ju...@gmail.com> wrote:
> 
> As long as my tablets stay constant, I have no problem using a BatchWriter in
> an iterator.
> 
> 
> 
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9422.html
> Sent from the Users mailing list archive at Nabble.com.

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
As long as my tablets stay constant, I have no problem using a BatchWriter in
an iterator.




Re: Write to table from Accumulo iterator

Posted by Josh Elser <jo...@gmail.com>.
I don't believe heavy load is a requirement. I'm pretty sure you can 
deadlock pretty easily if you try writing within an iterator.

Focus on Accismus would be best IMO, but, like Bill said, it's probably 
not fully there.

On 4/25/14, 11:42 PM, William Slacum wrote:
> Our own Keith Turner is trying to make this possible with Accismus
> (https://github.com/keith-turner/Accismus). I don't know the current
> state of it, but I believe it's still in the early stages.
>
> I've always been under the impression that launching a scanner or writer
> from within an iterator can cause deadlock in the system if it is under
> heavy load.
>
> If Accismus doesn't meet your needs, I'd recommend writing a daemon
> process that identifies new documents via a scanner and filter, then
> writes indices for them. It's more network bound than doing it in an
> iterator, but it's safer.
>
>
>
> On Fri, Apr 25, 2014 at 11:29 PM, David Medinets
> <david.medinets@gmail.com> wrote:
>
>     Can you change the ingest process to tokenize on ingest?
>
>
>     On Fri, Apr 25, 2014 at 10:45 PM, BlackJack76 <justin.loy@gmail.com> wrote:
>
>         Sure thing.  Basically, I am attempting to index a document.
>         When I find the document, I want to insert the tokens directly
>         back into the table.  I want to do it directly from the seek
>         routine so that I don't need to return anything back to the
>         client.
>
>         For example, seek may locate the document that has the following
>         sentence:
>
>         The quick brown fox
>
>         From there, I tokenize the document and want to insert the
>         individual tokens back into Accumulo (i.e., The, quick, brown,
>         and fox all as separate mutations).
>
>
>
>         --
>         View this message in context:
>         http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9414.html
>         Sent from the Users mailing list archive at Nabble.com.
>
>
>

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
I will have to take a look at Keith's project.

Do you have any experience with the scanner or writer deadlocking in an
iterator?  Or did you hear that somewhere?  Just curious because for the
most part it has worked well.  Only had one problem recently and trying to
find out if there is a better way.

So with the daemon process, am I understanding correctly that you are
recommending that I scan, filter, and use a batchwriter from the client?  I
agree that it would work but I am trying to avoid the network load that
comes with scanning and writing it back.




Re: Write to table from Accumulo iterator

Posted by William Slacum <wi...@accumulo.net>.
Our own Keith Turner is trying to make this possible with Accismus (
https://github.com/keith-turner/Accismus). I don't know the current state
of it, but I believe it's still in the early stages.

I've always been under the impression that launching a scanner or writer
from within an iterator can cause deadlock in the system if it is under
heavy load.

If Accismus doesn't meet your needs, I'd recommend writing a daemon process
that identifies new documents via a scanner and filter, then writes indices
for them. It's more network bound than doing it in an iterator, but it's
safer.



On Fri, Apr 25, 2014 at 11:29 PM, David Medinets
<da...@gmail.com> wrote:

> Can you change the ingest process to tokenize on ingest?
>
>
> On Fri, Apr 25, 2014 at 10:45 PM, BlackJack76 <ju...@gmail.com> wrote:
>
>> Sure thing.  Basically, I am attempting to index a document.  When I find
>> the document, I want to insert the tokens directly back into the table.  I
>> want to do it directly from the seek routine so that I don't need to return
>> anything back to the client.
>>
>> For example, seek may locate the document that has the following sentence:
>>
>> The quick brown fox
>>
>> From there, I tokenize the document and want to insert the individual
>> tokens back into Accumulo (i.e., The, quick, brown, and fox all as
>> separate mutations).
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9414.html
>> Sent from the Users mailing list archive at Nabble.com.
>>
>
>

Re: Write to table from Accumulo iterator

Posted by David Medinets <da...@gmail.com>.
Can you change the ingest process to tokenize on ingest?


On Fri, Apr 25, 2014 at 10:45 PM, BlackJack76 <ju...@gmail.com> wrote:

> Sure thing.  Basically, I am attempting to index a document.  When I find
> the document, I want to insert the tokens directly back into the table.  I
> want to do it directly from the seek routine so that I don't need to return
> anything back to the client.
>
> For example, seek may locate the document that has the following sentence:
>
> The quick brown fox
>
> From there, I tokenize the document and want to insert the individual
> tokens back into Accumulo (i.e., The, quick, brown, and fox all as
> separate mutations).
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9414.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
Yes sir




Re: Write to table from Accumulo iterator

Posted by Russ Weeks <rw...@newbrightidea.com>.
Like, building the index lazily? Very interesting idea...
-Russ

On Friday, April 25, 2014, BlackJack76 <ju...@gmail.com> wrote:

> Sure thing.  Basically, I am attempting to index a document.  When I find
> the document, I want to insert the tokens directly back into the table.  I
> want to do it directly from the seek routine so that I don't need to return
> anything back to the client.
>
> For example, seek may locate the document that has the following sentence:
>
> The quick brown fox
>
> From there, I tokenize the document and want to insert the individual
> tokens back into Accumulo (i.e., The, quick, brown, and fox all as
> separate mutations).
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412p9414.html
> Sent from the Users mailing list archive at Nabble.com.
>

Re: Write to table from Accumulo iterator

Posted by BlackJack76 <ju...@gmail.com>.
Sure thing.  Basically, I am attempting to index a document.  When I find the
document, I want to insert the tokens directly back into the table.  I want
to do it directly from the seek routine so that I don't need to return
anything back to the client.

For example, seek may locate the document that has the following sentence:

The quick brown fox

From there, I tokenize the document and want to insert the individual tokens
back into Accumulo (i.e., The, quick, brown, and fox all as separate
mutations).
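
The tokenizing step described above might look like the following minimal
sketch. The index layout (token as the row, document id as the qualifier)
and the names IndexEntry and tokenize are assumptions for illustration; the
thread does not say how the tokens are keyed, and a real implementation
would build Accumulo Mutation objects instead.

```java
import java.util.ArrayList;
import java.util.List;

class TokenizeSketch {
    // Hypothetical index entry: row = token, qualifier = document id.
    static final class IndexEntry {
        final String row;
        final String docId;

        IndexEntry(String row, String docId) {
            this.row = row;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return row + " -> " + docId;
        }
    }

    // Splits the document text on whitespace and emits one entry per token.
    static List<IndexEntry> tokenize(String docId, String text) {
        List<IndexEntry> entries = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                entries.add(new IndexEntry(token, docId));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // One entry each for The, quick, brown, and fox.
        for (IndexEntry entry : tokenize("doc1", "The quick brown fox")) {
            System.out.println(entry);
        }
    }
}
```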




Re: Write to table from Accumulo iterator

Posted by Mike Drob <md...@mdrob.com>.
Can you share a little more about what you are trying to achieve? My first
thought would be to try looking at the Conditional Mutations present in
1.6.0 (not yet released) as either a ready implementation or a starting
point for your own code.
On Apr 25, 2014 10:13 PM, "BlackJack76" <ju...@gmail.com> wrote:

> I am trying to figure out the best way to write to the table from inside
> the seek method of a class that implements SortedKeyValueIterator.  I
> originally tried to create a BatchWriter and just use that to write data.
> However, if the tablet moved during a flush then it would hang.
>
> Any other recommendations on how to write back to the table?  Thanks!
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412.html
> Sent from the Users mailing list archive at Nabble.com.
>