Posted to user@accumulo.apache.org by D P <pa...@gmail.com> on 2014/10/22 19:59:29 UTC

WAL Log - Real-time updates

I am working with Accumulo and looking for the best way to know when
something has been updated or inserted in my Accumulo instance.  For
instance, every time data is inserted, how can I know externally?  If the
write-ahead log stores this, is it best to just read the WAL from HDFS
with a Storm spout to know when something has been inserted into a table?

I am planning to do some real-time visualization with Accumulo, and when
data is inserted I want to be able to notify my UI.

Thanks!

Re: WAL Log - Real-time updates

Posted by Josh Elser <jo...@gmail.com>.
I tried to design the replication implementation to be relatively flexible in what the act of "replication" actually looks like. In short, you write an implementation that the system runs, passing it some context about new data in the system.

https://github.com/apache/accumulo/blob/3107627b778e3093d95777a4313277305cd0aaa2/core/src/main/java/org/apache/accumulo/core/client/replication/ReplicaSystem.java

Be aware, though, that this isn't a substitute for a trigger and may not actually meet your needs for "realtime". By default, it would be on the order of minutes before your implementation is triggered. You could tune some configuration parameters down to tens of seconds, but you would incur more load from repeatedly scanning the Accumulo replication table.

If you just want notification of *any* data being written to a table, I think you could do this pretty easily. Inspecting the new data that has arrived and making a data-aware notification would be more difficult, but is likely still feasible.
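
To make the distinction between "any data" and "data-aware" notification concrete, here is a minimal sketch. Everything in it is hypothetical: NewDataContext, anyDataNotifier and dataAwareNotifier are illustration-only names and are not the ReplicaSystem interface linked above, whose actual signature should be checked against the Accumulo source. It only shows the shape of the two kinds of consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ReplicationNotifySketch {

    /** Hypothetical context passed to a consumer; NOT Accumulo's ReplicaSystem API. */
    static final class NewDataContext {
        final String table;
        final List<String> rowIds; // assumed to be known only if the consumer inspects the data

        NewDataContext(String table, List<String> rowIds) {
            this.table = table;
            this.rowIds = rowIds;
        }
    }

    /** Fires once per batch of new data, without looking at the data itself. */
    static Consumer<NewDataContext> anyDataNotifier(Consumer<String> ui) {
        return ctx -> ui.accept("new data in " + ctx.table);
    }

    /** Data-aware variant: inspects each row and only reports the interesting ones. */
    static Consumer<NewDataContext> dataAwareNotifier(Predicate<String> interesting,
                                                      Consumer<String> ui) {
        return ctx -> {
            for (String row : ctx.rowIds) {
                if (interesting.test(row)) {
                    ui.accept("row " + row + " changed in " + ctx.table);
                }
            }
        };
    }

    public static void main(String[] args) {
        List<String> uiMessages = new ArrayList<>();
        NewDataContext ctx = new NewDataContext("events", Arrays.asList("a1", "b2", "a3"));

        anyDataNotifier(uiMessages::add).accept(ctx);
        dataAwareNotifier(row -> row.startsWith("a"), uiMessages::add).accept(ctx);

        uiMessages.forEach(System.out::println);
    }
}
```

The first consumer needs nothing beyond the fact that new data exists, which is why it is the easy case. The second has to read and interpret the data, which is where the extra effort would go.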

D P wrote:
> The lily indexer/SEP is really interesting.  Thanks for both of your posts
>
> On Wed, Oct 22, 2014 at 2:07 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>     the way this gets done in HBase, i.e. for the HBase Lily
>     Indexer[1], is to add a replication consumer that isn't an actual
>     cluster. IMHO, you'd be better off taking that kind of approach
>     rather than trying to consume the WALs off of HDFS. I haven't
>     attempted to use our replication interface for this yet, but in
>     principle it should work.
>
>     Note that either of these approaches are going to be very fragile
>     across Accumulo versions because they aren't interfaces intended
>     for consumption.
>
>     [1]: http://ngdata.github.io/hbase-indexer/
>
>     On Wed, Oct 22, 2014 at 12:59 PM, D P <pacificobuzz@gmail.com> wrote:
>
>         I am working with Accumulo and looking for the best means of
>         knowing when something has been updated/inserted into my
>         Accumulo instance.  For instance, every-time data is inserted,
>         how can I know externally?  If the write-ahead log file stores
>         this, is it best to just read the HDFS WAL log with a storm
>         spout to know when something has been inserted into a table?
>
>         I am planning to do some real-time visualization with
>         accumulo, but when data is inserted I want to be able to
>         notify my UI.
>
>         Thanks!
>
>
>
>
>     -- 
>     Sean
>
>

Re: WAL Log - Real-time updates

Posted by D P <pa...@gmail.com>.
The Lily indexer/SEP is really interesting.  Thanks for both of your posts.

On Wed, Oct 22, 2014 at 2:07 PM, Sean Busbey <bu...@cloudera.com> wrote:

> the way this gets done in HBase, i.e. for the HBase Lily Indexer[1], is to
> add a replication consumer that isn't an actual cluster. IMHO, you'd be
> better off taking that kind of approach rather than trying to consume the
> WALs off of HDFS. I haven't attempted to use our replication interface for
> this yet, but in principle it should work.
>
> Note that either of these approaches are going to be very fragile across
> Accumulo versions because they aren't interfaces intended for consumption.
>
> [1]: http://ngdata.github.io/hbase-indexer/
>
> On Wed, Oct 22, 2014 at 12:59 PM, D P <pa...@gmail.com> wrote:
>
>> I am working with Accumulo and looking for the best means of knowing when
>> something has been updated/inserted into my Accumulo instance.  For
>> instance, every-time data is inserted, how can I know externally?  If the
>> write-ahead log file stores this, is it best to just read the HDFS WAL log
>> with a storm spout to know when something has been inserted into a table?
>>
>> I am planning to do some real-time visualization with accumulo, but when
>> data is inserted I want to be able to notify my UI.
>>
>> Thanks!
>>
>
>
>
> --
> Sean
>

Re: WAL Log - Real-time updates

Posted by Sean Busbey <bu...@cloudera.com>.
The way this gets done in HBase, e.g. for the HBase Lily Indexer[1], is to
add a replication consumer that isn't an actual cluster. IMHO, you'd be
better off taking that kind of approach rather than trying to consume the
WALs off of HDFS. I haven't attempted to use our replication interface for
this yet, but in principle it should work.

Note that either of these approaches is going to be very fragile across
Accumulo versions because they aren't interfaces intended for consumption.

[1]: http://ngdata.github.io/hbase-indexer/

On Wed, Oct 22, 2014 at 12:59 PM, D P <pa...@gmail.com> wrote:

> I am working with Accumulo and looking for the best means of knowing when
> something has been updated/inserted into my Accumulo instance.  For
> instance, every-time data is inserted, how can I know externally?  If the
> write-ahead log file stores this, is it best to just read the HDFS WAL log
> with a storm spout to know when something has been inserted into a table?
>
> I am planning to do some real-time visualization with accumulo, but when
> data is inserted I want to be able to notify my UI.
>
> Thanks!
>



-- 
Sean

Re: WAL Log - Real-time updates

Posted by John Vines <vi...@apache.org>.
The client writing to Accumulo can guarantee that everything has been
submitted once the flush() call returns.
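
As a rough illustration of that idea, the writing client itself can send the notification right after flush() returns. The sketch below is self-contained and uses only hypothetical names: MutationSink stands in for the two BatchWriter operations that matter here (addMutation and flush), and NotifyingWriter and InMemorySink are made up for the example. It assumes you control the code that writes to Accumulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FlushNotifyDemo {

    /** Hypothetical stand-in for the writer operations used here; not an Accumulo type. */
    interface MutationSink {
        void add(String rowId); // queue a write, possibly buffered on the client
        void flush();           // returns only once queued writes have been submitted
    }

    /** Wraps a sink and reports the written rows to a listener after flush() returns. */
    static final class NotifyingWriter {
        private final MutationSink sink;
        private final Consumer<List<String>> listener;
        private final List<String> pending = new ArrayList<>();

        NotifyingWriter(MutationSink sink, Consumer<List<String>> listener) {
            this.sink = sink;
            this.listener = listener;
        }

        void add(String rowId) {
            sink.add(rowId);
            pending.add(rowId);
        }

        void flush() {
            sink.flush(); // if this throws, no notification is sent
            if (pending.isEmpty()) {
                return;
            }
            List<String> written = new ArrayList<>(pending);
            pending.clear();
            listener.accept(written);
        }
    }

    /** In-memory sink that exists only to make the sketch runnable. */
    static final class InMemorySink implements MutationSink {
        final List<String> buffered = new ArrayList<>();
        final List<String> stored = new ArrayList<>();

        @Override
        public void add(String rowId) {
            buffered.add(rowId);
        }

        @Override
        public void flush() {
            stored.addAll(buffered);
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        List<String> notified = new ArrayList<>();
        NotifyingWriter writer = new NotifyingWriter(sink, notified::addAll);

        writer.add("row1");
        writer.add("row2");
        System.out.println("before flush: " + notified);
        writer.flush();
        System.out.println("after flush: " + notified);
    }
}
```

In a real client the listener would push to the UI (for example over a message queue or websocket). This only covers writes made through clients you control, which is the trade-off compared with the replication-based approaches discussed elsewhere in the thread.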

On Wed, Oct 22, 2014 at 1:59 PM, D P <pa...@gmail.com> wrote:

> I am working with Accumulo and looking for the best means of knowing when
> something has been updated/inserted into my Accumulo instance.  For
> instance, every-time data is inserted, how can I know externally?  If the
> write-ahead log file stores this, is it best to just read the HDFS WAL log
> with a storm spout to know when something has been inserted into a table?
>
> I am planning to do some real-time visualization with accumulo, but when
> data is inserted I want to be able to notify my UI.
>
> Thanks!
>