Posted to dev@accumulo.apache.org by "THORMAN, ROBERT D" <rt...@att.com> on 2014/07/17 20:53:41 UTC

Search

What lexical search package (like Lucene/Solr) has anyone put on top of Accumulo? Is this possible, or does everyone just index log files and documents?

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714



Re: Search

Posted by Josh Elser <jo...@gmail.com>.
Accumulo Iterators really aren't designed to support writes from within
Accumulo itself. The two big problems are the risk of deadlock within
Accumulo and the lack of lifecycle methods.

Fluo[1] (formerly known as Accismus), an implementation of Percolator 
using Accumulo, is really the "correct" tool to target for incremental 
updates for documents.

- Josh

[1] https://github.com/fluo-io/fluo

On 7/23/14, 4:39 PM, Roshan Punnoose wrote:
> Is there a way to tie into the write process in Accumulo? Maybe just use an
> Iterator that worked on compaction to send data to blur/solr? I have seen
> something similar in Cassandra, a data hook to save data in Solr.
>
>
> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>
>> We were trying to do so, but adding visibility while adding/searching
>> documents needs lot more thinking. Adding visibility to core search engine
>> needs changes to algorithm and that does not make it very scalable.
>> Integration besides granular visibility is very doable. and we had taken
>> inspiration from Solandra.
>>
>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>> people have already done it, are they thinking to open source it anytime in
>> future?
>>
>>
>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>> wrote:
>>
>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>> it was obe. I think that would be cool.
>>>
>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>>>>
>>>> It's definitely possible. I remember hearing about someone doing lucene
>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>> bow on top.
>>>>
>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>> What lexical search package (like lucene/solr) has anyone put on top
>> of
>>> accumulo?  Is this possible or does everyone just index log files and
>>> documents?
>>>>>
>>>>> v/r
>>>>> Bob Thorman
>>>>> Principal Big Data Engineer
>>>>> AT&T Big Data CoE
>>>>> 2900 W. Plano Parkway
>>>>> Plano, TX 75075
>>>>> 972-658-1714
>>>>>
>>>>>
>>>>>
>>>
>>
>

Re: Search

Posted by Josh Elser <jo...@gmail.com>.
Hmm, digging through the code, I might retract my claim.

Constraints might fire early enough in the server-side pipeline that it 
just "appears" like another client writing data. I'd need to do some 
more testing to make myself confident though :)

On 7/23/14, 5:02 PM, Donald Miner wrote:
>
> Can you explain the deadlock? Is it only if i have a cyclical trigger? Or is there something else?
>
> Not trying to argue my constraint approach is a good idea. It is a bad idea.  Just curious.
>
>
>> On Jul 23, 2014, at 4:54 PM, Josh Elser <jo...@gmail.com> wrote:
>>
>> Constraints execute server-side.
>>
>> It might work to write to an external system but it's still bogus for proper lifecycle methods on cleanup inside Accumulo. Writing to Accumulo within a constraint is still deadlock prone, as well.
>>
>>> On 7/23/14, 4:51 PM, Donald Miner wrote:
>>> We thought about this for text indexing. We found we could use Constraints as they trigger on write. Seems like a bit of a hack and i have no idea if it would blow something up.
>>>
>>> Does constraint execution happen on tablet side or client side?
>>>
>>>> On Jul 23, 2014, at 4:39 PM, Roshan Punnoose <ro...@gmail.com> wrote:
>>>>
>>>> Is there a way to tie into the write process in Accumulo? Maybe just use an
>>>> Iterator that worked on compaction to send data to blur/solr? I have seen
>>>> something similar in Cassandra, a data hook to save data in Solr.
>>>>
>>>>
>>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>>>>>
>>>>> We were trying to do so, but adding visibility while adding/searching
>>>>> documents needs lot more thinking. Adding visibility to core search engine
>>>>> needs changes to algorithm and that does not make it very scalable.
>>>>> Integration besides granular visibility is very doable. and we had taken
>>>>> inspiration from Solandra.
>>>>>
>>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>>>> people have already done it, are they thinking to open source it anytime in
>>>>> future?
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>>>>> wrote:
>>>>>
>>>>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>>>>> it was obe. I think that would be cool.
>>>>>>
>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>>>>>>>
>>>>>>> It's definitely possible. I remember hearing about someone doing lucene
>>>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>>>>> bow on top.
>>>>>>>
>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>> What lexical search package (like lucene/solr) has anyone put on top
>>>>> of
>>>>>> accumulo?  Is this possible or does everyone just index log files and
>>>>>> documents?
>>>>>>>>
>>>>>>>> v/r
>>>>>>>> Bob Thorman
>>>>>>>> Principal Big Data Engineer
>>>>>>>> AT&T Big Data CoE
>>>>>>>> 2900 W. Plano Parkway
>>>>>>>> Plano, TX 75075
>>>>>>>> 972-658-1714
>>>>>

Re: Search

Posted by Adam Fuchs <af...@apache.org>.
The nature of the "deadlock" is that the thread that initiates the write
comes from the same pool that handles the other side of the write, and that
thread is not released until the write completes. The tablet server uses a
thread pool that grows slowly if there is demand, so this technically
doesn't create a deadlock in the current implementation (past and future
pool configurations may vary). However, even without deadlock this is not
efficient.
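
A toy model of the situation described above, in Python rather than Accumulo code: a task running on a pool thread submits a second task to the same pool and blocks until it finishes. The function names, the pool sizes, and the timeout used to detect the stall are assumptions made for illustration; the tablet server's real pool behaves differently in detail.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def nested_write(pool, timeout=0.2):
    """Run an outer task that submits an inner task to the same pool and waits."""
    def inner():
        # Stands in for the "other side" of the write.
        return "written"

    def outer():
        # The outer task holds its pool thread until the inner task completes.
        future = pool.submit(inner)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            # No free thread could pick up the inner task in time.
            return "stalled"

    return pool.submit(outer).result()

def run(workers):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return nested_write(pool)

print(run(1))  # the only thread is busy waiting on itself
print(run(2))  # a second thread is available to finish the write
```

With a single worker the nested call only ends because of the timeout; with room to grow, the write completes, but a thread is still parked for its whole duration, which is the inefficiency mentioned above.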

Adam
On Jul 23, 2014 5:02 PM, "Donald Miner" <dm...@clearedgeit.com> wrote:

>
> Can you explain the deadlock? Is it only if i have a cyclical trigger? Or
> is there something else?
>
> Not trying to argue my constraint approach is a good idea. It is a bad
> idea.  Just curious.
>
>
> > On Jul 23, 2014, at 4:54 PM, Josh Elser <jo...@gmail.com> wrote:
> >
> > Constraints execute server-side.
> >
> > It might work to write to an external system but it's still bogus for
> proper lifecycle methods on cleanup inside Accumulo. Writing to Accumulo
> within a constraint is still deadlock prone, as well.
> >
> >> On 7/23/14, 4:51 PM, Donald Miner wrote:
> >> We thought about this for text indexing. We found we could use
> Constraints as they trigger on write. Seems like a bit of a hack and i have
> no idea if it would blow something up.
> >>
> >> Does constraint execution happen on tablet side or client side?
> >>
> >>> On Jul 23, 2014, at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
> wrote:
> >>>
> >>> Is there a way to tie into the write process in Accumulo? Maybe just
> use an
> >>> Iterator that worked on compaction to send data to blur/solr? I have
> seen
> >>> something similar in Cassandra, a data hook to save data in Solr.
> >>>
> >>>
> >>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
> wrote:
> >>>>
> >>>> We were trying to do so, but adding visibility while adding/searching
> >>>> documents needs lot more thinking. Adding visibility to core search
> engine
> >>>> needs changes to algorithm and that does not make it very scalable.
> >>>> Integration besides granular visibility is very doable. and we had
> taken
> >>>> inspiration from Solandra.
> >>>>
> >>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
> >>>> people have already done it, are they thinking to open source it
> anytime in
> >>>> future?
> >>>>
> >>>>
> >>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com
> >
> >>>> wrote:
> >>>>
> >>>>> We briefly toyed with blur on accumulo but didnt get too far just
> because
> >>>>> it was obe. I think that would be cool.
> >>>>>
> >>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> It's definitely possible. I remember hearing about someone doing
> lucene
> >>>>> on top of Accumulo once, but I don't recall seeing a nice package
> with a
> >>>>> bow on top.
> >>>>>>
> >>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> >>>>>>> What lexical search package (like lucene/solr) has anyone put on
> top
> >>>> of
> >>>>> accumulo?  Is this possible or does everyone just index log files and
> >>>>> documents?
> >>>>>>>
> >>>>>>> v/r
> >>>>>>> Bob Thorman
> >>>>>>> Principal Big Data Engineer
> >>>>>>> AT&T Big Data CoE
> >>>>>>> 2900 W. Plano Parkway
> >>>>>>> Plano, TX 75075
> >>>>>>> 972-658-1714
> >>>>
>

Re: Search

Posted by Donald Miner <dm...@clearedgeit.com>.
Can you explain the deadlock? Is it only if I have a cyclical trigger? Or is there something else?

Not trying to argue my constraint approach is a good idea. It is a bad idea.  Just curious.


> On Jul 23, 2014, at 4:54 PM, Josh Elser <jo...@gmail.com> wrote:
> 
> Constraints execute server-side.
> 
> It might work to write to an external system but it's still bogus for proper lifecycle methods on cleanup inside Accumulo. Writing to Accumulo within a constraint is still deadlock prone, as well.
> 
>> On 7/23/14, 4:51 PM, Donald Miner wrote:
>> We thought about this for text indexing. We found we could use Constraints as they trigger on write. Seems like a bit of a hack and i have no idea if it would blow something up.
>> 
>> Does constraint execution happen on tablet side or client side?
>> 
>>> On Jul 23, 2014, at 4:39 PM, Roshan Punnoose <ro...@gmail.com> wrote:
>>> 
>>> Is there a way to tie into the write process in Accumulo? Maybe just use an
>>> Iterator that worked on compaction to send data to blur/solr? I have seen
>>> something similar in Cassandra, a data hook to save data in Solr.
>>> 
>>> 
>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>>>> 
>>>> We were trying to do so, but adding visibility while adding/searching
>>>> documents needs lot more thinking. Adding visibility to core search engine
>>>> needs changes to algorithm and that does not make it very scalable.
>>>> Integration besides granular visibility is very doable. and we had taken
>>>> inspiration from Solandra.
>>>> 
>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>>> people have already done it, are they thinking to open source it anytime in
>>>> future?
>>>> 
>>>> 
>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>>>> wrote:
>>>> 
>>>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>>>> it was obe. I think that would be cool.
>>>>> 
>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>>>>>> 
>>>>>> It's definitely possible. I remember hearing about someone doing lucene
>>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>>>> bow on top.
>>>>>> 
>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>> What lexical search package (like lucene/solr) has anyone put on top
>>>> of
>>>>> accumulo?  Is this possible or does everyone just index log files and
>>>>> documents?
>>>>>>> 
>>>>>>> v/r
>>>>>>> Bob Thorman
>>>>>>> Principal Big Data Engineer
>>>>>>> AT&T Big Data CoE
>>>>>>> 2900 W. Plano Parkway
>>>>>>> Plano, TX 75075
>>>>>>> 972-658-1714
>>>> 

Re: Search

Posted by Josh Elser <jo...@gmail.com>.
Constraints execute server-side.

It might work to write to an external system but it's still bogus for 
proper lifecycle methods on cleanup inside Accumulo. Writing to Accumulo 
within a constraint is still deadlock prone, as well.

On 7/23/14, 4:51 PM, Donald Miner wrote:
> We thought about this for text indexing. We found we could use Constraints as they trigger on write. Seems like a bit of a hack and i have no idea if it would blow something up.
>
> Does constraint execution happen on tablet side or client side?
>
>> On Jul 23, 2014, at 4:39 PM, Roshan Punnoose <ro...@gmail.com> wrote:
>>
>> Is there a way to tie into the write process in Accumulo? Maybe just use an
>> Iterator that worked on compaction to send data to blur/solr? I have seen
>> something similar in Cassandra, a data hook to save data in Solr.
>>
>>
>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>>>
>>> We were trying to do so, but adding visibility while adding/searching
>>> documents needs lot more thinking. Adding visibility to core search engine
>>> needs changes to algorithm and that does not make it very scalable.
>>> Integration besides granular visibility is very doable. and we had taken
>>> inspiration from Solandra.
>>>
>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>> people have already done it, are they thinking to open source it anytime in
>>> future?
>>>
>>>
>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>>> wrote:
>>>
>>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>>> it was obe. I think that would be cool.
>>>>
>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>>>>>
>>>>> It's definitely possible. I remember hearing about someone doing lucene
>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>>> bow on top.
>>>>>
>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>> What lexical search package (like lucene/solr) has anyone put on top
>>> of
>>>> accumulo?  Is this possible or does everyone just index log files and
>>>> documents?
>>>>>>
>>>>>> v/r
>>>>>> Bob Thorman
>>>>>> Principal Big Data Engineer
>>>>>> AT&T Big Data CoE
>>>>>> 2900 W. Plano Parkway
>>>>>> Plano, TX 75075
>>>>>> 972-658-1714
>>>

Re: Search

Posted by Donald Miner <dm...@clearedgeit.com>.
We thought about this for text indexing. We found we could use Constraints, as they trigger on write. It seems like a bit of a hack, and I have no idea if it would blow something up.

Does constraint execution happen on tablet side or client side?

> On Jul 23, 2014, at 4:39 PM, Roshan Punnoose <ro...@gmail.com> wrote:
> 
> Is there a way to tie into the write process in Accumulo? Maybe just use an
> Iterator that worked on compaction to send data to blur/solr? I have seen
> something similar in Cassandra, a data hook to save data in Solr.
> 
> 
>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>> 
>> We were trying to do so, but adding visibility while adding/searching
>> documents needs lot more thinking. Adding visibility to core search engine
>> needs changes to algorithm and that does not make it very scalable.
>> Integration besides granular visibility is very doable. and we had taken
>> inspiration from Solandra.
>> 
>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>> people have already done it, are they thinking to open source it anytime in
>> future?
>> 
>> 
>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>> wrote:
>> 
>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>> it was obe. I think that would be cool.
>>> 
>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>>>> 
>>>> It's definitely possible. I remember hearing about someone doing lucene
>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>> bow on top.
>>>> 
>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>> What lexical search package (like lucene/solr) has anyone put on top
>> of
>>> accumulo?  Is this possible or does everyone just index log files and
>>> documents?
>>>>> 
>>>>> v/r
>>>>> Bob Thorman
>>>>> Principal Big Data Engineer
>>>>> AT&T Big Data CoE
>>>>> 2900 W. Plano Parkway
>>>>> Plano, TX 75075
>>>>> 972-658-1714
>> 

Re: Search

Posted by "THORMAN, ROBERT D" <rt...@att.com>.
Yes, you have missed my original request.  I need a fast way (i.e.,
pre-indexed) to perform lexical searches on row values without using a
regex-based iterator.  I also do not want to duplicate data from the
cluster into a document-based store, which is typically required by
packages like Apache Lucene.
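
One way to read this request is as a term index kept in a second sorted table: each word of a row value becomes an index key pointing back at the row ID, so a lookup is a range scan instead of a regex filter, and only terms and row IDs are stored. The sketch below is an in-memory Python model of that layout, not Accumulo API; `build_index`, `lookup`, the whitespace tokeniser, and the sample rows are all assumptions for illustration.

```python
import bisect

def build_index(rows):
    """Return sorted (term, row_id) keys, modelling a sorted index table."""
    keys = set()
    for row_id, value in rows.items():
        for term in value.lower().split():
            keys.add((term, row_id))
    return sorted(keys)

def lookup(index, term):
    """Range-scan the sorted keys for one exact term; return matching row IDs."""
    term = term.lower()
    start = bisect.bisect_left(index, (term, ""))
    hits = []
    for key_term, row_id in index[start:]:
        if key_term != term:
            break  # past the end of this term's range
        hits.append(row_id)
    return hits

rows = {
    "row1": "Tablet server write path",
    "row2": "Compaction iterator on the tablet",
    "row3": "Constraint fires on write",
}
index = build_index(rows)
print(lookup(index, "tablet"))  # ['row1', 'row2']
```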

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714






On 7/24/14, 11:37 AM, "Nehal Mehta" <ne...@gmail.com> wrote:

>If we have two streams, we would just store data into Accumulo and use it
>as backend. What we are/were trying to implement was secure search. So if
>user does not have rights to search that cell, user can see other listing
>but not one which is inaccessible. By doing so we would add lot more
>value.
>
>Am I missing something?
>
>
>On Thu, Jul 24, 2014 at 12:17 PM, THORMAN, ROBERT D <rt...@att.com>
>wrote:
>
>> Search the terms (words, phrases, sub-strings, combinations) of the row
>> values.  Lucene is an apache project that does document indexing on
>>terms.
>>
>> v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>>
>>
>>
>>
>>
>>
>> On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>
>> wrote:
>>
>> >What is meant by lexical search? Lucene style?
>> >
>> >http://www.lucenetutorial.com/lucene-query-syntax.html
>> >
>> >If so, these searches could be prioritized (not all are particularly
>> >useful), and it shouldn't be too hard to come up with recommended
>> >Accumulo approaches for the most important lexical searches.
>> >
>> >On Jul 24, 2014, at 10:44 AM, Donald Miner <dm...@clearedgeit.com>
>> wrote:
>> >
>> >> One problem I ran into when thinking about this problem is
>>throughput.
>> >>In
>> >> accumulo, we talk about tens or hundreds of thousands or millions of
>> >> records per second. A lot of these search solutions talk about
>>hundreds
>> >>or
>> >> thousands of documents per second.
>> >>
>> >> This problem that Accumulo is able to outpace just about anything
>>lead
>> >>me
>> >> to think that some sort of microbatch solution might be the best
>> >>choice. If
>> >> you wait for your data to be indexed before moving on to the next
>> >>Accumulo
>> >> insert you can start lagging behind. Basically, you are crippling
>>your
>> >> ingest throughput by making it the slower of the two systems.
>> >>
>> >> It seems like a more microbatch (or batch) approach might be
>> >>worthwhile--
>> >> what you are trading is your text index lagging behind, but you keep
>> >>your
>> >> ingest throughput in Accumulo. I think Apache Blur does batch
>>parallel
>> >> indexing, which is why I was looking at it for this.
>> >>
>> >>
>> >> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <ro...@gmail.com>
>> >>wrote:
>> >>
>> >>> Yeah I think David's solution is the best. Though I like the idea of
>> >>>having
>> >>> a server side Constraint or hook that puts the updates into the
>>queue.
>> >>>
>> >>> The Cassandra work I had seen actually tightly couples a Cassandra
>> >>>node to
>> >>> a Solr shard. So all the data that exists on that specific node also
>> >>>exists
>> >>> on that specific Solr shard. Would be pretty cool to do the same
>>thing
>> >>>with
>> >>> a tablet server => local Solr shard.
>> >>>
>> >>>
>> >>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets
>> >>><da...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Ingest to a queue. Have two processes subscribe to the queue. One
>> >>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>> >>>> tightly couple the capabilities?
>> >>>>
>> >>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose
>><ro...@gmail.com>
>> >>>> wrote:
>> >>>>> Is there a way to tie into the write process in Accumulo? Maybe
>>just
>> >>> use
>> >>>> an
>> >>>>> Iterator that worked on compaction to send data to blur/solr? I
>>have
>> >>> seen
>> >>>>> something similar in Cassandra, a data hook to save data in Solr.
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
>> >>> wrote:
>> >>>>>
>> >>>>>> We were trying to do so, but adding visibility while
>> >>>>>>adding/searching
>> >>>>>> documents needs lot more thinking. Adding visibility to core
>>search
>> >>>> engine
>> >>>>>> needs changes to algorithm and that does not make it very
>>scalable.
>> >>>>>> Integration besides granular visibility is very doable. and we
>>had
>> >>> taken
>> >>>>>> inspiration from Solandra.
>> >>>>>>
>> >>>>>> Obviously if we can get it done it adds lot of value. I believe
>> >>>>>>Sqrrl
>> >>>>>> people have already done it, are they thinking to open source it
>> >>>> anytime in
>> >>>>>> future?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner
>> >>>>>><dminer@clearedgeit.com
>> >>>>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> We briefly toyed with blur on accumulo but didnt get too far
>>just
>> >>>> because
>> >>>>>>> it was obe. I think that would be cool.
>> >>>>>>>
>> >>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
>> >>>> wrote:
>> >>>>>>>>
>> >>>>>>>> It's definitely possible. I remember hearing about someone
>>doing
>> >>>> lucene
>> >>>>>>> on top of Accumulo once, but I don't recall seeing a nice
>>package
>> >>>> with a
>> >>>>>>> bow on top.
>> >>>>>>>>
>> >>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>> >>>>>>>>> What lexical search package (like lucene/solr) has anyone put
>>on
>> >>>> top
>> >>>>>> of
>> >>>>>>> accumulo?  Is this possible or does everyone just index log
>>files
>> >>> and
>> >>>>>>> documents?
>> >>>>>>>>>
>> >>>>>>>>> v/r
>> >>>>>>>>> Bob Thorman
>> >>>>>>>>> Principal Big Data Engineer
>> >>>>>>>>> AT&T Big Data CoE
>> >>>>>>>>> 2900 W. Plano Parkway
>> >>>>>>>>> Plano, TX 75075
>> >>>>>>>>> 972-658-1714
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Donald Miner
>> >> Chief Technology Officer
>> >> ClearEdge IT Solutions, LLC
>> >> Cell: 443 799 7807
>> >> www.clearedgeit.com
>> >
>>
>>


Re: Search

Posted by Nehal Mehta <ne...@gmail.com>.
If we have two streams, we would just store data in Accumulo and use it
as a backend. What we are/were trying to implement was secure search: if a
user does not have rights to search a cell, the user can see the other
listings but not the inaccessible one. Doing so would add a lot more value.
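
A rough sketch of that behaviour: each search hit carries the labels required to see it, and hits the user is not authorised for are dropped before the listing is returned. This is a simplified model in which a hit is visible only if the user holds every label on it; Accumulo's real visibility expressions also allow `|` and parentheses, and the function and data names here are hypothetical.

```python
def visible_hits(hits, authorizations):
    """Keep only hits whose required labels are all held by the user."""
    auths = set(authorizations)
    return [doc_id for doc_id, required in hits if set(required) <= auths]

hits = [
    ("doc1", {"public"}),
    ("doc2", {"public", "secret"}),
    ("doc3", set()),  # no labels: visible to everyone
]
print(visible_hits(hits, {"public"}))  # ['doc1', 'doc3']
```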

Am I missing something?


On Thu, Jul 24, 2014 at 12:17 PM, THORMAN, ROBERT D <rt...@att.com> wrote:

> Search the terms (words, phrases, sub-strings, combinations) of the row
> values.  Lucene is an Apache project that does document indexing on terms.
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>
>
>
>
>
> On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>
> wrote:
>
> >What is meant by lexical search? Lucene style?
> >
> >http://www.lucenetutorial.com/lucene-query-syntax.html
> >
> >If so, these searches could be prioritized (not all are particularly
> >useful), and it shouldn't be too hard to come up with recommended
> >Accumulo approaches for the most important lexical searches.
> >
> >On Jul 24, 2014, at 10:44 AM, Donald Miner <dm...@clearedgeit.com>
> wrote:
> >
> >> One problem I ran into when thinking about this problem is throughput.
> >>In
> >> accumulo, we talk about tens or hundreds of thousands or millions of
> >> records per second. A lot of these search solutions talk about hundreds
> >>or
> >> thousands of documents per second.
> >>
> >> This problem that Accumulo is able to outpace just about anything lead
> >>me
> >> to think that some sort of microbatch solution might be the best
> >>choice. If
> >> you wait for your data to be indexed before moving on to the next
> >>Accumulo
> >> insert you can start lagging behind. Basically, you are crippling your
> >> ingest throughput by making it the slower of the two systems.
> >>
> >> It seems like a more microbatch (or batch) approach might be
> >>worthwhile--
> >> what you are trading is your text index lagging behind, but you keep
> >>your
> >> ingest throughput in Accumulo. I think Apache Blur does batch parallel
> >> indexing, which is why I was looking at it for this.
> >>
> >>
> >> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <ro...@gmail.com>
> >>wrote:
> >>
> >>> Yeah I think David's solution is the best. Though I like the idea of
> >>>having
> >>> a server side Constraint or hook that puts the updates into the queue.
> >>>
> >>> The Cassandra work I had seen actually tightly couples a Cassandra
> >>>node to
> >>> a Solr shard. So all the data that exists on that specific node also
> >>>exists
> >>> on that specific Solr shard. Would be pretty cool to do the same thing
> >>>with
> >>> a tablet server => local Solr shard.
> >>>
> >>>
> >>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets
> >>><da...@gmail.com>
> >>> wrote:
> >>>
> >>>> Ingest to a queue. Have two processes subscribe to the queue. One
> >>>> pushing into Accumulo and the other pushing into SolrCloud. Why
> >>>> tightly couple the capabilities?
> >>>>
> >>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
> >>>> wrote:
> >>>>> Is there a way to tie into the write process in Accumulo? Maybe just
> >>> use
> >>>> an
> >>>>> Iterator that worked on compaction to send data to blur/solr? I have
> >>> seen
> >>>>> something similar in Cassandra, a data hook to save data in Solr.
> >>>>>
> >>>>>
> >>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> We were trying to do so, but adding visibility while
> >>>>>>adding/searching
> >>>>>> documents needs lot more thinking. Adding visibility to core search
> >>>> engine
> >>>>>> needs changes to algorithm and that does not make it very scalable.
> >>>>>> Integration besides granular visibility is very doable. and we had
> >>> taken
> >>>>>> inspiration from Solandra.
> >>>>>>
> >>>>>> Obviously if we can get it done it adds lot of value. I believe
> >>>>>>Sqrrl
> >>>>>> people have already done it, are they thinking to open source it
> >>>> anytime in
> >>>>>> future?
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner
> >>>>>><dminer@clearedgeit.com
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> We briefly toyed with blur on accumulo but didnt get too far just
> >>>> because
> >>>>>>> it was obe. I think that would be cool.
> >>>>>>>
> >>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> It's definitely possible. I remember hearing about someone doing
> >>>> lucene
> >>>>>>> on top of Accumulo once, but I don't recall seeing a nice package
> >>>> with a
> >>>>>>> bow on top.
> >>>>>>>>
> >>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> >>>>>>>>> What lexical search package (like lucene/solr) has anyone put on
> >>>> top
> >>>>>> of
> >>>>>>> accumulo?  Is this possible or does everyone just index log files
> >>> and
> >>>>>>> documents?
> >>>>>>>>>
> >>>>>>>>> v/r
> >>>>>>>>> Bob Thorman
> >>>>>>>>> Principal Big Data Engineer
> >>>>>>>>> AT&T Big Data CoE
> >>>>>>>>> 2900 W. Plano Parkway
> >>>>>>>>> Plano, TX 75075
> >>>>>>>>> 972-658-1714
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Donald Miner
> >> Chief Technology Officer
> >> ClearEdge IT Solutions, LLC
> >> Cell: 443 799 7807
> >> www.clearedgeit.com
> >
>
>

Re: Search

Posted by "THORMAN, ROBERT D" <rt...@att.com>.
Search the terms (words, phrases, sub-strings, combinations) of the row
values.  Lucene is an Apache project that does document indexing on terms.
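
For the sub-string case, one common pre-indexing trick is to index character n-grams of each value, intersect the posting sets at query time, and then verify the candidates. The sketch below uses trigrams in plain Python; it is an assumption about how such an index could be laid out, not something any package in this thread is known to do, and all names in it are made up for the example.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of 3-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_trigram_index(rows):
    """Map each trigram to the set of row IDs whose value contains it."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for gram in trigrams(value):
            index[gram].add(row_id)
    return index

def substring_search(index, rows, query):
    """Find rows containing the query, using the index to narrow candidates."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than three characters cannot use the index.
        candidates = set(rows)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    q = query.lower()
    # Verify, since sharing all trigrams does not guarantee a contiguous match.
    return sorted(r for r in candidates if q in rows[r].lower())

rows = {"row1": "tablet server", "row2": "table scan", "row3": "solr shard"}
index = build_trigram_index(rows)
print(substring_search(index, rows, "able"))
```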

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714






On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>
wrote:

>What is meant by lexical search? Lucene style?
>
>http://www.lucenetutorial.com/lucene-query-syntax.html
>
>If so, these searches could be prioritized (not all are particularly
>useful), and it shouldn't be too hard to come up with recommended
>Accumulo approaches for the most important lexical searches.
>
>On Jul 24, 2014, at 10:44 AM, Donald Miner <dm...@clearedgeit.com> wrote:
>
>> One problem I ran into when thinking about this problem is throughput.
>>In
>> accumulo, we talk about tens or hundreds of thousands or millions of
>> records per second. A lot of these search solutions talk about hundreds
>>or
>> thousands of documents per second.
>> 
>> This problem that Accumulo is able to outpace just about anything lead
>>me
>> to think that some sort of microbatch solution might be the best
>>choice. If
>> you wait for your data to be indexed before moving on to the next
>>Accumulo
>> insert you can start lagging behind. Basically, you are crippling your
>> ingest throughput by making it the slower of the two systems.
>> 
>> It seems like a more microbatch (or batch) approach might be
>>worthwhile--
>> what you are trading is your text index lagging behind, but you keep
>>your
>> ingest throughput in Accumulo. I think Apache Blur does batch parallel
>> indexing, which is why I was looking at it for this.
>> 
>> 
>> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <ro...@gmail.com>
>>wrote:
>> 
>>> Yeah I think David's solution is the best. Though I like the idea of
>>>having
>>> a server side Constraint or hook that puts the updates into the queue.
>>> 
>>> The Cassandra work I had seen actually tightly couples a Cassandra
>>>node to
>>> a Solr shard. So all the data that exists on that specific node also
>>>exists
>>> on that specific Solr shard. Would be pretty cool to do the same thing
>>>with
>>> a tablet server => local Solr shard.
>>> 
>>> 
>>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets
>>><da...@gmail.com>
>>> wrote:
>>> 
>>>> Ingest to a queue. Have two processes subscribe to the queue. One
>>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>>>> tightly couple the capabilities?
>>>> 
>>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
>>>> wrote:
>>>>> Is there a way to tie into the write process in Accumulo? Maybe just
>>> use
>>>> an
>>>>> Iterator that worked on compaction to send data to blur/solr? I have
>>> seen
>>>>> something similar in Cassandra, a data hook to save data in Solr.
>>>>> 
>>>>> 
>>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> We were trying to do so, but adding visibility while
>>>>>>adding/searching
>>>>>> documents needs lot more thinking. Adding visibility to core search
>>>> engine
>>>>>> needs changes to algorithm and that does not make it very scalable.
>>>>>> Integration besides granular visibility is very doable. and we had
>>> taken
>>>>>> inspiration from Solandra.
>>>>>> 
>>>>>> Obviously if we can get it done it adds lot of value. I believe
>>>>>>Sqrrl
>>>>>> people have already done it, are they thinking to open source it
>>>> anytime in
>>>>>> future?
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner
>>>>>><dminer@clearedgeit.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> We briefly toyed with blur on accumulo but didnt get too far just
>>>> because
>>>>>>> it was obe. I think that would be cool.
>>>>>>> 
>>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>> It's definitely possible. I remember hearing about someone doing
>>>> lucene
>>>>>>> on top of Accumulo once, but I don't recall seeing a nice package
>>>> with a
>>>>>>> bow on top.
>>>>>>>> 
>>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>>> What lexical search package (like lucene/solr) has anyone put on
>>>> top
>>>>>> of
>>>>>>> accumulo?  Is this possible or does everyone just index log files
>>> and
>>>>>>> documents?
>>>>>>>>> 
>>>>>>>>> v/r
>>>>>>>>> Bob Thorman
>>>>>>>>> Principal Big Data Engineer
>>>>>>>>> AT&T Big Data CoE
>>>>>>>>> 2900 W. Plano Parkway
>>>>>>>>> Plano, TX 75075
>>>>>>>>> 972-658-1714
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> 
>> Donald Miner
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com
>


Re: Search

Posted by "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>.
What is meant by lexical search? Lucene style?

http://www.lucenetutorial.com/lucene-query-syntax.html

If so, these searches could be prioritized (not all are particularly useful), and it shouldn't be too hard to come up with recommended Accumulo approaches for the most important lexical searches.
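
As a rough illustration of one such approach, here is a minimal sketch of a term index laid out as sorted keys (row = term, qualifier = document id), so that a term lookup is a single range scan and a prefix query is a scan over adjacent rows. SortedTable is a hypothetical in-memory stand-in, not the Accumulo client API, and the whitespace tokenizer is an assumption made only to keep the example short.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a sorted key-value table."""

    def __init__(self):
        self._keys = []  # sorted list of (row, qualifier) tuples

    def put(self, row, qualifier):
        key = (row, qualifier)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_row(self, row):
        """Return the qualifiers stored under one row, in sorted order."""
        i = bisect.bisect_left(self._keys, (row, ""))
        out = []
        while i < len(self._keys) and self._keys[i][0] == row:
            out.append(self._keys[i][1])
            i += 1
        return out

    def scan_prefix(self, prefix):
        """Return every (row, qualifier) whose row starts with prefix."""
        i = bisect.bisect_left(self._keys, (prefix, ""))
        out = []
        while i < len(self._keys) and self._keys[i][0].startswith(prefix):
            out.append(self._keys[i])
            i += 1
        return out


def index_document(table, doc_id, text):
    # One entry per distinct term: row = term, qualifier = document id.
    for term in set(text.lower().split()):
        table.put(term, doc_id)


def term_query(table, term):
    return table.scan_row(term.lower())


def and_query(table, terms):
    # Intersect the posting lists of all terms.
    postings = [set(term_query(table, t)) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def prefix_query(table, prefix):
    # Works because all rows sharing a prefix are adjacent in sort order.
    return sorted({doc_id for _, doc_id in table.scan_prefix(prefix.lower())})
```

Phrase, fuzzy, and ranked queries need more than this layout provides, which is where the prioritization mentioned above would matter.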

On Jul 24, 2014, at 10:44 AM, Donald Miner <dm...@clearedgeit.com> wrote:

> One problem I ran into when thinking about this problem is throughput. In
> accumulo, we talk about tens or hundreds of thousands or millions of
> records per second. A lot of these search solutions talk about hundreds or
> thousands of documents per second.
> 
> This problem that Accumulo is able to outpace just about anything lead me
> to think that some sort of microbatch solution might be the best choice. If
> you wait for your data to be indexed before moving on to the next Accumulo
> insert you can start lagging behind. Basically, you are crippling your
> ingest throughput by making it the slower of the two systems.
> 
> It seems like a more microbatch (or batch) approach might be worthwhile--
> what you are trading is your text index lagging behind, but you keep your
> ingest throughput in Accumulo. I think Apache Blur does batch parallel
> indexing, which is why I was looking at it for this.
> 
> 
> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <ro...@gmail.com> wrote:
> 
>> Yeah I think David's solution is the best. Though I like the idea of having
>> a server side Constraint or hook that puts the updates into the queue.
>> 
>> The Cassandra work I had seen actually tightly couples a Cassandra node to
>> a Solr shard. So all the data that exists on that specific node also exists
>> on that specific Solr shard. Would be pretty cool to do the same thing with
>> a tablet server => local Solr shard.
>> 
>> 
>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <da...@gmail.com>
>> wrote:
>> 
>>> Ingest to a queue. Have two processes subscribe to the queue. One
>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>>> tightly couple the capabilities?
>>> 
>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
>>> wrote:
>>>> Is there a way to tie into the write process in Accumulo? Maybe just
>> use
>>> an
>>>> Iterator that worked on compaction to send data to blur/solr? I have
>> seen
>>>> something similar in Cassandra, a data hook to save data in Solr.
>>>> 
>>>> 
>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
>> wrote:
>>>> 
>>>>> We were trying to do so, but adding visibility while adding/searching
>>>>> documents needs lot more thinking. Adding visibility to core search
>>> engine
>>>>> needs changes to algorithm and that does not make it very scalable.
>>>>> Integration besides granular visibility is very doable. and we had
>> taken
>>>>> inspiration from Solandra.
>>>>> 
>>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>>>> people have already done it, are they thinking to open source it
>>> anytime in
>>>>> future?
>>>>> 
>>>>> 
>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com
>>> 
>>>>> wrote:
>>>>> 
>>>>>> We briefly toyed with blur on accumulo but didnt get too far just
>>> because
>>>>>> it was obe. I think that would be cool.
>>>>>> 
>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> It's definitely possible. I remember hearing about someone doing
>>> lucene
>>>>>> on top of Accumulo once, but I don't recall seeing a nice package
>>> with a
>>>>>> bow on top.
>>>>>>> 
>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>> What lexical search package (like lucene/solr) has anyone put on
>>> top
>>>>> of
>>>>>> accumulo?  Is this possible or does everyone just index log files
>> and
>>>>>> documents?
>>>>>>>> 
>>>>>>>> v/r
>>>>>>>> Bob Thorman
>>>>>>>> Principal Big Data Engineer
>>>>>>>> AT&T Big Data CoE
>>>>>>>> 2900 W. Plano Parkway
>>>>>>>> Plano, TX 75075
>>>>>>>> 972-658-1714
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> 
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com


Re: Search

Posted by Donald Miner <dm...@clearedgeit.com>.
One problem I ran into when thinking about this is throughput. In
Accumulo, we talk about tens or hundreds of thousands, or even millions, of
records per second. A lot of these search solutions talk about hundreds or
thousands of documents per second.

The fact that Accumulo is able to outpace just about anything led me
to think that some sort of microbatch solution might be the best choice. If
you wait for your data to be indexed before moving on to the next Accumulo
insert, you can start lagging behind. Basically, you are crippling your
ingest throughput by limiting it to the rate of the slower of the two systems.

It seems like a microbatch (or batch) approach might be worthwhile:
what you are trading is your text index lagging behind, but you keep your
ingest throughput in Accumulo. I think Apache Blur does batch parallel
indexing, which is why I was looking at it for this.
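
A minimal sketch of the microbatch idea, under assumptions: write_primary and index_batch are hypothetical callables standing in for an Accumulo writer and a search-engine bulk indexer, and batch_size is an arbitrary knob. Every record goes to the primary store immediately; the index only sees records in batches, so it lags by at most one unflushed batch.

```python
class MicroBatchIndexer:
    """Write each record to the primary store now; index in batches later."""

    def __init__(self, write_primary, index_batch, batch_size=1000):
        self._write_primary = write_primary  # hypothetical: one record per call
        self._index_batch = index_batch      # hypothetical: a list per call
        self._batch_size = batch_size
        self._pending = []

    def ingest(self, record):
        # The primary write never waits on a per-record index call.
        self._write_primary(record)
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        # Hand the accumulated records to the indexer in one call.
        if self._pending:
            batch, self._pending = self._pending, []
            self._index_batch(batch)

    def lag(self):
        """Number of records written to the primary store but not yet indexed."""
        return len(self._pending)
```

In this single-threaded sketch the flush still runs on the ingest path; a real setup would more likely hand batches to a separate thread or process, and add a time-based flush so a slow trickle of records does not sit unindexed.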


On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <ro...@gmail.com> wrote:

> Yeah I think David's solution is the best. Though I like the idea of having
> a server side Constraint or hook that puts the updates into the queue.
>
> The Cassandra work I had seen actually tightly couples a Cassandra node to
> a Solr shard. So all the data that exists on that specific node also exists
> on that specific Solr shard. Would be pretty cool to do the same thing with
> a tablet server => local Solr shard.
>
>
> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <da...@gmail.com>
> wrote:
>
> > Ingest to a queue. Have two processes subscribe to the queue. One
> > pushing into Accumulo and the other pushing into SolrCloud. Why
> > tightly couple the capabilities?
> >
> > On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
> > wrote:
> > > Is there a way to tie into the write process in Accumulo? Maybe just
> use
> > an
> > > Iterator that worked on compaction to send data to blur/solr? I have
> seen
> > > something similar in Cassandra, a data hook to save data in Solr.
> > >
> > >
> > > On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com>
> wrote:
> > >
> > >> We were trying to do so, but adding visibility while adding/searching
> > >> documents needs lot more thinking. Adding visibility to core search
> > engine
> > >> needs changes to algorithm and that does not make it very scalable.
> > >> Integration besides granular visibility is very doable. and we had
> taken
> > >> inspiration from Solandra.
> > >>
> > >> Obviously if we can get it done it adds lot of value. I believe Sqrrl
> > >> people have already done it, are they thinking to open source it
> > anytime in
> > >> future?
> > >>
> > >>
> > >> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com
> >
> > >> wrote:
> > >>
> > >> > We briefly toyed with blur on accumulo but didnt get too far just
> > because
> > >> > it was obe. I think that would be cool.
> > >> >
> > >> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
> > wrote:
> > >> > >
> > >> > > It's definitely possible. I remember hearing about someone doing
> > lucene
> > >> > on top of Accumulo once, but I don't recall seeing a nice package
> > with a
> > >> > bow on top.
> > >> > >
> > >> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> > >> > >> What lexical search package (like lucene/solr) has anyone put on
> > top
> > >> of
> > >> > accumulo?  Is this possible or does everyone just index log files
> and
> > >> > documents?
> > >> > >>
> > >> > >> v/r
> > >> > >> Bob Thorman
> > >> > >> Principal Big Data Engineer
> > >> > >> AT&T Big Data CoE
> > >> > >> 2900 W. Plano Parkway
> > >> > >> Plano, TX 75075
> > >> > >> 972-658-1714
> > >> > >>
> > >> > >>
> > >> > >>
> > >> >
> > >>
> >
>



-- 

Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com

Re: Search

Posted by Roshan Punnoose <ro...@gmail.com>.
Yeah, I think David's solution is the best. Though I like the idea of having
a server-side Constraint or hook that puts the updates into the queue.

The Cassandra work I had seen actually tightly couples a Cassandra node to
a Solr shard. So all the data that exists on that specific node also exists
on that specific Solr shard. Would be pretty cool to do the same thing with
a tablet server => local Solr shard.


On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <da...@gmail.com>
wrote:

> Ingest to a queue. Have two processes subscribe to the queue. One
> pushing into Accumulo and the other pushing into SolrCloud. Why
> tightly couple the capabilities?
>
> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
> wrote:
> > Is there a way to tie into the write process in Accumulo? Maybe just use
> an
> > Iterator that worked on compaction to send data to blur/solr? I have seen
> > something similar in Cassandra, a data hook to save data in Solr.
> >
> >
> > On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
> >
> >> We were trying to do so, but adding visibility while adding/searching
> >> documents needs lot more thinking. Adding visibility to core search
> engine
> >> needs changes to algorithm and that does not make it very scalable.
> >> Integration besides granular visibility is very doable. and we had taken
> >> inspiration from Solandra.
> >>
> >> Obviously if we can get it done it adds lot of value. I believe Sqrrl
> >> people have already done it, are they thinking to open source it
> anytime in
> >> future?
> >>
> >>
> >> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
> >> wrote:
> >>
> >> > We briefly toyed with blur on accumulo but didnt get too far just
> because
> >> > it was obe. I think that would be cool.
> >> >
> >> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
> wrote:
> >> > >
> >> > > It's definitely possible. I remember hearing about someone doing
> lucene
> >> > on top of Accumulo once, but I don't recall seeing a nice package
> with a
> >> > bow on top.
> >> > >
> >> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> >> > >> What lexical search package (like lucene/solr) has anyone put on
> top
> >> of
> >> > accumulo?  Is this possible or does everyone just index log files and
> >> > documents?
> >> > >>
> >> > >> v/r
> >> > >> Bob Thorman
> >> > >> Principal Big Data Engineer
> >> > >> AT&T Big Data CoE
> >> > >> 2900 W. Plano Parkway
> >> > >> Plano, TX 75075
> >> > >> 972-658-1714
> >> > >>
> >> > >>
> >> > >>
> >> >
> >>
>

Re: Search

Posted by David Medinets <da...@gmail.com>.
Assigning a UUID during the queue placement might be a good idea.
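
A minimal sketch of how that could tie the two streams together, assuming plain dicts as stand-ins for Accumulo (store) and SolrCloud (index); all function names here are hypothetical. The id is minted once at enqueue time, the index keeps only term-to-id postings, and the full record lives only in the store.

```python
import uuid


def enqueue(queue, record):
    """Attach a fresh UUID at queue placement so both consumers share one key."""
    message = {"id": str(uuid.uuid4()), "body": record}
    queue.append(message)
    return message["id"]


def store_consumer(message, store):
    # Stand-in for the process pushing into Accumulo: full record keyed by id.
    store[message["id"]] = message["body"]


def index_consumer(message, index):
    # Stand-in for the process pushing into the search index: terms -> ids only.
    for term in set(message["body"].lower().split()):
        index.setdefault(term, set()).add(message["id"])


def search(index, store, term):
    """Resolve index hits back to stored records via the shared id."""
    ids = sorted(index.get(term.lower(), ()))
    # Skip ids the store consumer has not written yet (the streams can drift).
    return [store[i] for i in ids if i in store]
```

Because the two consumers run independently, a hit can briefly reference a record that is not yet stored (or the reverse), which is why search tolerates missing ids.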

On Thu, Jul 24, 2014 at 10:26 AM, THORMAN, ROBERT D <rt...@att.com> wrote:
> David,
>
> How would you tie the two streams back together so that data is not
> duplicated but referenced by SolrCloud into Accumulo?
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>
>
>
>
>
> On 7/23/14, 5:09 PM, "David Medinets" <da...@gmail.com> wrote:
>
>>Ingest to a queue. Have two processes subscribe to the queue. One
>>pushing into Accumulo and the other pushing into SolrCloud. Why
>>tightly couple the capabilities?
>>
>>On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
>>wrote:
>>> Is there a way to tie into the write process in Accumulo? Maybe just
>>>use an
>>> Iterator that worked on compaction to send data to blur/solr? I have
>>>seen
>>> something similar in Cassandra, a data hook to save data in Solr.
>>>
>>>
>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>>>
>>>> We were trying to do so, but adding visibility while adding/searching
>>>> documents needs lot more thinking. Adding visibility to core search
>>>>engine
>>>> needs changes to algorithm and that does not make it very scalable.
>>>> Integration besides granular visibility is very doable. and we had
>>>>taken
>>>> inspiration from Solandra.
>>>>
>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>>> people have already done it, are they thinking to open source it
>>>>anytime in
>>>> future?
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>>>> wrote:
>>>>
>>>> > We briefly toyed with blur on accumulo but didnt get too far just
>>>>because
>>>> > it was obe. I think that would be cool.
>>>> >
>>>> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
>>>>wrote:
>>>> > >
>>>> > > It's definitely possible. I remember hearing about someone doing
>>>>lucene
>>>> > on top of Accumulo once, but I don't recall seeing a nice package
>>>>with a
>>>> > bow on top.
>>>> > >
>>>> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>> > >> What lexical search package (like lucene/solr) has anyone put on
>>>>top
>>>> of
>>>> > accumulo?  Is this possible or does everyone just index log files and
>>>> > documents?
>>>> > >>
>>>> > >> v/r
>>>> > >> Bob Thorman
>>>> > >> Principal Big Data Engineer
>>>> > >> AT&T Big Data CoE
>>>> > >> 2900 W. Plano Parkway
>>>> > >> Plano, TX 75075
>>>> > >> 972-658-1714
>>>> > >>
>>>> > >>
>>>> > >>
>>>> >
>>>>
>

Re: Search

Posted by "THORMAN, ROBERT D" <rt...@att.com>.
David,

How would you tie the two streams back together so that data is not
duplicated but referenced by SolrCloud into Accumulo?

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714






On 7/23/14, 5:09 PM, "David Medinets" <da...@gmail.com> wrote:

>Ingest to a queue. Have two processes subscribe to the queue. One
>pushing into Accumulo and the other pushing into SolrCloud. Why
>tightly couple the capabilities?
>
>On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com>
>wrote:
>> Is there a way to tie into the write process in Accumulo? Maybe just
>>use an
>> Iterator that worked on compaction to send data to blur/solr? I have
>>seen
>> something similar in Cassandra, a data hook to save data in Solr.
>>
>>
>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>>
>>> We were trying to do so, but adding visibility while adding/searching
>>> documents needs lot more thinking. Adding visibility to core search
>>>engine
>>> needs changes to algorithm and that does not make it very scalable.
>>> Integration besides granular visibility is very doable. and we had
>>>taken
>>> inspiration from Solandra.
>>>
>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>> people have already done it, are they thinking to open source it
>>>anytime in
>>> future?
>>>
>>>
>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>>> wrote:
>>>
>>> > We briefly toyed with blur on accumulo but didnt get too far just
>>>because
>>> > it was obe. I think that would be cool.
>>> >
>>> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com>
>>>wrote:
>>> > >
>>> > > It's definitely possible. I remember hearing about someone doing
>>>lucene
>>> > on top of Accumulo once, but I don't recall seeing a nice package
>>>with a
>>> > bow on top.
>>> > >
>>> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>> > >> What lexical search package (like lucene/solr) has anyone put on
>>>top
>>> of
>>> > accumulo?  Is this possible or does everyone just index log files and
>>> > documents?
>>> > >>
>>> > >> v/r
>>> > >> Bob Thorman
>>> > >> Principal Big Data Engineer
>>> > >> AT&T Big Data CoE
>>> > >> 2900 W. Plano Parkway
>>> > >> Plano, TX 75075
>>> > >> 972-658-1714
>>> > >>
>>> > >>
>>> > >>
>>> >
>>>


Re: Search

Posted by David Medinets <da...@gmail.com>.
Ingest to a queue. Have two processes subscribe to the queue. One
pushing into Accumulo and the other pushing into SolrCloud. Why
tightly couple the capabilities?
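
A minimal sketch of that fan-out, assuming an in-memory Topic class as a hypothetical stand-in for a real message queue. Each subscriber gets its own copy of every message and drains at its own pace, so a slow indexer does not hold back the store writer.

```python
from collections import deque


class Topic:
    """Hypothetical in-memory publish/subscribe queue with per-subscriber buffers."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        self._subscribers[name] = deque()

    def publish(self, message):
        # Every subscriber receives its own copy of the message reference.
        for buffer in self._subscribers.values():
            buffer.append(message)

    def drain(self, name, handler, max_messages=None):
        """Deliver up to max_messages buffered messages to handler; return the count."""
        buffer = self._subscribers[name]
        delivered = 0
        while buffer and (max_messages is None or delivered < max_messages):
            handler(buffer.popleft())
            delivered += 1
        return delivered

    def backlog(self, name):
        return len(self._subscribers[name])
```

For example, after publishing three messages, the "accumulo" subscriber can drain all three while the "solr" subscriber has only processed one; the remaining two simply wait in its buffer.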

On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <ro...@gmail.com> wrote:
> Is there a way to tie into the write process in Accumulo? Maybe just use an
> Iterator that worked on compaction to send data to blur/solr? I have seen
> something similar in Cassandra, a data hook to save data in Solr.
>
>
> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:
>
>> We were trying to do so, but adding visibility while adding/searching
>> documents needs lot more thinking. Adding visibility to core search engine
>> needs changes to algorithm and that does not make it very scalable.
>> Integration besides granular visibility is very doable. and we had taken
>> inspiration from Solandra.
>>
>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>> people have already done it, are they thinking to open source it anytime in
>> future?
>>
>>
>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
>> wrote:
>>
>> > We briefly toyed with blur on accumulo but didnt get too far just because
>> > it was obe. I think that would be cool.
>> >
>> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
>> > >
>> > > It's definitely possible. I remember hearing about someone doing lucene
>> > on top of Accumulo once, but I don't recall seeing a nice package with a
>> > bow on top.
>> > >
>> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>> > >> What lexical search package (like lucene/solr) has anyone put on top
>> of
>> > accumulo?  Is this possible or does everyone just index log files and
>> > documents?
>> > >>
>> > >> v/r
>> > >> Bob Thorman
>> > >> Principal Big Data Engineer
>> > >> AT&T Big Data CoE
>> > >> 2900 W. Plano Parkway
>> > >> Plano, TX 75075
>> > >> 972-658-1714
>> > >>
>> > >>
>> > >>
>> >
>>

Re: Search

Posted by Roshan Punnoose <ro...@gmail.com>.
Is there a way to tie into the write process in Accumulo? Maybe just use an
Iterator that worked on compaction to send data to blur/solr? I have seen
something similar in Cassandra, a data hook to save data in Solr.


On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <ne...@gmail.com> wrote:

> We were trying to do so, but adding visibility while adding/searching
> documents needs lot more thinking. Adding visibility to core search engine
> needs changes to algorithm and that does not make it very scalable.
> Integration besides granular visibility is very doable. and we had taken
> inspiration from Solandra.
>
> Obviously if we can get it done it adds lot of value. I believe Sqrrl
> people have already done it, are they thinking to open source it anytime in
> future?
>
>
> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
> wrote:
>
> > We briefly toyed with blur on accumulo but didnt get too far just because
> > it was obe. I think that would be cool.
> >
> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
> > >
> > > It's definitely possible. I remember hearing about someone doing lucene
> > on top of Accumulo once, but I don't recall seeing a nice package with a
> > bow on top.
> > >
> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> > >> What lexical search package (like lucene/solr) has anyone put on top
> of
> > accumulo?  Is this possible or does everyone just index log files and
> > documents?
> > >>
> > >> v/r
> > >> Bob Thorman
> > >> Principal Big Data Engineer
> > >> AT&T Big Data CoE
> > >> 2900 W. Plano Parkway
> > >> Plano, TX 75075
> > >> 972-658-1714
> > >>
> > >>
> > >>
> >
>

Re: Search

Posted by Nehal Mehta <ne...@gmail.com>.
We were trying to do so, but adding visibility while adding/searching
documents needs a lot more thinking. Adding visibility to the core search
engine requires changes to its algorithm, and that does not make it very
scalable. Integration apart from granular visibility is very doable, and we
had taken inspiration from Solandra.

Obviously, if we can get it done, it adds a lot of value. I believe the Sqrrl
people have already done it; are they thinking of open-sourcing it anytime in
the future?
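
To make the visibility problem concrete, here is a minimal sketch of the simplest option: leave the search engine untouched and filter its hits afterwards against the user's authorizations. This is not what we built and not the Accumulo API; is_visible and filter_hits are hypothetical names, labels are assumed to be plain word characters, and mixing & and | without parentheses is rejected here to avoid guessing at precedence.

```python
import re

_TOKEN = re.compile(r"\w+|[&|()]")


def is_visible(expression, auths):
    """Evaluate a visibility expression such as 'admin|(analyst&us)'."""
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression.replace(" ", ""):
        raise ValueError("unsupported character in expression")
    if not tokens:
        return True  # an empty expression is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError("unexpected token " + token)
        return token in auths

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


def filter_hits(hits, auths):
    """Keep the ids of (doc_id, visibility) hits the user may see."""
    return [doc_id for doc_id, visibility in hits if is_visible(visibility, auths)]
```

Post-filtering like this leaves scoring and result counts computed over documents the user cannot see, which is part of why pushing visibility into the engine itself needs the extra thinking.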


On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dm...@clearedgeit.com>
wrote:

> We briefly toyed with blur on accumulo but didnt get too far just because
> it was obe. I think that would be cool.
>
> > On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
> >
> > It's definitely possible. I remember hearing about someone doing lucene
> on top of Accumulo once, but I don't recall seeing a nice package with a
> bow on top.
> >
> >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> >> What lexical search package (like lucene/solr) has anyone put on top of
> accumulo?  Is this possible or does everyone just index log files and
> documents?
> >>
> >> v/r
> >> Bob Thorman
> >> Principal Big Data Engineer
> >> AT&T Big Data CoE
> >> 2900 W. Plano Parkway
> >> Plano, TX 75075
> >> 972-658-1714
> >>
> >>
> >>
>

Re: Search

Posted by Donald Miner <dm...@clearedgeit.com>.
We briefly toyed with Blur on Accumulo but didn't get too far just because it was OBE. I think that would be cool.

> On Jul 17, 2014, at 3:06 PM, Josh Elser <jo...@gmail.com> wrote:
> 
> It's definitely possible. I remember hearing about someone doing lucene on top of Accumulo once, but I don't recall seeing a nice package with a bow on top.
> 
>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>> What lexical search package (like lucene/solr) has anyone put on top of accumulo?  Is this possible or does everyone just index log files and documents?
>> 
>> v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>> 
>> 
>> 

Re: Search

Posted by Josh Elser <jo...@gmail.com>.
It's definitely possible. I remember hearing about someone doing Lucene
on top of Accumulo once, but I don't recall seeing a nice package with a
bow on top.

On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> What lexical search package (like lucene/solr) has anyone put on top of accumulo?  Is this possible or does everyone just index log files and documents?
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>
>