Posted to solr-user@lucene.apache.org by Ali Nazemian <al...@gmail.com> on 2014/07/23 11:17:45 UTC

integrating Accumulo with solr

Dear All,
Hi,
I was wondering whether anybody out there has tried to integrate Solr
with Accumulo. I was thinking about using Accumulo on top of HDFS and using
Solr to index the data inside Accumulo. Do you have any idea how I can do such
an integration?

Best regards.

-- 
A.Nazemian

integrating Accumulo with solr

Posted by madhvi <ma...@orkash.com>.

Hi,

I have created Lucene indexes of data stored in Accumulo on HDFS.
Lucene queries work fine over them, but I want those indexes to be searchable via Accumulo; that is, the Lucene queries should run through Accumulo. Do you have any idea about that, if it is somehow related to what you are trying to do?

Madhvi

Re: integrating Accumulo with solr

Posted by Jack Krupansky <ja...@basetechnology.com>.

To be clear, I wasn't suggesting that Accumulo was the cause of integration 
complexity - EVERY NoSQL will have integration complexity of comparable 
magnitude. The advantage of DataStax Enterprise or Sqrrl Enterprise is that 
they have done the integration work for you.

-- Jack Krupansky

-----Original Message----- 
From: Ali Nazemian
Sent: Wednesday, July 30, 2014 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Sure,
Thank you very much for your guidance. I don't think I am that kind of gunslinger,
so I will probably go for another NoSQL store that can be integrated with
Solr/Elasticsearch much more easily :)
Best regards.


On Sun, Jul 27, 2014 at 5:02 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> Right, and that's exactly what DataStax Enterprise provides (at great
> engineering effort!) - synchronization of database updates and search
> indexing. Sure, you can do it as well, but that's a significant engineering
> challenge with both sides of the equation, and not a simple "plug and play"
> configuration setting by writing a simple "connector."
>
> But, hey, if you consider yourself one of those "true hard-core
> gunslingers" then you'll be able to code that up in a weekend without any
> of our assistance, right?
>
> In short, synchronizing two data stores is a real challenge. Yes, it is
> doable, but... it is non-trivial. Especially if both stores are distributed
> clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route
> instead of Solr.
>
> I'm certainly not suggesting that it can't be done. Just highlighting the
> challenge of such a task.
>
> Just to be clear, you are referring to "sync mode" and not mere "ETL",
> which people do all the time with batch scripts, Java extraction and
> ingestion connectors, and cron jobs.
>
> Give it a shot and let us know how it works out.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ali Nazemian
> Sent: Sunday, July 27, 2014 1:20 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: integrating Accumulo with solr
>
> Dear Jack,
> Hi,
> One more thing to mention: I don't want to use Solr or Lucene for indexing
> Accumulo or for full-text search inside it. I am looking to have both in
> sync. I mean importing some parts of the data into Solr for indexing. For this
> purpose I probably need something like a trigger in an RDBMS: I have to define
> something (probably with an Accumulo iterator) that imports into Solr when new
> data is inserted.
> Regards.
>
> On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian <al...@gmail.com>
> wrote:
>
>> Dear Jack,
>> Actually I am going to do a cost-benefit analysis of in-house development
>> versus going for Sqrrl support.
>> Best regards.
>>
>>
>> On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>>
>>> Like I said, you're going to have to be a real, hard-core gunslinger to
>>> do that well. Sqrrl uses Lucene directly, BTW:
>>>
>>> "Full-Text Search: Utilizing open-source Lucene and custom indexing
>>> methods, Sqrrl Enterprise users can conduct real-time, full-text search
>>> across data in Sqrrl Enterprise."
>>>
>>> See:
>>> http://sqrrl.com/product/search/
>>>
>>> Out of curiosity, why are you not using that integrated Lucene support of
>>> Sqrrl Enterprise?
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Ali Nazemian
>>> Sent: Thursday, July 24, 2014 3:07 PM
>>>
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: integrating Accumulo with solr
>>>
>>> Dear Jack,
>>> Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
>>> with Solr. This is something like what the Sqrrl guys offer.
>>> Regards.
>>>
>>>
>>> On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky <jack@basetechnology.com>
>>> wrote:
>>>
>>>> If you are not a "true hard-core gunslinger" who is willing to dive in and
>>>> integrate the code yourself, instead you should give serious consideration
>>>> to a product such as DataStax Enterprise that fully integrates and packages
>>>> a NoSQL database (Cassandra) and Solr for search. The security aspects are
>>>> still a work in progress, but certainly headed in the right direction. And
>>>> it has Hadoop and Spark integration as well.
>>>>
>>>> See:
>>>> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Ali Nazemian
>>>> Sent: Thursday, July 24, 2014 10:30 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: integrating Accumulo with solr
>>>>
>>>>
>>>> Thank you very much. Nice idea, but how can Solr and Accumulo be
>>>> synchronized in this way?
>>>> I know that Solr can be integrated with HDFS and that Accumulo also works on
>>>> top of HDFS. So can I use HDFS as the integration point? I mean, set Solr to
>>>> use HDFS as the source of documents as well as the destination of documents.
>>>> Regards.
>>>>
>>>>
>>>> On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock <jg...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ali,
>>>>>
>>>>> Sounds like a good choice.  It's pretty standard to store the primary
>>>>> storage id as a field in Solr so that you can search the full text in Solr
>>>>> and then retrieve the full document elsewhere.
>>>>>
>>>>> I would recommend creating a document structure in Solr with whatever
>>>>> fields you want indexed (most likely as text_en, etc.), and then store a
>>>>> "string" field named "content_id", which would be the Accumulo row id that
>>>>> you look up with a scan.
>>>>>
>>>>> One caveat -- Accumulo will be protected at the cell level, but if you need
>>>>> your Solr search results to be protected by complex authorization strings
>>>>> similar to Accumulo, you will need to write your own QParserPlugin and use
>>>>> post filtering:
>>>>> http://java.dzone.com/articles/custom-security-filtering-solr
>>>>>
>>>>> The code you see in that article is written for an earlier version of Solr,
>>>>> but it's not too difficult to adjust it for the latest (we've done so in
>>>>> our project).  Once you've implemented this, you would store an
>>>>> "authorizations" string field in each Solr document, and pass in the
>>>>> authorizations that the user has access to in the fq parameter of every
>>>>> query.  It's also not too bad to write something that parses the Accumulo
>>>>> authorizations string (like A&B&(C|D|E|F)) and interpret it accordingly in
>>>>> the QParserPlugin.
>>>>>
>>>>> This will give you true row level security in Solr and Accumulo, and it
>>>>> performs quite well in Solr.
>>>>>
>>>>> Let me know if you have any other questions.
>>>>>
>>>>> Joe
>>>>>
>>>>>
>>>>> On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian <al...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Dear Joe,
>>>>> > Hi,
>>>>> > I am going to store the crawled web pages in Accumulo as the main storage
>>>>> > part of my project, and I need to give this data to Solr for indexing and
>>>>> > user searches. I need to do some social and web analysis on my data as
>>>>> > well as having some security features. Therefore Accumulo is my choice for
>>>>> > the database part, and for indexing and search I am going to use Solr.
>>>>> > Would you please guide me through that?
>>>>> >
>>>>> > On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock <jg...@gmail.com> wrote:
>>>>> >
>>>>> > > We store data in both Solr and Accumulo -- do you have more details
>>>>> > > about what kind of data and indexing you want?  Is there a reason you're
>>>>> > > thinking of using both databases in particular?
>>>>> > >
>>>>> > > On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian <alinazemian@gmail.com>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > Dear All,
>>>>> > > > Hi,
>>>>> > > > I was wondering whether anybody out there has tried to integrate Solr
>>>>> > > > with Accumulo. I was thinking about using Accumulo on top of HDFS and
>>>>> > > > using Solr to index the data inside Accumulo. Do you have any idea how
>>>>> > > > I can do such an integration?
>>>>> > > >
>>>>> > > > Best regards.
>>>>> > > >
>>>>> > > > --
>>>>> > > > A.Nazemian
>>>>> > >
>>>>> > > --
>>>>> > > I know what it is to be in need, and I know what it is to have plenty.
>>>>> > > I have learned the secret of being content in any and every situation,
>>>>> > > whether well fed or hungry, whether living in plenty or in want.  I can
>>>>> > > do all this through him who gives me strength.    *-Philippians 4:12-13*
>>>>> >
>>>>> > --
>>>>> > A.Nazemian
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> I know what it is to be in need, and I know what it is to have plenty.  I
>>>>> have learned the secret of being content in any and every situation,
>>>>> whether well fed or hungry, whether living in plenty or in want.  I can do
>>>>> all this through him who gives me strength.    *-Philippians 4:12-13*
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> A.Nazemian
>>>>
>>>>
>>>>
>>>
>>> --
>>> A.Nazemian
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian
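
Joe's suggestion in the quoted thread, checking an Accumulo-style authorization expression such as A&B&(C|D|E|F) against the authorizations a user holds, can be sketched outside Solr as follows. This is only a minimal illustration in Python, not the QParserPlugin from the linked article: the names tokenize and is_visible are hypothetical, the grammar is a simplification, and rejecting & and | mixed at one level without parentheses is a choice made for this sketch.

```python
import re

# A token is either a label (letters, digits and a few punctuation marks)
# or one of the operators & | ( ). The label character set is an assumption.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-.:/]+|[&|()])")


def tokenize(expr):
    """Split an expression such as 'A&B&(C|D)' into a list of tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a user holding the set `auths` satisfies `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty expression means "no restriction"
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


# Example: the user holds A, B and D, so A&B&(C|D|E|F) is satisfied.
allowed = is_visible("A&B&(C|D|E|F)", {"A", "B", "D"})
denied = is_visible("A&B&(C|D|E|F)", {"A", "D"})
```

In the setup Joe describes, a check like this would run inside a Solr post filter against the stored "authorizations" field of each candidate document; that wiring is not shown here.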

Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Sure,
Thank you very much for your guidance. I don't think I am that kind of gunslinger,
so I will probably go for another NoSQL store that can be integrated with
Solr/Elasticsearch much more easily :)
Best regards.


-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Jack Krupansky <ja...@basetechnology.com>.

Right, and that's exactly what DataStax Enterprise provides (at great 
engineering effort!) - synchronization of database updates and search 
indexing. Sure, you can do it as well, but that's a significant engineering 
challenge with both sides of the equation, and not a simple "plug and play" 
configuration setting by writing a simple "connector."

But, hey, if you consider yourself one of those "true hard-core gunslingers" 
then you'll be able to code that up in a weekend without any of our 
assistance, right?

In short, synchronizing two data stores is a real challenge. Yes, it is 
doable, but... it is non-trivial. Especially if both stores are distributed 
clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route 
instead of Solr.

I'm certainly not suggesting that it can't be done. Just highlighting the 
challenge of such a task.

Just to be clear, you are referring to "sync mode" and not mere "ETL", which 
people do all the time with batch scripts, Java extraction and ingestion 
connectors, and cron jobs.

Give it a shot and let us know how it works out.

-- Jack Krupansky
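
The "mere ETL" option mentioned above, a scheduled batch job that copies changed records into the search index, might look roughly like this. The sketch is in Python with in-memory stand-ins: run_batch, the row dictionaries, and the send_batch callback are hypothetical names, and a real job would read from an Accumulo scan and post to Solr instead.

```python
def run_batch(rows, send_batch, last_checkpoint, batch_size=500):
    """Index every row modified after last_checkpoint; return the new checkpoint."""
    # Pick up only rows changed since the previous run, oldest first.
    changed = sorted(
        (r for r in rows if r["modified"] > last_checkpoint),
        key=lambda r: r["modified"],
    )
    for start in range(0, len(changed), batch_size):
        docs = [
            # content_id carries the primary-store row id, as suggested in the thread.
            {"content_id": r["row_id"], "text": r["text"]}
            for r in changed[start:start + batch_size]
        ]
        send_batch(docs)  # hypothetical hook, e.g. one bulk update request
    return changed[-1]["modified"] if changed else last_checkpoint


# Example run: only rows modified after timestamp 10 are sent, one per batch.
rows = [
    {"row_id": "page-001", "text": "first page", "modified": 10},
    {"row_id": "page-002", "text": "second page", "modified": 20},
    {"row_id": "page-003", "text": "third page", "modified": 30},
]
sent = []
checkpoint = run_batch(rows, sent.append, last_checkpoint=10, batch_size=1)
```

Anything written between two runs stays invisible to search until the next run, which is the gap the "sync mode" discussion is about.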


Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Dear Jack,
Hi,
One more thing to mention: I don't want to use Solr or Lucene for indexing
Accumulo or for full-text search inside it. I am looking to have both in
sync. I mean importing some parts of the data into Solr for indexing. For this
purpose I probably need something like a trigger in an RDBMS: I have to define
something (probably with an Accumulo iterator) that imports into Solr when new
data is inserted.
Regards.
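
The trigger-like behaviour described above can be approximated at the application layer by sending every insert through one code path that writes to both systems and remembers any index update that fails. The following Python sketch uses in-memory stand-ins only: PrimaryStore, SearchIndex and SyncedWriter are hypothetical names, not Accumulo or Solr APIs, and whether an Accumulo iterator is a suitable place for such a hook is not settled in this thread.

```python
class PrimaryStore:
    """Stand-in for the primary table (Accumulo in this thread)."""

    def __init__(self):
        self.rows = {}

    def put(self, row_id, text):
        self.rows[row_id] = text

    def get(self, row_id):
        return self.rows[row_id]


class SearchIndex:
    """Stand-in for the search index (Solr in this thread)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["content_id"]] = doc

    def search(self, term):
        # Naive whole-word match; returns the ids to look up in the store.
        term = term.lower()
        return [
            doc["content_id"]
            for doc in self.docs.values()
            if term in doc["text"].lower().split()
        ]


class SyncedWriter:
    """Write to the store first, then to the index; track failed index updates."""

    def __init__(self, store, index):
        self.store = store
        self.index = index
        self.pending = []  # row ids that still need to be indexed

    def insert(self, row_id, text):
        self.store.put(row_id, text)
        try:
            self.index.add({"content_id": row_id, "text": text})
        except Exception:
            self.pending.append(row_id)

    def retry_pending(self):
        still_pending = []
        for row_id in self.pending:
            try:
                self.index.add(
                    {"content_id": row_id, "text": self.store.get(row_id)}
                )
            except Exception:
                still_pending.append(row_id)
        self.pending = still_pending


# Example: insert two pages, search the index, fetch the hit from the store.
store, index = PrimaryStore(), SearchIndex()
writer = SyncedWriter(store, index)
writer.insert("page-001", "Accumulo stores the crawled page")
writer.insert("page-002", "Solr indexes the text")
hits = index.search("solr")
full_text = store.get(hits[0])
```

Even this small version shows where the difficulty lies: the two writes are not atomic, so the pending list (or something more durable) has to cover the window in which the store and the index disagree.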

On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian <al...@gmail.com>
wrote:

> Dear Jack,
> Actually I am going to do benefit-cost analysis for in-house developement
> or going for sqrrl support.
> Best regards.
>
>
> On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky <ja...@basetechnology.com>
> wrote:
>
>> Like I said, you're going to have to be a real, hard-core gunslinger to
>> do that well. Sqrrl uses Lucene directly, BTW:
>>
>> "Full-Text Search: Utilizing open-source Lucene and custom indexing
>> methods, Sqrrl Enterprise users can conduct real-time, full-text search
>> across data in Sqrrl Enterprise."
>>
>> See:
>> http://sqrrl.com/product/search/
>>
>> Out of curiosity, why are you not using that integrated Lucene support of
>> Sqrrl Enterprise?
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Ali Nazemian
>> Sent: Thursday, July 24, 2014 3:07 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: integrating Accumulo with solr
>>
>> Dear Jack,
>> Thank you. I am aware of datastax but I am looking for integrating
>> accumulo
>> with solr. This is something like what sqrrl guys offer.
>> Regards.
>>
>>
>> On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky <ja...@basetechnology.com>
>> wrote:
>>
>>  If you are not a "true hard-core gunslinger" who is willing to dive in
>>> and
>>> integrate the code yourself, instead you should give serious
>>> consideration
>>> to a product such as DataStax Enterprise that fully integrates and
>>> packages
>>> a NoSQL database (Cassandra) and Solr for search. The security aspects
>>> are
>>> still a work in progress, but certainly headed in the right direction.
>>> And
>>> it has Hadoop and Spark integration as well.
>>>
>>> See:
>>> http://www.datastax.com/what-we-offer/products-services/
>>> datastax-enterprise
>>>
>>> -- Jack Krupansky
>>>
>>> --
>>> A.Nazemian
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Dear Jack,
Actually, I am going to do a cost-benefit analysis of in-house development
versus going with Sqrrl support.
Best regards.


On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> Like I said, you're going to have to be a real, hard-core gunslinger to do
> that well. Sqrrl uses Lucene directly, BTW:
>
> "Full-Text Search: Utilizing open-source Lucene and custom indexing
> methods, Sqrrl Enterprise users can conduct real-time, full-text search
> across data in Sqrrl Enterprise."
>
> See:
> http://sqrrl.com/product/search/
>
> Out of curiosity, why are you not using that integrated Lucene support of
> Sqrrl Enterprise?
>
>
> -- Jack Krupansky
>



-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Jack Krupansky <ja...@basetechnology.com>.

Like I said, you're going to have to be a real, hard-core gunslinger to do 
that well. Sqrrl uses Lucene directly, BTW:

"Full-Text Search: Utilizing open-source Lucene and custom indexing methods, 
Sqrrl Enterprise users can conduct real-time, full-text search across data 
in Sqrrl Enterprise."

See:
http://sqrrl.com/product/search/

Out of curiosity, why are you not using that integrated Lucene support of 
Sqrrl Enterprise?

-- Jack Krupansky

-----Original Message----- 
From: Ali Nazemian
Sent: Thursday, July 24, 2014 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of datastax but I am looking for integrating accumulo
with solr. This is something like what sqrrl guys offer.
Regards.


-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Dear Jack,
Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
with Solr, something like what the Sqrrl team offers.
Regards.


On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> If you are not a "true hard-core gunslinger" who is willing to dive in and
> integrate the code yourself, instead you should give serious consideration
> to a product such as DataStax Enterprise that fully integrates and packages
> a NoSQL database (Cassandra) and Solr for search. The security aspects are
> still a work in progress, but certainly headed in the right direction. And
> it has Hadoop and Spark integration as well.
>
> See:
> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
>
> -- Jack Krupansky
>



-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Jack Krupansky <ja...@basetechnology.com>.

If you are not a "true hard-core gunslinger" who is willing to dive in and
integrate the code yourself, you should give serious consideration to a
product such as DataStax Enterprise, which fully integrates and packages
a NoSQL database (Cassandra) and Solr for search. The security aspects are
still a work in progress, but certainly headed in the right direction. It
also has Hadoop and Spark integration.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message----- 
From: Ali Nazemian
Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Thank you very much. Nice Idea but how can Solr and Accumulo can be
synchronized in this way?
I know that Solr can be integrated with HDFS and also Accumulo works on the
top of HDFS. So can I use HDFS as integration point? I mean set Solr to use
HDFS as a source of documents as well as the destination of documents.
Regards.


-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way?
I know that Solr can be integrated with HDFS, and Accumulo also works on
top of HDFS. So can I use HDFS as the integration point? I mean, set Solr to
use HDFS as the source of documents as well as the destination.
Regards.


On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock <jg...@gmail.com> wrote:

> Ali,
>
> Sounds like a good choice.  It's pretty standard to store the primary
> storage id as a field in Solr so that you can search the full text in Solr
> and then retrieve the full document elsewhere.
>
> I would recommend creating a document structure in Solr with whatever
> fields you want indexed (most likely as text_en, etc.), and then store a
> "string" field named "content_id", which would be the Accumulo row id that
> you look up with a scan.
>
> One caveat -- Accumulo will be protected at the cell level, but if you need
> your Solr search results to be protected by complex authorization strings
> similar to Accumulo, you will need to write your own QParserPlugin and use
> post filtering:
> http://java.dzone.com/articles/custom-security-filtering-solr
>
> The code you see in that article is written for an earlier version of Solr,
> but it's not too difficult to adjust it for the latest (we've done so in
> our project).  Once you've implemented this, you would store an
> "authorizations" string field in each Solr document, and pass in the
> authorizations that the user has access to in the fq parameter of every
> query.  It's also not too bad to write something that parses the Accumulo
> authorizations string (like A&B&(C|D|E|F)) and interpret it accordingly in
> the QParserPlugin.
>
> This will give you true row level security in Solr and Accumulo, and it
> performs quite well in Solr.
>
> Let me know if you have any other questions.
>
> Joe
>
>



-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Erik Hatcher <er...@gmail.com>.

Just FYI, the blog post Joe mentioned below (authored by me) has been updated for Solr 4.x at its original location:

   <http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/>

	Erik

On Jul 24, 2014, at 8:03 AM, Joe Gresock <jg...@gmail.com> wrote:

> Ali,
> 
> Sounds like a good choice.  It's pretty standard to store the primary
> storage id as a field in Solr so that you can search the full text in Solr
> and then retrieve the full document elsewhere.
> 
> I would recommend creating a document structure in Solr with whatever
> fields you want indexed (most likely as text_en, etc.), and then store a
> "string" field named "content_id", which would be the Accumulo row id that
> you look up with a scan.
> 
> One caveat -- Accumulo will be protected at the cell level, but if you need
> your Solr search results to be protected by complex authorization strings
> similar to Accumulo, you will need to write your own QParserPlugin and use
> post filtering:
> http://java.dzone.com/articles/custom-security-filtering-solr
> 
> The code you see in that article is written for an earlier version of Solr,
> but it's not too difficult to adjust it for the latest (we've done so in
> our project).  Once you've implemented this, you would store an
> "authorizations" string field in each Solr document, and pass in the
> authorizations that the user has access to in the fq parameter of every
> query.  It's also not too bad to write something that parses the Accumulo
> authorizations string (like A&B&(C|D|E|F)) and interpret it accordingly in
> the QParserPlugin.
> 
> This will give you true row level security in Solr and Accumulo, and it
> performs quite well in Solr.
> 
> Let me know if you have any other questions.
> 
> Joe
> 
> 

Re: integrating Accumulo with solr

Posted by Joe Gresock <jg...@gmail.com>.

Ali,

Sounds like a good choice.  It's pretty standard to store the primary
storage id as a field in Solr so that you can search the full text in Solr
and then retrieve the full document elsewhere.

I would recommend creating a document structure in Solr with whatever
fields you want indexed (most likely as text_en, etc.), plus a "string"
field named "content_id" that holds the Accumulo row id, which you then
look up with a scan.
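
As a rough illustration of the layout described above, here is a minimal
Python sketch. It is not code from our project: the dict stands in for an
Accumulo table, and the field names "title_en" and "body_en", as well as
reusing the row id as the Solr "id", are assumptions made for the example.

```python
# Stand-in for an Accumulo table: row id -> full page content.
# A real system would run an Accumulo scan here instead of a dict lookup.
accumulo_rows = {
    "com.example/page1": "<html>full crawled page 1</html>",
    "com.example/page2": "<html>full crawled page 2</html>",
}


def build_solr_doc(row_id, title, body_text):
    """Build the document that would be sent to Solr for indexing."""
    return {
        "id": row_id,          # Solr unique key (assumed to reuse the row id)
        "content_id": row_id,  # "string" field holding the Accumulo row id
        "title_en": title,     # hypothetical text_en-typed field
        "body_en": body_text,  # hypothetical text_en-typed field
    }


def fetch_full_documents(solr_hits, rows):
    """Resolve Solr hits to full documents via their stored content_id."""
    return [
        rows[hit["content_id"]]
        for hit in solr_hits
        if hit["content_id"] in rows
    ]
```

The search side only ever returns content_id values; the full page is always
read from the primary store, so Solr does not need to store the page body.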

One caveat -- Accumulo will be protected at the cell level, but if you need
your Solr search results to be protected by complex authorization strings
similar to Accumulo, you will need to write your own QParserPlugin and use
post filtering:
http://java.dzone.com/articles/custom-security-filtering-solr

The code in that article was written for an earlier version of Solr, but
it's not too difficult to adapt it to the latest release (we've done so in
our project).  Once you've implemented this, you would store an
"authorizations" string field in each Solr document, and pass the
authorizations that the user holds in the fq parameter of every
query.  It's also not too hard to write something that parses the Accumulo
authorizations string (like A&B&(C|D|E|F)) and interprets it accordingly in
the QParserPlugin.
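
To give an idea of the parsing step, here is a simplified Python sketch of
evaluating an expression such as A&B&(C|D|E|F) against the set of
authorizations a user holds. It is an illustration only, not the Accumulo or
Solr implementation: the function names are made up, quoted terms are not
handled, and treating an empty expression as "visible to everyone" is an
assumption. In a real QParserPlugin the same logic would be written in Java.

```python
import re

# A term is a run of label characters; operators and parentheses stand alone.
_TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def tokenize(expression):
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError("unexpected character in %r" % expression)
    return tokens


def is_visible(expression, authorizations):
    """Return True if a user holding `authorizations` may see the document."""
    if expression == "":
        return True  # assumption: unlabeled documents are visible to everyone
    tokens = tokenize(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError("unexpected %r" % token)
        return token in authorizations

    def parse_expression():
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                # Mixing the two operators at one level is rejected here.
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected %r" % tokens[pos])
    return result
```

For example, is_visible("A&B&(C|D|E|F)", {"A", "B", "D"}) is True, while the
same expression with only {"A", "D"} is False because B is missing.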

This will give you true row level security in Solr and Accumulo, and it
performs quite well in Solr.
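
On the query side, passing the user's authorizations in fq could look
roughly like the Python sketch below. The parser name "accumulo_auths" and
its "auths" local parameter are hypothetical; they have to match however
your own QParserPlugin is registered and what it expects. The {!name
key=value} form is Solr's local-params syntax, and "body_en" is an assumed
field name.

```python
from urllib.parse import urlencode


def build_select_params(user_query, user_auths, parser_name="accumulo_auths"):
    """Build Solr select parameters with the user's authorizations in fq.

    `parser_name` and the `auths` local parameter are hypothetical names
    for a custom QParserPlugin; adjust them to your own plugin.
    """
    auths = ",".join(sorted(user_auths))
    return {
        "q": user_query,
        "fq": "{!%s auths=%s}" % (parser_name, auths),
        "fl": "content_id,score",  # only the Accumulo row id is needed back
    }


# URL-encoded query string to append to the select handler URL.
query_string = urlencode(build_select_params("body_en:accumulo", {"B", "A"}))
```

The authorizations should come from the authenticated session on the server
side, never from a value the end user can edit.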

Let me know if you have any other questions.

Joe

On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian <al...@gmail.com> wrote:

> Dear Joe,
> Hi,
> I am going to store the crawl web pages in accumulo as the main storage
> part of my project and I need to give these data to solr for indexing and
> user searches. I need to do some social and web analysis on my data as well
> as having some security features. Therefore accumulo is my choice for the
> database part and for index and search I am going to use Solr. Would you
> please guide me through that?

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*

Re: integrating Accumulo with solr

Posted by Ali Nazemian <al...@gmail.com>.

Dear Joe,
Hi,
I am going to store crawled web pages in Accumulo as the main storage
layer of my project, and I need to feed this data to Solr for indexing and
user searches. I also need to do some social and web analysis on the data,
and I need some security features. Accumulo is therefore my choice for the
database, and I am going to use Solr for indexing and search. Could you
please guide me through that?

On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock <jg...@gmail.com> wrote:

> We store data in both Solr and Accumulo -- do you have more details about
> what kind of data and indexing you want?  Is there a reason you're thinking
> of using both databases in particular?

-- 
A.Nazemian

Re: integrating Accumulo with solr

Posted by Joe Gresock <jg...@gmail.com>.

We store data in both Solr and Accumulo -- do you have more details about
what kind of data and indexing you want?  Is there a reason you're thinking
of using both databases in particular?

On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian <al...@gmail.com> wrote:

> Dear All,
> Hi,
> I was wondering is there anybody out there that tried to integrate Solr
> with Accumulo? I was thinking about using Accumulo on top of HDFS and using
> Solr to index data inside Accumulo? Do you have any idea how can I do such
> integration?
>
> Best regards.
>
> --
> A.Nazemian
>

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*