Posted to solr-user@lucene.apache.org by blargy <zm...@hotmail.com> on 2010/03/10 01:56:09 UTC

Architectural help

I was wondering if someone could be so kind as to give me some architectural
guidance.

A little about our setup. We are a RoR shop that is currently using Ferret (no
laughs please) as our search technology. Our indexing process at the moment
is quite poor, and so are our search results. After some deliberation we have
decided to switch to Solr to satisfy our search requirements.

We have about 5M records, varying in size, all coming from a DB source (only 2
tables). What would be the most efficient way of indexing all of these
documents? I am looking at DIH but before I go down that road I wanted to
get some guidance. Are there any pitfalls I should be aware of before I
start? Anything I can do now that will help me down the road?

I have also been exploring the Sunspot Rails plugin
(http://outoftime.github.com/sunspot/), which so far seems amazing. There is
an easy way to reindex all of your models (e.g. Model.reindex), but I doubt
this is the most efficient. Has anyone had any experience using Sunspot in
their Rails environment, and if so, should I bother with DIH?

Please let me know of any suggestions/opinions you may have. Thanks.


-- 
View this message in context: http://old.nabble.com/Architectural-help-tp27844268p27844268.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Snapshot / Distribution Process

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Snapshot / Distribution Process
: In-Reply-To: <27...@talk.nabble.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss


Re: Snapshot / Distribution Process

Posted by Bill Au <bi...@gmail.com>.
Have you started rsyncd on the master?  Make sure that it is enabled before
you start:

http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

You can also try running snappuller with the -V option to get more
debugging info.
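Before digging through snappuller's verbose output, it can help to confirm the first point from the slave side: that an rsync daemon is actually answering on the master. The helper below is a hypothetical sketch, not one of Solr's distribution scripts. It assumes the daemon listens on the rsyncd_port configured in scripts.conf and relies on the rsync daemon protocol, whose greeting line starts with "@RSYNCD:". It only shows that a daemon answers, not that the snapshot module or file permissions are correct.

```python
import socket
from typing import Optional


def rsyncd_greeting(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Return the first line sent by the server at host:port, or None.

    None means nothing accepted the connection, or no data arrived before
    the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # Read until the first newline or until the server closes.
            while b"\n" not in data:
                chunk = sock.recv(256)
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, unreachable, DNS failure or timeout
        return None
    if not data:
        return None
    return data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()


def looks_like_rsyncd(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if the server greets like an rsync daemon ("@RSYNCD: <version>")."""
    greeting = rsyncd_greeting(host, port, timeout)
    return greeting is not None and greeting.startswith("@RSYNCD:")
```

Call it as looks_like_rsyncd(master_host, rsyncd_port) with your real values. A False result points back at rsyncd-enable/rsyncd-start on the master (or a firewall) rather than at snappuller itself.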

Bill

On Wed, Mar 10, 2010 at 4:09 PM, Lars R. Noldan <la...@sixfeetup.com> wrote:

> Is anyone aware of a comprehensive guide for setting up the Snapshot
> Distribution process on Solr 1.3?
>
> I'm working through:
> http://wiki.apache.org/solr/CollectionDistribution#The_Snapshot_and_Distribution_Process
>
> And have run into a roadblock where the solr/bin/snappuller finds the
> appropriate snapshot, but rsync fails.  (according to the logs.)

Snapshot / Distribution Process

Posted by "Lars R. Noldan" <la...@sixfeetup.com>.
Is anyone aware of a comprehensive guide for setting up the Snapshot Distribution process on Solr 1.3?  

I'm working through: http://wiki.apache.org/solr/CollectionDistribution#The_Snapshot_and_Distribution_Process

I have run into a roadblock where solr/bin/snappuller finds the appropriate snapshot, but rsync fails (according to the logs).

Any guidance you can provide, even if it's a request for additional troubleshooting information, is welcome and appreciated.

Thanks
Lars
-- 
lars@sixfeetup.com | +1 (317) 861-5948 x609
six feet up presents INDIGO : The Help Line for Plone
More info at http://sixfeetup.com/indigo or call +1 (866) 749-3338

Re: Architectural help

Posted by Erick Erickson <er...@gmail.com>.
DIH is the Data Import Handler; see
http://wiki.apache.org/solr/DataImportHandler
Erick

On Fri, Mar 12, 2010 at 12:08 AM, Dennis Gearon <ge...@sbcglobal.net> wrote:

> What is DIH? I feel like I'm saying, "Duh . . .", sorry.

Re: Architectural help

Posted by Dennis Gearon <ge...@sbcglobal.net>.
What is DIH? I feel like I'm saying, "Duh . . .", sorry.


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 3/11/10, Constantijn Visinescu <ba...@gmail.com> wrote:

> From: Constantijn Visinescu <ba...@gmail.com>
> Subject: Re: Architectural help
> To: solr-user@lucene.apache.org
> Date: Thursday, March 11, 2010, 5:25 AM
> Assuming you create the view in such a way that it returns 1 row for each
> solrdocument you want indexed: yes

Re: Architectural help

Posted by Constantijn Visinescu <ba...@gmail.com>.
Yes, assuming you create the view in such a way that it returns one row for
each Solr document you want indexed.
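For a view like that, the DIH side can stay very small: one JDBC data source and one entity whose query selects the whole view. The Python sketch below just writes out such a data-config.xml. The driver class, JDBC URL, credentials and the view name are placeholders, and it assumes the view's column names match field names in schema.xml, so no explicit <field> mappings are emitted.

```python
import xml.etree.ElementTree as ET


def build_data_config(driver: str, url: str, user: str, password: str,
                      view: str) -> str:
    """Return a minimal DIH data-config.xml that indexes every row of `view`.

    Each row becomes one Solr document. No <field> elements are emitted, so
    this relies on the view's column names matching schema.xml field names.
    """
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", {
        "type": "JdbcDataSource",
        "driver": driver,
        "url": url,
        "user": user,
        "password": password,
    })
    document = ET.SubElement(root, "document")
    ET.SubElement(document, "entity", {
        "name": view,
        "query": f"select * from {view}",
    })
    return ET.tostring(root, encoding="unicode")
```

For a MySQL-backed Rails app that might be build_data_config("com.mysql.jdbc.Driver", "jdbc:mysql://localhost/app_production", "solr", "secret", "solr_docs"), with the output saved as the file named by the DataImportHandler's "config" default in solrconfig.xml.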

On Wed, Mar 10, 2010 at 7:54 PM, blargy <zm...@hotmail.com> wrote:

>
> So I can just create a view  (or temporary table) and then just have a
> simple "select * from (view or table)" in my DIH config?

Re: Architectural help

Posted by blargy <zm...@hotmail.com>.
So I can just create a view  (or temporary table) and then just have a simple
"select * from (view or table)" in my DIH config?


Constantijn Visinescu wrote:
> 
> Try making a database view that contains everything you want to index, and
> then just use the DIH.
> 
> Worked when i tested it ;)


Re: Architectural help

Posted by Constantijn Visinescu <ba...@gmail.com>.
Try making a database view that contains everything you want to index, and
then just use the DIH.

Worked when I tested it ;)
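The main thing to get right in such a view is the "one row per document" property: a plain join against a one-to-many child table emits one row per child. Below is a runnable sketch that uses SQLite as a stand-in for the real database. The products/tags schema is hypothetical, and GROUP_CONCAT (also available in MySQL) folds the child rows into a single column.

```python
import sqlite3

# Hypothetical two-table schema standing in for the application's tables.
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (product_id INTEGER REFERENCES products(id), tag TEXT);

-- One row per product, i.e. one row per Solr document. A plain JOIN to
-- tags would emit one row per tag instead, so the tags are folded into a
-- single space-separated string.
CREATE VIEW solr_docs AS
SELECT p.id AS id,
       p.name AS name,
       GROUP_CONCAT(t.tag, ' ') AS tags
FROM products p
LEFT JOIN tags t ON t.product_id = p.id
GROUP BY p.id, p.name;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the view and a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO products (id, name) VALUES (?, ?)",
                     [(1, "red widget"), (2, "blue gadget")])
    conn.executemany("INSERT INTO tags (product_id, tag) VALUES (?, ?)",
                     [(1, "red"), (1, "widget"), (2, "gadget")])
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute("SELECT * FROM solr_docs ORDER BY id"):
        print(row)
```

If tags should end up in a multiValued Solr field, DIH's RegexTransformer can split the concatenated string again at import time via its splitBy attribute.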


Re: Architectural help

Posted by Chris Hostetter <ho...@fucit.org>.
: We have about 5M records ranging in size all coming from a DB source (only 2
: tables). What will be the most efficient way of indexing all of these
: documents? I am looking at DIH but before I go down that road I wanted to

The main question to ask yourself is what your indexing freshness 
requirements are.  

If you have a small amount of data, or if a large percentage of your data 
is changing all the time, and you can tolerate lag in how quickly updates 
to your data make it into the index, then doing complete re/full-builds 
(with DIH or anything else) periodically is certainly the simplest way to 
go.
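With DIH, a periodic full rebuild amounts to hitting the handler with command=full-import, e.g. from cron. A minimal sketch, assuming the handler is registered at /dataimport as in the DIH wiki examples; dih_command_url and trigger are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_command_url(solr_base: str, command: str, **params: str) -> str:
    """Build a DataImportHandler URL such as .../dataimport?command=full-import.

    Assumes the handler is registered at /dataimport in solrconfig.xml.
    """
    query = urlencode({"command": command, **params})
    return f"{solr_base.rstrip('/')}/dataimport?{query}"


def trigger(url: str, timeout: float = 30.0) -> str:
    """Send the request and return Solr's raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, dih_command_url("http://localhost:8983/solr", "full-import", clean="true", commit="true") yields http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true. The import runs in the background, so command=status is the thing to poll afterwards.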

If you have a lot of data, or a small percentage of your data is changing 
within the largest interval of time you are willing to wait before your 
index is updated, then a "batch delta indexing" approach like DIH's 
deltaQuery provides is only a little bit more effort on top of 
implementing fullbuilds.
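In DIH terms, delta indexing means giving the entity a pk, a deltaQuery that returns the keys of rows changed since ${dataimporter.last_index_time}, and a deltaImportQuery that fetches each changed row via ${dataimporter.delta.<pk>}. The sketch below builds such an entity. It assumes a Rails-style updated_at column (hypothetical here) exposed by the table or view, and it does not handle deletes, which DIH covers separately with deletedPkQuery.

```python
import xml.etree.ElementTree as ET


def delta_entity(source: str, pk: str = "id",
                 modified_col: str = "updated_at") -> ET.Element:
    """Build a DIH <entity> usable for both full-import and delta-import.

    query            -> used by full-import (every row)
    deltaQuery       -> keys of rows modified since the last import
    deltaImportQuery -> fetches one modified row by its key
    """
    # The ${...} placeholders are left for DIH to fill in at import time;
    # inside the f-string the braces are doubled to keep them literal.
    last_run = "'${dataimporter.last_index_time}'"
    changed_key = f"'${{dataimporter.delta.{pk}}}'"
    return ET.Element("entity", {
        "name": source,
        "pk": pk,
        "query": f"select * from {source}",
        "deltaQuery": f"select {pk} from {source} where {modified_col} > {last_run}",
        "deltaImportQuery": f"select * from {source} where {pk} = {changed_key}",
    })
```

The delta run is then triggered with command=delta-import. If the view joins two tables, its updated_at has to reflect changes in either table, or edits to the second table will be missed.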

If you really need your index to be updated as soon as the authoritative 
data changes, then having your publishing flow immediately make changes to 
the index by pushing it over HTTP to the /update API is probably your best 
bet.
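Pushing changes as they happen means POSTing an XML <add> message to the /update handler from wherever the application saves a record (in Rails, typically an after_save callback). A minimal Python sketch: add_xml and update_request are hypothetical helpers, the Solr base URL is a placeholder, and the request is only built here, not sent.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping
from urllib.request import Request


def add_xml(docs: Iterable[Mapping[str, object]]) -> bytes:
    """Serialize documents as a Solr XML <add> message (UTF-8 bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes a repeated field (for multiValued fields).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = str(v)
    return ET.tostring(add, encoding="unicode").encode("utf-8")


def update_request(solr_base: str, body: bytes) -> Request:
    """Build (but do not send) a POST of `body` to the XML /update handler."""
    return Request(
        f"{solr_base.rstrip('/')}/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Sending it is urlopen(update_request(base, add_xml(docs))). Changes are not searchable until a commit, which can be sent the same way with a body of <commit/>.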



-Hoss