
Posted to solr-user@lucene.apache.org by BenMccarthy <be...@gmail.com> on 2011/12/09 10:46:32 UTC

Solr Best Practice Configuration

Good Morning.

I have now been through the various Solr tutorials and read the Solr 3
Enterprise Search Server book.  I'm now at the point of figuring out if Solr can
help us with a scaling problem.  I'm looking for advice on the following
scenario; any pointers or references would be great:

I have two sets of distinct data:

Advert
Advertiser

An Advertiser has many Adverts. In the db they look like this:

Advert {
    id
    field a
    field b
    advertiser_id
}

Advertiser {
    id
    field c
    field d
    lat
    long
}

So I've followed some docs and created a DIH config which pulls all this into
one Solr index, which is great.  The problem I'm looking at is that we have
a massive churn of Advertiser updates, and with the one index I don't think it
will scale (correct me if I'm wrong).

Would it be possible to have two separate cores, each with its own index, and
then have queries return results the same way they would in a single-core
setup?

I'm basically looking for some pointers telling me if I'm going in the right
direction.  I don't want to have to update 50,000 adverts when an advertiser
simply updates field c.  This is a problem we have with our current search.

Thanks
Ben



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Best-Practice-Configuration-tp3572492p3572492.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Best Practice Configuration

Posted by Marc SCHNEIDER <ma...@gmail.com>.

Hi,

What about using the delta-import command of the DIH?
http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command

If you want to have two separate indexes, you could play with the "swap"
command.
One index would be continuously updated and the other one used for user
requests.
From time to time you will have to "swap" so that users get the most
recent data.
http://wiki.apache.org/solr/CoreAdmin (swap part).

Marc.
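
Editor's sketch of the two commands Marc mentions, written as plain URL
builders so nothing is sent to a server. The base URL, the core names
"live" and "building", and the "/dataimport" handler path are assumptions
(the handler path is whatever solrconfig.xml registers); only the
"command=delta-import" and "action=SWAP" parameters come from the DIH and
CoreAdmin wiki pages linked above.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to the actual deployment.
SOLR_BASE = "http://localhost:8983/solr"


def delta_import_url(core: str) -> str:
    """URL asking the DataImportHandler of `core` to run a delta-import."""
    params = {"command": "delta-import", "commit": "true"}
    # "/dataimport" is the conventional handler name, not a fixed one.
    return f"{SOLR_BASE}/{core}/dataimport?{urlencode(params)}"


def swap_url(core: str, other: str) -> str:
    """URL asking CoreAdmin to swap the two named cores."""
    params = {"action": "SWAP", "core": core, "other": other}
    return f"{SOLR_BASE}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Update the offline core, then swap it in for the one serving users.
    print(delta_import_url("building"))
    print(swap_url("live", "building"))
```

The intended order would be: run the delta-import against the core that
is not serving traffic, wait for it to finish, then issue the swap.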

Re: Solr Best Practice Configuration

Posted by Chantal Ackermann <ch...@btelligent.de>.

Hi Ben,

what I understand from your post is:

Advertiser (1) <-> (*) Advert
(one-to-many where there can be 50,000 per single Advertiser)

Your index entity is based on Advert which means that there can be
50,000 documents in the index that need to be changed if a field of an
Advertiser is updated in the database.

I am using multi-core setups with differently structured indexes for
these needs. This means that some more complex lookups require queries
on several cores. This has not been a problem so far. Our indexes,
however, hold relatively little data (ranging from a few hundred thousand
entries to a few million, with rather a lot of short text fields) and
are highly dynamic (rebuilt several times a day; full rebuilds, no
incremental updates).

Moving the Advertiser data out of the Advert index means:
(1) on updates of the Advertiser fields you don't need to change the
Advert index
(2) the Advert index might be a bit smaller (if that matters)
(3) the statistics on the Advertiser data will be in relation to the
Advertiser data and not in relation to the Adverts, while the statistics
on the Adverts won't contain any Advertiser data, anymore.

(This list might not be complete.)

What does (3) imply?
You will not be able to facet or sort or group on Adverts using any of
the Advertiser fields (as they reside in a different index core).

If you need faceting or similar, then consider first testing the
performance of a massive update or of rebuilding your index before starting
to change to multiple cores. Maybe the performance is better than you
fear and no change is required.

Cheers,
Chantal
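
Editor's sketch of a lookup that "requires queries on several cores", as
described above: first ask the Advertiser core for matching ids, then ask
the Advert core for adverts belonging to those ids. It only builds the
request parameters. The field name "field_c" stands in for Ben's
"field c" and is hypothetical; "advertiser_id" is taken from his schema.

```python
from typing import Optional


def advertiser_query(field: str, value: str) -> dict:
    """Step 1: parameters for the Advertiser core, returning only ids."""
    return {"q": f'{field}:"{value}"', "fl": "id", "wt": "json"}


def advert_query(advertiser_ids: list) -> Optional[dict]:
    """Step 2: parameters for the Advert core, limited to those advertisers.

    Returns None when step 1 matched nothing, so the caller can skip
    the second request entirely.
    """
    if not advertiser_ids:
        return None
    id_clause = " OR ".join(str(i) for i in advertiser_ids)
    return {"q": "*:*", "fq": f"advertiser_id:({id_clause})", "wt": "json"}


if __name__ == "__main__":
    print(advertiser_query("field_c", "some value"))
    print(advert_query([3, 7, 12]))
```

One design caveat: the id list in step 2 becomes a boolean query, so a
very large set of matching advertisers can run into Solr's
maxBooleanClauses limit and would need batching.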

Re: Solr Best Practice Configuration

Posted by Erick Erickson <er...@gmail.com>.

What kind of data gets changed for your adverts? Is it
anything ExternalFileField could help with?

In the 4.0 (trunk) code there's a limited join capability
that may help eventually.

Best
Erick
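
Editor's sketch of what a cross-core join filter could look like with the
join capability Erick refers to. It assumes the "{!join from=... to=...
fromIndex=...}" local-params syntax of the Solr 4 join query parser; the
core name "advertiser" and the field "field_c" are hypothetical. As far
as I know the join only selects Advert documents and does not copy
Advertiser fields into the results, which is one sense in which it is
"limited".

```python
def join_filter(from_core: str, from_field: str, to_field: str,
                inner_query: str) -> str:
    """Build a join filter: run `inner_query` on `from_core`, then keep
    documents in the queried core whose `to_field` equals a matching
    document's `from_field`."""
    return (f"{{!join from={from_field} to={to_field} "
            f"fromIndex={from_core}}}{inner_query}")


if __name__ == "__main__":
    # Adverts whose advertiser matches field_c:foo in the other core.
    print(join_filter("advertiser", "id", "advertiser_id", "field_c:foo"))
```

The resulting string would be passed as the "fq" (or "q") parameter of a
request to the Advert core.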

On Fri, Dec 9, 2011 at 8:33 AM, BenMccarthy <be...@gmail.com> wrote:
> What does (3) imply?
> You will not be able to facet or sort or group on Adverts using any of
> the Advertiser fields (as they reside in a different index core).
>
> In relation to my last reply this is exactly what i need to do.
>
> Return all adverts where a postcode (translated to lat/long) is within a
> bounding box.  The lat/long is on the advertiser record, and fields in the
> advertiser contain information needed to display the advert.
>
> Thanks
> Ben
>

Re: Solr Best Practice Configuration

Posted by BenMccarthy <be...@gmail.com>.

> What does (3) imply?
> You will not be able to facet or sort or group on Adverts using any of
> the Advertiser fields (as they reside in a different index core).

In relation to my last reply, this is exactly what I need to do:

return all adverts where a postcode (translated to lat/long) is within a
bounding box.  The lat/long is on the advertiser record, and fields in the
advertiser contain information needed to display the advert.

Thanks
Ben
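
Editor's sketch of the bounding-box arithmetic behind this requirement:
turn a centre point and a distance into latitude and longitude ranges,
then express them as a range filter. It assumes "lat" and "long" are
indexed as numeric fields and uses a spherical-Earth approximation that
breaks down near the poles and the 180-degree meridian. Solr 3.1 and
later also ship a "{!bbox}" spatial filter that could replace this
hand-rolled version.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def bounding_box(lat: float, lon: float, distance_km: float) -> tuple:
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees."""
    dlat = math.degrees(distance_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with the cosine of the latitude.
    dlon = math.degrees(
        distance_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def bbox_filter(lat: float, lon: float, distance_km: float,
                lat_field: str = "lat", lon_field: str = "long") -> str:
    """Express the box as a Solr range filter on two numeric fields."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, distance_km)
    return (f"{lat_field}:[{min_lat:.5f} TO {max_lat:.5f}] AND "
            f"{lon_field}:[{min_lon:.5f} TO {max_lon:.5f}]")


if __name__ == "__main__":
    # A 10 km box around an example point (coordinates are illustrative).
    print(bbox_filter(51.5, -0.12, 10.0))
```

Because the lat/long live on the Advertiser, this filter applies to
whichever index holds those fields: the combined index, or the
Advertiser core in a two-core setup.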

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Best-Practice-Configuration-tp3572492p3572901.html

Re: Solr Best Practice Configuration

Posted by BenMccarthy <be...@gmail.com>.

Thanks for the replies, guys.

The Advert index would be around 1 million records, and we have a churn of
around 400K record changes per day.  Our current system does around 400K
updates a day to the index, and we have over 72 million searches on the
index.

I'm more wondering what type of configuration I would need to be able to get
results back which include both the advert record and the advertiser record
as one document, if that sheds any more light on the situation.

Thanks
Ben
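
Editor's sketch of one way to present "the advert record and the
advertiser record as one document" when the two live in separate cores:
merge them in the client after both queries have returned. This is an
application-side join, not a Solr feature. The "advertiser_" prefix and
the field names "field_a" and "field_c" are hypothetical; "id" and
"advertiser_id" follow the schema in the original post.

```python
def merge_results(adverts: list, advertisers: list) -> list:
    """Attach each advert's advertiser fields, prefixed with 'advertiser_'."""
    by_id = {a["id"]: a for a in advertisers}
    merged = []
    for advert in adverts:
        advertiser = by_id.get(advert["advertiser_id"])
        if advertiser is None:
            continue  # advertiser not returned by the second core; skip
        doc = dict(advert)  # copy so the input documents stay untouched
        doc.update({f"advertiser_{key}": value
                    for key, value in advertiser.items() if key != "id"})
        merged.append(doc)
    return merged


if __name__ == "__main__":
    adverts = [{"id": 1, "field_a": "x", "advertiser_id": 10}]
    advertisers = [{"id": 10, "field_c": "c", "lat": 51.5, "long": -0.1}]
    print(merge_results(adverts, advertisers))
```

With this approach an Advertiser update touches one document in the
Advertiser core only, at the cost of a second lookup per search.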


--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Best-Practice-Configuration-tp3572492p3572894.html