Posted to solr-user@lucene.apache.org by James liu <li...@gmail.com> on 2007/03/16 08:53:05 UTC

how to balance index and search

I find that indexing HTML makes Tomcat use 100% of the CPU, which makes
searching slow.

So how can I balance indexing and searching?


For the web tier I use Apache + PHP.

For Solr I use Tomcat 6 + Java 1.6.


Any suggestions are welcome.

-- 
regards
jl

Re: how to balance index and search

Posted by James liu <li...@gmail.com>.
2007/3/19, Chris Hostetter <ho...@fucit.org>:
>
>
> : I think this is a problem because we use Win2003, and I remember replication
>
> The scripts that come with Solr don't work on Windows because they rely on
> hardlinks to efficiently copy only things that have changed -- but the
> principle of indexing on one server, creating "snapshots" (which could be
> true copies instead of hardlinks) and then replicating those snapshots out
> to slave servers for searching is still a solid one.


Now I am reading about cwRsync, which is rsync for Windows.

> The hooks Solr provides for triggering snapshot creation on the master and
> snapshot installation on the slave make it possible for you to implement
> those in any way that makes sense for your environment.
>
> -Hoss
>
>


-- 
regards
jl

Re: how to balance index and search

Posted by Chris Hostetter <ho...@fucit.org>.
: I think this is a problem because we use Win2003, and I remember replication

The scripts that come with Solr don't work on Windows because they rely on
hardlinks to efficiently copy only things that have changed -- but the
principle of indexing on one server, creating "snapshots" (which could be
true copies instead of hardlinks) and then replicating those snapshots out
to slave servers for searching is still a solid one.

The hooks Solr provides for triggering snapshot creation on the master and
snapshot installation on the slave make it possible for you to implement
those in any way that makes sense for your environment.
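
As an illustration only (this is not the scripts that ship with Solr), the
"true copies instead of hardlinks" idea could be sketched in Python roughly as
below. The directory layout, the "snapshot.<timestamp>" naming, and the
function names are all assumptions made for the example. It also assumes the
snapshot is taken right after a commit while nothing is writing to the index,
and that the slave is told to reopen its searcher after installation. On
Windows, renaming a directory that still has open files can fail, so a real
setup would need to handle that case.

```python
import shutil
import time
from pathlib import Path


def create_snapshot(index_dir: Path, snapshot_root: Path) -> Path:
    """Make a full copy of the index as a timestamped snapshot directory."""
    name = "snapshot." + time.strftime("%Y%m%d%H%M%S")  # hypothetical naming
    tmp = snapshot_root / (name + ".tmp")
    final = snapshot_root / name
    shutil.copytree(index_dir, tmp)  # true copy, no hardlinks required
    tmp.rename(final)  # publish only once the copy is complete
    return final


def latest_snapshot(snapshot_root: Path):
    """Return the newest completed snapshot directory, or None if there is none."""
    snaps = sorted(
        p for p in snapshot_root.iterdir()
        if p.is_dir() and p.name.startswith("snapshot.")
        and not p.name.endswith(".tmp")
    )
    return snaps[-1] if snaps else None


def install_snapshot(snapshot: Path, slave_index_dir: Path) -> None:
    """Replace the slave's index directory with a copy of the snapshot."""
    staging = slave_index_dir.with_name(slave_index_dir.name + ".new")
    old = slave_index_dir.with_name(slave_index_dir.name + ".old")
    for leftover in (staging, old):
        if leftover.exists():
            shutil.rmtree(leftover)
    shutil.copytree(snapshot, staging)
    if slave_index_dir.exists():
        slave_index_dir.rename(old)  # keep the old index until the swap is done
    staging.rename(slave_index_dir)
    if old.exists():
        shutil.rmtree(old)
```

create_snapshot would run on the master and install_snapshot on each slave;
getting the snapshot from one machine to the other (for example with an rsync
port for Windows) is left out of the sketch.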



-Hoss


Re: how to balance index and search

Posted by James liu <li...@gmail.com>.
2007/3/17, Chris Hostetter <ho...@fucit.org>:
>
>
> If indexing while searching is causing problems, one way to reduce
> the impact is to index on a master instance and then use the replication
> scripts to sync it up with a slave instance (where all of your searches
> happen).


I think this is a problem because we use Win2003, and I remember the
replication scripts having problems on FreeBSD.

> If you are specifically seeing high CPU when indexing HTML, that's probably
> because the HTML analyzers have to do a lot of complex work to strip out
> the HTML ... another option might be to parse that HTML on the client side
> before sending it to Solr.


A spider crawls the HTML data into MS SQL Server. I just read the data from
SQL Server and send it to Solr with curl.
Tomorrow I will test this option.


> : I find that indexing HTML makes Tomcat use 100% of the CPU, which makes
> : searching slow.
> :
> : So how can I balance indexing and searching?
> :
> :
> : For the web tier I use Apache + PHP.
> :
> : For Solr I use Tomcat 6 + Java 1.6.
>
> -Hoss
>
>


-- 
regards
jl

Re: how to balance index and search

Posted by Chris Hostetter <ho...@fucit.org>.
If indexing while searching is causing problems, one way to reduce
the impact is to index on a master instance and then use the replication
scripts to sync it up with a slave instance (where all of your searches
happen).

If you are specifically seeing high CPU when indexing HTML, that's probably
because the HTML analyzers have to do a lot of complex work to strip out
the HTML ... another option might be to parse that HTML on the client side
before sending it to Solr.
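
A minimal sketch of that client-side option, assuming Python on the client and
a schema with fields named "id" and "text" (both field names are assumptions,
as are the helper names): strip the markup with the standard-library HTML
parser and build a Solr XML add message that carries plain text only.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring tags and script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    """Return the plain text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def solr_add_xml(doc_id: str, html: str) -> str:
    """Build an <add> message whose "text" field holds plain text only."""
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="text">{escape(strip_html(html))}</field>'
        "</doc></add>"
    )
```

The resulting string can be posted to the update handler the same way the raw
HTML was posted before. The field that receives it would then use a plain
text analyzer, so the HTML stripping cost moves off the Solr machine. Note
that text pieces are joined with spaces, so a word split by an inline tag
would be indexed as two tokens.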

: I find that indexing HTML makes Tomcat use 100% of the CPU, which makes
: searching slow.
:
: So how can I balance indexing and searching?
:
:
: For the web tier I use Apache + PHP.
:
: For Solr I use Tomcat 6 + Java 1.6.



-Hoss


Re: how to balance index and search

Posted by James liu <li...@gmail.com>.
2007/3/19, Chris Hostetter <ho...@fucit.org>:
>
>
> : I just want to know CNET.com's index and search architecture, if it can
> : be made public.
> : Many people who use Solr, or want to use it, would like to know and learn.
>
> I'm not sure what to tell you: Solr *is* our search arch.


The information below is what I wanted to learn. Thanks, Chris.

Maybe this should be added to the wiki. I think people would be happy to
read it.

> We have a dozen
> or so Solr indexes, all of them use the master/slave model -- but they
> are all configured in various ways based on the nature of the data and the
> types of queries we do.  The news collection doesn't do faceted search and
> surfacing new stories immediately is crucial, so they have small cache
> configs, with very low auto warming, and replication cranked up to happen
> very frequently; meanwhile the product index, where update latency of 20
> minutes isn't the end of the world but we do want to support faceted
> searching, does snapinstalls only every 15 minutes (I think) with big
> caches that are 100% auto warmed.
>
>
>
>
>
> -Hoss
>
>


-- 
regards
jl

Re: how to balance index and search

Posted by Chris Hostetter <ho...@fucit.org>.
: I just want to know CNET.com's index and search architecture, if it can
: be made public.
: Many people who use Solr, or want to use it, would like to know and learn.

I'm not sure what to tell you: Solr *is* our search arch.  We have a dozen
or so Solr indexes, all of them use the master/slave model -- but they
are all configured in various ways based on the nature of the data and the
types of queries we do.  The news collection doesn't do faceted search and
surfacing new stories immediately is crucial, so they have small cache
configs, with very low auto warming, and replication cranked up to happen
very frequently; meanwhile the product index, where update latency of 20
minutes isn't the end of the world but we do want to support faceted
searching, does snapinstalls only every 15 minutes (I think) with big
caches that are 100% auto warmed.





-Hoss


Re: how to balance index and search

Posted by James liu <li...@gmail.com>.
2007/3/17, Chris Hostetter <ho...@fucit.org>:
>
>
> : Can people from CNET tell us how Solr is used on CNET.com?
>
> I really don't understand your question; here are some links to CNET.com
> pages that use Solr...
>
> http://www.cnet.com/4244-5_1-0.html?query=ipod
> http://search.news.com/search?q=apple
> http://reviews.cnet.com/4566-3121-0.html


I just want to know CNET.com's index and search architecture, if it can
be made public.
Many people who use Solr, or want to use it, would like to know and learn.



> -Hoss
>
>


-- 
regards
jl

Re: how to balance index and search

Posted by Chris Hostetter <ho...@fucit.org>.
: Can people from CNET tell us how Solr is used on CNET.com?

I really don't understand your question; here are some links to CNET.com
pages that use Solr...

http://www.cnet.com/4244-5_1-0.html?query=ipod
http://search.news.com/search?q=apple
http://reviews.cnet.com/4566-3121-0.html



-Hoss


Re: how to balance index and search

Posted by James liu <li...@gmail.com>.
Can people from CNET tell us how Solr is used on CNET.com?


2007/3/16, James liu <li...@gmail.com>:
>
> I find that indexing HTML makes Tomcat use 100% of the CPU, which makes
> searching slow.
>
> So how can I balance indexing and searching?
>
>
> For the web tier I use Apache + PHP.
>
> For Solr I use Tomcat 6 + Java 1.6.
>
>
> Any suggestions are welcome.
>
> --
> regards
> jl




-- 
regards
jl