Posted to solr-user@lucene.apache.org by sunnyfr <jo...@gmail.com> on 2009/06/03 16:20:06 UTC

Re: Solr vs Sphinx

Hi guys,

I've been working with Solr for several months now, and you really provide quick answers
... and you're very nice to work with.
But I have a huge issue that I haven't been able to fix despite a lot of posts.

Indexing takes one to two days to complete, for 8 GB of indexed data and
1.5M docs (OK, I have plenty of links in my table, but it still takes a very long time).

Second, I have to run an update every 20 minutes, and each update
represents maybe 20,000 docs. When I use replication I have to replicate
the whole new, optimized index folder, because so much data is updated and
so many segments get generated that I have to merge them. So I lose my
caches and my CPU goes mad.

And I can't get more than 20 requests/sec.




Fergus McMenemie-2 wrote:
> 
>>Something that would be interesting is to share Solr configs for
>>various types of indexing tasks, from a Solr configuration aimed at
>>indexing web pages, to one handling large amounts of text, to one that
>>indexes specific structured data. I could see those being posted on
>>the wiki and helping folks who say "I want to do X, is there an
>>example?".
>>
>>I think most folks start with the example Solr install and tweak from  
>>there, which probably isn't the best path...
>>
>>Eric
> 
> Yep, a Solr "cookbook" with lots of different example recipes. However,
> these would need to be very actively maintained to ensure they always
> represented best practice. While using Cocoon I made extensive use
> of the examples section of the Cocoon website. However, most of its
> massive number of examples represented obsolete Cocoon practice, or
> there were four or five examples doing the same thing in different
> ways with no text explaining the pros and cons of the different approaches.
> This held me back as a newcomer and gave a bad impression of Cocoon.
> 
> I was wondering about a performance hints page. I was caught by an
> issue indexing CSV content where the use of &overwrite=false made
> an almost 3x difference to my indexing speed. Still do not really
> know why!
> 
>>
>>On May 15, 2009, at 8:09 AM, Mark Miller wrote:
>>
>>> In the spirit of good defaults:
>>>
>>> I think we should change the Solr highlighter to highlight phrase
>>> queries by default, as well as prefix, range, and wildcard constant-score
>>> queries. It's awkward to have to tell people they have to turn those
>>> on. I'd certainly prefer to have to turn them off if I have some
>>> limitation, rather than on.
> 
> Yep, I agree: all whizzy new features should ideally be on by default
> unless there is a significant performance penalty. It is not enough
> to ship a default solrconfig.xml with the feature on; it has to
> be on by default inside the code.
>  
>>>
>>> - Mark
>>
>>-----------------------------------------------------
>>Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
>>Free/Busy: http://tinyurl.com/eric-cal
> 
> Fergus
> 
> 

-- 
View this message in context: http://www.nabble.com/Solr-vs-Sphinx-tp23524676p23852364.html
Sent from the Solr - User mailing list archive at Nabble.com.
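The two request parameters discussed in this thread can be tried from a short script. Below is a minimal Python sketch, assuming a Solr instance at the hypothetical base URL http://localhost:8983/solr with the CSV update handler mapped to /update/csv, as in the example solrconfig.xml. It builds an update URL with overwrite=false: with overwrite=true (the default), Solr checks each incoming document against the uniqueKey field and replaces any existing copy, which is extra work per document and a plausible reason for the speed difference Fergus saw; skipping it is only safe when the CSV contains no duplicates. It also builds a query URL that turns on hl.usePhraseHighlighter and hl.highlightMultiTerm, the per-request switches for the highlighter behaviour Mark proposes making the default. The function names are hypothetical, and the Content-Type used in post_csv is an assumption; verify both the parameters and the header against the documentation for your Solr version.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical base URL of a local Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def csv_update_url(base: str, overwrite: bool = False, commit: bool = True) -> str:
    """Build a URL for the CSV update handler (assumed to live at /update/csv).

    overwrite=false skips the per-document uniqueKey check, so it is only
    safe when no incoming document duplicates one already in the index.
    """
    params = {
        "overwrite": "true" if overwrite else "false",
        "commit": "true" if commit else "false",
    }
    return f"{base}/update/csv?{urlencode(params)}"


def highlight_query_url(base: str, q: str, field: str) -> str:
    """Build a /select URL that asks the highlighter to honour phrase
    queries and multi-term (prefix, range, wildcard) queries."""
    params = {
        "q": q,
        "hl": "true",
        "hl.fl": field,
        "hl.usePhraseHighlighter": "true",
        # As I understand it, this only takes effect together with
        # hl.usePhraseHighlighter=true.
        "hl.highlightMultiTerm": "true",
    }
    return f"{base}/select?{urlencode(params)}"


def post_csv(url: str, csv_text: str) -> int:
    """POST CSV text to an update URL and return the HTTP status code.

    The Content-Type header is an assumption; check your Solr version's docs.
    """
    req = Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    with urlopen(req) as resp:
        return resp.status
```

For example, csv_update_url(SOLR_BASE) returns http://localhost:8983/solr/update/csv?overwrite=false&commit=true; building URLs separately from sending them keeps the parameter choices easy to inspect before anything touches a live index.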


Re: Solr vs Sphinx

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Could you please start a new thread?


Thanks,
Otis

