Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/04/01 07:20:06 UTC

Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown

Grant,

I am messing with the script, and with your tip I expect I can
make it recurse over as many releases as needed.

I did run it again using the full file, this time on my iMac:-
	643465 	  took 	22min 14sec		2008-04-01
	734796		73min 58sec		2009-01-15
	758795 		70min 55sec		2009-03-26
I then ran it again using only the first 1M records:-
	643465 	  took 	2m51.516s		2008-04-01
	734796		7m29.326s		2009-01-15
	758795 		8m18.403s		2009-03-26
Then the same 1M records again, this time with commit=true:-
	643465 	  took 	2m49.200s		2008-04-01
	734796		8m27.414s		2009-01-15
	758795 		9m32.459s		2009-03-26
And again, this time with commit=false&overwrite=false:-
	643465 	  took 	2m46.149s		2008-04-01
	734796		3m29.909s		2009-01-15
	758795 		3m26.248s		2009-03-26
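
(For anyone wanting to reproduce this: the loads are plain HTTP posts,
roughly along the lines below, assuming the stock /update/csv handler.
Host, core path and file name are placeholders rather than my exact
script.)

	curl 'http://localhost:8983/solr/update/csv?commit=false&overwrite=false' \
	     -H 'Content-type: text/plain; charset=utf-8' \
	     --data-binary @gaz.csv

	# separate commit step afterwards
	curl 'http://localhost:8983/solr/update' \
	     -H 'Content-type: text/xml' \
	     --data-binary '<commit/>'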

Just read your latest post. I will apply the patches and retest
the above.

>Can you try adding &overwrite=false and running against the latest  
>version?  My current working theory is that Solr/Lucene has changed  
>how deletes are handled such that work that was deferred before is now  
>not deferred as often.  In fact, you are not seeing this cost paid (or  
>at least not noticing it) because you are not committing, but I  
>believe you do see it when you are closing down Solr, which is why it  
>takes so long to exit.
It can take ages (>15min to get Tomcat to quit)! Also, my script does
have a separate commit step, which does not take any time!

>I also think that Lucene adding fsync() into  
>the equation may cause some slow down, but that is a penalty we are  
>willing to pay as it gives us higher data integrity.
Data integrity is always good. However, if performance seems
unreasonable, users/customers tend to take things into their
own hands and kill the process or machine, which tends to be
very bad for data integrity.

>So, depending on how you have your data, I think a workaround is to:
>Add a field that contains a single term identifying the data type for  
>this particular CSV file, i.e. something like field: type, value:  
>fergs-csv
>Then, before indexing, you can issue a Delete By Query: type:fergs-csv  
>and then add your CSV file using overwrite=false.  This amounts to a  
>batch delete followed by a batch add, but without the add having to  
>issue deletes for each add.
OK... but for these test cases I am starting off with an empty
index. The script does a "rm -rf solr/data" before Tomcat is launched,
so I do not understand how the above helps, UNLESS there are duplicate
gaz entries.
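
(Just to be sure I have understood the suggested workaround, it would be
something like the following -- the field name "type" and value
"fergs-csv" are from your mail, everything else is a placeholder:)

	# batch delete of the previous load for this data type
	curl 'http://localhost:8983/solr/update' \
	     -H 'Content-type: text/xml' \
	     --data-binary '<delete><query>type:fergs-csv</query></delete>'

	# batch re-add without per-document deletes
	curl 'http://localhost:8983/solr/update/csv?overwrite=false' \
	     -H 'Content-type: text/plain; charset=utf-8' \
	     --data-binary @gaz.csv

	curl 'http://localhost:8983/solr/update' \
	     -H 'Content-type: text/xml' \
	     --data-binary '<commit/>'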

>In the meantime, I'm trying to see if I can pinpoint down a specific  
>change and see if there is anything that might help it perform better.
>
>-Grant
>

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================