Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/04/01 07:20:06 UTC
Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown
Grant,
I am messing with the script, and with your tip I expect I can
make it recurse over as many releases as needed.
I did run it again using the full file, this time on my iMac:-

  rev     elapsed          release date
  643465  took 22min 14sec  2008-04-01
  734796  took 73min 58sec  2009-01-15
  758795  took 70min 55sec  2009-03-26

I then ran it again using only the first 1M records:-

  643465  took 2m51.516s    2008-04-01
  734796  took 7m29.326s    2009-01-15
  758795  took 8m18.403s    2009-03-26

And again, this time with commit=true:-

  643465  took 2m49.200s    2008-04-01
  734796  took 8m27.414s    2009-01-15
  758795  took 9m32.459s    2009-03-26

And again, this time with commit=false&overwrite=false:-

  643465  took 2m46.149s    2008-04-01
  734796  took 3m29.909s    2009-01-15
  758795  took 3m26.248s    2009-03-26
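
For the record, the only thing that changes between these runs is the
query string on the CSV load request. The loading step in my script is
roughly the following; the URL and the gaz.csv file name are from my
setup, so treat them as placeholders:

  # load the gazetteer CSV via Solr's CSV update handler;
  # the commit/overwrite flags are what varied between the runs above
  curl 'http://localhost:8983/solr/update/csv?commit=false&overwrite=false' \
       -H 'Content-Type: text/plain; charset=utf-8' \
       --data-binary @gaz.csv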
Just read your latest post. I will apply the patches and retest
the above.
>Can you try adding &overwrite=false and running against the latest
>version? My current working theory is that Solr/Lucene has changed
>how deletes are handled such that work that was deferred before is now
>not deferred as often. In fact, you are not seeing this cost paid (or
>at least not noticing it) because you are not committing, but I
>believe you do see it when you are closing down Solr, which is why it
>takes so long to exit.
It can take ages! (>15 min to get Tomcat to quit.) Also, my script
does have a separate commit step, and that step takes no time at all!
>I also think that Lucene adding fsync() into
>the equation may cause some slow down, but that is a penalty we are
>willing to pay as it gives us higher data integrity.
Data integrity is always good. However, if performance seems
unreasonable, users/customers tend to take things into their
own hands and kill the process or the machine. That tends to be
very bad for data integrity.
>So, depending on how you have your data, I think a workaround is to:
>Add a field that contains a single term identifying the data type for
>this particular CSV file, i.e. something like field: type, value:
>fergs-csv
>Then, before indexing, you can issue a Delete By Query: type:fergs-csv
>and then add your CSV file using overwrite=false. This amounts to a
>batch delete followed by a batch add, but without the add having to
>issue deletes for each add.
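
If I follow you, that amounts to something like this (the field name
and query are from your suggestion; the URL and gaz.csv are just my
setup, and I am assuming the CSV itself carries the type=fergs-csv
column):

  # one batch delete of everything from the previous load of this CSV...
  curl 'http://localhost:8983/solr/update' \
       -H 'Content-Type: text/xml; charset=utf-8' \
       --data-binary '<delete><query>type:fergs-csv</query></delete>'
  # ...then one batch add, with no per-document delete on the way in
  curl 'http://localhost:8983/solr/update/csv?overwrite=false&commit=true' \
       -H 'Content-Type: text/plain; charset=utf-8' \
       --data-binary @gaz.csv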
Ok... but for these test cases I am starting off with an empty
index: the script does a "rm -rf solr/data" before Tomcat is
launched. So I do not understand how the above helps, UNLESS there
are duplicate gaz entries.
>In the meantime, I'm trying to see if I can pinpoint down a specific
>change and see if there is anything that might help it perform better.
>
>-Grant
>
--
===============================================================
Fergus McMenemie               Email: fergus@twig.me.uk
Techmore Ltd                   Phone: (UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===============================================================