Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2008/11/19 14:25:01 UTC
Upgrade from 1.2 to 1.3 gives 3x slowdown
Hello,
I have a CSV file with 6M records which took 22min to index with
solr 1.2. I then stopped tomcat, replaced the solr stuff inside
webapps with version 1.3, wiped my index, and restarted tomcat.
Indexing the exact same content now takes 69min. My machine has
2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
Are there any tweaks I can use to get the original index time
back? I read through the release notes and was expecting a
speed up. I saw the bit about increasing ramBufferSizeMB and set
it to 64MB; it had no effect.
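[Editor's note: for reference, the knob mentioned above lives in solrconfig.xml. A sketch of the relevant fragment, assuming the stock 1.3 example config (the value shown is the one tried here; 32 is the shipped default):

```xml
<!-- illustrative solrconfig.xml fragment: size of the RAM buffer used
     for added documents before a segment is flushed to disk -->
<indexDefaults>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>
```
]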
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant,
>
>Haven't forgotten about you, but I've been traveling and then into
>some US Holidays here.
Happy Thanksgiving!
>
>To confirm I am understanding, you are seeing a slowdown between 1.3-
>dev from April and one from September, right?
Yep.
Here are the MD5 hashes:-
fergus: md5 *.war
MD5 (solr-bc.war) = 8d4f95628d6978c959d63d304788bc25
MD5 (solr-nightly.war) = 10281455a66b0035ee1f805496d880da
This is the META-INF/MANIFEST.MF from a recent nightly build. (slow)
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.5.0_06-b05 (Sun Microsystems Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.3.0.2008.11.13.08.16.12
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: nightly exported - yonik - 2008-11-13 08:16:12
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
This is the META-INF/MANIFEST.MF from the war file we were given on the course. (fast)
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.5.0_13-121 ("Apple Computer, Inc.")
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.2.2008.04.04.08.09.14
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.3-dev exported - erik - 2008-04-04 08:09:14
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
I have copied both war files to a web site
http://www.twig.me.uk/solr/solr-bc.war (solr 1.3 dev == bootcamp)
http://www.twig.me.uk/solr/solr-nightly.war (nightly)
Regards Fergus.
>Can you produce an MD5 hash of the WAR file or something, such that I
>can know I have the exact bits. Better yet, perhaps you can put those
>files up somewhere where they can be downloaded.
>
>Thanks,
>Grant
>
>On Nov 26, 2008, at 10:54 AM, Fergus McMenemie wrote:
>
>> Hello Grant,
>>
>> Not much good with Java profilers (yet!) so I thought I
>> would send a script!
>>
>> Details... details! I decided to produce a script to
>> replicate the 1.2 vs 1.3 speed problem, and the required rigor
>> revealed a lot more.
>>
>> 1) The faster version, which I have previously referred to as 1.2,
>> was actually a "1.3-dev" I had downloaded as part of the
>> solr bootcamp class at ApacheCon Europe 2008. The ID
>> string in the CHANGES.txt document is:-
>> $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>>
>> 2) I did actually download and speed test a version of 1.2
>> from the internet. Its CHANGES.txt id is:-
>> $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>> Speed-wise it was about the same as 1.3 at 64min. It also
>> had lots of char set issues and is ignored from now on.
>>
>> 3) The version I was planning to use, till I found this
>> speed issue, was the "latest" official version:-
>> $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>> I also verified the behavior with a nightly build:
>> $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>>
>> Anyway, the following script indexes the content in 22min
>> for the 1.3-dev version and takes 68min for the newer releases
>> of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
>> release and used it to replace the conf directory from the
>> official 1.3 release. The 3x slowdown was still there; it is
>> not a configuration issue!
>> =================================
>>
>> #! /bin/bash
>>
>> # This script assumes a /usr/local/tomcat link to whatever version
>> # of tomcat you have installed. I have "apache-tomcat-5.5.20". Also
>> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
>> # All the following was done as root.
>>
>> # I have a directory /usr/local/ts which contains four versions of solr:
>> # the "official" 1.2, two 1.3 releases, and a version of 1.2 or a 1.3 beta
>> # I got while attending a solr bootcamp. I indexed the same content using
>> # the different versions of solr as follows:
>> cd /usr/local/ts
>> if [ "" ]
>> then
>> echo "Starting afresh"
>> sleep 5 # allow time for me to interrupt!
>> cp -Rp apache-solr-bc/example/solr ./solrbc # bc = bootcamp
>> cp -Rp apache-solr-nightly/example/solr ./solrnightly
>> cp -Rp apache-solr-1.3.0/example/solr ./solr13
>>
>> # the gaz is regularly updated and its name keeps changing :-) The page
>> # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the
>> # latest version.
>> curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>> unzip -q geonames.zip
>> # delete corrupt blips!
>> perl -i -n -e 'print unless
>> ($. > 2128495 and $. < 2128505) or
>> ($. > 5944254 and $. < 5944260)
>> ;' geonames_dd_dms_date_20081118.txt
>> # the following was used to detect bad short records
>> #perl -a -F\\t -n -e 'print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>>
>> # my set of fields and copyfields for the schema.xml
>> fields='
>> <fields>
>> <field name="UNI" type="string" indexed="true" stored="true" required="true" />
>> <field name="CCODE" type="string" indexed="true" stored="true"/>
>> <field name="DSG" type="string" indexed="true" stored="true"/>
>> <field name="CC1" type="string" indexed="true" stored="true"/>
>> <field name="LAT" type="sfloat" indexed="true" stored="true"/>
>> <field name="LONG" type="sfloat" indexed="true" stored="true"/>
>> <field name="MGRS" type="string" indexed="false" stored="true"/>
>> <field name="JOG" type="string" indexed="false" stored="true"/>
>> <field name="FULL_NAME" type="string" indexed="true" stored="true"/>
>> <field name="FULL_NAME_ND" type="string" indexed="true" stored="true"/>
>> <!--field name="text" type="text" indexed="true" stored="false" multiValued="true"/ -->
>> <!--field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/-->
>> '
>> copyfields='
>> </fields>
>> <copyField source="FULL_NAME" dest="text"/>
>> <copyField source="FULL_NAME_ND" dest="text"/>
>> '
>>
>> # add in my fields and copyfields
>> perl -i -p -e "print qq($fields) if s/<fields>//;" solr*/conf/schema.xml
>> perl -i -p -e "print qq($copyfields) if s[</fields>][];" solr*/conf/schema.xml
>> # change the unique key and mark the "id" field as not required
>> perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;" solr*/conf/schema.xml
>> perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>> # enable remote streaming in the solrconfig file
>> perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>> fi
>>
>> # some constants to keep the curl command shorter
>> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
>> file=`pwd`"/geonames.txt"
>>
>> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>>
>> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>> then
>> echo "Tomcat would not shutdown"
>> exit
>> fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrbc solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>> echo "Getting ready to index the data set using solrnightly"
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>> then
>> echo "Tomcat would not shutdown"
>> exit
>> fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrnightly solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrnightly"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>>> On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>>
>>>> Hello Grant,
>>>>
>>>>> Were you overwriting the existing index or did you also clean out the
>>>>> Solr data directory, too? In other words, was it a fresh index, or an
>>>>> existing one? And was that also the case for the 22 minute time?
>>>>
>>>> No, in each case it was a new index. I store the indexes (the "data"
>>>> dir) outside the solr home directory. For the moment I rm -rf the
>>>> index dir after each edit to the solrconfig.xml or schema.xml file and
>>>> reindex from scratch. The relaunch of tomcat recreates the index dir.
>>>>
>>>>> Would it be possible to profile the two instances and see if you
>>>>> notice anything different?
>>>> I don't understand this. Do you mean run a profiler against the tomcat
>>>> image as indexing takes place, or somehow compare the indexes?
>>>
>>> Something like JProfiler or any other Java profiler.
>>>
>>>>
>>>>
>>>> I was thinking of making a short script that replicates the results,
>>>> and posting it here, would that help?
>>>
>>>
>>> Very much so.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Grant
>>>>>
>>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>>> webapps with version 1.3, wiped my index, and restarted tomcat.
>>>>>>
>>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>>>
>>>>>> Are there any tweaks I can use to get the original index time
>>>>>> back? I read through the release notes and was expecting a
>>>>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>>>>> it to 64MB; it had no effect.
>>>>>> --
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie Email:fergus@twig.me.uk
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
>
>--------------------------
>Grant Ingersoll
>
>Lucene Helpful Hints:
>http://wiki.apache.org/lucene-java/BasicsOfPerformance
>http://wiki.apache.org/lucene-java/LuceneFAQ
>
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Fergie,
Haven't forgotten about you, but I've been traveling and then into
some US Holidays here.
To confirm I am understanding, you are seeing a slowdown between 1.3-
dev from April and one from September, right?
Can you produce an MD5 hash of the WAR file or something, such that I
can know I have the exact bits. Better yet, perhaps you can put those
files up somewhere where they can be downloaded.
Thanks,
Grant
On Nov 26, 2008, at 10:54 AM, Fergus McMenemie wrote:
> [Fergus's Nov 26 message and test script, quoted in full; snipped here as it duplicates the post earlier in this thread.]
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Yonik
>Another thought I just had - do you have autocommit enabled?
>
No; not as far as I know!
The solrconfig.xml files from the two versions are equivalent as best I can
tell, and they are exactly as provided in the downloads. The only changes
were made by the attached script and should not affect committing. Finally,
the indexing command has commit=true, which I think means do a single commit
at the end of the file?
Regards Fergus.
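[Editor's note: to spell out that reading (host, port, and paths are just the ones from the script, and this is a sketch of the intended behaviour, not a definitive statement of the CSV handler's semantics): the whole file goes up in one streaming request, and commit=true asks for a single commit once the stream has been consumed, with no commits in between.

```shell
# Build the single-request indexing URL used in the script; commit=true
# is understood here as "one commit after the whole CSV stream".
file="/usr/local/ts/geonames.txt"
skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG"
url="http://localhost:8080/solr/update/csv?commit=true&stream.file=${file}&escape=%00&separator=%09&skip=${skip}"
echo "$url"
# time curl "$url"   # run against a live Solr on localhost:8080
```
]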
>A Lucene commit is now more expensive because it syncs the files for
>safety. If you commit frequently, this could definitely cause a
>slowdown.
>
>-Yonik
>
>On Wed, Nov 26, 2008 at 10:54 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
>> [Fergus's Nov 26 message and test script, quoted in full; snipped here as it duplicates the post earlier in this thread.]
>>
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!
Posted by Yonik Seeley <yo...@apache.org>.
Another thought I just had - do you have autocommit enabled?
A Lucene commit is now more expensive because it syncs the files for
safety. If you commit frequently, this could definitely cause a
slowdown.
-Yonik
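[Editor's note: for anyone checking their own setup, autocommit is configured in solrconfig.xml inside the update handler. A sketch of what an enabled autocommit block looks like; the values are illustrative, and the stock example config ships with the block commented out, which disables it:

```xml
<!-- illustrative solrconfig.xml fragment: with this present, Solr
     commits on its own, paying the per-commit sync cost repeatedly -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```
]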
On Wed, Nov 26, 2008 at 10:54 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> Hello Grant,
>
> Not much good with Java profilers (yet!) so I thought I
> would send a script!
>
> Details... details! Having decided to produce a script to
> replicate the 1.2 vs 1.3 speed problem, the required rigor
> revealed a lot more.
>
> 1) The faster version I have previously referred to as 1.2,
> was actually a "1.3-dev" I had downloaded as part of the
> solr bootcamp class at ApacheCon Europe 2008. The ID
> string in the CHANGES.txt document is:-
> $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>
> 2) I did actually download and speed test a version of 1.2
> from the internet. Its CHANGES.txt id is:-
> $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
> Speed-wise it was about the same as 1.3 at 64min. It also
> had lots of char-set issues and is ignored from now on.
>
> 3) The version I was planning to use, till I found this
> speed issue, was the "latest" official version:-
> $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
> I also verified the behavior with a nightly build.
> $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>
> Anyway, the following script indexes the content in 22min
> with the 1.3-dev version and takes 68min with the newer releases
> of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
> release and used it to replace the conf directory from the
> official 1.3 release. The 3x slowdown was still there; it is
> not a configuration issue!
> =================================
>
>
>
>
>
>
> #! /bin/bash
>
> # This script assumes a /usr/local/tomcat link to whatever version
> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
> # All the following was done as root.
>
>
> # I have a directory /usr/local/ts which contains four versions of solr. The
> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3 beta
> # I got while attending a solr bootcamp. I indexed the same content using the
> # different versions of solr as follows:
> cd /usr/local/ts
> if [ "" ] # put a non-empty string here to re-run the one-time setup below
> then
> echo "Starting from a-fresh"
> sleep 5 # allow time for me to interrupt!
> cp -Rp apache-solr-bc/example/solr ./solrbc #bc = bootcamp
> cp -Rp apache-solr-nightly/example/solr ./solrnightly
> cp -Rp apache-solr-1.3.0/example/solr ./solr13
>
> # the gaz is regularly updated and its name keeps changing :-) The page
> # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
> # version.
> curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
> unzip -q geonames.zip
> # delete corrupt blips!
> perl -i -n -e 'print unless
> ($. > 2128495 and $. < 2128505) or
> ($. > 5944254 and $. < 5944260)
> ;' geonames_dd_dms_date_20081118.txt
> #following was used to detect bad short records
> #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>
> # my set of fields and copyfields for the schema.xml
> fields='
> <fields>
> <field name="UNI" type="string" indexed="true" stored="true" required="true" />
> <field name="CCODE" type="string" indexed="true" stored="true"/>
> <field name="DSG" type="string" indexed="true" stored="true"/>
> <field name="CC1" type="string" indexed="true" stored="true"/>
> <field name="LAT" type="sfloat" indexed="true" stored="true"/>
> <field name="LONG" type="sfloat" indexed="true" stored="true"/>
> <field name="MGRS" type="string" indexed="false" stored="true"/>
> <field name="JOG" type="string" indexed="false" stored="true"/>
> <field name="FULL_NAME" type="string" indexed="true" stored="true"/>
> <field name="FULL_NAME_ND" type="string" indexed="true" stored="true"/>
> <!--field name="text" type="text" indexed="true" stored="false" multiValued="true"/ -->
> <!--field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/-->
> '
> copyfields='
> </fields>
> <copyField source="FULL_NAME" dest="text"/>
> <copyField source="FULL_NAME_ND" dest="text"/>
> '
>
> # add in my fields and copyfields
> perl -i -p -e "print qq($fields) if s/<fields>//;" solr*/conf/schema.xml
> perl -i -p -e "print qq($copyfields) if s[</fields>][];" solr*/conf/schema.xml
> # change the unique key and mark the "id" field as not required
> perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;" solr*/conf/schema.xml
> perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
> # enable remote streaming in solrconfig file
> perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
> fi
>
> # some constants to keep the curl command shorter
> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
> file=`pwd`"/geonames_dd_dms_date_20081118.txt" # must match the unzipped file name
>
> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>
> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
> then
> echo "Tomcat would not shutdown"
> exit
> fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrbc solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
> echo "Getting ready to index the data set using solrnightly"
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
> then
> echo "Tomcat would not shutdown"
> exit
> fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrnightly solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrnightly"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
>
>
>
>>On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>
>>> Hello Grant,
>>>
>>>> Were you overwriting the existing index or did you also clean out the
>>>> Solr data directory, too? In other words, was it a fresh index, or
>>>> an
>>>> existing one? And was that also the case for the 22 minute time?
>>>
>>> No, in each case it was a new index. I store the indexes (the "data" dir)
>>> outside the solr home directory. For the moment I rm -rf the index dir
>>> after each edit to the solrconfig.xml or schema.xml file and reindex
>>> from scratch. The relaunch of tomcat recreates the index dir.
>>>
>>>> Would it be possible to profile the two instances and see if you
>>>> notice anything different?
>>> I don't understand this. Do you mean run a profiler against the tomcat
>>> image as indexing takes place, or somehow compare the indexes?
>>
>>Something like JProfiler or any other Java profiler.
>>
>>>
>>>
>>> I was thinking of making a short script that replicates the results
>>> and posting it here; would that help?
>>
>>
>>Very much so.
>>
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>
>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>>
>>>>> Are there any tweaks I can use to get the original index time
>>>>> back? I read through the release notes and was expecting a
>>>>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>>>>> it to 64MB; it had no effect.
>>>>> --
>
> --
>
> ===============================================================
> Fergus McMenemie Email:fergus@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===============================================================
>
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant,
Not much good with Java profilers (yet!) so I thought I
would send a script!
Details... details! Having decided to produce a script to
replicate the 1.2 vs 1.3 speed problem, the required rigor
revealed a lot more.
1) The faster version I have previously referred to as 1.2,
was actually a "1.3-dev" I had downloaded as part of the
solr bootcamp class at ApacheCon Europe 2008. The ID
string in the CHANGES.txt document is:-
$Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
2) I did actually download and speed test a version of 1.2
from the internet. Its CHANGES.txt id is:-
$Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
Speed-wise it was about the same as 1.3 at 64min. It also
had lots of char-set issues and is ignored from now on.
3) The version I was planning to use, till I found this
speed issue, was the "latest" official version:-
$Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
I also verified the behavior with a nightly build.
$Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
Anyway, the following script indexes the content in 22min
with the 1.3-dev version and takes 68min with the newer releases
of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
release and used it to replace the conf directory from the
official 1.3 release. The 3x slowdown was still there; it is
not a configuration issue!
=================================
#! /bin/bash
# This script assumes a /usr/local/tomcat link to whatever version
# of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
# /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
# All the following was done as root.
# I have a directory /usr/local/ts which contains four versions of solr. The
# "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3 beta
# I got while attending a solr bootcamp. I indexed the same content using the
# different versions of solr as follows:
cd /usr/local/ts
if [ "" ] # put a non-empty string here to re-run the one-time setup below
then
echo "Starting from a-fresh"
sleep 5 # allow time for me to interrupt!
cp -Rp apache-solr-bc/example/solr ./solrbc #bc = bootcamp
cp -Rp apache-solr-nightly/example/solr ./solrnightly
cp -Rp apache-solr-1.3.0/example/solr ./solr13
# the gaz is regularly updated and its name keeps changing :-) The page
# http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
# version.
curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
unzip -q geonames.zip
# delete corrupt blips!
perl -i -n -e 'print unless
($. > 2128495 and $. < 2128505) or
($. > 5944254 and $. < 5944260)
;' geonames_dd_dms_date_20081118.txt
#following was used to detect bad short records
#perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
# my set of fields and copyfields for the schema.xml
fields='
<fields>
<field name="UNI" type="string" indexed="true" stored="true" required="true" />
<field name="CCODE" type="string" indexed="true" stored="true"/>
<field name="DSG" type="string" indexed="true" stored="true"/>
<field name="CC1" type="string" indexed="true" stored="true"/>
<field name="LAT" type="sfloat" indexed="true" stored="true"/>
<field name="LONG" type="sfloat" indexed="true" stored="true"/>
<field name="MGRS" type="string" indexed="false" stored="true"/>
<field name="JOG" type="string" indexed="false" stored="true"/>
<field name="FULL_NAME" type="string" indexed="true" stored="true"/>
<field name="FULL_NAME_ND" type="string" indexed="true" stored="true"/>
<!--field name="text" type="text" indexed="true" stored="false" multiValued="true"/ -->
<!--field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/-->
'
copyfields='
</fields>
<copyField source="FULL_NAME" dest="text"/>
<copyField source="FULL_NAME_ND" dest="text"/>
'
# add in my fields and copyfields
perl -i -p -e "print qq($fields) if s/<fields>//;" solr*/conf/schema.xml
perl -i -p -e "print qq($copyfields) if s[</fields>][];" solr*/conf/schema.xml
# change the unique key and mark the "id" field as not required
perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;" solr*/conf/schema.xml
perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
# enable remote streaming in solrconfig file
perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
fi
# some constants to keep the curl command shorter
skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
file=`pwd`"/geonames_dd_dms_date_20081118.txt" # must match the unzipped file name
export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
/usr/local/tomcat/bin/shutdown.sh
sleep 15
if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
then
echo "Tomcat would not shutdown"
exit
fi
rm -r /usr/local/tomcat/webapps/solr*
rm -r /usr/local/tomcat/logs/*.out
rm -r /usr/local/tomcat/work/Catalina/localhost/solr
cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
rm solr # rm the symbolic link
ln -s solrbc solr
rm -r solr/data
/usr/local/tomcat/bin/startup.sh
sleep 10 # give solr time to launch and setup
echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
echo "Getting ready to index the data set using solrnightly"
/usr/local/tomcat/bin/shutdown.sh
sleep 15
if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
then
echo "Tomcat would not shutdown"
exit
fi
rm -r /usr/local/tomcat/webapps/solr*
rm -r /usr/local/tomcat/logs/*.out
rm -r /usr/local/tomcat/work/Catalina/localhost/solr
cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
rm solr # rm the symbolic link
ln -s solrnightly solr
rm -r solr/data
/usr/local/tomcat/bin/startup.sh
sleep 10 # give solr time to launch and setup
echo "Starting indexing at " `date` " with solrnightly"
time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
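The query parameters in the curl calls above are fiddly to get right by hand. A small sketch of building the same URL programmatically (the endpoint and parameter values come from the script; the helper function name is mine):

```python
from urllib.parse import urlencode

def csv_update_url(base, file_path, skip_fields):
    """Build the Solr 1.3 CSV update URL used by the script above.

    escape=%00 (NUL, i.e. effectively no escape character) and
    separator=%09 (tab) match the curl commands in the script.
    """
    params = {
        "commit": "true",
        "stream.file": file_path,
        "escape": "\x00",
        "separator": "\t",
        "skip": ",".join(skip_fields),
    }
    return base + "/update/csv?" + urlencode(params)

url = csv_update_url("http://localhost:8080/solr",
                     "/usr/local/ts/geonames.txt",
                     ["MODIFY_DATE", "RC", "UFI"])
print(url)
```

urlencode handles the percent-escaping (tab, NUL, slashes, commas) that otherwise has to be written out by hand in the shell.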
>On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>
>> Hello Grant,
>>
>>> Were you overwriting the existing index or did you also clean out the
>>> Solr data directory, too? In other words, was it a fresh index, or
>>> an
>>> existing one? And was that also the case for the 22 minute time?
>>
>> No, in each case it was a new index. I store the indexes (the "data" dir)
>> outside the solr home directory. For the moment I rm -rf the index dir
>> after each edit to the solrconfig.xml or schema.xml file and reindex
>> from scratch. The relaunch of tomcat recreates the index dir.
>>
>>> Would it be possible to profile the two instances and see if you
>>> notice anything different?
>> I don't understand this. Do you mean run a profiler against the tomcat
>> image as indexing takes place, or somehow compare the indexes?
>
>Something like JProfiler or any other Java profiler.
>
>>
>>
>> I was thinking of making a short script that replicates the results
>> and posting it here; would that help?
>
>
>Very much so.
>
>
>>
>>
>>>
>>> Thanks,
>>> Grant
>>>
>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a CSV file with 6M records which took 22min to index with
>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>
>>>> Indexing the exact same content now takes 69min. My machine has
>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>
>>>> Are there any tweaks I can use to get the original index time
>>>> back? I read through the release notes and was expecting a
>>>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>>>> it to 64MB; it had no effect.
>>>> --
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
> Hello Grant,
>
>> Were you overwriting the existing index or did you also clean out the
>> Solr data directory, too? In other words, was it a fresh index, or
>> an
>> existing one? And was that also the case for the 22 minute time?
>
> No, in each case it was a new index. I store the indexes (the "data" dir)
> outside the solr home directory. For the moment I rm -rf the index dir
> after each edit to the solrconfig.xml or schema.xml file and reindex
> from scratch. The relaunch of tomcat recreates the index dir.
>
>> Would it be possible to profile the two instances and see if you
>> notice anything different?
> I don't understand this. Do you mean run a profiler against the tomcat
> image as indexing takes place, or somehow compare the indexes?
Something like JProfiler or any other Java profiler.
>
>
> I was thinking of making a short script that replicates the results
> and posting it here; would that help?
Very much so.
>
>
>>
>> Thanks,
>> Grant
>>
>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>
>>> Hello,
>>>
>>> I have a CSV file with 6M records which took 22min to index with
>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>
>>> Indexing the exact same content now takes 69min. My machine has
>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>
>>> Are there any tweaks I can use to get the original index time
>>> back? I read through the release notes and was expecting a
>>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>>> it to 64MB; it had no effect.
>>> --
>>>
>>> ===============================================================
>>> Fergus McMenemie Email:fergus@twig.me.uk
>>> Techmore Ltd Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets Analyst Programmer
>>> ===============================================================
>
> --
>
> ===============================================================
> Fergus McMenemie Email:fergus@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===============================================================
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant,
>Were you overwriting the existing index or did you also clean out the
>Solr data directory, too? In other words, was it a fresh index, or an
>existing one? And was that also the case for the 22 minute time?
No, in each case it was a new index. I store the indexes (the "data" dir)
outside the solr home directory. For the moment I rm -rf the index dir
after each edit to the solrconfig.xml or schema.xml file and reindex
from scratch. The relaunch of tomcat recreates the index dir.
>Would it be possible to profile the two instances and see if you notice
>anything different?
I don't understand this. Do you mean run a profiler against the tomcat
image as indexing takes place, or somehow compare the indexes?
I was thinking of making a short script that replicates the results
and posting it here; would that help?
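One low-tech alternative to a full profiler is to capture periodic thread dumps during indexing (e.g. with `jstack <pid>` in a loop) and count which methods the RUNNABLE threads sit in. A sketch of the counting side (the helper function and the sample dump are mine, not from this thread):

```python
import re
from collections import Counter

def hot_frames(dumps):
    """Count the top stack frame of each RUNNABLE thread across a list
    of jstack-style thread-dump strings (a poor man's profiler)."""
    counts = Counter()
    for dump in dumps:
        # Thread sections are separated by blank lines; frames look like
        #     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:100)
        for section in dump.split("\n\n"):
            if "RUNNABLE" not in section:
                continue
            m = re.search(r"^\s+at\s+(\S+)\(", section, re.MULTILINE)
            if m:
                counts[m.group(1)] += 1
    return counts

sample = '''"main" prio=5 RUNNABLE
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:100)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:50)'''

# With many real dumps, the most common frame points at the hot spot.
print(hot_frames([sample, sample]).most_common(1))
```

Comparing the top frames between the fast and slow Solr versions would show where the extra time goes without needing JProfiler.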
>
>Thanks,
>Grant
>
>On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>
>> Hello,
>>
>> I have a CSV file with 6M records which took 22min to index with
>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>> webapps with version 1.3, wiped my index and restarted tomcat.
>>
>> Indexing the exact same content now takes 69min. My machine has
>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>
>> Are there any tweaks I can use to get the original index time
>> back? I read through the release notes and was expecting a
>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>> it to 64MB; it had no effect.
>> --
>>
>> ===============================================================
>> Fergus McMenemie Email:fergus@twig.me.uk
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
Re: Upgrade from 1.2 to 1.3 gives 3x slowdown
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Fergus,
Were you overwriting the existing index or did you also clean out the
Solr data directory, too? In other words, was it a fresh index, or an
existing one? And was that also the case for the 22 minute time?
Would it be possible to profile the two instances and see if you notice
anything different?
Thanks,
Grant
On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
> Hello,
>
> I have a CSV file with 6M records which took 22min to index with
> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
> webapps with version 1.3, wiped my index and restarted tomcat.
>
> Indexing the exact same content now takes 69min. My machine has
> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>
> Are there any tweaks I can use to get the original index time
> back? I read through the release notes and was expecting a
> speed-up. I saw the bit about increasing ramBufferSizeMB and set
> it to 64MB; it had no effect.
> --
>
> ===============================================================
> Fergus McMenemie Email:fergus@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===============================================================
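For reference, the ramBufferSizeMB setting discussed in this thread lives in solrconfig.xml; a fragment showing where it goes in a Solr 1.3-era config (the value is illustrative):

```xml
<!-- solrconfig.xml -->
<indexDefaults>
  <!-- Buffer roughly this many MB of indexed documents in RAM before
       flushing a new segment; larger buffers mean fewer flushes
       during bulk indexing. -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>
```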