Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2008/11/19 14:25:01 UTC

Upgrade from 1.2 to 1.3 gives 3x slowdown

Hello,

I have a CSV file with 6M records which took 22min to index with 
solr 1.2. I then stopped tomcat, replaced the solr stuff inside
webapps with version 1.3, wiped my index and restarted tomcat.

Indexing the exact same content now takes 69min. My machine has
2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.

Are there any tweaks I can use to get the original index time
back? I read through the release notes and was expecting a
speed-up. I saw the bit about increasing ramBufferSizeMB and set
it to 64MB; it had no effect.
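[For reference: the ramBufferSizeMB setting mentioned above lives in the <indexDefaults> section of the Solr 1.3 example solrconfig.xml; a sketch of the change tried, with the 64MB figure from the mail:]

```xml
<!-- solrconfig.xml, <indexDefaults> section (Solr 1.3 example config).
     ramBufferSizeMB caps the RAM used to buffer added documents before
     a new segment is flushed; the shipped example default is 32. -->
<indexDefaults>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>
```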
-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant,
>
>Haven't forgotten about you, but I've been traveling and then into  
>some US Holidays here.
Happy Thanksgiving!

>
>To confirm I am understanding, you are seeing a slowdown between
>1.3-dev from April and one from September, right?
Yep.

Here are the MD5 hashes:-
fergus: md5 *.war
MD5 (solr-bc.war) = 8d4f95628d6978c959d63d304788bc25
MD5 (solr-nightly.war) = 10281455a66b0035ee1f805496d880da

This is the META-INF/MANIFEST.MF from a recent nightly build. (slow)
  Manifest-Version: 1.0
  Ant-Version: Apache Ant 1.7.0
  Created-By: 1.5.0_06-b05 (Sun Microsystems Inc.)
  Extension-Name: org.apache.solr
  Specification-Title: Apache Solr Search Server
  Specification-Version: 1.3.0.2008.11.13.08.16.12
  Specification-Vendor: The Apache Software Foundation
  Implementation-Title: org.apache.solr
  Implementation-Version: nightly exported - yonik - 2008-11-13 08:16:12
  Implementation-Vendor: The Apache Software Foundation
  X-Compile-Source-JDK: 1.5
  X-Compile-Target-JDK: 1.5

This is the META-INF/MANIFEST.MF from the war file we were given on the course. (fast)
  Manifest-Version: 1.0
  Ant-Version: Apache Ant 1.7.0
  Created-By: 1.5.0_13-121 ("Apple Computer, Inc.")
  Extension-Name: org.apache.solr
  Specification-Title: Apache Solr Search Server
  Specification-Version: 1.2.2008.04.04.08.09.14
  Specification-Vendor: The Apache Software Foundation
  Implementation-Title: org.apache.solr
  Implementation-Version: 1.3-dev exported - erik - 2008-04-04 08:09:14
  Implementation-Vendor: The Apache Software Foundation
  X-Compile-Source-JDK: 1.5
  X-Compile-Target-JDK: 1.5

I have copied both war files to a web site

http://www.twig.me.uk/solr/solr-bc.war (solr 1.3 dev == bootcamp)

http://www.twig.me.uk/solr/solr-nightly.war (nightly)


Regards Fergus.

>Can you produce an MD5 hash of the WAR file or something, such that I
>can know I have the exact bits? Better yet, perhaps you can put those
>files up somewhere where they can be downloaded.
>
>Thanks,
>Grant
>
>On Nov 26, 2008, at 10:54 AM, Fergus McMenemie wrote:
>
>> Hello Grant,
>>
>> Not much good with Java profilers (yet!) so I thought I
>> would send a script!
>>
>> Details... details! Having decided to produce a script to
>> replicate the 1.2 vs 1.3 speed problem, the required rigor
>> revealed a lot more.
>>
>> 1) The faster version I have previously referred to as 1.2
>>   was actually a "1.3-dev" I had downloaded as part of the
>>   solr bootcamp class at ApacheCon Europe 2008. The ID
>>   string in the CHANGES.txt document is:-
>>   $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>>
>> 2) I did actually download and speed test a version of 1.2
>>   from the internet. Its CHANGES.txt id is:-
>>   $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>>   Speed-wise it was about the same as 1.3 at 64min. It also
>>   had lots of charset issues and is ignored from now on.
>>
>> 3) The version I was planning to use, till I found this
>>   speed issue, was the "latest" official version:-
>>   $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>>   I also verified the behavior with a nightly build.
>>   $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>>
>> Anyway, the following script indexes the content in 22min
>> for the 1.3-dev version and takes 68min for the newer releases
>> of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
>> release and used it to replace the conf directory from the
>> official 1.3 release. The 3x slowdown was still there; it is
>> not a configuration issue!
>> =================================
>>
>>
>>
>>
>>
>>
>> #! /bin/bash
>>
>> # This script assumes a /usr/local/tomcat link to whatever version
>> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
>> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
>> # All the following was done as root.
>>
>>
>> # I have a directory /usr/local/ts which contains four versions of solr. The
>> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3 beta
>> # I got while attending a solr bootcamp. I indexed the same content using the
>> # different versions of solr as follows:
>> cd /usr/local/ts
>> if [ "" ]
>> then
>>   echo "Starting from a-fresh"
>>   sleep 5 # allow time for me to interrupt!
>>   cp -Rp apache-solr-bc/example/solr      ./solrbc  #bc = bootcamp
>>   cp -Rp apache-solr-nightly/example/solr ./solrnightly
>>   cp -Rp apache-solr-1.3.0/example/solr   ./solr13
>>
>>   # the gaz is regularly updated and its name keeps changing :-) The page
>>   # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
>>   # version.
>>   curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>>   unzip -q geonames.zip
>>   # delete corrupt blips!
>>   perl -i -n -e 'print unless
>>       ($. > 2128495 and $. < 2128505) or
>>       ($. > 5944254 and $. < 5944260)
>>       ;' geonames_dd_dms_date_20081118.txt
>>   #following was used to detect bad short records
>>   #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>>
>>   # my set of fields and copyfields for the schema.xml
>>   fields='
>>   <fields>
>>      <field name="UNI"           type="string" indexed="true"  stored="true" required="true" />
>>      <field name="CCODE"         type="string" indexed="true"  stored="true"/>
>>      <field name="DSG"           type="string" indexed="true"  stored="true"/>
>>      <field name="CC1"           type="string" indexed="true"  stored="true"/>
>>      <field name="LAT"           type="sfloat" indexed="true"  stored="true"/>
>>      <field name="LONG"          type="sfloat" indexed="true"  stored="true"/>
>>      <field name="MGRS"          type="string" indexed="false" stored="true"/>
>>      <field name="JOG"           type="string" indexed="false" stored="true"/>
>>      <field name="FULL_NAME"     type="string" indexed="true"  stored="true"/>
>>      <field name="FULL_NAME_ND"  type="string" indexed="true"  stored="true"/>
>>      <!--field name="text"       type="text"   indexed="true"  stored="false" multiValued="true"/ -->
>>      <!--field name="timestamp"  type="date"   indexed="true"  stored="true"  default="NOW" multiValued="false"/-->
>>   '
>>   copyfields='
>>      </fields>
>>      <copyField source="FULL_NAME" dest="text"/>
>>      <copyField source="FULL_NAME_ND" dest="text"/>
>>   '
>>
>>   # add in my fields and copyfields
>>   perl -i -p -e "print qq($fields) if s/<fields>//;"           solr*/conf/schema.xml
>>   perl -i -p -e "print qq($copyfields) if s[</fields>][];"     solr*/conf/schema.xml
>>   # change the unique key and mark the "id" field as not required
>>   perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;"            solr*/conf/schema.xml
>>   perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>>   # enable remote streaming in solrconfig file
>>   perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>>   fi
>>
>> # some constants to keep the curl command shorter
>> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
>> file=`pwd`"/geonames.txt"
>>
>> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>>
>> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>>   then
>>   echo "Tomcat would not shutdown"
>>   exit
>>   fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrbc solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>> echo "Getting ready to index the data set using solrnightly"
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>>   then
>>   echo "Tomcat would not shutdown"
>>   exit
>>   fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrnightly solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrnightly"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>>
>>
>>
>>> On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>>
>>>> Hello Grant,
>>>>
>>>>> Were you overwriting the existing index or did you also clean out  
>>>>> the
>>>>> Solr data directory, too?  In other words, was it a fresh index, or
>>>>> an
>>>>> existing one?  And was that also the case for the 22 minute time?
>>>>
>>>> No, in each case it was a new index. I store the indexes (the "data"
>>>> dir) outside the solr home directory. For the moment I rm -rf the
>>>> index dir after each edit to the solrconfig.xml or schema.xml file
>>>> and reindex from scratch. The relaunch of tomcat recreates the index dir.
>>>>
>>>>> Would it be possible to profile the two instance and see if you
>>>>> notice
>>>>> anything different?
>>>> I don't understand this. Do you mean run a profiler against the tomcat
>>>> image as indexing takes place, or somehow compare the indexes?
>>>
>>> Something like JProfiler or any other Java profiler.
>>>
>>>>
>>>>
>>>> I was thinking of making a short script that replicates the results
>>>> and posting it here; would that help?
>>>
>>>
>>> Very much so.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Grant
>>>>>
>>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>>
>>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M - 
>>>>>> Xms512M.
>>>>>>
>>>>>> Are there any tweaks I can use to get the original index time
>>>>>> back? I read through the release notes and was expecting a
>>>>>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>>>>>> it to 64MB; it had no effect.
>>>>>> -- 
>>
>> -- 
>>
>> ===============================================================
>> Fergus McMenemie               Email:fergus@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>
>--------------------------
>Grant Ingersoll
>
>Lucene Helpful Hints:
>http://wiki.apache.org/lucene-java/BasicsOfPerformance
>http://wiki.apache.org/lucene-java/LuceneFAQ
>
>

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Fergie,

Haven't forgotten about you, but I've been traveling and then into  
some US Holidays here.

To confirm I am understanding, you are seeing a slowdown between
1.3-dev from April and one from September, right?

Can you produce an MD5 hash of the WAR file or something, such that I
can know I have the exact bits? Better yet, perhaps you can put those
files up somewhere where they can be downloaded.

Thanks,
Grant

On Nov 26, 2008, at 10:54 AM, Fergus McMenemie wrote:

> Hello Grant,
>
> Not much good with Java profilers (yet!) so I thought I
> would send a script!
>
> Details... details! Having decided to produce a script to
> replicate the 1.2 vs 1.3 speed problem, the required rigor
> revealed a lot more.
>
> 1) The faster version I have previously referred to as 1.2
>   was actually a "1.3-dev" I had downloaded as part of the
>   solr bootcamp class at ApacheCon Europe 2008. The ID
>   string in the CHANGES.txt document is:-
>   $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>
> 2) I did actually download and speed test a version of 1.2
>   from the internet. Its CHANGES.txt id is:-
>   $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>   Speed-wise it was about the same as 1.3 at 64min. It also
>   had lots of charset issues and is ignored from now on.
>
> 3) The version I was planning to use, till I found this
>   speed issue, was the "latest" official version:-
>   $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>   I also verified the behavior with a nightly build.
>   $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>
> Anyway, the following script indexes the content in 22min
> for the 1.3-dev version and takes 68min for the newer releases
> of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
> release and used it to replace the conf directory from the
> official 1.3 release. The 3x slowdown was still there; it is
> not a configuration issue!
> =================================
>
>
>
>
>
>
> #! /bin/bash
>
> # This script assumes a /usr/local/tomcat link to whatever version
> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
> # All the following was done as root.
>
>
> # I have a directory /usr/local/ts which contains four versions of solr. The
> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3 beta
> # I got while attending a solr bootcamp. I indexed the same content using the
> # different versions of solr as follows:
> cd /usr/local/ts
> if [ "" ]
> then
>   echo "Starting from a-fresh"
>   sleep 5 # allow time for me to interrupt!
>   cp -Rp apache-solr-bc/example/solr      ./solrbc  #bc = bootcamp
>   cp -Rp apache-solr-nightly/example/solr ./solrnightly
>   cp -Rp apache-solr-1.3.0/example/solr   ./solr13
>
>   # the gaz is regularly updated and its name keeps changing :-) The page
>   # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
>   # version.
>   curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>   unzip -q geonames.zip
>   # delete corrupt blips!
>   perl -i -n -e 'print unless
>       ($. > 2128495 and $. < 2128505) or
>       ($. > 5944254 and $. < 5944260)
>       ;' geonames_dd_dms_date_20081118.txt
>   #following was used to detect bad short records
>   #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>
>   # my set of fields and copyfields for the schema.xml
>   fields='
>   <fields>
>      <field name="UNI"           type="string" indexed="true"  stored="true" required="true" />
>      <field name="CCODE"         type="string" indexed="true"  stored="true"/>
>      <field name="DSG"           type="string" indexed="true"  stored="true"/>
>      <field name="CC1"           type="string" indexed="true"  stored="true"/>
>      <field name="LAT"           type="sfloat" indexed="true"  stored="true"/>
>      <field name="LONG"          type="sfloat" indexed="true"  stored="true"/>
>      <field name="MGRS"          type="string" indexed="false" stored="true"/>
>      <field name="JOG"           type="string" indexed="false" stored="true"/>
>      <field name="FULL_NAME"     type="string" indexed="true"  stored="true"/>
>      <field name="FULL_NAME_ND"  type="string" indexed="true"  stored="true"/>
>      <!--field name="text"       type="text"   indexed="true"  stored="false" multiValued="true"/ -->
>      <!--field name="timestamp"  type="date"   indexed="true"  stored="true"  default="NOW" multiValued="false"/-->
>   '
>   copyfields='
>      </fields>
>      <copyField source="FULL_NAME" dest="text"/>
>      <copyField source="FULL_NAME_ND" dest="text"/>
>   '
>
>   # add in my fields and copyfields
>   perl -i -p -e "print qq($fields) if s/<fields>//;"           solr*/conf/schema.xml
>   perl -i -p -e "print qq($copyfields) if s[</fields>][];"     solr*/conf/schema.xml
>   # change the unique key and mark the "id" field as not required
>   perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;"            solr*/conf/schema.xml
>   perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>   # enable remote streaming in solrconfig file
>   perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>   fi
>
> # some constants to keep the curl command shorter
> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
> file=`pwd`"/geonames.txt"
>
> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>
> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>   then
>   echo "Tomcat would not shutdown"
>   exit
>   fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrbc solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
> echo "Getting ready to index the data set using solrnightly"
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>   then
>   echo "Tomcat would not shutdown"
>   exit
>   fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrnightly solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrnightly"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
>
>
>
>> On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>
>>> Hello Grant,
>>>
>>>> Were you overwriting the existing index or did you also clean out  
>>>> the
>>>> Solr data directory, too?  In other words, was it a fresh index, or
>>>> an
>>>> existing one?  And was that also the case for the 22 minute time?
>>>
>>> No, in each case it was a new index. I store the indexes (the "data"
>>> dir) outside the solr home directory. For the moment I rm -rf the
>>> index dir after each edit to the solrconfig.xml or schema.xml file
>>> and reindex from scratch. The relaunch of tomcat recreates the index dir.
>>>
>>>> Would it be possible to profile the two instance and see if you
>>>> notice
>>>> anything different?
>>> I don't understand this. Do you mean run a profiler against the tomcat
>>> image as indexing takes place, or somehow compare the indexes?
>>
>> Something like JProfiler or any other Java profiler.
>>
>>>
>>>
>>> I was thinking of making a short script that replicates the results
>>> and posting it here; would that help?
>>
>>
>> Very much so.
>>
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>
>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M - 
>>>>> Xms512M.
>>>>>
>>>>> Are there any tweaks I can use to get the original index time
>>>>> back? I read through the release notes and was expecting a
>>>>> speed-up. I saw the bit about increasing ramBufferSizeMB and set
>>>>> it to 64MB; it had no effect.
>>>>> -- 
>
> -- 
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Yonik

>Another thought I just had - do you have autocommit enabled?
>
No; not as far as I know!

The solrconfig.xml files from the two versions are equivalent as best I
can tell; they are also exactly as provided in the download. The only
changes were made by the attached script and should not affect
committing. Finally, the indexing command has commit=true, which I think
means do a single commit at the end of the file?
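[For reference: commit=true on the CSV handler is indeed applied once, when the whole request completes. A sketch of how one might take commits out of the timing entirely, assuming the same host, port and file path as the attached script (the curl calls are shown commented out so nothing is actually indexed):]

```shell
# Sketch only: stream the CSV with no commit at all, then issue a
# single explicit commit afterwards. Host/port and file path are
# assumptions carried over from the attached script.
file=`pwd`"/geonames.txt"
index_url="http://localhost:8080/solr/update/csv?commit=false&stream.file=$file"
commit_url="http://localhost:8080/solr/update"
echo "$index_url"
# time curl "$index_url"
# curl "$commit_url" -H 'Content-Type: text/xml' --data-binary '<commit/>'
```

If the two builds still differ by 3x with commits removed from the loop, the commit/sync cost can be ruled out as the cause.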

Regards Fergus.


>A lucene commit is now more expensive because it syncs the files for
>safety.  If you commit frequently, this could definitely cause a
>slowdown.
>
>-Yonik
>
>On Wed, Nov 26, 2008 at 10:54 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
>> Hello Grant,
>>
>> Not much good with Java profilers (yet!) so I thought I
>> would send a script!
>>
>> Details... details! Having decided to produce a script to
>> replicate the 1.2 vs 1.3 speed problem, the required rigor
>> revealed a lot more.
>>
>> 1) The faster version I have previously referred to as 1.2
>>   was actually a "1.3-dev" I had downloaded as part of the
>>   solr bootcamp class at ApacheCon Europe 2008. The ID
>>   string in the CHANGES.txt document is:-
>>   $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>>
>> 2) I did actually download and speed test a version of 1.2
>>   from the internet. Its CHANGES.txt id is:-
>>   $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>>   Speed-wise it was about the same as 1.3 at 64min. It also
>>   had lots of charset issues and is ignored from now on.
>>
>> 3) The version I was planning to use, till I found this
>>   speed issue, was the "latest" official version:-
>>   $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>>   I also verified the behavior with a nightly build.
>>   $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>>
>> Anyway, the following script indexes the content in 22min
>> for the 1.3-dev version and takes 68min for the newer releases
>> of 1.3. I took the conf directory from the 1.3-dev (bootcamp)
>> release and used it to replace the conf directory from the
>> official 1.3 release. The 3x slowdown was still there; it is
>> not a configuration issue!
>> =================================
>>
>>
>>
>>
>>
>>
>> #! /bin/bash
>>
>> # This script assumes a /usr/local/tomcat link to whatever version
>> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
>> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
>> # All the following was done as root.
>>
>>
>> # I have a directory /usr/local/ts which contains four versions of solr. The
>> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3 beta
>> # I got while attending a solr bootcamp. I indexed the same content using the
>> # different versions of solr as follows:
>> cd /usr/local/ts
>> if [ "" ]
>> then
>>   echo "Starting from a-fresh"
>>   sleep 5 # allow time for me to interrupt!
>>   cp -Rp apache-solr-bc/example/solr      ./solrbc  #bc = bootcamp
>>   cp -Rp apache-solr-nightly/example/solr ./solrnightly
>>   cp -Rp apache-solr-1.3.0/example/solr   ./solr13
>>
>>   # the gaz is regularly updated and its name keeps changing :-) The page
>>   # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
>>   # version.
>>   curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>>   unzip -q geonames.zip
>>   # delete corrupt blips!
>>   perl -i -n -e 'print unless
>>       ($. > 2128495 and $. < 2128505) or
>>       ($. > 5944254 and $. < 5944260)
>>       ;' geonames_dd_dms_date_20081118.txt
>>   #following was used to detect bad short records
>>   #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>>
>>   # my set of fields and copyfields for the schema.xml
>>   fields='
>>   <fields>
>>      <field name="UNI"           type="string" indexed="true"  stored="true" required="true" />
>>      <field name="CCODE"         type="string" indexed="true"  stored="true"/>
>>      <field name="DSG"           type="string" indexed="true"  stored="true"/>
>>      <field name="CC1"           type="string" indexed="true"  stored="true"/>
>>      <field name="LAT"           type="sfloat" indexed="true"  stored="true"/>
>>      <field name="LONG"          type="sfloat" indexed="true"  stored="true"/>
>>      <field name="MGRS"          type="string" indexed="false" stored="true"/>
>>      <field name="JOG"           type="string" indexed="false" stored="true"/>
>>      <field name="FULL_NAME"     type="string" indexed="true"  stored="true"/>
>>      <field name="FULL_NAME_ND"  type="string" indexed="true"  stored="true"/>
>>      <!--field name="text"       type="text"   indexed="true"  stored="false" multiValued="true"/ -->
>>      <!--field name="timestamp"  type="date"   indexed="true"  stored="true"  default="NOW" multiValued="false"/-->
>>   '
>>   copyfields='
>>      </fields>
>>      <copyField source="FULL_NAME" dest="text"/>
>>      <copyField source="FULL_NAME_ND" dest="text"/>
>>   '
>>
>>   # add in my fields and copyfields
>>   perl -i -p -e "print qq($fields) if s/<fields>//;"           solr*/conf/schema.xml
>>   perl -i -p -e "print qq($copyfields) if s[</fields>][];"     solr*/conf/schema.xml
>>   # change the unique key and mark the "id" field as not required
>>   perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;"            solr*/conf/schema.xml
>>   perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>>   # enable remote streaming in solrconfig file
>>   perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>>   fi
>>
>> # some constants to keep the curl command shorter
>> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
>> file=`pwd`"/geonames.txt"
>>
>> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>>
>> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>>   then
>>   echo "Tomcat would not shutdown"
>>   exit
>>   fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrbc solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>> echo "Getting ready to index the data set using solrnightly"
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>>   then
>>   echo "Tomcat would not shutdown"
>>   exit
>>   fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrnightly solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrnightly"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>>
>>
>>
>>>On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>>
>>>> Hello Grant,
>>>>
>>>>> Were you overwriting the existing index or did you also clean out the
>>>>> Solr data directory, too?  In other words, was it a fresh index, or
>>>>> an
>>>>> existing one?  And was that also the case for the 22 minute time?
>>>>
>>>> No, in each case it was a new index. I store the indexes (the "data"
>>>> dir) outside the solr home directory. For the moment I rm -rf the
>>>> index dir after each edit to the solrconfig.xml or schema.xml file
>>>> and reindex from scratch. The relaunch of tomcat recreates the index dir.
>>>>
>>>>> Would it be possible to profile the two instance and see if you
>>>>> notice
>>>>> anything different?
>>>> I don't understand this. Do you mean run a profiler against the tomcat
>>>> image as indexing takes place, or somehow compare the indexes?
>>>
>>>Something like JProfiler or any other Java profiler.
>>>
>>>>
>>>>
>>>> I was thinking of making a short script that replicates the results
>>>> and posting it here; would that help?
>>>
>>>
>>>Very much so.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Grant
>>>>>
>>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>>
>>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>>>
>>>>>> Are there any tweaks I can use to get the original index time
>>>>>> back. I read through the release notes and was expecting a
>>>>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>>>>> it to 64MB; it had no effect.
>>>>>> --
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fergus@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!

Posted by Yonik Seeley <yo...@apache.org>.
Another thought I just had - do you have autocommit enabled?

A Lucene commit is now more expensive because it syncs the files for
safety.  If you commit frequently, this could definitely cause a
slowdown.
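
For reference, the autocommit settings live in the <updateHandler>
section of solrconfig.xml; the values below are illustrative, not a
recommendation. If the block is present, commenting it out rules
autocommit in or out as the cause:

```xml
<!-- inside <updateHandler> in solrconfig.xml: if this block is
     present, Solr issues its own commits during indexing, and each
     commit now syncs the index files to disk -->
<autoCommit>
  <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
  <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
</autoCommit>
```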

-Yonik

On Wed, Nov 26, 2008 at 10:54 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> Hello Grant,
>
> Not much good with Java profilers (yet!) so I thought I
> would send a script!
>
> Details... details! Having decided to produce a script to
> replicate the 1.2 vis 1.3 speed problem. The required rigor
> revealed a lot more.
>
> 1) The faster version I have previously referred to as 1.2,
>   was actually a "1.3-dev" I had downloaded as part of the
>   solr bootcamp class at ApacheCon Europe 2008. The ID
>   string in the CHANGES.txt document is:-
>   $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>
> 2) I did actually download and speed test a version of 1.2
>   from the internet. It's CHANGES.txt id is:-
>   $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>   Speed wise it was about the same as 1.3 at 64min. It also
>   had lots of char set issues and is ignored from now on.
>
> 3) The version I was planning to use, till I found this,
>   speed issue was the "latest" official version:-
>   $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>   I also verified the behavior with a nightly build.
>   $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>
> Anyway, The following script indexes the content in 22min
> for the 1.3-dev version and takes 68min for the newer releases
> of 1.3. I took the conf directory from the 1.3dev (bootcamp)
> release and used it replace the conf directory from the
> official 1.3 release. The 3x slow down was still there; it is
> not a configuration issue!
> =================================
>
>
>
>
>
>
> #! /bin/bash
>
> # This script assumes a /usr/local/tomcat link to whatever version
> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
> # All the following was done as root.
>
>
> # I have a directory /usr/local/ts which contains four versions of solr. The
> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3beata
> # I got while attending a solr bootcamp. I indexed the same content using the
> # different versions of solr as follows:
> cd /usr/local/ts
> if [ "" ]
> then
>   echo "Starting from a-fresh"
>   sleep 5 # allow time for me to interrupt!
>   cp -Rp apache-solr-bc/example/solr      ./solrbc  #bc = bootcamp
>   cp -Rp apache-solr-nightly/example/solr ./solrnightly
>   cp -Rp apache-solr-1.3.0/example/solr   ./solr13
>
>   # the gaz is regularly updated and its name keeps changing :-) The page
>   # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
>   # version.
>   curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>   unzip -q geonames.zip
>   # delete corrupt blips!
>   perl -i -n -e 'print unless
>       ($. > 2128495 and $. < 2128505) or
>       ($. > 5944254 and $. < 5944260)
>       ;' geonames_dd_dms_date_20081118.txt
>   #following was used to detect bad short records
>   #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>
>   # my set of fields and copyfields for the schema.xml
>   fields='
>   <fields>
>      <field name="UNI"           type="string" indexed="true"  stored="true" required="true" />
>      <field name="CCODE"         type="string" indexed="true"  stored="true"/>
>      <field name="DSG"           type="string" indexed="true"  stored="true"/>
>      <field name="CC1"           type="string" indexed="true"  stored="true"/>
>      <field name="LAT"           type="sfloat" indexed="true"  stored="true"/>
>      <field name="LONG"          type="sfloat" indexed="true"  stored="true"/>
>      <field name="MGRS"          type="string" indexed="false" stored="true"/>
>      <field name="JOG"           type="string" indexed="false" stored="true"/>
>      <field name="FULL_NAME"     type="string" indexed="true"  stored="true"/>
>      <field name="FULL_NAME_ND"  type="string" indexed="true"  stored="true"/>
>      <!--field name="text"       type="text"   indexed="true"  stored="false" multiValued="true"/ -->
>      <!--field name="timestamp"  type="date"   indexed="true"  stored="true"  default="NOW" multiValued="false"/-->
>   '
>   copyfields='
>      </fields>
>      <copyField source="FULL_NAME" dest="text"/>
>      <copyField source="FULL_NAME_ND" dest="text"/>
>   '
>
>   # add in my fields and copyfields
>   perl -i -p -e "print qq($fields) if s/<fields>//;"           solr*/conf/schema.xml
>   perl -i -p -e "print qq($copyfields) if s[</fields>][];"     solr*/conf/schema.xml
>   # change the unique key and mark the "id" field as not required
>   perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;"            solr*/conf/schema.xml
>   perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>   # enable remote streaming in solrconfig file
>   perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>   fi
>
> # some constants to keep the curl command shorter
> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
> file=`pwd`"/geonames.txt"
>
> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>
> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>   then
>   echo "Tomcat would not shutdown"
>   exit
>   fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrbc solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
> echo "Getting ready to index the data set using solrnightly"
> /usr/local/tomcat/bin/shutdown.sh
> sleep 15
> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>   then
>   echo "Tomcat would not shutdown"
>   exit
>   fi
> rm -r /usr/local/tomcat/webapps/solr*
> rm -r /usr/local/tomcat/logs/*.out
> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
> rm solr # rm the symbolic link
> ln -s solrnightly solr
> rm -r solr/data
> /usr/local/tomcat/bin/startup.sh
> sleep 10 # give solr time to launch and setup
> echo "Starting indexing at " `date` " with solrnightly"
> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>
>
>
>
>>On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>
>>> Hello Grant,
>>>
>>>> Were you overwriting the existing index or did you also clean out the
>>>> Solr data directory, too?  In other words, was it a fresh index, or
>>>> an
>>>> existing one?  And was that also the case for the 22 minute time?
>>>
>>> No in each case it was a new index. I store the indexes (the "data"
>>> dir)
>>> outside the solr home directory. For the moment I, rm -rf the index
>>> dir
>>> after each edit to the solrconfig.sml or schema.xml file and reindex
>>> from scratch. The relaunch of tomcat recreates the index dir.
>>>
>>>> Would it be possible to profile the two instance and see if you
>>>> notice
>>>> anything different?
>>> I dont understand this. Do mean run a profiler against the tomcat
>>> image as indexing takes place, or somehow compare the indexes?
>>
>>Something like JProfiler or any other Java profiler.
>>
>>>
>>>
>>> I was think of making a short script that replicates the results,
>>> and posting it here, would that help?
>>
>>
>>Very much so.
>>
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>> solr 1.2. I then stopped tomcat replaced the solr stuff inside
>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>
>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>>
>>>>> Are there any tweaks I can use to get the original index time
>>>>> back. I read through the release notes and was expecting a
>>>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>>>> it to 64MB; it had no effect.
>>>>> --
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown + script!

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant, 

Not much good with Java profilers (yet!) so I thought I 
would send a script!

Details... details! Having decided to produce a script to 
replicate the 1.2 vs 1.3 speed problem, the required rigor 
revealed a lot more.

1) The faster version I have previously referred to as 1.2,
   was actually a "1.3-dev" I had downloaded as part of the
   solr bootcamp class at ApacheCon Europe 2008. The ID
   string in the CHANGES.txt document is:-
   $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
   
2) I did actually download and speed test a version of 1.2 
   from the internet. Its CHANGES.txt id is:-
   $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
   Speed-wise it was about the same as 1.3 at 64min. It also
   had lots of charset issues and is ignored from now on.
   
3) The version I was planning to use, till I found this speed
   issue, was the "latest" official version:-
   $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
   I also verified the behavior with a nightly build.
   $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
   
Anyway, the following script indexes the content in 22min
with the 1.3-dev version and takes 68min with the newer releases
of 1.3. I took the conf directory from the 1.3-dev (bootcamp) 
release and used it to replace the conf directory from the
official 1.3 release. The 3x slowdown was still there; it is
not a configuration issue!
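
Incidentally, a quick way to tell the unpacked trees apart (this
assumes my directory layout above, and that CHANGES.txt sits at the
top of each distribution; the helper name is my own invention):

```shell
#!/bin/bash
# Hypothetical helper: print the CHANGES.txt revision id for each
# unpacked solr tree given as an argument, so the fast and slow
# builds can be identified at a glance.
solr_build_id() {
    local d
    for d in "$@"; do
        grep -h 'Id: CHANGES.txt' "$d/CHANGES.txt"
    done
}
# e.g. solr_build_id apache-solr-1.3.0 apache-solr-nightly
```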
=================================






#! /bin/bash

# This script assumes a /usr/local/tomcat link to whatever version
# of tomcat you have installed. I have "apache-tomcat-5.5.20" Also 
# /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml. 
# All the following was done as root.


# I have a directory /usr/local/ts which contains four versions of solr: the
# "official" 1.2, two 1.3 releases, and the 1.2-or-1.3-beta snapshot I got
# while attending a solr bootcamp. I indexed the same content using the
# different versions of solr as follows:
cd /usr/local/ts
if [ "" ]   # put any non-empty string here to re-run the one-time setup below
then 
   echo "Starting from a-fresh"
   sleep 5 # allow time for me to interrupt!
   cp -Rp apache-solr-bc/example/solr      ./solrbc  #bc = bootcamp
   cp -Rp apache-solr-nightly/example/solr ./solrnightly
   cp -Rp apache-solr-1.3.0/example/solr   ./solr13
   
   # the gaz is regularly updated and its name keeps changing :-) The page
   # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
   # version.
   curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
   unzip -q geonames.zip
   # delete corrupt blips!
   perl -i -n -e 'print unless  
       ($. > 2128495 and $. < 2128505) or
       ($. > 5944254 and $. < 5944260) 
       ;' geonames_dd_dms_date_20081118.txt
   #following was used to detect bad short records
   #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
   
   # my set of fields and copyfields for the schema.xml
   fields='
   <fields>
      <field name="UNI"           type="string" indexed="true"  stored="true" required="true" /> 
      <field name="CCODE"         type="string" indexed="true"  stored="true"/>
      <field name="DSG"           type="string" indexed="true"  stored="true"/>
      <field name="CC1"           type="string" indexed="true"  stored="true"/>
      <field name="LAT"           type="sfloat" indexed="true"  stored="true"/>
      <field name="LONG"          type="sfloat" indexed="true"  stored="true"/>
      <field name="MGRS"          type="string" indexed="false" stored="true"/>
      <field name="JOG"           type="string" indexed="false" stored="true"/>
      <field name="FULL_NAME"     type="string" indexed="true"  stored="true"/>
      <field name="FULL_NAME_ND"  type="string" indexed="true"  stored="true"/>
      <!--field name="text"       type="text"   indexed="true"  stored="false" multiValued="true"/ -->
      <!--field name="timestamp"  type="date"   indexed="true"  stored="true"  default="NOW" multiValued="false"/-->
   '
   copyfields='
      </fields>
      <copyField source="FULL_NAME" dest="text"/>
      <copyField source="FULL_NAME_ND" dest="text"/>
   '
   
   # add in my fields and copyfields
   perl -i -p -e "print qq($fields) if s/<fields>//;"           solr*/conf/schema.xml
   perl -i -p -e "print qq($copyfields) if s[</fields>][];"     solr*/conf/schema.xml
   # change the unique key and mark the "id" field as not required
   perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;"            solr*/conf/schema.xml
   perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
   # enable remote streaming in solrconfig file
   perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
   fi

# some constants to keep the curl command shorter
skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
file=`pwd`"/geonames_dd_dms_date_20081118.txt" # must match the file unzipped and cleaned above

export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"

echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
/usr/local/tomcat/bin/shutdown.sh
sleep 15
if [ -n "`ps awxww | grep tomcat | grep -v grep`" ] 
   then 
   echo "Tomcat would not shutdown"
   exit
   fi
rm -r /usr/local/tomcat/webapps/solr*
rm -r /usr/local/tomcat/logs/*.out
rm -r /usr/local/tomcat/work/Catalina/localhost/solr
cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
rm solr # rm the symbolic link
ln -s solrbc solr
rm -r solr/data
/usr/local/tomcat/bin/startup.sh
sleep 10 # give solr time to launch and setup
echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"

echo "Getting ready to index the data set using solrnightly"
/usr/local/tomcat/bin/shutdown.sh
sleep 15
if [ -n "`ps awxww | grep tomcat | grep -v grep`" ] 
   then 
   echo "Tomcat would not shutdown"
   exit
   fi
rm -r /usr/local/tomcat/webapps/solr*
rm -r /usr/local/tomcat/logs/*.out
rm -r /usr/local/tomcat/work/Catalina/localhost/solr
cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
rm solr # rm the symbolic link
ln -s solrnightly solr
rm -r solr/data
/usr/local/tomcat/bin/startup.sh
sleep 10 # give solr time to launch and setup
echo "Starting indexing at " `date` " with solrnightly"
time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
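
One note on the script itself: the fixed "sleep 15" before the ps
check is a guess. A sketch of an alternative (the function name and
timeout are my own invention) that polls until the process is
actually gone:

```shell
#!/bin/bash
# Hypothetical replacement for "shutdown.sh; sleep 15; ps | grep":
# poll until no process matches the pattern, up to a timeout.
wait_for_exit() {
    local pattern="$1" timeout="$2" waited=0
    while [ -n "$(ps awxww | grep "$pattern" | grep -v grep)" ]; do
        [ "$waited" -ge "$timeout" ] && return 1  # still running at timeout
        sleep 1
        waited=$((waited + 1))
    done
    return 0  # pattern no longer matches any process
}

# usage:
# /usr/local/tomcat/bin/shutdown.sh
# wait_for_exit tomcat 30 || { echo "Tomcat would not shutdown"; exit 1; }
```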




>On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>
>> Hello Grant,
>>
>>> Were you overwriting the existing index or did you also clean out the
>>> Solr data directory, too?  In other words, was it a fresh index, or  
>>> an
>>> existing one?  And was that also the case for the 22 minute time?
>>
>> No in each case it was a new index. I store the indexes (the "data"  
>> dir)
>> outside the solr home directory. For the moment I, rm -rf the index  
>> dir
>> after each edit to the solrconfig.sml or schema.xml file and reindex
>> from scratch. The relaunch of tomcat recreates the index dir.
>>
>>> Would it be possible to profile the two instance and see if you  
>>> notice
>>> anything different?
>> I dont understand this. Do mean run a profiler against the tomcat
>> image as indexing takes place, or somehow compare the indexes?
>
>Something like JProfiler or any other Java profiler.
>
>>
>>
>> I was think of making a short script that replicates the results,
>> and posting it here, would that help?
>
>
>Very much so.
>
>
>>
>>
>>>
>>> Thanks,
>>> Grant
>>>
>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a CSV file with 6M records which took 22min to index with
>>>> solr 1.2. I then stopped tomcat replaced the solr stuff inside
>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>
>>>> Indexing the exact same content now takes 69min. My machine has
>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>
>>>> Are there any tweaks I can use to get the original index time
>>>> back. I read through the release notes and was expecting a
>>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>>> it to 64MB; it had no effect.
>>>> -- 

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:

> Hello Grant,
>
>> Were you overwriting the existing index or did you also clean out the
>> Solr data directory, too?  In other words, was it a fresh index, or  
>> an
>> existing one?  And was that also the case for the 22 minute time?
>
> No in each case it was a new index. I store the indexes (the "data"  
> dir)
> outside the solr home directory. For the moment I, rm -rf the index  
> dir
> after each edit to the solrconfig.sml or schema.xml file and reindex
> from scratch. The relaunch of tomcat recreates the index dir.
>
>> Would it be possible to profile the two instance and see if you  
>> notice
>> anything different?
> I dont understand this. Do mean run a profiler against the tomcat
> image as indexing takes place, or somehow compare the indexes?

Something like JProfiler or any other Java profiler.

>
>
> I was think of making a short script that replicates the results,
> and posting it here, would that help?


Very much so.


>
>
>>
>> Thanks,
>> Grant
>>
>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>
>>> Hello,
>>>
>>> I have a CSV file with 6M records which took 22min to index with
>>> solr 1.2. I then stopped tomcat replaced the solr stuff inside
>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>
>>> Indexing the exact same content now takes 69min. My machine has
>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>
>>> Are there any tweaks I can use to get the original index time
>>> back. I read through the release notes and was expecting a
>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>> it to 64MB; it had no effect.
>>> -- 
>>>
>>> ===============================================================
>>> Fergus McMenemie               Email:fergus@twig.me.uk
>>> Techmore Ltd                   Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets             Analyst Programmer
>>> ===============================================================
>
> -- 
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



Re: Upgrade from 1.2 to 1.3 gives 3x slowdown

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello Grant, 

>Were you overwriting the existing index or did you also clean out the  
>Solr data directory, too?  In other words, was it a fresh index, or an  
>existing one?  And was that also the case for the 22 minute time?

No, in each case it was a fresh index. I store the indexes (the "data" dir)
outside the solr home directory. For the moment I rm -rf the index dir
after each edit to the solrconfig.xml or schema.xml file and reindex
from scratch. The relaunch of tomcat recreates the index dir.

>Would it be possible to profile the two instance and see if you notice  
>anything different?
I don't understand this. Do you mean running a profiler against the tomcat
image as indexing takes place, or somehow comparing the indexes?

I was thinking of making a short script that replicates the results 
and posting it here; would that help?

>
>Thanks,
>Grant
>
>On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>
>> Hello,
>>
>> I have a CSV file with 6M records which took 22min to index with
>> solr 1.2. I then stopped tomcat replaced the solr stuff inside
>> webapps with version 1.3, wiped my index and restarted tomcat.
>>
>> Indexing the exact same content now takes 69min. My machine has
>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>
>> Are there any tweaks I can use to get the original index time
>> back. I read through the release notes and was expecting a
>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>> it to 64MB; it had no effect.
>> -- 
>>
>> ===============================================================
>> Fergus McMenemie               Email:fergus@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: Upgrade from 1.2 to 1.3 gives 3x slowdown

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Fergus,

Were you overwriting the existing index or did you also clean out the  
Solr data directory, too?  In other words, was it a fresh index, or an  
existing one?  And was that also the case for the 22 minute time?

Would it be possible to profile the two instances and see if you notice  
anything different?

Thanks,
Grant

On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:

> Hello,
>
> I have a CSV file with 6M records which took 22min to index with
> solr 1.2. I then stopped tomcat replaced the solr stuff inside
> webapps with version 1.3, wiped my index and restarted tomcat.
>
> Indexing the exact same content now takes 69min. My machine has
> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>
> Are there any tweaks I can use to get the original index time
> back. I read through the release notes and was expecting a
> speed up. I saw the bit about increasing ramBufferSizeMB and set
> it to 64MB; it had no effect.
> -- 
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================