Posted to dev@lucene.apache.org by "Christian Moen (Created) (JIRA)" <ji...@apache.org> on 2012/03/27 12:47:27 UTC

[jira] [Created] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Perform Kuromoji/Japanese stability test before 3.6 freeze
----------------------------------------------------------

                 Key: SOLR-3282
                 URL: https://issues.apache.org/jira/browse/SOLR-3282
             Project: Solr
          Issue Type: Task
          Components: Schema and Analysis
    Affects Versions: 3.6, 4.0
            Reporter: Christian Moen


Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.

My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:

# Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
# Simultaneously run 1 million or so typical Japanese queries against the index at 3-5 queries per second

While Solr is indexing and searching, I'd like to verify that:

* Indexing and queries are working as expected
* Memory and heap usage looks stable over time
* Garbage collection is overall low over time -- no Full-GC issues

I'll post findings to this JIRA as I get things going.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:22 PM:
---------------------------------------------------------------

h3. Test 1 - Indexing Japanese Wikipedia

In this test I'm only indexing documents -- no searching is being done.

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}
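
For reference, the post-then-commit procedure described above could be scripted roughly as follows. This is only a sketch: the `post_all` function, the `SOLR_URL`, `XML_DIR` and `CURL` variables and the `SolrXml-*.xml` glob are assumptions for illustration (only `SolrXml-171.xml` is named above), and this is not the script actually used in the test.

```shell
#!/bin/sh
# Sketch only: post every Solr XML file, then issue one commit at the end.
# SOLR_URL, XML_DIR, CURL and the SolrXml-*.xml glob are assumptions.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"
XML_DIR="${XML_DIR:-solrxml}"
CURL="${CURL:-curl}"   # set CURL=echo for a dry run

post_all() {
    for f in "$XML_DIR"/SolrXml-*.xml; do
        [ -e "$f" ] || continue   # nothing matched the glob
        "$CURL" -s "$SOLR_URL" \
            -H 'Content-type:text/xml; charset=UTF-8' \
            --data-binary "@$f" || return 1
    done
    # One commit after all files have been posted.
    "$CURL" -s "$SOLR_URL" -F 'stream.body= <commit />'
}
```

Calling `post_all` posts the files one by one and commits once; with `CURL=echo` it only prints the commands it would run.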

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine, with a maximum GC time of 0.0187319 seconds.  There wasn't even a full GC, probably due to the large heap size.  However, if Kuromoji were generating excessive garbage, I'd expect to see it here, since the input in XML format is 1.7GB and the Viterbi search would generate data many, many times that size during tokenization.
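
As a rough sanity check on the timing above, the wall-clock time can be turned into an indexing rate. The helper names `to_seconds` and `rate` are made up for this sketch; the inputs are the document count and the `real` time quoted above.

```shell
# to_seconds "18m39.206s" -> seconds with millisecond precision
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# rate COUNT SECONDS -> items per second, rounded to a whole number
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

secs=$(to_seconds "18m39.206s")   # 1119.206
rate 1443764 "$secs"              # prints 1290
```

So the run above works out to roughly 1,290 documents per second of wall-clock time.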

I'm attaching these files

|| Attachment || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

Note that GCViewer had problems parsing the log file so the data in the screenshot might be off.
                
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:00 PM:
---------------------------------------------------------------

h5. Test 1: Indexing Japanese Wikipedia

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine.  There wasn't even a full GC, probably due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:49 PM:
---------------------------------------------------------------

h5. Test 1: Indexing Japanese Wikipedia

In this test I'm only indexing documents -- no searching is being done.

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine, with a maximum GC time of 0.0187319 seconds.  There wasn't even a full GC, probably due to the large heap size.  However, if Kuromoji were generating excessive garbage, I'd expect to see it here, since the input in XML format is 1.7GB and the Viterbi search would generate data many, many times that size during tokenization.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

Note that GCViewer had problems parsing the log file so the data in the screenshot might be off.
                
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:38 PM:
---------------------------------------------------------------

h5. Test 2: Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index without highlighting, using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so a small/normal heap size to keep memory pressure a bit high and no fancy GC options -- with all of Wikipedia searchable.  Very nice :)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.
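
The two forms above are related by plain UTF-8 percent-encoding. The sketch below shows one way to produce the encoded form in shell. `urlencode_all` is a hypothetical helper, and it deliberately encodes every byte (ASCII included), which is valid but more verbose than what a browser or load tool would normally emit.

```shell
# urlencode_all STRING -> every byte of STRING written as %XX (uppercase hex)
urlencode_all() {
    printf '%s' "$1" \
        | od -A n -v -t x1 \
        | tr -d ' \t\n' \
        | sed 's/../%&/g' \
        | tr 'a-f' 'A-F'
}

# With this script saved as UTF-8, the query from the example above:
urlencode_all '無料占い'   # prints %E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
```

When querying with curl directly, `curl -G --data-urlencode 'q=無料占い' http://localhost:8983/solr/select/` lets curl do the encoding instead.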

Running the 250,000 queries took 1838.5 seconds.  Roughly 80% of the queries completed within 0.5 seconds, and the test served a sustained load of 142 QPS.
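
The average rate over the whole run can be recomputed from the two totals above. `avg_qps` is an illustrative helper, not part of any tool used in the test.

```shell
# avg_qps QUERIES SECONDS -> average queries per second, one decimal place
avg_qps() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

avg_qps 250000 1838.5   # prints 136.0
```

This plain average (about 136 QPS) is a little below the 142 QPS quoted above. The sustained figure presumably comes from the load tool's own measurement window, which is not described here, so the two need not agree exactly.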

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.
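
Summaries like the Full GC table above can be pulled from the `-verbose:gc` output with a small filter. The sketch assumes log lines of the form `[Full GC 57558K->36262K(126912K), 0.2926001 secs]`, which matches the entries quoted above but may differ between JVM versions and GC options; `full_gc_stats` is a made-up helper name.

```shell
# full_gc_stats FILE -> "<number of Full GCs> <longest Full GC pause in seconds>"
full_gc_stats() {
    awk '/\[Full GC/ {
             t = 0
             # the pause is the field right before "secs]"
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^secs/) { t = $(i - 1) + 0; break }
             n++
             if (t > max) max = t
         }
         END { printf "%d %.7f\n", n, max }' "$1"
}
```

For example, `full_gc_stats 250k-queries-no-highlight-gc.log` would print the Full GC count followed by the longest pause, provided the log uses the assumed format.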

|| Filename || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Attachment: 62k-queries-highlight-visualvm.png
                62k-queries-highlight-gc.log
    
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239714#comment-13239714 ] 

Christian Moen edited comment on SOLR-3282 at 3/28/12 2:48 AM:
---------------------------------------------------------------

h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia and searching at the same time.

The search rate is a constant 10 QPS with highlighting.  The queries in this test are identical to those run above and they are also unique.

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents.  Unlike _Test 1_, we now do a commit after each post.  No optimize is being done.

The test ran for 8 hours and 33 minutes before I stopped it, by which time 312,900 queries had been run.  Japanese Wikipedia was indexed 23 times.

Full GC occurred 84 times, and the VM allocated the full maximum heap size it was given.  The longest Full GC times are given below.

|| Longest Full GC (seconds) ||
|1.0789668|
|1.0518156|
|1.0288781|
|0.9973905|
|0.9799409|
|0.9582144|
|0.9555027|
|0.9517524|
|0.9456611|
|0.9387380|
|0.9313493|
|0.9117388|
|0.8771426|
|...|


The longest regular (non-Full) GC times are below.

|| Longest non-Full GC (seconds) ||
|0.1375324|
|0.1206866|
|0.1009028|
|0.0952712|
|0.0928364|
|...|

The VisualVM screenshot suggests that the VM is nice and stable.  Providing a little more than 256MB of maximum heap would leave more headroom when indexing all of Japanese Wikipedia and serving 10 QPS, but 256MB seems quite fine.

|| Attachment || Description ||
| long-query-indexing-gc.log | GC log |
| long-search-indexing-visualvm.png | VisualVM screenshot |



                
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png, long-query-indexing-gc.log, long-search-indexing-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:02 PM:
---------------------------------------------------------------

h5. Test 1: Indexing Japanese Wikipedia

In this test I'm only indexing documents -- no searching is being done.

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total, a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.
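
A minimal sketch of how such batching could be done, assuming each document is a dict with {{title}} and {{body}} fields (the fields mentioned in the test setup) plus an {{id}} key, which is an assumption and not something stated in this issue.  The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def batches(docs, size=1000):
    """Yield consecutive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_solr_xml(batch):
    """Render one batch as a Solr <add> update message (UTF-8 bytes)."""
    add = ET.Element("add")
    for doc in batch:
        node = ET.SubElement(add, "doc")
        # "id" is an assumed unique key; "title" and "body" come from the setup.
        for name in ("id", "title", "body"):
            field = ET.SubElement(node, "field", name=name)
            field.text = doc[name]
    return ET.tostring(add, encoding="utf-8")


def number_of_files(total_docs, size=1000):
    """Ceiling division: how many batch files a corpus needs."""
    return -(-total_docs // size)
```

With 1,443,764 documents and 1,000 documents per file, {{number_of_files}} gives 1,444 files, the last one holding 764 documents.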

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}
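
The post-then-commit sequence can also be scripted.  The following is a sketch under stated assumptions, not the script used for this test: the URL and content type are taken from the curl commands above, the function names are hypothetical, and the commit is sent as a plain {{<commit/>}} XML body rather than through the {{stream.body}} form parameter.

```python
import urllib.request

UPDATE_URL = "http://localhost:8983/solr/update"  # from the curl commands above
XML_HEADERS = {"Content-Type": "text/xml; charset=UTF-8"}


def http_post(body):
    """Send one XML body to the update handler and return the HTTP status."""
    request = urllib.request.Request(UPDATE_URL, data=body, headers=XML_HEADERS)
    with urllib.request.urlopen(request) as response:
        return response.status


def post_all(paths, send=http_post):
    """Post every file in order, then issue a single commit at the end."""
    for path in paths:
        with open(path, "rb") as handle:
            send(handle.read())
    send(b"<commit/>")  # one commit after all files, as in the test
    return len(paths)
```

Passing a different {{send}} callable makes it possible to dry-run the sequence without a running Solr instance.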

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}
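
For a rough sense of throughput, the wall-clock time can be combined with the document count of 1,443,764.  This is only an illustrative calculation; it assumes the {{real}} figure covers posting of all files and nothing else.

```python
import re


def parse_real_seconds(time_output):
    """Return the 'real' line of shell `time` output as seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


TIME_OUTPUT = "real\t18m39.206s\nuser\t0m12.682s\nsys\t0m11.065s\n"
TOTAL_DOCS = 1443764  # document count reported for the Wikipedia extract

elapsed = parse_real_seconds(TIME_OUTPUT)
docs_per_second = TOTAL_DOCS / elapsed
print("%.3f s elapsed, about %.0f docs/s" % (elapsed, docs_per_second))
```

This works out to roughly 1,290 documents per second on the test machine, which should not be read as a benchmark figure.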

The GC log looks fine.  There wasn't even a full GC, probably due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                
                  



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239447#comment-13239447 ] 

Michael McCandless commented on SOLR-3282:
------------------------------------------

This sounds like a fabulous test!

I wonder if we can somehow make this easily runnable "on demand" (eg, like Test2BTerms), assuming you have the prereqs installed locally (eg Japanese Wikipedia export).
                



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

h5. Test 2: Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index without highlighting, using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so: small/normal heap size and no fancy GC options (and all of Wikipedia searchable).

Running the 250,000 queries took 1838.5 seconds.  The test kept roughly 80% of its queries within 0.5 seconds of latency and served a sustained load of 142 QPS.
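
The issue does not say how these two figures were measured.  As a hedged illustration only, a load script that records one latency per query could derive them like this ({{summarize}} and its parameters are hypothetical names):

```python
def summarize(latencies, wall_seconds, threshold=0.5):
    """Return (fraction of queries at or under threshold, queries per second).

    latencies    -- per-query response times in seconds
    wall_seconds -- wall-clock duration of the whole run in seconds
    """
    if not latencies or wall_seconds <= 0:
        raise ValueError("need at least one latency and a positive duration")
    within = sum(1 for value in latencies if value <= threshold)
    return within / len(latencies), len(latencies) / wall_seconds
```

Note that queries per second computed this way is an average over the whole run, which may differ from a sustained rate measured over a steady-state window.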

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.
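
Figures like those in the table can be pulled out of a {{-verbose:gc}} log with a small script.  The sketch below assumes the log lines have the form {{[Full GC 57558K->36262K(126912K), 0.2926001 secs]}}; the exact layout of the attached log is an assumption, and the function names are hypothetical.

```python
import re

# One Full GC entry: heap before, heap after, heap capacity, pause in seconds.
FULL_GC = re.compile(r"\[Full GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def full_gc_events(log_text):
    """Return (before_kb, after_kb, heap_kb, pause_seconds) per Full GC line."""
    return [
        (int(before), int(after), int(heap), float(secs))
        for before, after, heap, secs in FULL_GC.findall(log_text)
    ]


def pause_summary(events):
    """Summarize pause times; expects at least one event."""
    pauses = [event[3] for event in events]
    return {"count": len(pauses), "total": sum(pauses), "max": max(pauses)}
```

For the nine Full GC entries listed above, the pauses add up to about 2.99 seconds, or roughly 0.16% of the 1838.5 second run, with the longest pause at 0.5224920 seconds.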

|| Filename || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                



[jira] [Resolved] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen resolved SOLR-3282.
----------------------------------

    Resolution: Fixed
    



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239577#comment-13239577 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

*Setup*

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to test stability and not to provide accurate performance numbers, although I also hope to do that.)

My java is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.

                



[jira] [Assigned] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen reassigned SOLR-3282:
------------------------------------

    Assignee: Christian Moen
    



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Attachment: long-search-indexing-visualvm.png
                long-query-indexing-gc.log
    



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Attachment: jawiki-index-visualvm.png
                jawiki-index-gcviewer.png
                jawiki-index-gc.log
    



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Description: 
Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.

My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:

# Index all of the Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
# Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on

While Solr is indexing and searching, I'd like to verify that:

* Indexing and queries are working as expected
* Memory and heap usage looks stable over time
* Garbage collection is overall low over time -- no Full-GC issues

I'll post findings to this JIRA as I get things going.

  was:
Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.

My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:

# Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
# Simultaneously run 1 million or so typical Japanese queries against the index at 3-5 queries per second

While Solr is indexing and searching, I'd like to verify that:

* Indexing and queries are working as expected
* Memory and heap usage looks stable over time
* Garbage collection is overall low over time -- no Full-GC issues

I'll post findings to this JIRA as I get things going.

    



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Attachment: 250k-queries-no-highlight-visualvm.png
                250k-queries-no-highlight-gc.log
    



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250751#comment-13250751 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

I'll resolve this issue now.

I've also done additional testing using the VisualVM Visual GC plugin, and I'm seeing that the {{org.apache.lucene.analysis.ja.Token}} objects get collected much as we expect.  In actual deployments, it's perhaps a good idea to use a larger eden space, either by using the server GC defaults or by tuning things further.

In longer-term tests, it seems that Solr's heap space fills up, even with a 512MB heap, before a full GC recovers a lot of it.  I suspect this is caused by searching with highlighting, since the heap seems very stable with indexing only.  (In either case, this doesn't seem to be caused by Kuromoji.)

                



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239577#comment-13239577 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 3:59 PM:
---------------------------------------------------------------

h5. Test setup

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to test stability and not to provide accurate performance numbers, although I also hope to do that.)

My java is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.

                
      was (Author: cm):
    *Setup*

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to test stability and not to provide accurate performance numbers, although I also hope to do that.)

My Java version is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.

                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239659#comment-13239659 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:25 PM:
---------------------------------------------------------------

h3. Test 3 - Searching with highlighting (no indexing)

The test is similar to _Test 2_ with highlighting turned on, but only ~62,000 queries were run.  No indexing was done.

Solr was run as follows

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

and, again, note the small heap size and regular GC options.

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84&hl=on&hl.fl=body
{noformat}

which is

{noformat}
/solr/select/?q=無料占い&hl=on&hl.fl=body
{noformat}

in URL-decoded form.
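
The correspondence between the two forms above can be checked with a few lines of Python.  This is only an illustrative sketch using the standard library; it is not part of the test harness used here.

```python
from urllib.parse import quote, unquote

# Percent-encoded query term exactly as it appears in the request above.
encoded = "%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84"

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)  # 無料占い

# quote() goes the other way; safe="" makes sure nothing is left unescaped.
assert quote(decoded, safe="") == encoded
```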

We have turned on highlighting and we are highlighting on the body field.

The test completed in 1648.1 seconds; 63,200 queries were run, and the sustainable query rate was 47 QPS.

Turning on highlighting has a fairly significant performance penalty: in the non-highlighting case we could sustain 142 QPS, so throughput drops to roughly a third.

There is also increased memory pressure with highlighting turned on.  There were 652 Full GC events in total in the period, and the longest Full GC times are given below.

|| Longest Full GC times (seconds) ||
|0.9769069|
|0.8564934|
|0.7585956|
|0.7084318|
|0.6928327|
|0.6781336|
|0.6358398|
|0.6099899|
|0.5628532|
|0.5540237|
|0.5443075|
|0.5429399|
|0.5423989|
|...|

The extra memory pressure can also be seen in the VisualVM screenshot.  I believe the root cause of this is the highlighting.
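
For anyone wanting to reproduce numbers like those in the table above from a GC log, here is a minimal Python sketch.  It assumes the log lines look like `[Full GC 51200K->40960K(63360K), 0.9769069 secs]`, which is the usual -verbose:gc shape on HotSpot 1.6 but should be checked against the attached log.  The sample lines in the code are made up for illustration and are not taken from the real log.

```python
import re

# Assumed line shape for HotSpot 1.6 with -verbose:gc, e.g.
#   [Full GC 51200K->40960K(63360K), 0.9769069 secs]
FULL_GC = re.compile(r"\[Full GC .*?, (\d+\.\d+) secs\]")


def longest_full_gcs(lines, top=13):
    """Return (count, longest pauses in seconds, descending) for Full GC events."""
    pauses = [float(m.group(1)) for line in lines for m in FULL_GC.finditer(line)]
    return len(pauses), sorted(pauses, reverse=True)[:top]


# Made-up sample lines; the heap sizes are NOT from the real log.
sample = [
    "[GC 20480K->10240K(63360K), 0.0187319 secs]",
    "[Full GC 51200K->40960K(63360K), 0.5423989 secs]",
    "[Full GC 52000K->41000K(63360K), 0.9769069 secs]",
]
count, longest = longest_full_gcs(sample)
print(count, longest)  # 2 [0.9769069, 0.5423989]
```

To run it against a real log, pass an open file instead of the sample list, for example `longest_full_gcs(open("62k-queries-highlight-gc.log"))`.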

|| Attachment || Description ||
| 62k-queries-highlight-gc.log|  GC log |
| 62k-queries-highlight-visualvm.png|  Screenshot from VisualVM |
                
      was (Author: cm):
    h5. Test 3 - Searching with highlighting (no indexing)

The test is similar to _Test 2_ with highlighting turned on, but only ~62,000 queries were run.  No indexing was done.

Solr was run as follows

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

and, again, note the small heap size and regular GC options.

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84&hl=on&hl.fl=body
{noformat}

which is

{noformat}
/solr/select/?q=無料占い&hl=on&hl.fl=body
{noformat}

in URL-decoded form.

We have turned on highlighting and we are highlighting on the body field.

The test completed in 1648.1 seconds; 63,200 queries were run, and the sustainable query rate was 47 QPS.

Turning on highlighting has a fairly significant performance penalty: in the non-highlighting case we could sustain 142 QPS, so throughput drops to roughly a third.

There is also increased memory pressure with highlighting turned on.  There were 652 Full GC events in total in the period, and the longest Full GC times are given below.

|| Longest Full GC times (seconds) ||
|0.9769069|
|0.8564934|
|0.7585956|
|0.7084318|
|0.6928327|
|0.6781336|
|0.6358398|
|0.6099899|
|0.5628532|
|0.5540237|
|0.5443075|
|0.5429399|
|0.5423989|
|...|

The extra memory pressure can also be seen in the VisualVM screenshot.  I believe the root cause of this is the highlighting.

|| Attachment || Description ||
| 62k-queries-highlight-gc.log|  GC log |
| 62k-queries-highlight-visualvm.png|  Screenshot from VisualVM |
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:21 PM:
---------------------------------------------------------------

h5. Test 1: Indexing Japanese Wikipedia

In this test I'm only indexing documents -- no searching is being done.

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total, and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}
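
The two curl steps above could also be scripted.  The following is a minimal Python sketch, not the script used in this test: the glob pattern `solrxml/SolrXml-*.xml` and the function names are assumptions for illustration, and the commit is sent as a bare `<commit/>` XML body rather than through `stream.body`.

```python
import glob
import urllib.request

# Update endpoint and header taken from the curl commands above.
UPDATE_URL = "http://localhost:8983/solr/update"
XML_HEADERS = {"Content-type": "text/xml; charset=UTF-8"}


def build_update_request(xml_bytes, url=UPDATE_URL):
    """Build (but do not send) a POST carrying Solr XML in the request body."""
    return urllib.request.Request(url, data=xml_bytes, headers=XML_HEADERS, method="POST")


def post_all(pattern="solrxml/SolrXml-*.xml", send=urllib.request.urlopen):
    """Post every file matching pattern, then issue one commit at the end."""
    # Lexicographic order; the order does not matter for indexing.
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "rb") as f:
            send(build_update_request(f.read())).close()
    # Assumption: a bare <commit/> posted as text/xml stands in for the
    # stream.body form used in the curl command above.
    send(build_update_request(b"<commit/>")).close()
    return len(paths)
```

Calling `post_all()` with the defaults posts each file once and commits once at the end, as in this test.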

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine with a maximum GC time of 0.0187319 seconds.  There wasn't even a full GC, likely due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

Note that GCViewer had problems parsing the log file so the data in the screenshot might be off.
                
      was (Author: cm):
    h5. Test 1: Indexing Japanese Wikipedia

In this test I'm only indexing documents -- no searching is being done.

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total, and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm running my Solr simply using

{noformat}
java -verbose:gc -Xmx512m -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I'm not using any fancy GC options.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine.  There wasn't even a full GC, likely due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239513#comment-13239513 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

Thanks, Mike,

I should have a closer look at Test2BTerms to see how we can automate some of this.  However, this time, I'll do this manually because of the short time available until freeze.  Hope to post some results very soon!
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239714#comment-13239714 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:46 PM:
---------------------------------------------------------------

h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia and searching at the same time.

The search rate is a constant 10 QPS.  The queries in this test are identical to those run above and they are also unique.
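
A constant query rate like this can be produced with a simple pacing loop.  This is only a sketch of one way to do it in Python, not the driver used in this test; `run_at_constant_rate` and its `send` callback are hypothetical names.

```python
import time


def run_at_constant_rate(queries, qps, send, clock=time.monotonic, sleep=time.sleep):
    """Call send(query) for each query, pacing calls to roughly qps per second."""
    interval = 1.0 / qps
    start = clock()
    for i, query in enumerate(queries):
        # Schedule against the start time so that slow responses do not
        # accumulate drift; if we are already late, send immediately.
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        send(query)
```

Here `send` would issue one HTTP GET per query URL.  Because the loop is single-threaded, the achieved rate falls below `qps` whenever a query takes longer than `1/qps` seconds.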

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents, but unlike _Test 1_ we now do a commit after each post.  No optimize is being done.

The test has now been running for 15 minutes and I'll let it run for hours.  I'll post details later. :)
                
      was (Author: cm):
    h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia while searching at a constant rate of 10 QPS.  The queries in this test are identical to those run above.

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents, but unlike _Test 1_ we now do a commit after each post.  No optimize is being done.

The test has now been running for 15 minutes and I'll let it run for hours.  I'll post details later. :)
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

h3. Test 1: Indexing Japanese Wikipedia

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total, and this is a mix of short and very long documents.

These have been converted to files in Solr XML format, with 1,000 documents per file.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine.  There wasn't even a full GC, likely due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240126#comment-13240126 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

h3. Summary

Without spending too much time interpreting details of this little test, I think Kuromoji looks stable and ready for release.

I also think it's very nice that Solr 3.6 can index Japanese Wikipedia (~1.4 million docs) continuously while serving unique user queries at 10 QPS on a laptop using only 256MB of heap space.

Anyone interested, please feel free to add your comments and interpretations of the results.  Thanks!
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png, long-query-indexing-gc.log, long-search-indexing-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239577#comment-13239577 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:37 PM:
---------------------------------------------------------------

h5. Test setup

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to test stability and not to provide accurate performance numbers, although I also hope to do that.)

My Java version is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.  The default search field is body.

                
      was (Author: cm):
    h5. Test setup

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to test stability and not to provide accurate performance numbers, although I also hope to do that.)

My Java version is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.

                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Updated] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3282:
---------------------------------

    Description: 
Kuromoji might be used by many and also in mission-critical systems.  I'd like to run a stability test before we freeze 3.6.

My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:

# Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
# Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on

While Solr is indexing and searching, I'd like to verify that:

* Indexing and queries are working as expected
* Memory and heap usage look stable over time
* Garbage collection is overall low over time -- no Full-GC issues

I'll post findings and results to this JIRA.

  was:
Kuromoji might be used by many and also in mission-critical systems.  I'd like to run a stability test before we freeze 3.6.

My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:

# Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
# Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on

While Solr is indexing and searching, I'd like to verify that:

* Indexing and queries are working as expected
* Memory and heap usage look stable over time
* Garbage collection is overall low over time -- no Full-GC issues

I'll post findings to this JIRA as I get things going.

    
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239659#comment-13239659 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

h5. Test 3 - Searching with highlighting (no indexing)

The test is similar to _Test 2_ with highlighting turned on, but only ~62,000 queries were run.  No indexing was done.

Solr was run as follows

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

and - again - notice a small heap size and regular GC options.

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84&hl=on&hl.fl=body
{noformat}

which is

{noformat}
/solr/select/?q=無料占い&hl=on&hl.fl=body
{noformat}

in unquoted form.
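
For reference, the percent-encoded query string above can be reproduced with a few lines of Python. This is only a sketch: {{build_query_path}} is a hypothetical helper (it is not part of Solr or of the test harness used here), and it simply assumes the same {{/solr/select/}} path and {{hl}}/{{hl.fl}} parameters shown above.

```python
from urllib.parse import quote, unquote


def build_query_path(query, highlight_field=None):
    """Return a /solr/select/ path with the query percent-encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    path = "/solr/select/?q=" + quote(query, safe="")
    if highlight_field is not None:
        # Same highlighting parameters as in the queries above.
        path += "&hl=on&hl.fl=" + quote(highlight_field, safe="")
    return path


encoded = build_query_path("無料占い", highlight_field="body")
decoded = unquote(encoded)  # back to the readable form

print(encoded)
print(decoded)
```

Leaving out {{highlight_field}} gives the non-highlighting form of the query.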

We have turned on highlighting and we are highlighting on the body field.

The test completed in 1648.1 seconds; 63,200 queries were run, and the sustainable query rate was 47 QPS.

Turning on highlighting has a fairly significant performance penalty if we compare QPS to the non-highlighting case where we could sustain 142 QPS.
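
As a rough cross-check of the comparison, the small sketch below works through the arithmetic using only the figures quoted in the _Test 2_ and _Test 3_ comments. The mean throughput over a whole run (total queries divided by elapsed time) is computed separately; it need not equal the sustained QPS figures, since the comments do not say how those were measured. The function name is made up for illustration.

```python
# Figures quoted in the Test 2 (no highlighting) and Test 3 (highlighting) comments.
NO_HL_QUERIES, NO_HL_SECONDS, NO_HL_SUSTAINED_QPS = 250_000, 1838.5, 142
HL_QUERIES, HL_SECONDS, HL_SUSTAINED_QPS = 63_200, 1648.1, 47


def mean_qps(queries, seconds):
    """Mean throughput over the whole run: total queries / elapsed seconds."""
    return queries / seconds


# Ratio of the reported sustained rates: highlighting vs. no highlighting.
sustained_ratio = HL_SUSTAINED_QPS / NO_HL_SUSTAINED_QPS

print("mean QPS without highlighting:", round(mean_qps(NO_HL_QUERIES, NO_HL_SECONDS), 1))
print("mean QPS with highlighting:   ", round(mean_qps(HL_QUERIES, HL_SECONDS), 1))
print("sustained QPS ratio:          ", round(sustained_ratio, 2))
```

By the reported sustained rates, the highlighting run handles roughly a third of the query rate of the non-highlighting run.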

There is also increased memory pressure with highlighting turned on.  There were 652 Full GC events in total in the period, and the longest Full GC times are given below.

|| Longest Full GC times (seconds) ||
|0.9769069|
|0.8564934|
|0.7585956|
|0.7084318|
|0.6928327|
|0.6781336|
|0.6358398|
|0.6099899|
|0.5628532|
|0.5540237|
|0.5443075|
|0.5429399|
|0.5423989|
|...|

The extra memory pressure can also be seen in the VisualVM screenshot.

|| Attachment || Description ||
| 62k-queries-highlight-gc.log|  GC log |
| 62k-queries-highlight-visualvm.png|  Screenshot from VisualVM |
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
> # Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:26 PM:
---------------------------------------------------------------

h5. Test 2: Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index, without highlighting and using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so - small/normal heap size to keep memory pressure a bit high and no fancy GC options -- and all of Wikipedia searchable (!)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.

Running the 250,000 queries took 1838.5 seconds and the test was roughly able to keep 80% of its queries within 0.5 second latency and serve a sustained load of 142 QPS.

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.
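
A table like the one above can be pulled out of a {{-verbose:gc}} log with a short script. The sketch below is an assumption-laden illustration, not the method used for this test: it assumes each Full GC entry appears on one line in the form {{[Full GC 57558K->36262K(126912K), 0.2926001 secs]}}, and the sample lines are rebuilt from two rows of the table rather than copied from the attached log.

```python
import re

# Assumed line shape: [Full GC <before>K-><after>K(<total>K), <seconds> secs]
FULL_GC = re.compile(r"\[Full GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def full_gc_events(lines):
    """Return (before_kb, after_kb, total_kb, seconds) for each Full GC line."""
    events = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:  # lines that are not Full GC entries are skipped
            before, after, total = (int(g) for g in match.groups()[:3])
            events.append((before, after, total, float(match.group(4))))
    return events


# Two rows from the table above, written in the assumed log format.
sample = [
    "[Full GC 57558K->36262K(126912K), 0.2926001 secs]",
    "[Full GC 118382K->40442K(126912K), 0.5224920 secs]",
]
events = full_gc_events(sample)
longest = max(seconds for _, _, _, seconds in events)

print(len(events), "Full GC events, longest", longest, "seconds")
```

Sorting the events by their last element in descending order would give a "longest Full GC times" list.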

|| Filename || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                


[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:25 PM:
---------------------------------------------------------------

h5. Test 2: Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index, without highlighting and using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so - small/normal heap size and no fancy GC options (and all of Wikipedia searchable)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.

Running the 250,000 queries took 1838.5 seconds and the test was roughly able to keep 80% of its queries within 0.5 second latency and serve a sustained load of 142 QPS.

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.

|| Filename || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                


[jira] [Commented] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239714#comment-13239714 ] 

Christian Moen commented on SOLR-3282:
--------------------------------------

h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia while simultaneously searching at a constant rate of 10 QPS.  The queries in this test are identical to those run above.

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.
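
The constant query rate could be paced along the following lines. This is a minimal sketch and not the driver actually used for the test: {{run_at_constant_rate}} and its parameters are hypothetical names, and {{send}} stands in for whatever issues the HTTP request. Each query is scheduled against the start time rather than against the previous query, so a slow request does not push all later ones back.

```python
import time


def run_at_constant_rate(queries, qps, send, clock=time.monotonic, sleep=time.sleep):
    """Call send(query) for each query, with start times spaced 1/qps seconds apart.

    clock and sleep are injectable so the pacing can be tested without waiting.
    """
    interval = 1.0 / qps
    start = clock()
    sent = 0
    for i, query in enumerate(queries):
        # The i-th query is due at start + i * interval.
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        send(query)
        sent += 1
    return sent


# Tiny demonstration: "send" just records the query instead of calling Solr.
issued = []
sent = run_at_constant_rate(["無料占い"] * 3, qps=1000, send=issued.append)
print(sent, "queries issued")
```

For the test described here, {{qps}} would be 10 and {{send}} would request the {{/solr/select/}} URL for each query.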

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents; unlike in _Test 1_, we now do a commit after each post.  No optimize is being done.
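
To illustrate a post followed by a commit, the sketch below builds (but does not send) an update request with {{commit=true}} on the URL, which is one way to ask Solr to commit after processing a post. Whether the test commits this way or with a separate {{<commit/>}} message is not stated here, so treat this as an assumption. The base URL, the helper name and the placeholder document are likewise assumptions for illustration.

```python
from urllib.request import Request


def build_update_request(xml_bytes, base_url="http://localhost:8983/solr", commit=True):
    """Build a POST to Solr's XML update handler, optionally committing afterwards.

    Hypothetical helper; base_url is an assumed local default.
    """
    url = base_url + "/update"
    if commit:
        url += "?commit=true"
    return Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Placeholder payload; the real files hold 1,000 documents each.
payload = '<add><doc><field name="title">...</field><field name="body">...</field></doc></add>'
request = build_update_request(payload.encode("utf-8"))

print(request.get_method(), request.full_url)
```

Passing the request to {{urllib.request.urlopen}} would actually send it.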

The test has now been running for 15 minutes and I'll let it run for hours.  I'll post details later. :)
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
> # Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:23 PM:
---------------------------------------------------------------

h3. Test 2 - Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index, without highlighting and using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so - small/normal heap size to keep memory pressure a bit high and no fancy GC options -- and all of Wikipedia searchable.  Very nice :)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.

Running the 250,000 queries took 1838.5 seconds and the test was roughly able to keep 80% of its queries within 0.5 second latency and serve a sustained load of 142 QPS.

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.

|| Attachment || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                


[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239577#comment-13239577 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 4:59 PM:
---------------------------------------------------------------

h5. Test setup

My setup is a MacBook Pro running Mac OS X Lion (10.7) with 8GB memory, a Core i7 CPU (4 cores), a 500GB SSD and too many things running.  (The purpose of the test is to check stability rather than to provide accurate performance numbers, although I hope to provide those as well.)

My Java version is as follows:

{noformat}
[cm@ayu:~] java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)
{noformat}

I've added fields body and title to {{schema.xml}} and they're using the default Japanese configuration in {{text_ja}}.  The default search field is body.

                


[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239659#comment-13239659 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:23 PM:
---------------------------------------------------------------

h5. Test 3 - Searching with highlighting (no indexing)

The test is similar to _Test 2_ with highlighting turned on, but only ~62,000 queries were run.  No indexing was done.

Solr was run as follows

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

and - again - notice a small heap size and regular GC options.

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84&hl=on&hl.fl=body
{noformat}

which is

{noformat}
/solr/select/?q=無料占い&hl=on&hl.fl=body
{noformat}

in unquoted form.

We have turned on highlighting and we are highlighting on the body field.

The test completed in 1648.1 seconds; 63,200 queries were run, and the sustainable query rate was 47 QPS.

Turning on highlighting has a fairly significant performance penalty if we compare QPS to the non-highlighting case where we could sustain 142 QPS.

There is also increased memory pressure with highlighting turned on.  There were 652 Full GC events in total in the period, and the longest Full GC times are given below.

|| Longest Full GC times (seconds) ||
|0.9769069|
|0.8564934|
|0.7585956|
|0.7084318|
|0.6928327|
|0.6781336|
|0.6358398|
|0.6099899|
|0.5628532|
|0.5540237|
|0.5443075|
|0.5429399|
|0.5423989|
|...|

The extra memory pressure can also be seen in the VisualVM screenshot.  I believe the root cause of this is the highlighting.

|| Attachment || Description ||
| 62k-queries-highlight-gc.log|  GC log |
| 62k-queries-highlight-visualvm.png|  Screenshot from VisualVM |
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:21 PM:
---------------------------------------------------------------

h3. Test 2 - Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index without highlighting, using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so - small/normal heap size to keep memory pressure a bit high and no fancy GC options -- and all of Wikipedia searchable.  Very nice :)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.
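
For anyone reproducing this, the escaped form can be generated rather than typed by hand.  The following is only a sketch: {{build_query_url}} is a made-up helper name, not part of Solr or of the harness used for these tests, and the optional {{hl}} / {{hl.fl}} parameters are the ones used in the highlighting test.

```python
from urllib.parse import quote, unquote


def build_query_url(query, highlight_field=None):
    """Return a /solr/select/ path with the query percent-encoded as UTF-8.

    build_query_url is a hypothetical helper for illustration only.
    """
    url = "/solr/select/?q=" + quote(query, safe="")
    if highlight_field is not None:
        # Same parameters as in the highlighting test: hl=on&hl.fl=<field>
        url += "&hl=on&hl.fl=" + quote(highlight_field, safe="")
    return url


if __name__ == "__main__":
    print(build_query_url("無料占い"))
    print(build_query_url("無料占い", highlight_field="body"))
    # Going the other way recovers the unquoted form.
    print(unquote("%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84"))
```

{{quote}} encodes the string as UTF-8 by default, which matches the {{-Dfile.encoding=UTF-8}} setup used here.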

Running the 250,000 queries took 1838.5 seconds.  Roughly 80% of the queries completed within 0.5 seconds, and the test served a sustained load of 142 QPS.
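
To make the two figures concrete, here is a minimal sketch of how a query rate and the share of queries under a latency threshold could be measured.  It is not the tool used for this test: {{run_queries}} and the injected {{fetch}} callable are assumptions for illustration, and the loop is sequential to stay short, whereas a real load test would usually issue requests from several clients at once.

```python
import time


def run_queries(urls, fetch, threshold=0.5):
    """Issue requests one by one and summarise throughput and latency.

    `fetch` is any callable taking a URL and performing the request
    (hypothetical; plug in urllib, an HTTP client, or a stub).
    """
    latencies = []
    start = time.perf_counter()
    for url in urls:
        t0 = time.perf_counter()
        fetch(url)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    within = sum(1 for t in latencies if t <= threshold)
    return {
        "queries": len(latencies),
        "elapsed": elapsed,
        # Completed queries divided by wall-clock time.
        "qps": len(latencies) / elapsed if elapsed > 0 else 0.0,
        # Share of queries that finished within `threshold` seconds.
        "within_threshold": within / len(latencies) if latencies else 0.0,
    }


if __name__ == "__main__":
    # A stub fetch so the sketch runs without a Solr instance.
    print(run_queries(["/solr/select/?q=a", "/solr/select/?q=b"], lambda url: None))
```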

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.
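
Since GCViewer is not an option for this log, figures like these can be pulled out with a short script.  This is a sketch under an assumption: it expects {{-verbose:gc}} lines shaped like {{[Full GC 57558K->36262K(126912K), 0.2926001 secs]}}, and other JVM versions or GC flags print different formats.  {{longest_pauses}} is a made-up name.

```python
import re

# Assumed line shape for -verbose:gc output, for example:
#   [Full GC 57558K->36262K(126912K), 0.2926001 secs]
#   [GC 40000K->36000K(126912K), 0.0731031 secs]
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC) (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\), (?P<secs>\d+\.\d+) secs\]"
)


def longest_pauses(lines, full=True, top=5):
    """Return the `top` longest pause times in seconds, longest first.

    full=True selects Full GC entries, full=False the regular GC entries.
    Lines that do not match the assumed format are skipped.
    """
    wanted = "Full GC" if full else "GC"
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match and match.group("kind") == wanted:
            pauses.append(float(match.group("secs")))
    return sorted(pauses, reverse=True)[:top]


if __name__ == "__main__":
    # Two entries from the table above, written in the assumed log format.
    sample = [
        "[Full GC 57558K->36262K(126912K), 0.2926001 secs]",
        "[Full GC 118382K->40442K(126912K), 0.5224920 secs]",
    ]
    print(longest_pauses(sample))
```

Pointing it at the attached log would be a matter of passing {{open("250k-queries-no-highlight-gc.log")}} as {{lines}}, provided the log really uses this format.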

|| Attachment || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                
      was (Author: cm):
    h5. Test 2: Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against the index without highlighting, using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so - small/normal heap size to keep memory pressure a bit high and no fancy GC options -- and all of Wikipedia searchable.  Very nice :)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain unquoted form.

Running the 250,000 queries took 1838.5 seconds.  Roughly 80% of the queries completed within 0.5 seconds, and the test served a sustained load of 142 QPS.

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |

The regular GC entries took a maximum of 0.0731031 seconds, but most took half that or less.

|| Filename || Description ||
| 250k-queries-no-highlight-gc.log | GC log |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing the 250k-queries-no-highlight-gc.log so I'm not attaching a screenshot for this.
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239714#comment-13239714 ] 

Christian Moen edited comment on SOLR-3282 at 3/28/12 2:37 AM:
---------------------------------------------------------------

h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia and searching at the same time.

The search rate is a constant 10 QPS.  The queries in this test are the same as those run in the tests above, and each query is unique.
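
One simple way to hold a fixed rate is to compute each query's send time up front and sleep until it is due.  The sketch below shows the idea and is not the client used for this test: {{send_times}}, {{run_paced}} and the injected {{fetch}}, {{clock}} and {{sleep}} callables are assumptions for illustration.

```python
import time


def send_times(count, qps, start=0.0):
    """Return the offsets in seconds at which each of `count` queries is due."""
    interval = 1.0 / qps
    return [start + i * interval for i in range(count)]


def run_paced(urls, fetch, qps=10.0, clock=time.monotonic, sleep=time.sleep):
    """Call fetch(url) for each URL, holding the send rate at `qps`.

    Scheduling against the start time (rather than sleeping a fixed
    interval after each request) keeps slow responses from dragging
    the overall rate down, as long as each one finishes within the interval.
    """
    begin = clock()
    for url, due in zip(urls, send_times(len(urls), qps)):
        delay = begin + due - clock()
        if delay > 0:
            sleep(delay)
        fetch(url)


if __name__ == "__main__":
    # Stub fetch so the sketch runs without a Solr instance.
    run_paced(["/solr/select/?q=a", "/solr/select/?q=b"], print, qps=10.0)
```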

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents.  Unlike _Test 1_, we now commit after each post.  No optimize is done.

The test had been running for 8 hours and 33 minutes before I stopped it and 312,900 queries were run.  Japanese Wikipedia was indexed 23 times.

Full GC occurred 84 times and the maximum heap-size provided to the VM was allocated.  The longest Full GC times are given below.

|| Longest Full GC (seconds) ||
|1.0789668|
|1.0518156|
|1.0288781|
|0.9973905|
|0.9799409|
|0.9582144|
|0.9555027|
|0.9517524|
|0.9456611|
|0.9387380|
|0.9313493|
|0.9117388|
|0.8771426|
|...|


The longest regular (non-Full) GC times are below.

|| Longest non-Full GC (seconds) ||
|0.1375324|
|0.1206866|
|0.1009028|
|0.0952712|
|0.0928364|
|...|

The VisualVM screenshot suggests that the VM is nice and stable.  It might be good to provide a little more maximum heap-space than 256MB to index all of Japanese Wikipedia and serve 10 QPS to have a little more headroom, but 256MB seems quite fine.

|| Attachment || Description ||
| long-query-indexing-gc.log | GC log |
| long-search-indexing-visualvm.png | VisualVM screenshot |



                
      was (Author: cm):
    h3. Test 4 - Combined search and indexing test

In this test, we are indexing all of Wikipedia and searching at the same time.

The search rate is a constant 10 QPS.  The queries in this test are identical to those run above and they are also unique.

Solr is started using

{noformat}
java -verbose:gc -Xmx256m  -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so I've given it a little more heap because of the memory pressure issue seen in _Test 3_.

The indexing posts the XML files described in _Test 1_, each containing 1,000 documents.  Unlike _Test 1_, we now commit after each post.  No optimize is done.

The test has now been running for 15 minutes and I'll let it run for hours.  I'll post details later. :)
                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png, long-query-indexing-gc.log, long-search-indexing-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.



[jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze

Posted by "Christian Moen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ] 

Christian Moen edited comment on SOLR-3282 at 3/27/12 3:59 PM:
---------------------------------------------------------------

h5. Test 1: Indexing Japanese Wikipedia

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total and this is a mix of short and very long documents.

These have been converted to files in Solr XML format with 1,000 documents per file.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}
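
The same post-then-commit sequence can be scripted.  The sketch below is an illustration, not the script used here.  Its assumptions: the files match {{solrxml/SolrXml-*.xml}} (inferred from the single file name above), and the commit is sent as a {{<commit/>}} XML body rather than through {{stream.body}}.  {{update_request}} and {{post_all}} are made-up names.

```python
import glob
import urllib.request

# Same endpoint as in the curl commands above.
UPDATE_URL = "http://localhost:8983/solr/update"


def update_request(body, url=UPDATE_URL):
    """Build a POST request carrying Solr XML with the same Content-type header."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "text/xml; charset=UTF-8"},
        method="POST",
    )


def post_all(pattern="solrxml/SolrXml-*.xml", commit_each=False):
    """Post every matching file, then commit.

    The file pattern is an assumption.  With commit_each=True a commit is
    sent after every file instead of once at the end.
    """
    commit = b"<commit/>"
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as xml_file:
            urllib.request.urlopen(update_request(xml_file.read())).close()
        if commit_each:
            urllib.request.urlopen(update_request(commit)).close()
    if not commit_each:
        urllib.request.urlopen(update_request(commit)).close()


if __name__ == "__main__":
    post_all()  # needs a Solr instance listening on localhost:8983
```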

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine.  There wasn't even a Full GC, probably due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                
      was (Author: cm):
    h3. Test 1: Indexing Japanese Wikipedia

I've extracted text pretty accurately from Japanese Wikipedia and removed all the gory markup so the content is clean.  There are 1,443,764 documents in total and this is a mix of short and very long documents.

These have been converted to files in Solr XML format with 1,000 documents per file.

I'm posting using 

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body= <commit />'
{noformat}

Posting the entire Wikipedia in one file would perhaps be a lot faster.

Posting took

{noformat}
real	18m39.206s
user	0m12.682s
sys	0m11.065s
{noformat}

The GC log looks fine.  There wasn't even a Full GC, probably due to the large heap size.

I'm attaching these files

|| Filename || Description ||
|jawiki-index-gc.log| GC log |
|jawiki-index-gcviewer.png| Screenshot from GCViewer |
|jawiki-index-visualvm.png| Screenshot from VisualVM | 

                  
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of Japanese Wikipedia documents (approx. 1.4M documents) in a never ending loop
> # Simultaneously run many tens of thousands typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.
