Posted to dev@lucene.apache.org by "P Eger (JIRA)" <ji...@apache.org> on 2009/04/13 20:51:15 UTC

[jira] Created: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Reduce usage of String.intern(), performance is terrible
--------------------------------------------------------

                 Key: LUCENE-1600
                 URL: https://issues.apache.org/jira/browse/LUCENE-1600
             Project: Lucene - Java
          Issue Type: Improvement
    Affects Versions: 2.4.1, 2.4
         Environment: Windows Server 2003 x64
                      Hotspot JDK 1.6.0_12 64-bit
            Reporter: P Eger
            Priority: Minor
         Attachments: intern.png, intern_perf.patch

I profiled a simple MatchAllDocsQuery() against ~1.5 million documents (8 fields of short text, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS), then retrieved all documents via searcher.doc(i, fs). String.intern() showed up as a top hotspot (see attached screenshot), so I implemented a small optimization: instead of calling intern() for every new Field(), the patch forces the intern in the FieldInfos class and adds an optional "internName" constructor to Field. This reduced execution time for searching and iterating through all documents by about 35%. Results were similar for -server and -client.


TRUNK (2.9) w/out patch: matched 1435563 in 8884 ms/search
TRUNK (2.9) w/patch: matched 1435563 in 5786 ms/search
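
The idea described above can be sketched roughly as follows. This is not the attached intern_perf.patch: SketchField and SketchFieldInfos are hypothetical stand-ins for Field and FieldInfos, and the boolean internName parameter is only assumed from the wording of the description. The point it illustrates is that the field name is interned once when the field metadata is registered, and the per-document path reuses that already-interned reference.

```java
import java.util.ArrayList;
import java.util.List;

public class InternOnceSketch {

    /** Hypothetical stand-in for a field; not Lucene's actual Field class. */
    static final class SketchField {
        final String name;
        final String value;

        /** Default constructor: interns the name, as general callers would expect. */
        SketchField(String name, String value) {
            this(name, value, true);
        }

        /** Optional constructor: pass internName=false when the name is already interned. */
        SketchField(String name, String value, boolean internName) {
            this.name = internName ? name.intern() : name;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for per-index field metadata; interns each name once. */
    static final class SketchFieldInfos {
        private final List<String> namesByNumber = new ArrayList<>();

        int add(String name) {
            namesByNumber.add(name.intern()); // intern once, when the metadata is registered
            return namesByNumber.size() - 1;
        }

        String fieldName(int number) {
            return namesByNumber.get(number);
        }
    }

    /** Builds a field for one stored value without calling intern() per document. */
    static SketchField readStoredField(SketchFieldInfos infos, int fieldNumber, String value) {
        return new SketchField(infos.fieldName(fieldNumber), value, false);
    }

    public static void main(String[] args) {
        SketchFieldInfos infos = new SketchFieldInfos();
        int titleNumber = infos.add(new String("title"));
        SketchField a = readStoredField(infos, titleNumber, "doc one");
        SketchField b = readStoredField(infos, titleNumber, "doc two");
        // Both fields share the single interned name, so identity comparison still works.
        System.out.println(a.name == b.name);
    }
}
```

Code that compares field names by identity keeps working in this sketch because the name handed to the non-interning constructor was already interned upstream; the saving is only in how often intern() is called.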

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698534#action_12698534 ] 

Michael McCandless commented on LUCENE-1600:
--------------------------------------------

Thanks for the fix, P.  I'll commit this for 2.9.


[jira] Commented: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699865#action_12699865 ] 

Uwe Schindler commented on LUCENE-1600:
---------------------------------------

In addition to Mike's fixes, there are more places in FieldsReader where intern() is used. The best approach would be to add the same ctor to AbstractField as well, and use it for LazyField and so on.
If I have time, I'll attach a patch similar to Mike's (as he is on holidays).


[jira] Commented: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Patrick Eger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699864#action_12699864 ] 

Patrick Eger commented on LUCENE-1600:
--------------------------------------

Hashmaps would also work, but then they need to be either synchronized or kept per-thread. The former would probably kill all your performance gains, and the latter would be annoying, I think. Moderate usage of String.intern() is fine in my view; my patch just takes it out of the hot path (for my use case at least). Other uses of String.intern() in the codebase may certainly have different solutions and tradeoffs.
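
To make the two hashmap options mentioned above concrete, here is a minimal sketch. It is not code from Lucene or from contrib/MemoryIndex; the class names SynchronizedCache and PerThreadCache and the method canonicalize are made up for illustration. The first variant shares one map behind a lock, the second keeps one map per thread.

```java
import java.util.HashMap;
import java.util.Map;

public class StringCacheSketch {

    /** Option 1: one shared map guarded by a lock; every lookup takes that lock. */
    static final class SynchronizedCache {
        private final Map<String, String> canonical = new HashMap<>();

        synchronized String canonicalize(String s) {
            String existing = canonical.get(s);
            if (existing != null) {
                return existing;
            }
            canonical.put(s, s);
            return s;
        }
    }

    /** Option 2: one map per thread; no locking, but each thread has its own instances. */
    static final class PerThreadCache {
        private final ThreadLocal<Map<String, String>> canonical =
                ThreadLocal.withInitial(HashMap::new);

        String canonicalize(String s) {
            String existing = canonical.get().putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    public static void main(String[] args) {
        SynchronizedCache shared = new SynchronizedCache();
        String a = shared.canonicalize(new String("title"));
        String b = shared.canonicalize(new String("title"));
        System.out.println(a == b); // second call returns the instance stored by the first

        PerThreadCache local = new PerThreadCache();
        String c = local.canonicalize(new String("title"));
        String d = local.canonicalize(new String("title"));
        System.out.println(c == d); // same thread, so the same per-thread map is consulted
    }
}
```

One plausible reading of "annoying" for the per-thread variant: two threads end up with different canonical instances of an equal string, so identity comparison (==) is only safe within a single thread.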


[jira] Commented: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "P Eger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698495#action_12698495 ] 

P Eger commented on LUCENE-1600:
--------------------------------

Note that there may be other opportunities to reduce interning; I fixed it only for my specific use case.


[jira] Resolved: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1600.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9


[jira] Commented: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699857#action_12699857 ] 

Jason Rutherglen commented on LUCENE-1600:
------------------------------------------

contrib/MemoryIndex has a bunch of notes about how interning is
slow, and that using (I believe) hashmaps of strings is better.
Any comments on this approach?


[jira] Assigned: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1600:
------------------------------------------

    Assignee: Michael McCandless


[jira] Updated: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "P Eger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

P Eger updated LUCENE-1600:
---------------------------

    Attachment: intern_perf.patch
                intern.png

Attaching profiler screenshot and patch.


[jira] Issue Comment Edited: (LUCENE-1600) Reduce usage of String.intern(), performance is terrible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699865#action_12699865 ] 

Uwe Schindler edited comment on LUCENE-1600 at 4/16/09 2:13 PM:
----------------------------------------------------------------

In addition to Mike's fixes, there are more places in FieldsReader where intern() is used. The best approach would be to add the same ctor to AbstractField as well, and use it for LazyField and so on.
If I have time, I'll attach a patch similar to Patrick's.

      was (Author: thetaphi):
    In addition to Mike's fixes, there are more places in FieldsReader where intern() is used. The best approach would be to add the same ctor to AbstractField as well, and use it for LazyField and so on.
If I have time, I'll attach a patch similar to Mike's (as he is on holidays).
  