Posted to dev@lucene.apache.org by "Mark Kristensson (JIRA)" <ji...@apache.org> on 2011/05/17 00:17:47 UTC

[jira] [Created] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names
----------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-3105
                 URL: https://issues.apache.org/jira/browse/LUCENE-3105
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
    Affects Versions: 3.1
            Reporter: Mark Kristensson


We have one index with several hundred thousand unique field names (we're optimistic that Lucene 4.0 is flexible enough to allow us to change our index design...) and found that closing an index writer and opening an index reader are horribly slow on that one index. I have isolated the problem to the calls to String.intern() that are used to allow for quick string comparisons of field names throughout Lucene. These String.intern() calls are unnecessary and can be replaced with a hashmap lookup. In fact, StringHelper.java has its own hashmap implementation that it uses in conjunction with String.intern(). Rather than using a one-off hashmap, I've elected to use a ConcurrentHashMap in this patch.
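
To illustrate the hashmap lookup idea described above, here is a minimal sketch of canonicalizing field names through a ConcurrentHashMap so that identity comparisons stay valid. It is not the contents of LUCENE-3105.patch; the class name FieldNameInterner and its methods are hypothetical and exist only for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not part of Lucene): keeps one canonical String
// instance per distinct field name without calling String.intern().
final class FieldNameInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for the given name, registering the
    // argument itself as canonical if the name has not been seen before.
    String intern(String name) {
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    // Number of distinct names seen so far; the map grows without bound.
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        FieldNameInterner interner = new FieldNameInterner();
        String a = interner.intern(new String("title"));
        String b = interner.intern(new String("title"));
        // Both calls hand back the same instance, so == can replace equals().
        System.out.println(a == b);
    }
}
```

Two caveats about this sketch: the map keeps every name it has ever seen, and the instances it returns are not guaranteed to be the ones String.intern() returns, so code that mixes both cannot rely on == between them.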

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034639#comment-13034639 ] 

Earwin Burrfoot commented on LUCENE-3105:
-----------------------------------------

StringInterner is in fact faster than CHM (ConcurrentHashMap), and it is compatible with String.intern(), i.e. it returns the same String instances. It also won't eat up memory if spammed with numerous unique strings (which is a strange feature, but people requested it).
In Lucene 4.0 all of this is moot anyway: fields there are strongly separated and intern() is not used.
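
The two properties mentioned above (returning the same instances as String.intern(), and not growing when fed many unique strings) can be sketched as a fixed-size cache placed in front of String.intern(). This is an assumption-laden illustration, not Lucene's actual StringInterner code; the class name BoundedInterner and its constructor parameter are hypothetical, and the sketch is meant for single-threaded use.

```java
// Hypothetical sketch (not Lucene's StringInterner): a fixed-size cache in
// front of String.intern(). A hit skips the intern() call; a miss delegates
// to it, so every returned value is the JVM-interned instance.
final class BoundedInterner {
    private final String[] slots;
    private final int mask;

    // size must be a positive power of two so that (hash & mask) is a valid index.
    BoundedInterner(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        this.slots = new String[size];
        this.mask = size - 1;
    }

    String intern(String s) {
        int slot = s.hashCode() & mask;
        String cached = slots[slot];
        if (cached != null && cached.equals(s)) {
            return cached; // cache hit: already the interned instance
        }
        String canonical = s.intern();
        // Overwrite whatever was in the slot, so the cache itself never
        // holds more than slots.length references.
        slots[slot] = canonical;
        return canonical;
    }
}
```

Because a colliding entry is simply overwritten, a flood of unique strings only causes cache misses, not growth of the cache array.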



[jira] [Updated] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

Posted by "Mark Kristensson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Kristensson updated LUCENE-3105:
-------------------------------------

    Attachment: LUCENE-3105.patch

Patch file to eliminate String.intern() calls when opening IndexReaders and closing IndexWriters.



[jira] [Resolved] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3105.
---------------------------------

    Resolution: Duplicate

Fixed as of LUCENE-2548.



[jira] [Commented] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034640#comment-13034640 ] 

Earwin Burrfoot commented on LUCENE-3105:
-----------------------------------------

Hmm.. Ok, it *is* still used, but that's gonna be fixed, mm?



[jira] [Commented] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034723#comment-13034723 ] 

Uwe Schindler commented on LUCENE-3105:
---------------------------------------

Yes, it's gonna be fixed; see linked issue LUCENE-2548. The biggest problem is Solr at the moment. The other things are minor identity vs. equals comparisons in FieldCache.
