You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2012/04/30 08:26:49 UTC

[jira] [Created] (SOLR-3424) PhoneticFilterFactory threadsafety bug

David Smiley created SOLR-3424:
----------------------------------

             Summary: PhoneticFilterFactory threadsafety bug
                 Key: SOLR-3424
                 URL: https://issues.apache.org/jira/browse/SOLR-3424
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis
    Affects Versions: 3.6, 4.0
            Reporter: David Smiley
            Assignee: David Smiley
            Priority: Minor
             Fix For: 4.0


PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.

I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

My new patch creates & initializes the encoder per call to create(). The testSpeed() tests seem about the same. I also added a test for the maxCodeLen arg, and i documented it too.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Here is an updated patch. I enhanced the javadocs too, which have become out of date.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267264#comment-13267264 ] 

David Smiley commented on SOLR-3424:
------------------------------------

So it appears the Commons-Codec Encoders are all thread-safe after all:
http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265862#comment-13265862 ] 

David Smiley commented on SOLR-3424:
------------------------------------

Uwe: Excellent catch RE my "static" bug, and I agree on your improvement RE clazz & null.

I tried to ascertain the thread-safety status of Commons-Codec and unfortunately I don't think we can assume it's thread-safe.  I sent an email to their list about this just now: http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html

So I agree that we'll have to insatiate one of these in our create() method instead of caching it.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268053#comment-13268053 ] 

David Smiley commented on SOLR-3424:
------------------------------------

Looks good Uwe.
One minor nit is that the URL in the class Javadoc to Commons-Codec's Language package should be this URL:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/package-summary.html   which is the one to 1.6; the existing link is older with fewer classes.  We've got this link in both the FilterFactory & Filter.

Feel free to commit if you want or just leave it to me.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267272#comment-13267272 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also improved the reflective method call (which has the side effect of being able to print a useful message when the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user provides and other classes implementing the abstract Encoder interface may not be thread safe.

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Closed] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed SOLR-3424.
------------------------------

    Resolution: Fixed

Committed to trunk in 1334544.

I left the commons-coded javadoc url as-is since it was finally updated to the latest version.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

The attached patch fixes the problem by basically "upgrading" the code to use Guava's MapMaker which is excellent for caches. I keep 2 distinct registries, the constant builtin one, and the custom class one, because there is a distinction between the two which is the capitalization of the keys.  I couldn't have just one MapMaker map because the key lookup of the computed value should be the original class name uncapitalized.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264730#comment-13264730 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

The whole design of this cache is wrong:
- it should not be static, the resolved class lookups should only be cached per instance
- the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName() lookups, so why cache it again?
- To fix the orginal issue, a ConcurrentHashMap would also be fine
- Solr/Lucene use Analyzers which are now enforced to reuse tokenstreams, so the lookup is only done once when IndexWriter starts and a new indexing thread is created

The last point tells me: "remove this cache"
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267229#comment-13267229 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

+1

I wanted to review the stock Lucene Analyzer about thread safety, too.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267272#comment-13267272 ] 

Uwe Schindler edited comment on SOLR-3424 at 5/3/12 7:27 AM:
-------------------------------------------------------------

Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also improved the reflective method call (which has the side effect of being able to print a useful message when the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user provides and other classes implementing the abstract Encoder interface may not be thread safe (unless this is explicitely specified in their spec / javadocs - which isn't)

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?
                
      was (Author: thetaphi):
    Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also improved the reflective method call (which has the side effect of being able to print a useful message when the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user provides and other classes implementing the abstract Encoder interface may not be thread safe.

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?
                  
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Updated patch, I added a testcase for the reflection without package name:
- I use "Caverphone2" as package less name, which is not in the REGISTRY. The returned class should be Caverphone2
- I cross-check with the REGISTRY name "Caverphone", which should also return the new Caverphone2 encoder, but without reflection
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266617#comment-13266617 ] 

David Smiley commented on SOLR-3424:
------------------------------------

I'll commit it in ~24 hours.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265696#comment-13265696 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,

looks good, but there is a bug:
The refactoring of the REGISTRY static map was an anonymous inner class before (new HashMap() {{ hashmap ctor code }}; (please note double parens). Now it is assigned to a static field, but the initializer code is an inline ctor of the factory, means the initialization runs on every instantiation.

- add *static* before the anonymous ctor:
{code:java}
private static final Map<String, Class<? extends Encoder>> registry = new HashMap<String, Class<? extends Encoder>>(6);
static {
 registry.put(...
{code}

- or leave the initializer as it was before (anonymous HashMap subclass with overridden ctor).

in resolve encoder maybe remove the clazz variable and directly return the result of forName(). I do't like non-final variables initialized with null because that prevents compilation failures on missing initialization and null could be returned - especially with lots of try-catch blocks.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Attached is a patch that fixes the initialization bug and improves the reflection code.

*One thing that came into my mind when I looked at the code of PhoneticFilter:* I did not find out if Encoders are threadsafe or not. If they aren't or we are not sure (the documentation states nothing), we should create the Encoder class instance in the TokenStream create() method. TokenStreams itsself are only used per thread (iterator pattern), and the Analyzer reuse ensures that we have a separate instance for each thread.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264986#comment-13264986 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Thanks! Thaks for taking care, as I have no time to provide a patch at the moment :(

bq. but I will since it seems to be a big deal to you (three exclamation points)

:-) It's not so important, if it's private to the class! I just don't want public code to expose Collections not intended to be modified in a modifiable way, so i am fine with or without unmodifiable. There are also places in Lucene violating this (or use public arrays, which cannot be protected - like CompositeReader.getSequentialSubReaders()).
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264976#comment-13264976 ] 

David Smiley commented on SOLR-3424:
------------------------------------

Rob: This is a (minor) thread-safety bug and consequently it's not really feasible to test it without a lot of work and without a guarantee at the end that there is no problem, and so it isn't worth it.  Of course if you want to; go for it ;-)

Uwe: Thanks very much for your comments.  I wasn't sure what the lifecycle of this class was or if/how it is shared; I looked at the javadocs of TokenFilterFactory just now and I think its clearer.  Given that the Factory's init() method is not going to be called frequently, it seems to me that the class name based lookups need not cache at all.  And I also like your suggestion of improving the fallback lookup by looking for a '.'.  BTW I considered wrapping with unmodifiableMap but didn't bother because it's not exposed outside of this class, which is fairly small too, but I will since it seems to be a big deal to you (three exclamation points).  I'll propose another patch later

~ David
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269064#comment-13269064 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
just commit it, I am about to leave for business trips (including Lucene Rev) so have not much time.
Uwe
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264688#comment-13264688 ] 

David Smiley commented on SOLR-3424:
------------------------------------

And I also noticed that Commons-Codec's Caverphone.class was deprecated for Caverphone2.class so I made that simple change.  The docs say the deprecated one simply forwards calls, so there should be no side-effects of this change.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264720#comment-13264720 ] 

Robert Muir commented on SOLR-3424:
-----------------------------------

where is the test for the bug?

why does this factory have a cache at all?! 

                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267233#comment-13267233 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

There seems to be no Analyzer in Lucene that uses this class. The phonetic package only contains the plain TokenFilters and those are single-thread only.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264741#comment-13264741 ] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

One thing to add:

bq. the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName() lookups, so why cache it again?

You may argue, that the lookup may be more expensive, as it uses a try..catch block (first try in expected commons package, later try a full class name). I think the chain of Try...Catch with ClassNoFound should be changed to do it more smart: If the name of encoder contains a period (indexOf('.')>=0), look it up as class name, otherwise prefix it with the commons package name. This way, the JVM cache for loaded classes can be used and the cache is completely useless.


I like in your fix, that it also changed the broken uppercasing: It should only do that for the builtins, class names itsself are case sensitive.

+1 to remove the cache and only keep the static builtins (and please: as Collections.unmodifiableMap!!!), lookup non-builtins without try...catch(ClassNotFound). The error message should only list the builtins and mention that otherwise the name should be a full class name.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Hi, I looked again at the code and found the following improvement, attached as new patch:

The reflection based-method lookup should be done in the initialization. The pointer to the method is then saved in a instance field, so the getEncoder() method only needs to invoke. This removes more than half of the reflection cost.
The pointer to the method is null, if no maxLength is set.
I also did some minor cleanups in Exception handling.
                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name).  However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org