You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2010/12/16 01:47:01 UTC

[jira] Created: (LUCENE-2815) MultiFields not thread safe

MultiFields not thread safe
---------------------------

                 Key: LUCENE-2815
                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
             Project: Lucene - Java
          Issue Type: Bug
    Affects Versions: 4.0
            Reporter: Yonik Seeley


MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2815) MultiFields not thread safe

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2815:
---------------------------------------

    Fix Version/s: 4.0

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved LUCENE-2815.
----------------------------------

    Resolution: Fixed

committed.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2815.patch, LUCENE-2815.patch
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2815:
---------------------------------

    Attachment: LUCENE-2815.patch

Here's an updated patch that avoids containsKey() followed by get() (just an optimization)
and avoids caching null Terms instances.  This is the right thing to do anyway, since one could easily blow up the cache with fields that don't exist.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2815.patch, LUCENE-2815.patch
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972505#action_12972505 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

Hmmm, this patch causes test failures because ConcurrentHashMap doesn't handle null values.


> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2815.patch
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972281#action_12972281 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

I was going to fix InstantiatedIndex, but while I was in there, I saw a lot of non-threadsafe code.  I think that really deserves it's own issue.
What range of docs is InstantiatedIndex faster for, and is it something we want to continue to maintain?

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2815:
---------------------------------

    Attachment: LUCENE-2815.patch

Here's a patch that uses a ConcurrentHashMap for the Terms cache, and makes IndexReader.fields volatile.

That IndexReader.fields variable is just the type of stuff that could just be stored in a generic cache on the IndexReader, if/when we get something like that.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2815.patch
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972057#action_12972057 ] 

Michael McCandless commented on LUCENE-2815:
--------------------------------------------

Ugh, nice finds Yonik!  We should fix these.

Maybe MultiFields should just pre-build its Map<String,Term> on init?

You're right, we do reuse MultiFields today (we stuff the instance of MultiFields onto the IndexReader with IndexReader.store/retrieveFields), but I wonder whether we really should?  (In fact I thought at one point we decided to stop doing that... yet, we still are... can't remember the details; maybe perf hit was too high eg for MTQs/Solr facets/etc.).

What do we need to do to make the publication safe?  Is making IR.store/retrieveFields sync'd sufficient?

Aside: Java concurrency is a *mess*.  I understand why JMM is needed, to get good perf on modern CPUs, but allowing the low level CPU cache coherency requirements to bubble all the way up to complex requirements in the language itself, is a disaster.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972096#action_12972096 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

bq. but I wonder whether we really should? (In fact I thought at one point we decided to stop doing that... yet, we still are... can't remember the details; maybe perf hit was too high eg for MTQs/Solr facets/etc.).

It wouldn't be solr facets... that code asks for fields() once up front (per facet request) and the rest of the work will dwarf that.
I think there probably are a lot of random places that use it where the overhead could be significant.  For example IndexReader.deleteDocuments(), ParallelReader, FuzzyLikeThisQuery, and anyone else that uses any of the static methods on Field on a non-segment reader.

bq. What do we need to do to make the publication safe? Is making IR.store/retrieveFields sync'd sufficient?

More than sufficient.  A volatile would also work fine provided that a race shouldn't matter (i.e. more than one MultiFields object could be constructed).

bq. Maybe MultiFields should just pre-build its Map<String,Term> on init?

Ouch... those folks with 1000s of fields wouldn't be happy about that.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971930#action_12971930 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

It looks like MultiReaderBits also has issues with safe object publication.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971909#action_12971909 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

So I started looking into MultiFields more closely, just from a performance perspective, and noticed a couple of thread safety issues:
- MultiFields is reused (a good thing), but isn't safely published to other threads.
- MultiFields has a HashMap that is used and modified unsynchronized to cache MultiTerms instances

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972258#action_12972258 ] 

Yonik Seeley commented on LUCENE-2815:
--------------------------------------

bq. It looks like MultiReaderBits also has issues with safe object publication. 

Actually, it looks like this one is OK with most of our current code.
SegmentReader.getDeletedDocs() returns an object stored in a volatile, so that counts as a safe publish.  Other implementations seem to either throw an exception or directly call a segment reader.  One exception is instantiated index (I think).

We can't call getDeletedDocs() just once up-front, because an IndexReader may still be used to delete documents.

> MultiFields not thread safe
> ---------------------------
>
>                 Key: LUCENE-2815
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2815
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>
> MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org