Posted to issues@lucene.apache.org by "Ivan Provalov (Jira)" <ji...@apache.org> on 2020/02/28 17:26:00 UTC

[jira] [Created] (SOLR-14293) Payloads Are Written or Read Incorrectly - Across the Documents

Ivan Provalov created SOLR-14293:
------------------------------------

             Summary: Payloads Are Written or Read Incorrectly - Across the Documents
                 Key: SOLR-14293
                 URL: https://issues.apache.org/jira/browse/SOLR-14293
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: search
    Affects Versions: 8.3.1, 7.7.2, 6.3, 5.5.5, 5.1
            Reporter: Ivan Provalov
         Attachments: TestPayloads.java

I noticed odd payload behavior with Solr 6.3.0, and also with 7.7.2 and 8.3.1. After writing a Lucene62Codec-specific unit test (see attached; it can also be run against the later versions), I think there could be a bug that allows a term's payload from one document to be written into the payload of the same term in another document (or that causes the second document's payload to be read incorrectly).
 
For comparison, I added SimpleTextCodec, which doesn't behave this way.
 
For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to MultiTerms.getTermPostingsEnum(...).
 
Thanks to Alan Woodward, I made the necessary changes to the analyzer to address the sharing of the TokenStreamComponents in the TestPayloads class. I now use a non-mocked tokenizer and a new filter that creates a random payload (see attached). So documents one and two will have the same token but different payloads.
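
The expectation behind the test can be stated independently of Lucene: for a given term, the payload read back for each document should equal the payload written for that document. The sketch below is only an illustration of that check using plain Java collections. The class PayloadCheck and the method mismatchedDocs are hypothetical names, not taken from the attached TestPayloads.java, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the attached test: compares the payload
// written for one term in each document with the payload read back.
public class PayloadCheck {

    /** Returns the IDs of documents whose read-back payload differs from the written one. */
    public static List<Integer> mismatchedDocs(Map<Integer, byte[]> written,
                                               Map<Integer, byte[]> readBack) {
        List<Integer> bad = new ArrayList<>();
        // TreeMap gives a stable, ascending document order for reporting.
        for (Map.Entry<Integer, byte[]> e : new TreeMap<>(written).entrySet()) {
            byte[] actual = readBack.get(e.getKey());
            // Arrays.equals compares contents; a missing (null) payload counts as a mismatch.
            if (!Arrays.equals(e.getValue(), actual)) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Same token in both documents, but different payloads (made-up bytes).
        Map<Integer, byte[]> written = Map.of(1, new byte[] {0x0A}, 2, new byte[] {0x0B});
        // Made-up faulty read: document 2 comes back with document 1's payload.
        Map<Integer, byte[]> leaked = Map.of(1, new byte[] {0x0A}, 2, new byte[] {0x0A});

        System.out.println(mismatchedDocs(written, written)); // prints []
        System.out.println(mismatchedDocs(written, leaked));  // prints [2]
    }
}
```

In these terms, a codec that behaves correctly would yield an empty list, while the behavior described in this report would correspond to document 2 being flagged.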

I extended the testing to these versions:
5.1.0
5.5.5
6.3.0
7.7.2
8.3.1

The result is the same: SimpleTextCodec passes the test, but these codecs don't:

//import org.apache.lucene.codecs.lucene50.Lucene50Codec;
//import org.apache.lucene.codecs.lucene54.Lucene54Codec;
//import org.apache.lucene.codecs.lucene62.Lucene62Codec;
//import org.apache.lucene.codecs.lucene70.Lucene70Codec;
//import org.apache.lucene.codecs.lucene80.Lucene80Codec;

The issue also occurs on a running Solr 6.3.0 instance.