Posted to issues@lucene.apache.org by "Ivan Provalov (Jira)" <ji...@apache.org> on 2020/02/28 17:26:00 UTC
[jira] [Created] (SOLR-14293) Payloads Are Written or Read Incorrectly - Across the Documents
Ivan Provalov created SOLR-14293:
------------------------------------
Summary: Payloads Are Written or Read Incorrectly - Across the Documents
Key: SOLR-14293
URL: https://issues.apache.org/jira/browse/SOLR-14293
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: search
Affects Versions: 8.3.1, 7.7.2, 6.3, 5.5.5, 5.1
Reporter: Ivan Provalov
Attachments: TestPayloads.java
I noticed odd payload behavior in Solr 6.3.0, and also in 7.7.2 and 8.3.1. After writing a unit test specific to Lucene62Codec (see the attached TestPayloads.java; it can also be run against the later versions), I think there may be a bug: either the payload for a term in one document is written into the payload for the same term in another document, or the payload for the second document is not read back correctly.
For comparison, I added SimpleTextCodec, which does not behave this way.
For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to MultiTerms.getTermPostingsEnum(...).
Thanks to Alan Woodward, I changed the analyzer to address the sharing of TokenStreamComponents in the TestPayloads class. The test now uses a non-mocked tokenizer and a new filter that creates a random payload (see attached), so documents one and two have the same token but different payloads.
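To make the expected behavior concrete, the following is a minimal, Lucene-free sketch of the property the test checks: the same term indexed in two documents with different payloads should return each document's own payload. It is not the attached TestPayloads.java and does not use any Lucene API. The class PayloadModel and its methods are hypothetical names introduced only for illustration, and the model assumes a single position per term per document.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a postings list with payloads.
// Assumption: one position (and so one payload) per term per document.
final class PayloadModel {
    // term -> (docId -> payload bytes)
    private final Map<String, Map<Integer, byte[]>> postings = new LinkedHashMap<>();

    // Store a defensive copy so later changes to the caller's array
    // cannot leak into another document's payload.
    void index(int docId, String term, byte[] payload) {
        postings.computeIfAbsent(term, t -> new LinkedHashMap<>())
                .put(docId, payload.clone());
    }

    // Return the payload stored for (term, docId), or null if there is none.
    byte[] payload(String term, int docId) {
        Map<Integer, byte[]> docs = postings.get(term);
        byte[] stored = (docs == null) ? null : docs.get(docId);
        return (stored == null) ? null : stored.clone();
    }

    // Return the IDs of documents whose stored payload differs from the expected one.
    List<Integer> mismatches(String term, Map<Integer, byte[]> expected) {
        List<Integer> bad = new ArrayList<>();
        for (Map.Entry<Integer, byte[]> e : expected.entrySet()) {
            if (!Arrays.equals(payload(term, e.getKey()), e.getValue())) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        PayloadModel model = new PayloadModel();
        Map<Integer, byte[]> expected = new LinkedHashMap<>();
        expected.put(1, "payload-one".getBytes(StandardCharsets.UTF_8));
        expected.put(2, "payload-two".getBytes(StandardCharsets.UTF_8));
        // Same token in both documents, different payload per document.
        expected.forEach((doc, bytes) -> model.index(doc, "token", bytes));
        System.out.println("mismatching docs: " + model.mismatches("token", expected));
    }
}
```

In this model the list of mismatching documents is empty. The behavior reported here would correspond to a non-empty list, for example document two reading back document one's payload.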
I extended the testing to these versions:
5.1.0
5.5.5
6.3.0
7.7.2
8.3.1
The result is the same: SimpleTextCodec passes the test, but these codecs do not:
//import org.apache.lucene.codecs.lucene50.Lucene50Codec;
//import org.apache.lucene.codecs.lucene54.Lucene54Codec;
//import org.apache.lucene.codecs.lucene62.Lucene62Codec;
//import org.apache.lucene.codecs.lucene70.Lucene70Codec;
//import org.apache.lucene.codecs.lucene80.Lucene80Codec;
This is also an issue on a running 6.3.0 Solr instance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)