Posted to java-user@lucene.apache.org by Vadim Gindin <vg...@detectum.com> on 2018/03/23 08:15:14 UTC
Postings.getPayload() returns null

Hi all.

I have a simplified test that defines an index with one document and one
field in it. The indexing analyzer has a single filter, which writes 1 byte
to the payload. My *goal* is to write a number to the payload for each
position of each term; those numbers will be used in a custom scoring
formula implemented in a custom Query/Scorer.

When I try to read the written payload (in the custom query), I get NULL. I
suppose I should call postingsEnum.nextPosition() before calling
getPayload(), but when I call nextPosition() I get an error:

java.lang.AssertionError: got line=field model

at __randomizedtesting.SeedInfo.seed([D334C9D1B5C155E3:2AAE4BE5481F4C8F]:0)
at org.apache.lucene.codecs.simpletext.SimpleTextFieldsReader$SimpleTextPostingsEnum.nextPosition(SimpleTextFieldsReader.java:455)

As you can see, I also used SimpleTextCodec to make sure that the payload
was really written to the index along with the positions. It really is
written. I am probably doing something wrong with the positions, reading the
payload incorrectly, or missing something important.

*Question*: What am I doing wrong? What is the correct way to write and read
payloads?

Here is my test:

import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseTokenizer;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.codecs.simpletext.SimpleTextCodec;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;


public class PayloadTest extends LuceneTestCase {
    private IndexSearcher searcher;
    private IndexReader reader;
    private byte[] payloadField = new byte[]{1};
    protected Directory directory;

    private class PayloadAnalyzer extends Analyzer {
        @Override
        public TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new LowerCaseTokenizer();
            PayloadFilter filter = new PayloadFilter(tokenizer, fieldName);
            return new TokenStreamComponents(tokenizer, filter);
        }
    }

    private class PayloadFilter extends TokenFilter {
        PayloadAttribute payloadAtt;
        PositionIncrementAttribute positionAtt;

        public PayloadFilter(TokenStream input, String fieldName) {
            super(input);
            payloadAtt = addAttribute(PayloadAttribute.class);
            // I also tried without the position attribute here, with the same error
            positionAtt = addAttribute(PositionIncrementAttribute.class);
        }

        @Override
        public boolean incrementToken() throws IOException {
            boolean hasNext = input.incrementToken();
            if (hasNext) {
                payloadAtt.setPayload(new BytesRef(payloadField));
                // I also tried without the position attribute here, with the same error
                positionAtt.setPositionIncrement(1);
                return true;
            } else {
                return false;
            }
        }
    }

    @Override
    public void setUp() throws Exception {
        super.setUp();
        directory = newDirectory();
        RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
                newIndexWriterConfig(new PayloadAnalyzer())
                        .setMergePolicy(newLogMergePolicy())
                        .setCodec(new SimpleTextCodec()));

        Document doc = new Document();
        doc.add(new TextField("model", "ford focus", Field.Store.YES));
        writer.addDocument(doc);

        reader = writer.getReader();
        writer.close();

        searcher = newSearcher(reader);
    }

    @Override
    public void tearDown() throws Exception {
        reader.close();
        directory.close();
        super.tearDown();
    }

    public void testQuery() throws IOException {
        int limit = 20;
        try (IndexReader reader = DirectoryReader.open(directory)) {
            Query query = new CustomPhraseQuery(
                    Arrays.asList("ford", "focus"),
                    new HashMap<String, Float>() {{
                        put("model", 5.0f);
                    }},
                    new HashMap<String, List<String>>() {{
                        put("ford", Arrays.asList("ford^1.0"));
                        put("focus", Arrays.asList("focus^1.0"));
                    }},
                    Arrays.asList("model"),
                    null
            );

            printSearchResults(limit, query, reader);
        }
    }

    private static void printSearchResults(final int limit, final Query query,
                                           final IndexReader reader) throws IOException {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs docs = searcher.search(query, limit);

        System.out.println(docs.totalHits + " found for query: " + query);

        for (final ScoreDoc scoreDoc : docs.scoreDocs) {
            System.out.println(searcher.doc(scoreDoc.doc));
        }
    }
}

Here is the code from CustomPhraseQuery.scorer():

for (String field: fieldScores.keySet()) {
    final Terms fieldTerms = reader.terms(field);
    if (fieldTerms == null) {
        continue;
    }

    if (!fieldTerms.hasPositions())
        throw new IllegalStateException("Index does not contain positions");

    if (!fieldTerms.hasPayloads())
        throw new IllegalStateException("Index does not contain payloads");

    final TermsEnum te = fieldTerms.iterator();
    for (int j = 0; j < terms.length; j++) {
        final Term t = terms[j];

        if (t.field().equals(field) && te.seekExact(t.bytes())) {
            PostingsEnum postingsEnum = te.postings(null, PostingsEnum.ALL);

            int pos = postingsEnum.nextPosition();
            BytesRef payload = postingsEnum.getPayload();
            // assert payload.bytesEquals(new BytesRef(new byte[]{1}));

            // TODO: use payload in scoring formula
            fldScorers.add(new ConstTermScorer(this, t,
                    fieldScores.get(field) * termScores.get(t.text()),
                    postingsEnum));
        }
    }
}
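
For comparison, here is a minimal sketch of the call order that, as far as I understand the PostingsEnum contract, a reader is expected to follow: nextDoc() first, then nextPosition() at most freq() times, and getPayload() only after each nextPosition(). The scorer code above never calls nextDoc() before nextPosition(), which may be what the assertion is complaining about. To keep the sketch runnable without Lucene on the classpath, Cursor and ArrayCursor are hypothetical stand-ins that mirror only those four PostingsEnum methods. They are not Lucene classes, and the toy cursor's exception and null behaviour is an assumption modelled on the symptoms described above.

```java
import java.util.ArrayList;
import java.util.List;

final class PayloadOrderSketch {

    /** Hypothetical stand-in mirroring the four PostingsEnum methods used here. */
    interface Cursor {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
        int freq();
        int nextPosition();
        byte[] getPayload();
    }

    /** Walks docs, then positions, reading the payload after each nextPosition(). */
    static List<String> walk(Cursor c) {
        List<String> out = new ArrayList<>();
        for (int doc = c.nextDoc(); doc != Cursor.NO_MORE_DOCS; doc = c.nextDoc()) {
            int freq = c.freq();
            for (int i = 0; i < freq; i++) {
                int pos = c.nextPosition();
                byte[] payload = c.getPayload();
                String shown = (payload == null) ? "null" : Byte.toString(payload[0]);
                out.add(doc + ":" + pos + ":" + shown);
            }
        }
        return out;
    }

    /** Toy in-memory cursor for one term (assumption: one payload byte per position). */
    static final class ArrayCursor implements Cursor {
        private final int[][] positions; // positions[doc] = term positions in that doc
        private final byte payloadByte;
        private int doc = -1;
        private int idx = -1;

        ArrayCursor(int[][] positions, byte payloadByte) {
            this.positions = positions;
            this.payloadByte = payloadByte;
        }

        @Override public int nextDoc() {
            if (doc == NO_MORE_DOCS) return doc;
            idx = -1; // positions restart for every document
            doc = (doc + 1 < positions.length) ? doc + 1 : NO_MORE_DOCS;
            return doc;
        }

        @Override public int freq() {
            return positions[doc].length;
        }

        @Override public int nextPosition() {
            if (doc < 0 || doc == NO_MORE_DOCS) {
                throw new IllegalStateException("call nextDoc() before nextPosition()");
            }
            return positions[doc][++idx];
        }

        @Override public byte[] getPayload() {
            // No position consumed yet means there is no current payload.
            return idx < 0 ? null : new byte[] {payloadByte};
        }
    }

    public static void main(String[] args) {
        // One document (id 0) with the term at position 0 and payload byte 1.
        System.out.println(walk(new ArrayCursor(new int[][] {{0}}, (byte) 1)));
        // prints [0:0:1]
    }
}
```

With a real PostingsEnum obtained from te.postings(null, PostingsEnum.ALL), the same two nested loops would apply, with DocIdSetIterator.NO_MORE_DOCS as the end marker and a BytesRef instead of byte[] as the payload type.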

Regards,
Vadim Gindin