Posted to java-user@lucene.apache.org by Igor Shalyminov <is...@yandex-team.ru> on 2013/06/03 19:14:30 UTC
Question on payload matching query
Hello!
I've implemented a SpanQuery class that acts like SpanPositionCheckQuery but also matches payloads.
For example, here is the "gram" field in a single indexed document:
"gram": N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0
Every token's meaning is as follows:
N - grammatical annotation | 1 - parse number (payload) | 1 - position increment
So, the document has a single word position with 3 ambiguous parses: #1, #2, and #3. Each parse has 2 annotations: "N, sg", "A, pl", and "A, sg", respectively.
My SpanQuery is supposed to reject matches that combine annotations from different parses: e.g. "sg & pl" should not match, but "N & sg" should.
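To make the token layout concrete, here is a minimal Lucene-free sketch that parses the "term|parse|posInc" notation and assigns absolute positions. The class and method names (GramTokens, Token, parse) are hypothetical and exist only for illustration. It assumes the usual Lucene convention that the position counter starts at -1, so the first token with increment 1 lands at position 0.

```java
import java.util.ArrayList;
import java.util.List;

class GramTokens {
    /** One token of the "gram" field (hypothetical helper type, not a Lucene class). */
    static final class Token {
        final String term;   // grammatical annotation, e.g. "sg"
        final int parse;     // parse number, stored as the payload
        final int position;  // absolute position derived from the increments

        Token(String term, int parse, int position) {
            this.term = term;
            this.parse = parse;
            this.position = position;
        }
    }

    /** Parses space-separated "term|parse|posInc" items and computes absolute positions. */
    static List<Token> parse(String field) {
        List<Token> tokens = new ArrayList<>();
        int position = -1; // assumed starting value, so the first increment of 1 gives position 0
        for (String item : field.trim().split("\\s+")) {
            String[] parts = item.split("\\|");
            position += Integer.parseInt(parts[2]);
            tokens.add(new Token(parts[0], Integer.parseInt(parts[1]), position));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : parse("N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0")) {
            System.out.println(t.term + " parse=" + t.parse + " position=" + t.position);
        }
    }
}
```

Under these assumptions all six tokens share one position, so only the parse number in the payload tells the annotations apart.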
The logic is:
    @Override
    protected AcceptStatus acceptPosition(Spans spans) throws IOException {
        // Without payloads there is nothing to compare, so the match is accepted.
        if (!spans.isPayloadAvailable()) {
            return AcceptStatus.YES;
        }
        Collection<byte[]> payloads = spans.getPayload();
        Integer firstPayload = null;
        for (byte[] payload : payloads) {
            int decodedPayload = PayloadHelper.decodeInt(payload, 0);
            if (firstPayload == null) {
                firstPayload = decodedPayload;
            } else if (decodedPayload != firstPayload) {
                // Different parse numbers: the annotations belong to different parses.
                return AcceptStatus.NO;
            }
        }
        return AcceptStatus.YES;
    }
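The same check can be exercised outside Lucene with a small standalone sketch. The names SameParseCheck, encodeInt, decodeInt and sameParse are hypothetical. The 4-byte big-endian layout is an assumption meant to mirror what PayloadHelper.encodeInt and PayloadHelper.decodeInt use.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

class SameParseCheck {
    /** Encodes an int as 4 big-endian bytes (assumed to match the PayloadHelper layout). */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Decodes the first 4 bytes as a big-endian int. */
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Returns true when every payload decodes to the same parse number. */
    static boolean sameParse(Collection<byte[]> payloads) {
        Integer first = null;
        for (byte[] payload : payloads) {
            int decoded = decodeInt(payload);
            if (first == null) {
                first = decoded;
            } else if (decoded != first) {
                return false;
            }
        }
        // An empty or single-element collection has nothing to disagree with.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameParse(Arrays.asList(encodeInt(1), encodeInt(2))));
        System.out.println(sameParse(Collections.singletonList(encodeInt(3))));
    }
}
```

Note that a collection holding a single payload always passes this check, because there is no second value to compare against.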
Then, for the query "sg & pl", which is a wrapped unordered SpanNearQuery, ParseMatchingSpanQuery(SpanNearQuery("gram:sg", "gram:pl", false, -1)), acceptPosition is called twice: the first time with a payload collection containing ['1', '2'], and the second time with just ['3']. The second call is accepted as a match, which is totally unintuitive to me.
To my understanding, it should be called with pairs of spans, ideally ['1', '2'], ['1', '3']. Why isn't it? :)
Could you please explain to me the logic of matching with payload checking?
--
Best Regards,
Igor
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Question on payload matching query
Posted by Igor Shalyminov <is...@yandex-team.ru>.
Hi all!
Before diving into the core Lucene code, I would like to ask once again whether there are any detailed tutorials on the SpanQuery execution algorithm, covering postings retrieval and positional data matching.
Best Regards,
Igor