
Posted to java-user@lucene.apache.org by "Hasenberger, Josef" <Jo...@zetcom.com> on 2012/12/11 16:19:00 UTC

Opposite of SpanFirstQuery - Searching for documents by last term in a field

Hi,

I wonder if there is a way to use a SpanQuery to find documents with fields that end with a certain term.
Kind of the opposite of SpanFirstQuery, i.e. "SpanLastQuery", if you want.

What I would like to do:
Find terms that are at the end of a field.

Example:
Assume the following field content: "the quick brown fox jumps over the lazy dog".
I would like to find all documents that have "dog" at the end of the field.

Any idea how I can achieve this?

Thanks a lot and best regards,

Josef

Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

Posted by Alan Woodward <al...@flax.co.uk>.

I’ve done this before by appending a special token to text fields via a TokenFilter.  It hasn’t caused a noticeable problem with term stats, and field:* still works because the token is only added if the document in question actually has data in that particular field.
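
A minimal sketch of the idea above, written as a plain-Java model rather than a real Lucene TokenFilter. It shows only the one property described: the marker is appended when the field produced at least one token, so an empty field stays empty. The class name, the method names, and the marker text "thisistheend" (taken from the quoted suggestion below) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the end-marker idea; this is not a Lucene TokenFilter.
final class EndMarkerSketch {
    // Hypothetical marker text; choose one that cannot occur in real content.
    static final String END_MARKER = "thisistheend";

    // Returns the tokens to index: unchanged when the field is empty,
    // otherwise the original tokens followed by the marker.
    static List<String> withEndMarker(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens);
        if (!out.isEmpty()) {
            out.add(END_MARKER);
        }
        return out;
    }

    // True when `term` sits immediately before the marker,
    // i.e. it is the last real term of the field.
    static boolean endsWith(List<String> indexed, String term) {
        int n = indexed.size();
        return n >= 2
                && END_MARKER.equals(indexed.get(n - 1))
                && term.equals(indexed.get(n - 2));
    }
}
```

In a real analysis chain the same check would be expressed as a span query between the term and the marker; the model above only illustrates the indexing rule.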

Alan Woodward
www.flax.co.uk


> On 14 Dec 2016, at 05:02, Trejkaz <tr...@trypticon.org> wrote:
> 
> On Wed, Dec 12, 2012 at 3:04 AM, Ian Lea <ia...@gmail.com> wrote:
>> The javadoc for SpanFirstQuery says it is a special case of
>> SpanPositionRangeQuery so maybe you can use the latter directly,
>> although you might need to know the position of the last term which
>> might be a problem.
>> 
>> Alternatives might include reversing the terms and using SpanFirst or
>> adding a special "thisistheend" token to each field and using
>> SpanNearQuery for dog and thisistheend with suitable value for slop
>> and inOrder = true.
>> 
>> Or take the last term and index it in a separate field so you can just
>> search for lastterm: dog.
> 
> Idly wondering whether anyone has figured out a good way yet in the
> time elapsed since last asked.
> 
> Here are my problems with the existing ideas:
> 
> 1. (Using SpanPositionRangeQuery) I am not really sure how to get the
> position of the last term.
> 
> 2. (Using a special token) Adding a token to every document skews term
> statistics and requires manually filtering it out of term listings.
> Additionally it ruins certain wildcard queries like field:* since now
> every field will match.
> 
> 3. (Indexing the last term(s) in a separate field) In our case we
> don't know in advance what distance from the end of the content the
> user will put in the query. They might write:
> 
>  term w/10 end-of-content
>  term w/1000 end-of-content
>  ...
> 
> Other ideas:
> 
> 4. Storing all the content twice initially seems to be a potential
> solution, but starts looking very hard once you combine queries. For
> instance, what about this:
> 
>  (term w/10 start-of-content) w/30 (another-term w/10 end-of-content)
> 
> 5. Put a payload on the last term and then _somehow_ (I have no idea how
> payload queries work yet) use payload queries to do spans from that.
> 
> 
> Is there any good solution to this that people have already figured
> out? Is there another SpanPositionCheckQuery subclass that could be
> written which somehow fetches the last position in the document from
> the acceptPosition method?
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

Posted by Trejkaz <tr...@trypticon.org>.

On Wed, Dec 12, 2012 at 3:04 AM, Ian Lea <ia...@gmail.com> wrote:
> The javadoc for SpanFirstQuery says it is a special case of
> SpanPositionRangeQuery so maybe you can use the latter directly,
> although you might need to know the position of the last term which
> might be a problem.
>
> Alternatives might include reversing the terms and using SpanFirst or
> adding a special "thisistheend" token to each field and using
> SpanNearQuery for dog and thisistheend with suitable value for slop
> and inOrder = true.
>
> Or take the last term and index it in a separate field so you can just
> search for lastterm: dog.

Idly wondering whether anyone has figured out a good way yet in the
time elapsed since last asked.

Here are my problems with the existing ideas:

1. (Using SpanPositionRangeQuery) I am not really sure how to get the
position of the last term.

2. (Using a special token) Adding a token to every document skews term
statistics and requires manually filtering it out of term listings.
Additionally it ruins certain wildcard queries like field:* since now
every field will match.

3. (Indexing the last term(s) in a separate field) In our case we
don't know in advance what distance from the end of the content the
user will put in the query. They might write:

  term w/10 end-of-content
  term w/1000 end-of-content
  ...

Other ideas:

4. Storing all the content twice initially seems to be a potential
solution, but starts looking very hard once you combine queries. For
instance, what about this:

  (term w/10 start-of-content) w/30 (another-term w/10 end-of-content)

5. Put a payload on the last term and then _somehow_ (I have no idea how
payload queries work yet) use payload queries to do spans from that.
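
To make the query shape in point 3 concrete, here is a small plain-Java model of "term w/N end-of-content". It assumes that w/N means "at most N positions away" and that the position of the last token is known, which is exactly the piece that is hard to get at query time. The class and method names are hypothetical and nothing here is Lucene API.

```java
import java.util.List;

// Plain-Java model of "term w/N end-of-content"; not a Lucene query.
final class NearEndSketch {
    // True when `term` occurs at most `n` positions before the last token.
    static boolean withinOfEnd(List<String> tokens, String term, int n) {
        int last = tokens.size() - 1;
        // Scan only the window [last - n, last], clamped to the start of the field.
        for (int pos = Math.max(0, last - n); pos <= last; pos++) {
            if (term.equals(tokens.get(pos))) {
                return true;
            }
        }
        return false;
    }
}
```

With N = 0 this reduces to the original "field ends with term" question; larger N gives the variable-distance case that a separate last-term field cannot answer.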

Is there any good solution to this that people have already figured
out? Is there another SpanPositionCheckQuery subclass that could be
written which somehow fetches the last position in the document from
the acceptPosition method?

TX


Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

Posted by Ian Lea <ia...@gmail.com>.

The javadoc for SpanFirstQuery says it is a special case of
SpanPositionRangeQuery so maybe you can use the latter directly,
although you might need to know the position of the last term which
might be a problem.

Alternatives might include reversing the terms and using SpanFirst or
adding a special "thisistheend" token to each field and using
SpanNearQuery for dog and thisistheend with suitable value for slop
and inOrder = true.

Or take the last term and index it in a separate field so you can just
search for lastterm: dog.
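
As a rough illustration of the reversed-terms and separate-field alternatives, the index-time transformations could look like the plain-Java sketch below. It deliberately avoids Lucene classes; the class name, the method names, and the idea of feeding the results into a reversed field or a "lastterm" field are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Plain-Java model of two index-time transformations; not Lucene code.
final class LastTermSketch {
    // Value for a hypothetical separate "lastterm" field;
    // empty when the field has no tokens at all.
    static Optional<String> lastTerm(List<String> tokens) {
        return tokens.isEmpty()
                ? Optional.empty()
                : Optional.of(tokens.get(tokens.size() - 1));
    }

    // Token order for a hypothetical reversed copy of the field, so that
    // "last term" in the original becomes "first term" in the copy.
    static List<String> reversed(List<String> tokens) {
        List<String> copy = new ArrayList<>(tokens);
        Collections.reverse(copy);
        return copy;
    }
}
```

For the example field, lastTerm yields "dog", and the reversed copy starts with "dog", which is the position a SpanFirst-style query would check.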

--
Ian.

On Tue, Dec 11, 2012 at 3:19 PM, Hasenberger, Josef
<Jo...@zetcom.com> wrote:
> Hi,
>
> I wonder if there is a way to use a SpanQuery to find documents with fields that end with a certain term.
> Kind of the opposite of SpanFirstQuery, i.e. "SpanLastQuery", if you want.
>
> What I would like to do:
> Find terms that are at the end of a field.
>
> Example:
> Assume the following field content: "the quick brown fox jumps over the lazy dog".
> I would like to find all documents that have "dog" at the end of the field.
>
> Any idea how I can achieve this?
>
> Thanks a lot and best regards,
>
> Josef
>
>
>
