Posted to java-user@lucene.apache.org by Marc Hadfield <ma...@animarc.com> on 2005/10/10 22:35:42 UTC
query across fields?
hello -
I am looking to perform queries efficiently across multiple fields that
have their token order synchronized, i.e.:
Field_A[100] has some relationship to Field_B[100]
for example, consider two fields, one the full text of an article and
the other the "type" of the token where type could be from { person,
company, date, ... }
So that for a Document:
Field_A : "Fred Johnson worked for Johnson and Johnson in 2001"
Field_B : "name name other other company company company other date"
and we wish to perform a query:
Field_A:"Johnson" AND Field_B:"name"
which would be true for token number 2 but not for 5 and 7
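
As an illustration of the intended semantics only, here is a minimal plain-Java sketch that checks the two aligned token sequences position by position. It does not use Lucene; the class and method names (AlignedFields, matchPositions) are hypothetical, and it assumes simple whitespace tokenization with 1-based positions, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

final class AlignedFields {
    /** Returns the 1-based positions where fieldA holds `word` and fieldB holds `type`. */
    static List<Integer> matchPositions(String fieldA, String fieldB, String word, String type) {
        String[] a = fieldA.trim().split("\\s+");
        String[] b = fieldB.trim().split("\\s+");
        if (a.length != b.length) {
            throw new IllegalArgumentException("fields are not token-aligned");
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (a[i].equals(word) && b[i].equals(type)) {
                hits.add(i + 1); // 1-based, matching the numbering used in the post
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text  = "Fred Johnson worked for Johnson and Johnson in 2001";
        String types = "name name other other company company company other date";
        System.out.println(matchPositions(text, types, "Johnson", "name")); // prints [2]
    }
}
```

A linear scan like this is only a statement of the desired result; the question is how to get the same answer from the index without visiting every token.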
I think Span Queries could be adapted to this purpose, but I wanted to
get any thoughts from the list.
I would prefer not to mix the full text and "types" in the same field, as
it would make the term positions inconsistent, which I depend on for
other queries.
In principle I could store the full text in two fields with the second
field containing the types without incrementing the token index. Then,
do a SpanQuery for "Johnson" and "name" with a distance of 0. The
resulting match would have a token position which would refer back to
the matching position in the first field. I don't know if this is a
really good idea.
Any thoughts?
---Marc Hadfield
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: query across fields?
Posted by Doug Cutting <cu...@apache.org>.
Marc Hadfield wrote:
> I'll give Span Queries a try as they can handle the 0 increment issue.
Note that PhraseQuery can now handle this too.
Doug
Re: query across fields?
Posted by Marc Hadfield <ma...@animarc.com>.
Thanks Doug -
I'll give Span Queries a try as they can handle the 0 increment issue.
My original desire to have more than one field comes from my document
representation, which includes multiple fields containing (the same)
document text using different stemmers, as, depending on the type of
query, I may need to use results from a different stemmer. This is
necessary as we index text containing complex biological names and some
otherwise useful stemming chokes on the names.
So, although my immediate need is met, I would still be interested in
considering how cross-field queries might work.
----Marc
Re: query across fields?
Posted by Marc Hadfield <ma...@animarc.com>.
thanks again!
Re: query across fields?
Posted by Doug Cutting <cu...@apache.org>.
Marc Hadfield wrote:
> In SpanNearQuery (or, for that matter, PhraseQuery), one can set a slop
> value where 0 (zero) means one term immediately following the other.
>
> How can one differentiate between Terms at the **same** position vs. one
> after the other?
The following queries only match "x" and "y" at the same position:
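  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.PhraseQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  PhraseQuery pq = new PhraseQuery();  // typed as PhraseQuery: Query has no add()
  pq.add(new Term("f", "x"), 0);
  pq.add(new Term("f", "y"), 0);

  Query sq =
    new SpanNearQuery(new SpanQuery[]
      { new SpanTermQuery(new Term("f", "x")),
        new SpanTermQuery(new Term("f", "y")) },
      0, false);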
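
To make the relative-position idea concrete, here is a small plain-Java model of an exact phrase match in which every term carries its own relative position, in the spirit of PhraseQuery's add(Term, position). It is a sketch under stated assumptions, not Lucene code: the class ExactPhrase, the postings map (term to list of positions), and the matches method are all hypothetical, and it does not attempt to model slop or span matching.

```java
import java.util.List;
import java.util.Map;

final class ExactPhrase {
    /**
     * Returns true if some base position p exists such that terms[i]
     * occurs at position p + offsets[i] for every i.
     */
    static boolean matches(Map<String, List<Integer>> postings, String[] terms, int[] offsets) {
        List<Integer> first = postings.getOrDefault(terms[0], List.of());
        for (int start : first) {
            int base = start - offsets[0];
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                List<Integer> positions = postings.getOrDefault(terms[i], List.of());
                all = positions.contains(base + offsets[i]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, offsets {0, 0} accept only a document where "x" and "y" share a position, while offsets {0, 1} accept only "x" immediately followed by "y", which is the distinction asked about.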
Doug
Re: query across fields?
Posted by Marc Hadfield <ma...@animarc.com>.
Hello -
A quick follow-up to my previous post.
In SpanNearQuery (or, for that matter, PhraseQuery), one can set a slop
value where 0 (zero) means one term immediately following the other.
How can one differentiate between terms at the **same** position vs. one
after the other?
i.e.:
(Token)/Position
(A)/0 (B)/1 (C)/2 ....
vs
( A B )/0 (C)/1 (D)/2 ...
How can a SpanNearQuery (or any other query) for A and B tell these two cases apart?
---Marc
Re: query across fields?
Posted by Doug Cutting <cu...@apache.org>.
Marc Hadfield wrote:
> I actually mention your option in my email:
>
>> In principle I could store the full text in two fields with the second
>> field containing the types without incrementing the token index.
>> Then, do a SpanQuery for "Johnson" and "name" with a distance of 0.
>> The resulting match would have a token position which would refer back
>> to the matching position in the first field. I don't know if this is
>> a really good idea.
>
> i.e., Field_B = full text interlaced with "types" following each full text
> token with positionIncrement=0
Sorry, you confused me when you spoke of this as "two fields" when only
one field is required.
> However, as far as I understand, the standard TermQuery won't let me
> check if "Johnson" and "__name__" occur at the **same** position.
> Perhaps, as I ask above, a SpanQuery will allow multiple terms with a
> distance of zero (0), that is, terms indexed with
> positionIncrement=0? Can SpanQuery handle 0-distance terms?
TermQuery certainly won't, since it only concerns a single term. But
PhraseQuery now has an add(Term, position) method that should do the
trick. And SpanNearQuery should work.
Doug
Re: query across fields?
Posted by Marc Hadfield <ma...@animarc.com>.
Doug Cutting wrote:
> Why not store them in the same field using positionIncrement=0 for the
> types? Then they won't change positions of non-type tokens. You
> should distinguish the types syntactically, e.g., prefix them with a
> space or other character that does not occur within words. That way
> queries on this field for the term "name" won't match a type token.
>
> Doug
Hi Doug, thanks for your reply,
I actually mention your option in my email:
> I would prefer not to mix the full text and "types" in the same field,
> as it would make the term positions inconsistent, which I depend on for
> other queries.
>
> In principle I could store the full text in two fields with the second
> field containing the types without incrementing the token index.
> Then, do a SpanQuery for "Johnson" and "name" with a distance of 0.
> The resulting match would have a token position which would refer back
> to the matching position in the first field. I don't know if this is
> a really good idea.
i.e., Field_B = full text interlaced with "types" following each full text
token with positionIncrement=0
However, as far as I understand, the standard TermQuery won't let me
check if "Johnson" and "__name__" occur at the **same** position.
Perhaps, as I ask above, a SpanQuery will allow multiple terms with a
distance of zero (0), that is, terms indexed with
positionIncrement=0? Can SpanQuery handle 0-distance terms?
---Marc
Re: query across fields?
Posted by Doug Cutting <cu...@apache.org>.
Marc Hadfield wrote:
> I would prefer not to mix the full text and "types" in the same field, as
> it would make the term positions inconsistent, which I depend on for
> other queries.
Why not store them in the same field using positionIncrement=0 for the
types? Then they won't change positions of non-type tokens. You should
distinguish the types syntactically, e.g., prefix them with a space or
other character that does not occur within words. That way queries on
this field for the term "name" won't match a type token.
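
The effect of a zero position increment can be sketched with a toy positional index in plain Java. This is not Lucene's analysis API: the class StackedTokens, its methods, and the "#" type prefix are hypothetical choices for illustration, and the model assumes positions start at 0 with each word advancing the position by 1 and each type token advancing it by 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StackedTokens {
    // Hypothetical marker that keeps type tokens distinct from ordinary words.
    static final String TYPE_PREFIX = "#";

    /** Builds term -> positions for one field holding words and stacked type tokens. */
    static Map<String, List<Integer>> index(String[] words, String[] types) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        int position = -1;
        for (int i = 0; i < words.length; i++) {
            position += 1; // word token: increment of 1
            postings.computeIfAbsent(words[i], k -> new ArrayList<>()).add(position);
            // type token: increment of 0, so it lands on the word's position
            postings.computeIfAbsent(TYPE_PREFIX + types[i], k -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    /** Returns the positions at which both terms occur. */
    static List<Integer> samePosition(Map<String, List<Integer>> postings, String a, String b) {
        List<Integer> hits = new ArrayList<>(postings.getOrDefault(a, List.of()));
        hits.retainAll(postings.getOrDefault(b, List.of()));
        return hits;
    }
}
```

With the example document from the original post, the word positions come out the same as if the type tokens were absent, a lookup of the plain term "name" finds nothing because type tokens are stored as "#name", and samePosition for "Johnson" and "#name" yields the single 0-based position 1.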
Doug