Posted to solr-user@lucene.apache.org by Jeff Rodenburg <je...@gmail.com> on 2007/01/13 00:43:26 UTC

One item, multiple fields, and range queries

I'm stuck with a query issue that at present seems unresolvable.  Hoping the
community has some insight into this.

My index contains events that have multiple beginning/ending date ranges and
multiple locations.  For example, event A (uniqueId = 123) occurs every
weekend, sometimes in one location, sometimes in many locations.  Dates have
a beginning and ending date, and locations have a latitude & longitude.  I
need to query for the set of events for a given "area", where area =
bounding box.  So, a single event has multiple beginning and ending dates
and multiple locations.

So, the beginning date, ending date, latitude and longitude values only
apply collectively as a unit.  However, I need to do range queries on both
the dates and the lat/long values.

Any suggested strategies for indexing and query formulation?

thanks,
j

Re: One item, multiple fields, and range queries

Posted by Jeff Rodenburg <je...@gmail.com>.

Thanks Yonik.

> 1) model a single document as a single event at a single place with a start
> and end date.

This was my first approach, but at presentation time I need to display the
event once -- with multiple start/end dates and locations beneath it.

Is treating the given event uniqueId as a facet the way to go?

thanks,
jeff


On 1/12/07, Yonik Seeley <yo...@apache.org> wrote:
>
> On 1/12/07, Jeff Rodenburg <je...@gmail.com> wrote:
> > I'm stuck with a query issue that at present seems unresolvable.  Hoping the
> > community has some insight to this.
> >
> > My index contains events that have multiple beginning/ending date ranges and
> > multiple locations.  For example, event A (uniqueId = 123) occurs every
> > weekend, sometimes in one location, sometimes in many locations.  Dates have
> > a beginning and ending date, and locations have a latitude & longitude.  I
> > need to query for the set of events for a given "area", where area =
> > bounding box.  So, a single event has multiple beginning and ending dates
> > and multiple locations.
> >
> > So, the beginning date, ending date, latitude and longitude values only
> > apply collectively as a unit.  However, I need to do range queries on both
> > the dates and the lat/long values.
>
> 1) model a single document as a single event at a single place with a
>    start and end date.
>   OR
> 2) use multivalued fields as correlated vectors, so the first start
>    date corresponds to the first end date, which corresponds to the first
>    lat and long value.  You get them all back in a query though, so your
>    app would need to do extra work to sort out which matched.
>
> I'd do (1) if you can... it's simpler.
>
> -Yonik
>
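
A minimal sketch of option (1) in Python, assuming each occurrence of an event is indexed as its own flat document that carries the event's uniqueId. The field names (`event_id`, `start`, `end`, `lat`, `lon`) and the in-memory list standing in for the index are illustrative only, not part of any Solr schema. The range checks select matching occurrences, and grouping by `event_id` afterwards recovers the one-event, many-occurrences view needed at presentation time.

```python
from collections import defaultdict
from datetime import date

# One flat "document" per occurrence of an event (hypothetical field names).
occurrences = [
    {"event_id": 123, "start": date(2007, 1, 13), "end": date(2007, 1, 14), "lat": 32.1, "lon": -88.9},
    {"event_id": 123, "start": date(2007, 1, 20), "end": date(2007, 1, 21), "lat": 42.1, "lon": -98.9},
    {"event_id": 456, "start": date(2007, 1, 13), "end": date(2007, 1, 13), "lat": 32.2, "lon": -88.8},
]

def search(docs, box, window):
    """Return {event_id: [matching occurrences]} for a bounding box and a date window."""
    (min_lat, max_lat), (min_lon, max_lon) = box
    win_start, win_end = window
    grouped = defaultdict(list)
    for d in docs:
        in_box = min_lat <= d["lat"] <= max_lat and min_lon <= d["lon"] <= max_lon
        # An occurrence overlaps the window if it starts before the window ends
        # and ends after the window starts.
        overlaps = d["start"] <= win_end and d["end"] >= win_start
        if in_box and overlaps:
            grouped[d["event_id"]].append(d)
    return dict(grouped)

hits = search(occurrences,
              box=((32.0, 33.0), (-89.0, -88.0)),
              window=(date(2007, 1, 13), date(2007, 1, 14)))
```

Because every range test is applied to a single occurrence, a latitude from one occurrence can never be combined with a date from another, which is the correlation problem described above.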

RE: One item, multiple fields, and range queries

Posted by wojtekpia <wo...@hotmail.com>.

Hi Hoss,
I realize I'm reviving a really old thread, but I have the same need, and
SpanNumericRangeQuery sounds like a good solution for me. Can you give me
some guidance on how to implement that?

Thanks,

Wojtek

--
View this message in context: http://lucene.472066.n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p2796613.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: One item, multiple fields, and range queries

Posted by Chris Hostetter <ho...@fucit.org>.

: parallel arrays, one array per address-part field.  The parallel array 
: alignment is effected via alignment of position increments.  What's 
: missing from Solr/Lucene is the ability to constrain matches such that 
: the position increment of all matching address-part fields is the same.

It exists using Span Queries...

http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html


...this lets you construct a SpanNearQuery requiring that a span on 
fieldA occurs "near" a span on fieldB (in terms of position value, even 
though the fields are different).  

The only thing that's really missing as far as i can see is a 
"SpanNumericRangeQuery"


-Hoss

RE: One item, multiple fields, and range queries

Posted by Steven A Rowe <sa...@syr.edu>.

Hi David,

On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote:
> Did you read my original message where I suggested perhaps a solution
> might lie in intersecting different queries based on common multi-value
> field offsets derived from matching term positions?  I have no idea how
> far off the current codebase is to exposing enough information to make
> such an approach possible.

AFAICT, your above-described solution addresses the "one-to-many problem" by representing multiple records within a single document via parallel arrays, one array per address-part field.  The parallel array alignment is effected via alignment of position increments.  What's missing from Solr/Lucene is the ability to constrain matches such that the position increment of all matching address-part fields is the same.

I suspect that the Flexible Indexing branch would allow a slightly less involved index usage pattern: you could add a new term attribute that explicitly represents the record index.  That way you wouldn't have to fiddle around with increment gaps and guess about maximum record size.

You still need to perform the equivalent of an SQL table join across the matching address-part fields (in addition to any non-address constraints), using parallel array index equality as the join predicate.  I don't know how hard it would be to implement this, but you'd need to: add the ability to express this kind of constraint in the query language; make a new Similarity implementation that could handle it; and, if you go the route of adding a new record index term attribute, add a new postings codec that handles writing/reading it.

Steve
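
A rough Python sketch of the join described above, assuming the address parts sit in parallel multi-valued fields and that index i in each list belongs to the same address record. The document layout, the sample values, and the function name are hypothetical; this illustrates the join predicate only and says nothing about how Lucene would implement it.

```python
# A document whose address parts live in parallel multi-valued fields:
# index i in each list belongs to the same address record (illustrative data).
doc = {
    "street": ["12 Main St", "243 13th St"],
    "city": ["Somewheresville", "Bogdownton"],
    "state": ["MZ", "XB"],
}

def matching_records(doc, constraints):
    """Return the record indices at which every field constraint holds."""
    result = None
    for field, wanted in constraints.items():
        hits = {i for i, value in enumerate(doc.get(field, [])) if value == wanted}
        # The join predicate is equality of the record index across fields.
        result = hits if result is None else result & hits
    return result or set()
```

With this data, `matching_records(doc, {"city": "Bogdownton", "state": "XB"})` returns `{1}`, while asking for Somewheresville together with XB returns an empty set even though both values occur somewhere in the document.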

RE: One item, multiple fields, and range queries

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

Steven,

The composite doc idea is an interesting avenue to a solution here that I didn't think of.  What's missing is code to do the group by and then do an intersection in order to get boolean AND behavior between the addresses and primary documents, and then filter out the non-primary documents.  Perhaps Solr's popular field-collapsing patch would be a starting point.

I realize of course that Lucene/Solr isn't a database but there is plenty of gray area in-between.

Did you read my original message where I suggested perhaps a solution might lie in intersecting different queries based on common multi-value field offsets derived from matching term positions?  I have no idea how far the current codebase is from exposing enough information to make such an approach possible.

~ David Smiley


-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

RE: One item, multiple fields, and range queries

Posted by Steven A Rowe <sa...@syr.edu>.

Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
> I'm not sure what to make of "or index using a heterogeneous field
> schema, grouping the different doc type instances with a unique key
> (the one) to form a composite doc"

Lucene is schema-free - you can mix and match different document types in a single index.  You could emulate this in Solr by merging the two document types and leaving blank the parts that are inapplicable to a given instance.  E.g.:

Address-doc-type: 
	Field: Unique-key
	Field: Street
	Field: City
	...

Everything-else-doc-type:
	Field: Unique-key
	Field: Blob-o'-text
	...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...
....

> I could use the scheme you mention provided with the spanNear query but
> it conflates different fields into one indexed field which will mess
> with the scoring and make queries like range queries if there are dates
> involved next to impossible.

I agree, dimensional reduction can be an issue, though I'm sure there are use cases where the attendant scoring distortion would be acceptable, e.g. non-scoring filters.  (Stuffing a variable number of addresses into a single document will also "mess with the scoring" unless you turn off norms, which is of course another form of scoring-messing.)

I've seen a couple of different mentions of private SpanRangeQuery implementations on the mailing lists, so range queries likely wouldn't be a problem for long, should it become a general issue.

> This "solution" is really a hack workaround to a limitation in
> Lucene/Solr.  I was hoping to start a conversation to a more
> truer resolution to this problem rather than these workarounds
> which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.  

"Solutions":
	1. Hack workaround
	2. Rewrite Solr/Lucene to be a database
	3. ? (fill in "more truer resolution" here)

Good luck,
Steve
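
A minimal Python sketch of the composite-doc idea, assuming the two document types share a unique key as in the Doc1/Doc2/Doc3 example above. The second key, the field names, and the helper functions are made up for illustration; the grouping and intersection are done in application code here, which is the part Solr does not provide out of the box.

```python
# Heterogeneous docs in one "index": primary docs carry text, address docs
# carry street/city, and both carry the shared unique key (illustrative data).
docs = [
    {"unique_key": 1, "blob": "blobbedy-blah-blob"},
    {"unique_key": 1, "street": "12 Main St", "city": "Somewheresville"},
    {"unique_key": 1, "street": "243 13th St", "city": "Bogdownton"},
    {"unique_key": 2, "blob": "other text"},
    {"unique_key": 2, "street": "9 Elm St", "city": "Bogdownton"},
]

def keys_matching(docs, predicate):
    """Unique keys of all docs (of either type) that satisfy the predicate."""
    return {d["unique_key"] for d in docs if predicate(d)}

def composite_search(docs, primary_pred, address_pred):
    """AND a primary-doc constraint with an address-doc constraint via the shared key."""
    keys = keys_matching(docs, primary_pred) & keys_matching(docs, address_pred)
    # Return only the primary docs, filtering out the address docs.
    return [d for d in docs if d["unique_key"] in keys and "blob" in d]

result = composite_search(
    docs,
    primary_pred=lambda d: "blah" in d.get("blob", ""),
    address_pred=lambda d: d.get("city") == "Bogdownton",
)
```

Each address constraint is evaluated against one address document at a time, so street and city cannot be mixed across addresses; the cost is a second pass to group by key.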

RE: One item, multiple fields, and range queries

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

I'm not going to index each address as its own document because the
"one-side" that I have currently has loads of text and there are many
addresses.  Furthermore, it doesn't really address the general case of my
problem statement.
I'm not sure what to make of "or index using a heterogeneous field schema,
grouping the different doc type instances with a unique key (the one) to
form a composite doc"
I could use the scheme you mention provided with the spanNear query but it
conflates different fields into one indexed field which will mess with the
scoring and make queries like range queries if there are dates involved next
to impossible.  This "solution" is really a hack workaround to a limitation
in Lucene/Solr.  I was hoping to start a conversation to a more truer
resolution to this problem rather than these workarounds which aren't always
satisfactory.

~ David Smiley

-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

RE: One item, multiple fields, and range queries

Posted by Steven A Rowe <sa...@syr.edu>.

David,

The standard one-to-many solution is indexing each address (the many) as its own document, and then either copy the other fields from your current schema to these documents, or index using a heterogeneous field schema, grouping the different doc type instances with a unique key (the one) to form a composite doc.  (These solutions address your discomfort with a single address field.)

Also, while you say that you don't have a hierarchy, I think you do; what you have described could be expressed in XML as:

<doc>
  <field1>...</field1>
  ...
  <addresses>
    <address id="1">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    <address id="2">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    ...
  </addresses>
</doc>

I believe you could use the scheme I described on the other thread, using a single address field, if you encoded it like so:

  _ADDRESS_ _STREET_ 12 Main Street _CITY_ Metripilos _STATE_ MZ _ZIP_ 00000
  _ADDRESS_ _STREET_ 512 23rd Avenue _CITY_ Carmtwon _STATE_ XB _ZIP_ 00001
  ...

Then to find the docs associated with Carmtwon, XB:

<SpanNot>
  <Include>
    <SpanOr>
      <SpanNear slop="2147483647" inOrder="true">
        <SpanTerm>_CITY_</SpanTerm>
        <SpanTerm>Carmtwon</SpanTerm>
        <SpanTerm>_STATE_</SpanTerm>
        <SpanTerm>XB</SpanTerm>
      </SpanNear>
    </SpanOr>
  </Include>
  <Exclude>
    <SpanTerm>_ADDRESS_</SpanTerm>
  </Exclude>
</SpanNot>

Steve
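
To make the intent of that query concrete, here is a small Python emulation, assuming the single address field is tokenized on whitespace. It is not Lucene code and the function names are hypothetical: it only mimics the effect of an in-order SpanNear with unlimited slop that is rejected whenever an _ADDRESS_ marker falls inside the matched span.

```python
def records(tokens, boundary="_ADDRESS_"):
    """Split a flat token stream into per-address token lists at each boundary marker."""
    current = None
    for tok in tokens:
        if tok == boundary:
            if current is not None:
                yield current
            current = []
        elif current is not None:
            current.append(tok)
    if current is not None:
        yield current

def contains_in_order(record, wanted):
    """True if `wanted` occurs in `record` as an in-order (not necessarily adjacent) subsequence."""
    it = iter(record)
    # Each any() call consumes the iterator up to the next match, enforcing order.
    return all(any(tok == w for tok in it) for w in wanted)

def doc_matches(field_text, wanted):
    """True if some single address record contains all wanted tokens in order."""
    return any(contains_in_order(r, wanted) for r in records(field_text.split()))

field_text = (
    "_ADDRESS_ _STREET_ 12 Main Street _CITY_ Metripilos _STATE_ MZ _ZIP_ 00000 "
    "_ADDRESS_ _STREET_ 512 23rd Avenue _CITY_ Carmtwon _STATE_ XB _ZIP_ 00001"
)
```

Searching for `["_CITY_", "Carmtwon", "_STATE_", "XB"]` matches, while `["_CITY_", "Metripilos", "_STATE_", "XB"]` does not, because that combination would have to span an _ADDRESS_ boundary.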

On 03/29/2010 at 9:11 AM, David Smiley (@MITRE.org) wrote:
> 
> Sorry, I intended to design my post so that one wouldn't have to read
> the thread for context but it seems I failed to do that.  Don't bother
> reading the thread.  The use-case I'm pondering modifying Lucene/Solr to
> solve is the one-to-many problem.  Imagine a document that contains
> multiple addresses where each field of an address (like street, state,
> zipcode) go in different multi-valued fields.  The main difficulty is
> considering how Lucene might be modified to have query results across
> different fields be intersected by a matching term position offset
> (which is designed in these fields to refer to a known value offset).
> 
> Following the link you gave is interesting though the general case I'm
> talking about doesn't have a hierarchy.  And I find the use of a single
> multi-valued field unpalatable for a variety of reasons.
> 
> ~ David Smiley
> 
> -----
>  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Re: One item, multiple fields, and range queries

Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.

On 29.03.2010, at 15:11, David Smiley (@MITRE.org) wrote:

> 
> Sorry, I intended to design my post so that one wouldn't have to read the
> thread for context but it seems I failed to do that.  Don't bother reading
> the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
> the one-to-many problem.  Imagine a document that contains multiple
> addresses where each field of an address (like street, state, zipcode) go in
> different multi-valued fields.  The main difficulty is considering how
> Lucene might be modified to have query results across different fields be
> intersected by a matching term position offset (which is designed in these
> fields to refer to a known value offset).

i posted another use case the other day as well .. then again i hope the spatial support in 1.5 will make this use case obsolete soon. basically we have an app where we have offers that can be available in multiple stores. now in order to have a speedy compact index the idea was to simply store the geo location of the stores along with the offers in a multi valued field. however in order to filter on the x-y geo coordinates we would have to filter on the pairs. this is i guess similar to your above example as well with multiple addresses.

here is the link to my post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201003.mbox/%3cFB3F49C8-31D9-48FC-B416-73A1BBD3F3B8@pooteeweet.org%3e

btw: i was mailed offlist asking whether i had found an answer to the above question. so it's not some crazy use case ..

regards,
Lukas Kahwe Smith
mls@pooteeweet.org

RE: One item, multiple fields, and range queries

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

Sorry, I intended to design my post so that one wouldn't have to read the
thread for context but it seems I failed to do that.  Don't bother reading
the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
the one-to-many problem.  Imagine a document that contains multiple
addresses where each field of an address (like street, state, zipcode) goes in
different multi-valued fields.  The main difficulty is considering how
Lucene might be modified to have query results across different fields be
intersected by a matching term position offset (which is designed in these
fields to refer to a known value offset).

Following the link you gave is interesting though the general case I'm
talking about doesn't have a hierarchy.  And I find the use of a single
multi-valued field unpalatable for a variety of reasons.

~ David Smiley

-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

RE: One item, multiple fields, and range queries

Posted by Steven A Rowe <sa...@syr.edu>.

Hi David,

I confess that even after looking at earlier posts in the thread your subject refers to, I'm not entirely sure exactly what problem you're trying to solve.

However, aspects of your desired solution seem quite similar to what the OP on this thread over on java-user was trying to do:

http://www.lucidimagination.com/search/document/61851fe5651331cc/increase_number_of_available_positions

If the solution described over there is not applicable to what you're trying to do, I apologize for the noise.

Steve

> -----Original Message-----
> From: David Smiley (@MITRE.org) [mailto:DSMILEY@mitre.org]
> Sent: Sunday, March 28, 2010 6:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: One item, multiple fields, and range queries
> 
> 
> It's been three years since this discussion and I'm unaware of any work
> that
> has plugged this capability gap in Lucene/Solr.  In summary, it would
> be
> very, *very*, useful to be able to query multiple multi-valued fields
> and
> require that such matches occur at the same index offset.  I'm working
> on an
> app where I should be able to get away with a single multi-valued field
> and
> query with slop.  If I have time to get fancy, I could induce a delta
> position increment gap scheme since I know my inner fields can't be
> very
> long, and thus I can avoid the slop (a performance win) but still use a
> phrase query.  But for those of you wanting numeric range queries or
> other
> things where the data is indexed differently, this isn't going to work.
> Using multiple fields is cleaner but there's no way to cross-query
> multi-valued fields with restraining the position increment gap.  Has
> anyone
> out there done this yet?
> 
> I think it's a tough problem.  One piece of the solution would be to
> configure a position increment gap such that the gap between values
> isn't
> fixed, it'd be the delta to the next multiple of 1000 (where 1000 is
> configurable). This would allow you to know which value offset a given
> searched term is from based on the term's position as queried from
> Lucene.
> That's the easy part.  But then somehow you'd have to cross-
> correlate/filter
> multiple query results taking the intersection based on common offsets.
> Surely that would take some serious hacking and I have no clue how
> feasible
> that is.  Thoughts?
> 
> ~ David Smiley

Re: One item, multiple fields, and range queries

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

It's been three years since this discussion and I'm unaware of any work that
has plugged this capability gap in Lucene/Solr. In summary, it would be
very, *very*, useful to be able to query multiple multi-valued fields and
require that such matches occur at the same index offset. I'm working on an
app where I should be able to get away with a single multi-valued field and
query with slop. If I have time to get fancy, I could induce a delta
position increment gap scheme since I know my inner fields can't be very
long, and thus I can avoid the slop (a performance win) but still use a
phrase query. But for those of you wanting numeric range queries or other
things where the data is indexed differently, this isn't going to work.
Using multiple fields is cleaner but there's no way to cross-query
multi-valued fields with restraining the position increment gap. Has anyone
out there done this yet?

I think it's a tough problem. One piece of the solution would be to
configure a position increment gap such that the gap between values isn't
fixed, it'd be the delta to the next multiple of 1000 (where 1000 is
configurable). This would allow you to know which value offset a given
searched term is from based on the term's position as queried from Lucene.
That's the easy part. But then somehow you'd have to cross-correlate/filter
multiple query results taking the intersection based on common offsets.
Surely that would take some serious hacking and I have no clue how feasible
that is. Thoughts?

~ David Smiley
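
The "easy part" above can be sketched in a few lines of Python, assuming whitespace tokenization and a block size of 1000. The function names are hypothetical and this only models the position arithmetic, not an actual Lucene analyzer: each value starts at the next multiple of the block size, so integer division of a term position recovers the value offset.

```python
BLOCK = 1000  # configurable block size; assumes no value has more than BLOCK tokens

def assign_positions(values, block=BLOCK):
    """Yield (token, position) pairs so that value i occupies positions [i*block, (i+1)*block)."""
    for i, value in enumerate(values):
        tokens = value.split()
        if len(tokens) > block:
            raise ValueError("value %d has more than %d tokens" % (i, block))
        for offset, token in enumerate(tokens):
            yield token, i * block + offset

def value_offset(position, block=BLOCK):
    """Recover which value of the multi-valued field a term position came from."""
    return position // block

positions = list(assign_positions(["12 Main Street", "512 23rd Avenue"]))
```

Here the token "512" lands at position 1000, and `value_offset(1002)` gives 1, the second value. The hard part, intersecting matches from several fields on that offset, is not covered by this sketch.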

Re: One item, multiple fields, and range queries

Posted by Chris Hostetter <ho...@fucit.org>.

: Now I follow.  I was misreading the first comments, thinking that the field
: content would be deconstructed to smaller components or pieces.  Too much
: (or not enough) coffee.

that's my bad .. i was trying to explain the concept by simplifying the
numeric range part out of the discussion and just tell you about the
multifield phrase query idea.

: I'm expecting the index doc needs to be constructed with lat/long/dates in
: sequential order, i.e.:

there's no requirement that you actually interleave them in the file, but
yes: the first value you add to the lat field would need to correspond to the
first value you add to the lon field and the when field as a single
event instance.  the second value you add to each field would all need to
correspond to each other as the next instance.

: Assuming slop count of 0, while the intention is to match lat/long/when in
: that order, could it possibly match long/when/lat, or when/lat/long?  Does
: PhraseQuery enforce order and starting point as well?

the key is that you aren't storing the lat/lon/when in the same field, so
you'll only match the time in the when field, the lat in the lat field
etc...

: Assuming all of this, how does range query come into play?  Or could the
: PhraseQuery portion be applied as a filter?

this is why i said it was pretty theoretical ... not only would you need a
modified version of PhraseQuery to work across multiple fields, you'd need
to change it to match on ranges as well.


-Hoss

Re: One item, multiple fields, and range queries

Posted by Jeff Rodenburg <je...@gmail.com>.

Now I follow.  I was misreading the first comments, thinking that the field
content would be deconstructed to smaller components or pieces.  Too much
(or not enough) coffee.

I'm expecting the index doc needs to be constructed with lat/long/dates in
sequential order, i.e.:

<add>
 <doc>
   <field name="event_id">123</field>

   <field name="latitude">32.123456</field>
   <field name="longitude">-88.987654</field>
   <field name="when">01/31/2007</field>

   <field name="latitude">42.123456</field>
   <field name="longitude">-98.987654</field>
   <field name="when">01/31/2007</field>

   <field name="latitude">40.123456</field>
   <field name="longitude">-108.987654</field>
   <field name="when">01/30/2007</field>
   <!-- ...etc. -->
 </doc>
</add>

Assuming slop count of 0, while the intention is to match lat/long/when in
that order, could it possibly match long/when/lat, or when/lat/long?  Does
PhraseQuery enforce order and starting point as well?

Assuming all of this, how does range query come into play?  Or could the
PhraseQuery portion be applied as a filter?



On 1/17/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
> : breaking datetime and lat/long values into pieces, and evaluation occurs
> : with positioning.  Is that accurate?
>
> i'm not sure what you mean by pieces ... the idea is that you would have a
> single "latitude" field and a single "longitude" field and a single "when"
> field, and if an item had a single event, you would store a single value
> in each field ... but if the item has multiple events, you would store
> them in the same relative ordering, and then use the same kind of logic
> PhraseQuery uses to verify that if the "latitude" field has a value in the
> right range, and the "longitude" field has a value in the right range, and
> the "when" field has a value in the right range, that all of those values
> have the same position (specifically: are within a set amount of slop from
> each other, which you would always set to "0")
>
> : > It seems like this could even be done in the same field if one had a
> : > query type that allowed querying for tokens at the same position.
> : > Just index "_noun" at the same position as "house" (and make sure
> : > there can't be collisions between real terms and markers via escaping,
> : > or use \0 instead of _, etc).
>
> true ... but the point doug made way back when is that with a generalized
> multi-field phrase query you wouldn't have to do that escaping ... the
> hard part in this case is the numeric ranges.
>
>
> -Hoss
>
>

Re: One item, multiple fields, and range queries

Posted by Chris Hostetter <ho...@fucit.org>.

: OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
: breaking datetime and lat/long values into pieces, and evaluation occurs
: with positioning.  Is that accurate?

i'm not sure what you mean by pieces ... the idea is that you would have a
single "latitude" field and a single "longitude" field and a single "when"
field, and if an item had a single event, you would store a single value
in each field ... but if the item has multiple events, you would store
them in the same relative ordering, and then use the same kind of logic
PhraseQuery uses to verify that if the "latitude" field has a value in the
right range, and the "longitude" field has a value in the right range, and
the "when" field has a value in the right range, that all of those values
have the same position (specifically: are within a set amount of slop from
each other, which you would always set to "0")

: > It seems like this could even be done in the same field if one had a
: > query type that allowed querying for tokens at the same position.
: > Just index "_noun" at the same position as "house" (and make sure
: > there can't be collisions between real terms and markers via escaping,
: > or use \0 instead of _, etc).

true ... but the point doug made way back when is that with a generalized
multi-field phrase query you wouldn't have to do that escaping ... the
hard part in this case is the numeric ranges.


-Hoss

Re: One item, multiple fields, and range queries

Posted by Jeff Rodenburg <je...@gmail.com>.

Yonik/Hoss -

OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
breaking datetime and lat/long values into pieces, and evaluation occurs
with positioning.  Is that accurate?



On 1/16/07, Yonik Seeley <yo...@apache.org> wrote:
>
> On 1/15/07, Chris Hostetter <ho...@fucit.org> wrote:
> > PhraseQuery artificially enforces that the Terms you add to it are
> > in the same field ... you could easily write a PhraseQuery-ish query that
> > takes Terms from different fields, and ensures that they appear "near"
> > each other in terms of their token sequence -- the context of that comment
> > was searching for instances of words with specific usage (ie: "house" used
> > as a noun) by putting the usage type of each term in a different term in a
> > separate parallel field, but with identical token positions.
>
> It seems like this could even be done in the same field if one had a
> query type that allowed querying for tokens at the same position.
> Just index "_noun" at the same position as "house" (and make sure
> there can't be collisions between real terms and markers via escaping,
> or use \0 instead of _, etc).
>
> -Yonik
>

Re: One item, multiple fields, and range queries

Posted by Yonik Seeley <yo...@apache.org>.

On 1/15/07, Chris Hostetter <ho...@fucit.org> wrote:
> PhraseQuery artificially enforces that the Terms you add to it are
> in the same field ... you could easily write a PhraseQuery-ish query that
> takes Terms from different fields, and ensures that they appear "near"
> each other in terms of their token sequence -- the context of that comment
> was searching for instances of words with specific usage (ie: "house" used
> as a noun) by putting the usage type of each term in a different term in a
> separate parallel field, but with identical token positions.

It seems like this could even be done in the same field if one had a
query type that allowed querying for tokens at the same position.
Just index "_noun" at the same position as "house" (and make sure
there can't be collisions between real terms and markers via escaping,
or use \0 instead of _, etc).

-Yonik

Re: One item, multiple fields, and range queries

Posted by Chris Hostetter <ho...@fucit.org>.

: I've not yet used dynamic fields in this manner. With that number range,
: what limitations could I encounter? Given the size of that, I would need

very little, yonik recently listed the "costs" of dynamic fields...
http://www.nabble.com/Searching-multiple-indices-%28solr-newbie%29-tf2903899.html#a8245621
..as he points out, with omitNorms="true" you can have thousands of
dynamic fields and not even notice.

: the solr engine to formulate that query, correct? I can't imagine I could
: pass that entire subquery statement in the http request, as the character
: limit would likely be exceeded.

yeah ... if you wanted to try the approach i described, and your "N"
wasn't a single digit number, i would recommend putting the query
building code into a custom RequestHandler ... it could even inspect the
list of field names from the IndexReader and know exactly how big N is at
any given moment. i have no idea how efficient this approach would be if
N really does get up into the hundreds.

A completely different approach you could take if you want to get into
Lucene Query internals would be to take advantage of something Doug
mentioned once that has stayed in the back of my mind for almost a year
now: PhraseQuery artificially enforces that the Terms you add to it are
in the same field ... you could easily write a PhraseQuery-ish query that
takes Terms from different fields, and ensures that they appear "near"
each other in terms of their token sequence -- the context of that comment
was searching for instances of words with specific usage (ie: "house" used
as a noun) by putting the usage type of each term in a different term in a
separate parallel field, but with identical token positions.

if you forget for a moment about the ranges you need to do, and imagine
instead that you store the "quadrant number" and "hour of day" for each
event, where e1q is the quadrant of event1 for an item, and e1h is the
hour of the day that event1 happened at, then for an item with multiple
events you could index the field/terms lists
quadrant: e1q e2q e3q
hour: e1h e2h e3h

and query for your input quadrant at a term position equal to the term
position of your input hour.
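
To make the idea concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes each item is a dict with two parallel lists, "quadrant" and "hour", where list index stands in for term position; the function and field names are hypothetical.

```python
def matches_at_same_position(doc, quadrant, hour):
    """True if some position holds the wanted quadrant and hour together."""
    return any(q == quadrant and h == hour
               for q, h in zip(doc["quadrant"], doc["hour"]))


# One item with three events; index i of each list describes event i.
item = {"quadrant": ["NE", "SW", "NE"], "hour": [9, 14, 20]}

print(matches_at_same_position(item, "SW", 14))  # same event, so a match
print(matches_at_same_position(item, "SW", 9))   # values from different events
```

The second call is the case a plain two-field AND query would get wrong: both values occur in the item, but not for the same event.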

if you got *that* working, you could conceivably change the query to take
in a range for each field -- using TermEnum to get the list of all
latitude Terms in your latitude range, then for each of those Terms get
the list of documents and the term position within that document, and then
look for the longitude terms in the same relative term position which are
in your longitude range, and time terms in the same relative term position
in your time range.
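
A rough sketch of that range variant, again in plain Python rather than against the Lucene API. It assumes numeric terms, an in-memory postings map from term to (doc id, position) pairs, and hypothetical field names "lat", "lon" and "time"; a real implementation would walk the index's term enumeration instead of rebuilding postings per query.

```python
from bisect import bisect_left, bisect_right


def build_postings(docs, field):
    """Map each term in a field to the set of (doc_id, position) pairs."""
    postings = {}
    for doc_id, doc in docs.items():
        for pos, term in enumerate(doc[field]):
            postings.setdefault(term, set()).add((doc_id, pos))
    return postings


def slots_in_range(postings, low, high):
    """Union of (doc_id, position) pairs for all terms in [low, high]."""
    terms = sorted(postings)
    hits = set()
    for term in terms[bisect_left(terms, low):bisect_right(terms, high)]:
        hits |= postings[term]
    return hits


def range_match(docs, ranges):
    """Doc ids having one position that satisfies every field's range."""
    slots = None
    for field, (low, high) in ranges.items():
        found = slots_in_range(build_postings(docs, field), low, high)
        slots = found if slots is None else slots & found
    return sorted({doc_id for doc_id, _ in (slots or set())})


docs = {
    123: {"lat": [47.6, 45.5], "lon": [-122.3, -122.7], "time": [10, 20]},
    456: {"lat": [47.6, 40.7], "lon": [-74.0, -122.3], "time": [10, 10]},
}
query = {"lat": (47.0, 48.0), "lon": (-123.0, -122.0), "time": (5, 15)}
print(range_match(docs, query))
```

Item 456 has a latitude, a longitude and a time inside the ranges, but never all three at the same position, so only 123 comes back.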

does that make any sense?

this is all purely theoretical, it just seems like it *should* be
possible, but i haven't thought through how it would be implemented. if
you actually wanted to tackle it, i would start a discussion on
java-dev@lucene first, so people smarter than me can tell you if i'm
smoking crack or not.

-Hoss

Re: One item, multiple fields, and range queries

Posted by Jeff Rodenburg <je...@gmail.com>.

Thanks Hoss.  Interesting approach, but the "N" bound could be well in the
hundreds, and the N bound would be variable (some maximum number, but
different across events.)

I've not yet used dynamic fields in this manner.  With that number range,
what limitations could I encounter?  Given the size of that, I would need
the solr engine to formulate that query, correct?  I can't imagine I could
pass that entire subquery statement in the http request, as the character
limit would likely be exceeded.

Some of my comments may not make sense, so I'll check into dynamic fields
and such in the meantime.

thanks,
j

On 1/14/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : 2) use multivalued fields as correlated vectors, so the first start date
> :    corresponds to the first end date corresponds to the first lat and
> :    long value. You get them all back in a query though, so your app
> :    would need to do extra work to sort out which matched.
>
> if you expect a bounded number of correlated "events" per item, you can
> use dynamic fields, and build up N correlated subqueries where N is the
> upper bound on the number of events you expect any item to have, ie...
>
>       (+lat1:[x TO y] +lon1:[w TO z] +time1:[a TO b])
>    OR (+lat2:[x TO y] +lon2:[w TO z] +time2:[a TO b])
>    OR (+lat3:[x TO y] +lon3:[w TO z] +time3:[a TO b])
>    ...
>
>
>
>
> -Hoss
>
>

Re: One item, multiple fields, and range queries

Posted by Chris Hostetter <ho...@fucit.org>.

: 2) use multivalued fields as correlated vectors, so the first start date
:    corresponds to the first end date corresponds to the first lat and long
:    value. You get them all back in a query though, so your app would need
:    to do extra work to sort out which matched.

if you expect a bounded number of correlated "events" per item, you can
use dynamic fields, and build up N correlated subqueries where N is the
upper bound on the number of events you expect any item to have, ie...

      (+lat1:[x TO y] +lon1:[w TO z] +time1:[a TO b])
   OR (+lat2:[x TO y] +lon2:[w TO z] +time2:[a TO b])
   OR (+lat3:[x TO y] +lon3:[w TO z] +time3:[a TO b])
   ...
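
Since that query gets long as N grows, it would normally be generated rather than typed. A minimal Python sketch of the string building, assuming the numbered field names lat1/lon1/time1 ... latN/lonN/timeN from the example above (the function name is made up for illustration):

```python
def build_correlated_query(n, lat, lon, time):
    """Build N OR'ed subqueries, one per numbered lat/lon/time field triple."""
    clauses = []
    for i in range(1, n + 1):
        clauses.append(
            "(+lat%d:[%s TO %s] +lon%d:[%s TO %s] +time%d:[%s TO %s])"
            % (i, lat[0], lat[1], i, lon[0], lon[1], i, time[0], time[1])
        )
    return " OR ".join(clauses)


print(build_correlated_query(3, ("x", "y"), ("w", "z"), ("a", "b")))
```

The same loop could live server-side if the resulting string is too long to send in the request.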




-Hoss

Re: One item, multiple fields, and range queries

Posted by Yonik Seeley <yo...@apache.org>.

On 1/12/07, Jeff Rodenburg <je...@gmail.com> wrote:
> I'm stuck with a query issue that at present seems unresolvable.  Hoping the
> community has some insight to this.
>
> My index contains events that have multiple beginning/ending date ranges and
> multiple locations.  For example, event A (uniqueId = 123) occurs every
> weekend, sometimes in one location, sometimes in many locations.  Dates have
> a beginning and ending date, and locations have a latitude & longitude.  I
> need to query for the set of events for a given "area", where area =
> bounding box.  So, a single event has multiple beginning and ending dates
> and multiple locations.
>
> So, the beginning date, ending date, latitude and longitude values only
> apply collectively as a unit.  However, I need to do range queries on both
> the dates and the lat/long values.

1) model a single document as a single event at a single place with a
start and end date.
  OR
2) use multivalued fields as correlated vectors, so the first start date
   corresponds to the first end date, which corresponds to the first lat
   and long value. You get them all back in a query though, so your app
   would need to do extra work to sort out which matched.

I'd do (1) if you can... it's simpler.
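
For (2), the "extra work" on the app side might look roughly like the Python sketch below. It assumes a returned document is a dict with four parallel lists named "start", "end", "lat" and "lon" (hypothetical field names), that dates are ISO strings so they compare correctly as text, and that "matching" means the occurrence overlaps the query window.

```python
def matching_occurrences(doc, box, window):
    """Return indexes of occurrences inside the bounding box and date window."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = box
    win_start, win_end = window
    hits = []
    for i, (start, end, lat, lon) in enumerate(
            zip(doc["start"], doc["end"], doc["lat"], doc["lon"])):
        in_box = lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi
        overlaps = start <= win_end and end >= win_start
        if in_box and overlaps:
            hits.append(i)
    return hits


event = {
    "start": ["2007-01-13", "2007-01-20"],
    "end": ["2007-01-14", "2007-01-21"],
    "lat": [47.6, 45.5],
    "lon": [-122.3, -122.7],
}
box = ((47.0, 48.0), (-123.0, -122.0))
print(matching_occurrences(event, box, ("2007-01-12", "2007-01-15")))
print(matching_occurrences(event, box, ("2007-01-19", "2007-01-22")))
```

An empty result for a returned document means the index matched on values from different occurrences, so the app would drop that document.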

-Yonik