Posted to solr-user@lucene.apache.org by Matt Beaumont <mi...@yahoo.co.uk> on 2009/07/30 14:41:15 UTC

Range Query question

Hi,

I have a set of XML data that holds Minimum and Maximum values and I need to
be able to do specific range queries against them.

(Note that this is a contrived example. In reality the garage would
probably hold the individual prices of all its cars, but the example is
analogous to our real problem, whose own terminology would only obscure
the issue.)

For example, the following XML fragment is indexed so that each <car>
element becomes a Solr document:

<cars>
    <car>
        <manufacturer>Ford</manufacturer>
        <model>Ka</model>
        <garage>
           <name>garage1</name>
           <min>2000</min>
           <max>4000</max>
        </garage>
        <garage>
           <name>garage2</name>
           <min>8000</min>
           <max>10000</max>
        </garage>
        ....
    </car>
</cars>

I want to be able to do a range query where
  search min value = 2500
  search max value = 3500

This should return garage1 as potentially having cars in my price range,
because the garage's price range contains the range I have input.  Note
that we can't simply look for min prices or max prices that fall inside
our range: in the case outlined above, neither of garage1's values falls
inside our range, yet the ranges overlap.
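
To make the overlap condition concrete, here is a minimal sketch in Python.
The function name ranges_overlap is hypothetical and only illustrates the
comparison; it is not Solr code. Two ranges overlap when each one starts
no later than the other one ends.

```python
def ranges_overlap(garage_min, garage_max, search_min, search_max):
    """Return True when the garage's price range and the search range
    share at least one value (both ranges are treated as inclusive)."""
    return garage_min <= search_max and garage_max >= search_min


# Search range 2500-3500 against the two garages in the example.
print(ranges_overlap(2000, 4000, 2500, 3500))   # garage1
print(ranges_overlap(8000, 10000, 2500, 3500))  # garage2
```

With the values above, garage1 overlaps the search range even though
neither 2000 nor 4000 lies inside 2500-3500, and garage2 does not.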

The problem is that the indexed form of this XML is flattened, so the <car>
entity has 2 garage names, 2 min values and 2 max values, but the grouping
between each garage name and its min and max values is lost.  The danger is
that we end up comparing against the min-of-the-mins and the
max-of-the-maxes, which tells us that a car is available in the price range.
That may not be true if garage1 has all cars below our search range and
garage2 has all cars above it, e.g. if our search range is 5000-6000 then
we should get no match.
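
The false positive can be reproduced with a small Python sketch. It only
simulates the comparison described above and does not call Solr; the names
flattened_match and grouped_match are hypothetical.

```python
def flattened_match(mins, maxes, search_min, search_max):
    # Flattened view: the pairing between each min and its max is lost,
    # so the smallest min and the largest max decide the outcome.
    return min(mins) <= search_max and max(maxes) >= search_min


def grouped_match(garages, search_min, search_max):
    # Grouped view: each garage is tested against its own min and max.
    return [name for name, lo, hi in garages
            if lo <= search_max and hi >= search_min]


garages = [("garage1", 2000, 4000), ("garage2", 8000, 10000)]
mins = [lo for _, lo, _ in garages]
maxes = [hi for _, _, hi in garages]

print(flattened_match(mins, maxes, 5000, 6000))  # True: a false positive
print(grouped_match(garages, 5000, 6000))        # []: no garage matches
```

For the 5000-6000 search the flattened comparison reports a match, while
testing each garage separately correctly returns nothing.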

We wanted to include the garage name as an attribute of the min/max values
to maintain this link, but couldn't find a way to do this.

Finally, it would be extremely difficult for us to modify the XML presented
to our system, hence our approach to date.

Has anyone had a similar problem and if so how did you overcome it?

Thanks for taking the time to look.

-----
Matt Beaumont
mibeaum@yahoo.co.uk

-- 
View this message in context: http://www.nabble.com/Range-Query-question-tp24737656p24737656.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Range Query question

Posted by Matt Beaumont <mi...@yahoo.co.uk>.
Thanks for the reply; 
I had thought the solution would be altering the XML.



Ensdorf Ken wrote:
> 
>> The problem is that the indexed form of this XML is flattened so the
>> <car> entity has 2 garage names, 2 min values and 2 max values, but the
>> grouping between the garage name and its min and max values is lost.
>> The danger is that we end up doing a comparison of the min-of-the-mins
>> and the max-of-the-maxes, which tells us that a car is available in the
>> price range which may not be true if garage1 has all cars below our
>> search range and garage2 has all cars above our search range, e.g. if
>> our search range is 5000-6000 then we should get no match.
> 
> You could index each garage-car pairing as a separate document, embedding
> all the necessary information you need for searching.
> 
> e.g.-
> 
> <garage_car>
>         <car_manufacturer>Ford</car_manufacturer>
>         <car_model>Ka</car_model>
>         <garage_name>garage1</garage_name>
>         <min>2000</min>
>         <max>4000</max>
> </garage_car>
> 
> 


-----
Matt Beaumont
mibeaum@yahoo.co.uk

-- 
View this message in context: http://www.nabble.com/Range-Query-question-tp24737656p24742062.html


RE: Range Query question

Posted by Ensdorf Ken <En...@zoominfo.com>.
> The problem is that the indexed form of this XML is flattened so the
> <car> entity has 2 garage names, 2 min values and 2 max values, but the
> grouping between the garage name and its min and max values is lost.
> The danger is that we end up doing a comparison of the min-of-the-mins
> and the max-of-the-maxes, which tells us that a car is available in the
> price range which may not be true if garage1 has all cars below our
> search range and garage2 has all cars above our search range, e.g. if
> our search range is 5000-6000 then we should get no match.

You could index each garage-car pairing as a separate document, embedding all the necessary information you need for searching.

e.g.-

<garage_car>
        <car_manufacturer>Ford</car_manufacturer>
        <car_model>Ka</car_model>
        <garage_name>garage1</garage_name>
        <min>2000</min>
        <max>4000</max>
</garage_car>
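
As a rough sketch of how that could look in practice, the Python below
splits the original <cars> XML into one record per garage-car pairing and
builds an overlap query using Solr's range syntax. The field names follow
the example above. The helper names garage_car_docs and overlap_query are
hypothetical, and the sketch assumes min and max are indexed with a numeric
field type that compares numerically in range queries.

```python
import xml.etree.ElementTree as ET

# Sample input in the shape described in the original post.
SOURCE = """
<cars>
    <car>
        <manufacturer>Ford</manufacturer>
        <model>Ka</model>
        <garage><name>garage1</name><min>2000</min><max>4000</max></garage>
        <garage><name>garage2</name><min>8000</min><max>10000</max></garage>
    </car>
</cars>
"""


def garage_car_docs(xml_text):
    """Return one dict per garage-car pairing, so each min/max pair stays
    attached to its own garage name."""
    docs = []
    for car in ET.fromstring(xml_text).iter("car"):
        for garage in car.iter("garage"):
            docs.append({
                "car_manufacturer": car.findtext("manufacturer"),
                "car_model": car.findtext("model"),
                "garage_name": garage.findtext("name"),
                "min": int(garage.findtext("min")),
                "max": int(garage.findtext("max")),
            })
    return docs


def overlap_query(search_min, search_max):
    """Build a query matching documents whose [min, max] range overlaps
    the search range: min <= search_max AND max >= search_min."""
    return f"min:[* TO {search_max}] AND max:[{search_min} TO *]"


docs = garage_car_docs(SOURCE)
for doc in docs:
    print(doc)
print(overlap_query(2500, 3500))
```

Because every document now carries exactly one min and one max, the query
can no longer pair garage1's min with garage2's max, so a 5000-6000 search
would match neither document.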