Posted to solr-user@lucene.apache.org by Ankush Goyal <An...@orbitz.com> on 2009/04/28 22:32:25 UTC

Multiple Queries

Hi,

I have been trying to solve a performance issue. I have an index of hotels with their ids and another index of reviews. When someone queries for a location, the current process first gets all the hotels for that location.
Then, for each hotel-id in those hotel documents, it calls the review index to fetch the reviews associated with that particular hotel, repeating this for every hotel. This one-query-per-hotel process slows down the request significantly.
I need to group the reviews by their hotel-id, so I can't just fetch all the reviews for all the hotel-ids and show them as-is. I was thinking about fetching all the reviews for all the hotel-ids in one query, then parsing those reviews in one go to create a map with hotel-id as the key and a list of reviews as the value.

Can anyone comment on whether this procedure would be better or worse, or if there's a better way of doing this?

--Ankush Goyal

Re: Multiple Queries

Posted by Avlesh Singh <av...@gmail.com>.
Ankush,
Your approach works. Fire an "in" query on the review index for all the hotel ids
you care about, then create a map from each hotel to its reviews.
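
A minimal sketch of the map-building step, assuming the hits from the single combined review query have already been read into simple objects. The `Review` class and its fields are hypothetical stand-ins for whatever the review documents actually contain; nothing here calls Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReviewGrouper {

    /** Hypothetical stand-in for one hit returned by the review index. */
    public static final class Review {
        public final String hotelId;
        public final String headline;

        public Review(String hotelId, String headline) {
            this.hotelId = hotelId;
            this.headline = headline;
        }
    }

    /**
     * Groups the hits of one combined query by hotel id.
     * A LinkedHashMap keeps hotels in first-seen order, and each
     * list keeps its reviews in the order the index returned them.
     */
    public static Map<String, List<Review>> groupByHotel(List<Review> hits) {
        Map<String, List<Review>> byHotel = new LinkedHashMap<String, List<Review>>();
        for (Review review : hits) {
            List<Review> bucket = byHotel.get(review.hotelId);
            if (bucket == null) {
                bucket = new ArrayList<Review>();
                byHotel.put(review.hotelId, bucket);
            }
            bucket.add(review);
        }
        return byHotel;
    }

    public static void main(String[] args) {
        List<Review> hits = Arrays.asList(
                new Review("3453", "Lakefront gem"),
                new Review("342432", "Noisy but central"),
                new Review("3453", "Great breakfast"));
        for (Map.Entry<String, List<Review>> e : groupByHotel(hits).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " review(s)");
        }
    }
}
```

This is one pass over the hits, so the cost grows with the number of reviews returned rather than with the number of round trips to the review index.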

Cheers
Avlesh

On Wed, Apr 29, 2009 at 8:09 AM, Amit Nithian <an...@gmail.com> wrote:

> Ankush,
> It seems that unless reviews are changing constantly, why not do what Erick
> was saying in flattening your data by storing reviews with the hotel index
> but re-index your hotels storing the top two reviews. I guess I am
> suggesting computing the top two reviews for each hotel offline and store
> them somewhere.
>
> You could store the top two reviews in an RDBMS and let whatever front end
> you have retrieve the top two from the RDBMS after receiving results from
> Solr based on your unique ID.
>
> HTH
> Amit
>
> On Tue, Apr 28, 2009 at 3:14 PM, Ankush Goyal <Ankush.Goyal@orbitz.com> wrote:
>
> > Hi Erick,
> >
> > Thanks for response!...the solution I was talking about was same as your
> > last solution to get reviews for only required hotel-ids and then parsing
> > them in one go to make a hash-map, I guess I didn't explain correctly :)
> >
> > As far as putting reviews inside the hotel index is concerned, we thought
> > about that solution, but we also need to sort the reviews and (let's say)
> > show top 2 of maybe 50 reviews for a hotel, so we couldn't put reviews
> > inside hotel doc itself.
> >
> > Now, this again poses another question for the solution we talked about-,
> > as it seems like getting reviews for required hotel-ids and then making a
> > hash-map corresponding to hotel-ids can improve the performance, but then we
> > also need to sort all the reviews for each hotel using a field/ score in the
> > review-doc itself, which seems like would lower down the performance
> > drastically.
> >
> > Any ideas on a better solution?
> >
> > Thanks!
> > -Ankush
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Tuesday, April 28, 2009 4:05 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Multiple Queries
> >
> > Have you considered indexing the reviews along with the hotels right
> > in the hotel index? That way you would fetch the reviews right along with
> > the hotels...
> >
> > Really, this is another way of saying "flatten your data" <G>...
> >
> > Your idea of holding all the hotel reviews in memory is also viable,
> > depending upon how many there are. You'd pay some startup costs, but
> > that's what caching is all about.
> >
> > Given your current index structure, have you tried collecting the hotel
> > IDs, and submitting a query to your review index that just ORs together
> > all the IDs and then parsing that rather than calling your review index
> > for one hotel ID at a time?
> >
> > Best
> > Erick
>

Re: Multiple Queries

Posted by Amit Nithian <an...@gmail.com>.
Ankush,
Unless reviews are changing constantly, why not do what Erick suggested and
flatten your data: store reviews with the hotel index, but re-index your
hotels with only the top two reviews stored on each hotel document? I guess
I am suggesting that you compute the top two reviews for each hotel offline
and store them somewhere.

You could store the top two reviews in an RDBMS and let whatever front end
you have retrieve the top two from the RDBMS after receiving results from
Solr based on your unique ID.

HTH
Amit


RE: Multiple Queries

Posted by Ankush Goyal <An...@orbitz.com>.
Hi Erick,

Thanks for the response! The solution I was talking about is the same as your last suggestion: get the reviews for only the required hotel-ids and then parse them in one go to build a hash-map. I guess I didn't explain it correctly :)

As for putting reviews inside the hotel index, we thought about that solution, but we also need to sort the reviews and (let's say) show the top 2 of maybe 50 reviews for a hotel, so we couldn't put the reviews inside the hotel doc itself.

This raises another question about the solution we discussed. Getting the reviews for the required hotel-ids and building a hash-map keyed by hotel-id seems likely to improve performance, but we then also need to sort the reviews for each hotel by a field/score in the review doc itself, which seems like it would lower performance drastically.
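
For a sense of scale, here is a rough sketch of the per-hotel sort, assuming each hotel's reviews are already in a list in memory. `ScoredReview` and its `score` field are hypothetical names for whatever field or score the review doc carries. Sorting around 50 small objects per hotel is likely to be cheap next to a round trip to the index, but that is an assumption worth measuring on real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TopReviews {

    /** Hypothetical review with the numeric field used for ranking. */
    public static final class ScoredReview {
        public final String headline;
        public final float score;

        public ScoredReview(String headline, float score) {
            this.headline = headline;
            this.score = score;
        }
    }

    /**
     * Returns up to n reviews with the highest score, best first.
     * The input list is copied, so the caller's list is not reordered.
     * Assumes n is not negative.
     */
    public static List<ScoredReview> topN(List<ScoredReview> reviews, int n) {
        List<ScoredReview> sorted = new ArrayList<ScoredReview>(reviews);
        Collections.sort(sorted, new Comparator<ScoredReview>() {
            public int compare(ScoredReview a, ScoredReview b) {
                return Float.compare(b.score, a.score); // descending by score
            }
        });
        int limit = Math.min(n, sorted.size());
        return new ArrayList<ScoredReview>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<ScoredReview> reviews = Arrays.asList(
                new ScoredReview("Fine", 2.5f),
                new ScoredReview("Superb", 4.8f),
                new ScoredReview("Good", 3.9f));
        for (ScoredReview r : topN(reviews, 2)) {
            System.out.println(r.headline + " " + r.score);
        }
    }
}
```

If the review index can return its hits already sorted by that field, grouping them in arrival order leaves each hotel's list sorted, and the top 2 are simply the first two entries of each list.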

Any ideas on a better solution?

Thanks!
-Ankush


RE: Multiple Queries

Posted by Ankush Goyal <An...@orbitz.com>.
Hey Guys,

I have a novice question about how to create a query by ORing multiple terms.

Currently, we create a boosting query using the following code:

BoostingQuery boosQuery = new BoostingQuery(getHotelIdFilterQuery(hotelIdStr),baseQuery,2.0f);

Here, getHotelIdFilterQuery() takes a hotelId and creates a TermQuery like --> "hotel.id_t:3453"

Then it is combined with the baseQuery in boosQuery to get a query like-->

[hotel.id_t:3453/(+(rev.headline:lakefront^2.0 | rev.comments:lakefront^2.0)~0.01 ())^0.0]

Now, I want to create a boosQuery with multiple hotelIds ORed with each other, like -->

[hotel.id_t:342432/hotel.id_t:3453/(+(rev.headline:lakefront^2.0 | rev.comments:lakefront^2.0)~0.01 ())^0.0]

So, how do I pass these multiple TermQueries, ORed with each other, to build the BoostingQuery?
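
In Lucene's Java API, the usual way to OR several TermQuery objects is to add each one to a BooleanQuery with BooleanClause.Occur.SHOULD and pass that BooleanQuery where the single TermQuery went before. That has not been tried against the code in this thread. The sketch below shows the same OR in query-parser syntax instead, which avoids any library dependency. The class and method names are hypothetical, and it assumes the ids are plain tokens that need no escaping.

```java
import java.util.Arrays;
import java.util.List;

class HotelIdClause {

    /**
     * Builds a clause such as hotel.id_t:(342432 OR 3453) in Lucene
     * query-parser syntax. Assumes every id is a plain token with no
     * spaces or special characters, so no escaping is done here.
     */
    public static String orClause(String field, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(ids.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // prints hotel.id_t:(342432 OR 3453)
        System.out.println(orClause("hotel.id_t", Arrays.asList("342432", "3453")));
    }
}
```

A string built this way would still have to go through a query parser to become a Query object, so the BooleanQuery route is probably the more direct fit when the rest of the query is already built programmatically.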

Thanks!

-Ankush Goyal


Re: Multiple Queries

Posted by Erick Erickson <er...@gmail.com>.
Have you considered indexing the reviews along with the hotels right
in the hotel index? That way you would fetch the reviews right along with
the hotels...

Really, this is another way of saying "flatten your data" <G>...

Your idea of holding all the hotel reviews in memory is also viable,
depending upon how many there are. You'd pay some startup costs, but
that's what caching is all about.

Given your current index structure, have you tried collecting the hotel IDs
and submitting a query to your review index that just ORs together all the
IDs, then parsing that, rather than calling your review index for one hotel
ID at a time?

Best
Erick
