Posted to solr-user@lucene.apache.org by Esteban Donato <es...@gmail.com> on 2011/10/12 19:22:01 UTC

text search and data aggregation, thoughts?

Hello community,

 Let me explain my case.  I need to implement a feature that combines
text search and data aggregation.  That is, the app will let users
search for products and set a date range.  As a result, I need to show
them the products that matched the search plus some data aggregated
for that period of time (number of visits, number of sales, etc.). For
instance, if a user searches for "ipod" and sets dates from "01/01/11"
to "01/10/11", I have to show all the products that match "ipod"
along with the visits, sales, etc. for those products in those 10 days.

 For that purpose, I have 2 tables in a DB: one with all the
products and the other with the visits and sales per day and product.

 What do you think is the best approach?  These are my options so far:

Option 1: index both products and visits per day in the same index, as
the result of a join between the 2 tables.  That way I can resolve
product search + stats (for aggregated data) in 1 query and with 1
index.  The drawback is a huge index that grows in documents on a
daily basis, where just a few of them (one per product) are searchable.
 Also I can see other problems if, for instance, I want to implement
facet navigation over some attributes of the product, like product
category, since the count for each element won't reflect the number of
products in that category but rather the number of products x days
those products have been visited.
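
The facet problem described for Option 1 can be illustrated with a small
in-memory sketch.  It does not talk to Solr; the documents and field names
(product_id, category, day, visits) are made up for illustration, and the
two functions only mimic what a per-document count versus a per-product
count would return.

```python
from collections import Counter

# Hypothetical denormalized rows: one document per (product, day) pair,
# as produced by joining the products table with the daily visits table.
joined_docs = [
    {"product_id": 1, "category": "players", "day": "2011-01-01", "visits": 12},
    {"product_id": 1, "category": "players", "day": "2011-01-02", "visits": 7},
    {"product_id": 1, "category": "players", "day": "2011-01-03", "visits": 9},
    {"product_id": 2, "category": "cases", "day": "2011-01-01", "visits": 4},
]


def facet_counts_per_document(docs):
    """Count documents per category, as a plain field facet would."""
    return Counter(doc["category"] for doc in docs)


def facet_counts_per_product(docs):
    """Count distinct products per category, which is what the UI needs."""
    seen = {(doc["category"], doc["product_id"]) for doc in docs}
    return Counter(category for category, _ in seen)


print(facet_counts_per_document(joined_docs))
print(facet_counts_per_product(joined_docs))
```

With this sample data the per-document count reports 3 for "players" even
though only one product is in that category, which is the inflation the
paragraph above warns about.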

Option 2: have 2 cores, one with products, the other with visits per day.
Here I can use the product core/index for product searching, pick the
product ids and request stats in the visits core for those ids, then
join both and return everything.  I was checking the "join" feature in
Solr 4.0 to implement this ( http://wiki.apache.org/solr/Join ), but
as far as I know this feature doesn't return fields from both indexes
(I would need product attributes from the product index + stats from
the visits index), so I would have to perform both queries manually.  The
drawback here is code complexity (I need to maintain code that
queries and joins both indexes) and possibly performance (2
queries per request, possibly with a long list of ids depending
on the product match result).
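
The manual two-step join in Option 2 might look roughly like the sketch
below.  It is only an outline of the application-side glue: the two
callables stand in for whatever client code queries the products core and
the visits core, and every name here (search_with_stats, fake_search,
fake_stats, the "visits"/"sales" keys) is hypothetical.

```python
def search_with_stats(search_products, fetch_stats, text, start_day, end_day):
    """Run the product search, then attach per-product stats for the period.

    search_products(text) -> list of product dicts, each with an "id" key.
    fetch_stats(ids, start_day, end_day) -> {id: {"visits": n, "sales": n}}.
    """
    products = search_products(text)
    ids = [product["id"] for product in products]
    if not ids:
        return []  # nothing matched, so skip the second query entirely
    stats_by_id = fetch_stats(ids, start_day, end_day)
    no_activity = {"visits": 0, "sales": 0}
    # Keep the relevance order of the product search; products with no
    # rows in the period get zeroed stats instead of being dropped.
    return [{**p, **stats_by_id.get(p["id"], no_activity)} for p in products]


# Stand-ins for the two cores, so the sketch runs without a Solr server.
def fake_search(text):
    catalog = [
        {"id": 1, "name": "iPod nano"},
        {"id": 2, "name": "iPod case"},
        {"id": 3, "name": "USB cable"},
    ]
    return [p for p in catalog if text.lower() in p["name"].lower()]


def fake_stats(ids, start_day, end_day):
    totals = {1: {"visits": 19, "sales": 1}}
    return {pid: totals[pid] for pid in ids if pid in totals}


for row in search_with_stats(fake_search, fake_stats,
                             "ipod", "2011-01-01", "2011-01-10"):
    print(row)
```

Passing the two lookups in as functions keeps the join logic independent
of where the stats live, so the same code would serve Option 3 as well.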

Option 3: 1 core with products and keep visits in the DB.  This option
would work exactly like option 2, except that the stats are calculated
in the DB.  Same drawbacks as option 2.
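
For Option 3, the stats step could be a single grouped SQL query over the
ids returned by Solr.  The sketch below uses an in-memory SQLite database
purely so it is runnable; the table name daily_stats and its columns are
assumptions, not the actual schema, and days are stored as ISO strings so
that BETWEEN compares them correctly.

```python
import sqlite3


def aggregate_stats(conn, product_ids, start_day, end_day):
    """Sum visits and sales per product over an inclusive date range."""
    if not product_ids:
        return {}
    placeholders = ",".join("?" for _ in product_ids)
    sql = (
        "SELECT product_id, SUM(visits), SUM(sales) "
        "FROM daily_stats "
        "WHERE day BETWEEN ? AND ? "
        f"AND product_id IN ({placeholders}) "
        "GROUP BY product_id"
    )
    rows = conn.execute(sql, [start_day, end_day, *product_ids])
    return {pid: {"visits": v, "sales": s} for pid, v, s in rows}


def demo_connection():
    """Build a tiny hypothetical stats table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_stats "
        "(product_id INTEGER, day TEXT, visits INTEGER, sales INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_stats VALUES (?, ?, ?, ?)",
        [
            (1, "2011-01-01", 12, 1),
            (1, "2011-01-05", 7, 0),
            (1, "2011-02-01", 50, 5),  # outside the requested range
            (2, "2011-01-03", 4, 2),
        ],
    )
    return conn


print(aggregate_stats(demo_connection(), [1, 2], "2011-01-01", "2011-01-10"))
```

A long list of matched ids turns into a long IN (...) list here, which is
the same concern raised for Option 2; databases typically limit how many
bound parameters one statement may carry, so batching may be needed.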

  Do you guys see any other option, or have you faced this problem before?
If so, what is your advice?  I think the best approach is to separate
concerns and have Solr for product matching and something else
(a highly scalable db?) for stats calculations, but on the other hand it
would be nice to have Solr perform both operations and join the results
behind the scenes.  To make things even more challenging, users could
possibly sort by calculated fields.
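
Sorting by a calculated field is what makes the split designs harder: the
stats are needed for every matched product before a page can be cut, not
just for the products on the current page.  A minimal sketch of that
ordering step, with hypothetical names and in-memory data:

```python
def page_sorted_by_stat(products, stats_by_id, stat, page, page_size):
    """Order all matched products by an aggregated value, then paginate.

    products: every product that matched the text search.
    stats_by_id: {product_id: {"visits": n, "sales": n}} for the period.
    """
    def stat_value(product):
        # Products with no rows in the period sort as zero.
        return stats_by_id.get(product["id"], {}).get(stat, 0)

    ranked = sorted(products, key=stat_value, reverse=True)
    start = page * page_size
    return ranked[start:start + page_size]


matched = [{"id": 1}, {"id": 2}, {"id": 3}]
period_stats = {1: {"visits": 5}, 2: {"visits": 20}}
print(page_sorted_by_stat(matched, period_stats, "visits", 0, 2))
```

With millions of products, ranking in application memory like this would
only be practical when the text search narrows the match set considerably.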

Thanks in advance
Esteban

Re: text search and data aggregation, thoughts?

Posted by Esteban Donato <es...@gmail.com>.
Thanks Pravesh for your feedback.  I have 10 million products and 165M
rows of visits accumulated over 2 years.  The aggregated data needs to
be shown in the search result page along with the product description.

I also felt option 2 was the most suitable but wanted to have a second
opinion.  My only hesitation is the overhead of doing 2 queries (1
to products and 1 to visits) for every search, which could impact
performance.
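
One way to keep the second query cheap, when results are not sorted by a
calculated field, is to ask the visits core only about the ids on the page
being displayed.  The sketch below just builds the request parameters and
sends nothing.  The field names (product_id, day, visits, sales) are
assumed, and the stats, stats.field and stats.facet parameters are those
of Solr's StatsComponent as I understand them, so they should be checked
against the Solr version in use.

```python
from urllib.parse import urlencode


def stats_query_params(page_ids, start_day, end_day):
    """Build parameters for a stats request limited to one page of ids.

    Days are expected as YYYY-MM-DD strings (an assumption of this sketch).
    """
    if not page_ids:
        raise ValueError("page_ids must not be empty")
    ids_clause = " OR ".join(str(pid) for pid in page_ids)
    return [
        ("q", "*:*"),
        ("fq", f"product_id:({ids_clause})"),
        ("fq", f"day:[{start_day}T00:00:00Z TO {end_day}T23:59:59Z]"),
        ("rows", "0"),                   # only the stats section is needed
        ("stats", "true"),
        ("stats.field", "visits"),
        ("stats.field", "sales"),
        ("stats.facet", "product_id"),   # break the stats down per product
        ("wt", "json"),
    ]


params = stats_query_params([3, 5, 8], "2011-01-01", "2011-01-10")
print(urlencode(params))
```

With 10 or 20 products per page the id list stays short no matter how many
products matched the text search.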

Regards,
Esteban

On Fri, Oct 14, 2011 at 8:35 AM, pravesh <su...@yahoo.com> wrote:
> Hi Esteban,
>
> A lot depends on several things: 1) the volume (total documents), 2) the
> size of the index, 3) how you represent the aggregated data in your UI.
>
> Your option-2 seems to be a suitable way to go. This way you can tune each
> core separately. Also, the use cases for updating each document/product in
> the two indexes seem different. One is updated when a product is
> added/updated. The other is updated when a product is viewed/sold from
> search results.
>
> Option-1 can be used in case you are showing the data-aggregation stats on
> the search results page only along with each item. If it is shown in the
> item-detail page then option-1 seems better.
>
> Regds
> Pravesh
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/text-search-and-data-aggregation-thoughts-tp3416330p3421361.html
