Posted to solr-user@lucene.apache.org by Alejandro Marqués Rodríguez <am...@paradigmatecnologico.com> on 2013/11/21 10:36:19 UTC

Best implementation for multi-price store?

Hi,

I've recently been asked to implement an application to search products from
several stores, each store having different prices and stock for the same
product.

So I have products that have the usual fields (name, description, brand,
etc.) and also the number of units and the price for each store. I must be
able to filter by a given store and sort by stock or price for that store.
The application should also allow increasing the number of stores,
store-dependent fields, and products without much work.

The numbers for the application are more or less 100 stores and 7M products.

I've been thinking of some ways of defining the index structure, but I don't
know which one is better, as I think each one has its pros and cons.


   1. *Each product-store as a document:* Denormalizing the information so
   that I have a separate document for every product and store. Pros are that
   I can filter and sort without problems and that adding a new
   store-dependent field is very easy. Cons are that the index goes from 7M
   documents to 700M and that most of the info is redundant, as most of the
   fields are repeated among stores.
   2. *Each field-store as a field:* For example, for price I would have
   "store1_price, store2_price, ...". Pros are that the index stays at 7M
   documents, and I can still filter and sort by those fields. Cons are that I
   have to add some logic so that if I filter by one store I sort by the
   associated price field, and that the number of fields grows as the number
   of store-dependent fields x the number of stores. I don't know if having
   more fields affects performance, but adding new store-dependent fields will
   increase the number of fields even more.
   3. *Join:* The first time I read about Solr joins I thought they were the
   way to go in this case, but after reading a bit more and doing some tests
   I'm not so sure about it... Maybe I've done it wrong, but I think it also
   denormalizes the info (so I would also have 700M documents), and besides I
   can't sort or filter by store fields.
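
To put rough numbers on options 1 and 2, here is a small sketch of the arithmetic. The store and product counts are the ones given above; the count of three common fields (name, description, brand) and two store-dependent fields (price, units) is only an assumption for illustration, and the function names are made up.

```python
# Rough size comparison of options 1 and 2.
STORES = 100          # from the figures above
PRODUCTS = 7_000_000  # from the figures above


def option1_documents(products: int, stores: int) -> int:
    """Option 1: one document per (product, store) pair, worst case."""
    return products * stores


def option2_fields_per_document(store_fields: int, stores: int,
                                common_fields: int) -> int:
    """Option 2: one document per product; every store-dependent field
    is repeated once per store, next to the common fields."""
    return common_fields + store_fields * stores


if __name__ == "__main__":
    print(option1_documents(PRODUCTS, STORES))
    # Assumed: 2 store-dependent fields (price, units), 3 common fields.
    print(option2_fields_per_document(2, STORES, 3))
```

Under those assumptions, option 1 trades a 100x increase in documents for a flat schema, while option 2 keeps 7M documents but needs a couple of hundred fields per document.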


I must say my preferred option is number 2, so I don't duplicate
information, I keep a relatively small number of documents and I can filter
and sort by the store fields. However, my main concern here is I don't know
if having too many fields in a document will be harmful to performance.

Which one do you think is the best approach for this application? Is there
a better approach that I have missed?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42

Re: Best implementation for multi-price store?

Posted by Alejandro Marqués Rodríguez <am...@paradigmatecnologico.com>.
Hi Robert,

That was the idea: dynamic fields, so, as you said, it is easier to sort
and filter. Besides, with dynamic fields it would be easier to add new
stores, as I wouldn't have to modify the schema :)

Thanks for the answer!
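
As a rough sketch of what the dynamic-field layout could look like on the indexing side: one document per product, with one `<store>_price` and one `<store>_units` value per store. The `store` field listing the stores that carry the product, the `_units` suffix, and the `build_product_doc` helper are assumptions made for illustration, not something fixed in this thread.

```python
def build_product_doc(product: dict, per_store: dict) -> dict:
    """Flatten per-store price/stock into dynamic-field style keys.

    product   -- common fields, e.g. {"id": "p1", "name": "Laptop"}
    per_store -- {"store1": {"price": 499.0, "units": 3}, ...}
    """
    doc = dict(product)  # copy so the caller's dict is not modified
    # Assumed multi-valued field used to filter by store (store:store1).
    doc["store"] = sorted(per_store)
    for store_id, values in per_store.items():
        doc[f"{store_id}_price"] = values["price"]
        doc[f"{store_id}_units"] = values["units"]
    return doc


if __name__ == "__main__":
    doc = build_product_doc(
        {"id": "p1", "name": "Laptop"},
        {"store1": {"price": 499.0, "units": 3},
         "store2": {"price": 479.0, "units": 0}},
    )
    print(doc)
```

Adding a new store then only means emitting two more keys per document; the schema itself would not need to change as long as the `*_price` and `*_units` patterns are declared as dynamic fields.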


2013/11/21 Petersen, Robert <ro...@mail.rakuten.com>

> Hi,
>
> I'd go with (2) also but using dynamic fields so you don't have to define
> all the storeX_price fields in your schema but rather just one *_price
> field.  Then when you filter on store:store1 you'd know to sort with
> store1_price and so forth for units.  That should be pretty straightforward.
>
> Hope that helps,
> Robi



RE: Best implementation for multi-price store?

Posted by "Petersen, Robert" <ro...@mail.rakuten.com>.
Hi,

I'd go with (2) also but using dynamic fields so you don't have to define all the storeX_price fields in your schema but rather just one *_price field.  Then when you filter on store:store1 you'd know to sort with store1_price and so forth for units.  That should be pretty straightforward.

Hope that helps,
Robi
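
The "filter on the store, then sort on that store's field" logic could be sketched as below. `q`, `fq` and `sort` are standard Solr query parameters; the `store_query_params` helper, the `store` field name and the `_units` suffix are assumptions for illustration only.

```python
def store_query_params(text: str, store_id: str,
                       sort_by: str = "price",
                       descending: bool = False) -> dict:
    """Build Solr request parameters for a search restricted to one store,
    sorted by that store's own price or stock field."""
    if sort_by not in ("price", "units"):
        raise ValueError("sort_by must be 'price' or 'units'")
    direction = "desc" if descending else "asc"
    return {
        "q": text,
        "fq": f"store:{store_id}",                     # filter by store
        "sort": f"{store_id}_{sort_by} {direction}",   # e.g. store1_price asc
    }


if __name__ == "__main__":
    print(store_query_params("laptop", "store1"))
    print(store_query_params("laptop", "store2", sort_by="units",
                             descending=True))
```

The returned dict can be passed as the query string of a request to the select handler by whatever HTTP client is in use.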
