Posted to solr-user@lucene.apache.org by sagarzond <sa...@gmail.com> on 2012/11/23 06:56:57 UTC

User context based search in apache solr

In our application we provide product master-data search with Solr. Our
new requirement is to provide user-context-based search (that is, ranking
the top search results using each user's history).

For that, I have created a score table with the following fields:

1) product_id

2) user_id

3) score_value

As soon as a user clicks a product, an entry is created in this table, or
score_value is incremented if that user already has an entry for the
product. We plan to use a boost field and eDisMax in Solr to improve search
results, but this requires a one-to-many mapping between the product and
score tables (because one product has a different score value for each
user), and Solr does not provide one-to-many mappings.
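A minimal in-memory sketch of the score table described above, assuming a plain Python dict keyed by (user_id, product_id) stands in for the real database table. The function names `record_click` and `scores_for_product` are hypothetical; the point is the create-or-increment behavior on each click and the one-to-many shape it produces.

```python
from collections import defaultdict

# Hypothetical stand-in for the score table: (user_id, product_id) -> score_value.
score_table: dict[tuple[str, str], int] = defaultdict(int)


def record_click(user_id: str, product_id: str, increment: int = 1) -> int:
    """Create the (user, product) row on first click, otherwise increase score_value."""
    score_table[(user_id, product_id)] += increment
    return score_table[(user_id, product_id)]


def scores_for_product(product_id: str) -> dict[str, int]:
    """All per-user scores for one product: the one-to-many relation in question."""
    return {u: s for (u, p), s in score_table.items() if p == product_id}


record_click("u1", "p42")
record_click("u1", "p42")
record_click("u2", "p42")
print(scores_for_product("p42"))  # {'u1': 2, 'u2': 1}
```

The output shows why a single product document cannot hold this data directly: each product carries a separate score per user.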

We can solve this (the one-to-many mapping) by de-normalizing the structure
into multiple entries per product, each with the score value for a
different user, but that results in a huge amount of redundant data.
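The de-normalized layout described above can be sketched as one flat document per (user, product) row, each repeating the product's fields next to that user's score. The field names (`id`, `user_id`, `score_value`) and the composite id format are assumptions for illustration, not a schema from this thread.

```python
def denormalize(products: dict[str, dict], scores: dict[tuple[str, str], int]) -> list[dict]:
    """Emit one flat document per (user_id, product_id) row in the score table.

    Every product field is copied into each document, which is the source of
    the redundant data mentioned above.
    """
    docs = []
    for (user_id, product_id), score in scores.items():
        doc = dict(products[product_id])          # copy all product fields
        doc["id"] = f"{product_id}_{user_id}"     # hypothetical composite key
        doc["product_id"] = product_id
        doc["user_id"] = user_id
        doc["score_value"] = score
        docs.append(doc)
    return docs


products = {"p42": {"name": "Widget", "category": "tools"}}
scores = {("u1", "p42"): 2, ("u2", "p42"): 1}
docs = denormalize(products, scores)
print(len(docs))  # 2 copies of the Widget fields, one per user
```

Note that with only these documents, a user can find just the products they have a score row for; un-scored products would still need their own base documents.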

Is this (de-normalized structure) the correct way to handle it, or is there
another way to handle this kind of context-based search?

Please help.



--
View this message in context: http://lucene.472066.n3.nabble.com/User-context-based-search-in-apache-solr-tp4021964.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: User context based search in apache solr

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

I agree with Otis's suggestion.
I don't think the preference table/matrix (1) product_id, 2) user_id,
3) score_value) should be indexed in Lucene/Solr. Any updateable
key-value/RDBMS/IMDG (in-memory data grid) storage is good enough to store
preference lists:
user/sessionID -> { (product, weight), (product, weight), (product, weight), ... }
Then you can add the list of product_ids as a boost query
(http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29) on
every user's request.
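A minimal sketch of this approach: per-user preferences live outside Solr, and each request gets an eDisMax `bq` built from them. The in-memory `preferences` dict, the `product_id` field name, the weights and the cap of 50 clauses are assumptions; only the standard request parameter names `defType`, `q` and `bq` are used, and the query string is built with Python's `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode

# Hypothetical preference store: user/session ID -> {product_id: weight}.
preferences = {
    "u1": {"p42": 3.0, "p7": 1.5},
}


def build_params(user_id: str, user_query: str, max_boosts: int = 50) -> list[tuple[str, str]]:
    """Build eDisMax request parameters with one boost clause per preferred product.

    Assumes product IDs contain no Lucene query-syntax characters (no escaping here).
    """
    params = [("defType", "edismax"), ("q", user_query)]
    prefs = preferences.get(user_id, {})
    # Highest-weighted products first; cap the number of clauses per request.
    top = sorted(prefs.items(), key=lambda kv: kv[1], reverse=True)[:max_boosts]
    if top:
        clauses = " ".join(f"product_id:{pid}^{weight}" for pid, weight in top)
        params.append(("bq", clauses))
    return params


params = build_params("u1", "drill")
print(params[-1])  # ('bq', 'product_id:p42^3.0 product_id:p7^1.5')
print(urlencode(params))
```

Because the preferences are fetched per request, clicks only update the external store; the product index itself never changes.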
Once again, updating the product index from the click-through stream is
usually a bad idea.
I think this can be used as a starting point; MLT and Mahout are definitely
the right directions.

Good luck.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: User context based search in apache solr

Posted by Lance Norskog <go...@gmail.com>.

Right, he (Ted Dunning) has talked about this in various ways. But the key is to take the full user-item matrix and generate a new data model for recommendation from it. These approaches push that data model into the search index. It is a batch process.

LucidWorks does this for search clicks.
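The batch step described above can be sketched as: read the full user-item matrix, derive an item-to-item data model, and write it onto each product document. Here the data model is plain co-occurrence counting, a simple substitute chosen for illustration rather than Mahout's algorithm, and the `related_products` field name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_model(user_items: dict[str, set[str]], top_k: int = 3) -> dict[str, list[str]]:
    """Batch job: for each product, the products most often clicked by the same users."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for items in user_items.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    # Highest count first; ties are broken by product ID so the output is deterministic.
    return {
        item: [other for other, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for item, c in counts.items()
    }


user_items = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p3"},
}
model = cooccurrence_model(user_items)
# Each list would be written to a hypothetical multi-valued field on the product document.
updates = [{"id": pid, "related_products": related} for pid, related in sorted(model.items())]
print(updates[0])  # {'id': 'p1', 'related_products': ['p2', 'p3']}
```

Since the job reads the whole matrix and rewrites per-product fields, it runs periodically rather than on every click, which matches the "batch process" description.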

----- Original Message -----
| From: "Otis Gospodnetic" <ot...@gmail.com>
| To: solr-user@lucene.apache.org
| Sent: Saturday, November 24, 2012 7:39:04 PM
| Subject: Re: User context based search in apache solr
| 
| On the other hand, people have successfully built recommendation
| engines on top of Lucene or Solr before, and I think Ted Dunning just
| mentioned this over on the Mahout ML a few weeks ago..... have a look
| at http://search-lucene.com/m/dbxtb1ykRkM though I think I recall a
| separate recent email where he was a bit more explicit about this.
| 
| Otis

Re: User context based search in apache solr

Posted by Otis Gospodnetic <ot...@gmail.com>.

On the other hand, people have successfully built recommendation engines on
top of Lucene or Solr before, and I think Ted Dunning mentioned this on the
Mahout mailing list just a few weeks ago. Have a look at
http://search-lucene.com/m/dbxtb1ykRkM, though I think I recall a separate,
more recent email where he was a bit more explicit about this.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Sat, Nov 24, 2012 at 9:30 PM, Lance Norskog <go...@gmail.com> wrote:

> sagarzond- you are trying to embed a recommendation system into search.
> Recommendations are inherently a matrix problem, where Solr and other
> search engines are one-dimensional databases. What you have is a sparse
> user-product matrix. This book has a good explanation of recommender
> systems:
>
> Mahout In Action
> http://manning.com/owen/

Re: User context based search in apache solr

Posted by Lance Norskog <go...@gmail.com>.

sagarzond: you are trying to embed a recommendation system into search. Recommendations are inherently a matrix problem, whereas Solr and other search engines are one-dimensional databases. What you have is a sparse user-product matrix. This book has a good explanation of recommender systems:

Mahout In Action
http://manning.com/owen/
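To make "sparse user-product matrix" concrete, here is a sketch that stores only the non-zero cells of the score table and measures how much of the full matrix they cover. The 1,000,000 products match the figure sagarzond gives in this thread; the 100,000 users and ~50 clicked products per user are assumptions for illustration only.

```python
def density(nonzero_cells: int, n_users: int, n_products: int) -> float:
    """Fraction of the user x product matrix that actually holds a score."""
    return nonzero_cells / (n_users * n_products)


# Sparse storage: only (user -> {product: score}) entries that exist.
matrix: dict[str, dict[str, int]] = {
    "u1": {"p42": 2, "p7": 1},
    "u2": {"p42": 1},
}
nonzero = sum(len(row) for row in matrix.values())
print(nonzero)  # 3

# Illustrative scale (assumed): 1,000,000 products, 100,000 users, ~50 clicked products each.
n_products, n_users, clicks_per_user = 1_000_000, 100_000, 50
print(density(n_users * clicks_per_user, n_users, n_products))  # 5e-05
```

At that assumed scale only 0.005% of the cells are filled, which is why storing the full matrix, or one document per cell, is wasteful.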



----- Original Message -----
| From: "Otis Gospodnetic" <ot...@gmail.com>
| To: solr-user@lucene.apache.org
| Sent: Saturday, November 24, 2012 5:05:53 PM
| Subject: Re: User context based search in apache solr
| 
| Hi,
| 
| I don't have a full picture here, but why not just have userID =
| {list of clicked product IDs} stored somewhere (in memory, disk,
| DB...) and then, at search time, retrieve last N product IDs, run MLT
| query on those IDs, and then do whatever you desire to do... either
| take top N of those hits and slap them on top of regular results, or
| take top N of those and boost them in the main results, or ...  if you
| are into this, you may find
| http://sematext.com/search-analytics/index.html very useful, or at
| least interesting.
| 
| Otis

Re: User context based search in apache solr

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

I don't have a full picture here, but why not just store userID = {list of
clicked product IDs} somewhere (in memory, on disk, in a DB...) and then, at
search time, retrieve the last N product IDs, run an MLT (MoreLikeThis)
query on those IDs, and then do whatever you want with the results...
either take the top N of those hits and slap them on top of the regular
results, or take the top N and boost them in the main results, or ...  If
you are into this, you may find
http://sematext.com/search-analytics/index.html very useful, or at least
interesting.
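A sketch of this suggestion, assuming an in-process history store (a `deque` per user holding the last N clicked IDs) and leaving the actual MoreLikeThis request out: `mlt_hits` below stands in for whatever product IDs an MLT query on the recent clicks returns. The names `remember_click` and `merge_on_top` are hypothetical. It shows the "slap them on top of regular results" variant, removing duplicates so each product appears once.

```python
from collections import defaultdict, deque

LAST_N = 5  # assumed history length

# userID -> last N clicked product IDs, newest last; the deque drops the oldest automatically.
history: dict[str, deque] = defaultdict(lambda: deque(maxlen=LAST_N))


def remember_click(user_id: str, product_id: str) -> None:
    history[user_id].append(product_id)


def merge_on_top(mlt_hits: list[str], regular_hits: list[str], top_n: int = 3) -> list[str]:
    """Put the top N MLT hits first, then the regular results without repeating them."""
    promoted = mlt_hits[:top_n]
    seen = set(promoted)
    return promoted + [pid for pid in regular_hits if pid not in seen]


for pid in ["p1", "p2", "p3", "p4", "p5", "p6"]:
    remember_click("u1", pid)
print(list(history["u1"]))  # ['p2', 'p3', 'p4', 'p5', 'p6']
print(merge_on_top(["p9", "p3", "p8", "p7"], ["p1", "p3", "p5"]))
# ['p9', 'p3', 'p8', 'p1', 'p5']
```

The boost variant would instead pass the top N MLT hits into the main query as boosts rather than reordering the result list.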

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





Re: User context based search in apache solr

Posted by sagarzond <sa...@gmail.com>.

Let me rephrase. In our application, de-normalizing "will" result in:
    1. more memory being required;
    2. degraded search performance (CPU and response time).
    
To give an example: our application's product table has 1 million entries,
and the number of users is increasing exponentially.




Re: User context based search in apache solr

Posted by Erick Erickson <er...@gmail.com>.

Yes, of course. But the operative word is "may". You haven't said whether
you have 10 docs or 10B docs, or whether the redundant data increases the
size of your index by 5% or by 5,000%.

My point is that denormalizing is preferable unless and until you can
demonstrate that denormalizing actually _does_ cause you trouble...
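The 5% versus 5,000% question can be estimated with simple arithmetic, sketched below under two stated assumptions: the original one-document-per-product entries are kept, and index size roughly tracks document count. The 1,000,000 products match the figure given in this thread; the pair counts and the 10,000-user worst case are hypothetical.

```python
def growth_percent(n_products: int, n_scored_pairs: int) -> float:
    """Document-count growth (%) when every (user, product) score row becomes an extra document.

    Assumes the original per-product documents are kept and that index size
    roughly tracks document count.
    """
    return 100.0 * n_scored_pairs / n_products


n_products = 1_000_000  # figure given in the thread
print(growth_percent(n_products, 50_000))      # 5.0
print(growth_percent(n_products, 50_000_000))  # 5000.0

# Worst case: every user has a score row for every product (assumed 10,000 users).
print(growth_percent(n_products, 10_000 * n_products))  # 1000000.0
```

What matters is the number of (user, product) rows actually in the score table, not the raw number of users.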

Best
Erick

On Fri, Nov 23, 2012 at 9:34 AM, sagarzond <sa...@gmail.com> wrote:

> Hi Erick
>
>     Thanks for reply.
>     In our application having product table with many fields and we are
> providing these all fields during search. If we made de-normalized
> structure
> then there is having lots of redundant data and that may result in to
>     1. required more amount of memory.
>     2. degrade search performance (cpu and response time).
>

Re: User context based search in apache solr

Posted by sagarzond <sa...@gmail.com>.

Hi Erick,

    Thanks for the reply.
    In our application the product table has many fields, and we provide all
of these fields during search. If we use a de-normalized structure, there
will be lots of redundant data, and that may result in:
    1. more memory being required;
    2. degraded search performance (CPU and response time).




Re: User context based search in apache solr

Posted by Erick Erickson <er...@gmail.com>.

By and large, the correct answer is to de-normalize unless and until you
experience resource problems. Why is having a lot of redundant data a
problem for you? Aesthetics?

Best
Erick

