Posted to solr-user@lucene.apache.org by nolim <al...@gmail.com> on 2014/04/28 20:48:40 UTC

saving user actions on item in solr for later retrieval

Hi,
We are using Solr in a production system with about 500 users and around
10,000 queries per day.
Our users' search topics are mostly static and repeat themselves over
time.

Our system has an option to specify a "specific search subject" (we
also call it a "specific information need"), and most of our users use
this option.
Our system logs every query and every document retrieved for each
"information need", and users can also give feedback on whether a
document is relevant to their "information need".

We also have a special query expansion technique and a diversity algorithm
based on MMR (Maximal Marginal Relevance).
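The poster does not describe their diversity algorithm, so the following is only a minimal sketch of the textbook greedy MMR re-ranking, not their implementation. `relevance` (for example the Solr score) and the pairwise `similarity` function are assumed inputs, and `lam` trades relevance against redundancy.

```python
from typing import Callable, Dict, List, Sequence


def mmr_rerank(
    candidates: Sequence[str],
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedy Maximal Marginal Relevance: repeatedly pick the candidate that
    maximizes lam * relevance - (1 - lam) * (max similarity to already picked)."""
    selected: List[str] = []
    remaining = list(candidates)

    def mmr_score(doc: str) -> float:
        # Redundancy is the highest similarity to any document already selected.
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return lam * relevance[doc] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=0.5`, a near-duplicate of the top document is pushed below a less relevant but different one; `lam=1.0` reduces to plain relevance order.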

We want to use this logged information as a data set for training our
ranking system and performing "Learning To Rank" for each "information
need" or cluster of "information needs".
We also want to give users the option to filter by "relevant" and "read"
based on their own actions or their friends' actions in the same topic.
When a user runs the same or a similar query again, he can skip documents
he has already read. That's an important requirement for our users.
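One way the logged feedback could become Learning To Rank training data is sketched below, assuming each log entry can be reduced to an information need, a document ID, the user's relevance judgment and some numeric features. The `FeedbackRecord` type and the feature IDs are hypothetical; the output is the SVMlight/LETOR text format (`<label> qid:<n> <fid>:<value> ... # comment`) read by common LTR tools such as SVMrank and RankLib, with each information need treated as one query.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeedbackRecord:
    # One logged (information need, document) judgment; all names are hypothetical.
    need: str                   # information-need label, e.g. "need-42"
    doc_id: str
    relevant: bool              # the user's explicit relevance feedback
    features: Dict[int, float]  # feature id -> value (e.g. Solr score, recency)


def to_svmlight(records: List[FeedbackRecord]) -> List[str]:
    """Render records as '<label> qid:<n> <fid>:<val> ... # <doc_id>' lines."""
    qids: Dict[str, int] = {}
    lines: List[str] = []
    # Sort by need so all judgments for one need (one "query") are contiguous.
    for rec in sorted(records, key=lambda r: r.need):
        qid = qids.setdefault(rec.need, len(qids) + 1)
        # The format expects feature ids in increasing order.
        feats = " ".join(f"{fid}:{val:g}" for fid, val in sorted(rec.features.items()))
        label = 1 if rec.relevant else 0
        lines.append(f"{label} qid:{qid} {feats} # {rec.doc_id}")
    return lines
```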

We are considering 2 possibilities to implement it:
1. Update each item in Solr, with 2 new fields named "read" and
"relevant".
Each field is a multivalued field holding the labels of the corresponding
"information needs".
When a user reads a document, an update is sent to Solr and the "read"
field gets the label of the "information need" the user is working on.
This causes an update every time a user reads an item (still nothing
compared to the new items coming in each day).
We would also be storing information that "belongs" to the application
in Solr, which may be the wrong architecture.

2. Save the information in a DB, and then perform filtering on the
retrieved results.
This option is much more complicated (we would have "fields" that aren't
in Solr, yet the user uses them for search). We won't get facets,
autocomplete and the other nice features that a regular Solr field has.
It also costs performance: we can't easily retrieve "give me the top 10
documents that answer the query and are unread in this information need",
and there is more complicated code to maintain.

3. Do you have more ideas?
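To make the trade-off concrete, here is a rough sketch of both options, with hypothetical field, document and label names. Option 1 uses Solr's atomic-update `add` modifier to append a label to the multivalued `read` field (atomic updates need the update log enabled and the document's fields stored, since Solr rebuilds the whole document) and then excludes read documents with a negative filter query. Option 2 filters outside Solr, which means over-fetching and paging until enough unread results remain; `fetch_page` stands in for a Solr query that returns ranked IDs.

```python
import json
from typing import Callable, List, Set


def atomic_mark_read(doc_id: str, need: str) -> str:
    """Option 1: JSON body for /update that appends `need` to the multivalued
    "read" field using the atomic-update "add" modifier."""
    return json.dumps([{"id": doc_id, "read": {"add": need}}])


def unread_filter(need: str) -> str:
    """Option 1: fq excluding documents already read under this need.
    Assumes labels contain no quotes or other characters needing escaping."""
    return f'-read:"{need}"'


def top_unread(fetch_page: Callable[[int, int], List[str]], read_ids: Set[str],
               n: int = 10, page_size: int = 50, max_pages: int = 10) -> List[str]:
    """Option 2: page through ranked IDs from Solr, dropping IDs that the
    application DB marks as read, until n unread IDs are collected."""
    result: List[str] = []
    for page in range(max_pages):
        ids = fetch_page(page * page_size, page_size)
        if not ids:
            break  # ran out of results
        result.extend(i for i in ids if i not in read_ids)
        if len(result) >= n:
            break
    return result[:n]
```

For instance, if the first 60 ranked documents are already read and `page_size=50`, option 2 needs two round trips to fill a page of 10, whereas option 1 gets the same page from a single filtered query.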

Which of these options is better?

Thanks in advance!



--
View this message in context: http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: saving user actions on item in solr for later retrieval

Posted by nolim <al...@gmail.com>.
Thank you, we will check it out.
 On Apr 29, 2014 9:28 PM, "iorixxx [via Lucene]" <
ml-node+s472066n4133796h19@n3.nabble.com> wrote:

> Hi Nolim,
>
> Actually EFF is searchable. See my comments at the end of the page
>
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
> Ahmet

Re: saving user actions on item in solr for later retrieval

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Nolim,

Actually EFF is searchable. See my comments at the end of the page

https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
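Assuming "searchable" here means usable in a filter through a function range query, this is a sketch of how per-document "read" flags could live in an ExternalFileField. The file holds one `key=value` line per document with a float value, documents missing from the file get the field's `defVal`, and `{!frange}` keeps documents whose value falls in a range. Because an ExternalFileField holds a single float per document, this assumes one such field per information need; the field name `read_need42` is hypothetical. The file is only re-read when a new searcher is opened (see the page linked above), so changes still become visible on commit.

```python
from typing import Dict, Iterable


def eff_lines(values: Dict[str, float]) -> str:
    """Render 'key=value' lines for an ExternalFileField file (one float per doc key)."""
    return "".join(f"{doc_id}={val:g}\n" for doc_id, val in sorted(values.items()))


def write_read_flags(read_doc_ids: Iterable[str]) -> str:
    # Unlisted documents fall back to the field's defVal (assumed 0 = unread),
    # so only documents that were read need a line; 1 marks "read".
    return eff_lines({d: 1.0 for d in read_doc_ids})


def frange_filter(field: str, lo: float, hi: float) -> str:
    """Function range filter keeping docs whose field value is within [lo, hi]."""
    return f"{{!frange l={lo:g} u={hi:g}}}{field}"
```

With `defVal` set to 0, `fq={!frange l=0 u=0}read_need42` would keep only unread documents.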

Ahmet



On Tuesday, April 29, 2014 9:07 PM, nolim <al...@gmail.com> wrote:
Thank you, it was interesting and I have learned some new things in solr :)

But the "External File Field" isn't a good option because the field isn't
searchable, which is very important to us.
We are considering the first option (updating documents in Solr) but
committing only every 10 minutes. If we need to retrieve a value in real
time, we can use RealTimeGet.

Do you have any other suggestions?


Re: saving user actions on item in solr for later retrieval

Posted by nolim <al...@gmail.com>.
Thank you, it was interesting and I have learned some new things in solr :)

But the "External File Field" isn't a good option because the field isn't
searchable, which is very important to us.
We are considering the first option (updating documents in Solr) but
committing only every 10 minutes. If we need to retrieve a value in real
time, we can use RealTimeGet.
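A sketch of the request side of that plan, assuming a core named `items` on a local Solr (both hypothetical) and the `/get` handler from the stock example configuration: updates carry `commitWithin` (in milliseconds) instead of explicit commits, and RealTimeGet fetches the latest version of specific documents by ID.

```python
from typing import Iterable, Sequence
from urllib.parse import urlencode

TEN_MINUTES_MS = 10 * 60 * 1000


def update_url(base: str, core: str, commit_within_ms: int = TEN_MINUTES_MS) -> str:
    """URL for posting updates; commitWithin asks Solr to commit them within the given ms."""
    return f"{base}/solr/{core}/update?" + urlencode({"commitWithin": commit_within_ms})


def realtime_get_url(base: str, core: str, doc_ids: Iterable[str],
                     fields: Sequence[str] = ("id", "read", "relevant")) -> str:
    """URL for the RealTimeGet handler, which returns the latest version of the
    given documents even before a commit makes them visible to searches."""
    params = {"ids": ",".join(doc_ids), "fl": ",".join(fields)}
    return f"{base}/solr/{core}/get?" + urlencode(params)
```

Note that RealTimeGet only looks documents up by ID; a search such as "top 10 unread for this need" still sees new "read" labels only after the next commit.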

Do you have any other suggestions?

Re: saving user actions on item in solr for later retrieval

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
1. This might be too expensive in terms of commits and the performance cost
of refreshing the index every time.

3. Have you looked at external file fields, custom components, etc.? For example:
http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr
http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-td4040200.html
(past discussion that seems relevant)
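One reading of the "combine the Solr score with user ratings" idea, sketched under assumptions: positive relevance judgments from the logs are counted per document, the counts are exposed to Solr as a numeric field (for example an ExternalFileField named `relevant_votes`, a hypothetical name), and the edismax `boost` parameter multiplies the text score by `1 + log10(1 + votes)`, so unrated documents keep their original score.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def vote_counts(events: Iterable[Tuple[str, bool]]) -> Counter:
    """Count positive relevance judgments per document from (doc_id, relevant) events."""
    return Counter(doc for doc, relevant in events if relevant)


def boosted_params(user_query: str, votes_field: str = "relevant_votes") -> Dict[str, str]:
    """edismax parameters multiplying the text score by 1 + log10(1 + votes).
    Solr's log() is base 10, so a document with zero votes gets factor 1."""
    return {
        "defType": "edismax",
        "q": user_query,
        "boost": f"sum(1,log(sum(1,{votes_field})))",
    }
```

The logarithm keeps a handful of very popular documents from drowning out textual relevance; a linear boost would be the simpler but more aggressive choice.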

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Re: saving user actions on item in solr for later retrieval

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Is there somebody from LucidWorks who can speak to the Click Score Relevance
Framework in LucidWorks Search?
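How the LucidWorks framework computes click scores is not described in this thread, so the following is only a generic stand-in, not its method: a per-document click-through rate with additive smoothing, computed from logged impressions. The prior of 1 click per 10 views is an arbitrary assumption that keeps documents with few impressions from getting extreme scores; the resulting values could feed a boost like the ones in Alex's links.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def smoothed_ctr(log: Iterable[Tuple[str, bool]], prior_clicks: float = 1.0,
                 prior_views: float = 10.0) -> Dict[str, float]:
    """Per-document click-through rate with additive smoothing.
    Each log entry is (doc_id, clicked) for one impression of that document."""
    views: DefaultDict[str, int] = defaultdict(int)
    clicks: DefaultDict[str, int] = defaultdict(int)
    for doc, clicked in log:
        views[doc] += 1
        clicks[doc] += int(clicked)
    # (clicks + prior_clicks) / (views + prior_views) shrinks sparse docs toward 0.1.
    return {d: (clicks[d] + prior_clicks) / (views[d] + prior_views) for d in views}
```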

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>