Posted to solr-user@lucene.apache.org by Doug Turnbull <dt...@opensourceconnections.com> on 2016/10/09 00:54:39 UTC

Re: Stream expressions: Break up multivalue field into usable tuples

Joel -- thanks! I got this working and now feel in better shape to grok
what's happening.

Out of curiosity, is there any work being done to customize scoreNodes
scoring? There's a bunch of other forms of similarity I wouldn't mind
playing with as well.

On Thu, Sep 22, 2016 at 6:06 PM Joel Bernstein <jo...@gmail.com> wrote:

You could use the facet() expression, which works with multi-valued fields.

It emits aggregated tuples that are useful for recommendations. For example:



facet(baskets,
      q="item:taco",
      buckets="item",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))
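
As a rough illustration of how an expression like this might be submitted, the sketch below builds a request URL for a collection's /stream handler, passing the expression as the expr parameter. The host, port, and the stream_url helper are assumptions for illustration only and do not come from this thread.

```python
from urllib.parse import urlencode

# The facet expression from the message, kept multi-line for readability.
FACET_EXPR = '''facet(baskets,
      q="item:taco",
      buckets="item",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))'''


def stream_url(collection, expr, base="http://localhost:8983/solr"):
    """Build a URL for the /stream handler of `collection`.

    `base` is an assumed local Solr address; adjust it for a real cluster.
    """
    # Collapse the expression onto one line, then URL-encode it as the
    # `expr` request parameter.
    one_line = " ".join(expr.split())
    return f"{base}/{collection}/stream?{urlencode({'expr': one_line})}"


url = stream_url("baskets", FACET_EXPR)
```

The resulting URL can be handed to any HTTP client; nothing here contacts a server.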



You can feed this to scoreNodes() to score the tuples for a recommendation.
scoreNodes is a graph expression so it expects tuples to be formatted like
a node set. Specifically it looks for the following fields: node, field and
collection, which it uses to retrieve the IDF for each node.
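
To make the IDF idea concrete, here is a minimal sketch of TF-IDF-style node scoring in Python. The formula, the score_nodes helper, the nodeScore output field, and the sample numbers are all assumptions chosen for illustration; the exact computation inside scoreNodes may differ.

```python
import math


def score_nodes(nodes, doc_freq, num_docs):
    """Attach a TF-IDF-style score to each node tuple.

    Assumed formula (illustrative, not taken from Solr's source):
        score = count * log(num_docs / (doc_freq + 1))
    """
    scored = []
    for node in nodes:
        df = doc_freq[node["node"]]
        idf = math.log(num_docs / (df + 1))
        scored.append({**node, "nodeScore": node["count(*)"] * idf})
    # Highest score first.
    return sorted(scored, key=lambda n: n["nodeScore"], reverse=True)


# Hypothetical co-occurrence counts and document frequencies.
nodes = [
    {"node": "eggs", "count(*)": 50},
    {"node": "nachos", "count(*)": 30},
]
doc_freq = {"eggs": 900, "nachos": 40}
ranked = score_nodes(nodes, doc_freq, num_docs=1000)
```

With these made-up numbers the rarer item ranks first even though its raw count is lower, which is the effect IDF weighting is meant to have on a recommendation.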



The select() function can turn your facet response into a node set, so
scoreNodes can operate on it:



scoreNodes(
    select(facet(baskets,
                 q="item:taco",
                 buckets="item",
                 bucketSorts="count(*) desc",
                 bucketSizeLimit=100,
                 count(*)),
           item as node,
           count(*),
           replace(collection, null, withValue=baskets),
           replace(field, null, withValue=item)))
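
The reshaping that select() performs here can be pictured with plain dictionaries. The sketch below is only an analogy in Python; the to_node_set helper and the sample facet tuples are hypothetical.

```python
def to_node_set(facet_tuples, collection, field):
    """Reshape facet tuples into the node-set layout described above."""
    return [
        {
            "node": t[field],           # item as node
            "count(*)": t["count(*)"],  # carried through unchanged
            "collection": collection,   # replace(collection, null, withValue=...)
            "field": field,             # replace(field, null, withValue=...)
        }
        for t in facet_tuples
    ]


# Hypothetical facet output.
facet_tuples = [
    {"item": "nacho", "count(*)": 30},
    {"item": "egg", "count(*)": 50},
]
node_set = to_node_set(facet_tuples, collection="baskets", field="item")
```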



There is a ticket open to have scoreNodes operate directly on the facet()
function so you don't have to deal with
the select() function. https://issues.apache.org/jira/browse/SOLR-9537. I'd
like to get to this soon.

Joel Bernstein
http://joelsolr.blogspot.com/



On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbull <dturnbull@opensourceconnections.com> wrote:



> I have documents like the following in my search index:
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "items_bought": ["eggs", "tacos", "nachos"]
> }
>
> {
>    "shopper_id": 1236,
>    "basket_id": 2515,
>    "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
> }
>
> I would like to use some of the stream expression capabilities (in this
> case I'm looking at the recsys stuff) but it seems like I need to break up
> my data into tuples like
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "egg"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "taco"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "nacho"
> }
> ...
>
> For various other reasons, I'd prefer to keep my original data model with
> Solr doc == one shopper basket.
>
> Now is there a way to take the documents above, output from a search tuple
> source, and apply a stream mutator to emit baskets with a field broken up
> like above? (Do let me know if I'm missing something completely here.)
>
> Thanks!
> -Doug
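
For reference, the per-item tuple shape asked about in the quoted question can be sketched client-side in Python. This is not a stream expression and is not something proposed in the thread; the explode helper is hypothetical and only illustrates the target shape.

```python
def explode(doc, multi_field, single_field):
    """Yield one tuple per value of `multi_field`, copying the other fields."""
    base = {k: v for k, v in doc.items() if k != multi_field}
    for value in doc.get(multi_field, []):
        yield {**base, single_field: value}


basket = {
    "shopper_id": 1234,
    "basket_id": 2512,
    "items_bought": ["eggs", "tacos", "nachos"],
}
tuples = list(explode(basket, "items_bought", "item"))
```

Each resulting tuple keeps shopper_id and basket_id and carries a single item value, while the stored document stays one basket per Solr doc.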

Re: Stream expressions: Break up multivalue field into usable tuples

Posted by Joel Bernstein <jo...@gmail.com>.
Great! I'm not sure if you noticed, but SOLR-9537 has been committed and
will be in 6.3, so you can now wrap a facet expression directly with the
scoreNodes expression.

Yeah, other scoring algorithms would be a great thing. We can adjust the
ScoreNodesStream to make this more flexible. Feel free to create a ticket
to kick off the discussion.

The fetch() expression (SOLR-9337) is also ready to commit. This will allow
you to run the following construct:

classify(fetch(top(scoreNodes(facet()))))

This runs the facets, scores them, takes the top N, fetches a text field
(product description), and runs a classifier to personalize the
recommendation.

This will work with graph expressions just as well as facets.

classify() uses the model that is optimized by the train() function and
stored in SolrCloud.

This makes it very simple to deploy recommender systems that combine graph
queries with AI models.

Joel Bernstein
http://joelsolr.blogspot.com/
