Posted to user@predictionio.apache.org by Dennis Honders <de...@gmail.com> on 2017/05/24 15:28:54 UTC

UR optimizing results

*Current data: *

{"event": "cart-transaction", "entityId": "1", "entityType": "user",
"targetEntityId": "12", "targetEntityType": "item"},

{"event": "$set", "entityType": "item", "entityId": "12", "properties":
{"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1,
"label": "test", "price": "$1-$2"}}

*Questions: *

Cart-transaction is the primary event for shopping cart recommendations.
Should I use user-buy-item as a secondary event, or is there no link between
the two?

Item-based queries are for similar items. For shopping cart
recommendations, would complementary recommendations suit better? If so,
those are made by 'user-id' (cart-id). How can this be done?

I would like to do content-based recommendation for items that have never
been in a transaction. I think this can be configured in the engine.json.
Any advice on doing this?

*Engine.json: *

{
  "comment": " This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params": {
      "name": "ur-name",
      "appName": "Test",
      "eventNames": ["cart-transaction"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer.mb": "300",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "Test",
        "indexName": "test",
        "typeName": "cart",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["cart-transaction"],
        "maxEventsPerEventType": 50000,
        "maxCorrelatorsPerEventType": 5000,
        "num": 10,
        "itemBias": 2.0,
        "rankings": [{
          "name": "preferredRank",
          "type": "userDefined"
        }]
      }
    }
  ]
}

Re: UR optimizing results

Posted by Pat Ferrel <pa...@occamsmachete.com>.
There is a good description of bias here: http://actionml.com/docs/ur_advanced_tuning#rules and here: http://actionml.com/docs/ur_config#bias. A bias < 1 but > 0 will disfavor recommendations with matching attributes.

**You want to make the bias slightly > 1**, or much greater to make it act more like a filter. The bias is multiplied by the score and recommendations are re-ranked by the new score, except for 0 and < 0, which have special meanings.

A bias of 1 is neutral; values above 1 favor matching items and values between 0 and 1 disfavor them. Values of 0 and below are the special cases.

BTW I would not use categorical boosting on item-sets. Why would you do this? You may be thinking you know better than the recommender, but why have a recommender if you know better? If you want to do this, please try with and without the rules and A/B test the difference; at least the decision to use rules or not will then be based on data.

I have seen cases where intuitive rules have completely zeroed out the benefit of a recommender. They represent overrides to the normal way the recommender works. There are cases where they benefit results too but, as I just said, not always.
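
To make the arithmetic of a positive bias concrete, here is a toy sketch in Java. It is not the UR's implementation: the class BiasSketch, the Candidate type, and the sample scores are all made up for illustration. It models only what is described above, namely that a positive bias is multiplied into the score of items with matching attributes and the list is then re-ranked. A bias of 0 or below has special meanings and is deliberately rejected rather than modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BiasSketch {
    /** Hypothetical candidate: item id, original score, and whether it matches the field values. */
    static final class Candidate {
        final String id;
        final double score;
        final boolean matches;

        Candidate(String id, double score, boolean matches) {
            this.id = id;
            this.score = score;
            this.matches = matches;
        }
    }

    /** Multiplies the score of matching candidates by a positive bias; returns ids by descending new score. */
    static List<String> rerank(List<Candidate> candidates, double bias) {
        if (bias <= 0) {
            // Assumption of this sketch: the special cases are out of scope.
            throw new IllegalArgumentException("bias <= 0 has special meanings that this sketch does not model");
        }
        List<Candidate> rescored = new ArrayList<>();
        for (Candidate c : candidates) {
            double newScore = c.matches ? c.score * bias : c.score;
            rescored.add(new Candidate(c.id, newScore, c.matches));
        }
        rescored.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        List<String> ids = new ArrayList<>();
        for (Candidate c : rescored) {
            ids.add(c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up scores: "a" does not match the field values, "b" and "c" do.
        List<Candidate> candidates = Arrays.asList(
            new Candidate("a", 1.0, false),
            new Candidate("b", 0.8, true),
            new Candidate("c", 0.4, true));
        System.out.println("bias 2.0:  " + rerank(candidates, 2.0));
        System.out.println("bias 1.0:  " + rerank(candidates, 1.0));
        System.out.println("bias 0.01: " + rerank(candidates, 0.01));
    }
}
```

With these sample scores, a bias of 2.0 lifts the matching item "b" above the non-matching item "a", while a bias of 0.01 pushes the matching items far below it. That is the sense in which a bias between 0 and 1 "disfavors" matches.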




Re: UR optimizing results

Posted by Dennis Honders <de...@gmail.com>.
I made a mistake in building the query. It is now fixed.

I found that a bias of 0.01 boosts the recommendations a bit, but also
returns recommendations based on properties for products that have never
been sold. Is this the correct behaviour for this bias value? According to
the docs, this should boost the results a bit, but in a disfavoring way. I
don't know exactly what is meant by 'disfavoring' here. It seems to
contradict 'boosts'.

I tested this with products that have never been sold. When I tested
products individually, I received recommendations based on properties. This
is what I wanted to achieve for now.
When I tested three products in one query (as in the JSON below), I
received recommendations only for the 'stronger' product, in this case for
label 'test3', which belongs to product 3.
Is it possible to tweak this so that recommendations are also made for
labels 'test' and 'test2' from products 1 and 2?
It is not an ordering problem in which only the properties of the last
product/property in the array are used.


{
  "itemSet": [
    1,
    2,
    3
  ],
  "num": 10,
  "fields": [
    {
      "name": "category",
      "values": [
        "31",
        "32",
        "33",
        "34",
        "35",
        "36"
      ],
      "bias": 0.01
    },
    {
      "name": "manufacturer",
      "values": [
        "11",
        "12",
        "13"
      ],
      "bias": 0.01
    },
    {
      "name": "label",
      "values": [
        "test",
        "test2",
        "test3"
      ],
      "bias": 0.01
    },
    {
      "name": "price",
      "values": [
        "$10-$25",
        "$20-$50",
        "$10-$25"
      ],
      "bias": 0.01
    }
  ]
}



Re: UR optimizing results

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It would be easier to tell from the JSON, but first off I notice that the “values” should be arrays of strings, even if they have only one value.

Also be aware that too many -1 filters may cause no results to be returned. Business rules are dangerous: they do not work on all items, they filter recommendations, so if you only have a few possible recommendations, they may filter all of them out of the results. Boosts are more forgiving since they will never remove, only re-rank. Even these should be used sparingly since you are overriding the recommended ranking.

Fields and the rules they encode are required for some placements or for things like “in-stock”: [“true”], but be careful about too much use of them without really good cause, unless you plan to A/B test with and without rules.
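
To illustrate the point about “values” being arrays of strings, here is a minimal sketch that builds the query as plain java.util collections. The class and helper names (FieldsQuerySketch, field, buildQuery) are made up for illustration, and the item ids and field values are sample data from this thread. Passing the item ids as strings is an assumption. How the map is serialized and sent to the engine is outside this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldsQuerySketch {
    /** Builds one "fields" entry; the values always end up as a list of strings, even for a single value. */
    static Map<String, Object> field(String name, double bias, String... values) {
        List<String> valueList = new ArrayList<>(Arrays.asList(values));
        Map<String, Object> f = new LinkedHashMap<>();
        f.put("name", name);
        f.put("values", valueList);
        f.put("bias", bias);
        return f;
    }

    /** Assembles the whole query map: item set, number of results, and the field rules. */
    static Map<String, Object> buildQuery(List<String> itemSet, int num, List<Map<String, Object>> fields) {
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("itemSet", itemSet);
        query.put("num", num);
        query.put("fields", fields);
        return query;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> fields = new ArrayList<>();
        fields.add(field("category", -1, "5"));
        fields.add(field("manufacturer", -1, "33"));   // "33" as a string inside a list, not the number 33
        fields.add(field("label", -1, "testlabel"));
        fields.add(field("price", -1, "$10-$25"));
        System.out.println(buildQuery(Arrays.asList("1", "2", "3"), 10, fields));
    }
}
```

Routing every field through one helper with a String varargs parameter makes it hard to pass a bare number or a bare string as “values” by accident.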






Re: UR optimizing results

Posted by Dennis Honders <de...@gmail.com>.
I was already looking at the docs for property based recs.

I now have added fields in the query (Java backend):

JsonObject response = engineClient.sendQuery(ImmutableMap.<String,
Object>of(
queryKey, ImmutableList.builder().addAll(productIds).build(),
NUM_KEY, NUM_VALUE
"fields", ImmutableList.builder().add(
ImmutableMap.<String, Object>of(
"name", "category",
"values", ImmutableList.builder().add("5").build(),
"bias", -1
)
).add(
ImmutableMap.<String, Object>of(
"name", "manufacturer",
"values", 33,
"bias", -1
)
).add(
ImmutableMap.<String, Object>of(
"name", "label",
"values", "testlabel",
"bias", -1
)
).add(
ImmutableMap.<String, Object>of(
"name", "price",
"values", "$10-$25",
"bias", -1
)
)
));

The fields are hardcoded for testing. Is this the correct way to configure
fields in the query?
Currently, there is no difference in results.

What else needs to be done in the ranking/fields in engine.json?

2017-05-24 19:43 GMT+02:00 Pat Ferrel <pa...@occamsmachete.com>:

> I suggest you read the docs here: http://actionml.com/docs/ur
> Pay particular attention to attaching properties to items and using fields
> to query for those properties. This is the only way to get items with no
> usage data. You could promote items with business rules or adopt some kind
> of ordering of items that puts new items ahead of popular ones. So check
> custom “rankings” and “item properties”.
>
> itemBias is used for item-based queries and refers to item-similarity
> based on usage data, not content similarity.
>
> It is difficult to truly mix content-based recs where no usage data exists
> and collaborative filtering because you would be giving up the advantage of
> CF. Therefore I suggest some separate rolling promotion mechanism in a
> separate placement. Then you’ll get usage data, at least detail views.
>
>
>
> On May 24, 2017, at 10:33 AM, Dennis Honders <de...@gmail.com>
> wrote:
>
> Thanks again for the answer. I will read the paper soon.
> How can recommendations be configured for content-based filtering (based
> on item properties) for products that have never been sold, instead of
> falling back on, e.g., popular items?
>
> Boosting with these properties is done with itemBias.
>
> On 24 May 2017 at 17:54, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> I split answers in 2 since the config is a completely separate thing.
>
> Increasing maxCorrelatorsPerEventType is usually the wrong thing to do.
> It makes the model fuzzier, for lack of a better term. In fact we’d like
> to restrict the correlators to only the best, and maxCorrelatorsPerEventType
> is a crude way to do this that gets worse the more you allow. Another, newer
> method is an LLR threshold, which can be set per indicator to use the
> correlation value as a threshold for inclusion as a correlator.
> maxCorrelatorsPerEventType just takes the top ones even if their scores are
> low. This is why making this number big will not make things better: it
> will include more correlators of lower quality.
>
> Also, increasing maxEventsPerEventType increases memory usage and takes far
> longer to calculate the model for very little if any gain. This is from a
> paper by Sebastian Schelter, one of the inventors of CCO:
> https://ssc.io/pdf/rec11-schelter.pdf
>
> I’d leave those as defaulted and measure a baseline KPI before doing A/B
> tests or cross-validation to try different numbers there.
>
>
> On May 24, 2017, at 8:28 AM, Dennis Honders <de...@gmail.com>
> wrote:
>
> *Current data: *
>
> {"event": "cart-transaction", "entityId": "1", "entityType": "user",
> "targetEntityId": "12", "targetEntityType": "item"},
>
> {"event": "$set", "entityType": "item", "entityId": "12", "properties":
> {"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1,
> "label": "test", "price": "$1-$2"}}
>
> *Questions: *
>
> Cart-transaction is the primary event for shopping cart recommendations.
> Should user-buy-item be used as a secondary event, or is there no link
> between the two?
>
> Item-based queries are for similar items. For shopping cart
> recommendations, would complementary recommendations suit better? If so,
> those are made by 'user-id' (cart-id). How can this be done?
>
> I would like to do content-based recommendation for items that haven't
> been in a transaction. I think this can be configured in the engine.json.
> Any advice for doing this?
>
> *Engine.json: *
>
> {
>   "comment":" This config file uses default settings for all but the
> required values see README.md for docs",
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "com.actionml.RecommendationEngine",
>   "datasource": {
>     "params" : {
>       "name": "ur-name",
>       "appName": "Test",
>       "eventNames": ["cart-transaction"]
>     }
>   },
>   "sparkConf": {
>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>     "spark.kryo.referenceTracking": "false",
>     "spark.kryoserializer.buffer.mb": "300",
>     "spark.kryoserializer.buffer": "300m",
>     "es.index.auto.create": "true"
>   },
>   "algorithms": [
>     {
>       "comment": "simplest setup where all values are default, popularity
> based backfill, must add eventsNames",
>       "name": "ur",
>       "params": {
> "appName": "Test",
> "indexName": "test",
> "typeName": "cart",
> "comment": "must have data for the first event or the model will not
> build, other events are optional",
> "eventNames": ["cart-transaction"],
> "maxEventsPerEventType": 50000,
> "maxCorrelatorsPerEventType": 5000,
> "num": 10,
> "itemBias": 2.0,
> "rankings": [{
> "name": "preferredRank",
> "type": "userDefined"
> }]
>       }
>     }
>   ]
> }
>
>
>
>

Re: UR optimizing results

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I suggest you read the docs here: http://actionml.com/docs/ur. Pay particular attention to attaching properties to items and using fields to query for those properties. This is the only way to get items with no usage data. You could promote items with business rules or adopt some kind of ordering of items that puts new items ahead of popular ones. So check custom “rankings” and “item properties”.

itemBias is used for item-based queries and refers to item-similarity based on usage data, not content similarity.

It is difficult to truly mix content-based recs where no usage data exists and collaborative filtering because you would be giving up the advantage of CF. Therefore I suggest some separate rolling promotion mechanism in a separate placement. Then you’ll get usage data, at least detail views.
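
As a rough illustration of "using fields to query for those properties", here is a minimal Python sketch that only builds the JSON body of such a query; it does not send it. The helper name property_query and the sample values are made up for this example. The bias semantics (negative acts as a filter, greater than 1 as a boost) are how I read the UR docs, so treat them as an assumption and check the docs for your UR version.

```python
import json

def property_query(field_name, values, bias=-1, user_id=None, num=10):
    """Build a UR query body that filters or boosts results by an item property.

    Hypothetical helper for illustration only. Assumed bias semantics:
    bias < 0 filters (only matching items), bias > 1 boosts matching items.
    """
    query = {
        "num": num,
        "fields": [{"name": field_name, "values": list(values), "bias": bias}],
    }
    if user_id is not None:
        # With a user id the engine can still use that user's history, if any.
        query["user"] = str(user_id)
    return query

# Items in category "3", regardless of whether they have usage data.
query = property_query("category", ["3"])
print(json.dumps(query, indent=2))
```

The printed body would be POSTed to the deployed engine's /queries.json endpoint. The property name must match one that was attached to items with a $set event, such as "category" in the data above.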


On May 24, 2017, at 10:33 AM, Dennis Honders <de...@gmail.com> wrote:

Thanks again for the answer. I will read the paper soon. 
How can recommendations be configured for content-based filtering (based on item properties) for products that have never been sold, instead of falling back to e.g. popular items?

Boosting with these properties is done with itemBias. 

On 24 May 2017 at 17:54, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I split answers in 2 since the config is a completely separate thing.
> 
> Increasing maxCorrelatorsPerEventType is usually the wrong thing to do. It makes the model fuzzier, for lack of a better term. In fact we’d like to restrict the correlators to only the best, and maxCorrelatorsPerEventType is a crude way to do this that gets worse the more you allow. A newer method is an LLR threshold, which can be set per indicator to use the correlation value as a threshold for inclusion as a correlator. maxCorrelatorsPerEventType just takes the top ones even if their scores are low. This is why making this number big will not make things better: it will include more correlators of lower quality.
>
> Also, raising maxEventsPerEventType increases memory usage and takes far longer to calculate the model for very little if any gain. This is from a paper by Sebastian Schelter, one of the inventors of CCO: https://ssc.io/pdf/rec11-schelter.pdf
> 
> I’d leave those as defaulted and measure a baseline KPI before doing A/B tests or cross-validation to try different numbers there.
> 
> 
> On May 24, 2017, at 8:28 AM, Dennis Honders <de...@gmail.com> wrote:
> 
> Current data: 
> 
> {"event": "cart-transaction", "entityId": "1", "entityType": "user", "targetEntityId": "12", "targetEntityType": "item"}, 
> 
> {"event": "$set", "entityType": "item", "entityId": "12", "properties": {"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1, "label": "test", "price": "$1-$2"}}
> 
> Questions: 
> 
> Cart-transaction is the primary event for shopping cart recommendations. Should user-buy-item be used as a secondary event, or is there no link between the two?
>
> Item-based queries are for similar items. For shopping cart recommendations, would complementary recommendations suit better? If so, those are made by 'user-id' (cart-id). How can this be done?
>
> I would like to do content-based recommendation for items that haven't been in a transaction. I think this can be configured in the engine.json. Any advice for doing this?
> 
> Engine.json: 
> 
> {
>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "com.actionml.RecommendationEngine",
>   "datasource": {
>     "params" : {
>       "name": "ur-name",
>       "appName": "Test",
>       "eventNames": ["cart-transaction"]
>     }
>   },
>   "sparkConf": {
>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>     "spark.kryo.referenceTracking": "false",
>     "spark.kryoserializer.buffer.mb": "300",
>     "spark.kryoserializer.buffer": "300m",
>     "es.index.auto.create": "true"
>   },
>   "algorithms": [
>     {
>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>       "name": "ur",
>       "params": {
> 		"appName": "Test",
> 		"indexName": "test",
> 		"typeName": "cart",
> 		"comment": "must have data for the first event or the model will not build, other events are optional",
> 		"eventNames": ["cart-transaction"],
> 		"maxEventsPerEventType": 50000,
> 		"maxCorrelatorsPerEventType": 5000,
> 		"num": 10, 
> 		"itemBias": 2.0,
> 		"rankings": [{
> 			"name": "preferredRank",
> 			"type": "userDefined"
> 		}]
>       }
>     }
>   ]
> }
> 
> 


Re: UR optimizing results

Posted by Dennis Honders <de...@gmail.com>.
Thanks again for the answer. I will read the paper soon. 
How can recommendations be configured for content-based filtering (based on item properties) for products that have never been sold, instead of falling back to e.g. popular items?

Boosting with these properties is done with itemBias. 

> On 24 May 2017 at 17:54, Pat Ferrel <pa...@occamsmachete.com> wrote:
> 
> I split answers in 2 since the config is a completely separate thing.
> 
> Increasing maxCorrelatorsPerEventType is usually the wrong thing to do. It makes the model fuzzier, for lack of a better term. In fact we’d like to restrict the correlators to only the best, and maxCorrelatorsPerEventType is a crude way to do this that gets worse the more you allow. A newer method is an LLR threshold, which can be set per indicator to use the correlation value as a threshold for inclusion as a correlator. maxCorrelatorsPerEventType just takes the top ones even if their scores are low. This is why making this number big will not make things better: it will include more correlators of lower quality.
>
> Also, raising maxEventsPerEventType increases memory usage and takes far longer to calculate the model for very little if any gain. This is from a paper by Sebastian Schelter, one of the inventors of CCO: https://ssc.io/pdf/rec11-schelter.pdf
> 
> I’d leave those as defaulted and measure a baseline KPI before doing A/B tests or cross-validation to try different numbers there.
> 
> 
> On May 24, 2017, at 8:28 AM, Dennis Honders <de...@gmail.com> wrote:
> 
> Current data: 
> 
> {"event": "cart-transaction", "entityId": "1", "entityType": "user", "targetEntityId": "12", "targetEntityType": "item"}, 
> 
> {"event": "$set", "entityType": "item", "entityId": "12", "properties": {"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1, "label": "test", "price": "$1-$2"}}
> 
> Questions: 
> 
> Cart-transaction is the primary event for shopping cart recommendations. Should user-buy-item be used as a secondary event, or is there no link between the two?
>
> Item-based queries are for similar items. For shopping cart recommendations, would complementary recommendations suit better? If so, those are made by 'user-id' (cart-id). How can this be done?
>
> I would like to do content-based recommendation for items that haven't been in a transaction. I think this can be configured in the engine.json. Any advice for doing this?
> 
> Engine.json: 
> 
> {
>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "com.actionml.RecommendationEngine",
>   "datasource": {
>     "params" : {
>       "name": "ur-name",
>       "appName": "Test",
>       "eventNames": ["cart-transaction"]
>     }
>   },
>   "sparkConf": {
>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>     "spark.kryo.referenceTracking": "false",
>     "spark.kryoserializer.buffer.mb": "300",
>     "spark.kryoserializer.buffer": "300m",
>     "es.index.auto.create": "true"
>   },
>   "algorithms": [
>     {
>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>       "name": "ur",
>       "params": {
> 		"appName": "Test",
> 		"indexName": "test",
> 		"typeName": "cart",
> 		"comment": "must have data for the first event or the model will not build, other events are optional",
> 		"eventNames": ["cart-transaction"],
> 		"maxEventsPerEventType": 50000,
> 		"maxCorrelatorsPerEventType": 5000,
> 		"num": 10, 
> 		"itemBias": 2.0,
> 		"rankings": [{
> 			"name": "preferredRank",
> 			"type": "userDefined"
> 		}]
>       }
>     }
>   ]
> }
> 
> 

Re: UR optimizing results

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I split answers in 2 since the config is a completely separate thing.

Increasing maxCorrelatorsPerEventType is usually the wrong thing to do. It makes the model fuzzier, for lack of a better term. In fact we’d like to restrict the correlators to only the best, and maxCorrelatorsPerEventType is a crude way to do this that gets worse the more you allow. A newer method is an LLR threshold, which can be set per indicator to use the correlation value as a threshold for inclusion as a correlator. maxCorrelatorsPerEventType just takes the top ones even if their scores are low. This is why making this number big will not make things better: it will include more correlators of lower quality.

Also, raising maxEventsPerEventType increases memory usage and takes far longer to calculate the model for very little if any gain. This is from a paper by Sebastian Schelter, one of the inventors of CCO: https://ssc.io/pdf/rec11-schelter.pdf

I’d leave those as defaulted and measure a baseline KPI before doing A/B tests or cross-validation to try different numbers there.
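
To make the difference between a top-N cutoff and an LLR threshold concrete, here is a small Python sketch. The llr function is the standard log-likelihood ratio for a 2x2 co-occurrence table (the same entropy formulation Mahout uses); the helper names, the sample scores and the threshold of 5.0 are illustrative assumptions, not UR internals or recommended settings.

```python
import math

def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts.

    k11: both events occurred, k12/k21: only one occurred, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def top_n(scores, n):
    # Crude cutoff: keep the n best correlators, however weak they are.
    return sorted(scores, key=scores.get, reverse=True)[:n]

def above_threshold(scores, min_llr):
    # Quality cutoff: keep only correlators whose score clears the threshold.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked if score >= min_llr]

# Made-up LLR scores of candidate correlators for one item.
scores = {"a": 27.7, "b": 12.1, "c": 0.4, "d": 0.1}
print(top_n(scores, 3))              # keeps "c" even though its score is tiny
print(above_threshold(scores, 5.0))  # keeps only "a" and "b"
```

With a large top-N the weak correlators "c" and "d" get into the model, which is the fuzziness described above; the threshold drops them however many slots are available.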


On May 24, 2017, at 8:28 AM, Dennis Honders <de...@gmail.com> wrote:

Current data: 

{"event": "cart-transaction", "entityId": "1", "entityType": "user", "targetEntityId": "12", "targetEntityType": "item"}, 

{"event": "$set", "entityType": "item", "entityId": "12", "properties": {"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1, "label": "test", "price": "$1-$2"}}

Questions: 

Cart-transaction is the primary event for shopping cart recommendations. Should user-buy-item be used as a secondary event, or is there no link between the two?

Item-based queries are for similar items. For shopping cart recommendations, would complementary recommendations suit better? If so, those are made by 'user-id' (cart-id). How can this be done?

I would like to do content-based recommendation for items that haven't been in a transaction. I think this can be configured in the engine.json. Any advice for doing this?

Engine.json: 

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "ur-name",
      "appName": "Test",
      "eventNames": ["cart-transaction"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer.mb": "300",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
		"appName": "Test",
		"indexName": "test",
		"typeName": "cart",
		"comment": "must have data for the first event or the model will not build, other events are optional",
		"eventNames": ["cart-transaction"],
		"maxEventsPerEventType": 50000,
		"maxCorrelatorsPerEventType": 5000,
		"num": 10, 
		"itemBias": 2.0,
		"rankings": [{
			"name": "preferredRank",
			"type": "userDefined"
		}]
      }
    }
  ]
}



Re: UR optimizing results

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Secondary events are hard to come by for “complementary purchases” because, as you point out, the entity being tracked is a cart, not a user. The cart has few possible actions or indicators that can be associated with it. The user has many. Also, the cart does not have a brain we are trying to look into; humans do, and cross-indicators are how we look into different aspects of the human mind.

Back to earth…

You can do a query on a cart-id if you are sending (cart-id, add-to-cart, item-id) in real time, but it’s usually not as convenient as doing an “itemSet” query with the contents of the user’s cart against the model built from cart-transactions, which I assume are purchases tied to a cart id. One small flaw in the first approach is that tracking add-to-cart doesn’t account for remove-from-cart very well. So training on cart-level purchases and querying with the current contents intuitively seems better. It would take a rigorous A/B test to know for sure.
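
A minimal sketch of what such an “itemSet” query body could look like, built in Python without sending it. The helper name cart_query and the item ids are made up for illustration; the "itemSet" and "num" keys are the query fields as I understand them from the UR docs, so verify them against your UR version.

```python
import json

def cart_query(cart_items, num=10):
    """Build a UR itemSet query from the current contents of a cart.

    Hypothetical helper: item ids are sent as strings, matching the
    targetEntityId values used in the cart-transaction events.
    """
    return {"itemSet": [str(item) for item in cart_items], "num": num}

# The shopper currently has items 12 and 34 in the cart.
query = cart_query([12, 34])
print(json.dumps(query))
```

Because the cart contents are passed in at query time, an item removed from the cart simply drops out of the next query, which sidesteps the remove-from-cart problem.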

On May 24, 2017, at 8:28 AM, Dennis Honders <de...@gmail.com> wrote:

Current data: 

{"event": "cart-transaction", "entityId": "1", "entityType": "user", "targetEntityId": "12", "targetEntityType": "item"}, 

{"event": "$set", "entityType": "item", "entityId": "12", "properties": {"category": ["1", "2", "3", "4", "5", "6", "7"], "manufacturer": 1, "label": "test", "price": "$1-$2"}}

Questions: 

Cart-transaction is the primary event for shopping cart recommendations. Should user-buy-item be used as a secondary event, or is there no link between the two?

Item-based queries are for similar items. For shopping cart recommendations, would complementary recommendations suit better? If so, those are made by 'user-id' (cart-id). How can this be done?

I would like to do content-based recommendation for items that haven't been in a transaction. I think this can be configured in the engine.json. Any advice for doing this?

Engine.json: 

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "ur-name",
      "appName": "Test",
      "eventNames": ["cart-transaction"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer.mb": "300",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
		"appName": "Test",
		"indexName": "test",
		"typeName": "cart",
		"comment": "must have data for the first event or the model will not build, other events are optional",
		"eventNames": ["cart-transaction"],
		"maxEventsPerEventType": 50000,
		"maxCorrelatorsPerEventType": 5000,
		"num": 10, 
		"itemBias": 2.0,
		"rankings": [{
			"name": "preferredRank",
			"type": "userDefined"
		}]
      }
    }
  ]
}