Posted to user@mahout.apache.org by Mark <st...@gmail.com> on 2011/07/04 19:05:26 UTC

MySQLJDBCDataModel vs FileDataModel

I've read the source for FileDataModel and it suggested using a JDBC 
backed implementation for larger datasets so I decided to upgrade our 
recommendation system to use MySQLJDBCDataModel with 
MySQLJDBCInMemoryItemSimilarity.

I've found that the JDBC backed versions' performance is actually worse 
than the FileDataModel and FileItemSimilarity versions. Should this be the 
case? Which versions are most people using out there? Any recommendations?

Thanks

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Sebastian Schelter <ss...@apache.org>.

If the item similarities are already precomputed, there's no sense in
fetching candidates from the data model. You can just use the
precomputed set of similar items, since no other items can be
recommended anyway, and it's faster to fetch them from a similarity
implementation that holds them in memory than from any data model
implementation.
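
A rough sketch of that idea, using only the Java standard library. The class
PrecomputedSimilarItems and its method candidatesFor are hypothetical names
made up for illustration; this is not Mahout's
AllSimilarItemsCandidateItemsStrategy, just the same principle under the
assumption that the precomputed similarities fit in an in-memory map.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a similarity implementation that holds a
// precomputed item-item table in memory. Not a Mahout class.
class PrecomputedSimilarItems {
    private final Map<Long, List<Long>> similarItems;

    PrecomputedSimilarItems(Map<Long, List<Long>> similarItems) {
        this.similarItems = similarItems;
    }

    // Candidates are the union of the items similar to what the user already
    // prefers, minus the preferred items themselves. The data model is never
    // consulted here.
    Set<Long> candidatesFor(Set<Long> preferredItems) {
        Set<Long> candidates = new HashSet<>();
        for (long itemId : preferredItems) {
            candidates.addAll(similarItems.getOrDefault(itemId, Collections.emptyList()));
        }
        candidates.removeAll(preferredItems);
        return candidates;
    }
}
```

Items with no entry in the table contribute nothing, which matches the point
above: an item without a precomputed similarity could not be recommended
anyway.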

--sebastian

2011/7/4 Mark <st...@gmail.com>:
> May I ask why you chose to go with AllSimilarItemsCandidateItemsStrategy
> over the default PreferredItemsNeighborhoodCandidateItemsStrategy?
>
> On 7/4/11 10:23 AM, Sebastian Schelter wrote:
>>
>> A look into a recent blogpost of mine might maybe be helpful with
>> choosing the appropriate data access strategies for your recommender
>> setup. It covers a very common usecase in great detail:
>>
>>
>> http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
>>
>> --sebastian
>>
>> 2011/7/4 Mark<st...@gmail.com>:
>>>
>>> I wouldn't use the in memory JDBC solution.
>>>
>>> I was wondering do most people choose the JDBC backed solutions or the
>>> File backed?
>>>
>>> On 7/4/11 10:17 AM, Sean Owen wrote:
>>>>
>>>> Yes. Both are just fine to use in production. For speed and avoiding
>>>> abuse of the database, I'd load into memory and tell it to periodically
>>>> reload. But that too is a bit of a choice between how often you want to
>>>> consume new data and how much work you want to do to recompute new
>>>> values.
>>>>
>>>> On Mon, Jul 4, 2011 at 6:13 PM, Mark<st...@gmail.com>
>>>>  wrote:
>>>>
>>>>> Ahh ok. So if I want everything in memory like the file backed solution
>>>>> I should use ReloadFromJDBCDataModel? I'm going to give that a try right
>>>>> now.
>>>>>
>>>>> Typically which solution is recommended for production use?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Mark <st...@gmail.com>.

May I ask why you chose to go with 
AllSimilarItemsCandidateItemsStrategy over the default 
PreferredItemsNeighborhoodCandidateItemsStrategy?

On 7/4/11 10:23 AM, Sebastian Schelter wrote:
> A look into a recent blogpost of mine might maybe be helpful with
> choosing the appropriate data access strategies for your recommender
> setup. It covers a very common usecase in great detail:
>
> http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
>
> --sebastian
>
> 2011/7/4 Mark<st...@gmail.com>:
>> I wouldn't use the in memory JDBC solution.
>>
>> I was wondering do most people choose the JDBC backed solutions or the File
>> backed?
>>
>> On 7/4/11 10:17 AM, Sean Owen wrote:
>>> Yes. Both are just fine to use in production. For speed and avoiding abuse
>>> of the database, I'd load into memory and tell it to periodically reload.
>>> But that too is a bit of a choice between how often you want to consume
>>> new data and how much work you want to do to recompute new values.
>>>
>>> On Mon, Jul 4, 2011 at 6:13 PM, Mark<st...@gmail.com>    wrote:
>>>
>>>> Ahh ok. So if I want everything in memory like the file backed solution I
>>>> should use ReloadFromJDBCDataModel? I'm going to give that a try right
>>>> now.
>>>>
>>>> Typically which solution is recommended for production use?
>>>>
>>>> Thanks
>>>>
>>>>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Sebastian Schelter <ss...@apache.org>.

A look at a recent blog post of mine might be helpful for
choosing the appropriate data access strategies for your recommender
setup. It covers a very common use case in great detail:

http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/

--sebastian

2011/7/4 Mark <st...@gmail.com>:
> I wouldn't use the in memory JDBC solution.
>
> I was wondering do most people choose the JDBC backed solutions or the File
> backed?
>
> On 7/4/11 10:17 AM, Sean Owen wrote:
>>
>> Yes. Both are just fine to use in production. For speed and avoiding abuse
>> of the database, I'd load into memory and tell it to periodically reload.
>> But that too is a bit of a choice between how often you want to consume
>> new data and how much work you want to do to recompute new values.
>>
>> On Mon, Jul 4, 2011 at 6:13 PM, Mark<st...@gmail.com>  wrote:
>>
>>> Ahh ok. So if I want everything in memory like the file backed solution I
>>> should use ReloadFromJDBCDataModel? I'm going to give that a try right
>>> now.
>>>
>>> Typically which solution is recommended for production use?
>>>
>>> Thanks
>>>
>>>
>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Mark <st...@gmail.com>.

I wouldn't use the in memory JDBC solution.

I was wondering: do most people choose the JDBC backed solutions or the 
file backed ones?

On 7/4/11 10:17 AM, Sean Owen wrote:
> Yes. Both are just fine to use in production. For speed and avoiding abuse
> of the database, I'd load into memory and tell it to periodically reload.
> But that too is a bit of a choice between how often you want to consume new
> data and how much work you want to do to recompute new values.
>
> On Mon, Jul 4, 2011 at 6:13 PM, Mark<st...@gmail.com>  wrote:
>
>> Ahh ok. So if I want everything in memory like the file backed solution I
>> should use ReloadFromJDBCDataModel? I'm going to give that a try right now.
>>
>> Typically which solution is recommended for production use?
>>
>> Thanks
>>
>>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Sean Owen <sr...@gmail.com>.

Yes. Both are just fine to use in production. For speed and avoiding abuse
of the database, I'd load into memory and tell it to periodically reload.
But that too is a bit of a choice between how often you want to consume new
data and how much work you want to do to recompute new values.
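
The load-into-memory-and-periodically-reload approach might look roughly like
the following, using only the Java standard library. ReloadingSnapshot and its
methods are hypothetical names for illustration, not Mahout classes, and the
loader is assumed to be whatever slow full read you have (for example one
query over the preference table).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper, not part of Mahout: keeps an in-memory snapshot of
// data produced by a slow loader (for example a full-table database read).
class ReloadingSnapshot<T> {
    private final Supplier<T> loader;
    private final AtomicReference<T> current;

    ReloadingSnapshot(Supplier<T> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get()); // initial load
    }

    // Reads never touch the backing store; they only see the snapshot.
    T get() {
        return current.get();
    }

    // Loads a fresh copy and swaps it in atomically.
    void refresh() {
        current.set(loader.get());
    }

    // Schedules refresh() at a fixed interval. The caller is responsible
    // for shutting the returned executor down.
    ScheduledExecutorService refreshEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so
                // swallow it and keep serving the previous snapshot.
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

The reload period is the knob for the trade-off described above: a shorter
period picks up new data sooner but repeats the full load, and any
recomputation that depends on it, more often.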

On Mon, Jul 4, 2011 at 6:13 PM, Mark <st...@gmail.com> wrote:

> Ahh ok. So if I want everything in memory like the file backed solution I
> should use ReloadFromJDBCDataModel? I'm going to give that a try right now.
>
> Typically which solution is recommended for production use?
>
> Thanks
>
>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Mark <st...@gmail.com>.

Ahh ok. So if I want everything in memory like the file backed solution 
I should use ReloadFromJDBCDataModel? I'm going to give that a try right 
now.

Typically which solution is recommended for production use?

Thanks

On 7/4/11 10:09 AM, Sean Owen wrote:
> Yes, this is trading memory for speed. If you can fit everything in memory,
> then you should. FileDataModel is in memory.
>
> MySQLJDBCDataModel is not in memory and queries the DB every time. This is
> pretty slow, though by caching item-item similarity as you do, a lot of the
> load is removed. However if you want to go all in memory, use
> ReloadFromJDBCDataModel.
>
> (The naming is weirder than the actual structure or logic...)
>
> On Mon, Jul 4, 2011 at 6:05 PM, Mark<st...@gmail.com>  wrote:
>
>> I've read the source for FileDataModel and it suggested using a JDBC backed
>> implementation for larger datasets so I decided to upgrade our
>> recommendation system to use MySQLJDBCDataModel with
>> MySQLJDBCInMemoryItemSimilarity.
>>
>> I've found that the JDBC backed versions' performance is actually worse than
>> the FileDataModel and FileItemSimilarity versions. Should this be the case?
>> Which versions are most people using out there? Any recommendations?
>>
>> Thanks
>>

Re: MySQLJDBCDataModel vs FileDataModel

Posted by Sean Owen <sr...@gmail.com>.

Yes, this is trading memory for speed. If you can fit everything in memory,
then you should. FileDataModel is in memory.

MySQLJDBCDataModel is not in memory and queries the DB every time. This is
pretty slow, though by caching item-item similarity as you do, a lot of the
load is removed. However if you want to go all in memory, use
ReloadFromJDBCDataModel.

(The naming is weirder than the actual structure or logic...)
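
To illustrate why caching item-item similarity takes so much load off the
database, here is a small stand-alone sketch. PairSimilarity and
CachedPairSimilarity are hypothetical names, not Mahout's interfaces, and the
sketch assumes similarity is symmetric and that an unbounded cache is
acceptable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface, not Mahout's: similarity between two item IDs.
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

// Wraps a slow lookup (say, one SQL query per call) and remembers results,
// so each distinct pair reaches the slow path at most once.
class CachedPairSimilarity implements PairSimilarity {
    private final PairSimilarity slow;
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    CachedPairSimilarity(PairSimilarity slow) {
        this.slow = slow;
    }

    @Override
    public double similarity(long itemA, long itemB) {
        // Assumption: similarity is symmetric, so order the pair for the key.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        return cache.computeIfAbsent(lo + ":" + hi, k -> slow.similarity(lo, hi));
    }
}
```

Preference lookups that are not cached this way would still hit the database
on every request, which is the part an all-in-memory data model removes.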

On Mon, Jul 4, 2011 at 6:05 PM, Mark <st...@gmail.com> wrote:

> I've read the source for FileDataModel and it suggested using a JDBC backed
> implementation for larger datasets so I decided to upgrade our
> recommendation system to use MySQLJDBCDataModel with
> MySQLJDBCInMemoryItemSimilarity.
>
> I've found that the JDBC backed versions' performance is actually worse than
> the FileDataModel and FileItemSimilarity versions. Should this be the case?
> Which versions are most people using out there? Any recommendations?
>
> Thanks
>