Posted to user@mahout.apache.org by Varad Meru <me...@gmail.com> on 2012/01/22 09:06:14 UTC

Item-based Recommendation Engine Performance for E-Commerce

Hi All,

I am working on a recommendation engine for an e-commerce application,
focusing on item-based recommendations with boolean data (implicit
feedback).
The item-based recommendations are built on a SQL Server Boolean JDBC
Data Model, which was written along the lines of the existing MySQL
Boolean JDBC Data Model.
The similarity measures are the Tanimoto coefficient and a custom
cosine similarity written for boolean preferences.
These are wrapped by the CachedSimilarity.
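
For readers unfamiliar with these measures on boolean data, here is a minimal, self-contained sketch. It is not Mahout's implementation and not the custom class mentioned above; the class name BooleanSimilarity and its methods are hypothetical, and the cosine variant is just one common formulation for boolean preferences. Each item is represented by the set of user IDs that interacted with it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only (not part of Mahout).
final class BooleanSimilarity {

    /** Number of users present in both sets. */
    static int overlap(Set<Long> usersOfA, Set<Long> usersOfB) {
        Set<Long> smaller = usersOfA.size() <= usersOfB.size() ? usersOfA : usersOfB;
        Set<Long> larger = smaller == usersOfA ? usersOfB : usersOfA;
        int count = 0;
        for (Long user : smaller) {
            if (larger.contains(user)) {
                count++;
            }
        }
        return count;
    }

    /** Tanimoto coefficient: |A intersect B| / |A union B|. */
    static double tanimoto(Set<Long> usersOfA, Set<Long> usersOfB) {
        if (usersOfA.isEmpty() && usersOfB.isEmpty()) {
            return 0.0; // assumption: no data counts as no similarity
        }
        int both = overlap(usersOfA, usersOfB);
        int either = usersOfA.size() + usersOfB.size() - both;
        return (double) both / either;
    }

    /** Cosine for boolean vectors: |A intersect B| / sqrt(|A| * |B|). */
    static double cosine(Set<Long> usersOfA, Set<Long> usersOfB) {
        if (usersOfA.isEmpty() || usersOfB.isEmpty()) {
            return 0.0; // assumption: an item nobody touched is similar to nothing
        }
        int both = overlap(usersOfA, usersOfB);
        return both / Math.sqrt((double) usersOfA.size() * usersOfB.size());
    }

    public static void main(String[] args) {
        Set<Long> a = new HashSet<>(Set.of(1L, 2L, 3L));
        Set<Long> b = new HashSet<>(Set.of(2L, 3L, 4L));
        System.out.println(tanimoto(a, b)); // 2 shared users out of 4 distinct: 0.5
    }
}
```

Both measures only need set sizes and the overlap, so they are cheap once the data is in memory; the cost in a setup like the one described comes from fetching the sets, not from the arithmetic.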

This is not a Hadoop-based solution, but a stand-alone one.

	Finding the similar products (getting the mostSimilarItems) with
~150 products takes 8 seconds per product.
	Finding the similar products (getting the mostSimilarItems) with
~1500 products takes 8 minutes per product.

At this rate, computing all the similarities would take around 7 days
(really a lot of time).
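
As a rough check of that estimate, the sketch below multiplies the figures quoted above (~1500 products at 8 minutes each). The class and method names are hypothetical. Taken literally, the numbers give about 8.3 days, the same order of magnitude as the figure of around 7 days.

```java
// Hypothetical helper, for illustration only.
final class SimilarityTimeEstimate {

    /** Total wall-clock days if every product takes the same number of minutes. */
    static double totalDays(int products, double minutesPerProduct) {
        double totalMinutes = products * minutesPerProduct;
        return totalMinutes / (60.0 * 24.0);
    }

    public static void main(String[] args) {
        // 1500 products * 8 minutes = 12000 minutes = 200 hours
        System.out.println(totalDays(1500, 8.0));
    }
}
```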

We are doing the computations on an 8GB Windows 2008 Server machine
with SQL Server as our data source.

Mahout gives a warning about not using a ConnectionPoolDataSource, but
in fact we are passing a SQLServerConnectionPoolDataSource object.
We also observed a lot of time being spent on the queries, with around
90% CPU utilization by SQL Server.

Please help me with the following points:

    How can I improve the performance of the current recommendation
engine implementation?
    Is a file data model better than the DB approach?
    Any pointers on how moving to Amazon AWS with its S3 and EMR would
help us with the computing part? We need to store the computed
results and build an application on them, with a background task
updating the recommendations.
    I could not find support for cachedRecommender in this case, as it
does not accept an ItemBasedRecommender as its argument.
    Are there any other performance or domain pointers that we should
look out for?
    How does Mahout support product biasing for the e-commerce domain,
so that products can be biased as per the business needs?


Thanks,
~Varad

-----------------
Varad Meru
Software Engineer,
Business Intelligence and Analytics,
Persistent Systems and Solutions Ltd.,
Pune, India.

Re: Item-based Recommendation Engine Performance for E-Commerce

Posted by Varad Meru <me...@gmail.com>.
Thanks Sean.

I have ordered Mahout in Action (MiA) and will look into it for both
IDRescorer and RecommenderJob.

Thanks a lot for your time.


-- 
Thanks,
~Varad


Re: Item-based Recommendation Engine Performance for E-Commerce

Posted by Sean Owen <sr...@gmail.com>.
1. This is where IDRescorer comes in. You can use one to manually boost
items in recommendations however you like.

2. This is completely different, and your DB is not really available or
appropriate to use from Hadoop, if you are at Hadoop scale. If you're not
near scaling issues, don't bother with Hadoop. That said, if you are curious,
I really think the best writeup of this is chapter 6 of Mahout in Action.
It covers RecommenderJob in the source code.
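
To illustrate point 1, here is a minimal, self-contained sketch of the rescoring idea. The nested Rescorer interface is a local stand-in with the same two methods as Mahout's IDRescorer (rescore and isFiltered); everything else, including the class name, the boost factor and the rank helper, is hypothetical and only shows the effect. The adjustment happens at recommendation time, so stored preferences are left untouched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, for illustration only (does not depend on Mahout).
final class BoostRescorerSketch {

    /** Local stand-in mirroring the shape of Mahout's IDRescorer. */
    interface Rescorer {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    /** Multiplies the score of promoted items and drops excluded ones. */
    static Rescorer boosting(Set<Long> promoted, Set<Long> excluded, double factor) {
        return new Rescorer() {
            @Override
            public double rescore(long id, double originalScore) {
                return promoted.contains(id) ? originalScore * factor : originalScore;
            }

            @Override
            public boolean isFiltered(long id) {
                return excluded.contains(id);
            }
        };
    }

    /** Applies the rescorer to raw scores and returns item IDs, best first. */
    static List<Long> rank(Map<Long, Double> rawScores, Rescorer rescorer) {
        Map<Long, Double> adjusted = new LinkedHashMap<>();
        for (Map.Entry<Long, Double> e : rawScores.entrySet()) {
            if (!rescorer.isFiltered(e.getKey())) {
                adjusted.put(e.getKey(), rescorer.rescore(e.getKey(), e.getValue()));
            }
        }
        List<Long> ids = new ArrayList<>(adjusted.keySet());
        ids.sort(Comparator.comparingDouble((Long id) -> adjusted.get(id)).reversed());
        return ids;
    }
}
```

With Mahout itself, an object implementing IDRescorer would be passed to the recommender's recommend call rather than to a helper like rank.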


Re: Item-based Recommendation Engine Performance for E-Commerce

Posted by Varad Meru <me...@gmail.com>.
Thanks Manuel and Sean,

The performance gain with ReloadFromJDBCDataModel was huge. I profiled
it using VisualVM and also watched the SQL Server process in Task Manager.
It helped a lot.

Can you help me with the other problems I am facing?
1. How does Mahout support product biasing for the e-commerce domain, so
that products can be biased as per the business needs?
I thought of changing the preferences for a product, but that change would
be permanent in the DB and not in memory. Using the SQL query passed to the
data model maker.

2. I am currently using Mahout purely as a Java library, but would like to
move to Hadoop. Can you please give me some pointers on how to deploy a
recommender on Hadoop? At least, how would the recommendation
calculations be done on Hadoop?

Thanks,
Varad



-- 
Thanks,
~Varad


Re: Item-based Recommendation Engine Performance for E-Commerce

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hello Varad,

On 22.01.2012, at 10:47, Sean Owen wrote:

> If you are always reading from the database it is never going to be
> anywhere near fast. You have to put it in memory, by using
> ReloadFromJDBCDataModel instead.

I agree with Sean. Actually, I had a similar problem about 6 months ago. I used JVisualVM to profile Mahout with an MS SQL Server.

You can see the results here:
http://ec2-46-137-156-187.eu-west-1.compute.amazonaws.com/MahoutDatabaseLowPerformance.png

As you can see, the TDS protocol (the protocol used to transfer data between the Java client and the MS SQL Server) takes nearly all the time. It takes so much time that the Mahout functions do not even show up in the profile.

I would recommend that you first profile your application with a Java profiler, e.g. JVisualVM. Then introduce:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/jdbc/ReloadFromJDBCDataModel.html

Afterwards, profile it again.

Create screenshots of both profiles and send them to the mailing list. If you need professional support, feel free to send me a personal note.

/Manuel



-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Item-based Recommendation Engine Performance for E-Commerce

Posted by Sean Owen <sr...@gmail.com>.
If you are always reading from the database it is never going to be
anywhere near fast. You have to put it in memory, by using
ReloadFromJDBCDataModel instead.

Ignore the warning about connection pools if you are using a pool.

Using S3/AWS is not going to help per se.
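
The load-once idea behind that advice can be sketched as follows. This is not Mahout code: in Mahout, ReloadFromJDBCDataModel wraps a JDBC data model and serves reads from memory. Here the Supplier is a hypothetical stand-in for the database query, and the class name and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch, for illustration only (does not depend on Mahout or JDBC).
final class InMemorySnapshotSketch {

    // Stand-in for the database: returns every (userID, itemID) pair.
    private final Supplier<List<long[]>> source;

    // Item ID -> users who interacted with it; replaced wholesale on refresh.
    private volatile Map<Long, Set<Long>> usersByItem = Map.of();

    InMemorySnapshotSketch(Supplier<List<long[]>> source) {
        this.source = source;
        refresh();
    }

    /** Reads all pairs from the source once and swaps in a new in-memory index. */
    void refresh() {
        Map<Long, Set<Long>> fresh = new HashMap<>();
        for (long[] pair : source.get()) {
            fresh.computeIfAbsent(pair[1], k -> new HashSet<>()).add(pair[0]);
        }
        usersByItem = fresh;
    }

    /** Served from memory; never touches the source. */
    Set<Long> usersOf(long itemId) {
        return usersByItem.getOrDefault(itemId, Set.of());
    }
}
```

A background task can call refresh periodically, so similarity computations hit memory only and the database sees one bulk read per refresh instead of a query per lookup.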
