Posted to dev@mahout.apache.org by Steven NASa <cj...@gmail.com> on 2016/05/20 14:07:33 UTC

[Hello] from NASa

Hi Folks & Masters,

My name is *NASa*. I currently work for a B2C e-commerce company in China,
doing transaction-processing development in C++ and Java on Linux.

As you know, a *Recommender System* is quite valuable to an e-commerce
shopping website like Amazon. I have been asked to design and implement a
Recommender System that can bring value to my company. Our system is
written in C++, so I searched for a robust Machine Learning framework in
C++ that would help me implement a Recommender System easily. I did not
find one that satisfies my requirements, only some C++ math libraries.

Our system is built on internal distributed frameworks (RPC, DB access)
written in C++ on Linux. But I find it really inconvenient to implement a
Recommender System in C++ from scratch without a distributed computing
library, for example implementing *Collaborative Filtering* with SVD in a
distributed way. So I have been looking for a framework/library which is
designed for distributed systems. That is how I came to *Mahout*.

I hope to build a library that helps people quickly and easily build a
Recommender System on a distributed system and also use Machine Learning
algorithms in a distributed way. Apache has many amazing projects that help
people build robust distributed systems easily, so I am moving to the
“Java” environment.

I am new to *Mahout*, *Hadoop*, *Spark*, and *Scala*. I took Andrew Ng’s
“Machine Learning” course on Coursera
<https://www.coursera.org/learn/machine-learning/home/welcome>, so I have
a basic knowledge of Machine Learning, and I am now moving on to *Deep
Learning*, *Convex Optimization*, and the implementation of other
mathematical optimization methods. I am still learning and getting familiar
with Mahout. I hope I can contribute some code to Mahout in the near
future, learning by coding and coding by learning.
NASa 2016/05/20

Re: [Hello] from NASa

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Well, IMO big data tensor math is Mahout’s strongest point and GPUs are immediately on the roadmap.



Re: [Hello] from NASa

Posted by Khurrum Nasim <kh...@useitc.com>.
Interesting.




Re: [Hello] from NASa

Posted by Steven NASa <cj...@gmail.com>.
Hi Pat,

Thank you for your reply. I fully understand that core algorithms and data
are two different parts of the system; this is why we have two major ideas:
"Big data" and "Machine Learning".

My Recommender requirements are just like what Amazon does: item-based,
but the number of items and users is very large, which results in a huge
matrix. So I am still learning how to use Mahout to do the matrix
computation on a distributed system. Once I am familiar with Mahout, I
think I can do some work on GPU acceleration for matrix computation and
some other mathematical optimization.
As for data prep, I think we can define an abstraction of conventions for
data prep, data ingestion, and serving components. Users can follow these
conventions to feed data to Mahout.

Steven NASa
2016/05/21
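
To make the item-based idea in this message concrete, here is a minimal
single-machine sketch in Python. It is not Mahout code: the function names
and the toy purchase events are hypothetical, chosen only for illustration.
It counts how often two items are bought by the same user and scores unseen
items by their co-occurrence with what a user already has. The counts are
kept in nested dictionaries so that only non-zero entries of the item-item
matrix are stored.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(interactions):
    """Count, for every pair of items, how many users interacted with both."""
    by_user = defaultdict(set)
    for user, item in interactions:
        by_user[user].add(item)

    # Sparse item-item matrix: counts[a][b] = number of users with both a and b.
    counts = defaultdict(lambda: defaultdict(int))
    for items in by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return by_user, counts


def recommend(user, by_user, counts, top_n=3):
    """Score unseen items by summed co-occurrence with the user's items."""
    seen = by_user.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy data: (user, item) purchase events.
events = [
    ("u1", "book"), ("u1", "lamp"),
    ("u2", "book"), ("u2", "lamp"), ("u2", "desk"),
    ("u3", "book"), ("u3", "desk"),
    ("u4", "lamp"),
]
by_user, counts = cooccurrence(events)
print(recommend("u4", by_user, counts))
```

With a binary user-by-item matrix A, the off-diagonal counts above are the
entries of A'A, which is the product a distributed engine would have to
compute at scale. Raw counts are used here only to keep the sketch short; a
production recommender would normally weight or filter them.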


Re: [Hello] from NASa

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Hi Steven,

We have implemented SVD, ALS, and CCO for recommenders, but these are only core algorithms, not really recommenders as Mahout has done in the past. The reason for this is that there are data prep, data ingestion, and serving components that, in a modern system, must be supplied also. So far Mahout has stayed away from actually including servers, either for input or output.

That said, there is plenty of room for algorithm development in Mahout. I worked on the CCO algorithm, which uses PredictionIO (proposed for the Apache Incubator) to supply the serving components.

Someone with your experience in real-life use of recommenders is certainly welcome.

What type of project did you have in mind?




Re: [Hello] from NASa

Posted by Suneel Marthi <sm...@apache.org>.
Welcome to the project Steven!!


Re: [Hello] from NASa

Posted by Andrew Musselman <an...@gmail.com>.
Steven, thanks for reaching out, and welcome to the project!

If you want to discuss how to build a recommender system, the user list is
probably more appropriate, and we all hang out there too.

If you'd like to contribute to the project, dev's the right list. Let us
know if you have any trouble getting up and running and we can help out.

Best
Andrew


Re: [Hello] from NASa

Posted by Khurrum Nasim <kh...@useitc.com>.
Sounds more like demand prediction to me.

However, your system should be able to interact with other non-C/C++ systems.
There is something called Apache Thrift.

Which brings me to the following: would it be a valuable feature for the Mahout library to provide
connectivity with other systems using Thrift?


Thoughts?

Khurrum

p.s. Andrew Ng can put you to sleep easily. 

