Posted to user@mahout.apache.org by shruti ranade <sh...@gmail.com> on 2013/02/20 08:10:10 UTC

Fwd: Precision used by mahout

Hi,

I am a beginner in Mahout. I am working on a k-means MR implementation and
trying to run it on a GPGPU. I wanted to know whether Mahout's computations
are all double precision or single precision.

Please suggest any documentation I should refer to.

Thanks,
Shruti

Re: Precision used by mahout

Posted by Sean Owen <sr...@gmail.com>.
This is entirely in-core MapReduce. That's valid, but one of the major
points of MapReduce as we know it is distributing the computation over many
machines (i.e., Hadoop). Eventually you outgrow just one computer. That
said, we continue to see bigger and bigger machines become available; I can
rent a machine with 224GB of RAM on EC2 now.

In practical terms, this is "MapReduce" but implemented on a completely
different framework, so it would have nothing to do with Mahout. You might
be able to reimplement it, though. If you were going to implement it again
anyway, MapReduce is probably not the best choice. It is worth its price in
complexity when it lets you leverage Hadoop and the like to deal with
machine failure. You don't have that problem here. You also don't
necessarily have to structure the computation so that workers have no means
of communicating, because it's in-core. M/R implementations have to
compromise to meet those constraints, and if they're not actually
constraints (inside a GPU), the result is just more complex and
sub-optimal.
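
As a minimal illustration of the in-core route (this is not Mahout code, and
every name in it is made up), here is the k-means assignment step
parallelized inside one process with plain Java parallel streams. Every
worker reads the same shared centroid array instead of exchanging data
through a MapReduce shuffle, which is exactly the freedom described above.

import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: in-core parallel k-means assignment.
// All workers share the centroids array in memory, an assumption a
// distributed MapReduce implementation cannot make.
public class InCoreAssignmentSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Assign each point to its nearest centroid, in parallel.
    static int[] assign(double[][] points, double[][] centroids) {
        int[] assignment = new int[points.length];
        IntStream.range(0, points.length).parallel().forEach(p -> {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = squaredDistance(points[p], centroids[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            assignment[p] = best;
        });
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {9, 9}, {10, 10}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(assign(points, centroids)));  // [0, 0, 1, 1]
    }
}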



Re: Precision used by mahout

Posted by shruti ranade <sh...@gmail.com>.
That was of great help. Thanks for the input. There is something called MARS
for accelerating MapReduce using GPUs. I am not quite sure about it yet, but
here's the link <http://www.cse.ust.hk/gpuqp/Mars_tr.pdf>. The paper might
give you a better idea of what we are trying to achieve. We are also
thinking of using JCUDA to parallelize things on an Nvidia Tesla. We are
still at ground zero, but if we can accelerate the performance of Mahout's
k-means on a GPU, I guess it will be a huge thing.

P.S.: Perhaps the paper would turn the "artificial marriage" into something
better.
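
As a hedged illustration of the JCUDA route, here is a minimal sketch that
assumes the JCublas binding mirrors the legacy cuBLAS entry points
(cublasInit, cublasAlloc, cublasSetVector, cublasDdot); it pushes one
double-precision dot product from Java onto the GPU.

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas;

// Sketch only: assumes JCublas exposes the legacy cuBLAS API as shown.
public class JCublasDotSketch {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {4, 3, 2, 1};
        int n = x.length;

        JCublas.cublasInit();                       // initialize cuBLAS
        Pointer dX = new Pointer();
        Pointer dY = new Pointer();
        JCublas.cublasAlloc(n, Sizeof.DOUBLE, dX);  // allocate device memory
        JCublas.cublasAlloc(n, Sizeof.DOUBLE, dY);
        JCublas.cublasSetVector(n, Sizeof.DOUBLE, Pointer.to(x), 1, dX, 1);  // host -> device
        JCublas.cublasSetVector(n, Sizeof.DOUBLE, Pointer.to(y), 1, dY, 1);

        double dot = JCublas.cublasDdot(n, dX, 1, dY, 1);  // computed on the GPU
        System.out.println("dot = " + dot);         // expect 20.0

        JCublas.cublasFree(dX);
        JCublas.cublasFree(dY);
        JCublas.cublasShutdown();
    }
}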


Re: Precision used by mahout

Posted by Sean Owen <sr...@gmail.com>.
I think this is quite possible too. I just think there's little point in
matching this up with Hadoop; they represent entirely different
architectures for large-scale computation. I mean, you can probably write
an M/R job that uses GPUs on workers, but I imagine it would be an
artificial marriage of technologies, with Hadoop probably being used simply
to distribute data.

If you want to use a GPU, and want to use it properly, most of your work is
to create an effective in-core parallel implementation, not one distributed
across computers and distributed file systems. You use JNI or CUDA bindings
to push computations from Java into the hardware.

This is an exercise in a) modifying a matrix/vector library to use native
hardware, then b) writing algorithms that use that library. I think your
best starting point in Java may be something more general like Commons Math.
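
A minimal sketch of that two-part split, with every name below invented for
illustration: the algorithm is written against a small vector-math
interface, one implementation is pure Java, and the other declares a native
method that a JNI/CUDA library (the "gpumath" library name is hypothetical)
would provide.

// (a) a swappable vector-math backend, (b) an algorithm that only uses it.
interface VectorMath {
    double dot(double[] x, double[] y);
}

// Pure-Java reference backend.
class JavaVectorMath implements VectorMath {
    public double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}

// Native-hardware backend: the implementation would live in a JNI library
// (the "gpumath" library is made up for this sketch).
class NativeVectorMath implements VectorMath {
    static { System.loadLibrary("gpumath"); }
    public native double dot(double[] x, double[] y);
}

public class BackendSketch {
    // The "algorithm" sees only the VectorMath interface, so a GPU-backed
    // implementation can be dropped in without touching this code.
    static double cosineSimilarity(VectorMath math, double[] x, double[] y) {
        return math.dot(x, y) / Math.sqrt(math.dot(x, x) * math.dot(y, y));
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 1};
        double[] b = {1, 1, 0};
        System.out.println(cosineSimilarity(new JavaVectorMath(), a, b));  // 0.5
    }
}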




Re: Precision used by mahout

Posted by 万代豊 <20...@gmail.com>.
This is a topic I'm interested in too.
I believe item-based recommendation in Mahout (and not only in Mahout)
must spend some time multiplying the cooccurrence matrix by the user
preference vector. If we could offload this multiplication to a GPGPU,
that would be a great acceleration.
What I'm not really clear on is how a double-precision multiplication
running inside the Java Virtual Machine can take advantage of the hardware
accelerator. (I mean, how can you make the GPGPU visible to Mahout through
the JVM?)

If we could get past this, in addition to what Ted Dunning presented the
other day on involving Solr in building/loading the cooccurrence matrix for
Mahout recommendation, it would be a big leap forward for Mahout
recommendation.

Am I missing something, or just dreaming?
Regards,
Y.Mandai
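
To make that multiplication concrete, here is a tiny plain-Java sketch (not
Mahout code; the numbers are made up) of the item-item cooccurrence matrix
times one user's preference vector, which is the hot loop that would be
offloaded:

import java.util.Arrays;

// Illustrative only: cooccurrence-matrix times preference-vector multiply.
public class CooccurrenceTimesPreferences {
    public static void main(String[] args) {
        // cooccurrence[i][j] = how often items i and j appear together
        double[][] cooccurrence = {
            {2, 1, 0},
            {1, 3, 1},
            {0, 1, 2}
        };
        // One user's preferences over the same three items.
        double[] preferences = {5.0, 0.0, 3.0};

        // scores = cooccurrence * preferences, in double precision
        double[] scores = new double[cooccurrence.length];
        for (int i = 0; i < cooccurrence.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < preferences.length; j++) {
                sum += cooccurrence[i][j] * preferences[j];
            }
            scores[i] = sum;
        }
        System.out.println(Arrays.toString(scores));  // [10.0, 8.0, 6.0]
    }
}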

Re: Precision used by mahout

Posted by shruti ranade <sh...@gmail.com>.
I have found an IEEE paper that brings MapReduce and GPUs together. I am
referring to it for my implementation.
Thanks anyway.

Regards,
Shruti



Re: Precision used by mahout

Posted by Sean Owen <sr...@gmail.com>.
I think all of the code uses double-precision floats. I imagine much of it
could work as well with single-precision floats.

MapReduce and a GPU are very different things though, and I'm not sure how
you would use both together effectively.
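
To make the precision question concrete, here is a small illustrative
sketch (not Mahout code) of accumulating the same values in double and in
single precision; it is the kind of check worth doing before moving the
computations to single-precision floats on a GPU.

// Illustrative only: the same accumulation in double vs. single precision.
public class PrecisionSketch {
    public static void main(String[] args) {
        int n = 10_000_000;
        double doubleSum = 0.0;
        float floatSum = 0.0f;
        for (int i = 0; i < n; i++) {
            doubleSum += 0.1;
            floatSum += 0.1f;
        }
        System.out.println("double sum: " + doubleSum);  // stays very close to 1.0e6
        System.out.println("float  sum: " + floatSum);   // drifts noticeably away from 1.0e6
    }
}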

