Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2015/06/11 21:07:10 UTC

Fwd: [mahout] Mahout 0.10.x ora 0515 MAHOUT-1660 MAHOUT-1713 MAHOUT-1714 MAHOUT-1715 MAHOUT-1716 MAHOUT-1717 MAHOUT-1718 MAHOUT-1719 MAHOUT-1720 MAHOUT-1721 MAHOUT-1722 MAHOUT-1723 MAHOUT-1724 MAHOUT-1725 MAHOUT-1726 MAHOUT-1727 MAHOUT-1728 MAHOUT-1729 MAHOUT-1730 MAHOUT-1731 MAHOUT-1732 (#135)

---------- Forwarded message ----------
From: Dmitriy Lyubimov <dl...@gmail.com>
Date: Thu, Jun 11, 2015 at 12:05 PM
Subject: Re: [mahout] Mahout 0.10.x ora 0515 MAHOUT-1660 MAHOUT-1713
MAHOUT-1714 MAHOUT-1715 MAHOUT-1716 MAHOUT-1717 MAHOUT-1718 MAHOUT-1719
MAHOUT-1720 MAHOUT-1721 MAHOUT-1722 MAHOUT-1723 MAHOUT-1724 MAHOUT-1725
MAHOUT-1726 MAHOUT-1727 MAHOUT-1728 MAHOUT-1729 MAHOUT-1730 MAHOUT-1731
MAHOUT-1732 (#135)
To: apache/mahout <
reply+0007fbffaee3ec1297f829d4cef35d71fe241e110a902c0192cf0000000111919b2592a170ce01ec2bfb@reply.github.com
>


Yes. It lazily puts the input into the cache, with MEMORY_ONLY, if it is not
already cached, to prevent partition recomputation during multiple passes
over the input. If the input is already in the cache (put there before the
call), then it has no additional effect.
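
To illustrate the "cache only if not already cached" behavior described
above, here is a minimal toy sketch in plain Scala. It is not the Mahout
checkpoint() implementation and does not use Spark; LazyDataset,
ensureCached and twoPassSum are hypothetical names made up for this example.
It only shows why a multi-pass function benefits from taking the initiative,
and why doing so is a no-op when the caller has already cached the input.

```scala
// Toy stand-in for a lazily evaluated dataset (hypothetical; not Spark or Mahout API).
final class LazyDataset[A](compute: () => Vector[A]) {
  private var cached: Option[Vector[A]] = None
  private var cacheRequested = false

  // Counts how many times the underlying data had to be recomputed.
  var computeCount = 0

  def isCacheRequested: Boolean = cacheRequested

  // Only marks the dataset for caching; nothing is computed until first use.
  def cache(): this.type = { cacheRequested = true; this }

  def collect(): Vector[A] = cached match {
    case Some(data) => data // served from cache, no recomputation
    case None =>
      computeCount += 1
      val data = compute()
      if (cacheRequested) cached = Some(data)
      data
  }
}

object CheckpointSketch {
  // Take the initiative only if the caller has not already asked for caching.
  def ensureCached[A](ds: LazyDataset[A]): LazyDataset[A] =
    if (ds.isCacheRequested) ds else ds.cache()

  // A function that needs two passes over its input.
  def twoPassSum(ds: LazyDataset[Int]): Int = {
    val in = ensureCached(ds)
    val total = in.collect().sum  // pass 1: computes and fills the cache
    val count = in.collect().size // pass 2: served from the cache
    total + count
  }
}
```

With this sketch, calling CheckpointSketch.twoPassSum on an uncached
LazyDataset triggers a single computation instead of two, and calling it on
a dataset the caller cached beforehand changes nothing.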

I was thinking about the situation where functions need to go over their
inputs multiple times, and decided that they do need to take the initiative
if it has not been taken yet, since the user has no idea when an input is
going to be needed more than once. Otherwise it may lead to performance
degradation that would be hard to track down.

On the other hand, it's my understanding that in Spark 1.2 unpersist is now
reference queue-aware, i.e. it will know to garbage-collect an RDD from the
cache when JVM garbage collection says there are no more references to the
RDD (in our case, checkpointed matrix references). As to how well it works
in practice, I did not investigate, but it has not been causing a problem
for me so far in my otherwise stressful tests.

On Thu, Jun 11, 2015 at 11:53 AM, Andrew Musselman <notifications@github.com
> wrote:

> In
> math-scala/src/main/scala/org/apache/mahout/math/decompositions/DSSVD.scala
> <https://github.com/apache/mahout/pull/135#discussion_r32254971>:
>
> > @@ -43,18 +46,22 @@ object DSSVD {
> >        case (keys, blockA) =>
> >          val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
> >          keys -> blockY
> > -    }
> > +    }.checkpoint()
>
> This puts results into a cache?
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/mahout/pull/135/files#r32254971>.
>