Posted to dev@mahout.apache.org by Dan Brickley <da...@danbri.org> on 2012/06/21 21:54:58 UTC

Spectral Kmeans wiki category data test - can you confirm if you ran it to completion?

Robin,

Do you remember if this test ran successfully to completion? If not,
I'll submit a JIRA when I've a complete log of a failed run...

Dan

---------- Forwarded message ----------
From: Grant Ingersoll <gs...@apache.org>
Date: 21 June 2012 21:33
Subject: Re: Spectral Kmeans wiki category data test - can you confirm
if you ran it to completion?
To: Dan Brickley <da...@danbri.org>
Cc: Shannon Quinn <sq...@gatech.edu>


I'd ask on dev@, as Robin was actually the one who ran it.

On Jun 21, 2012, at 3:15 PM, Dan Brickley wrote:

Hi

With the patch https://issues.apache.org/jira/browse/MAHOUT-986 in
0.7, this doesn't die so quickly ... but I'm still not seeing it run
to completion.

Using the template command line you suggested ('bin/mahout
spectralkmeans -k 20 -d 4192499 -x 7 -i path/to/csv/file/ -o
your/output/path/'), I've seen it fail with both -k 20 and -k 10.

Unfortunately I was running this in a screen session without proper
logging, and I want to double-check everything before reporting. I'm
re-running with -k 10 now and will file a bug if it fails, but
meanwhile I wanted to check whether you'd had a successful run. I'm
testing with the 0.7 distro.

The failure was an IndexException. Here's the -k 20 version:

mahout  spectralkmeans -k 20 -d 4192499 -x 7 -i spectral/input/  -o
spectral/output/

12/06/19 19:33:11 INFO lanczos.LanczosSolver: 20 passes through the
corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException:
Index 20 is outside allowable range of [0,20)
       at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)
       at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:132)
       at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:73)
       at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:148)
       at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
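
A note on the message format: [0,20) is a half-open range, so the valid
indices are 0 through 19 and a write at index 20 is one step past the
end. The sketch below reproduces that kind of failure with a
hypothetical TinyMatrix class. It is not Mahout's AbstractMatrix or
LanczosSolver code, and it makes no claim about the actual cause of the
failure reported here; the class and method names are assumptions made
for illustration only.

```java
public class IndexRangeSketch {

    /** Hypothetical stand-in for a square matrix with bounds-checked writes. */
    static class TinyMatrix {
        private final double[][] values;

        TinyMatrix(int size) {
            values = new double[size][size];
        }

        void set(int row, int col, double value) {
            int size = values.length;
            // Valid indices form the half-open range [0, size).
            if (row < 0 || row >= size) {
                throw new IndexOutOfBoundsException(
                    "Index " + row + " is outside allowable range of [0," + size + ")");
            }
            if (col < 0 || col >= size) {
                throw new IndexOutOfBoundsException(
                    "Index " + col + " is outside allowable range of [0," + size + ")");
            }
            values[row][col] = value;
        }
    }

    /**
     * Writes one diagonal entry per step into a size x size matrix.
     * Returns null if every write succeeded, otherwise the error message.
     */
    public static String fillDiagonal(int size, int steps) {
        TinyMatrix m = new TinyMatrix(size);
        try {
            for (int i = 0; i < steps; i++) {
                m.set(i, i, 1.0);
            }
            return null;
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 20 writes into a 20 x 20 matrix stay in range.
        System.out.println(fillDiagonal(20, 20));
        // A 21st write targets index 20, which is outside [0,20).
        System.out.println(fillDiagonal(20, 21));
    }
}
```

In this sketch the failure appears only when the number of steps exceeds
the matrix size by one, which is the shape of the error in both the
-k 20 and -k 10 runs.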

It's failing here, in the call to solver.runJob:

   // Next step: perform eigen-decomposition using LanczosSolver
   // since some of the eigen-output is spurious and will be eliminated
   // upon verification, we have to aim to overshoot and then discard
   // unnecessary vectors later
   int overshoot = (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
   DistributedLanczosSolver solver = new DistributedLanczosSolver();
   LanczosState state = new LanczosState(L, overshoot,
       solver.getInitialVector(L));
   Path lanczosSeqFiles = new Path(outputCalc,
       "eigenvectors-" + (System.nanoTime() & 0xFF));
   solver.runJob(conf,
                 state,
                 overshoot,
                 true,
                 lanczosSeqFiles.toString());

With -k 10 I got:

12/06/20 20:51:15 INFO lanczos.LanczosSolver: 10 passes through the
corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException:
Index 10 is outside allowable range of [0,10)
       at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)

...although the logs also showed "12/06/20 20:40:18 INFO
lanczos.LanczosSolver: Finding 20 singular vectors of matrix with
4192499 rows, via Lanczos" which confused me until Shannon reminded me
of the overshoot.
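
To make the overshoot arithmetic concrete, here is a minimal sketch of
the computation from the driver snippet above. The value 2.0 for
OVERSHOOT_MULTIPLIER is an assumption inferred from the log line (a
-k 10 run asking for 20 singular vectors); the real constant is not
quoted in this thread. OvershootSketch is a hypothetical class name.

```java
public class OvershootSketch {

    // Assumption: inferred from "-k 10" producing "Finding 20 singular vectors".
    // The actual constant in SpectralKMeansDriver is not shown in this thread.
    static final double OVERSHOOT_MULTIPLIER = 2.0;

    /** Same expression as the driver snippet: scale, then truncate to int. */
    public static int overshoot(int clusters) {
        return (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
    }

    public static void main(String[] args) {
        for (int k : new int[] {10, 20}) {
            System.out.println("-k " + k + " -> requested rank " + overshoot(k));
        }
    }
}
```

Under that assumption the -k 20 run would request a rank of 40, yet the
exception reports a range of [0,20), and likewise [0,10) for -k 10. The
reported range matches k rather than the overshoot value, which may be
worth noting in the JIRA report.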

I'm happy to +cc the mailing lists but for starters thought I'd check
to see if the test run had succeeded for you; if so, maybe I've some
local problem.

Dan


--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com

Re: Spectral Kmeans wiki category data test - can you confirm if you ran it to completion?

Posted by Robin Anil <ro...@gmail.com>.
I don't recall.
------
Robin Anil

