Posted to dev@mahout.apache.org by Dan Brickley <da...@danbri.org> on 2012/06/21 21:54:58 UTC
Spectral Kmeans wiki category data test - can you confirm if you ran
it to completion?
Robin,
Do you remember if this test ran successfully to completion? If not,
I'll submit a JIRA when I've a complete log of a failed run...
Dan
---------- Forwarded message ----------
From: Grant Ingersoll <gs...@apache.org>
Date: 21 June 2012 21:33
Subject: Re: Spectral Kmeans wiki category data test - can you confirm
if you ran it to completion?
To: Dan Brickley <da...@danbri.org>
Cc: Shannon Quinn <sq...@gatech.edu>
I'd ask on dev@, as Robin was actually the one who ran it.
On Jun 21, 2012, at 3:15 PM, Dan Brickley wrote:
Hi
With the patch https://issues.apache.org/jira/browse/MAHOUT-986 in
0.7, this doesn't die so quickly ... but I'm still not seeing it run
to completion.
Using the template command line you suggested, 'bin/mahout
spectralkmeans -k 20 -d 4192499 -x 7 -i path/to/csv/file/ -o
your/output/path/',
I've seen it fail with -k 20 and with -k 10.
Unfortunately I was running this in a screen session without proper
logging and I want to double-check everything before reporting so I'm
re-running with -k 10 now and will file a bug if it fails, ... but
meanwhile I wanted to check in with you to see if you'd had a
successful run. I'm testing with the 0.7 distro.
The failure was an IndexException; here's the -k 20 version:
mahout spectralkmeans -k 20 -d 4192499 -x 7 -i spectral/input/ -o spectral/output/
12/06/19 19:33:11 INFO lanczos.LanczosSolver: 20 passes through the corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException: Index 20 is outside allowable range of [0,20)
    at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:132)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:73)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:148)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
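For readers unfamiliar with the notation, "[0,20)" is a half-open range: indices 0 through 19 are valid, and 20 is one past the end. The sketch below illustrates that kind of bounds check. It is not Mahout's implementation; the class and method names are hypothetical, and only the message format is copied from the trace above.

```java
// Hypothetical illustration of a half-open [0, size) bounds check.
// Not Mahout code; only the message format mirrors the trace above.
final class RangeCheckSketch {

    // True when index lies in the half-open range [0, size).
    static boolean isValid(int index, int size) {
        return index >= 0 && index < size;
    }

    // Throws when index falls outside [0, size), using the same
    // wording as the exception message quoted in this thread.
    static void checkIndex(int index, int size) {
        if (!isValid(index, size)) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " is outside allowable range of [0," + size + ")");
        }
    }

    public static void main(String[] args) {
        checkIndex(19, 20); // last valid index, no exception
        try {
            checkIndex(20, 20); // one past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same reading applies to the -k 10 run, where index 10 is one past the last valid index of [0,10).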
It's barfing out here,
    // Next step: perform eigen-decomposition using LanczosSolver
    // since some of the eigen-output is spurious and will be eliminated
    // upon verification, we have to aim to overshoot and then discard
    // unnecessary vectors later
    int overshoot = (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
    DistributedLanczosSolver solver = new DistributedLanczosSolver();
    LanczosState state = new LanczosState(L, overshoot, solver.getInitialVector(L));
    Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" + (System.nanoTime() & 0xFF));
    solver.runJob(conf, state, overshoot, true, lanczosSeqFiles.toString());
With -k 10 I got "12/06/20 20:51:15 INFO lanczos.LanczosSolver: 10
passes through the corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException:
Index 10 is outside allowable range of [0,10)
at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)".
...although the logs also showed "12/06/20 20:40:18 INFO
lanczos.LanczosSolver: Finding 20 singular vectors of matrix with
4192499 rows, via Lanczos" which confused me until Shannon reminded me
of the overshoot.
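The overshoot arithmetic can be checked in isolation. The sketch below reuses the expression from the driver snippet quoted above. The multiplier value of 2.0 is an assumption, inferred only from the log line in which -k 10 led to a request for 20 singular vectors; the class name is hypothetical.

```java
// Standalone sketch of the overshoot calculation quoted in this thread.
final class OvershootSketch {

    // ASSUMPTION: 2.0 is inferred from the log (-k 10 -> "Finding 20
    // singular vectors"); it is not read from the Mahout source here.
    static final double OVERSHOOT_MULTIPLIER = 2.0;

    // Same expression as in the quoted driver code: scale the requested
    // cluster count, then truncate back to an int.
    static int overshoot(int clusters) {
        return (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
    }

    public static void main(String[] args) {
        System.out.println(overshoot(10)); // prints 20
        System.out.println(overshoot(20)); // prints 40
    }
}
```

Under that assumption, the -k 20 run would ask the solver for 40 vectors.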
I'm happy to +cc the mailing lists but for starters thought I'd check
to see if the test run had succeeded for you; if so, maybe I've some
local problem.
Dan
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Re: Spectral Kmeans wiki category data test - can you confirm if you
ran it to completion?
Posted by Robin Anil <ro...@gmail.com>.
I don't recall.
------
Robin Anil