Posted to dev@mahout.apache.org by "Shannon Quinn (JIRA)" <ji...@apache.org> on 2011/05/24 17:06:47 UTC

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

    [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ] 

Shannon Quinn commented on MAHOUT-524:
--------------------------------------

+1, I'm on it.

I'm a little unclear as to the context of the initial Hudson comment: the display method is expecting 2D vectors, but getting 5D ones?

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>              Labels: clustering, k-means, visualization
>             Fix For: 0.6
>
>         Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points. Additional I/O work is needed before this will play with the rest of the clustering algorithms. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Shannon Quinn <sq...@gatech.edu>.
More or less follow the data through the pipeline?

On Tue, May 24, 2011 at 3:08 PM, Ted Dunning <te...@gmail.com> wrote:

> Yes.  That can be done, but you probably can just remember the references.
>
> On Tue, May 24, 2011 at 12:06 PM, Shannon Quinn <sq...@gatech.edu> wrote:
>
> > That's an excellent analogy! Employing that strategy, would it be
> possible
> > (and not too expensive) to do the QAQ^-1 operation to get the original
> data
> > matrix, after we've clustered the points in eigenspace?
> >
>

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Ted Dunning <te...@gmail.com>.
Yes.  That can be done, but you probably can just remember the references.

On Tue, May 24, 2011 at 12:06 PM, Shannon Quinn <sq...@gatech.edu> wrote:

> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
>

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Ted Dunning <te...@gmail.com>.
Nahh... the names are the key.

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman <je...@narus.com> wrote:

> If the D vectors are NamedVectors, with their index as the name, then this
> will flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Shannon Quinn <sq...@gatech.edu>.
> Let me see if I'm following you. In the display example, there are 1100, 2d
> vectors generated as raw data D which is 1100x2. Then, the preprocessing
> step uses a distance measure to produce A, which is 1100x1100. They are not
> really affinities, more like distances, so I may have missed the boat on
> that step. Since the distance measure is reducing the [2] dimensionality of
> the Di and Dj vectors with a scalar (aij), I don't see how to reconstruct D
> from A.
>
>
You don't necessarily need to be able to reconstruct D from A, so I suppose
this is where the Fourier transform analogy breaks down. A is indexed by row
and column according to the original data, so as long as you know the order
in which the rows and columns of A were derived from D, you can transiently
identify the points in D by index.


> KMeans will cluster all the input vectors in an arbitrary order if run on an
> N>1 node cluster, and so Di and Dj will lose their index positions in the result. If
> the D vectors are NamedVectors, with their index as the name, then this will
> flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.
>

Like Ted mentioned, NamedVector may be the key here to identifying the
original points from the clustered projected data. That's probably the right
way to go.
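
As a sketch of what that name-based back-mapping could look like (plain Java;
the in-memory lists stand in for reading the clustered-points output, and the
class and method names here are hypothetical, not existing Mahout API):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.mahout.math.NamedVector;

public class BackMapByName {
  // Given the clustered projected points (as NamedVectors) and their cluster
  // ids, recover original-row-index -> cluster id via the preserved names,
  // regardless of the order in which the clustered points come back.
  public static Map<Integer, Integer> mapClusters(List<NamedVector> clusteredProjected,
                                                  List<Integer> clusterIds) {
    Map<Integer, Integer> originalToCluster = new HashMap<Integer, Integer>();
    for (int i = 0; i < clusteredProjected.size(); i++) {
      String name = clusteredProjected.get(i).getName();
      originalToCluster.put(Integer.valueOf(name), clusterIds.get(i));
    }
    return originalToCluster;
  }
}

The display example could then look up each original 2-d point by its index
and draw it with the corresponding cluster's color.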


>
> -----Original Message-----
> From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 2:10 PM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> You're right, that would give you the affinity matrix. However, the
> affinity
> matrix is an easier beast to tame since the matrix is constructed with all
> the points' orders preserved: aff[i][j] is the relationship between
> original_point[i] and original_point[j], so for all practical purposes I
> treat this as the "original data" (since it's easy to go back and forth
> between the two).
>
> Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
> ordering of indices. Does the nth point with label y from the result of
> K-means correspond to the nth row of the column matrix of eigenvectors? If
> so, then does that nth row from the eigenvector matrix also correspond to
> the nth original data point (the one represented by proxy by row n and
> column n of the affinity matrix)? If both these conditions are true, then
> and only then can we say that original_point[n]'s cluster is y.
>
> On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > Would that give you the original data matrix, the clustered data matrix,
> or
> > the clustered affinity matrix? Even with the analogy in mind I'm having
> > trouble connecting the dots. Seems like I lost the original data matrix
> in
> > step 1 when I used a distance measure to produce A from it. If the
> returned
> > eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> > importantly, if the Q eigenvectors define the clusters in eigenspace,
> what
> > is the inverse transformation?
> >
> > -----Original Message-----
> > From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> > Of Shannon Quinn
> > Sent: Tuesday, May 24, 2011 12:07 PM
> > To: dev@mahout.apache.org
> > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> example
> > fails
> >
> > That's an excellent analogy! Employing that strategy, would it be
> possible
> > (and not too expensive) to do the QAQ^-1 operation to get the original
> data
> > matrix, after we've clustered the points in eigenspace?
> >
> > On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <je...@narus.com>
> wrote:
> >
> > > For the display example, it is not necessary to cluster the original
> > > points. The other clustering display examples only train the clusters
> and
> > do
> > > not classify the points. They are drawn first and the cluster centers &
> > > radii are superimposed afterwards. Thus I think it is only necessary to
> > > back-transform the clusters.
> > >
> > > My EE gut tells me this is like Fourier transforms between time- and
> > > frequency-domains. If this is true then what we need is the inverse
> > > transform. Is this a correct analogy?
> > >
> > > -----Original Message-----
> > > From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On
> Behalf
> > > Of Shannon Quinn
> > > Sent: Tuesday, May 24, 2011 11:39 AM
> > > To: dev@mahout.apache.org
> > > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> > example
> > > fails
> > >
> > > This is actually something I could use a little expert Hadoop
> assistance
> > > on.
> > > The general idea is that the points that are clustered in eigenspace
> have
> > a
> > > 1-to-1 correspondence with the original points (which is how you get
> your
> > > cluster assignments), but this back-mapping after clustering isn't
> > > explicitly implemented yet, since that's the core of the IO issue.
> > >
> > > My block on this is my lack of understanding in how the actual ordering
> > of
> > > the points change (or not?) from when they are projected into
> eigenspace
> > > (the Lanczos solver) and when K-means makes its cluster assignments. On
> a
> > > one-node setup the original ordering appears to be preserved through
> all
> > > the
> > > operations, so the labels of the original points can be assigned by
> > giving
> > > original_point[i] the label of projected_point[i], hence the cluster
> > > assignments are easy to determine. For multi-node setups, however, I
> > simply
> > > don't know if this heuristic holds.
> > >
> > > But I believe the immediate issue here is that we're feeding the
> > projected
> > > points to the display, when it should be the original points
> *annotated*
> > > with the cluster assignments from the corresponding projected points.
> The
> > > question is how to shift those assignments over robustly; right now
> it's
> > > just a hack job in the SpectralKMeansDriver...or maybe (hopefully!)
> it's
> > > just the version I have locally :o)
> > >
> > > On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com>
> > wrote:
> > >
> > > > Yes, I expect it is pilot error on my part. The original
> implementation
> > > was
> > > > failing in this manner because I was requesting 5 eigenvectors
> > > (clusters). I
> > > > changed it to 2 and now it displays something but it is not even
> close
> > to
> > > > correct. I think this is because I have not transformed back from
> eigen
> > > > space to vector space. This all relates to the IO issue for the
> > spectral
> > > > clustering code which I don't grok.
> > > >
> > > > The display driver begins with the sample points and generates the
> > > affinity
> > > > matrix using a distance measure. Not clear this is even a correct
> > > > interpretation of that matrix. Then spectral kmeans runs and produces
> 2
> > > > clusters which I display directly. Seems like this number should be
> > more
> > > > like the k in kmeans, and 5 was more realistic given the data. I
> > believe
> > > > there is a missing output transformation to recover the clusters from
> > the
> > > > eigenvectors but I don't know how to do that.
> > > >
> > > > I bet you do :)
> > > >

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Ted Dunning <te...@gmail.com>.
Well, it isn't entirely simple, but for suitable distances, D can often be
reverse-engineered up to isometries like rotation and inversion that don't
change distances.
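
One standard construction along those lines is classical multidimensional
scaling; as a sketch, under the assumption that the entries of A are squared
Euclidean distances between the rows of D:

    J = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^{\mathsf{T}}, \qquad
    B = -\tfrac{1}{2} J A J = V \Lambda V^{\mathsf{T}}, \qquad
    \hat{D} = V_k \Lambda_k^{1/2},

which recovers the points up to exactly such isometries (rotation, reflection,
translation).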

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman <je...@narus.com> wrote:

> Since the distance measure is reducing the [2] dimensionality of the Di and
> Dj vectors with a scalar (aij), I don't see how to reconstruct D from A.

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Jeff Eastman <je...@Narus.com>.
Let me see if I'm following you. In the display example, there are 1100 2-d vectors generated as raw data D, which is 1100x2. Then, the preprocessing step uses a distance measure to produce A, which is 1100x1100. They are not really affinities, more like distances, so I may have missed the boat on that step. Since the distance measure reduces each pair of [2]-dimensional vectors Di and Dj to a scalar (aij), I don't see how to reconstruct D from A.

KMeans will cluster all the input vectors in an arbitrary order if run on an N>1 node cluster, and so Di and Dj will lose their index positions in the result. If the D vectors are NamedVectors, with their index as the name, then this will flow through to the clustered points at the output. The order of those points won't bear much relation to the order of the input, but the names will be preserved. KMeans does not mess with the order of the elements within each D vector. I don't know if this is sufficient or if Lanczos does anything similar.
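
A minimal sketch of that wrapping step, assuming Mahout's
org.apache.mahout.math DenseVector and NamedVector classes (the surrounding
class and method are made up for illustration):

import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.Vector;

public class NameRawPoints {
  // Wrap each raw 2-d point as a NamedVector whose name is its row index in D,
  // so the identity survives any reordering that happens downstream.
  public static List<Vector> nameByIndex(double[][] rawPoints) {
    List<Vector> named = new ArrayList<Vector>();
    for (int i = 0; i < rawPoints.length; i++) {
      named.add(new NamedVector(new DenseVector(rawPoints[i]), String.valueOf(i)));
    }
    return named;
  }
}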

-----Original Message-----
From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 2:10 PM
To: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

You're right, that would give you the affinity matrix. However, the affinity
matrix is an easier beast to tame since the matrix is constructed with all
the points' orders preserved: aff[i][j] is the relationship between
original_point[i] and original_point[j], so for all practical purposes I
treat this as the "original data" (since it's easy to go back and forth
between the two).

Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
ordering of indices. Does the nth point with label y from the result of
K-means correspond to the nth row of the column matrix of eigenvectors? If
so, then does that nth row from the eigenvector matrix also correspond to
the nth original data point (the one represented by proxy by row n and
column n of the affinity matrix)? If both these conditions are true, then
and only then can we say that original_point[n]'s cluster is y.

On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman <je...@narus.com> wrote:

> Would that give you the original data matrix, the clustered data matrix, or
> the clustered affinity matrix? Even with the analogy in mind I'm having
> trouble connecting the dots. Seems like I lost the original data matrix in
> step 1 when I used a distance measure to produce A from it. If the returned
> eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> importantly, if the Q eigenvectors define the clusters in eigenspace, what
> is the inverse transformation?
>
> -----Original Message-----
> From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 12:07 PM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
>
> On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > For the display example, it is not necessary to cluster the original
> > points. The other clustering display examples only train the clusters and
> do
> > not classify the points. They are drawn first and the cluster centers &
> > radii are superimposed afterwards. Thus I think it is only necessary to
> > back-transform the clusters.
> >
> > My EE gut tells me this is like Fourier transforms between time- and
> > frequency-domains. If this is true then what we need is the inverse
> > transform. Is this a correct analogy?
> >
> > -----Original Message-----
> > From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> > Of Shannon Quinn
> > Sent: Tuesday, May 24, 2011 11:39 AM
> > To: dev@mahout.apache.org
> > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> example
> > fails
> >
> > This is actually something I could use a little expert Hadoop assistance
> > on.
> > The general idea is that the points that are clustered in eigenspace have
> a
> > 1-to-1 correspondence with the original points (which is how you get your
> > cluster assignments), but this back-mapping after clustering isn't
> > explicitly implemented yet, since that's the core of the IO issue.
> >
> > My block on this is my lack of understanding in how the actual ordering
> of
> > the points change (or not?) from when they are projected into eigenspace
> > (the Lanczos solver) and when K-means makes its cluster assignments. On a
> > one-node setup the original ordering appears to be preserved through all
> > the
> > operations, so the labels of the original points can be assigned by
> giving
> > original_point[i] the label of projected_point[i], hence the cluster
> > assignments are easy to determine. For multi-node setups, however, I
> simply
> > don't know if this heuristic holds.
> >
> > But I believe the immediate issue here is that we're feeding the
> projected
> > points to the display, when it should be the original points *annotated*
> > with the cluster assignments from the corresponding projected points. The
> > question is how to shift those assignments over robustly; right now it's
> > just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> > just the version I have locally :o)
> >
> > On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com>
> wrote:
> >
> > > Yes, I expect it is pilot error on my part. The original implementation
> > was
> > > failing in this manner because I was requesting 5 eigenvectors
> > (clusters). I
> > > changed it to 2 and now it displays something but it is not even close
> to
> > > correct. I think this is because I have not transformed back from eigen
> > > space to vector space. This all relates to the IO issue for the
> spectral
> > > clustering code which I don't grok.
> > >
> > > The display driver begins with the sample points and generates the
> > affinity
> > > matrix using a distance measure. Not clear this is even a correct
> > > interpretation of that matrix. Then spectral kmeans runs and produces 2
> > > clusters which I display directly. Seems like this number should be
> more
> > > like the k in kmeans, and 5 was more realistic given the data. I
> believe
> > > there is a missing output transformation to recover the clusters from
> the
> > > eigenvectors but I don't know how to do that.
> > >
> > > I bet you do :)
> > >

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Ted Dunning <te...@gmail.com>.
Ordering matters less than labeling.

And another way to put it is that the affinity or distance matrix A should
have the same labels on the rows AND on the columns as were on the rows of
the original matrix.  Thus, the labels on the rows of Q should be the same
as the original labels.

Forming Q' A Q (not QAQ', btw) only gives us the diagonalized form of A
which is just the affinity matrix of the eigen-representations.  That isn't
all that interesting.
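
As a quick sanity check of that statement (a sketch, assuming A is symmetric
and Q holds its orthonormal eigenvectors, so Q^{-1} = Q^{\mathsf{T}}):

    A = Q \Lambda Q^{\mathsf{T}}
    \quad\Longrightarrow\quad
    Q^{\mathsf{T}} A Q = Q^{\mathsf{T}} (Q \Lambda Q^{\mathsf{T}}) Q = \Lambda,

i.e. Q' A Q only yields the diagonal matrix of eigenvalues, not anything
resembling the original data D.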


On Tue, May 24, 2011 at 2:09 PM, Shannon Quinn <sq...@gatech.edu> wrote:

> Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
> ordering of indices. Does the nth point with label y from the result of
> K-means correspond to the nth row of the column matrix of eigenvectors?
>

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Shannon Quinn <sq...@gatech.edu>.
You're right, that would give you the affinity matrix. However, the affinity
matrix is an easier beast to tame, since it is constructed with all the
points' orders preserved: aff[i][j] is the relationship between
original_point[i] and original_point[j], so for all practical purposes I
treat it as the "original data" (since it's easy to go back and forth
between the two).

Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
ordering of indices. Does the nth point with label y from the result of
K-means correspond to the nth row of the column matrix of eigenvectors? If
so, then does that nth row from the eigenvector matrix also correspond to
the nth original data point (the one represented by proxy by row n and
column n of the affinity matrix)? If both these conditions are true, then
and only then can we say that original_point[n]'s cluster is y.

On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman <je...@narus.com> wrote:

> Would that give you the original data matrix, the clustered data matrix, or
> the clustered affinity matrix? Even with the analogy in mind I'm having
> trouble connecting the dots. Seems like I lost the original data matrix in
> step 1 when I used a distance measure to produce A from it. If the returned
> eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> importantly, if the Q eigenvectors define the clusters in eigenspace, what
> is the inverse transformation?
>
> -----Original Message-----
> From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 12:07 PM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
>
> On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > For the display example, it is not necessary to cluster the original
> > points. The other clustering display examples only train the clusters and
> do
> > not classify the points. They are drawn first and the cluster centers &
> > radii are superimposed afterwards. Thus I think it is only necessary to
> > back-transform the clusters.
> >
> > My EE gut tells me this is like Fourier transforms between time- and
> > frequency-domains. If this is true then what we need is the inverse
> > transform. Is this a correct analogy?
> >
> > -----Original Message-----
> > From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> > Of Shannon Quinn
> > Sent: Tuesday, May 24, 2011 11:39 AM
> > To: dev@mahout.apache.org
> > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> example
> > fails
> >
> > This is actually something I could use a little expert Hadoop assistance
> > on.
> > The general idea is that the points that are clustered in eigenspace have
> a
> > 1-to-1 correspondence with the original points (which is how you get your
> > cluster assignments), but this back-mapping after clustering isn't
> > explicitly implemented yet, since that's the core of the IO issue.
> >
> > My block on this is my lack of understanding in how the actual ordering
> of
> > the points change (or not?) from when they are projected into eigenspace
> > (the Lanczos solver) and when K-means makes its cluster assignments. On a
> > one-node setup the original ordering appears to be preserved through all
> > the
> > operations, so the labels of the original points can be assigned by
> giving
> > original_point[i] the label of projected_point[i], hence the cluster
> > assignments are easy to determine. For multi-node setups, however, I
> simply
> > don't know if this heuristic holds.
> >
> > But I believe the immediate issue here is that we're feeding the
> projected
> > points to the display, when it should be the original points *annotated*
> > with the cluster assignments from the corresponding projected points. The
> > question is how to shift those assignments over robustly; right now it's
> > just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> > just the version I have locally :o)
> >
> > On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com>
> wrote:
> >
> > > Yes, I expect it is pilot error on my part. The original implementation
> > was
> > > failing in this manner because I was requesting 5 eigenvectors
> > (clusters). I
> > > changed it to 2 and now it displays something but it is not even close
> to
> > > correct. I think this is because I have not transformed back from eigen
> > > space to vector space. This all relates to the IO issue for the
> spectral
> > > clustering code which I don't grok.
> > >
> > > The display driver begins with the sample points and generates the
> > affinity
> > > matrix using a distance measure. Not clear this is even a correct
> > > interpretation of that matrix. Then spectral kmeans runs and produces 2
> > > clusters which I display directly. Seems like this number should be
> more
> > > like the k in kmeans, and 5 was more realistic given the data. I
> believe
> > > there is a missing output transformation to recover the clusters from
> the
> > > eigenvectors but I don't know how to do that.
> > >
> > > I bet you do :)
> > >

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Jeff Eastman <je...@Narus.com>.
Would that give you the original data matrix, the clustered data matrix, or the clustered affinity matrix? Even with the analogy in mind I'm having trouble connecting the dots. Seems like I lost the original data matrix in step 1 when I used a distance measure to produce A from it. If the returned eigenvectors define Q, then what is the significance of QAQ^-1? And, more importantly, if the Q eigenvectors define the clusters in eigenspace, what is the inverse transformation?

-----Original Message-----
From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 12:07 PM
To: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

That's an excellent analogy! Employing that strategy, would it be possible
(and not too expensive) to do the QAQ^-1 operation to get the original data
matrix, after we've clustered the points in eigenspace?

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <je...@narus.com> wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and do
> not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?
>
> -----Original Message-----
> From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 11:39 AM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> This is actually something I could use a little expert Hadoop assistance
> on.
> The general idea is that the points that are clustered in eigenspace have a
> 1-to-1 correspondence with the original points (which is how you get your
> cluster assignments), but this back-mapping after clustering isn't
> explicitly implemented yet, since that's the core of the IO issue.
>
> My block on this is my lack of understanding in how the actual ordering of
> the points change (or not?) from when they are projected into eigenspace
> (the Lanczos solver) and when K-means makes its cluster assignments. On a
> one-node setup the original ordering appears to be preserved through all
> the
> operations, so the labels of the original points can be assigned by giving
> original_point[i] the label of projected_point[i], hence the cluster
> assignments are easy to determine. For multi-node setups, however, I simply
> don't know if this heuristic holds.
>
> But I believe the immediate issue here is that we're feeding the projected
> points to the display, when it should be the original points *annotated*
> with the cluster assignments from the corresponding projected points. The
> question is how to shift those assignments over robustly; right now it's
> just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> just the version I have locally :o)
>
> On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > Yes, I expect it is pilot error on my part. The original implementation
> was
> > failing in this manner because I was requesting 5 eigenvectors
> (clusters). I
> > changed it to 2 and now it displays something but it is not even close to
> > correct. I think this is because I have not transformed back from eigen
> > space to vector space. This all relates to the IO issue for the spectral
> > clustering code which I don't grok.
> >
> > The display driver begins with the sample points and generates the
> affinity
> > matrix using a distance measure. Not clear this is even a correct
> > interpretation of that matrix. Then spectral kmeans runs and produces 2
> > clusters which I display directly. Seems like this number should be more
> > like the k in kmeans, and 5 was more realistic given the data. I believe
> > there is a missing output transformation to recover the clusters from the
> > eigenvectors but I don't know how to do that.
> >
> > I bet you do :)
> >

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Shannon Quinn <sq...@gatech.edu>.
That's an excellent analogy! Employing that strategy, would it be possible
(and not too expensive) to do the QAQ^-1 operation to get the original data
matrix, after we've clustered the points in eigenspace?

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <je...@narus.com> wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and do
> not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?
>
> -----Original Message-----
> From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 11:39 AM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> This is actually something I could use a little expert Hadoop assistance
> on.
> The general idea is that the points that are clustered in eigenspace have a
> 1-to-1 correspondence with the original points (which is how you get your
> cluster assignments), but this back-mapping after clustering isn't
> explicitly implemented yet, since that's the core of the IO issue.
>
> My block on this is my lack of understanding in how the actual ordering of
> the points change (or not?) from when they are projected into eigenspace
> (the Lanczos solver) and when K-means makes its cluster assignments. On a
> one-node setup the original ordering appears to be preserved through all
> the
> operations, so the labels of the original points can be assigned by giving
> original_point[i] the label of projected_point[i], hence the cluster
> assignments are easy to determine. For multi-node setups, however, I simply
> don't know if this heuristic holds.
>
> But I believe the immediate issue here is that we're feeding the projected
> points to the display, when it should be the original points *annotated*
> with the cluster assignments from the corresponding projected points. The
> question is how to shift those assignments over robustly; right now it's
> just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> just the version I have locally :o)
>
> On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > Yes, I expect it is pilot error on my part. The original implementation
> was
> > failing in this manner because I was requesting 5 eigenvectors
> (clusters). I
> > changed it to 2 and now it displays something but it is not even close to
> > correct. I think this is because I have not transformed back from eigen
> > space to vector space. This all relates to the IO issue for the spectral
> > clustering code which I don't grok.
> >
> > The display driver begins with the sample points and generates the
> affinity
> > matrix using a distance measure. Not clear this is even a correct
> > interpretation of that matrix. Then spectral kmeans runs and produces 2
> > clusters which I display directly. Seems like this number should be more
> > like the k in kmeans, and 5 was more realistic given the data. I believe
> > there is a missing output transformation to recover the clusters from the
> > eigenvectors but I don't know how to do that.
> >
> > I bet you do :)
> >

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Jeff Eastman <je...@Narus.com>.
For the display example, it is not necessary to cluster the original points. The other clustering display examples only train the clusters and do not classify the points. They are drawn first and the cluster centers & radii are superimposed afterwards. Thus I think it is only necessary to back-transform the clusters. 

My EE gut tells me this is like Fourier transforms between time- and frequency-domains. If this is true then what we need is the inverse transform. Is this a correct analogy?

-----Original Message-----
From: squinn.squinn@gmail.com [mailto:squinn.squinn@gmail.com] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 11:39 AM
To: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

This is actually something I could use a little expert Hadoop assistance on.
The general idea is that the points that are clustered in eigenspace have a
1-to-1 correspondence with the original points (which is how you get your
cluster assignments), but this back-mapping after clustering isn't
explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding of how the actual ordering of
the points changes (or doesn't) between when they are projected into eigenspace
(the Lanczos solver) and when K-means makes its cluster assignments. On a
one-node setup the original ordering appears to be preserved through all the
operations, so the labels of the original points can be assigned by giving
original_point[i] the label of projected_point[i], hence the cluster
assignments are easy to determine. For multi-node setups, however, I simply
don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected
points to the display, when it should be the original points *annotated*
with the cluster assignments from the corresponding projected points. The
question is how to shift those assignments over robustly; right now it's
just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
just the version I have locally :o)

On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com> wrote:

> Yes, I expect it is pilot error on my part. The original implementation was
> failing in this manner because I was requesting 5 eigenvectors (clusters). I
> changed it to 2 and now it displays something but it is not even close to
> correct. I think this is because I have not transformed back from eigen
> space to vector space. This all relates to the IO issue for the spectral
> clustering code which I don't grok.
>
> The display driver begins with the sample points and generates the affinity
> matrix using a distance measure. Not clear this is even a correct
> interpretation of that matrix. Then spectral kmeans runs and produces 2
> clusters which I display directly. Seems like this number should be more
> like the k in kmeans, and 5 was more realistic given the data. I believe
> there is a missing output transformation to recover the clusters from the
> eigenvectors but I don't know how to do that.
>
> I bet you do :)
>

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Shannon Quinn <sq...@gatech.edu>.
This is actually something I could use a little expert Hadoop assistance on.
The general idea is that the points that are clustered in eigenspace have a
1-to-1 correspondence with the original points (which is how you get your
cluster assignments), but this back-mapping after clustering isn't
explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding of how the actual ordering of
the points changes (or doesn't) between when they are projected into eigenspace
(the Lanczos solver) and when K-means makes its cluster assignments. On a
one-node setup the original ordering appears to be preserved through all the
operations, so the labels of the original points can be assigned by giving
original_point[i] the label of projected_point[i], hence the cluster
assignments are easy to determine. For multi-node setups, however, I simply
don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected
points to the display, when it should be the original points *annotated*
with the cluster assignments from the corresponding projected points. The
question is how to shift those assignments over robustly; right now it's
just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
just the version I have locally :o)
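
To make the index heuristic described above concrete, here is a sketch only;
it assumes the i-th projected point still lines up with the i-th original
point, which is exactly the property that may not hold on a multi-node run:

public class IndexHeuristic {
  // Pair original_point[i] with the cluster label of projected_point[i].
  // Only valid when ordering is preserved end-to-end through the pipeline
  // (observed on one node, unverified on multi-node runs).
  public static String[] annotateByIndex(double[][] originalPoints, int[] projectedLabels) {
    String[] annotated = new String[originalPoints.length];
    for (int i = 0; i < originalPoints.length; i++) {
      annotated[i] = "(" + originalPoints[i][0] + ", " + originalPoints[i][1]
          + ") -> cluster " + projectedLabels[i];
    }
    return annotated;
  }
}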

On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <je...@narus.com> wrote:

> Yes, I expect it is pilot error on my part. The original implementation was
> failing in this manner because I was requesting 5 eigenvectors (clusters). I
> changed it to 2 and now it displays something but it is not even close to
> correct. I think this is because I have not transformed back from eigen
> space to vector space. This all relates to the IO issue for the spectral
> clustering code which I don't grok.
>
> The display driver begins with the sample points and generates the affinity
> matrix using a distance measure. Not clear this is even a correct
> interpretation of that matrix. Then spectral kmeans runs and produces 2
> clusters which I display directly. Seems like this number should be more
> like the k in kmeans, and 5 was more realistic given the data. I believe
> there is a missing output transformation to recover the clusters from the
> eigenvectors but I don't know how to do that.
>
> I bet you do :)
>

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

Posted by Jeff Eastman <je...@Narus.com>.
Yes, I expect it is pilot error on my part. The original implementation was failing in this manner because I was requesting 5 eigenvectors (clusters). I changed it to 2 and now it displays something but it is not even close to correct. I think this is because I have not transformed back from eigen space to vector space. This all relates to the IO issue for the spectral clustering code which I don't grok.

The display driver begins with the sample points and generates the affinity matrix using a distance measure. Not clear this is even a correct interpretation of that matrix. Then spectral kmeans runs and produces 2 clusters which I display directly. Seems like this number should be more like the k in kmeans, and 5 was more realistic given the data. I believe there is a missing output transformation to recover the clusters from the eigenvectors but I don't know how to do that.

I bet you do :)
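
On the question of whether a raw distance matrix is the right interpretation:
one common way to turn pairwise distances into true affinities is a Gaussian
(RBF) kernel, sketched below in plain Java (the sigma parameter and the class
name are illustrative, not anything the display example currently uses):

public class GaussianAffinity {
  // Convert a pairwise distance matrix into affinities in (0, 1]:
  // nearby points map to values near 1, distant points to values near 0.
  public static double[][] fromDistances(double[][] dist, double sigma) {
    int n = dist.length;
    double[][] aff = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        aff[i][j] = Math.exp(-(dist[i][j] * dist[i][j]) / (2.0 * sigma * sigma));
      }
    }
    return aff;
  }
}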
