Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2012/08/27 05:01:41 UTC

Visualization of word clusters

This is a really cool 3D visualization of a tag cloud with distances:
http://langtech.jrc.ec.europa.eu/Pictures/ThemeScape-overview_EP259.pdf

What is the sequence to make this? I'm thinking:
1) Create a document/term matrix.
2) Random Projection of term vectors onto 2D.
       2D distances should approximate the N-dimensional distances between terms.
3) Do SVD of term vectors.
4) Use first feature vector to select height of each term.
       Or, the norm of the feature vector times the singular values.

After this, the mapping software does the rest of the work via topo
and word placement algorithms.
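
A rough sketch of the four steps above, assuming NumPy and a small dense
documents-by-terms count matrix. The function name term_layout and the
parameters k and seed are made up for illustration, and this is not Mahout
code. Note that a random projection all the way down to 2D only roughly
preserves distances; the usual Johnson-Lindenstrauss guarantees need many
more dimensions.

```python
import numpy as np

def term_layout(doc_term, k=1, seed=0):
    """Return (xy, height) per term from a documents-by-terms count matrix.

    k is the number of leading singular directions used for the height
    (k=1 corresponds to "use the first feature vector").
    """
    rng = np.random.default_rng(seed)
    # Step 1: one row per term, one column per document.
    terms = np.asarray(doc_term, dtype=float).T
    # Step 2: Gaussian random projection of the term vectors onto 2D.
    proj = rng.normal(size=(terms.shape[1], 2)) / np.sqrt(2)
    xy = terms @ proj
    # Step 3: SVD of the term vectors.
    u, s, _ = np.linalg.svd(terms, full_matrices=False)
    # Step 4: height = norm of each term's leading feature vector,
    # scaled by the matching singular values.
    height = np.linalg.norm(u[:, :k] * s[:k], axis=1)
    return xy, height

# Toy input: 4 documents, 3 terms.
counts = [[2, 0, 1],
          [0, 3, 1],
          [1, 1, 0],
          [0, 2, 2]]
xy, height = term_layout(counts, k=1)
```

With k equal to the full rank, the height is just the length of the term
vector, so a small k is what makes the SVD step do anything useful here.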

-- 
Lance Norskog
goksron@gmail.com

Re: Visualization of word clusters

Posted by Ted Dunning <te...@gmail.com>.
Here is some pretty old work that did the same sort of thing.  The
self-organizing map (SOM) is an interesting alternative to MDS since it
allows mapping a low-dimensional approximate manifold to a linear space.
The basic idea is that it preserves close distances and doesn't much care
about distances to far-away points.  Similar results should be obtainable
using locally linear embedding (LLE).
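
To make the SOM idea concrete, here is a minimal sketch assuming NumPy. The
names train_som and place, the grid size, and the decay schedules are all
illustrative choices, not taken from the papers linked below. Each grid unit
holds a weight vector; every sample pulls its best-matching unit and that
unit's grid neighbours toward itself, which is why nearby points end up on
nearby units while far-apart points are left largely unconstrained.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=30, seed=0):
    """Fit a rows x cols self-organizing map; returns (grid, weights)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # 2D grid coordinates of every unit, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    # Start each unit's weight vector at a randomly chosen data point.
    weights = data[rng.integers(len(data), size=len(grid))].copy()
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = 0.5 * (1.0 - frac)                            # decaying rate
            radius = max(rows, cols) / 2 * (1.0 - frac) + 0.5  # shrinking
            # Best-matching unit: the closest weight vector to the sample.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood measured on the grid, not in data space.
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            pull = np.exp(-grid_d2 / (2 * radius ** 2))
            weights += lr * pull[:, None] * (x - weights)
            step += 1
    return grid, weights

def place(data, grid, weights):
    """Map each sample to the grid coordinates of its best-matching unit."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return grid[d2.argmin(axis=1)]
```

For term vectors, data would be one row per term, and place gives the 2D
position of each term on the map.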

http://comminfo.rutgers.edu/~aspoerri/Teaching/InfoVisResources/papers/UIR-1996-01-Card-CGA-VisSurvey.pdf

http://aclweb.org/anthology-new/X/X96/X96-1032.pdf

http://www.tgc.com/dsstar/99/0518/100758.html



On Mon, Aug 27, 2012 at 12:58 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> MDS is usually a way to visualize it close to truth. The rest looks like
> contours of a regular 2D kernel density estimate.

Re: Visualization of word clusters

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
MDS is usually a way to visualize it close to truth. The rest looks like
contours of a regular 2D kernel density estimate.
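
A minimal sketch of that combination, assuming NumPy: classical
(Torgerson) MDS to get 2D coordinates, then a Gaussian kernel density
estimate on a grid whose values can be drawn as contours or used as
heights. The function names, the bandwidth, and the grid size are
illustrative assumptions, and classical MDS is only one of several MDS
variants.

```python
import numpy as np

def classical_mds(points, dims=2):
    """Embed points in `dims` dimensions from their Euclidean distances."""
    x = np.asarray(points, dtype=float)
    # Squared pairwise distances.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    n = len(x)
    # Double-centre the squared distances to get a Gram matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]       # keep the largest ones
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def density_grid(xy, bandwidth=1.0, size=50):
    """Gaussian kernel density estimate of 2D points on a size x size grid."""
    xy = np.asarray(xy, dtype=float)
    lo = xy.min(axis=0) - 3 * bandwidth
    hi = xy.max(axis=0) + 3 * bandwidth
    gx = np.linspace(lo[0], hi[0], size)
    gy = np.linspace(lo[1], hi[1], size)
    mx, my = np.meshgrid(gx, gy)
    # Squared distance from every grid cell to every point.
    d2 = (mx[..., None] - xy[:, 0]) ** 2 + (my[..., None] - xy[:, 1]) ** 2
    z = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=-1)
    z /= 2 * np.pi * bandwidth ** 2 * len(xy)  # normalise to a density
    return gx, gy, z

# Toy input: four points that happen to lie in a plane.
pts = [[0, 0, 0], [3, 0, 0], [0, 4, 0], [3, 4, 0]]
xy = classical_mds(pts)
gx, gy, z = density_grid(xy, bandwidth=1.0, size=20)
```

The grid z can be handed to any contour or surface plotting routine; the
bandwidth controls how smooth the resulting "terrain" looks.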