Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2012/08/27 05:01:41 UTC
Visualization of word clusters
This is a really cool 3D visualization of a tag cloud with distances:
http://langtech.jrc.ec.europa.eu/Pictures/ThemeScape-overview_EP259.pdf
What is the sequence to make this? I'm thinking:
1) Create a document/term matrix.
2) Random projection of the term vectors onto 2D, so that
   2D distances approximate the N-dimensional distances between terms.
3) Do an SVD of the term vectors.
4) Use the first feature vector to select the height of each term.
   Or, the norm of the feature vector times the singular values.
After this, the mapping software does the rest of the work via topographic
and word placement algorithms.
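
The four steps above can be sketched in NumPy as follows. This is only an
illustration of the proposed sequence, not how the ThemeScape picture was
actually produced. The function name term_layout, the parameter k, and the
toy matrix are assumptions made for the example. Note that a random
projection all the way down to 2 dimensions preserves distances only roughly.

```python
import numpy as np

def term_layout(doc_term, k=1, seed=0):
    """Return (xy, height) per term from a documents x terms count matrix.

    Hypothetical helper: k is the number of leading SVD components used
    for the height (k=1 is the "first feature vector" variant).
    """
    rng = np.random.default_rng(seed)
    terms = np.asarray(doc_term, dtype=float).T      # one row per term

    # Step 2: Gaussian random projection of the term vectors onto 2D.
    proj = rng.standard_normal((terms.shape[1], 2)) / np.sqrt(2.0)
    xy = terms @ proj

    # Step 3: SVD of the term vectors.
    u, s, _ = np.linalg.svd(terms, full_matrices=False)

    # Step 4: norm of each term's leading k features times singular values.
    height = np.linalg.norm(u[:, :k] * s[:k], axis=1)
    return xy, height

# Step 1: a toy document/term matrix (rows = documents, columns = terms).
terms = ["svd", "matrix", "cluster", "map"]
doc_term = np.array([[2, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 3, 1],
                     [0, 1, 1, 2]])

xy, height = term_layout(doc_term, k=1)
for t, (x, y), h in zip(terms, xy, height):
    print(f"{t:8s} x={x:+.2f} y={y:+.2f} height={h:.2f}")
```

With k=1 the height is the magnitude of each term's entry in the first left
singular vector times the first singular value. With all components kept it
reduces to the plain Euclidean norm of the term vector, so a small k is what
makes the second variant of step 4 differ from simple term frequency weight.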
--
Lance Norskog
goksron@gmail.com
Re: Visualization of word clusters
Posted by Ted Dunning <te...@gmail.com>.
Here is some pretty old work that did the same sort of thing. The
self-organizing map (SOM) is an interesting alternative to MDS since it allows
mapping a low-dimensional approximate manifold to a linear space. The
basic idea is that it preserves close distances and doesn't much care about
distances to far-away points. Similar results should be obtainable using
locally linear embedding (LLE).
http://comminfo.rutgers.edu/~aspoerri/Teaching/InfoVisResources/papers/UIR-1996-01-Card-CGA-VisSurvey.pdf
http://aclweb.org/anthology-new/X/X96/X96-1032.pdf
http://www.tgc.com/dsstar/99/0518/100758.html
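
To make the SOM idea concrete, here is a minimal sketch of a self-organizing
map in NumPy. It is not taken from any of the linked papers. The function
names, the grid size, and the linear decay schedules for learning rate and
neighborhood radius are all assumptions chosen for brevity.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index (row, col) of the unit whose weight vector is closest to x."""
    sq_dist = ((weights - x) ** 2).sum(axis=-1)
    row, col = np.unravel_index(np.argmin(sq_dist), sq_dist.shape)
    return int(row), int(col)

def train_som(data, rows=6, cols=6, epochs=50, seed=0):
    """Train a rows x cols map on data (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Start all units near the data mean with a little noise.
    weights = data.mean(axis=0) + 0.1 * rng.standard_normal(
        (rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = 0.5 * (1.0 - frac)                        # assumed schedule
            radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - frac)
            bmu = np.array(best_matching_unit(weights, x))
            # Only units near the winner on the grid are pulled toward x,
            # which is why nearby distances are preserved and far ones not.
            grid_sq = ((coords - bmu) ** 2).sum(axis=-1)
            influence = np.exp(-grid_sq / (2.0 * radius ** 2))
            weights += lr * influence[..., None] * (x - weights)
            step += 1
    return weights

# Toy data: two well-separated clusters in 5 dimensions.
rng = np.random.default_rng(42)
cluster_a = rng.normal(0.0, 0.1, size=(30, 5))
cluster_b = rng.normal(5.0, 0.1, size=(30, 5))
som = train_som(np.vstack([cluster_a, cluster_b]), rows=6, cols=6, epochs=20)

print("cluster A maps to unit", best_matching_unit(som, cluster_a[0]))
print("cluster B maps to unit", best_matching_unit(som, cluster_b[0]))
```

Each term vector would then be placed at the grid position of its best
matching unit, which gives the 2D layout directly.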
On Mon, Aug 27, 2012 at 12:58 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> MDS is usually a way to visualize it close to the truth. The rest looks like
> contours of a regular 2D kernel density estimate.
Re: Visualization of word clusters
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
MDS is usually a way to visualize it close to the truth. The rest looks like
contours of a regular 2D kernel density estimate.
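
A rough sketch of that combination, assuming classical (Torgerson) MDS on
Euclidean distances and a Gaussian kernel for the density. The function
names, the bandwidth, and the random stand-in vectors are illustrative
assumptions, and the resulting grid would still need a contour or surface
plotting tool to produce the landscape.

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a square distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give the Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)              # eigenvalues ascending
    top = np.argsort(vals)[::-1][:dims]            # largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def density_grid(points, bandwidth=1.0, size=50):
    """Gaussian kernel density estimate of 2D points on a size x size grid."""
    lo = points.min(axis=0) - 3.0 * bandwidth
    hi = points.max(axis=0) + 3.0 * bandwidth
    gx = np.linspace(lo[0], hi[0], size)
    gy = np.linspace(lo[1], hi[1], size)
    grid = np.stack(np.meshgrid(gx, gy), axis=-1)  # (size, size, 2)
    sq = ((grid[:, :, None, :] - points[None, None, :, :]) ** 2).sum(axis=-1)
    dens = np.exp(-sq / (2.0 * bandwidth ** 2)).sum(axis=-1)
    return gx, gy, dens / (2.0 * np.pi * bandwidth ** 2 * len(points))

# Stand-in for term vectors: 20 random points in 10 dimensions.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((20, 10))
dist = np.linalg.norm(vectors[:, None] - vectors[None], axis=-1)

xy = classical_mds(dist)                 # 2D positions of the terms
gx, gy, dens = density_grid(xy)          # heights for the contour map
print(xy.shape, dens.shape)
```

The contour lines of dens over (gx, gy) play the role of the terrain, with
peaks where many terms land close together.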