Posted to user@spark.apache.org by kaching <wa...@o2.pl> on 2016/10/07 08:05:02 UTC

MLlib: word2vec - words vectors into feature vector

Hi. How exactly does the MLlib implementation of word2vec convert word vectors
into one feature vector per row?

           TEXT
[Hi, I, heard, ab..]
[I, wish, Java, c..]
[Logistic, regres.]

             | word2vec

             V

WORD          VECTOR
heard         [0.14950960874557...
are           [-0.1639076173305...
neat          [0.13949351012706...
classes       [0.03703496977686...
I             [-0.0189154129475...
regression    [0.15298652648925...
Logistic      [-0.1270201653242...
Spark         [-0.0535793155431...
could         [0.12216471135616...
use           [0.08246973901987...
Hi            [0.16548289358615...
models        [-0.0568316541612...
case          [0.11626788973808...
about         [-0.1500445008277...
Java          [-0.0407485179603...
wish          [0.11882393807172...

                 | HOW?

                 V

TEXT                      RESULT
[Hi, I, heard, ab...]     [0.01849065460264...
[I, wish, Java, c...]     [0.05958533100783...
[Logistic, regres...]     [-0.0110558800399...

Is there a way to change this default method?


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: MLlib: word2vec - words vectors into feature vector

Posted by Sean Owen <so...@cloudera.com>.

It's just the average of the word vectors, for all words in the text.
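
To make the averaging concrete, here is a minimal plain-Python sketch (not Spark code). The function name, the toy 2-dimensional vectors, and the handling of unknown words are all assumptions made for illustration; the values are not the ones from the output above.

```python
from typing import Dict, List, Sequence


def average_word_vectors(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the element-wise mean of the word vectors for `tokens`."""
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            # Assumption of this sketch: a word without a vector adds
            # nothing to the sum but still counts in the divisor.
            continue
        for i, x in enumerate(vec):
            total[i] += x
    if not tokens:
        return total  # empty text: all-zero vector
    return [x / len(tokens) for x in total]


# Hypothetical toy vectors, chosen so the arithmetic is easy to follow.
toy_vectors = {"Hi": [1.0, 2.0], "I": [3.0, 4.0], "heard": [5.0, 0.0]}

print(average_word_vectors(["Hi", "I", "heard"], toy_vectors, dim=2))
# prints [3.0, 2.0]
```

As for changing the method: one possible route, offered as a suggestion rather than a built-in option, is to take the per-word vectors from the fitted model (getVectors returns the word/vector table) and apply your own aggregation, for example a weighted average, in place of the plain mean.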

On Fri, Oct 7, 2016 at 9:04 AM kaching <wa...@o2.pl> wrote:

> Hi. How exactly does the MLlib implementation of word2vec convert word vectors
> into one feature vector per row?