Posted to solr-user@lucene.apache.org by Koji Sekiguchi <ko...@r.email.ne.jp> on 2014/11/20 10:10:58 UTC

[ANN] word2vec for Lucene

Hello,

It's my pleasure to share an interesting tool, "word2vec for Lucene",
available at https://github.com/kojisekig/word2vec-lucene .

As the name suggests, you can use "word2vec for Lucene" to extract word vectors from a Lucene index.
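The sketch below loads vectors like these into a map. It assumes, without confirmation from this thread, that the tool writes them in the same plain-text format the original word2vec C tool produces with "-binary 0": a header line giving the vocabulary size and dimension, then one line per word followed by its components. The class and method names are hypothetical, so check the project's README for the actual output file and format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class Word2VecTextReader {
    // ASSUMPTION: input follows the original word2vec text format
    // ("-binary 0"): "<vocabSize> <dims>" header, then "<word> <v1> ... <vN>".
    public static Map<String, float[]> read(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) {
            throw new IOException("empty input");
        }
        String[] h = header.trim().split("\\s+");
        int dims = Integer.parseInt(h[1]); // h[0] is the vocabulary size
        Map<String, float[]> vectors = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != dims + 1) {
                continue; // skip blank or malformed lines
            }
            float[] v = new float[dims];
            for (int i = 0; i < dims; i++) {
                v[i] = Float.parseFloat(parts[i + 1]);
            }
            vectors.put(parts[0], v);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        String sample = "2 3\nlucene 0.1 0.2 0.3\nsolr 0.4 0.5 0.6\n";
        Map<String, float[]> vecs = read(new BufferedReader(new StringReader(sample)));
        System.out.println(vecs.size() + " words, dim " + vecs.get("solr").length);
    }
}
```

Run as is, main prints "2 words, dim 3". For a real file, wrap a FileReader instead of the StringReader.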

Thank you,

Koji
-- 
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

Re: [ANN] word2vec for Lucene

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi Joseph,

Thank you for asking. If you want to do it interactively, it won't
work well in practice, because training takes several minutes.

If you can accept batch processing, the feature could be implemented,
but I haven't done it yet. There is an open ticket for it:

accept filter query
https://github.com/kojisekig/word2vec-lucene/issues/2
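Until a filter-query option exists, the batch workflow in the question quoted below could be approximated outside the tool: take the top n hits of a query and write their text out as a one-document-per-line corpus, which is the kind of input the original word2vec tool trains on. The following is a minimal Java 16+ sketch. The Hit record and buildCorpus method are hypothetical stand-ins, not part of word2vec-lucene or Solr, and in practice the hits would come from a Solr or Lucene search. This thread does not say whether word2vec-lucene can read such a file directly. Indexing the subset into a separate Lucene index would be another route.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopNCorpus {
    // Hypothetical stand-in for a search hit; the text would normally come
    // from a stored field of each matching document.
    public record Hit(String id, float score, String text) {}

    // Keeps the n highest-scoring hits and joins their text, one document
    // per line (embedded newlines are flattened to spaces).
    public static String buildCorpus(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score()).reversed())
                .limit(n)
                .map(h -> h.text().replace('\n', ' '))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("d1", 0.2f, "solr cloud shards"),
                new Hit("d2", 0.9f, "word vectors from lucene"),
                new Hit("d3", 0.5f, "king man woman queen"));
        System.out.println(buildCorpus(hits, 2));
    }
}
```

With n = 2, main prints the text of d2 and then the text of d3, each on its own line.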

Thanks,

Koji

(2014/11/21 8:22), Joseph Obernberger wrote:
> Hi Koji, is it possible to run word2vec on a subset of documents from
> Solr? That is, could I run a query, get back the top n results, and pass only
> those to word2vec?
> Will this work with SolrCloud?
>
> Thank you!
>
> -Joe

Re: [ANN] word2vec for Lucene

Posted by Joseph Obernberger <jo...@gmail.com>.
Hi Koji, is it possible to run word2vec on a subset of documents from
Solr? That is, could I run a query, get back the top n results, and pass only
those to word2vec?
Will this work with SolrCloud?

Thank you!

-Joe

On Thu, Nov 20, 2014 at 12:18 PM, Paul Libbrecht <pa...@hoplahup.net> wrote:

> As far as I can tell, word2vec seems more mathematical, which is rather
> nice.
> At least, I see more transparent math on the web page.
> Maybe this helps a bit?
>
> SemanticVectors has always been rather pleasant for the LSI/LSA-like approach,
> but that is precisely what is mathematically opaque.
> Maybe it's more a question of presentation.
>
> Paul

Re: [ANN] word2vec for Lucene

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Thanks, Glen, for the URL. I'll check it out when I have time.

Thanks, Paul, for explaining the difference between them. I like your description!

Koji

(2014/11/21 2:18), Paul Libbrecht wrote:
> As far as I can tell, word2vec seems more mathematical, which is rather nice.
> At least, I see more transparent math on the web page.
> Maybe this helps a bit?
>
> SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but that is precisely what is mathematically opaque.
> Maybe it's more a question of presentation.
> 
> Paul

Re: [ANN] word2vec for Lucene

Posted by Paul Libbrecht <pa...@hoplahup.net>.
As far as I can tell, word2vec seems more mathematical, which is rather nice.
At least, I see more transparent math on the web page.
Maybe this helps a bit?

SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but that is precisely what is mathematically opaque.
Maybe it's more a question of presentation.

Paul


On 20 nov. 2014, at 16:24, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Hi Paul,
> 
> I cannot compare it to SemanticVectors as I don't know SemanticVectors.
> But word vectors that are produced by word2vec have interesting properties.
> 
> Here is the description of the original word2vec web site:
> 
> https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors
> Interesting properties of the word vectors
> It was recently shown that the word vectors capture many linguistic regularities, for example vector
> operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very
> close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to
> vector('queen')
> 
> Thanks,
> 
> Koji


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: [ANN] word2vec for Lucene

Posted by Glen Newton <gl...@gmail.com>.
Hi Koji,

Semantic Vectors is here: http://code.google.com/p/semanticvectors/

It is a project that has been around for a number of years and is used by many
people (including me:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html).

If you could compare and contrast word2vec with Semantic Vectors, it would help
many of us understand where and when we might want to use word2vec.

Thank you,
Glen

On Thu, Nov 20, 2014 at 10:24 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Hi Paul,
>
> I cannot compare it to SemanticVectors as I don't know SemanticVectors.
> But word vectors that are produced by word2vec have interesting properties.
>
> Here is the description of the original word2vec web site:
>
> https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors
> Interesting properties of the word vectors
> It was recently shown that the word vectors capture many linguistic regularities, for example vector
> operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very
> close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to
> vector('queen')
>
> Thanks,
>
> Koji

Re: [ANN] word2vec for Lucene

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi Paul,

I can't compare it to SemanticVectors, as I'm not familiar with SemanticVectors.
But the word vectors produced by word2vec have interesting properties.

Here is the description from the original word2vec web site:

https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors
Interesting properties of the word vectors
It was recently shown that the word vectors capture many linguistic regularities, for example vector
operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very
close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to
vector('queen')
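To make "close to" concrete: a common way to answer such analogy queries is to compute vector('king') - vector('man') + vector('woman') and return the vocabulary word whose vector has the highest cosine similarity to the result, excluding the three query words. The following is a minimal Java sketch using made-up 3-dimensional toy vectors. Real word2vec vectors are learned from a corpus and usually have hundreds of dimensions. The names here are illustrative and are not the API of word2vec-lucene or the original tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Analogy {
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the word most cosine-similar to vec(a) - vec(b) + vec(c),
    // skipping a, b and c themselves.
    static String analogy(Map<String, double[]> vecs, String a, String b, String c) {
        double[] va = vecs.get(a), vb = vecs.get(b), vc = vecs.get(c);
        if (va == null || vb == null || vc == null) {
            throw new IllegalArgumentException("query word not in vocabulary");
        }
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = va[i] - vb[i] + vc[i];
        }
        Set<String> exclude = new HashSet<>(Arrays.asList(a, b, c));
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : vecs.entrySet()) {
            if (exclude.contains(e.getKey())) continue;
            double sim = cosine(target, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy values invented for illustration only (not trained vectors).
        Map<String, double[]> vecs = Map.of(
                "king",  new double[] {0.9, 0.8, 0.1},
                "queen", new double[] {0.9, 0.1, 0.8},
                "man",   new double[] {0.1, 0.9, 0.1},
                "woman", new double[] {0.1, 0.1, 0.9},
                "apple", new double[] {0.0, 0.3, 0.3});
        System.out.println(analogy(vecs, "king", "man", "woman"));
    }
}
```

With these toy values the program prints queen. Given trained vectors that contain the words, analogy(vecs, "Paris", "France", "Italy") would be the corresponding query for Rome.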

Thanks,

Koji


(2014/11/20 20:01), Paul Libbrecht wrote:
> Hello Koji,
> 
> how would you compare that to SemanticVectors?
> 
> paul


Re: [ANN] word2vec for Lucene

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Hello Koji,

how would you compare that to SemanticVectors?

paul

On 20 nov. 2014, at 10:10, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Hello,
> 
> It's my pleasure to share that I have an interesting tool "word2vec for Lucene"
> available at https://github.com/kojisekig/word2vec-lucene .
> 
> As you can imagine, you can use "word2vec for Lucene" to extract word vectors from Lucene index.
> 
> Thank you,
> 
> Koji
> -- 
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

