
Posted to user@spark.apache.org by Xi Shen <da...@gmail.com> on 2015/03/09 07:39:53 UTC

How to use the TF-IDF model?

Hi,

I read this page:
http://spark.apache.org/docs/1.2.0/mllib-feature-extraction.html, but I am
wondering how to use the resulting TF-IDF RDD. What does a TF-IDF vector
look like?

Can someone give me some guidance?


Thanks,


Xi Shen
about.me/davidshen

Re: How to use the TF-IDF model?

Posted by Jeffrey Jedele <je...@gmail.com>.

Hi,
Well, it really depends on what you want to do ;)

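TF-IDF is a measure that originates in information retrieval, where it is
used to judge how relevant a document is to a given search term: a term's
weight grows with how often it occurs in the document (term frequency) and
shrinks the more documents in the corpus contain it (inverse document
frequency).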
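To make the shape of the vectors concrete, here is a minimal plain-Python sketch (not Spark code). It assumes raw term counts for TF and the smoothed IDF described on the MLlib feature-extraction page, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing t. MLlib's HashingTF maps each term to a hashed index in a vector; this sketch keys weights by the term itself for readability. All function names are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(docs):
    """Raw term counts per document, i.e. a sparse vector keyed by term."""
    return [Counter(doc) for doc in docs]

def inverse_document_frequency(tfs):
    """Assumed smoothed IDF: log((m + 1) / (df(t) + 1)), m = number of documents."""
    m = len(tfs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each term counted at most once per document
    return {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}

def tf_idf(docs):
    """One sparse {term: weight} dict per document, weight = TF * IDF."""
    tfs = term_frequencies(docs)
    idf = inverse_document_frequency(tfs)
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs]

def rank_by_term(vectors, term):
    """Document indices ordered by their TF-IDF weight for `term`, highest first."""
    return sorted(range(len(vectors)),
                  key=lambda i: vectors[i].get(term, 0.0), reverse=True)

docs = [
    "spark makes cluster computing easy".split(),
    "tf idf weights terms in a document".split(),
    "spark mllib computes tf idf with spark".split(),
]
vectors = tf_idf(docs)
print(vectors[2])                      # sparse term -> weight map for document 2
print(rank_by_term(vectors, "spark"))  # crude relevance order for the query "spark"
```

Each document becomes a sparse mapping from term to weight; a term that occurs in every document gets weight log(1) = 0, and ranking by one term's weight gives a crude relevance order for a one-word query. In MLlib the corresponding per-document value is a (typically sparse) `Vector` indexed by hashed term positions rather than by term strings.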

It's also often used as a feature representation for text-related machine
learning tasks; for example, have a look at topic extraction using
non-negative matrix factorization.
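The topic-extraction idea can be sketched the same way. Non-negative matrix factorization approximates a non-negative document-term matrix V (for example, TF-IDF weights) as V ≈ W H, with W (documents x topics) and H (topics x terms) both non-negative; the largest entries in each row of H are read as that topic's terms. This sketch uses the Lee-Seung multiplicative update rules in plain Python, not an MLlib API; the toy matrix, term list, topic count, and iteration count are illustrative assumptions.

```python
import random

def matmul(a, b):
    """Dense product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m) with
    Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]  # strictly positive start
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

# Hypothetical toy data: rows are documents, columns are terms (TF-IDF-like weights).
terms = ["spark", "cluster", "rdd", "topic", "matrix", "factor"]
v = [
    [0.9, 0.7, 0.8, 0.0, 0.0, 0.0],
    [0.8, 0.9, 0.6, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.8, 0.7],
    [0.0, 0.1, 0.0, 0.7, 0.9, 0.8],
]
w, h = nmf(v, k=2)
for t, row in enumerate(h):
    top = sorted(range(len(terms)), key=lambda j: row[j], reverse=True)[:3]
    print("topic", t, [terms[j] for j in top])  # three highest-weight terms per topic
```

Multiplicative updates keep W and H non-negative as long as they start positive, and they are designed not to increase the squared reconstruction error. On a real corpus one would build V from the TF-IDF vectors and choose the number of topics by inspection.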

Regards,
Jeff
