Posted to user@spark.apache.org by Ganesh <ma...@ganeshkrishnan.com> on 2016/11/07 13:25:37 UTC
VectorUDT and ml.Vector
I am trying to run an SVD on a DataFrame. I used the ml TF-IDF stages,
which produced a DataFrame.
For the Singular Value Decomposition I am trying to use RowMatrix, which
takes an RDD of mllib.Vector, so I have to convert the DataFrame column
from what I assumed was ml.Vector.
However, the conversion
    val convertedTermDocMatrix =
      MLUtils.convertMatrixColumnsFromML(termDocMatrix, "features")
fails with
java.lang.IllegalArgumentException: requirement failed: Column features
must be new Matrix type to be converted to old type but got
org.apache.spark.ml.linalg.VectorUDT
So the question is: how do I perform SVD on a DataFrame? I assume not all
of the functionality of mllib has been ported to ml.
I tried converting my entire project to use RDDs, but computeSVD on
RowMatrix throws out-of-memory errors, and in any case I would like to
stick with DataFrames.
Our text corpus is around 55 GB of text data.
Ganesh
Re: VectorUDT and ml.Vector
Posted by Yanbo Liang <yb...@gmail.com>.
The cause can be inferred from the error message:
MLUtils.convertMatrixColumnsFromML converts ml.linalg.Matrix columns to
mllib.linalg.Matrix,
but the column type in your case appears to be ml.linalg.Vector.
Could you check the type of the "features" column in your DataFrame (Vector
or Matrix)? I think it's ml.linalg.Vector, so you should use
MLUtils.convertVectorColumnsFromML.
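For illustration, the suggestion above could be sketched roughly as follows. This is a minimal sketch and not code from the thread: the tiny in-memory DataFrame is a hypothetical stand-in for the real TF-IDF output, and the object name, app name, local master, and k = 2 are assumptions chosen only so the example runs on its own. It does not address the out-of-memory problem on a large corpus.

```scala
import org.apache.spark.ml.linalg.{Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession

object SvdOnDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; a real job would use the cluster config.
    val spark = SparkSession.builder()
      .appName("svd-on-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the TF-IDF output: a "features" column
    // holding ml.linalg.Vector values.
    val termDocMatrix = Seq(
      (0, MLVectors.dense(1.0, 0.0, 2.0)),
      (1, MLVectors.dense(0.0, 3.0, 1.0)),
      (2, MLVectors.dense(4.0, 1.0, 0.0))
    ).toDF("id", "features")

    // Check the column type first: this shows whether "features" is a
    // vector or a matrix column.
    termDocMatrix.printSchema()

    // Convert the ml.linalg.Vector column to mllib.linalg.Vector.
    val converted = MLUtils.convertVectorColumnsFromML(termDocMatrix, "features")

    // RowMatrix needs an RDD of mllib vectors, so extract the column as an RDD.
    val rows = converted.select("features").rdd.map(_.getAs[OldVector](0))
    val matrix = new RowMatrix(rows)

    // Keep the top k = 2 singular values; U is skipped here to save memory.
    val svd = matrix.computeSVD(2, computeU = false)
    println(s"Singular values: ${svd.s}")
    println(s"V has ${svd.V.numRows} rows and ${svd.V.numCols} columns")

    spark.stop()
  }
}
```

The DataFrame stays in use up to the last step; only the final hand-off to RowMatrix goes through an RDD, since RowMatrix lives in the mllib package.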
Thanks
Yanbo