Posted to user@spark.apache.org by Ganesh <ma...@ganeshkrishnan.com> on 2016/11/07 13:25:37 UTC

VectorUDT and ml.Vector

I am trying to run an SVD on a DataFrame. I used the ml TF-IDF
transformers, which produced a DataFrame.
For the Singular Value Decomposition I am trying to use RowMatrix, which
takes an RDD of mllib.Vector, so I have to convert this DataFrame, whose
column I assumed was ml.Vector.

However the conversion

val convertedTermDocMatrix =
  MLUtils.convertMatrixColumnsFromML(termDocMatrix, "features")

fails with

java.lang.IllegalArgumentException: requirement failed: Column features 
must be new Matrix type to be converted to old type but got 
org.apache.spark.ml.linalg.VectorUDT


So the question is: how do I perform SVD on a DataFrame? I assume not all
of the functionality of mllib has been ported to ml.


I tried to convert my entire project to use RDDs, but computeSVD on
RowMatrix is throwing out-of-memory errors, and in any case I would like
to stick with DataFrames.

Our corpus is around 55 GB of text data.



Ganesh

Re: VectorUDT and ml.Vector

Posted by Yanbo Liang <yb...@gmail.com>.

The cause can be inferred from the error message:
MLUtils.convertMatrixColumnsFromML converts ml.linalg.Matrix columns
to mllib.linalg.Matrix columns,
but it looks like the column type in your case is ml.linalg.Vector.
Could you check the type of the "features" column in your DataFrame (Vector
or Matrix)? I think it is ml.linalg.Vector, so you should use
MLUtils.convertVectorColumnsFromML instead.

Thanks
Yanbo
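
For reference, a minimal sketch of the suggestion above, written in
spark-shell / script style. The toy three-row DataFrame, the local master
setting, and k = 2 are assumptions made for illustration only; they stand in
for the real termDocMatrix and are not taken from the original poster's job.

```scala
import org.apache.spark.ml.linalg.{Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession

// Assumption: a local session for the sketch; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("svd-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for termDocMatrix: one ml.linalg.Vector column named "features".
val termDocMatrix = Seq(
  Tuple1(MLVectors.dense(1.0, 0.0, 2.0)),
  Tuple1(MLVectors.dense(0.0, 3.0, 0.0)),
  Tuple1(MLVectors.dense(4.0, 0.0, 1.0))
).toDF("features")

// Convert the ml.linalg.Vector column to mllib.linalg.Vector (Vector, not Matrix).
val converted = MLUtils.convertVectorColumnsFromML(termDocMatrix, "features")

// Pull the converted column out as an RDD of mllib vectors and wrap it in a RowMatrix.
val rows = converted.select("features").rdd.map(_.getAs[OldVector](0))
val mat = new RowMatrix(rows)

// Keep the top k = 2 singular values; computeU = false skips building the U factor.
val svd = mat.computeSVD(2, computeU = false)
println(svd.s) // singular values, largest first
println(svd.V) // right singular vectors, one column per singular value
```

With computeU = false only the singular values (svd.s) and the local matrix
V are returned, and svd.U is left unset. Whether that is enough depends on
what the decomposition is used for downstream.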
