Posted to dev@spark.apache.org by Octavian Geagla <og...@gmail.com> on 2015/01/24 17:26:25 UTC

Any interest in 'weighting' VectorTransformer which does component-wise scaling?

Hello,

I found it useful to implement the Hadamard Product
<https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29> as a
VectorTransformer.  It can be applied to scale each dimension (column) of
the data set by its own constant weight.
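For illustration, here's a minimal sketch of what such a transformer could
look like (the class name and internals here are hypothetical; the actual
code is on the branch linked below):

    import org.apache.spark.mllib.feature.VectorTransformer
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Hypothetical sketch: scales each component of an input vector by
    // the corresponding component of a fixed weight vector, i.e.
    // (x o w)_i = x_i * w_i (the Hadamard product).
    class HadamardProduct(val scalingVector: Vector) extends VectorTransformer {
      override def transform(vector: Vector): Vector = {
        require(vector.size == scalingVector.size,
          "Input vector size must match weight vector size.")
        // Clone, since toArray may return the vector's backing array.
        val values = vector.toArray.clone()
        var i = 0
        while (i < values.length) {
          values(i) *= scalingVector(i)
          i += 1
        }
        Vectors.dense(values)
      }
    }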

Since I've already implemented it and am using it, I thought I'd see if
there's interest in this feature going in as Experimental.  I'm not sold on
the name 'Weighter', either.

Here's the current branch with the work (docs, impl, tests):
<https://github.com/ogeagla/spark/compare/spark-mllib-weighting>

The implementation was heavily inspired by those of StandardScalerModel and
Normalizer.
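For context, usage would mirror Normalizer's, along these lines
(hypothetical names; the exact API may change based on feedback here):

    import org.apache.spark.mllib.linalg.Vectors

    // Weight the three columns by 0.5, 1.0, and 2.0, respectively.
    val weights = Vectors.dense(0.5, 1.0, 2.0)
    val weighter = new HadamardProduct(weights)
    weighter.transform(Vectors.dense(4.0, 4.0, 4.0))
    // => [2.0, 4.0, 8.0]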

Thanks
Octavian




Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?

Posted by Octavian Geagla <og...@gmail.com>.
I've added support for sparse vectors and created HadamardTF for the
pipeline.  Please take a look at my branch:
<https://github.com/ogeagla/spark/compare/spark-mllib-weighting>
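The sparse case only needs to touch the active entries, since zeros stay
zero under component-wise multiplication.  Roughly (a sketch; the branch
has the actual code):

    import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

    // Scale a SparseVector without densifying it: only the active
    // (index, value) pairs are multiplied, preserving sparsity.
    def transformSparse(sv: SparseVector, scalingVector: Vector): Vector = {
      val newValues = sv.values.clone()
      var i = 0
      while (i < newValues.length) {
        newValues(i) *= scalingVector(sv.indices(i))
        i += 1
      }
      Vectors.sparse(sv.size, sv.indices, newValues)
    }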

Thanks!




Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?

Posted by Octavian Geagla <og...@gmail.com>.
Thanks for the responses.  Would a name like HadamardProduct (or something
similar) work, to keep it explicit?  It would still be a VectorTransformer,
so the name and trait together should make for a reasonably
self-documenting class.

Xiangrui, do you mean the Hadamard product or the Hadamard transform?  My
initial proposal was only a vector-vector product, but I can extend this to
matrices.  The transform would require a bit more work, which I'm willing
to do, but I'm not sure where FFT comes in; can you elaborate?




Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?

Posted by "Evan R. Sparks" <ev...@gmail.com>.
Hmm... 'Scaler' and 'Scalar' are very close in both pronunciation and
spelling, and I wouldn't want to create confusion between the two.
Further, this operation (elementwise multiplication by a static vector) is
general enough that maybe it should have a more general name?

On Tue, Jan 27, 2015 at 7:54 AM, Xiangrui Meng <me...@gmail.com> wrote:

> I would call it Scaler. You might want to add it to the spark.ml pipeline
> API. Please check the spark.ml.HashingTF implementation. Note that this
> should handle sparse vectors efficiently.
>
> Hadamard and FFTs are quite useful. If you are interested, make sure that
> we call an FFT library that is license-compatible with Apache.
>
> -Xiangrui

Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?

Posted by Xiangrui Meng <me...@gmail.com>.
I would call it Scaler. You might want to add it to the spark.ml pipeline
API. Please check the spark.ml.HashingTF implementation. Note that this
should handle sparse vectors efficiently.

Hadamard and FFTs are quite useful. If you are interested, make sure that
we call an FFT library that is license-compatible with Apache.

-Xiangrui