Posted to user@spark.apache.org by janardhan shetty <ja...@gmail.com> on 2016/07/27 17:31:51 UTC

Writing custom Transformers and Estimators like Tokenizer in spark ML

1. Any links or blogs on developing *custom* transformers? e.g., Tokenizer

2. Any links or blogs on developing *custom* estimators? e.g., any ML algorithm

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by Steve Rowe <sa...@gmail.com>.

UnaryTransformer’s scaladoc says "Abstract class for transformers that take one input column, apply transformation, and output the result as a new column."

If you want to allow specification of more than one input column, if your output column already exists, or if you want multiple output columns, then you can’t use UnaryTransformer.

If your transformer does fit that description (one input column, one new output column), though, UnaryTransformer will simplify your subclass.

BTW, the scaladoc for StringType says "The data type representing `String` values. Please use the singleton [[DataTypes.StringType]]." Do that instead of calling StringType’s constructor, which is not accessible.
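A minimal sketch of the Capitalizer from this thread written against UnaryTransformer, assuming the Spark 2.x-or-later Scala API. Only the class name and the failing `outputDataType` line come from the thread; the rest of the body (capitalizing the first character, the input-type check) is an illustrative guess. The key change is returning the `DataTypes.StringType` singleton instead of `new StringType`.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, DataTypes}

// Hypothetical Capitalizer: upper-cases the first character of each string.
// UnaryTransformer supplies setInputCol/setOutputCol, transform, and copy.
class Capitalizer(override val uid: String)
    extends UnaryTransformer[String, String, Capitalizer] {

  // Auxiliary constructor that generates a unique ID for the stage.
  def this() = this(Identifiable.randomUID("capitalizer"))

  // Per-row function applied to the input column; nulls pass through as null.
  override protected def createTransformFunc: String => String =
    (s: String) => if (s == null) null else s.capitalize

  // Called during schema validation: reject non-string input columns early.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == DataTypes.StringType,
      s"Input type must be StringType but got $inputType.")

  // Use the singleton; `new StringType` fails because its constructor is private.
  override protected def outputDataType: DataType = DataTypes.StringType
}
```

Usage would look like `new Capitalizer().setInputCol("word").setOutputCol("capitalized").transform(df)`. In Scala, importing `org.apache.spark.sql.types.StringType` and writing plain `StringType` refers to the same singleton object.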

--
Steve
www.lucidworks.com

> On Aug 1, 2016, at 2:30 PM, janardhan shetty <ja...@gmail.com> wrote:
> 
> What is the difference between the UnaryTransformer and Transformer classes? In which scenarios should we use one or the other?


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by janardhan shetty <ja...@gmail.com>.

What is the difference between the UnaryTransformer and Transformer classes? In
which scenarios should we use one or the other?

On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty <ja...@gmail.com>
wrote:

> I'm developing in Scala, but any help with the difference between
> UnaryTransformer (is this still experimental?) and the Transformer class is
> appreciated.
>
> Right now I'm encountering an error in code that extends UnaryTransformer:
>
> override protected def outputDataType: DataType = new StringType
>
> Error:(26, 53) constructor StringType in class StringType cannot be accessed in class Capitalizer
>   override protected def outputDataType: DataType = new StringType
>                                                     ^

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by janardhan shetty <ja...@gmail.com>.

I'm developing in Scala, but any help with the difference between UnaryTransformer
(is this still experimental?) and the Transformer class is appreciated.

Right now I'm encountering an error in code that extends UnaryTransformer:

override protected def outputDataType: DataType = new StringType

Error:(26, 53) constructor StringType in class StringType cannot be accessed in class Capitalizer
  override protected def outputDataType: DataType = new StringType
                                                    ^



On Thu, Jul 28, 2016 at 8:20 PM, Phuong LE-HONG <ph...@gmail.com> wrote:

> Hi,
>
> I've developed a simple ML estimator (in Java) that implements a
> conditional Markov model for sequence labelling in the Vitk toolkit. You
> can check it out here:
>
> https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java
>
> Phuong Le-Hong
>
> On Fri, Jul 29, 2016 at 9:01 AM, janardhan shetty
> <ja...@gmail.com> wrote:
> > Thanks Steve.
> >
> > Any pointers to custom estimators development as well ?
> >
> > On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe <sa...@gmail.com> wrote:
> >>
> >> You can see the source for my transformer configurable bridge to Lucene
> >> analysis components here, in my company Lucidworks’ spark-solr project:
> >> <
> https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala
> >.
> >>
> >> Here’s a blog I wrote about using this transformer, as well as
> >> non-ML-context use in Spark of the underlying analysis component, here:
> >> <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/
> >.
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >> > On Jul 27, 2016, at 1:31 PM, janardhan shetty <janardhanp22@gmail.com
> >
> >> > wrote:
> >> >
> >> > 1.  Any links or blogs to develop custom transformers ? ex: Tokenizer
> >> >
> >> > 2. Any links or blogs to develop custom estimators ? ex: any ml
> >> > algorithm
> >>
> >
>

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by Phuong LE-HONG <ph...@gmail.com>.

Hi,

I've developed a simple ML estimator (in Java) that implements a
conditional Markov model for sequence labelling in the Vitk toolkit. You
can check it out here:

https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java

Phuong Le-Hong
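For the estimator question, a minimal Scala sketch of the Estimator/Model pattern, assuming the Spark 2.x-or-later API. `MeanCenterer` and `MeanCentererModel` are hypothetical names, not part of Spark or Vitk: `fit` learns the mean of a numeric column and returns a Model that subtracts it. The params are declared by hand in a shared trait rather than by mixing in Spark's own shared-param traits.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{avg, col, lit}
import org.apache.spark.sql.types.{DataTypes, NumericType, StructField, StructType}

// Hypothetical params shared by the estimator and its model.
trait MeanCentererParams extends Params {
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")

  // Checks the input column is numeric and appends a Double output column.
  protected def validateAndTransformSchema(schema: StructType): StructType = {
    val inType = schema($(inputCol)).dataType
    require(inType.isInstanceOf[NumericType], s"Column ${$(inputCol)} must be numeric but got $inType.")
    require(!schema.fieldNames.contains($(outputCol)), s"Column ${$(outputCol)} already exists.")
    schema.add(StructField($(outputCol), DataTypes.DoubleType, nullable = true))
  }
}

// Estimator: fit() computes the mean and returns a model holding it.
class MeanCenterer(override val uid: String)
    extends Estimator[MeanCentererModel] with MeanCentererParams {

  def this() = this(Identifiable.randomUID("meanCenterer"))

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def fit(dataset: Dataset[_]): MeanCentererModel = {
    transformSchema(dataset.schema)
    val mean = dataset.select(avg(col($(inputCol)).cast(DataTypes.DoubleType))).head().getDouble(0)
    // copyValues carries inputCol/outputCol over to the model.
    copyValues(new MeanCentererModel(uid, mean).setParent(this))
  }

  override def transformSchema(schema: StructType): StructType = validateAndTransformSchema(schema)

  override def copy(extra: ParamMap): MeanCenterer = defaultCopy(extra)
}

// Model: a Transformer that applies the learned state (the mean).
class MeanCentererModel(override val uid: String, val mean: Double)
    extends Model[MeanCentererModel] with MeanCentererParams {

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), col($(inputCol)).cast(DataTypes.DoubleType) - lit(mean))
  }

  override def transformSchema(schema: StructType): StructType = validateAndTransformSchema(schema)

  override def copy(extra: ParamMap): MeanCentererModel =
    copyValues(new MeanCentererModel(uid, mean), extra).setParent(parent)
}
```

The design point is that `fit` does the learning and hands the learned state (here a single `Double`) to a separate `Model`, which is itself a Transformer and can therefore sit in a Pipeline.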

On Fri, Jul 29, 2016 at 9:01 AM, janardhan shetty
<ja...@gmail.com> wrote:
> Thanks Steve.
>
> Any pointers on developing custom estimators as well?

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by janardhan shetty <ja...@gmail.com>.

Thanks Steve.

Any pointers on developing custom estimators as well?

On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe <sa...@gmail.com> wrote:

> You can see the source for my configurable transformer that bridges to Lucene
> analysis components in my company Lucidworks’ spark-solr project:
> <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala>.
>
> Here’s a blog post I wrote about using this transformer, as well as using
> the underlying analysis component in Spark outside an ML context:
> <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>.
>
> --
> Steve
> www.lucidworks.com

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

Posted by Steve Rowe <sa...@gmail.com>.

You can see the source for my configurable transformer that bridges to Lucene analysis components in my company Lucidworks’ spark-solr project: <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala>.

Here’s a blog post I wrote about using this transformer, as well as using the underlying analysis component in Spark outside an ML context: <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>.

--
Steve
www.lucidworks.com
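When a transformer needs several input columns, UnaryTransformer doesn't apply, so you extend Transformer directly and implement `transform`, `transformSchema`, and `copy` yourself. A minimal sketch of that general pattern, assuming the Spark 2.x-or-later Scala API; `ColumnConcatenator` and its params are hypothetical and far simpler than LuceneTextAnalyzerTransformer. It joins several string columns with a configurable separator.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, StringArrayParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// Hypothetical multi-column transformer: joins string columns with a separator.
class ColumnConcatenator(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("columnConcatenator"))

  // Configurable params, declared directly on the class.
  final val inputCols = new StringArrayParam(this, "inputCols", "input column names")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  final val separator = new Param[String](this, "separator", "string placed between values")
  setDefault(separator -> " ")

  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)
  def setSeparator(value: String): this.type = set(separator, value)

  // Validates every input column and appends the new output column.
  override def transformSchema(schema: StructType): StructType = {
    $(inputCols).foreach { name =>
      val t = schema(name).dataType
      require(t == DataTypes.StringType, s"Column $name must be StringType but got $t.")
    }
    require(!schema.fieldNames.contains($(outputCol)), s"Column ${$(outputCol)} already exists.")
    schema.add(StructField($(outputCol), DataTypes.StringType, nullable = false))
  }

  // concat_ws skips null values, so the output is never null.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), concat_ws($(separator), $(inputCols).map(col): _*))
  }

  override def copy(extra: ParamMap): ColumnConcatenator = defaultCopy(extra)
}
```

Because every setting is a Param, the stage can be configured through a Pipeline's ParamMap just like the built-in feature transformers.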

> On Jul 27, 2016, at 1:31 PM, janardhan shetty <ja...@gmail.com> wrote:
> 
> 1. Any links or blogs on developing custom transformers? e.g., Tokenizer
> 
> 2. Any links or blogs on developing custom estimators? e.g., any ML algorithm