Posted to user@spark.apache.org by janardhan shetty <ja...@gmail.com> on 2016/07/27 17:31:51 UTC
Writing custom Transformers and Estimators like Tokenizer in spark ML
1. Any links or blogs on developing *custom* transformers? e.g. Tokenizer
2. Any links or blogs on developing *custom* estimators? e.g. any ML algorithm
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by Steve Rowe <sa...@gmail.com>.
UnaryTransformer’s scaladoc says "Abstract class for transformers that take one input column, apply transformation, and output the result as a new column."
If you want to allow specification of more than one input column, or if your output column already exists, or you want multiple output columns, then you can’t use UnaryTransformer.
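For the multi-column case, a minimal sketch of extending Transformer directly might look like the following. It assumes the Spark 2.x API (where transform takes a Dataset[_]). The class name ColumnJoiner and its leftCol, rightCol and outputCol params are hypothetical names made up for illustration; they are not part of Spark.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// Hypothetical transformer: joins two string columns into one new column.
class ColumnJoiner(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("columnJoiner"))

  // Two input columns, which UnaryTransformer cannot express.
  final val leftCol = new Param[String](this, "leftCol", "first input column")
  final val rightCol = new Param[String](this, "rightCol", "second input column")
  final val outputCol = new Param[String](this, "outputCol", "output column")

  def setLeftCol(value: String): this.type = set(leftCol, value)
  def setRightCol(value: String): this.type = set(rightCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    // Validate the schema before touching the data.
    transformSchema(dataset.schema, logging = true)
    dataset.withColumn(
      $(outputCol),
      concat_ws(" ", col($(leftCol)), col($(rightCol))))
  }

  // Describe the output schema: the input fields plus one new string column.
  override def transformSchema(schema: StructType): StructType = {
    require(!schema.fieldNames.contains($(outputCol)),
      s"Output column ${$(outputCol)} already exists.")
    StructType(schema.fields :+
      StructField($(outputCol), DataTypes.StringType, nullable = true))
  }

  override def copy(extra: ParamMap): ColumnJoiner = defaultCopy(extra)
}
```

Compared with UnaryTransformer, you have to write transform, transformSchema and copy yourself, which is the boilerplate UnaryTransformer otherwise takes care of.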
If your transformer does fit the scaladoc's description, though (one input column, one new output column), UnaryTransformer will simplify your subclass.
BTW the scaladocs for StringType say "The data type representing `String` values. Please use the singleton [[DataTypes.StringType]]." Do that instead of calling StringType’s constructor, which your class cannot access (hence the compile error).
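Putting the two points together, a minimal sketch of a Capitalizer built on UnaryTransformer could look like this. The body of the original Capitalizer is not shown in the thread, so the capitalization logic and the Capitalizer.capitalize helper are assumptions made for illustration. The sketch also assumes Spark 2.x or later.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, DataTypes}

// Hypothetical helper holding the per-value logic, so it is easy to test alone.
object Capitalizer {
  // Upper-cases the first character; passes nulls through unchanged.
  def capitalize(s: String): String = if (s == null) null else s.capitalize
}

class Capitalizer(override val uid: String)
    extends UnaryTransformer[String, String, Capitalizer] {

  def this() = this(Identifiable.randomUID("capitalizer"))

  // The function applied to every value of the input column.
  override protected def createTransformFunc: String => String =
    (s: String) => Capitalizer.capitalize(s)

  // Reject non-string input columns at schema-validation time.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == DataTypes.StringType,
      s"Input type must be StringType but got $inputType.")

  // Use the singleton rather than `new StringType`, whose constructor
  // is not accessible from outside Spark.
  override protected def outputDataType: DataType = DataTypes.StringType
}
```

Usage would be along the lines of new Capitalizer().setInputCol("text").setOutputCol("capitalized").transform(df), where df is assumed to be a DataFrame with a string column named text.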
--
Steve
www.lucidworks.com
> On Aug 1, 2016, at 2:30 PM, janardhan shetty <ja...@gmail.com> wrote:
>
> What is the difference between the UnaryTransformer and Transformer classes? In which scenarios should we use one or the other?
>
> On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty <ja...@gmail.com> wrote:
> I'm developing in Scala, but any help with the difference between UnaryTransformer (is this still experimental?) and the Transformer class is appreciated.
>
> Right now I'm encountering an error in code that extends UnaryTransformer:
> override protected def outputDataType: DataType = new StringType
>
> Error:(26, 53) constructor StringType in class StringType cannot be accessed in class Capitalizer
> override protected def outputDataType: DataType = new StringType
> ^
>
>
> On Thu, Jul 28, 2016 at 8:20 PM, Phuong LE-HONG <ph...@gmail.com> wrote:
> Hi,
>
> I've developed a simple ML estimator (in Java) that implements a
> conditional Markov model for sequence labelling in the Vitk toolkit. You
> can check it out here:
>
> https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java
>
> Phuong Le-Hong
>
> On Fri, Jul 29, 2016 at 9:01 AM, janardhan shetty
> <ja...@gmail.com> wrote:
> > Thanks Steve.
> >
> > Any pointers on developing custom estimators as well?
> >
> > On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe <sa...@gmail.com> wrote:
> >>
> >> You can see the source for my transformer, a configurable bridge to Lucene
> >> analysis components, in my company Lucidworks’ spark-solr project:
> >> <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala>.
> >>
> >> I also wrote a blog post about using this transformer, as well as
> >> non-ML use in Spark of the underlying analysis component:
> >> <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>.
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >> > On Jul 27, 2016, at 1:31 PM, janardhan shetty <ja...@gmail.com>
> >> > wrote:
> >> >
> >> > 1. Any links or blogs on developing custom transformers? e.g. Tokenizer
> >> >
> >> > 2. Any links or blogs on developing custom estimators? e.g. any ML
> >> > algorithm
> >>
> >
>
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by janardhan shetty <ja...@gmail.com>.
What is the difference between the UnaryTransformer and Transformer classes?
In which scenarios should we use one or the other?
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by janardhan shetty <ja...@gmail.com>.
I'm developing in Scala, but any help with the difference between UnaryTransformer
(is this still experimental?) and the Transformer class is appreciated.
Right now I'm encountering an error in code that extends UnaryTransformer:
override protected def outputDataType: DataType = new StringType
Error:(26, 53) constructor StringType in class StringType cannot be accessed in class Capitalizer
override protected def outputDataType: DataType = new StringType
^
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by Phuong LE-HONG <ph...@gmail.com>.
Hi,
I've developed a simple ML estimator (in Java) that implements a
conditional Markov model for sequence labelling in the Vitk toolkit. You
can check it out here:
https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java
Phuong Le-Hong
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by janardhan shetty <ja...@gmail.com>.
Thanks Steve.
Any pointers on developing custom estimators as well?
Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Posted by Steve Rowe <sa...@gmail.com>.
You can see the source for my transformer, a configurable bridge to Lucene analysis components, in my company Lucidworks’ spark-solr project: <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala>.
I also wrote a blog post about using this transformer, as well as non-ML use in Spark of the underlying analysis component: <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>.
--
Steve
www.lucidworks.com