Posted to dev@flink.apache.org by Chiwan Park <ch...@apache.org> on 2015/06/28 13:49:30 UTC

[flink-ml] How to use ParameterMap in predict method?

Hi, I’m implementing k-nearest-neighbors classification based on the flink-ml structure.

In a recent commit (7a7a2940 [1]), the pipeline was restructured by splitting the predict
operation into a single-element case and a data-set case. In the data-set case, the parameter
map is passed as a method parameter, but in the single-element case there is no way to access
the parameter map.

But in k-nearest-neighbors classification, we need to know k in the predict method to select
the top k values.

How can I solve this problem?

Regards,
Chiwan Park

[1] https://github.com/apache/flink/commit/7a7a294033ef99c596e59f670e2e4ae9262f5c5f
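For readers following along: the per-element KNN step is a top-k selection followed by a majority vote, so `k` must be known wherever that step runs. The Scala sketch below is illustrative only and has no Flink dependency; `Sample`, `knnPredict`, the squared Euclidean distance, and the tie-breaking rule are assumptions, not flink-ml API.

```scala
// Hypothetical, Flink-free sketch: classify one query point with k-nearest-neighbors.
// `k` is an explicit argument here; in flink-ml it would come from the ParameterMap.
object KnnSketch {
  final case class Sample(features: Vector[Double], label: Double)

  // Squared Euclidean distance; the square root is unnecessary for ranking neighbors.
  def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def knnPredict(query: Vector[Double], training: Seq[Sample], k: Int): Double = {
    require(k > 0, "k must be positive")
    // Select the k training samples closest to the query.
    val nearest = training.sortBy(s => squaredDistance(s.features, query)).take(k)
    // Count votes per label among those k neighbors.
    val votes = nearest.groupBy(_.label).map { case (label, group) => (label, group.size) }
    // Highest vote count wins; ties go to the smaller label.
    votes.maxBy { case (label, count) => (count, -label) }._1
  }
}
```

Here `k` is a plain argument; the question above is precisely how to get it into the single-element `predict` method, where only the element and the model are available.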


Re: [flink-ml] How to use ParameterMap in predict method?

Posted by Chiwan Park <ch...@apache.org>.
Thanks Till :)

I reimplemented it using PredictDataSetOperation.

Regards,
Chiwan Park


> On Jun 29, 2015, at 7:41 PM, Till Rohrmann <ti...@gmail.com> wrote:
> [...]

Re: [flink-ml] How to use ParameterMap in predict method?

Posted by Till Rohrmann <ti...@gmail.com>.
Hi Chiwan,

At the moment, the single-element PredictOperation only supports
non-distributed models. This means that it expects the model to be a single-element
DataSet which can be broadcast to the predict mappers.

If you need more flexibility, you can either extend the PredictOperation
interface or simply use the PredictDataSetOperation, where you have
full control over the data flow you execute.

Cheers,
Till

On Mon, Jun 29, 2015 at 12:16 PM, Chiwan Park <ch...@apache.org> wrote:

> [...]
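A rough illustration of the PredictDataSetOperation route suggested above, where the parameters are available and the whole data flow is under your control. This is a Flink-free sketch under stated assumptions: `Seq` stands in for `DataSet`, `Map[String, Any]` for `ParameterMap`, and `predictDataSet`, `Labeled`, `Query`, and the default `k = 3` are hypothetical names and values; check the flink-ml sources for the actual trait signature.

```scala
// Hypothetical, Flink-free sketch of a data-set-level KNN prediction.
// Seq stands in for DataSet, Map[String, Any] for ParameterMap.
object DataSetLevelKnnSketch {
  final case class Labeled(id: Long, x: Double, label: Double) // training point (1-D)
  final case class Query(id: Long, x: Double)                  // point to classify

  def predictDataSet(
      params: Map[String, Any],
      training: Seq[Labeled],
      input: Seq[Query]): Seq[(Long, Double)] = {
    // At this level the parameters are passed in directly, so k can be read here.
    val k = params.get("k") match {
      case Some(v: Int) if v > 0 => v
      case _                     => 3 // hypothetical default
    }
    // Step 1: pair every query with every training point (a cross product).
    val pairs = for (q <- input; t <- training) yield (q.id, math.abs(q.x - t.x), t.label)
    // Step 2: group by query id, keep the k closest pairs, and vote on their labels.
    pairs
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (queryId, candidates) =>
        val nearest = candidates.sortBy(_._2).take(k)
        // Majority vote; ties go to the smaller label.
        val winner = nearest.groupBy(_._3).maxBy { case (label, group) => (group.size, -label) }._1
        (queryId, winner)
      }
  }
}
```

In an actual Flink job the pairing and grouping steps would presumably become DataSet operators such as `cross` and `groupBy(...).reduceGroup(...)`. A full cross product grows with the product of test and training sizes, so it is a starting point rather than a scalable design.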

Re: [flink-ml] How to use ParameterMap in predict method?

Posted by Chiwan Park <ch...@apache.org>.
Thank you Till.

I have another question. Can I use a DataSet object as the Model? In KNN, we need the
DataSet given in the fit operation.

But when I set the Model type parameter of PredictOperation to DataSet, the getModel
method’s return type becomes DataSet[DataSet]. I’m confused by this situation.

I would really appreciate any advice about this.


Regards,
Chiwan Park

> On Jun 29, 2015, at 4:43 PM, Till Rohrmann <tr...@apache.org> wrote:
> [...]
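The `DataSet[DataSet]` type follows mechanically from the generic signature: if `getModel` returns `DataSet[Model]` and `Model` is itself chosen to be a `DataSet[X]`, the result is `DataSet[DataSet[X]]`. The sketch below reproduces that substitution with a hypothetical `FakeDataSet` stand-in (not Flink's class), a simplified `ModelProvider` trait, and a hypothetical `singleModel` helper; it compiles, but the one model value it yields is itself a collection, which does not match the single-element, broadcastable model that Till describes in his reply.

```scala
// Hypothetical stand-ins; not Flink's DataSet or flink-ml's PredictOperation.
object NestedModelSketch {
  final case class FakeDataSet[T](elements: Seq[T])

  // Simplified shape: getModel yields a collection of Model values.
  trait ModelProvider[Model] {
    def getModel(): FakeDataSet[Model]
  }

  final case class Point(x: Double, y: Double)

  // Choosing Model = FakeDataSet[Point] forces the return type FakeDataSet[FakeDataSet[Point]].
  object NestedProvider extends ModelProvider[FakeDataSet[Point]] {
    def getModel(): FakeDataSet[FakeDataSet[Point]] =
      FakeDataSet(Seq(FakeDataSet(Seq(Point(0.0, 0.0), Point(1.0, 1.0)))))
  }

  // Hypothetical helper: extract the one model value a single-element path would use.
  def singleModel[M](provider: ModelProvider[M]): M = {
    val models = provider.getModel().elements
    require(models.size == 1, s"expected one model element, got ${models.size}")
    models.head
  }
}
```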





Re: [flink-ml] How to use ParameterMap in predict method?

Posted by Till Rohrmann <tr...@apache.org>.
Hi Chiwan,

When you use the single-element predict operation, you always have to
implement the `getModel` method. There you have access to the resulting
parameters and even to the instance to which the `PredictOperation`
belongs. Within this `getModel` method you can initialize all the
information you need for the `predict` operation.

You can take a look at the `StandardScalerTransformOperation` [1], where the
mean and the std are set in the `getModel` method.

Cheers,
Till

[1]
https://github.com/apache/flink/blob/master/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/StandardScaler.scala#L197

On Sun, Jun 28, 2015 at 1:49 PM, Chiwan Park <ch...@apache.org> wrote:

> [...]
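A minimal sketch of the `getModel` pattern described in this reply, applied to the KNN case: `getModel` reads `k` from the parameters and bundles it into the model value, so the per-element `predict` can use it without a parameter map. Everything here is a simplified, hypothetical stand-in rather than flink-ml code: `SimplePredictOperation` only mirrors the two-step shape discussed in the thread, `Seq` replaces `DataSet`, `Map[String, Any]` replaces `ParameterMap`, and the default `k = 3` is arbitrary.

```scala
// Hypothetical, Flink-free sketch of the getModel/predict split.
object GetModelSketch {
  // Stand-in for ParameterMap.
  type Params = Map[String, Any]

  // Hypothetical trait mirroring the two-step shape discussed in the thread.
  trait SimplePredictOperation[Instance, Model, Testing, Prediction] {
    // Called once per prediction job; sees the estimator instance and the parameters.
    def getModel(instance: Instance, params: Params): Seq[Model]
    // Called once per element; sees only the element and the model.
    def predict(value: Testing, model: Model): Prediction
  }

  // Hypothetical estimator holding 1-D training data as (feature, label) pairs.
  final class Knn(val training: Vector[(Double, Double)])

  // The model carries k alongside the data that predict needs.
  final case class KnnModel(k: Int, training: Vector[(Double, Double)])

  object KnnPredictOperation extends SimplePredictOperation[Knn, KnnModel, Double, Double] {
    def getModel(instance: Knn, params: Params): Seq[KnnModel] = {
      val k = params.get("k") match {
        case Some(v: Int) if v > 0 => v
        case _                     => 3 // hypothetical default
      }
      Seq(KnnModel(k, instance.training))
    }

    def predict(value: Double, model: KnnModel): Double = {
      val nearest = model.training.sortBy { case (x, _) => math.abs(x - value) }.take(model.k)
      // Majority vote; ties go to the smaller label.
      nearest.groupBy(_._2).maxBy { case (label, group) => (group.size, -label) }._1
    }
  }

  // Builds the model once and applies predict to each element, loosely mirroring
  // a single-element model that is broadcast to every predict mapper.
  def run[I, M, T, P](op: SimplePredictOperation[I, M, T, P], instance: I, params: Params, input: Seq[T]): Seq[P] = {
    val model = op.getModel(instance, params).head
    input.map(value => op.predict(value, model))
  }
}
```

Because the whole training set travels inside a single model value, this approach only fits the non-distributed case; for larger training sets, the PredictDataSetOperation route mentioned elsewhere in the thread avoids that restriction.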