Posted to dev@systemml.apache.org by fs...@posteo.de on 2016/11/21 23:33:26 UTC

Parfor semantics

While debugging some ParFor code it became clear that the parameters for 
parfor can be easily overwritten by the optimizer.
One example is when I write:

```
parfor (i in 1:10, par=10, mode=REMOTE_SPARK) {
     # some code here
}
```

Depending on the data size and cluster resources, the optimizer 
(OptimizerRuleBased.java, line 844) will recognize that the work can be 
done locally and overwrite it to local execution. This might be valid 
and definitely works (in my case) but kind of contradicts what I want 
SystemML to do.
I wonder if we should disable this optimization in case a concrete 
execution mode is given and go with the mode that is provided.

Felix


Re: Parfor semantics

Posted by Matthias Boehm <mb...@googlemail.com>.
Well, it has been used for similar use cases. It works well if the 
dataset fits into the memory of each worker. For very large datasets, 
distributed right indexing is an issue, as it prevents us from running 
parfor itself as a distributed operation. However, this can be addressed 
via block partitioning, but so far we only support row/column partitioning.
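
As a rough illustration of why partitioning granularity matters for mini-batch access, the following Java sketch counts how many partitions a contiguous row range touches. It assumes that row partitioning stores one partition per row and that block partitioning groups a fixed number of consecutive rows. Both layouts, and all names in the code, are assumptions made for illustration and not SystemML APIs.

```java
// Hypothetical sketch: counts partitions touched by a row range.
// None of these names exist in SystemML.
final class PartitionTouchSketch {

    /** Partitions read for rows [lo, hi] (1-based, inclusive), one partition per row. */
    static long rowWise(long lo, long hi) {
        return hi - lo + 1;
    }

    /** Partitions read for the same rows when consecutive rows are grouped into blocks. */
    static long rowBlockWise(long lo, long hi, long rowsPerBlock) {
        long first = (lo - 1) / rowsPerBlock; // index of the block holding row lo
        long last = (hi - 1) / rowsPerBlock;  // index of the block holding row hi
        return last - first + 1;
    }

    public static void main(String[] args) {
        // A mini-batch of 64 rows, with an assumed block size of 1000 rows.
        System.out.println(rowWise(1, 64));
        System.out.println(rowBlockWise(1, 64, 1000));
    }
}
```

Under these assumptions, a mini-batch that lies inside one block needs a single partition read instead of one read per row.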

Regards,
Matthias

On 11/23/2016 2:54 AM, dusenberrymw@gmail.com wrote:
> Also for some context, we're aiming to use this for remote hyperparameter tuning over a large dataset.  Specifically, each remote process would train a separate model over the full dataset using a mini-batch SGD approach.  Has the `parfor` construct been used for this purpose before?
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.

Re: Parfor semantics

Posted by Felix Schüler <fs...@posteo.de>.
I found some more issues related to parfor and opened a couple of jiras. Someone can assign them to me; I will work on them!

Felix

On 22.11.2016 17:54, dusenberrymw@gmail.com wrote:
> Also for some context, we're aiming to use this for remote hyperparameter tuning over a large dataset.  Specifically, each remote process would train a separate model over the full dataset using a mini-batch SGD approach.  Has the `parfor` construct been used for this purpose before?
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.


Re: Parfor semantics

Posted by du...@gmail.com.
Also for some context, we're aiming to use this for remote hyperparameter tuning over a large dataset.  Specifically, each remote process would train a separate model over the full dataset using a mini-batch SGD approach.  Has the `parfor` construct been used for this purpose before?
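
For illustration only, the pattern can be sketched in plain Java, with a parallel stream standing in for `parfor` and a toy one-parameter regression standing in for the real model. The data, class, and method names are hypothetical. Nothing here is SystemML or DML.

```java
import java.util.stream.IntStream;

// Hypothetical sketch: one independent mini-batch SGD run per learning rate.
final class ParallelTuningSketch {
    // Toy data set with y = 2 * x, shared read-only by all workers.
    static final double[] X = {0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0};
    static final double[] Y = {0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0};

    /** Mean squared error of the one-parameter model y = w * x. */
    static double loss(double w) {
        double sum = 0;
        for (int i = 0; i < X.length; i++) {
            double err = w * X[i] - Y[i];
            sum += err * err;
        }
        return sum / X.length;
    }

    /** Trains one model with mini-batch SGD and returns its final loss. */
    static double train(double learningRate, int batchSize, int epochs) {
        double w = 0.0;
        for (int e = 0; e < epochs; e++) {
            for (int start = 0; start < X.length; start += batchSize) {
                int end = Math.min(start + batchSize, X.length);
                double grad = 0;
                for (int i = start; i < end; i++) {
                    grad += 2 * (w * X[i] - Y[i]) * X[i];
                }
                // Step along the mean gradient of the current mini-batch.
                w -= learningRate * grad / (end - start);
            }
        }
        return loss(w);
    }

    /** Runs one training job per learning rate; the iterations share no mutable state. */
    static double[] tune(double[] learningRates) {
        return IntStream.range(0, learningRates.length)
                .parallel()
                .mapToDouble(i -> train(learningRates[i], 4, 50))
                .toArray();
    }
}
```

Each iteration reads the full data set and writes only its own result slot, which is the independence property a parallel loop needs.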

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Nov 22, 2016, at 2:01 PM, Matthias Boehm <mb...@googlemail.com> wrote:
> 
> that's a good catch - thanks Felix. It would be great if you could modify rewriteSetExecutionStategy and rewriteSetFusedDataPartitioningExecution in OptimizerConstrained to handle the respective Spark execution types. Thanks.
> 
> Regards,
> Matthias

Re: Parfor semantics

Posted by Matthias Boehm <mb...@googlemail.com>.
that's a good catch - thanks Felix. It would be great if you could 
modify rewriteSetExecutionStategy and 
rewriteSetFusedDataPartitioningExecution in OptimizerConstrained to 
handle the respective Spark execution types. Thanks.

Regards,
Matthias

On 11/22/2016 7:54 PM, fschueler@posteo.de wrote:
> The constrained optimizer doesn't seem to know about a REMOTE_SPARK
> execution mode and either sets CP or REMOTE_MR. I can open a jira for
> that and provide a fix.
>
> Felix

Re: Parfor semantics

Posted by fs...@posteo.de.
The constrained optimizer doesn't seem to know about a REMOTE_SPARK 
execution mode and either sets CP or REMOTE_MR. I can open a jira for 
that and provide a fix.

Felix

On 22.11.2016 02:07, Matthias Boehm wrote:
> yes, this came up several times - initially we only supported opt=NONE
> where users had to specify all other parameters. Meanwhile, there is a
> so-called "constrained optimizer" that does the same as the rule-based
> optimizer but respects any given parameters. Please try something like
> this:
> 
> parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) {
>      # some code here
> }
> 
> 
> Regards,
> Matthias


Re: Parfor semantics

Posted by Matthias Boehm <mb...@googlemail.com>.
yes, this came up several times - initially we only supported opt=NONE 
where users had to specify all other parameters. Meanwhile, there is a 
so-called "constrained optimizer" that does the same as the rule-based 
optimizer but respects any given parameters. Please try something like this:

parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) {
      # some code here
}
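
To make the difference concrete, here is a minimal Java sketch of the two behaviors discussed in this thread. It is not SystemML code: `ParforModeSketch`, `ExecMode`, `ruleBased`, `constrained`, the byte thresholds, and the REMOTE_SPARK fallback are all hypothetical, and the real logic in OptimizerRuleBased and OptimizerConstrained is considerably more involved.

```java
// Hypothetical sketch of the two optimizer behaviors; not SystemML classes.
final class ParforModeSketch {
    enum ExecMode { LOCAL, REMOTE_MR, REMOTE_SPARK }

    /** Rule-based behavior as reported in the thread: a user hint may be replaced. */
    static ExecMode ruleBased(ExecMode userMode, long estimatedBytes, long localBudgetBytes) {
        // If the work fits into the local budget, run locally regardless of the hint.
        if (estimatedBytes <= localBudgetBytes) {
            return ExecMode.LOCAL;
        }
        // Assumed fallback for this sketch: keep the hint, else pick REMOTE_SPARK.
        return (userMode != null) ? userMode : ExecMode.REMOTE_SPARK;
    }

    /** Constrained behavior: an explicit mode always wins; only unset modes are optimized. */
    static ExecMode constrained(ExecMode userMode, long estimatedBytes, long localBudgetBytes) {
        if (userMode != null) {
            return userMode;
        }
        return ruleBased(null, estimatedBytes, localBudgetBytes);
    }

    public static void main(String[] args) {
        long small = 10L << 20;  // 10 MB of estimated work
        long budget = 1L << 30;  // 1 GB local budget
        System.out.println(ruleBased(ExecMode.REMOTE_SPARK, small, budget));
        System.out.println(constrained(ExecMode.REMOTE_SPARK, small, budget));
    }
}
```

With a small input, `ruleBased` returns LOCAL even though REMOTE_SPARK was requested, while `constrained` keeps REMOTE_SPARK, which mirrors the behavior reported in this thread.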


Regards,
Matthias

On 11/22/2016 12:33 AM, fschueler@posteo.de wrote:
> While debugging some ParFor code it became clear that the parameters for
> parfor can be easily overwritten by the optimizer.
> One example is when I write:
>
> ```
> parfor (i in 1:10, par=10, mode=REMOTE_SPARK) {
>     # some code here
> }
> ```
>
> Depending on the data size and cluster resources, the optimizer
> (OptimizerRuleBased.java, line 844) will recognize that the work can be
> done locally and overwrite it to local execution. This might be valid
> and definitely works (in my case) but kind of contradicts what I want
> SystemML to do.
> I wonder if we should disable this optimization in case a concrete
> execution mode is given and go with the mode that is provided.
>
> Felix
>
>