Posted to user@flink.apache.org by Marcela Charfuelan <ch...@tu-berlin.de> on 2016/02/26 16:52:58 UTC

Iterations problem in command line

Hello,

I implemented an algorithm that includes iterations (an EM algorithm),
and I am getting different results when running it in Eclipse (Luna
Release 4.4.0) and when running it from the command line with flink
run. The program does not crash; it is just that after the first
iteration the results differ (they are wrong on the command line).

The solution I get in Eclipse for each iteration is the same one I
would get by running the algorithm in, for example, Octave, so I am
sure that solution is correct.

I have also tried running it with java plus Eclipse's command-line
arguments on the command line, and that also works fine (locally on
Ubuntu).

Has anybody experienced something similar? Any idea why this could
happen, and how I can fix it?

Regards,
Marcela.

Re: Iterations problem in command line

Posted by Marcela Charfuelan <ch...@tu-berlin.de>.
Hi,
the iteration looks like this:
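import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.operators.IterativeDataSet;

// initial model parameters
DataSet<GMM> gmms = getInitialGMMDataSet(env);

// bulk iteration with a fixed number of 50 supersteps
IterativeDataSet<GMM> loop = gmms.iterate(50);

// E-step (map) and M-step (reduceGroup) both read the current GMMs
// from the broadcast set "gmms"
DataSet<GMM> newGMMs = features
        .map(new Estep_ExpectationMaximisation()).withBroadcastSet(loop, "gmms")
        .reduceGroup(new Mstep_ExpectationMaximisation()).withBroadcastSet(loop, "gmms");

// the updated GMMs become the input of the next superstep
DataSet<GMM> finalGMMs = loop.closeWith(newGMMs);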
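To have a per-iteration reference to compare the Flink run against (much like the Octave comparison), a sequential version of what each superstep is meant to compute can help. This is a minimal sketch only, assuming one-dimensional Gaussian components stored as parallel arrays of weights, means and variances; the class EmReference, the method emStep and the sample data are hypothetical and not taken from the linked repository.

```java
import java.util.Arrays;

// Sequential reference for one EM iteration on a 1-D Gaussian mixture.
// Assumption: a component is a (weight, mean, variance) triple held in parallel
// arrays; this is NOT the GMM class from the linked repository.
public class EmReference {

    // Density of the normal distribution N(mean, var) at x
    static double pdf(double x, double mean, double var) {
        double d = x - mean;
        return Math.exp(-d * d / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    // One superstep: the E-step (roughly the map in the Flink job) computes
    // responsibilities, the M-step (roughly the reduceGroup) re-estimates the
    // parameters. Returns {weights, means, variances}; inputs are not modified.
    static double[][] emStep(double[] xs, double[] w, double[] mu, double[] var) {
        int k = w.length, n = xs.length;
        double[] nk = new double[k], sumX = new double[k], sumXX = new double[k];
        for (double x : xs) {
            // E-step: unnormalized responsibility of each component for x
            double[] r = new double[k];
            double total = 0.0;
            for (int j = 0; j < k; j++) {
                r[j] = w[j] * pdf(x, mu[j], var[j]);
                total += r[j];
            }
            // normalize and accumulate the sufficient statistics
            for (int j = 0; j < k; j++) {
                double g = r[j] / total;
                nk[j] += g;
                sumX[j] += g * x;
                sumXX[j] += g * x * x;
            }
        }
        // M-step: the new weights sum to 1 because each point's responsibilities sum to 1
        double[] w2 = new double[k], mu2 = new double[k], var2 = new double[k];
        for (int j = 0; j < k; j++) {
            w2[j] = nk[j] / n;
            mu2[j] = sumX[j] / nk[j];
            var2[j] = sumXX[j] / nk[j] - mu2[j] * mu2[j];
        }
        return new double[][] {w2, mu2, var2};
    }

    public static void main(String[] args) {
        // made-up sample data and starting parameters, for illustration only
        double[] xs = {-2.1, -1.9, -2.0, 0.1, -0.1, 0.0, 2.0, 2.2, 1.8};
        double[] w = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        double[] mu = {-1.0, 0.5, 1.0};
        double[] var = {1.0, 1.0, 1.0};
        for (int it = 0; it < 50; it++) {   // 50 iterations, as in gmms.iterate(50)
            double[][] next = emStep(xs, w, mu, var);
            w = next[0];
            mu = next[1];
            var = next[2];
            // per-iteration weights, to compare line by line with another run
            System.out.println(it + " " + Arrays.toString(w));
        }
    }
}
```

The sketch has no guard against a component whose variance collapses towards zero; a real implementation would usually add one.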

In every iteration the GMM parameters should be updated. I have noticed
that the first iteration is OK, but afterwards the results start to go
wrong, at least on the command line (for example, gmms.coeff for the
three GMMs here should not sum to more than 1).
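Because the mixing coefficients of all components must sum to 1, a cheap check on the per-iteration output can flag the first iteration where this breaks. A minimal sketch, assuming the coefficients of the three GMMs have been collected into a double[]; the class CoeffCheck and the method coefficientsValid are hypothetical.

```java
// Hypothetical sanity check for the invariant above: the mixing coefficients
// must each lie in [0, 1] and sum to 1 (within a tolerance).
public class CoeffCheck {

    static boolean coefficientsValid(double[] coeffs, double tol) {
        double sum = 0.0;
        for (double c : coeffs) {
            if (!(c >= 0.0 && c <= 1.0)) {   // written this way so NaN is rejected too
                return false;
            }
            sum += c;
        }
        return Math.abs(sum - 1.0) <= tol;
    }

    public static void main(String[] args) {
        double[] ok = {0.2, 0.3, 0.5};
        double[] bad = {0.6, 0.5, 0.4};   // sums to 1.5
        System.out.println(coefficientsValid(ok, 1e-9));   // true
        System.out.println(coefficientsValid(bad, 1e-9));  // false
    }
}
```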

I have put the code here in case it helps:
https://github.com/marcelach1/EmExercise

Regards,
Marcela.


On 01.03.2016 10:58, Fabian Hueske wrote:
> Yes, env.setParallelism(1) fixes the parallelism of all operators to 1
> (unless an operator overrides this setting).
> Can you identify at which position in the data flow the results start to
> diverge?
>
> Best, Fabian


-- 
Dr. Marcela Charfuelan, Senior Researcher
TU Berlin, School of Electrical Engineering and Computer Sciences
Database Systems and Information Management (DIMA)
EN7, Einsteinufer 17, D-10587 Berlin
Room: EN 725  Phone: +49 30-314-23556
URL: http://www.user.tu-berlin.de/charfuelan

Re: Iterations problem in command line

Posted by Fabian Hueske <fh...@gmail.com>.
Yes, env.setParallelism(1) fixes the parallelism of all operators to 1
(unless an operator overrides this setting).
Can you identify at which position in the data flow the results start to
diverge?
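One way to narrow this down is to log the output of each step (for example after the E-step and after the M-step) for every iteration in both setups, and then find the first point where the two traces disagree. A minimal sketch, assuming each trace is a list of parameter vectors, one per iteration; the class TraceDiff and the method firstDivergence are hypothetical, and the numbers are illustrative, not real results.

```java
import java.util.List;

// Hypothetical helper: given per-iteration parameter vectors from two runs
// (e.g. IDE vs. command line), return the first iteration whose values differ
// by more than tol, or -1 if the traces agree.
public class TraceDiff {

    static int firstDivergence(List<double[]> a, List<double[]> b, double tol) {
        int n = Math.min(a.size(), b.size());
        for (int it = 0; it < n; it++) {
            double[] x = a.get(it), y = b.get(it);
            if (x.length != y.length) {
                return it;
            }
            for (int i = 0; i < x.length; i++) {
                // NaN never compares <= tol, so it also counts as a divergence
                if (!(Math.abs(x[i] - y[i]) <= tol)) {
                    return it;
                }
            }
        }
        // traces of different length diverge where the shorter one ends
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        // illustrative values only, not output of the actual job
        List<double[]> ide = List.of(new double[]{0.3, 0.3, 0.4}, new double[]{0.25, 0.35, 0.4});
        List<double[]> cli = List.of(new double[]{0.3, 0.3, 0.4}, new double[]{0.5, 0.6, 0.4});
        System.out.println(firstDivergence(ide, cli, 1e-6)); // prints 1
    }
}
```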

Best, Fabian

2016-02-29 17:57 GMT+01:00 Marcela Charfuelan <ch...@tu-berlin.de>:

> Thanks Fabian,
> I am using the default options in both, since I am not testing on a
> cluster yet, just locally on Ubuntu; I am not specifying any parallelism.
> Just to test, I set env.setParallelism(1) in the program and ran with
> -p 1 (which I guess I would not need), but I am still getting the same issue.
>
> Regards,
> Marcela.

Re: Iterations problem in command line

Posted by Marcela Charfuelan <ch...@tu-berlin.de>.
Thanks Fabian,
I am using the default options in both, since I am not testing on a
cluster yet, just locally on Ubuntu; I am not specifying any parallelism.
Just to test, I set env.setParallelism(1) in the program and ran with
-p 1 (which I guess I would not need), but I am still getting the same issue.

Regards,
Marcela.


On 29.02.2016 16:44, Fabian Hueske wrote:
> Hi Marcela,
>
> do you run the algorithm in both setups with the same parallelism?
>
> Best, Fabian


Re: Iterations problem in command line

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Marcela,

do you run the algorithm in both setups with the same parallelism?
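Some background on why the parallelism can matter: when it is greater than 1, the input is split across parallel instances and their partial results are combined, and a correct aggregation should give the same result for any split, up to floating-point rounding. A minimal plain-Java sketch of that property, assuming the M-step aggregates count, sum and sum-of-squares statistics; the class PartitionCheck and its methods are hypothetical and only simulate partitioning with contiguous array slices.

```java
import java.util.Arrays;

// Hypothetical check that an aggregation gives the same result regardless of
// how the input is split into partitions (as with different parallelism).
public class PartitionCheck {

    // Sufficient statistics of xs[from, to): {count, sum, sum of squares}
    static double[] stats(double[] xs, int from, int to) {
        double[] s = new double[3];
        for (int i = from; i < to; i++) {
            s[0] += 1;
            s[1] += xs[i];
            s[2] += xs[i] * xs[i];
        }
        return s;
    }

    // Split xs into p contiguous partitions, aggregate each, then merge the partials
    static double[] partitioned(double[] xs, int p) {
        double[] total = new double[3];
        int n = xs.length;
        for (int k = 0; k < p; k++) {
            int from = k * n / p, to = (k + 1) * n / p;
            double[] part = stats(xs, from, to);
            for (int i = 0; i < 3; i++) {
                total[i] += part[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // made-up sample data
        double[] xs = {-2.1, -1.9, -2.0, 0.1, -0.1, 0.0, 2.0, 2.2, 1.8};
        double[] seq = stats(xs, 0, xs.length);
        for (int p = 1; p <= 4; p++) {
            double[] par = partitioned(xs, p);
            // any differences should be floating-point rounding noise only
            System.out.println(p + " " + Arrays.toString(par) + " vs " + Arrays.toString(seq));
        }
    }
}
```

If a real job's output changes noticeably with the split, the aggregation or the state it reads is worth inspecting.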

Best, Fabian

2016-02-26 16:52 GMT+01:00 Marcela Charfuelan <ch...@tu-berlin.de>:

> Hello,
>
> I implemented an algorithm that includes iterations (an EM algorithm),
> and I am getting different results when running it in Eclipse (Luna
> Release 4.4.0) and when running it from the command line with flink
> run. The program does not crash; it is just that after the first
> iteration the results differ (they are wrong on the command line).
>
> The solution I get in Eclipse for each iteration is the same one I
> would get by running the algorithm in, for example, Octave, so I am
> sure that solution is correct.
>
> I have also tried running it with java plus Eclipse's command-line
> arguments on the command line, and that also works fine (locally on
> Ubuntu).
>
> Has anybody experienced something similar? Any idea why this could
> happen, and how I can fix it?
>
> Regards,
> Marcela.
>