You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Lydia Ickler <ic...@googlemail.com> on 2016/04/04 09:52:19 UTC

varying results: local VS cluster

Hi all,

I have an issue regarding execution on 1 machine VS 5 machines.
If I execute the following code the results are not the same though I would expect them to be since the input file is the same.
Do you have any suggestions?

Thanks in advance!
Lydia
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.getConfig().setGlobalJobParameters(parameters);

//read input file
DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, parameters.get("input"));
//Approximate EigenVector by PowerIteration
//get initial vector - which equals matrixA * [1, ... , 1]
DataSet<Tuple3<Integer, Integer, Double>> initial0 = (matrixA.groupBy(0)).sum(2);
DataSet<Tuple3<Integer, Integer, Double>> maximum = initial0.maxBy(2);
//normalize by maximum value
DataSet<Tuple3<Integer, Integer, Double>> initial= (initial0.cross(maximum)).map(new normalizeByMax());

//BulkIteration to find dominant eigenvector
IterativeDataSet<Tuple3<Integer, Integer, Double>> iteration = initial.iterate(1);

DataSet<Tuple3<Integer, Integer, Double>> intermediate = ((((((matrixA.join(iteration).where(1).equalTo(0))
        .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2)).groupBy(0)).sum(2)).
        cross(((((((((matrixA.join(iteration).where(1).equalTo(0))
                .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
        .map(new normalizeByMax());

DataSet<Tuple3<Integer, Integer, Double>> diffs = (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
DataSet<Tuple3<Integer, Integer, Double>> eigenVector  = iteration.closeWith(intermediate,diffs);

eigenVector.writeAsCsv(parameters.get("output"));
env.execute("Power Iteration");


Re: varying results: local VS cluster

Posted by Stephan Ewen <se...@apache.org>.
Just to make sure: Most numeric programs produce varying results across
different execution. If the algorithm is correct, they should converge
towards the same solution, but it is very common that the exact solution
differs.




On Mon, Apr 11, 2016 at 10:16 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
> could you please provide a minimal example input and maybe also the output
> for parallelism=5 and parallelism=1 so that we can check.
>
> --
> aljoscha
>
> On Mon, 4 Apr 2016 at 09:52 Lydia Ickler <ic...@googlemail.com> wrote:
>
>> Hi all,
>>
>> I have an issue regarding execution on 1 machine VS 5 machines.
>> If I execute the following code the results are not the same though I
>> would expect them to be since the input file is the same.
>> Do you have any suggestions?
>>
>> Thanks in advance!
>> Lydia
>>
>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>> env.getConfig().setGlobalJobParameters(parameters);
>>
>> //read input file
>> DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, parameters.get("input"));
>> //Approximate EigenVector by PowerIteration
>> //get initial vector - which equals matrixA * [1, ... , 1]
>> DataSet<Tuple3<Integer, Integer, Double>> initial0 = (matrixA.groupBy(0)).sum(2);
>> DataSet<Tuple3<Integer, Integer, Double>> maximum = initial0.maxBy(2);
>> //normalize by maximum value
>> DataSet<Tuple3<Integer, Integer, Double>> initial= (initial0.cross(maximum)).map(new normalizeByMax());
>>
>> //BulkIteration to find dominant eigenvector
>> IterativeDataSet<Tuple3<Integer, Integer, Double>> iteration = initial.iterate(1);
>>
>> DataSet<Tuple3<Integer, Integer, Double>> intermediate = ((((((matrixA.join(iteration).where(1).equalTo(0))
>>         .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2)).groupBy(0)).sum(2)).
>>         cross(((((((((matrixA.join(iteration).where(1).equalTo(0))
>>                 .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
>>         .map(new normalizeByMax());
>>
>> DataSet<Tuple3<Integer, Integer, Double>> diffs = (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
>> DataSet<Tuple3<Integer, Integer, Double>> eigenVector  = iteration.closeWith(intermediate,diffs);
>>
>> eigenVector.writeAsCsv(parameters.get("output"));
>> env.execute("Power Iteration");
>>
>>
>>

Re: varying results: local VS cluster

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
could you please provide a minimal example input and maybe also the output
for parallelism=5 and parallelism=1 so that we can check.

--
aljoscha

On Mon, 4 Apr 2016 at 09:52 Lydia Ickler <ic...@googlemail.com> wrote:

> Hi all,
>
> I have an issue regarding execution on 1 machine VS 5 machines.
> If I execute the following code the results are not the same though I
> would expect them to be since the input file is the same.
> Do you have any suggestions?
>
> Thanks in advance!
> Lydia
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().setGlobalJobParameters(parameters);
>
> //read input file
> DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, parameters.get("input"));
> //Approximate EigenVector by PowerIteration
> //get initial vector - which equals matrixA * [1, ... , 1]
> DataSet<Tuple3<Integer, Integer, Double>> initial0 = (matrixA.groupBy(0)).sum(2);
> DataSet<Tuple3<Integer, Integer, Double>> maximum = initial0.maxBy(2);
> //normalize by maximum value
> DataSet<Tuple3<Integer, Integer, Double>> initial= (initial0.cross(maximum)).map(new normalizeByMax());
>
> //BulkIteration to find dominant eigenvector
> IterativeDataSet<Tuple3<Integer, Integer, Double>> iteration = initial.iterate(1);
>
> DataSet<Tuple3<Integer, Integer, Double>> intermediate = ((((((matrixA.join(iteration).where(1).equalTo(0))
>         .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2)).groupBy(0)).sum(2)).
>         cross(((((((((matrixA.join(iteration).where(1).equalTo(0))
>                 .map(new ProjectJoinResultMapper())).groupBy(0, 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
>         .map(new normalizeByMax());
>
> DataSet<Tuple3<Integer, Integer, Double>> diffs = (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
> DataSet<Tuple3<Integer, Integer, Double>> eigenVector  = iteration.closeWith(intermediate,diffs);
>
> eigenVector.writeAsCsv(parameters.get("output"));
> env.execute("Power Iteration");
>
>
>