Posted to user@flink.apache.org by Maximilian Alber <al...@gmail.com> on 2014/12/11 12:15:07 UTC

Understanding the behavior

Hi Flinksters,

After mapping a data set, its only value seems to disappear. I cannot
explain this behavior. Maybe someone can help me?

The code below has 4 versions. The first two do basically nothing, but
they confirm that there is actually a value inside the data set.
Version 3 maps the vector to a new vector, but the result set is empty.
Version 4 behaves the same way.

What I would like to achieve is version 3, i.e. changing the id value of the
vector. But somehow the vector disappears and the result is always an empty
set.

val startWidth =
  env.fromCollection[Vector](Seq(Vector.ones(config.dimensions) * config.startWidth))
    .map { x => new Vector(0, x.values) }
val startUpdate =
  env.fromCollection[Vector](Seq(Vector.ones(config.dimensions) * 0.01F))
    .map { x => new Vector(1, x.values) }
val startLastGradient =
  env.fromCollection[Vector](Seq(Vector.zeros(config.dimensions)))
    .map { x => new Vector(2, x.values) }

var stepSet = startWidth union startUpdate union startLastGradient
stepSet = stepSet.iterate(1) { stepSet =>
  // Alternatives: enable exactly one of the following definitions of width.
  // version 1
  val width = stepSet filter { _.id == 0 } // works
  // version 2
  val width = stepSet filter { _.id == 0 } map { x => x } // works
  // version 3
  val width = stepSet filter { _.id == 0 } map { x => new Vector(-1, x.values) } // does not work
  // version 4
  val width = stepSet filter { _.id == 0 } map { x: Vector => new Vector(23, Array(1.0F, 2.0F)) } // does not work
  width
}


I have attached the jar, source code and input files.
The program writes the width data set into the out_file.
You can switch between the code "versions" starting at line 353.

You can call the program like this (update the jar, in_file and random_file
paths, and set out_file as you like):
flink run -c bumpboost.Job the_jar_file in_file=/tmp/tmpdW3O98 \
  out_file=/tmp/tmp2RISRF random_file=/tmp/tmpEN9XU7 dimensions=1 \
  N=100 iterations=30 multi_bump_boost=1 \
  gradient_descent_iterations=30 cache=False start_width=1.0 \
  min_width=-4 max_width=6 min_width_update=1e-08 \
  max_width_update=10

Thank you!
Cheers,
Max

Re: Understanding the behavior

Posted by Maximilian Alber <al...@gmail.com>.
Hi!

Damn, I thought I had set the iteration count to 1, but I did it in the wrong
place. My bad.
Thanks!!

Cheers,
max

On Thu, Dec 11, 2014 at 12:41 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
> the reason is that you filter out the items. In the first iteration, the map
> creates a new element whose first field is -1 (version 3) or 23 (version 4).
> In the second iteration, the filter drops every element whose first field is
> not 0, so you end up with an empty set.
>
> Cheers,
> Aljoscha

Re: Understanding the behavior

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
the reason is that you filter out the items. In the first iteration, the map
creates a new element whose first field is -1 (version 3) or 23 (version 4).
In the second iteration, the filter drops every element whose first field is
not 0, so you end up with an empty set.
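
To make this concrete, here is a minimal sketch that replays the step function
on plain Scala collections instead of a Flink DataSet. It is only a simulation
of the feedback loop, under the assumption that the output of one pass becomes
the input of the next. Vec, iterate, version2 and version3 are hypothetical
stand-ins named for this illustration, not Flink API and not the Vector class
from the attached program.

```scala
// Simulation only: plain Scala collections stand in for a Flink DataSet.
object IterateSketch {
  // Hypothetical stand-in for the Vector class used in the thread.
  final case class Vec(id: Int, values: Seq[Float])

  // Mimics a bulk iteration: the output of one pass is the input of the next.
  def iterate(initial: Seq[Vec], maxIterations: Int)(step: Seq[Vec] => Seq[Vec]): Seq[Vec] =
    (1 to maxIterations).foldLeft(initial)((current, _) => step(current))

  // One element per id, like startWidth, startUpdate and startLastGradient.
  val start: Seq[Vec] = Seq(
    Vec(0, Seq(1.0f)),
    Vec(1, Seq(0.01f)),
    Vec(2, Seq(0.0f))
  )

  // Version 2: the id stays 0, so every pass finds the element again.
  val version2: Seq[Vec] => Seq[Vec] = _.filter(_.id == 0).map(x => x)

  // Version 3: the id becomes -1, so the next pass filters the element out.
  val version3: Seq[Vec] => Seq[Vec] = _.filter(_.id == 0).map(x => Vec(-1, x.values))

  def main(args: Array[String]): Unit = {
    println(iterate(start, 1)(version3))  // one pass: the relabelled element survives
    println(iterate(start, 2)(version3))  // two passes: no element has id 0 any more
    println(iterate(start, 30)(version2)) // id unchanged: the element is still there
  }
}
```

With a single pass, version 3 still returns its one relabelled element. The
set only becomes empty from the second pass on, which is why the iteration
count matters here.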

Cheers,
Aljoscha
