Posted to dev@flink.apache.org by Vasiliki Kalavri <va...@gmail.com> on 2015/07/07 15:20:21 UTC

[Gelly] Help with GSA compiler tests

Hello to my squirrels,

I've started looking into FLINK-1943
<https://issues.apache.org/jira/browse/FLINK-1943> and I need some help to
understand what to test and how to do it properly.

In the corresponding Spargel compiler test, the following functionality is
checked:

1. sink: the ship strategy is FORWARD and the parallelism is correct
2. iteration: degree of parallelism
3. solution set join: parallelism and input1 ship strategy is PARTITION_HASH
4. workset join: parallelism, input1 (edges) ship strategy is
PARTITION_HASH and cached, input2 (workset) ship strategy is FORWARD
5. check that the initial partitioning is pushed out of the loop
6. check that the initial workset sort is outside the loop

I have been able to verify 1-4 of the above for the GSA iteration plan, but
I'm not sure how to check (5) and (6) or whether they are expected to hold
in the GSA case.
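
For illustration, checks (5) and (6) can be phrased as assertions on the two channels that enter the iteration node from outside the loop. The sketch below is self-contained and uses hypothetical stand-in types (ShipKind, LocalKind, ChannelStub, IterationStub), not Flink's optimizer classes. It assumes that a hoisted partitioning or sort shows up on those input channels, and it uses one key list per channel for both shipping and sorting, which is a simplification. Whether the GSA plan satisfies these checks is exactly the open question here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for optimizer plan types; these are NOT Flink classes.
enum ShipKind { FORWARD, PARTITION_HASH }
enum LocalKind { NONE, SORT }

/** One input channel of a plan node: how data is shipped and locally prepared. */
final class ChannelStub {
    final ShipKind ship;
    final LocalKind local;
    final List<Integer> keys; // simplification: same keys for shipping and sorting

    ChannelStub(ShipKind ship, LocalKind local, Integer... keys) {
        this.ship = ship;
        this.local = local;
        this.keys = Arrays.asList(keys);
    }
}

/** The two channels feeding the iteration node from outside the loop. */
final class IterationStub {
    final ChannelStub solutionSetInput;
    final ChannelStub worksetInput;

    IterationStub(ChannelStub solutionSetInput, ChannelStub worksetInput) {
        this.solutionSetInput = solutionSetInput;
        this.worksetInput = worksetInput;
    }
}

final class LoopHoistingChecks {
    /** Check (5): both iteration inputs arrive hash-partitioned on the key. */
    static boolean partitioningOutsideLoop(IterationStub it, List<Integer> key) {
        return it.solutionSetInput.ship == ShipKind.PARTITION_HASH
                && it.worksetInput.ship == ShipKind.PARTITION_HASH
                && it.solutionSetInput.keys.equals(key)
                && it.worksetInput.keys.equals(key);
    }

    /** Check (6): the workset is sorted on the key on its way into the loop. */
    static boolean worksetSortOutsideLoop(IterationStub it, List<Integer> key) {
        return it.worksetInput.local == LocalKind.SORT
                && it.worksetInput.keys.equals(key);
    }

    public static void main(String[] args) {
        IterationStub it = new IterationStub(
                new ChannelStub(ShipKind.PARTITION_HASH, LocalKind.NONE, 0),
                new ChannelStub(ShipKind.PARTITION_HASH, LocalKind.SORT, 0));
        List<Integer> key = Arrays.asList(0);
        System.out.println(partitioningOutsideLoop(it, key));
        System.out.println(worksetSortOutsideLoop(it, key));
    }
}
```

In a real compiler test the same conditions would be read from the optimized plan instead of being constructed by hand.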

In [1] you can see what the GSA iteration operators look like and in [2]
you can see what the visualizer tool generates for GSA connected
components.

Any pointers would be greatly appreciated!

Cheers,
Vasia.

[1]:
https://docs.google.com/drawings/d/1tiNQeOphWtkNXTGlnDJ3Ipanh0Tm2R8sHe8XNyTnf98/edit?usp=sharing
[2]: http://imgur.com/GQZ48ZI

Re: [Gelly] Help with GSA compiler tests

Posted by Stephan Ewen <se...@apache.org>.
Lady Kalamari,

The plan looks good.

To test whether the data is partitioned there: If you have the optimizer
plan, make sure the global properties have a partitioning property of
"HASH_PARTITIONED".

Thanks,
Stephan
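
The check described in this reply amounts to reading the global properties of a plan node and asserting on its partitioning. Below is a minimal, self-contained sketch of that assertion. All types in it (PartitioningKind, GlobalPropsStub, PlanNodeStub) are hypothetical stand-ins and not Flink classes; the constant name is chosen to match what I believe is the optimizer's hash-partitioning property. In an actual test the properties would presumably be obtained from the optimized plan node, so verify the accessor names against the Flink version in use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; the real optimizer types are not used here.
enum PartitioningKind { RANDOM, HASH_PARTITIONED }

/** What is known about the data distribution at the output of a node. */
final class GlobalPropsStub {
    final PartitioningKind partitioning;
    final List<Integer> partitionFields;

    GlobalPropsStub(PartitioningKind partitioning, Integer... fields) {
        this.partitioning = partitioning;
        this.partitionFields = Arrays.asList(fields);
    }
}

/** A plan node reduced to its name and its global properties. */
final class PlanNodeStub {
    final String name;
    final GlobalPropsStub globalProps;

    PlanNodeStub(String name, GlobalPropsStub globalProps) {
        this.name = name;
        this.globalProps = globalProps;
    }
}

final class PartitioningAssertions {
    /** True if the node's output is known to be hash-partitioned on exactly the given key fields. */
    static boolean isHashPartitionedOn(PlanNodeStub node, List<Integer> key) {
        return node.globalProps.partitioning == PartitioningKind.HASH_PARTITIONED
                && node.globalProps.partitionFields.equals(key);
    }

    public static void main(String[] args) {
        PlanNodeStub lastNode = new PlanNodeStub(
                "Rebuild Workset Properties",
                new GlobalPropsStub(PartitioningKind.HASH_PARTITIONED, 0));
        System.out.println(isHashPartitionedOn(lastNode, Arrays.asList(0)));
    }
}
```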



Re: [Gelly] Help with GSA compiler tests

Posted by Vasiliki Kalavri <va...@gmail.com>.
Hi,

thank you Stephan!

Here's the missing part of the plan: http://i.imgur.com/N861tg1.png
There is one hash partition / sort. Is this what you're talking about?

Regarding your second point, how can I test if the data is known to be
partitioned at the end?


-Vasia.


Re: [Gelly] Help with GSA compiler tests

Posted by Stephan Ewen <se...@apache.org>.
Hey Vasia!

Sorry for the late response... Thanks for pinging again!

The optimizer is acting a little funky here - seems an artifact of the
"properties" optimization.

  -> The initial join needs to be partitioned and sorted. Can you check
whether one partitioning and sorting happens before the iteration? That
part is cut off in the screenshot you sent. It must be either on the input
of the iteration, or the output.

  -> The iteration needs to make sure it leaves the data partitioned and
sorted. There is a "re-sorting" operator at the end ("Rebuild Workset
Properties"), but it does not partition. The test should make sure the data
is known to be partitioned at the very end of the iteration (after the
"Rebuild Workset Properties" operator). This is probably true, if the join
has some forward field annotation.
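
The reasoning behind the forward field remark can be sketched as follows: a partitioning on some key fields remains known after an operator only if the operator declares that it forwards every one of those fields unchanged. The class and method names below are hypothetical and exist only for this illustration. The sketch also assumes fields are forwarded to the same position, which the real optimizer does not require.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ForwardedFieldsSketch {
    /**
     * True if a partitioning on partitionKeys is still known after an operator
     * that forwards exactly the fields in forwardedFields unchanged.
     */
    static boolean partitioningStillKnown(Set<Integer> partitionKeys, Set<Integer> forwardedFields) {
        return !partitionKeys.isEmpty() && forwardedFields.containsAll(partitionKeys);
    }

    public static void main(String[] args) {
        Set<Integer> keys = new HashSet<Integer>(Arrays.asList(0));
        // Join forwards field 0 (the vertex id in this example): partitioning survives.
        System.out.println(partitioningStillKnown(keys, new HashSet<Integer>(Arrays.asList(0, 1))));
        // Join declares nothing: the optimizer can no longer assume the partitioning.
        System.out.println(partitioningStillKnown(keys, new HashSet<Integer>()));
    }
}
```

With no forwarded fields declared, the partitioning would have to be re-established after the join, which is the situation the test is meant to rule out.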

We can have a quick skype chat later, if you have more questions...

Greetings,
Stephan




Re: [Gelly] Help with GSA compiler tests

Posted by Vasiliki Kalavri <va...@gmail.com>.
Hey,

any input on this? or a hint? or where to look to figure this out by myself?

Thanks!
-Vasia.
