Posted to dev@flink.apache.org by Greg Hogan <co...@greghogan.com> on 2016/03/04 23:37:07 UTC

Tuple performance and the curious JIT compiler

I am noticing what looks like the same drop-off in performance when
introducing TupleN subclasses as expressed in "Understanding the JIT and
tuning the implementation" [1].

I start my single-node cluster, run an algorithm which relies purely on
Tuples, and measure the runtime. I then execute a separate jar which runs
essentially the same algorithm but uses Gelly's Edge (which subclasses
Tuple3 but does not add any extra fields), and now both the Tuple and Edge
algorithms take twice as long.
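
For reference, Edge is essentially just a Tuple3 with named accessors. A
rough sketch from memory (constructor and accessor details may differ from
the actual Gelly class):

    import org.apache.flink.api.java.tuple.Tuple3;

    // Sketch of a Gelly-style Edge: it extends Tuple3 but adds no fields,
    // only convenience accessors over the inherited f0/f1/f2.
    public class Edge<K, V> extends Tuple3<K, K, V> {

        public Edge() {}

        public Edge(K source, K target, V value) {
            super(source, target, value);
        }

        public K getSource() { return this.f0; }

        public K getTarget() { return this.f1; }

        public V getValue() { return this.f2; }
    }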

Has this been previously discussed? If not I can work up a demonstration.

[1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html

Greg

Re: Tuple performance and the curious JIT compiler

Posted by Márton Balassi <ba...@gmail.com>.
If the community can agree that the proposal Gábor Horváth has suggested
is a good approach, and can accept that the results will be coming around
mid-summer, then I would strongly suggest "reserving" this topic for him.

His previous experience makes him a strong candidate for the task. On top
of that, Gábor Gévay and I have known him for a couple of years now; he is
a hard-working, honest person. We believe that he will be a great addition
to the community, which is why we have been aiding his first steps with
Flink.

I would be happy to continue providing this support for Gábor Horváth as
his GSoC mentor, and Gábor Gévay has volunteered to unofficially co-mentor
him. But as is true for all potential GSoC students, they will be regular
Flink contributors and will thus interact with ("be mentored by") the
whole community via our standard channels.

Best,

Marton

Re: Tuple performance and the curious JIT compiler

Posted by Ufuk Celebi <uc...@apache.org>.
Very nice proposal!

Re: Tuple performance and the curious JIT compiler

Posted by Stephan Ewen <se...@apache.org>.
Thanks for posting this.

I think it is not super urgent (in the sense of weeks or a few months), so
results around mid-summer are probably good.
The background in LLVM is a very good base for this!

Re: Tuple performance and the curious JIT compiler

Posted by Gábor Horváth <xa...@gmail.com>.
Hi,

In the meantime I sent out the current version of the proposal draft [1].
Hopefully it will help you triage this task and contribute to the
discussion of the problem.
How urgent is this issue? In what time frame should there be results?

Best Regards,
Gábor

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/GSoC-Project-Proposal-Draft-Code-Generation-in-Serializers-td10702.html

Re: Tuple performance and the curious JIT compiler

Posted by Stephan Ewen <se...@apache.org>.
Do we have consensus that we want to "reserve" this topic for a GSoC
student?

It is becoming an increasingly important feature. To see whether we can
"hold off" on working on it, it would be good to know a bit more, for
example:
  - when is it decided whether this project takes place?
  - when would results be there?
  - can we expect the results to be usable, i.e., how good is the student?
(no offence, but so far the results in GSoC have been anywhere between
very good and super bad)

Greetings,
Stephan


Re: Tuple performance and the curious JIT compiler

Posted by Márton Balassi <ba...@gmail.com>.
@Fabian: That is my bad, but I think we should still be on time. I pinged
Uli just to make sure. The proposal from Gábor and the JIRA issue from me
are coming soon.

Re: Tuple performance and the curious JIT compiler

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Gabor,

I did not find any Flink proposals for this year's GSoC in JIRA (they
should be labeled with gsoc2016).
I am also not sure whether any of the Flink committers have signed up as
GSoC mentors.
Maybe there is still time to do that, but as it looks right now there are
no GSoC projects offered by Flink.

Best, Fabian

Re: Tuple performance and the curious JIT compiler

Posted by Gábor Horváth <xa...@gmail.com>.
Hi!

I am planning to do GSoC and I would like to work on the serializers.
More specifically, I would like to implement code generation. I am
planning to send the first draft of the proposal to the mailing list early
next week. If everything goes well, it will include some preliminary
benchmarks of how much performance gain can be expected from hand-written
serializers.
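
As a sketch of what such a preliminary benchmark could look like (assuming
JMH is used; the record and field names are purely illustrative), the same
record could be written once through a reflection-based path, standing in
for a generic serializer, and once through a hand-written method:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.lang.reflect.Field;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class SerializerBenchmark {

        public static class Record {
            public long id = 42L;
            public int weight = 7;
        }

        Record record = new Record();
        Field[] fields = Record.class.getFields();

        // Generic path: field access via reflection, standing in for a
        // serializer that is not specialized to the record type.
        @Benchmark
        public byte[] genericReflective() throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (Field field : fields) {
                if (field.getType() == long.class) {
                    out.writeLong(field.getLong(record));
                } else if (field.getType() == int.class) {
                    out.writeInt(field.getInt(record));
                }
            }
            return bytes.toByteArray();
        }

        // Hand-written path: the fields are written directly.
        @Benchmark
        public byte[] handWritten() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(record.id);
            out.writeInt(record.weight);
            return bytes.toByteArray();
        }
    }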

Best regards,
Gábor

Re: Tuple performance and the curious JIT compiler

Posted by Stephan Ewen <se...@apache.org>.
Ah, very good, that makes sense!

I would guess that this performance difference could probably be seen at
various points where generic serializers and comparators are used (also for
Comparable, Writable) or
where the TupleSerializer delegates to a sequence of other TypeSerializers.

I guess creating more specialized serializers would solve some of these
problems, like in your IntValue vs LongValue case.

The best way to solve that would probably be through code generation in the
serializers. That has actually been my wish for quite a while.
If you are also into these kinds of low-level performance topics, we could
start a discussion on that.
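
As a rough sketch of the idea (a hypothetical class, not the actual
TypeSerializer interface): a serializer generated for one concrete tuple
type would write its fields directly, instead of looping over an array of
nested field serializers the way the generic TupleSerializer does:

    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.types.LongValue;

    // Hypothetical output of a serializer code generator for
    // Tuple2<LongValue, LongValue>: every write is a direct, statically
    // known call, with no per-field virtual dispatch.
    public final class GeneratedTuple2LongValueSerializer {

        public void serialize(Tuple2<LongValue, LongValue> record, DataOutput out)
                throws IOException {
            out.writeLong(record.f0.getValue());
            out.writeLong(record.f1.getValue());
        }
    }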

Greetings,
Stephan


Re: Tuple performance and the curious JIT compiler

Posted by Greg Hogan <co...@greghogan.com>.
The issue is not with the Tuple hierarchy (running Gelly examples had no
effect on runtime, and as you note there aren't any subclass overrides) but
with CopyableValue. I had been using IntValue exclusively but had switched
to using LongValue for graph generation. CopyableValueComparator and
CopyableValueSerializer are now working with multiple types.

If I create IntValue- and LongValue-specific versions of
CopyableValueComparator and CopyableValueSerializer, and modify
ValueTypeInfo to return these, then I see the expected performance.
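
To illustrate the shape of that specialization (the class and method names
here are hypothetical, not the real serializer API): a serializer bound to
exactly one Value type calls it directly, so its call sites stay
monomorphic, whereas a single CopyableValueSerializer instance handling
both IntValue and LongValue dispatches through the shared interface for
every record:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.flink.types.LongValue;

    // Hypothetical sketch, not the actual Flink serializer API: binding the
    // serializer to LongValue alone removes the per-record interface
    // dispatch that a shared CopyableValue-based serializer performs.
    public final class LongValueOnlySerializer {

        public void serialize(LongValue record, DataOutput out) throws IOException {
            out.writeLong(record.getValue()); // direct, monomorphic call
        }

        public LongValue deserialize(DataInput in) throws IOException {
            LongValue value = new LongValue();
            value.setValue(in.readLong());
            return value;
        }
    }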

Greg

Re: Tuple performance and the curious JIT compiler

Posted by Stephan Ewen <se...@apache.org>.
Hi Greg!

Sounds very interesting.

Do you have a hunch which "virtual" Tuple methods are being used that
become less JIT-able? In many cases, tuples use only field accesses (like
"value.f1") in the user functions.

I have to dig into the serializers to see if they could suffer from that.
The "getField(pos)" method, for example, should always have many overrides
(though few would be loaded at any one time, because one usually does not
use all Tuple classes at the same time).
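
To illustrate the concern (a sketch, not actual Flink code): a generic
helper that loops over getField()/setField() has call sites that every
loaded TupleN subclass (and Edge) overrides, so they can go from
monomorphic and inlined to megamorphic virtual dispatch once several
subclasses are in use:

    import org.apache.flink.api.java.tuple.Tuple;

    // Illustrative only: the getField()/setField() call sites below stay
    // monomorphic (and inline well) while a single concrete Tuple class is
    // hot, but degrade to megamorphic dispatch once several TupleN
    // subclasses are active in the same JVM.
    public final class TupleCopyUtil {

        public static <T extends Tuple> void copyFields(T from, T to) {
            for (int i = 0; i < from.getArity(); i++) {
                to.setField(from.getField(i), i);
            }
        }
    }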

Greetings,
Stephan

