You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@aurora.apache.org by John Sirois <js...@apache.org> on 2016/01/27 04:47:11 UTC

[PROPOSAL] Change java thrift code gen

Context: Aurora uses the official Apache Thrift compiler today plus a
home-grown python code generator [1] for immutable "entity" (I*) wrappers.

The proposal is to switch from using the Apache Thrift code generator to a
home grown generator.  The proposal comes with a concrete example in the
form of the actual RBs to effect this change:
1. A custom java thrift code generator: https://reviews.apache.org/r/42748/
2. A custom MyBatis binding code generator powered by 1 above:
https://reviews.apache.org/r/42749/
3. Integration of 1 & 2 above into the Aurora codebase:
https://reviews.apache.org/r/42756/

Since the RBs are large, I wanted to provide some extra context on the idea
at a higher level.  I provide rationale, pros and cons below for those
interested in the idea but wary of diving in on code review until the idea
itself passes a sniff test.
Thanks in advance for your feedback - and if we get there - for your review
effort.
==

In the course of an initial run at creating a first-class REST-like
scheduler interface [2] I came to the conclusion generating the json API
from the thrift one might be a good path.  That idea has been scrapped with
community feedback, but an initial experiment in custom thrift code-gen for
java that accompanied that idea seemed worth pursuing for its own
independent benefits, chief among these being 1st class immutable thrift
structs and the ability to leverage thrift annotations.

Immutability:
The benefits of having an immutable by default data model are the standard
ones; namely, its trivial to reason about safety of concurrent operations
on the data model, stability of collections containing data model entities
and it opens up straight-forward optimizations that are easy to reason
about.
An example optimization is caching hashCodes for the immutable thrift
structs.  This was done after comparing jmh benchmarks run against master
and then again against the proposal branch.  Perf was comparable - within
10% plus and minus depending on the benchmark, but with the optimization
added many benchmarks showed pronounced improvement in the proposal branch
[3].  The optimization is clearly safe and was quick and easy to
implement.  Further optimizations can be experimented with in a
straightforward way.

Thrift Annotations:
The thrift IDL grammar has supported these for quite some time, but they
are not plumbed to the generated java code.  Uses are many and varied.  I
initially had my eye on annotation of thrift services with REST verbs,
routes, etc - but immediately we can leverage these annotations to kill
AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
needs to be maintained.

There are a few downsides to switching to our own java thrift code gen:
1. We own more code to maintain:  Even though we have the custom python
"immutable" wrapper generator [1] today, this new generator - even with the
python generator removed - represents a 5-6x increase in line count of
custom code (~4.1k lines of code and tests in the new custom gen, ~700
lines in the existing python custom gen)
2. We conceptually fork from a sibling Apache project.

The fork could be mitigated by turning our real experience iterating the
custom code generator into a well-founded patch back into the Apache Thrift
project, but saying we'll do that is easier than following through and
actually doing it.

==
Review guide / details:

The technology stack:
The thrift IDL parsing and thrift wire parsing are both handled by the
Facebook swift project [4].  We only implement the middle bit that
generates java code stubs.  This gives higher confidence that the tricky
bits out at either edge are done right.
The thrift struct code generation is done using Square's javapoet [5] in
favor of templating for the purpose of easier to read generator code.  This
characterization is debatable though and template are certainly more
flexible the minute you need to gen a second language (say we like this and
want to do javascript codegen this way too for example).
The MyBatis codegen is forced by the thrift codegen for technical reasons.
In short, there is no simple way to teach MyBatis to read and write
immutable objects with builders.  So the MyBatis code is generated via an
annotation processor that runs after thrift code gen, but reading thrift
annotations that survive that codegen process.
The codegen unit testing is done with the help of Google's compile-tester
[6].  NB that this has an expected output comparison that checks the
generated AST and not the text, so its fairly lenient.  Whitepsace and
comments certainly don't matter.

Review strategy:
The code generator RBs (#1 & #2 in the 3 part series) are probably easier
to review looking at samples of the generated code.  Both the thrift
codegen and MyBatis codegen samples are conveniently contained in the
MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The unit
test uses resource files that contain both the thrift codegen inputs the
annotation processor runs over and the annotation processor expected
outputs  - the MyBatis peer classes.  So have a look there if you need
something concrete and don't want to patch the RBs in and actually run the
codegen (`./gradlew api:compileJava`).
The conversion RB (#3) is large but the changes are mainly mechanical
conversions from the current mutable thrift + I* wrappers to pure immutable
thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
of note are to the portions of the codebase tightly tied to thrift as a
technology:
+ Gson/thrift converters
+ Shiro annotated auth param interception
+ Thrift/Servlet binding

[1]
https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
[2] https://issues.apache.org/jira/browse/AURORA-987
[3]
https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
[4] https://github.com/facebook/swift
[5] https://github.com/square/javapoet
[6] https://github.com/google/compile-testing

Re: [PROPOSAL] Change java thrift code gen

Posted by John Sirois <jo...@conductant.com>.
--
John Sirois
303-512-3301
On Jan 27, 2016 6:08 PM, "Maxim Khutornenko" <ma...@apache.org> wrote:
>
> I am +1 to making immutable thrift objects solely based on perf numbers.
>
> My biggest concern though is maintenance of a pretty intricate codebase,
> especially when it comes to upgrading any of the frameworks involved.
> Bill's suggestion to explore paths to make this a part of Apache Thrift
> sounds great to me.
>
> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org> wrote:
>
> > Tony - this would not be a technical fork so much as a spiritual fork.
> > While we would have our own bugs to work out, the only upstream exposure
> > would be IDL or wire format changes.
> >
> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <td...@twitter.com.invalid>
> > wrote:
> >
> > > Awesome performance numbers! I don't particularly know the logistics
of
> > > upstreaming a change like this, but optimistically I would suggest
> > > upstreaming it to Apache Thrift if possible.
> > >
> > > As someone that maintains a fork of a thrift compiler(fork of
scrooge), I
> > > have to say that it's not that fun. There's a lot of custom code that
> > needs
> > > to be maintained and a bunch of work to rebase the code periodically.
> > >
> > >
> > >
> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org>
wrote:
> > >
> > > > Firstly - thanks for the clean organization and delineation of
steps in
> > > > this change.  Top notch work!
> > > >
> > > > Some of the performance improvements are very nice; and in a
> > particularly
> > > > hot code path.  I will wager a guess that the majority of the
savings
> > is
> > > in
> > > > avoiding what amounts to copy constructors between mutable and
> > immutable
> > > > types.  I further wager there are alternative approaches we could
weigh
> > > to
> > > > achieve those performance improvements.  As an example - you note
above
> > > > that we could provide a patch to Apache Thrift.  Depending how much
> > > > performance inspires our decision here, it will be prudent to
evaluate
> > > > alternatives.
> > > >
> > > > I think there are (at least) two major issues worth discussing -
code
> > > > volume (which you note) and an increase in logical complexity.  This
> > will
> > > > leave us with a bifurcation in code generation tooling (custom+swift
> > for
> > > > Java, Apache Thrift for python and js).  It's difficult to quantify
the
> > > > downside of that, but it seems like an unfortunate state with
potential

One note on the bifurcation is that it would be temporary since a 1st class
JSON api (https://issues.apache.org/jira/browse/AURORA-987), means only the
scheduler will use thrift internally - as its data model source of truth
and ser/deser for the distributed log.

> > > for
> > > > compatibility risks.
> > > >
> > > >
> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org>
> > wrote:
> > > >
> > > > > Some high level comments without looking at the code.
> > > > >
> > > > > I'm in favor from abandoning the thrift generated java code in
favor
> > of
> > > > > immutable objects. I think it is easier to reason about and will
> > ensure
> > > > we
> > > > > have less errors in our code. If I understand correctly, the
ProtoBuf
> > > > > format does this by default, so there some precedent for this
style
> > of
> > > > code
> > > > > generation already.
> > > > >
> > > > > I think using Facebook's swift is the best approach here. I would
be
> > > > > hesitant to accept any custom code generation that involved us
> > parsing
> > > > > thrift IDL files or thrift formats over the wire because I poses a
> > very
> > > > > high maintenance burden.
> > > > >
> > > > > I also think generating the MyBatis mutable classes is superior to
> > our
> > > > > current strategy of manually creating them.
> > > > >
> > > > > Finally, the performance improvements look fantastic. As an
operator
> > > of a
> > > > > large cluster I am excited to see wholesale performance
improvements
> > > as I
> > > > > am always concerned that my cluster is approaching the limits of
what
> > > > > Aurora can handle safely.
> > > > >
> > > > > Overall, I think this change merits a serious discussion from all
> > > > > contributors.
> > > > >
> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org>
> > > wrote:
> > > > >
> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org
>
> > > > wrote:
> > > > > >
> > > > > > > Context: Aurora uses the official Apache Thrift compiler today
> > > plus a
> > > > > > > home-grown python code generator [1] for immutable "entity"
(I*)
> > > > > > wrappers.
> > > > > > >
> > > > > > > The proposal is to switch from using the Apache Thrift code
> > > generator
> > > > > to
> > > > > > a
> > > > > > > home grown generator.  The proposal comes with a concrete
example
> > > in
> > > > > the
> > > > > > > form of the actual RBs to effect this change:
> > > > > > > 1. A custom java thrift code generator:
> > > > > > > https://reviews.apache.org/r/42748/
> > > > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > > > https://reviews.apache.org/r/42749/
> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > > > https://reviews.apache.org/r/42756/
> > > > > > >
> > > > > > > Since the RBs are large, I wanted to provide some extra
context
> > on
> > > > the
> > > > > > > idea at a higher level.  I provide rationale, pros and cons
below
> > > for
> > > > > > those
> > > > > > > interested in the idea but wary of diving in on code review
until
> > > the
> > > > > > idea
> > > > > > > itself passes a sniff test.
> > > > > > > Thanks in advance for your feedback - and if we get there -
for
> > > your
> > > > > > > review effort.
> > > > > > >
> > > > > >
> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> > > since
> > > > > > they've expressed direct interest.  Happy to add others, just
speak
> > > up
> > > > or
> > > > > > else just comment on the reviews as you see fit.
> > > > > > I'll formally only submit if 1st this email thread reaches
> > consensus,
> > > > and
> > > > > > second, reviews are approved.
> > > > > >
> > > > > > ==
> > > > > > >
> > > > > > > In the course of an initial run at creating a first-class
> > REST-like
> > > > > > > scheduler interface [2] I came to the conclusion generating
the
> > > json
> > > > > API
> > > > > > > from the thrift one might be a good path.  That idea has been
> > > > scrapped
> > > > > > with
> > > > > > > community feedback, but an initial experiment in custom thrift
> > > > code-gen
> > > > > > for
> > > > > > > java that accompanied that idea seemed worth pursuing for its
own
> > > > > > > independent benefits, chief among these being 1st class
immutable
> > > > > thrift
> > > > > > > structs and the ability to leverage thrift annotations.
> > > > > > >
> > > > > > > Immutability:
> > > > > > > The benefits of having an immutable by default data model are
the
> > > > > > standard
> > > > > > > ones; namely, its trivial to reason about safety of concurrent
> > > > > operations
> > > > > > > on the data model, stability of collections containing data
model
> > > > > > entities
> > > > > > > and it opens up straight-forward optimizations that are easy
to
> > > > reason
> > > > > > > about.
> > > > > > > An example optimization is caching hashCodes for the immutable
> > > thrift
> > > > > > > structs.  This was done after comparing jmh benchmarks run
> > against
> > > > > master
> > > > > > > and then again against the proposal branch.  Perf was
comparable
> > -
> > > > > within
> > > > > > > 10% plus and minus depending on the benchmark, but with the
> > > > > optimization
> > > > > > > added many benchmarks showed pronounced improvement in the
> > proposal
> > > > > > branch
> > > > > > > [3].  The optimization is clearly safe and was quick and easy
to
> > > > > > > implement.  Further optimizations can be experimented with in
a
> > > > > > > straightforward way.
> > > > > > >
> > > > > > > Thrift Annotations:
> > > > > > > The thrift IDL grammar has supported these for quite some
time,
> > but
> > > > > they
> > > > > > > are not plumbed to the generated java code.  Uses are many and
> > > > > varied.  I
> > > > > > > initially had my eye on annotation of thrift services with
REST
> > > > verbs,
> > > > > > > routes, etc - but immediately we can leverage these
annotations
> > to
> > > > kill
> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
> > code
> > > > that
> > > > > > > needs to be maintained.
> > > > > > >
> > > > > > > There are a few downsides to switching to our own java thrift
> > code
> > > > gen:
> > > > > > > 1. We own more code to maintain:  Even though we have the
custom
> > > > python
> > > > > > > "immutable" wrapper generator [1] today, this new generator -
> > even
> > > > with
> > > > > > the
> > > > > > > python generator removed - represents a 5-6x increase in line
> > count
> > > > of
> > > > > > > custom code (~4.1k lines of code and tests in the new custom
gen,
> > > > ~700
> > > > > > > lines in the existing python custom gen)
> > > > > > > 2. We conceptually fork from a sibling Apache project.
> > > > > > >
> > > > > > > The fork could be mitigated by turning our real experience
> > > iterating
> > > > > the
> > > > > > > custom code generator into a well-founded patch back into the
> > > Apache
> > > > > > Thrift
> > > > > > > project, but saying we'll do that is easier than following
> > through
> > > > and
> > > > > > > actually doing it.
> > > > > > >
> > > > > > > ==
> > > > > > > Review guide / details:
> > > > > > >
> > > > > > > The technology stack:
> > > > > > > The thrift IDL parsing and thrift wire parsing are both
handled
> > by
> > > > the
> > > > > > > Facebook swift project [4].  We only implement the middle bit
> > that
> > > > > > > generates java code stubs.  This gives higher confidence that
the
> > > > > tricky
> > > > > > > bits out at either edge are done right.
> > > > > > > The thrift struct code generation is done using Square's
javapoet
> > > [5]
> > > > > in
> > > > > > > favor of templating for the purpose of easier to read
generator
> > > code.
> > > > > > This
> > > > > > > characterization is debatable though and template are
certainly
> > > more
> > > > > > > flexible the minute you need to gen a second language (say we
> > like
> > > > this
> > > > > > and
> > > > > > > want to do javascript codegen this way too for example).
> > > > > > > The MyBatis codegen is forced by the thrift codegen for
technical
> > > > > > > reasons.  In short, there is no simple way to teach MyBatis to
> > read
> > > > and
> > > > > > > write immutable objects with builders.  So the MyBatis code is
> > > > > generated
> > > > > > > via an annotation processor that runs after thrift code gen,
but
> > > > > reading
> > > > > > > thrift annotations that survive that codegen process.
> > > > > > > The codegen unit testing is done with the help of Google's
> > > > > compile-tester
> > > > > > > [6].  NB that this has an expected output comparison that
checks
> > > the
> > > > > > > generated AST and not the text, so its fairly lenient.
> > Whitepsace
> > > > and
> > > > > > > comments certainly don't matter.
> > > > > > >
> > > > > > > Review strategy:
> > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
> > probably
> > > > > easier
> > > > > > > to review looking at samples of the generated code.  Both the
> > > thrift
> > > > > > > codegen and MyBatis codegen samples are conveniently
contained in
> > > the
> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
> > The
> > > > > unit
> > > > > > > test uses resource files that contain both the thrift codegen
> > > inputs
> > > > > the
> > > > > > > annotation processor runs over and the annotation processor
> > > expected
> > > > > > > outputs  - the MyBatis peer classes.  So have a look there if
you
> > > > need
> > > > > > > something concrete and don't want to patch the RBs in and
> > actually
> > > > run
> > > > > > the
> > > > > > > codegen (`./gradlew api:compileJava`).
> > > > > > > The conversion RB (#3) is large but the changes are mainly
> > > mechanical
> > > > > > > conversions from the current mutable thrift + I* wrappers to
pure
> > > > > > immutable
> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The
main
> > > > > changes
> > > > > > > of note are to the portions of the codebase tightly tied to
> > thrift
> > > > as a
> > > > > > > technology:
> > > > > > > + Gson/thrift converters
> > > > > > > + Shiro annotated auth param interception
> > > > > > > + Thrift/Servlet binding
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > > > [4] https://github.com/facebook/swift
> > > > > > > [5] https://github.com/square/javapoet
> > > > > > > [6] https://github.com/google/compile-testing
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Zameer Manji
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [PROPOSAL] Change java thrift code gen

Posted by John Sirois <js...@apache.org>.
To wrap things up - I'm considering this proposal as having failed; so I've
marked the 3 associated RBs as discarded.

I'll be working on https://issues.apache.org/jira/browse/THRIFT-3583 to
introduce an immutable builder-style of java codegen that carries over
thrift annotations to Apache Thrift.  Once that is complete and released,
I'll circle back and re-do an RB like 3/3 (
https://reviews.apache.org/r/42756/) that flips over the codebase to the
new apache thrift gen.

On Thu, Jan 28, 2016 at 10:08 AM, John Sirois <js...@apache.org> wrote:

>
>
> On Thu, Jan 28, 2016 at 8:26 AM, John Sirois <jo...@conductant.com> wrote:
>
>> On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jf...@apache.org>
>> wrote:
>>
>> > +1 to making this apart of Thrift, i'm happy to help shepard this on the
>> > Thrift side and get it in as soon as its ready
>> >
>>
>> I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as
>> the
>> basis for discussion of this feature over in the Apache Thrift project.
>> I looked at the problem a bit and noted some challenges.
>>
>
> And started a dev@thrift.apache.org thread here if anyone wants to follow
> along or participate: http://markmail.org/message/mlxyyauyvlvuxjsf
>
>
>>
>>
>> >
>> > -Jake
>> >
>> > On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <ma...@apache.org>
>> > wrote:
>> >
>> >> I am +1 to making immutable thrift objects solely based on perf
>> numbers.
>> >>
>> >> My biggest concern though is maintenance of a pretty intricate
>> codebase,
>> >> especially when it comes to upgrading any of the frameworks involved.
>> >> Bill's suggestion to explore paths to make this a part of Apache Thrift
>> >> sounds great to me.
>> >>
>> >> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org>
>> wrote:
>> >>
>> >> > Tony - this would not be a technical fork so much as a spiritual
>> fork.
>> >> > While we would have our own bugs to work out, the only upstream
>> exposure
>> >> > would be IDL or wire format changes.
>> >> >
>> >> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid
>> >
>> >> > wrote:
>> >> >
>> >> > > Awesome performance numbers! I don't particularly know the
>> logistics
>> >> of
>> >> > > upstreaming a change like this, but optimistically I would suggest
>> >> > > upstreaming it to Apache Thrift if possible.
>> >> > >
>> >> > > As someone that maintains a fork of a thrift compiler(fork of
>> >> scrooge), I
>> >> > > have to say that it's not that fun. There's a lot of custom code
>> that
>> >> > needs
>> >> > > to be maintained and a bunch of work to rebase the code
>> periodically.
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org>
>> >> wrote:
>> >> > >
>> >> > > > Firstly - thanks for the clean organization and delineation of
>> >> steps in
>> >> > > > this change.  Top notch work!
>> >> > > >
>> >> > > > Some of the performance improvements are very nice; and in a
>> >> > particularly
>> >> > > > hot code path.  I will wager a guess that the majority of the
>> >> savings
>> >> > is
>> >> > > in
>> >> > > > avoiding what amounts to copy constructors between mutable and
>> >> > immutable
>> >> > > > types.  I further wager there are alternative approaches we could
>> >> weigh
>> >> > > to
>> >> > > > achieve those performance improvements.  As an example - you note
>> >> above
>> >> > > > that we could provide a patch to Apache Thrift.  Depending how
>> much
>> >> > > > performance inspires our decision here, it will be prudent to
>> >> evaluate
>> >> > > > alternatives.
>> >> > > >
>> >> > > > I think there are (at least) two major issues worth discussing -
>> >> code
>> >> > > > volume (which you note) and an increase in logical complexity.
>> This
>> >> > will
>> >> > > > leave us with a bifurcation in code generation tooling
>> (custom+swift
>> >> > for
>> >> > > > Java, Apache Thrift for python and js).  It's difficult to
>> quantify
>> >> the
>> >> > > > downside of that, but it seems like an unfortunate state with
>> >> potential
>> >> > > for
>> >> > > > compatibility risks.
>> >> > > >
>> >> > > >
>> >> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org
>> >
>> >> > wrote:
>> >> > > >
>> >> > > > > Some high level comments without looking at the code.
>> >> > > > >
>> >> > > > > I'm in favor from abandoning the thrift generated java code in
>> >> favor
>> >> > of
>> >> > > > > immutable objects. I think it is easier to reason about and
>> will
>> >> > ensure
>> >> > > > we
>> >> > > > > have less errors in our code. If I understand correctly, the
>> >> ProtoBuf
>> >> > > > > format does this by default, so there some precedent for this
>> >> style
>> >> > of
>> >> > > > code
>> >> > > > > generation already.
>> >> > > > >
>> >> > > > > I think using Facebook's swift is the best approach here. I
>> would
>> >> be
>> >> > > > > hesitant to accept any custom code generation that involved us
>> >> > parsing
>> >> > > > > thrift IDL files or thrift formats over the wire because I
>> poses a
>> >> > very
>> >> > > > > high maintenance burden.
>> >> > > > >
>> >> > > > > I also think generating the MyBatis mutable classes is
>> superior to
>> >> > our
>> >> > > > > current strategy of manually creating them.
>> >> > > > >
>> >> > > > > Finally, the performance improvements look fantastic. As an
>> >> operator
>> >> > > of a
>> >> > > > > large cluster I am excited to see wholesale performance
>> >> improvements
>> >> > > as I
>> >> > > > > am always concerned that my cluster is approaching the limits
>> of
>> >> what
>> >> > > > > Aurora can handle safely.
>> >> > > > >
>> >> > > > > Overall, I think this change merits a serious discussion from
>> all
>> >> > > > > contributors.
>> >> > > > >
>> >> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <
>> jsirois@apache.org>
>> >> > > wrote:
>> >> > > > >
>> >> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
>> >> jsirois@apache.org>
>> >> > > > wrote:
>> >> > > > > >
>> >> > > > > > > Context: Aurora uses the official Apache Thrift compiler
>> today
>> >> > > plus a
>> >> > > > > > > home-grown python code generator [1] for immutable "entity"
>> >> (I*)
>> >> > > > > > wrappers.
>> >> > > > > > >
>> >> > > > > > > The proposal is to switch from using the Apache Thrift code
>> >> > > generator
>> >> > > > > to
>> >> > > > > > a
>> >> > > > > > > home grown generator.  The proposal comes with a concrete
>> >> example
>> >> > > in
>> >> > > > > the
>> >> > > > > > > form of the actual RBs to effect this change:
>> >> > > > > > > 1. A custom java thrift code generator:
>> >> > > > > > > https://reviews.apache.org/r/42748/
>> >> > > > > > > 2. A custom MyBatis binding code generator powered by 1
>> above:
>> >> > > > > > > https://reviews.apache.org/r/42749/
>> >> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
>> >> > > > > > > https://reviews.apache.org/r/42756/
>> >> > > > > > >
>> >> > > > > > > Since the RBs are large, I wanted to provide some extra
>> >> context
>> >> > on
>> >> > > > the
>> >> > > > > > > idea at a higher level.  I provide rationale, pros and cons
>> >> below
>> >> > > for
>> >> > > > > > those
>> >> > > > > > > interested in the idea but wary of diving in on code review
>> >> until
>> >> > > the
>> >> > > > > > idea
>> >> > > > > > > itself passes a sniff test.
>> >> > > > > > > Thanks in advance for your feedback - and if we get there -
>> >> for
>> >> > > your
>> >> > > > > > > review effort.
>> >> > > > > > >
>> >> > > > > >
>> >> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs
>> above
>> >> > > since
>> >> > > > > > they've expressed direct interest.  Happy to add others, just
>> >> speak
>> >> > > up
>> >> > > > or
>> >> > > > > > else just comment on the reviews as you see fit.
>> >> > > > > > I'll formally only submit if 1st this email thread reaches
>> >> > consensus,
>> >> > > > and
>> >> > > > > > second, reviews are approved.
>> >> > > > > >
>> >> > > > > > ==
>> >> > > > > > >
>> >> > > > > > > In the course of an initial run at creating a first-class
>> >> > REST-like
>> >> > > > > > > scheduler interface [2] I came to the conclusion generating
>> >> the
>> >> > > json
>> >> > > > > API
>> >> > > > > > > from the thrift one might be a good path.  That idea has
>> been
>> >> > > > scrapped
>> >> > > > > > with
>> >> > > > > > > community feedback, but an initial experiment in custom
>> thrift
>> >> > > > code-gen
>> >> > > > > > for
>> >> > > > > > > java that accompanied that idea seemed worth pursuing for
>> its
>> >> own
>> >> > > > > > > independent benefits, chief among these being 1st class
>> >> immutable
>> >> > > > > thrift
>> >> > > > > > > structs and the ability to leverage thrift annotations.
>> >> > > > > > >
>> >> > > > > > > Immutability:
>> >> > > > > > > The benefits of having an immutable by default data model
>> are
>> >> the
>> >> > > > > > standard
>> >> > > > > > > ones; namely, its trivial to reason about safety of
>> concurrent
>> >> > > > > operations
>> >> > > > > > > on the data model, stability of collections containing data
>> >> model
>> >> > > > > > entities
>> >> > > > > > > and it opens up straight-forward optimizations that are
>> easy
>> >> to
>> >> > > > reason
>> >> > > > > > > about.
>> >> > > > > > > An example optimization is caching hashCodes for the
>> immutable
>> >> > > thrift
>> >> > > > > > > structs.  This was done after comparing jmh benchmarks run
>> >> > against
>> >> > > > > master
>> >> > > > > > > and then again against the proposal branch.  Perf was
>> >> comparable
>> >> > -
>> >> > > > > within
>> >> > > > > > > 10% plus and minus depending on the benchmark, but with the
>> >> > > > > optimization
>> >> > > > > > > added many benchmarks showed pronounced improvement in the
>> >> > proposal
>> >> > > > > > branch
>> >> > > > > > > [3].  The optimization is clearly safe and was quick and
>> easy
>> >> to
>> >> > > > > > > implement.  Further optimizations can be experimented with
>> in
>> >> a
>> >> > > > > > > straightforward way.
>> >> > > > > > >
>> >> > > > > > > Thrift Annotations:
>> >> > > > > > > The thrift IDL grammar has supported these for quite some
>> >> time,
>> >> > but
>> >> > > > > they
>> >> > > > > > > are not plumbed to the generated java code.  Uses are many
>> and
>> >> > > > > varied.  I
>> >> > > > > > > initially had my eye on annotation of thrift services with
>> >> REST
>> >> > > > verbs,
>> >> > > > > > > routes, etc - but immediately we can leverage these
>> >> annotations
>> >> > to
>> >> > > > kill
>> >> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis
>> binding
>> >> > code
>> >> > > > that
>> >> > > > > > > needs to be maintained.
>> >> > > > > > >
>> >> > > > > > > There are a few downsides to switching to our own java
>> thrift
>> >> > code
>> >> > > > gen:
>> >> > > > > > > 1. We own more code to maintain:  Even though we have the
>> >> custom
>> >> > > > python
>> >> > > > > > > "immutable" wrapper generator [1] today, this new
>> generator -
>> >> > even
>> >> > > > with
>> >> > > > > > the
>> >> > > > > > > python generator removed - represents a 5-6x increase in
>> line
>> >> > count
>> >> > > > of
>> >> > > > > > > custom code (~4.1k lines of code and tests in the new
>> custom
>> >> gen,
>> >> > > > ~700
>> >> > > > > > > lines in the existing python custom gen)
>> >> > > > > > > 2. We conceptually fork from a sibling Apache project.
>> >> > > > > > >
>> >> > > > > > > The fork could be mitigated by turning our real experience
>> >> > > iterating
>> >> > > > > the
>> >> > > > > > > custom code generator into a well-founded patch back into
>> the
>> >> > > Apache
>> >> > > > > > Thrift
>> >> > > > > > > project, but saying we'll do that is easier than following
>> >> > through
>> >> > > > and
>> >> > > > > > > actually doing it.
>> >> > > > > > >
>> >> > > > > > > ==
>> >> > > > > > > Review guide / details:
>> >> > > > > > >
>> >> > > > > > > The technology stack:
>> >> > > > > > > The thrift IDL parsing and thrift wire parsing are both
>> >> handled
>> >> > by
>> >> > > > the
>> >> > > > > > > Facebook swift project [4].  We only implement the middle
>> bit
>> >> > that
>> >> > > > > > > generates java code stubs.  This gives higher confidence
>> that
>> >> the
>> >> > > > > tricky
>> >> > > > > > > bits out at either edge are done right.
>> >> > > > > > > The thrift struct code generation is done using Square's
>> >> javapoet
>> >> > > [5]
>> >> > > > > in
>> >> > > > > > > favor of templating for the purpose of easier to read
>> >> generator
>> >> > > code.
>> >> > > > > > This
>> >> > > > > > > characterization is debatable though and template are
>> >> certainly
>> >> > > more
>> >> > > > > > > flexible the minute you need to gen a second language (say
>> we
>> >> > like
>> >> > > > this
>> >> > > > > > and
>> >> > > > > > > want to do javascript codegen this way too for example).
>> >> > > > > > > The MyBatis codegen is forced by the thrift codegen for
>> >> technical
>> >> > > > > > > reasons.  In short, there is no simple way to teach
>> MyBatis to
>> >> > read
>> >> > > > and
>> >> > > > > > > write immutable objects with builders.  So the MyBatis
>> code is
>> >> > > > > generated
>> >> > > > > > > via an annotation processor that runs after thrift code
>> gen,
>> >> but
>> >> > > > > reading
>> >> > > > > > > thrift annotations that survive that codegen process.
>> >> > > > > > > The codegen unit testing is done with the help of Google's
>> >> > > > > compile-tester
>> >> > > > > > > [6].  NB that this has an expected output comparison that
>> >> checks
>> >> > > the
>> >> > > > > > > generated AST and not the text, so its fairly lenient.
>> >> > Whitepsace
>> >> > > > and
>> >> > > > > > > comments certainly don't matter.
>> >> > > > > > >
>> >> > > > > > > Review strategy:
>> >> > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
>> >> > probably
>> >> > > > > easier
>> >> > > > > > > to review looking at samples of the generated code.  Both
>> the
>> >> > > thrift
>> >> > > > > > > codegen and MyBatis codegen samples are conveniently
>> >> contained in
>> >> > > the
>> >> > > > > > > MyBatis codegen RB (#2:
>> https://reviews.apache.org/r/42749/).
>> >> > The
>> >> > > > > unit
>> >> > > > > > > test uses resource files that contain both the thrift
>> codegen
>> >> > > inputs
>> >> > > > > the
>> >> > > > > > > annotation processor runs over and the annotation processor
>> >> > > expected
>> >> > > > > > > outputs  - the MyBatis peer classes.  So have a look there
>> if
>> >> you
>> >> > > > need
>> >> > > > > > > something concrete and don't want to patch the RBs in and
>> >> > actually
>> >> > > > run
>> >> > > > > > the
>> >> > > > > > > codegen (`./gradlew api:compileJava`).
>> >> > > > > > > The conversion RB (#3) is large but the changes are mainly
>> >> > > mechanical
>> >> > > > > > > conversions from the current mutable thrift + I* wrappers
>> to
>> >> pure
>> >> > > > > > immutable
>> >> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.
>> The
>> >> main
>> >> > > > > changes
>> >> > > > > > > of note are to the portions of the codebase tightly tied to
>> >> > thrift
>> >> > > > as a
>> >> > > > > > > technology:
>> >> > > > > > > + Gson/thrift converters
>> >> > > > > > > + Shiro annotated auth param interception
>> >> > > > > > > + Thrift/Servlet binding
>> >> > > > > > >
>> >> > > > > > > [1]
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
>> >> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
>> >> > > > > > > [3]
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
>> >> > > > > > > [4] https://github.com/facebook/swift
>> >> > > > > > > [5] https://github.com/square/javapoet
>> >> > > > > > > [6] https://github.com/google/compile-testing
>> >> > > > > > >
>> >> > > > > >
>> >> > > > > > --
>> >> > > > > > Zameer Manji
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>>
>> --
>> John Sirois
>> 303-512-3301
>>
>
>

Re: [PROPOSAL] Change java thrift code gen

Posted by John Sirois <js...@apache.org>.
On Thu, Jan 28, 2016 at 8:26 AM, John Sirois <jo...@conductant.com> wrote:

> On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jf...@apache.org> wrote:
>
> > +1 to making this apart of Thrift, i'm happy to help shepard this on the
> > Thrift side and get it in as soon as its ready
> >
>
> I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as the
> basis for discussion of this feature over in the Apache Thrift project.
> I looked at the problem a bit and noted some challenges.
>

And started a dev@thrift.apache.org thread here if anyone wants to follow
along or participate: http://markmail.org/message/mlxyyauyvlvuxjsf


>
>
> >
> > -Jake
> >
> > On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> >> I am +1 to making immutable thrift objects solely based on perf numbers.
> >>
> >> My biggest concern though is maintenance of a pretty intricate codebase,
> >> especially when it comes to upgrading any of the frameworks involved.
> >> Bill's suggestion to explore paths to make this a part of Apache Thrift
> >> sounds great to me.
> >>
> >> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org>
> wrote:
> >>
> >> > Tony - this would not be a technical fork so much as a spiritual fork.
> >> > While we would have our own bugs to work out, the only upstream
> exposure
> >> > would be IDL or wire format changes.
> >> >
> >> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid
> >
> >> > wrote:
> >> >
> >> > > Awesome performance numbers! I don't particularly know the logistics
> >> of
> >> > > upstreaming a change like this, but optimistically I would suggest
> >> > > upstreaming it to Apache Thrift if possible.
> >> > >
> >> > > As someone that maintains a fork of a thrift compiler(fork of
> >> scrooge), I
> >> > > have to say that it's not that fun. There's a lot of custom code
> that
> >> > needs
> >> > > to be maintained and a bunch of work to rebase the code
> periodically.
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org>
> >> wrote:
> >> > >
> >> > > > Firstly - thanks for the clean organization and delineation of
> >> steps in
> >> > > > this change.  Top notch work!
> >> > > >
> >> > > > Some of the performance improvements are very nice; and in a
> >> > particularly
> >> > > > hot code path.  I will wager a guess that the majority of the
> >> savings
> >> > is
> >> > > in
> >> > > > avoiding what amounts to copy constructors between mutable and
> >> > immutable
> >> > > > types.  I further wager there are alternative approaches we could
> >> weigh
> >> > > to
> >> > > > achieve those performance improvements.  As an example - you note
> >> above
> >> > > > that we could provide a patch to Apache Thrift.  Depending how
> much
> >> > > > performance inspires our decision here, it will be prudent to
> >> evaluate
> >> > > > alternatives.
> >> > > >
> >> > > > I think there are (at least) two major issues worth discussing -
> >> code
> >> > > > volume (which you note) and an increase in logical complexity.
> This
> >> > will
> >> > > > leave us with a bifurcation in code generation tooling
> (custom+swift
> >> > for
> >> > > > Java, Apache Thrift for python and js).  It's difficult to
> quantify
> >> the
> >> > > > downside of that, but it seems like an unfortunate state with
> >> potential
> >> > > for
> >> > > > compatibility risks.
> >> > > >
> >> > > >
> >> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org>
> >> > wrote:
> >> > > >
> >> > > > > Some high level comments without looking at the code.
> >> > > > >
> >> > > > > I'm in favor from abandoning the thrift generated java code in
> >> favor
> >> > of
> >> > > > > immutable objects. I think it is easier to reason about and will
> >> > ensure
> >> > > > we
> >> > > > > have less errors in our code. If I understand correctly, the
> >> ProtoBuf
> >> > > > > format does this by default, so there some precedent for this
> >> style
> >> > of
> >> > > > code
> >> > > > > generation already.
> >> > > > >
> >> > > > > I think using Facebook's swift is the best approach here. I
> would
> >> be
> >> > > > > hesitant to accept any custom code generation that involved us
> >> > parsing
> >> > > > > thrift IDL files or thrift formats over the wire because I
> poses a
> >> > very
> >> > > > > high maintenance burden.
> >> > > > >
> >> > > > > I also think generating the MyBatis mutable classes is superior
> to
> >> > our
> >> > > > > current strategy of manually creating them.
> >> > > > >
> >> > > > > Finally, the performance improvements look fantastic. As an
> >> operator
> >> > > of a
> >> > > > > large cluster I am excited to see wholesale performance
> >> improvements
> >> > > as I
> >> > > > > am always concerned that my cluster is approaching the limits of
> >> what
> >> > > > > Aurora can handle safely.
> >> > > > >
> >> > > > > Overall, I think this change merits a serious discussion from
> all
> >> > > > > contributors.
> >> > > > >
> >> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <
> jsirois@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
> >> jsirois@apache.org>
> >> > > > wrote:
> >> > > > > >
> >> > > > > > > Context: Aurora uses the official Apache Thrift compiler
> today
> >> > > plus a
> >> > > > > > > home-grown python code generator [1] for immutable "entity"
> >> (I*)
> >> > > > > > wrappers.
> >> > > > > > >
> >> > > > > > > The proposal is to switch from using the Apache Thrift code
> >> > > generator
> >> > > > > to
> >> > > > > > a
> >> > > > > > > home grown generator.  The proposal comes with a concrete
> >> example
> >> > > in
> >> > > > > the
> >> > > > > > > form of the actual RBs to effect this change:
> >> > > > > > > 1. A custom java thrift code generator:
> >> > > > > > > https://reviews.apache.org/r/42748/
> >> > > > > > > 2. A custom MyBatis binding code generator powered by 1
> above:
> >> > > > > > > https://reviews.apache.org/r/42749/
> >> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> >> > > > > > > https://reviews.apache.org/r/42756/
> >> > > > > > >
> >> > > > > > > Since the RBs are large, I wanted to provide some extra
> >> context
> >> > on
> >> > > > the
> >> > > > > > > idea at a higher level.  I provide rationale, pros and cons
> >> below
> >> > > for
> >> > > > > > those
> >> > > > > > > interested in the idea but wary of diving in on code review
> >> until
> >> > > the
> >> > > > > > idea
> >> > > > > > > itself passes a sniff test.
> >> > > > > > > Thanks in advance for your feedback - and if we get there -
> >> for
> >> > > your
> >> > > > > > > review effort.
> >> > > > > > >
> >> > > > > >
> >> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs
> above
> >> > > since
> >> > > > > > they've expressed direct interest.  Happy to add others, just
> >> speak
> >> > > up
> >> > > > or
> >> > > > > > else just comment on the reviews as you see fit.
> >> > > > > > I'll formally only submit if 1st this email thread reaches
> >> > consensus,
> >> > > > and
> >> > > > > > second, reviews are approved.
> >> > > > > >
> >> > > > > > ==
> >> > > > > > >
> >> > > > > > > In the course of an initial run at creating a first-class
> >> > REST-like
> >> > > > > > > scheduler interface [2] I came to the conclusion generating
> >> the
> >> > > json
> >> > > > > API
> >> > > > > > > from the thrift one might be a good path.  That idea has
> been
> >> > > > scrapped
> >> > > > > > with
> >> > > > > > > community feedback, but an initial experiment in custom
> thrift
> >> > > > code-gen
> >> > > > > > for
> >> > > > > > > java that accompanied that idea seemed worth pursuing for
> its
> >> own
> >> > > > > > > independent benefits, chief among these being 1st class
> >> immutable
> >> > > > > thrift
> >> > > > > > > structs and the ability to leverage thrift annotations.
> >> > > > > > >
> >> > > > > > > Immutability:
> >> > > > > > > The benefits of having an immutable by default data model
> are
> >> the
> >> > > > > > standard
> >> > > > > > > ones; namely, its trivial to reason about safety of
> concurrent
> >> > > > > operations
> >> > > > > > > on the data model, stability of collections containing data
> >> model
> >> > > > > > entities
> >> > > > > > > and it opens up straight-forward optimizations that are easy
> >> to
> >> > > > reason
> >> > > > > > > about.
> >> > > > > > > An example optimization is caching hashCodes for the
> immutable
> >> > > thrift
> >> > > > > > > structs.  This was done after comparing jmh benchmarks run
> >> > against
> >> > > > > master
> >> > > > > > > and then again against the proposal branch.  Perf was
> >> comparable
> >> > -
> >> > > > > within
> >> > > > > > > 10% plus and minus depending on the benchmark, but with the
> >> > > > > optimization
> >> > > > > > > added many benchmarks showed pronounced improvement in the
> >> > proposal
> >> > > > > > branch
> >> > > > > > > [3].  The optimization is clearly safe and was quick and
> easy
> >> to
> >> > > > > > > implement.  Further optimizations can be experimented with
> in
> >> a
> >> > > > > > > straightforward way.
> >> > > > > > >
> >> > > > > > > Thrift Annotations:
> >> > > > > > > The thrift IDL grammar has supported these for quite some
> >> time,
> >> > but
> >> > > > > they
> >> > > > > > > are not plumbed to the generated java code.  Uses are many
> and
> >> > > > > varied.  I
> >> > > > > > > initially had my eye on annotation of thrift services with
> >> REST
> >> > > > verbs,
> >> > > > > > > routes, etc - but immediately we can leverage these
> >> annotations
> >> > to
> >> > > > kill
> >> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis
> binding
> >> > code
> >> > > > that
> >> > > > > > > needs to be maintained.
> >> > > > > > >
> >> > > > > > > There are a few downsides to switching to our own java
> thrift
> >> > code
> >> > > > gen:
> >> > > > > > > 1. We own more code to maintain:  Even though we have the
> >> custom
> >> > > > python
> >> > > > > > > "immutable" wrapper generator [1] today, this new generator
> -
> >> > even
> >> > > > with
> >> > > > > > the
> >> > > > > > > python generator removed - represents a 5-6x increase in
> line
> >> > count
> >> > > > of
> >> > > > > > > custom code (~4.1k lines of code and tests in the new custom
> >> gen,
> >> > > > ~700
> >> > > > > > > lines in the existing python custom gen)
> >> > > > > > > 2. We conceptually fork from a sibling Apache project.
> >> > > > > > >
> >> > > > > > > The fork could be mitigated by turning our real experience
> >> > > iterating
> >> > > > > the
> >> > > > > > > custom code generator into a well-founded patch back into
> the
> >> > > Apache
> >> > > > > > Thrift
> >> > > > > > > project, but saying we'll do that is easier than following
> >> > through
> >> > > > and
> >> > > > > > > actually doing it.
> >> > > > > > >
> >> > > > > > > ==
> >> > > > > > > Review guide / details:
> >> > > > > > >
> >> > > > > > > The technology stack:
> >> > > > > > > The thrift IDL parsing and thrift wire parsing are both
> >> handled
> >> > by
> >> > > > the
> >> > > > > > > Facebook swift project [4].  We only implement the middle
> bit
> >> > that
> >> > > > > > > generates java code stubs.  This gives higher confidence
> that
> >> the
> >> > > > > tricky
> >> > > > > > > bits out at either edge are done right.
> >> > > > > > > The thrift struct code generation is done using Square's
> >> javapoet
> >> > > [5]
> >> > > > > in
> >> > > > > > > favor of templating for the purpose of easier to read
> >> generator
> >> > > code.
> >> > > > > > This
> >> > > > > > > characterization is debatable though and template are
> >> certainly
> >> > > more
> >> > > > > > > flexible the minute you need to gen a second language (say
> we
> >> > like
> >> > > > this
> >> > > > > > and
> >> > > > > > > want to do javascript codegen this way too for example).
> >> > > > > > > The MyBatis codegen is forced by the thrift codegen for
> >> technical
> >> > > > > > > reasons.  In short, there is no simple way to teach MyBatis
> to
> >> > read
> >> > > > and
> >> > > > > > > write immutable objects with builders.  So the MyBatis code
> is
> >> > > > > generated
> >> > > > > > > via an annotation processor that runs after thrift code gen,
> >> but
> >> > > > > reading
> >> > > > > > > thrift annotations that survive that codegen process.
> >> > > > > > > The codegen unit testing is done with the help of Google's
> >> > > > > compile-tester
> >> > > > > > > [6].  NB that this has an expected output comparison that
> >> checks
> >> > > the
> >> > > > > > > generated AST and not the text, so its fairly lenient.
> >> > Whitepsace
> >> > > > and
> >> > > > > > > comments certainly don't matter.
> >> > > > > > >
> >> > > > > > > Review strategy:
> >> > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
> >> > probably
> >> > > > > easier
> >> > > > > > > to review looking at samples of the generated code.  Both
> the
> >> > > thrift
> >> > > > > > > codegen and MyBatis codegen samples are conveniently
> >> contained in
> >> > > the
> >> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/
> ).
> >> > The
> >> > > > > unit
> >> > > > > > > test uses resource files that contain both the thrift
> codegen
> >> > > inputs
> >> > > > > the
> >> > > > > > > annotation processor runs over and the annotation processor
> >> > > expected
> >> > > > > > > outputs  - the MyBatis peer classes.  So have a look there
> if
> >> you
> >> > > > need
> >> > > > > > > something concrete and don't want to patch the RBs in and
> >> > actually
> >> > > > run
> >> > > > > > the
> >> > > > > > > codegen (`./gradlew api:compileJava`).
> >> > > > > > > The conversion RB (#3) is large but the changes are mainly
> >> > > mechanical
> >> > > > > > > conversions from the current mutable thrift + I* wrappers to
> >> pure
> >> > > > > > immutable
> >> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The
> >> main
> >> > > > > changes
> >> > > > > > > of note are to the portions of the codebase tightly tied to
> >> > thrift
> >> > > > as a
> >> > > > > > > technology:
> >> > > > > > > + Gson/thrift converters
> >> > > > > > > + Shiro annotated auth param interception
> >> > > > > > > + Thrift/Servlet binding
> >> > > > > > >
> >> > > > > > > [1]
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> >> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> >> > > > > > > [3]
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> >> > > > > > > [4] https://github.com/facebook/swift
> >> > > > > > > [5] https://github.com/square/javapoet
> >> > > > > > > [6] https://github.com/google/compile-testing
> >> > > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Zameer Manji
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
>
> --
> John Sirois
> 303-512-3301
>

Re: [PROPOSAL] Change java thrift code gen

Posted by John Sirois <jo...@conductant.com>.
On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jf...@apache.org> wrote:

> +1 to making this apart of Thrift, i'm happy to help shepard this on the
> Thrift side and get it in as soon as its ready
>

I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as the
basis for discussion of this feature over in the Apache Thrift project.
I looked at the problem a bit and noted some challenges.


>
> -Jake
>
> On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
>> I am +1 to making immutable thrift objects solely based on perf numbers.
>>
>> My biggest concern though is maintenance of a pretty intricate codebase,
>> especially when it comes to upgrading any of the frameworks involved.
>> Bill's suggestion to explore paths to make this a part of Apache Thrift
>> sounds great to me.
>>
>> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org> wrote:
>>
>> > Tony - this would not be a technical fork so much as a spiritual fork.
>> > While we would have our own bugs to work out, the only upstream exposure
>> > would be IDL or wire format changes.
>> >
>> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <td...@twitter.com.invalid>
>> > wrote:
>> >
>> > > Awesome performance numbers! I don't particularly know the logistics
>> of
>> > > upstreaming a change like this, but optimistically I would suggest
>> > > upstreaming it to Apache Thrift if possible.
>> > >
>> > > As someone that maintains a fork of a thrift compiler(fork of
>> scrooge), I
>> > > have to say that it's not that fun. There's a lot of custom code that
>> > needs
>> > > to be maintained and a bunch of work to rebase the code periodically.
>> > >
>> > >
>> > >
>> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org>
>> wrote:
>> > >
>> > > > Firstly - thanks for the clean organization and delineation of
>> steps in
>> > > > this change.  Top notch work!
>> > > >
>> > > > Some of the performance improvements are very nice; and in a
>> > particularly
>> > > > hot code path.  I will wager a guess that the majority of the
>> savings
>> > is
>> > > in
>> > > > avoiding what amounts to copy constructors between mutable and
>> > immutable
>> > > > types.  I further wager there are alternative approaches we could
>> weigh
>> > > to
>> > > > achieve those performance improvements.  As an example - you note
>> above
>> > > > that we could provide a patch to Apache Thrift.  Depending how much
>> > > > performance inspires our decision here, it will be prudent to
>> evaluate
>> > > > alternatives.
>> > > >
>> > > > I think there are (at least) two major issues worth discussing -
>> code
>> > > > volume (which you note) and an increase in logical complexity.  This
>> > will
>> > > > leave us with a bifurcation in code generation tooling (custom+swift
>> > for
>> > > > Java, Apache Thrift for python and js).  It's difficult to quantify
>> the
>> > > > downside of that, but it seems like an unfortunate state with
>> potential
>> > > for
>> > > > compatibility risks.
>> > > >
>> > > >
>> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org>
>> > wrote:
>> > > >
>> > > > > Some high level comments without looking at the code.
>> > > > >
>> > > > > I'm in favor from abandoning the thrift generated java code in
>> favor
>> > of
>> > > > > immutable objects. I think it is easier to reason about and will
>> > ensure
>> > > > we
>> > > > > have less errors in our code. If I understand correctly, the
>> ProtoBuf
>> > > > > format does this by default, so there some precedent for this
>> style
>> > of
>> > > > code
>> > > > > generation already.
>> > > > >
>> > > > > I think using Facebook's swift is the best approach here. I would
>> be
>> > > > > hesitant to accept any custom code generation that involved us
>> > parsing
>> > > > > thrift IDL files or thrift formats over the wire because I poses a
>> > very
>> > > > > high maintenance burden.
>> > > > >
>> > > > > I also think generating the MyBatis mutable classes is superior to
>> > our
>> > > > > current strategy of manually creating them.
>> > > > >
>> > > > > Finally, the performance improvements look fantastic. As an
>> operator
>> > > of a
>> > > > > large cluster I am excited to see wholesale performance
>> improvements
>> > > as I
>> > > > > am always concerned that my cluster is approaching the limits of
>> what
>> > > > > Aurora can handle safely.
>> > > > >
>> > > > > Overall, I think this change merits a serious discussion from all
>> > > > > contributors.
>> > > > >
>> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
>> jsirois@apache.org>
>> > > > wrote:
>> > > > > >
>> > > > > > > Context: Aurora uses the official Apache Thrift compiler today
>> > > plus a
>> > > > > > > home-grown python code generator [1] for immutable "entity"
>> (I*)
>> > > > > > wrappers.
>> > > > > > >
>> > > > > > > The proposal is to switch from using the Apache Thrift code
>> > > generator
>> > > > > to
>> > > > > > a
>> > > > > > > home grown generator.  The proposal comes with a concrete
>> example
>> > > in
>> > > > > the
>> > > > > > > form of the actual RBs to effect this change:
>> > > > > > > 1. A custom java thrift code generator:
>> > > > > > > https://reviews.apache.org/r/42748/
>> > > > > > > 2. A custom MyBatis binding code generator powered by 1 above:
>> > > > > > > https://reviews.apache.org/r/42749/
>> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
>> > > > > > > https://reviews.apache.org/r/42756/
>> > > > > > >
>> > > > > > > Since the RBs are large, I wanted to provide some extra
>> context
>> > on
>> > > > the
>> > > > > > > idea at a higher level.  I provide rationale, pros and cons
>> below
>> > > for
>> > > > > > those
>> > > > > > > interested in the idea but wary of diving in on code review
>> until
>> > > the
>> > > > > > idea
>> > > > > > > itself passes a sniff test.
>> > > > > > > Thanks in advance for your feedback - and if we get there -
>> for
>> > > your
>> > > > > > > review effort.
>> > > > > > >
>> > > > > >
>> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
>> > > since
>> > > > > > they've expressed direct interest.  Happy to add others, just
>> speak
>> > > up
>> > > > or
>> > > > > > else just comment on the reviews as you see fit.
>> > > > > > I'll formally only submit if 1st this email thread reaches
>> > consensus,
>> > > > and
>> > > > > > second, reviews are approved.
>> > > > > >
>> > > > > > ==
>> > > > > > >
>> > > > > > > In the course of an initial run at creating a first-class
>> > REST-like
>> > > > > > > scheduler interface [2] I came to the conclusion generating
>> the
>> > > json
>> > > > > API
>> > > > > > > from the thrift one might be a good path.  That idea has been
>> > > > scrapped
>> > > > > > with
>> > > > > > > community feedback, but an initial experiment in custom thrift
>> > > > code-gen
>> > > > > > for
>> > > > > > > java that accompanied that idea seemed worth pursuing for its
>> own
>> > > > > > > independent benefits, chief among these being 1st class
>> immutable
>> > > > > thrift
>> > > > > > > structs and the ability to leverage thrift annotations.
>> > > > > > >
>> > > > > > > Immutability:
>> > > > > > > The benefits of having an immutable by default data model are
>> the
>> > > > > > standard
>> > > > > > > ones; namely, its trivial to reason about safety of concurrent
>> > > > > operations
>> > > > > > > on the data model, stability of collections containing data
>> model
>> > > > > > entities
>> > > > > > > and it opens up straight-forward optimizations that are easy
>> to
>> > > > reason
>> > > > > > > about.
>> > > > > > > An example optimization is caching hashCodes for the immutable
>> > > thrift
>> > > > > > > structs.  This was done after comparing jmh benchmarks run
>> > against
>> > > > > master
>> > > > > > > and then again against the proposal branch.  Perf was
>> comparable
>> > -
>> > > > > within
>> > > > > > > 10% plus and minus depending on the benchmark, but with the
>> > > > > optimization
>> > > > > > > added many benchmarks showed pronounced improvement in the
>> > proposal
>> > > > > > branch
>> > > > > > > [3].  The optimization is clearly safe and was quick and easy
>> to
>> > > > > > > implement.  Further optimizations can be experimented with in
>> a
>> > > > > > > straightforward way.
>> > > > > > >
>> > > > > > > Thrift Annotations:
>> > > > > > > The thrift IDL grammar has supported these for quite some
>> time,
>> > but
>> > > > > they
>> > > > > > > are not plumbed to the generated java code.  Uses are many and
>> > > > > varied.  I
>> > > > > > > initially had my eye on annotation of thrift services with
>> REST
>> > > > verbs,
>> > > > > > > routes, etc - but immediately we can leverage these
>> annotations
>> > to
>> > > > kill
>> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
>> > code
>> > > > that
>> > > > > > > needs to be maintained.
>> > > > > > >
>> > > > > > > There are a few downsides to switching to our own java thrift
>> > code
>> > > > gen:
>> > > > > > > 1. We own more code to maintain:  Even though we have the
>> custom
>> > > > python
>> > > > > > > "immutable" wrapper generator [1] today, this new generator -
>> > even
>> > > > with
>> > > > > > the
>> > > > > > > python generator removed - represents a 5-6x increase in line
>> > count
>> > > > of
>> > > > > > > custom code (~4.1k lines of code and tests in the new custom
>> gen,
>> > > > ~700
>> > > > > > > lines in the existing python custom gen)
>> > > > > > > 2. We conceptually fork from a sibling Apache project.
>> > > > > > >
>> > > > > > > The fork could be mitigated by turning our real experience
>> > > iterating
>> > > > > the
>> > > > > > > custom code generator into a well-founded patch back into the
>> > > Apache
>> > > > > > Thrift
>> > > > > > > project, but saying we'll do that is easier than following
>> > through
>> > > > and
>> > > > > > > actually doing it.
>> > > > > > >
>> > > > > > > ==
>> > > > > > > Review guide / details:
>> > > > > > >
>> > > > > > > The technology stack:
>> > > > > > > The thrift IDL parsing and thrift wire parsing are both
>> handled
>> > by
>> > > > the
>> > > > > > > Facebook swift project [4].  We only implement the middle bit
>> > that
>> > > > > > > generates java code stubs.  This gives higher confidence that
>> the
>> > > > > tricky
>> > > > > > > bits out at either edge are done right.
>> > > > > > > The thrift struct code generation is done using Square's
>> javapoet
>> > > [5]
>> > > > > in
>> > > > > > > favor of templating for the purpose of easier to read
>> generator
>> > > code.
>> > > > > > This
>> > > > > > > characterization is debatable though and template are
>> certainly
>> > > more
>> > > > > > > flexible the minute you need to gen a second language (say we
>> > like
>> > > > this
>> > > > > > and
>> > > > > > > want to do javascript codegen this way too for example).
>> > > > > > > The MyBatis codegen is forced by the thrift codegen for
>> technical
>> > > > > > > reasons.  In short, there is no simple way to teach MyBatis to
>> > read
>> > > > and
>> > > > > > > write immutable objects with builders.  So the MyBatis code is
>> > > > > generated
>> > > > > > > via an annotation processor that runs after thrift code gen,
>> but
>> > > > > reading
>> > > > > > > thrift annotations that survive that codegen process.
>> > > > > > > The codegen unit testing is done with the help of Google's
>> > > > > compile-tester
>> > > > > > > [6].  NB that this has an expected output comparison that
>> checks
>> > > the
>> > > > > > > generated AST and not the text, so its fairly lenient.
>> > Whitepsace
>> > > > and
>> > > > > > > comments certainly don't matter.
>> > > > > > >
>> > > > > > > Review strategy:
>> > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
>> > probably
>> > > > > easier
>> > > > > > > to review looking at samples of the generated code.  Both the
>> > > thrift
>> > > > > > > codegen and MyBatis codegen samples are conveniently
>> contained in
>> > > the
>> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
>> > The
>> > > > > unit
>> > > > > > > test uses resource files that contain both the thrift codegen
>> > > inputs
>> > > > > the
>> > > > > > > annotation processor runs over and the annotation processor
>> > > expected
>> > > > > > > outputs  - the MyBatis peer classes.  So have a look there if
>> you
>> > > > need
>> > > > > > > something concrete and don't want to patch the RBs in and
>> > actually
>> > > > run
>> > > > > > the
>> > > > > > > codegen (`./gradlew api:compileJava`).
>> > > > > > > The conversion RB (#3) is large but the changes are mainly
>> > > mechanical
>> > > > > > > conversions from the current mutable thrift + I* wrappers to
>> pure
>> > > > > > immutable
>> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The
>> main
>> > > > > changes
>> > > > > > > of note are to the portions of the codebase tightly tied to
>> > thrift
>> > > > as a
>> > > > > > > technology:
>> > > > > > > + Gson/thrift converters
>> > > > > > > + Shiro annotated auth param interception
>> > > > > > > + Thrift/Servlet binding
>> > > > > > >
>> > > > > > > [1]
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
>> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
>> > > > > > > [3]
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
>> > > > > > > [4] https://github.com/facebook/swift
>> > > > > > > [5] https://github.com/square/javapoet
>> > > > > > > [6] https://github.com/google/compile-testing
>> > > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Zameer Manji
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


-- 
John Sirois
303-512-3301

Re: [PROPOSAL] Change java thrift code gen

Posted by Jake Farrell <jf...@apache.org>.
+1 to making this apart of Thrift, i'm happy to help shepard this on the
Thrift side and get it in as soon as its ready

-Jake

On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> I am +1 to making immutable thrift objects solely based on perf numbers.
>
> My biggest concern though is maintenance of a pretty intricate codebase,
> especially when it comes to upgrading any of the frameworks involved.
> Bill's suggestion to explore paths to make this a part of Apache Thrift
> sounds great to me.
>
> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org> wrote:
>
> > Tony - this would not be a technical fork so much as a spiritual fork.
> > While we would have our own bugs to work out, the only upstream exposure
> > would be IDL or wire format changes.
> >
> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <td...@twitter.com.invalid>
> > wrote:
> >
> > > Awesome performance numbers! I don't particularly know the logistics of
> > > upstreaming a change like this, but optimistically I would suggest
> > > upstreaming it to Apache Thrift if possible.
> > >
> > > As someone that maintains a fork of a thrift compiler(fork of
> scrooge), I
> > > have to say that it's not that fun. There's a lot of custom code that
> > needs
> > > to be maintained and a bunch of work to rebase the code periodically.
> > >
> > >
> > >
> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org>
> wrote:
> > >
> > > > Firstly - thanks for the clean organization and delineation of steps
> in
> > > > this change.  Top notch work!
> > > >
> > > > Some of the performance improvements are very nice; and in a
> > particularly
> > > > hot code path.  I will wager a guess that the majority of the savings
> > is
> > > in
> > > > avoiding what amounts to copy constructors between mutable and
> > immutable
> > > > types.  I further wager there are alternative approaches we could
> weigh
> > > to
> > > > achieve those performance improvements.  As an example - you note
> above
> > > > that we could provide a patch to Apache Thrift.  Depending how much
> > > > performance inspires our decision here, it will be prudent to
> evaluate
> > > > alternatives.
> > > >
> > > > I think there are (at least) two major issues worth discussing - code
> > > > volume (which you note) and an increase in logical complexity.  This
> > will
> > > > leave us with a bifurcation in code generation tooling (custom+swift
> > for
> > > > Java, Apache Thrift for python and js).  It's difficult to quantify
> the
> > > > downside of that, but it seems like an unfortunate state with
> potential
> > > for
> > > > compatibility risks.
> > > >
> > > >
> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org>
> > wrote:
> > > >
> > > > > Some high level comments without looking at the code.
> > > > >
> > > > > I'm in favor from abandoning the thrift generated java code in
> favor
> > of
> > > > > immutable objects. I think it is easier to reason about and will
> > ensure
> > > > we
> > > > > have less errors in our code. If I understand correctly, the
> ProtoBuf
> > > > > format does this by default, so there some precedent for this style
> > of
> > > > code
> > > > > generation already.
> > > > >
> > > > > I think using Facebook's swift is the best approach here. I would
> be
> > > > > hesitant to accept any custom code generation that involved us
> > parsing
> > > > > thrift IDL files or thrift formats over the wire because I poses a
> > very
> > > > > high maintenance burden.
> > > > >
> > > > > I also think generating the MyBatis mutable classes is superior to
> > our
> > > > > current strategy of manually creating them.
> > > > >
> > > > > Finally, the performance improvements look fantastic. As an
> operator
> > > of a
> > > > > large cluster I am excited to see wholesale performance
> improvements
> > > as I
> > > > > am always concerned that my cluster is approaching the limits of
> what
> > > > > Aurora can handle safely.
> > > > >
> > > > > Overall, I think this change merits a serious discussion from all
> > > > > contributors.
> > > > >
> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org>
> > > wrote:
> > > > >
> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > Context: Aurora uses the official Apache Thrift compiler today
> > > plus a
> > > > > > > home-grown python code generator [1] for immutable "entity"
> (I*)
> > > > > > wrappers.
> > > > > > >
> > > > > > > The proposal is to switch from using the Apache Thrift code
> > > generator
> > > > > to
> > > > > > a
> > > > > > > home grown generator.  The proposal comes with a concrete
> example
> > > in
> > > > > the
> > > > > > > form of the actual RBs to effect this change:
> > > > > > > 1. A custom java thrift code generator:
> > > > > > > https://reviews.apache.org/r/42748/
> > > > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > > > https://reviews.apache.org/r/42749/
> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > > > https://reviews.apache.org/r/42756/
> > > > > > >
> > > > > > > Since the RBs are large, I wanted to provide some extra context
> > on
> > > > the
> > > > > > > idea at a higher level.  I provide rationale, pros and cons
> below
> > > for
> > > > > > those
> > > > > > > interested in the idea but wary of diving in on code review
> until
> > > the
> > > > > > idea
> > > > > > > itself passes a sniff test.
> > > > > > > Thanks in advance for your feedback - and if we get there - for
> > > your
> > > > > > > review effort.
> > > > > > >
> > > > > >
> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> > > since
> > > > > > they've expressed direct interest.  Happy to add others, just
> speak
> > > up
> > > > or
> > > > > > else just comment on the reviews as you see fit.
> > > > > > I'll formally only submit if 1st this email thread reaches
> > consensus,
> > > > and
> > > > > > second, reviews are approved.
> > > > > >
> > > > > > ==
> > > > > > >
> > > > > > > In the course of an initial run at creating a first-class
> > REST-like
> > > > > > > scheduler interface [2] I came to the conclusion generating the
> > > json
> > > > > API
> > > > > > > from the thrift one might be a good path.  That idea has been
> > > > scrapped
> > > > > > with
> > > > > > > community feedback, but an initial experiment in custom thrift
> > > > code-gen
> > > > > > for
> > > > > > > java that accompanied that idea seemed worth pursuing for its
> own
> > > > > > > independent benefits, chief among these being 1st class
> immutable
> > > > > thrift
> > > > > > > structs and the ability to leverage thrift annotations.
> > > > > > >
> > > > > > > Immutability:
> > > > > > > The benefits of having an immutable by default data model are
> the
> > > > > > standard
> > > > > > > ones; namely, its trivial to reason about safety of concurrent
> > > > > operations
> > > > > > > on the data model, stability of collections containing data
> model
> > > > > > entities
> > > > > > > and it opens up straight-forward optimizations that are easy to
> > > > reason
> > > > > > > about.
> > > > > > > An example optimization is caching hashCodes for the immutable
> > > thrift
> > > > > > > structs.  This was done after comparing jmh benchmarks run
> > against
> > > > > master
> > > > > > > and then again against the proposal branch.  Perf was
> comparable
> > -
> > > > > within
> > > > > > > 10% plus and minus depending on the benchmark, but with the
> > > > > optimization
> > > > > > > added many benchmarks showed pronounced improvement in the
> > proposal
> > > > > > branch
> > > > > > > [3].  The optimization is clearly safe and was quick and easy
> to
> > > > > > > implement.  Further optimizations can be experimented with in a
> > > > > > > straightforward way.
> > > > > > >
> > > > > > > Thrift Annotations:
> > > > > > > The thrift IDL grammar has supported these for quite some time,
> > but
> > > > > they
> > > > > > > are not plumbed to the generated java code.  Uses are many and
> > > > > varied.  I
> > > > > > > initially had my eye on annotation of thrift services with REST
> > > > verbs,
> > > > > > > routes, etc - but immediately we can leverage these annotations
> > to
> > > > kill
> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
> > code
> > > > that
> > > > > > > needs to be maintained.
> > > > > > >
> > > > > > > There are a few downsides to switching to our own java thrift
> > code
> > > > gen:
> > > > > > > 1. We own more code to maintain:  Even though we have the
> custom
> > > > python
> > > > > > > "immutable" wrapper generator [1] today, this new generator -
> > even
> > > > with
> > > > > > the
> > > > > > > python generator removed - represents a 5-6x increase in line
> > count
> > > > of
> > > > > > > custom code (~4.1k lines of code and tests in the new custom
> gen,
> > > > ~700
> > > > > > > lines in the existing python custom gen)
> > > > > > > 2. We conceptually fork from a sibling Apache project.
> > > > > > >
> > > > > > > The fork could be mitigated by turning our real experience
> > > iterating
> > > > > the
> > > > > > > custom code generator into a well-founded patch back into the
> > > Apache
> > > > > > Thrift
> > > > > > > project, but saying we'll do that is easier than following
> > through
> > > > and
> > > > > > > actually doing it.
> > > > > > >
> > > > > > > ==
> > > > > > > Review guide / details:
> > > > > > >
> > > > > > > The technology stack:
> > > > > > > The thrift IDL parsing and thrift wire parsing are both handled
> > by
> > > > the
> > > > > > > Facebook swift project [4].  We only implement the middle bit
> > that
> > > > > > > generates java code stubs.  This gives higher confidence that
> the
> > > > > tricky
> > > > > > > bits out at either edge are done right.
> > > > > > > The thrift struct code generation is done using Square's
> javapoet
> > > [5]
> > > > > in
> > > > > > > favor of templating for the purpose of easier to read generator
> > > code.
> > > > > > This
> > > > > > > characterization is debatable though and template are certainly
> > > more
> > > > > > > flexible the minute you need to gen a second language (say we
> > like
> > > > this
> > > > > > and
> > > > > > > want to do javascript codegen this way too for example).
> > > > > > > The MyBatis codegen is forced by the thrift codegen for
> technical
> > > > > > > reasons.  In short, there is no simple way to teach MyBatis to
> > read
> > > > and
> > > > > > > write immutable objects with builders.  So the MyBatis code is
> > > > > generated
> > > > > > > via an annotation processor that runs after thrift code gen,
> but
> > > > > reading
> > > > > > > thrift annotations that survive that codegen process.
> > > > > > > The codegen unit testing is done with the help of Google's
> > > > > compile-tester
> > > > > > > [6].  NB that this has an expected output comparison that
> checks
> > > the
> > > > > > > generated AST and not the text, so its fairly lenient.
> > Whitepsace
> > > > and
> > > > > > > comments certainly don't matter.
> > > > > > >
> > > > > > > Review strategy:
> > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
> > probably
> > > > > easier
> > > > > > > to review looking at samples of the generated code.  Both the
> > > thrift
> > > > > > > codegen and MyBatis codegen samples are conveniently contained
> in
> > > the
> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
> > The
> > > > > unit
> > > > > > > test uses resource files that contain both the thrift codegen
> > > inputs
> > > > > the
> > > > > > > annotation processor runs over and the annotation processor
> > > expected
> > > > > > > outputs  - the MyBatis peer classes.  So have a look there if
> you
> > > > need
> > > > > > > something concrete and don't want to patch the RBs in and
> > actually
> > > > run
> > > > > > the
> > > > > > > codegen (`./gradlew api:compileJava`).
> > > > > > > The conversion RB (#3) is large but the changes are mainly
> > > mechanical
> > > > > > > conversions from the current mutable thrift + I* wrappers to
> pure
> > > > > > immutable
> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The
> main
> > > > > changes
> > > > > > > of note are to the portions of the codebase tightly tied to
> > thrift
> > > > as a
> > > > > > > technology:
> > > > > > > + Gson/thrift converters
> > > > > > > + Shiro annotated auth param interception
> > > > > > > + Thrift/Servlet binding
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > > > [4] https://github.com/facebook/swift
> > > > > > > [5] https://github.com/square/javapoet
> > > > > > > [6] https://github.com/google/compile-testing
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Zameer Manji
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [PROPOSAL] Change java thrift code gen

Posted by Maxim Khutornenko <ma...@apache.org>.
I am +1 to making immutable thrift objects solely based on perf numbers.

My biggest concern though is maintenance of a pretty intricate codebase,
especially when it comes to upgrading any of the frameworks involved.
Bill's suggestion to explore paths to make this a part of Apache Thrift
sounds great to me.

On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wf...@apache.org> wrote:

> Tony - this would not be a technical fork so much as a spiritual fork.
> While we would have our own bugs to work out, the only upstream exposure
> would be IDL or wire format changes.
>
> On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <td...@twitter.com.invalid>
> wrote:
>
> > Awesome performance numbers! I don't particularly know the logistics of
> > upstreaming a change like this, but optimistically I would suggest
> > upstreaming it to Apache Thrift if possible.
> >
> > As someone that maintains a fork of a thrift compiler(fork of scrooge), I
> > have to say that it's not that fun. There's a lot of custom code that
> needs
> > to be maintained and a bunch of work to rebase the code periodically.
> >
> >
> >
> > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org> wrote:
> >
> > > Firstly - thanks for the clean organization and delineation of steps in
> > > this change.  Top notch work!
> > >
> > > Some of the performance improvements are very nice; and in a
> particularly
> > > hot code path.  I will wager a guess that the majority of the savings
> is
> > in
> > > avoiding what amounts to copy constructors between mutable and
> immutable
> > > types.  I further wager there are alternative approaches we could weigh
> > to
> > > achieve those performance improvements.  As an example - you note above
> > > that we could provide a patch to Apache Thrift.  Depending how much
> > > performance inspires our decision here, it will be prudent to evaluate
> > > alternatives.
> > >
> > > I think there are (at least) two major issues worth discussing - code
> > > volume (which you note) and an increase in logical complexity.  This
> will
> > > leave us with a bifurcation in code generation tooling (custom+swift
> for
> > > Java, Apache Thrift for python and js).  It's difficult to quantify the
> > > downside of that, but it seems like an unfortunate state with potential
> > for
> > > compatibility risks.
> > >
> > >
> > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org>
> wrote:
> > >
> > > > Some high level comments without looking at the code.
> > > >
> > > > I'm in favor from abandoning the thrift generated java code in favor
> of
> > > > immutable objects. I think it is easier to reason about and will
> ensure
> > > we
> > > > have less errors in our code. If I understand correctly, the ProtoBuf
> > > > format does this by default, so there some precedent for this style
> of
> > > code
> > > > generation already.
> > > >
> > > > I think using Facebook's swift is the best approach here. I would be
> > > > hesitant to accept any custom code generation that involved us
> parsing
> > > > thrift IDL files or thrift formats over the wire because I poses a
> very
> > > > high maintenance burden.
> > > >
> > > > I also think generating the MyBatis mutable classes is superior to
> our
> > > > current strategy of manually creating them.
> > > >
> > > > Finally, the performance improvements look fantastic. As an operator
> > of a
> > > > large cluster I am excited to see wholesale performance improvements
> > as I
> > > > am always concerned that my cluster is approaching the limits of what
> > > > Aurora can handle safely.
> > > >
> > > > Overall, I think this change merits a serious discussion from all
> > > > contributors.
> > > >
> > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org>
> > wrote:
> > > >
> > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org>
> > > wrote:
> > > > >
> > > > > > Context: Aurora uses the official Apache Thrift compiler today
> > plus a
> > > > > > home-grown python code generator [1] for immutable "entity" (I*)
> > > > > wrappers.
> > > > > >
> > > > > > The proposal is to switch from using the Apache Thrift code
> > generator
> > > > to
> > > > > a
> > > > > > home grown generator.  The proposal comes with a concrete example
> > in
> > > > the
> > > > > > form of the actual RBs to effect this change:
> > > > > > 1. A custom java thrift code generator:
> > > > > > https://reviews.apache.org/r/42748/
> > > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > > https://reviews.apache.org/r/42749/
> > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > > https://reviews.apache.org/r/42756/
> > > > > >
> > > > > > Since the RBs are large, I wanted to provide some extra context
> on
> > > the
> > > > > > idea at a higher level.  I provide rationale, pros and cons below
> > for
> > > > > those
> > > > > > interested in the idea but wary of diving in on code review until
> > the
> > > > > idea
> > > > > > itself passes a sniff test.
> > > > > > Thanks in advance for your feedback - and if we get there - for
> > your
> > > > > > review effort.
> > > > > >
> > > > >
> > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> > since
> > > > > they've expressed direct interest.  Happy to add others, just speak
> > up
> > > or
> > > > > else just comment on the reviews as you see fit.
> > > > > I'll formally only submit if 1st this email thread reaches
> consensus,
> > > and
> > > > > second, reviews are approved.
> > > > >
> > > > > ==
> > > > > >
> > > > > > In the course of an initial run at creating a first-class
> REST-like
> > > > > > scheduler interface [2] I came to the conclusion generating the
> > json
> > > > API
> > > > > > from the thrift one might be a good path.  That idea has been
> > > scrapped
> > > > > with
> > > > > > community feedback, but an initial experiment in custom thrift
> > > code-gen
> > > > > for
> > > > > > java that accompanied that idea seemed worth pursuing for its own
> > > > > > independent benefits, chief among these being 1st class immutable
> > > > thrift
> > > > > > structs and the ability to leverage thrift annotations.
> > > > > >
> > > > > > Immutability:
> > > > > > The benefits of having an immutable by default data model are the
> > > > > standard
> > > > > > ones; namely, its trivial to reason about safety of concurrent
> > > > operations
> > > > > > on the data model, stability of collections containing data model
> > > > > entities
> > > > > > and it opens up straight-forward optimizations that are easy to
> > > reason
> > > > > > about.
> > > > > > An example optimization is caching hashCodes for the immutable
> > thrift
> > > > > > structs.  This was done after comparing jmh benchmarks run
> against
> > > > master
> > > > > > and then again against the proposal branch.  Perf was comparable
> -
> > > > within
> > > > > > 10% plus and minus depending on the benchmark, but with the
> > > > optimization
> > > > > > added many benchmarks showed pronounced improvement in the
> proposal
> > > > > branch
> > > > > > [3].  The optimization is clearly safe and was quick and easy to
> > > > > > implement.  Further optimizations can be experimented with in a
> > > > > > straightforward way.
> > > > > >
> > > > > > Thrift Annotations:
> > > > > > The thrift IDL grammar has supported these for quite some time,
> but
> > > > they
> > > > > > are not plumbed to the generated java code.  Uses are many and
> > > > varied.  I
> > > > > > initially had my eye on annotation of thrift services with REST
> > > verbs,
> > > > > > routes, etc - but immediately we can leverage these annotations
> to
> > > kill
> > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
> code
> > > that
> > > > > > needs to be maintained.
> > > > > >
> > > > > > There are a few downsides to switching to our own java thrift
> code
> > > gen:
> > > > > > 1. We own more code to maintain:  Even though we have the custom
> > > python
> > > > > > "immutable" wrapper generator [1] today, this new generator -
> even
> > > with
> > > > > the
> > > > > > python generator removed - represents a 5-6x increase in line
> count
> > > of
> > > > > > custom code (~4.1k lines of code and tests in the new custom gen,
> > > ~700
> > > > > > lines in the existing python custom gen)
> > > > > > 2. We conceptually fork from a sibling Apache project.
> > > > > >
> > > > > > The fork could be mitigated by turning our real experience
> > iterating
> > > > the
> > > > > > custom code generator into a well-founded patch back into the
> > Apache
> > > > > Thrift
> > > > > > project, but saying we'll do that is easier than following
> through
> > > and
> > > > > > actually doing it.
> > > > > >
> > > > > > ==
> > > > > > Review guide / details:
> > > > > >
> > > > > > The technology stack:
> > > > > > The thrift IDL parsing and thrift wire parsing are both handled
> by
> > > the
> > > > > > Facebook swift project [4].  We only implement the middle bit
> that
> > > > > > generates java code stubs.  This gives higher confidence that the
> > > > tricky
> > > > > > bits out at either edge are done right.
> > > > > > The thrift struct code generation is done using Square's javapoet
> > [5]
> > > > in
> > > > > > favor of templating for the purpose of easier to read generator
> > code.
> > > > > This
> > > > > > characterization is debatable though and template are certainly
> > more
> > > > > > flexible the minute you need to gen a second language (say we
> like
> > > this
> > > > > and
> > > > > > want to do javascript codegen this way too for example).
> > > > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > > > reasons.  In short, there is no simple way to teach MyBatis to
> read
> > > and
> > > > > > write immutable objects with builders.  So the MyBatis code is
> > > > generated
> > > > > > via an annotation processor that runs after thrift code gen, but
> > > > reading
> > > > > > thrift annotations that survive that codegen process.
> > > > > > The codegen unit testing is done with the help of Google's
> > > > compile-tester
> > > > > > [6].  NB that this has an expected output comparison that checks
> > the
> > > > > > generated AST and not the text, so its fairly lenient.
> Whitepsace
> > > and
> > > > > > comments certainly don't matter.
> > > > > >
> > > > > > Review strategy:
> > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
> probably
> > > > easier
> > > > > > to review looking at samples of the generated code.  Both the
> > thrift
> > > > > > codegen and MyBatis codegen samples are conveniently contained in
> > the
> > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
> The
> > > > unit
> > > > > > test uses resource files that contain both the thrift codegen
> > inputs
> > > > the
> > > > > > annotation processor runs over and the annotation processor
> > expected
> > > > > > outputs  - the MyBatis peer classes.  So have a look there if you
> > > need
> > > > > > something concrete and don't want to patch the RBs in and
> actually
> > > run
> > > > > the
> > > > > > codegen (`./gradlew api:compileJava`).
> > > > > > The conversion RB (#3) is large but the changes are mainly
> > mechanical
> > > > > > conversions from the current mutable thrift + I* wrappers to pure
> > > > > immutable
> > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> > > > changes
> > > > > > of note are to the portions of the codebase tightly tied to
> thrift
> > > as a
> > > > > > technology:
> > > > > > + Gson/thrift converters
> > > > > > + Shiro annotated auth param interception
> > > > > > + Thrift/Servlet binding
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > > [3]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > > [4] https://github.com/facebook/swift
> > > > > > [5] https://github.com/square/javapoet
> > > > > > [6] https://github.com/google/compile-testing
> > > > > >
> > > > >
> > > > > --
> > > > > Zameer Manji
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [PROPOSAL] Change java thrift code gen

Posted by Bill Farner <wf...@apache.org>.
Tony - this would not be a technical fork so much as a spiritual fork.
While we would have our own bugs to work out, the only upstream exposure
would be IDL or wire format changes.

On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <td...@twitter.com.invalid>
wrote:

> Awesome performance numbers! I don't particularly know the logistics of
> upstreaming a change like this, but optimistically I would suggest
> upstreaming it to Apache Thrift if possible.
>
> As someone that maintains a fork of a thrift compiler(fork of scrooge), I
> have to say that it's not that fun. There's a lot of custom code that needs
> to be maintained and a bunch of work to rebase the code periodically.
>
>
>
> On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org> wrote:
>
> > Firstly - thanks for the clean organization and delineation of steps in
> > this change.  Top notch work!
> >
> > Some of the performance improvements are very nice; and in a particularly
> > hot code path.  I will wager a guess that the majority of the savings is
> in
> > avoiding what amounts to copy constructors between mutable and immutable
> > types.  I further wager there are alternative approaches we could weigh
> to
> > achieve those performance improvements.  As an example - you note above
> > that we could provide a patch to Apache Thrift.  Depending how much
> > performance inspires our decision here, it will be prudent to evaluate
> > alternatives.
> >
> > I think there are (at least) two major issues worth discussing - code
> > volume (which you note) and an increase in logical complexity.  This will
> > leave us with a bifurcation in code generation tooling (custom+swift for
> > Java, Apache Thrift for python and js).  It's difficult to quantify the
> > downside of that, but it seems like an unfortunate state with potential
> for
> > compatibility risks.
> >
> >
> > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org> wrote:
> >
> > > Some high level comments without looking at the code.
> > >
> > > I'm in favor from abandoning the thrift generated java code in favor of
> > > immutable objects. I think it is easier to reason about and will ensure
> > we
> > > have less errors in our code. If I understand correctly, the ProtoBuf
> > > format does this by default, so there some precedent for this style of
> > code
> > > generation already.
> > >
> > > I think using Facebook's swift is the best approach here. I would be
> > > hesitant to accept any custom code generation that involved us parsing
> > > thrift IDL files or thrift formats over the wire because I poses a very
> > > high maintenance burden.
> > >
> > > I also think generating the MyBatis mutable classes is superior to our
> > > current strategy of manually creating them.
> > >
> > > Finally, the performance improvements look fantastic. As an operator
> of a
> > > large cluster I am excited to see wholesale performance improvements
> as I
> > > am always concerned that my cluster is approaching the limits of what
> > > Aurora can handle safely.
> > >
> > > Overall, I think this change merits a serious discussion from all
> > > contributors.
> > >
> > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org>
> wrote:
> > >
> > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org>
> > wrote:
> > > >
> > > > > Context: Aurora uses the official Apache Thrift compiler today
> plus a
> > > > > home-grown python code generator [1] for immutable "entity" (I*)
> > > > wrappers.
> > > > >
> > > > > The proposal is to switch from using the Apache Thrift code
> generator
> > > to
> > > > a
> > > > > home grown generator.  The proposal comes with a concrete example
> in
> > > the
> > > > > form of the actual RBs to effect this change:
> > > > > 1. A custom java thrift code generator:
> > > > > https://reviews.apache.org/r/42748/
> > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > https://reviews.apache.org/r/42749/
> > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > https://reviews.apache.org/r/42756/
> > > > >
> > > > > Since the RBs are large, I wanted to provide some extra context on
> > the
> > > > > idea at a higher level.  I provide rationale, pros and cons below
> for
> > > > those
> > > > > interested in the idea but wary of diving in on code review until
> the
> > > > idea
> > > > > itself passes a sniff test.
> > > > > Thanks in advance for your feedback - and if we get there - for
> your
> > > > > review effort.
> > > > >
> > > >
> > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> since
> > > > they've expressed direct interest.  Happy to add others, just speak
> up
> > or
> > > > else just comment on the reviews as you see fit.
> > > > I'll formally only submit if 1st this email thread reaches consensus,
> > and
> > > > second, reviews are approved.
> > > >
> > > > ==
> > > > >
> > > > > In the course of an initial run at creating a first-class REST-like
> > > > > scheduler interface [2] I came to the conclusion generating the
> json
> > > API
> > > > > from the thrift one might be a good path.  That idea has been
> > scrapped
> > > > with
> > > > > community feedback, but an initial experiment in custom thrift
> > code-gen
> > > > for
> > > > > java that accompanied that idea seemed worth pursuing for its own
> > > > > independent benefits, chief among these being 1st class immutable
> > > thrift
> > > > > structs and the ability to leverage thrift annotations.
> > > > >
> > > > > Immutability:
> > > > > The benefits of having an immutable by default data model are the
> > > > standard
> > > > > ones; namely, its trivial to reason about safety of concurrent
> > > operations
> > > > > on the data model, stability of collections containing data model
> > > > entities
> > > > > and it opens up straight-forward optimizations that are easy to
> > reason
> > > > > about.
> > > > > An example optimization is caching hashCodes for the immutable
> thrift
> > > > > structs.  This was done after comparing jmh benchmarks run against
> > > master
> > > > > and then again against the proposal branch.  Perf was comparable -
> > > within
> > > > > 10% plus and minus depending on the benchmark, but with the
> > > optimization
> > > > > added many benchmarks showed pronounced improvement in the proposal
> > > > branch
> > > > > [3].  The optimization is clearly safe and was quick and easy to
> > > > > implement.  Further optimizations can be experimented with in a
> > > > > straightforward way.
> > > > >
> > > > > Thrift Annotations:
> > > > > The thrift IDL grammar has supported these for quite some time, but
> > > they
> > > > > are not plumbed to the generated java code.  Uses are many and
> > > varied.  I
> > > > > initially had my eye on annotation of thrift services with REST
> > verbs,
> > > > > routes, etc - but immediately we can leverage these annotations to
> > kill
> > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code
> > that
> > > > > needs to be maintained.
> > > > >
> > > > > There are a few downsides to switching to our own java thrift code
> > gen:
> > > > > 1. We own more code to maintain:  Even though we have the custom
> > python
> > > > > "immutable" wrapper generator [1] today, this new generator - even
> > with
> > > > the
> > > > > python generator removed - represents a 5-6x increase in line count
> > of
> > > > > custom code (~4.1k lines of code and tests in the new custom gen,
> > ~700
> > > > > lines in the existing python custom gen)
> > > > > 2. We conceptually fork from a sibling Apache project.
> > > > >
> > > > > The fork could be mitigated by turning our real experience
> iterating
> > > the
> > > > > custom code generator into a well-founded patch back into the
> Apache
> > > > Thrift
> > > > > project, but saying we'll do that is easier than following through
> > and
> > > > > actually doing it.
> > > > >
> > > > > ==
> > > > > Review guide / details:
> > > > >
> > > > > The technology stack:
> > > > > The thrift IDL parsing and thrift wire parsing are both handled by
> > the
> > > > > Facebook swift project [4].  We only implement the middle bit that
> > > > > generates java code stubs.  This gives higher confidence that the
> > > tricky
> > > > > bits out at either edge are done right.
> > > > > The thrift struct code generation is done using Square's javapoet
> [5]
> > > in
> > > > > favor of templating for the purpose of easier to read generator
> code.
> > > > This
> > > > > characterization is debatable though and template are certainly
> more
> > > > > flexible the minute you need to gen a second language (say we like
> > this
> > > > and
> > > > > want to do javascript codegen this way too for example).
> > > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > > reasons.  In short, there is no simple way to teach MyBatis to read
> > and
> > > > > write immutable objects with builders.  So the MyBatis code is
> > > generated
> > > > > via an annotation processor that runs after thrift code gen, but
> > > reading
> > > > > thrift annotations that survive that codegen process.
> > > > > The codegen unit testing is done with the help of Google's
> > > compile-tester
> > > > > [6].  NB that this has an expected output comparison that checks
> the
> > > > > generated AST and not the text, so its fairly lenient.  Whitepsace
> > and
> > > > > comments certainly don't matter.
> > > > >
> > > > > Review strategy:
> > > > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> > > easier
> > > > > to review looking at samples of the generated code.  Both the
> thrift
> > > > > codegen and MyBatis codegen samples are conveniently contained in
> the
> > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> > > unit
> > > > > test uses resource files that contain both the thrift codegen
> inputs
> > > the
> > > > > annotation processor runs over and the annotation processor
> expected
> > > > > outputs  - the MyBatis peer classes.  So have a look there if you
> > need
> > > > > something concrete and don't want to patch the RBs in and actually
> > run
> > > > the
> > > > > codegen (`./gradlew api:compileJava`).
> > > > > The conversion RB (#3) is large but the changes are mainly
> mechanical
> > > > > conversions from the current mutable thrift + I* wrappers to pure
> > > > immutable
> > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> > > changes
> > > > > of note are to the portions of the codebase tightly tied to thrift
> > as a
> > > > > technology:
> > > > > + Gson/thrift converters
> > > > > + Shiro annotated auth param interception
> > > > > + Thrift/Servlet binding
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > [3]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > [4] https://github.com/facebook/swift
> > > > > [5] https://github.com/square/javapoet
> > > > > [6] https://github.com/google/compile-testing
> > > > >
> > > >
> > > > --
> > > > Zameer Manji
> > > >
> > > >
> > >
> >
>

Re: [PROPOSAL] Change java thrift code gen

Posted by Tony Dong <td...@twitter.com.INVALID>.
Awesome performance numbers! I don't particularly know the logistics of
upstreaming a change like this, but optimistically I would suggest
upstreaming it to Apache Thrift if possible.

As someone that maintains a fork of a thrift compiler(fork of scrooge), I
have to say that it's not that fun. There's a lot of custom code that needs
to be maintained and a bunch of work to rebase the code periodically.



On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wf...@apache.org> wrote:

> Firstly - thanks for the clean organization and delineation of steps in
> this change.  Top notch work!
>
> Some of the performance improvements are very nice; and in a particularly
> hot code path.  I will wager a guess that the majority of the savings is in
> avoiding what amounts to copy constructors between mutable and immutable
> types.  I further wager there are alternative approaches we could weigh to
> achieve those performance improvements.  As an example - you note above
> that we could provide a patch to Apache Thrift.  Depending how much
> performance inspires our decision here, it will be prudent to evaluate
> alternatives.
>
> I think there are (at least) two major issues worth discussing - code
> volume (which you note) and an increase in logical complexity.  This will
> leave us with a bifurcation in code generation tooling (custom+swift for
> Java, Apache Thrift for python and js).  It's difficult to quantify the
> downside of that, but it seems like an unfortunate state with potential for
> compatibility risks.
>
>
> On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org> wrote:
>
> > Some high level comments without looking at the code.
> >
> > I'm in favor from abandoning the thrift generated java code in favor of
> > immutable objects. I think it is easier to reason about and will ensure
> we
> > have less errors in our code. If I understand correctly, the ProtoBuf
> > format does this by default, so there some precedent for this style of
> code
> > generation already.
> >
> > I think using Facebook's swift is the best approach here. I would be
> > hesitant to accept any custom code generation that involved us parsing
> > thrift IDL files or thrift formats over the wire because I poses a very
> > high maintenance burden.
> >
> > I also think generating the MyBatis mutable classes is superior to our
> > current strategy of manually creating them.
> >
> > Finally, the performance improvements look fantastic. As an operator of a
> > large cluster I am excited to see wholesale performance improvements as I
> > am always concerned that my cluster is approaching the limits of what
> > Aurora can handle safely.
> >
> > Overall, I think this change merits a serious discussion from all
> > contributors.
> >
> > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org> wrote:
> >
> > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org>
> wrote:
> > >
> > > > Context: Aurora uses the official Apache Thrift compiler today plus a
> > > > home-grown python code generator [1] for immutable "entity" (I*)
> > > wrappers.
> > > >
> > > > The proposal is to switch from using the Apache Thrift code generator
> > to
> > > a
> > > > home grown generator.  The proposal comes with a concrete example in
> > the
> > > > form of the actual RBs to effect this change:
> > > > 1. A custom java thrift code generator:
> > > > https://reviews.apache.org/r/42748/
> > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > https://reviews.apache.org/r/42749/
> > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > https://reviews.apache.org/r/42756/
> > > >
> > > > Since the RBs are large, I wanted to provide some extra context on
> the
> > > > idea at a higher level.  I provide rationale, pros and cons below for
> > > those
> > > > interested in the idea but wary of diving in on code review until the
> > > idea
> > > > itself passes a sniff test.
> > > > Thanks in advance for your feedback - and if we get there - for your
> > > > review effort.
> > > >
> > >
> > > I just added wfarner and zmanji as reviewers for the 3 RBs above since
> > > they've expressed direct interest.  Happy to add others, just speak up
> or
> > > else just comment on the reviews as you see fit.
> > > I'll formally only submit if 1st this email thread reaches consensus,
> and
> > > second, reviews are approved.
> > >
> > > ==
> > > >
> > > > In the course of an initial run at creating a first-class REST-like
> > > > scheduler interface [2] I came to the conclusion generating the json
> > API
> > > > from the thrift one might be a good path.  That idea has been
> scrapped
> > > with
> > > > community feedback, but an initial experiment in custom thrift
> code-gen
> > > for
> > > > java that accompanied that idea seemed worth pursuing for its own
> > > > independent benefits, chief among these being 1st class immutable
> > thrift
> > > > structs and the ability to leverage thrift annotations.
> > > >
> > > > Immutability:
> > > > The benefits of having an immutable by default data model are the
> > > standard
> > > > ones; namely, its trivial to reason about safety of concurrent
> > operations
> > > > on the data model, stability of collections containing data model
> > > entities
> > > > and it opens up straight-forward optimizations that are easy to
> reason
> > > > about.
> > > > An example optimization is caching hashCodes for the immutable thrift
> > > > structs.  This was done after comparing jmh benchmarks run against
> > master
> > > > and then again against the proposal branch.  Perf was comparable -
> > within
> > > > 10% plus and minus depending on the benchmark, but with the
> > optimization
> > > > added many benchmarks showed pronounced improvement in the proposal
> > > branch
> > > > [3].  The optimization is clearly safe and was quick and easy to
> > > > implement.  Further optimizations can be experimented with in a
> > > > straightforward way.
> > > >
> > > > Thrift Annotations:
> > > > The thrift IDL grammar has supported these for quite some time, but
> > they
> > > > are not plumbed to the generated java code.  Uses are many and
> > varied.  I
> > > > initially had my eye on annotation of thrift services with REST
> verbs,
> > > > routes, etc - but immediately we can leverage these annotations to
> kill
> > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code
> that
> > > > needs to be maintained.
> > > >
> > > > There are a few downsides to switching to our own java thrift code
> gen:
> > > > 1. We own more code to maintain:  Even though we have the custom
> python
> > > > "immutable" wrapper generator [1] today, this new generator - even
> with
> > > the
> > > > python generator removed - represents a 5-6x increase in line count
> of
> > > > custom code (~4.1k lines of code and tests in the new custom gen,
> ~700
> > > > lines in the existing python custom gen)
> > > > 2. We conceptually fork from a sibling Apache project.
> > > >
> > > > The fork could be mitigated by turning our real experience iterating
> > the
> > > > custom code generator into a well-founded patch back into the Apache
> > > Thrift
> > > > project, but saying we'll do that is easier than following through
> and
> > > > actually doing it.
> > > >
> > > > ==
> > > > Review guide / details:
> > > >
> > > > The technology stack:
> > > > The thrift IDL parsing and thrift wire parsing are both handled by
> the
> > > > Facebook swift project [4].  We only implement the middle bit that
> > > > generates java code stubs.  This gives higher confidence that the
> > tricky
> > > > bits out at either edge are done right.
> > > > The thrift struct code generation is done using Square's javapoet [5]
> > in
> > > > favor of templating for the purpose of easier to read generator code.
> > > This
> > > > characterization is debatable though and template are certainly more
> > > > flexible the minute you need to gen a second language (say we like
> this
> > > and
> > > > want to do javascript codegen this way too for example).
> > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > reasons.  In short, there is no simple way to teach MyBatis to read
> and
> > > > write immutable objects with builders.  So the MyBatis code is
> > generated
> > > > via an annotation processor that runs after thrift code gen, but
> > reading
> > > > thrift annotations that survive that codegen process.
> > > > The codegen unit testing is done with the help of Google's
> > compile-tester
> > > > [6].  NB that this has an expected output comparison that checks the
> > > > generated AST and not the text, so its fairly lenient.  Whitepsace
> and
> > > > comments certainly don't matter.
> > > >
> > > > Review strategy:
> > > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> > easier
> > > > to review looking at samples of the generated code.  Both the thrift
> > > > codegen and MyBatis codegen samples are conveniently contained in the
> > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> > unit
> > > > test uses resource files that contain both the thrift codegen inputs
> > the
> > > > annotation processor runs over and the annotation processor expected
> > > > outputs  - the MyBatis peer classes.  So have a look there if you
> need
> > > > something concrete and don't want to patch the RBs in and actually
> run
> > > the
> > > > codegen (`./gradlew api:compileJava`).
> > > > The conversion RB (#3) is large but the changes are mainly mechanical
> > > > conversions from the current mutable thrift + I* wrappers to pure
> > > immutable
> > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> > changes
> > > > of note are to the portions of the codebase tightly tied to thrift
> as a
> > > > technology:
> > > > + Gson/thrift converters
> > > > + Shiro annotated auth param interception
> > > > + Thrift/Servlet binding
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > [3]
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > [4] https://github.com/facebook/swift
> > > > [5] https://github.com/square/javapoet
> > > > [6] https://github.com/google/compile-testing
> > > >
> > >
> > > --
> > > Zameer Manji
> > >
> > >
> >
>

Re: [PROPOSAL] Change java thrift code gen

Posted by Bill Farner <wf...@apache.org>.
Firstly - thanks for the clean organization and delineation of steps in
this change.  Top notch work!

Some of the performance improvements are very nice; and in a particularly
hot code path.  I will wager a guess that the majority of the savings is in
avoiding what amounts to copy constructors between mutable and immutable
types.  I further wager there are alternative approaches we could weigh to
achieve those performance improvements.  As an example - you note above
that we could provide a patch to Apache Thrift.  Depending how much
performance inspires our decision here, it will be prudent to evaluate
alternatives.

I think there are (at least) two major issues worth discussing - code
volume (which you note) and an increase in logical complexity.  This will
leave us with a bifurcation in code generation tooling (custom+swift for
Java, Apache Thrift for python and js).  It's difficult to quantify the
downside of that, but it seems like an unfortunate state with potential for
compatibility risks.


On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zm...@apache.org> wrote:

> Some high level comments without looking at the code.
>
> I'm in favor from abandoning the thrift generated java code in favor of
> immutable objects. I think it is easier to reason about and will ensure we
> have less errors in our code. If I understand correctly, the ProtoBuf
> format does this by default, so there some precedent for this style of code
> generation already.
>
> I think using Facebook's swift is the best approach here. I would be
> hesitant to accept any custom code generation that involved us parsing
> thrift IDL files or thrift formats over the wire because I poses a very
> high maintenance burden.
>
> I also think generating the MyBatis mutable classes is superior to our
> current strategy of manually creating them.
>
> Finally, the performance improvements look fantastic. As an operator of a
> large cluster I am excited to see wholesale performance improvements as I
> am always concerned that my cluster is approaching the limits of what
> Aurora can handle safely.
>
> Overall, I think this change merits a serious discussion from all
> contributors.
>
> On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org> wrote:
>
> > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org> wrote:
> >
> > > Context: Aurora uses the official Apache Thrift compiler today plus a
> > > home-grown python code generator [1] for immutable "entity" (I*)
> > wrappers.
> > >
> > > The proposal is to switch from using the Apache Thrift code generator
> to
> > a
> > > home grown generator.  The proposal comes with a concrete example in
> the
> > > form of the actual RBs to effect this change:
> > > 1. A custom java thrift code generator:
> > > https://reviews.apache.org/r/42748/
> > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > https://reviews.apache.org/r/42749/
> > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > https://reviews.apache.org/r/42756/
> > >
> > > Since the RBs are large, I wanted to provide some extra context on the
> > > idea at a higher level.  I provide rationale, pros and cons below for
> > those
> > > interested in the idea but wary of diving in on code review until the
> > idea
> > > itself passes a sniff test.
> > > Thanks in advance for your feedback - and if we get there - for your
> > > review effort.
> > >
> >
> > I just added wfarner and zmanji as reviewers for the 3 RBs above since
> > they've expressed direct interest.  Happy to add others, just speak up or
> > else just comment on the reviews as you see fit.
> > I'll formally only submit if 1st this email thread reaches consensus, and
> > second, reviews are approved.
> >
> > ==
> > >
> > > In the course of an initial run at creating a first-class REST-like
> > > scheduler interface [2] I came to the conclusion generating the json
> API
> > > from the thrift one might be a good path.  That idea has been scrapped
> > with
> > > community feedback, but an initial experiment in custom thrift code-gen
> > for
> > > java that accompanied that idea seemed worth pursuing for its own
> > > independent benefits, chief among these being 1st class immutable
> thrift
> > > structs and the ability to leverage thrift annotations.
> > >
> > > Immutability:
> > > The benefits of having an immutable by default data model are the
> > standard
> > > ones; namely, its trivial to reason about safety of concurrent
> operations
> > > on the data model, stability of collections containing data model
> > entities
> > > and it opens up straight-forward optimizations that are easy to reason
> > > about.
> > > An example optimization is caching hashCodes for the immutable thrift
> > > structs.  This was done after comparing jmh benchmarks run against
> master
> > > and then again against the proposal branch.  Perf was comparable -
> within
> > > 10% plus and minus depending on the benchmark, but with the
> optimization
> > > added many benchmarks showed pronounced improvement in the proposal
> > branch
> > > [3].  The optimization is clearly safe and was quick and easy to
> > > implement.  Further optimizations can be experimented with in a
> > > straightforward way.
> > >
> > > Thrift Annotations:
> > > The thrift IDL grammar has supported these for quite some time, but
> they
> > > are not plumbed to the generated java code.  Uses are many and
> varied.  I
> > > initially had my eye on annotation of thrift services with REST verbs,
> > > routes, etc - but immediately we can leverage these annotations to kill
> > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> > > needs to be maintained.
> > >
> > > There are a few downsides to switching to our own java thrift code gen:
> > > 1. We own more code to maintain:  Even though we have the custom python
> > > "immutable" wrapper generator [1] today, this new generator - even with
> > the
> > > python generator removed - represents a 5-6x increase in line count of
> > > custom code (~4.1k lines of code and tests in the new custom gen, ~700
> > > lines in the existing python custom gen)
> > > 2. We conceptually fork from a sibling Apache project.
> > >
> > > The fork could be mitigated by turning our real experience iterating
> the
> > > custom code generator into a well-founded patch back into the Apache
> > Thrift
> > > project, but saying we'll do that is easier than following through and
> > > actually doing it.
> > >
> > > ==
> > > Review guide / details:
> > >
> > > The technology stack:
> > > The thrift IDL parsing and thrift wire parsing are both handled by the
> > > Facebook swift project [4].  We only implement the middle bit that
> > > generates java code stubs.  This gives higher confidence that the
> tricky
> > > bits out at either edge are done right.
> > > The thrift struct code generation is done using Square's javapoet [5]
> in
> > > favor of templating for the purpose of easier to read generator code.
> > This
> > > characterization is debatable though and template are certainly more
> > > flexible the minute you need to gen a second language (say we like this
> > and
> > > want to do javascript codegen this way too for example).
> > > The MyBatis codegen is forced by the thrift codegen for technical
> > > reasons.  In short, there is no simple way to teach MyBatis to read and
> > > write immutable objects with builders.  So the MyBatis code is
> generated
> > > via an annotation processor that runs after thrift code gen, but
> reading
> > > thrift annotations that survive that codegen process.
> > > The codegen unit testing is done with the help of Google's
> compile-tester
> > > [6].  NB that this has an expected output comparison that checks the
> > > generated AST and not the text, so its fairly lenient.  Whitepsace and
> > > comments certainly don't matter.
> > >
> > > Review strategy:
> > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> easier
> > > to review looking at samples of the generated code.  Both the thrift
> > > codegen and MyBatis codegen samples are conveniently contained in the
> > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> unit
> > > test uses resource files that contain both the thrift codegen inputs
> the
> > > annotation processor runs over and the annotation processor expected
> > > outputs  - the MyBatis peer classes.  So have a look there if you need
> > > something concrete and don't want to patch the RBs in and actually run
> > the
> > > codegen (`./gradlew api:compileJava`).
> > > The conversion RB (#3) is large but the changes are mainly mechanical
> > > conversions from the current mutable thrift + I* wrappers to pure
> > immutable
> > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> changes
> > > of note are to the portions of the codebase tightly tied to thrift as a
> > > technology:
> > > + Gson/thrift converters
> > > + Shiro annotated auth param interception
> > > + Thrift/Servlet binding
> > >
> > > [1]
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > [3]
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > [4] https://github.com/facebook/swift
> > > [5] https://github.com/square/javapoet
> > > [6] https://github.com/google/compile-testing
> > >
> >
> > --
> > Zameer Manji
> >
> >
>

Re: [PROPOSAL] Change java thrift code gen

Posted by Zameer Manji <zm...@apache.org>.
Some high level comments without looking at the code.

I'm in favor from abandoning the thrift generated java code in favor of
immutable objects. I think it is easier to reason about and will ensure we
have less errors in our code. If I understand correctly, the ProtoBuf
format does this by default, so there some precedent for this style of code
generation already.

I think using Facebook's swift is the best approach here. I would be
hesitant to accept any custom code generation that involved us parsing
thrift IDL files or thrift formats over the wire because I poses a very
high maintenance burden.

I also think generating the MyBatis mutable classes is superior to our
current strategy of manually creating them.

Finally, the performance improvements look fantastic. As an operator of a
large cluster I am excited to see wholesale performance improvements as I
am always concerned that my cluster is approaching the limits of what
Aurora can handle safely.

Overall, I think this change merits a serious discussion from all
contributors.

On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <js...@apache.org> wrote:

> On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org> wrote:
>
> > Context: Aurora uses the official Apache Thrift compiler today plus a
> > home-grown python code generator [1] for immutable "entity" (I*)
> wrappers.
> >
> > The proposal is to switch from using the Apache Thrift code generator to
> a
> > home grown generator.  The proposal comes with a concrete example in the
> > form of the actual RBs to effect this change:
> > 1. A custom java thrift code generator:
> > https://reviews.apache.org/r/42748/
> > 2. A custom MyBatis binding code generator powered by 1 above:
> > https://reviews.apache.org/r/42749/
> > 3. Integration of 1 & 2 above into the Aurora codebase:
> > https://reviews.apache.org/r/42756/
> >
> > Since the RBs are large, I wanted to provide some extra context on the
> > idea at a higher level.  I provide rationale, pros and cons below for
> those
> > interested in the idea but wary of diving in on code review until the
> idea
> > itself passes a sniff test.
> > Thanks in advance for your feedback - and if we get there - for your
> > review effort.
> >
>
> I just added wfarner and zmanji as reviewers for the 3 RBs above since
> they've expressed direct interest.  Happy to add others, just speak up or
> else just comment on the reviews as you see fit.
> I'll formally only submit if 1st this email thread reaches consensus, and
> second, reviews are approved.
>
> ==
> >
> > In the course of an initial run at creating a first-class REST-like
> > scheduler interface [2] I came to the conclusion generating the json API
> > from the thrift one might be a good path.  That idea has been scrapped
> with
> > community feedback, but an initial experiment in custom thrift code-gen
> for
> > java that accompanied that idea seemed worth pursuing for its own
> > independent benefits, chief among these being 1st class immutable thrift
> > structs and the ability to leverage thrift annotations.
> >
> > Immutability:
> > The benefits of having an immutable by default data model are the
> standard
> > ones; namely, its trivial to reason about safety of concurrent operations
> > on the data model, stability of collections containing data model
> entities
> > and it opens up straight-forward optimizations that are easy to reason
> > about.
> > An example optimization is caching hashCodes for the immutable thrift
> > structs.  This was done after comparing jmh benchmarks run against master
> > and then again against the proposal branch.  Perf was comparable - within
> > 10% plus and minus depending on the benchmark, but with the optimization
> > added many benchmarks showed pronounced improvement in the proposal
> branch
> > [3].  The optimization is clearly safe and was quick and easy to
> > implement.  Further optimizations can be experimented with in a
> > straightforward way.
> >
> > Thrift Annotations:
> > The thrift IDL grammar has supported these for quite some time, but they
> > are not plumbed to the generated java code.  Uses are many and varied.  I
> > initially had my eye on annotation of thrift services with REST verbs,
> > routes, etc - but immediately we can leverage these annotations to kill
> > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> > needs to be maintained.
> >
> > There are a few downsides to switching to our own java thrift code gen:
> > 1. We own more code to maintain:  Even though we have the custom python
> > "immutable" wrapper generator [1] today, this new generator - even with
> the
> > python generator removed - represents a 5-6x increase in line count of
> > custom code (~4.1k lines of code and tests in the new custom gen, ~700
> > lines in the existing python custom gen)
> > 2. We conceptually fork from a sibling Apache project.
> >
> > The fork could be mitigated by turning our real experience iterating the
> > custom code generator into a well-founded patch back into the Apache
> Thrift
> > project, but saying we'll do that is easier than following through and
> > actually doing it.
> >
> > ==
> > Review guide / details:
> >
> > The technology stack:
> > The thrift IDL parsing and thrift wire parsing are both handled by the
> > Facebook swift project [4].  We only implement the middle bit that
> > generates java code stubs.  This gives higher confidence that the tricky
> > bits out at either edge are done right.
> > The thrift struct code generation is done using Square's javapoet [5] in
> > favor of templating for the purpose of easier to read generator code.
> This
> > characterization is debatable though and template are certainly more
> > flexible the minute you need to gen a second language (say we like this
> and
> > want to do javascript codegen this way too for example).
> > The MyBatis codegen is forced by the thrift codegen for technical
> > reasons.  In short, there is no simple way to teach MyBatis to read and
> > write immutable objects with builders.  So the MyBatis code is generated
> > via an annotation processor that runs after thrift code gen, but reading
> > thrift annotations that survive that codegen process.
> > The codegen unit testing is done with the help of Google's compile-tester
> > [6].  NB that this has an expected output comparison that checks the
> > generated AST and not the text, so its fairly lenient.  Whitepsace and
> > comments certainly don't matter.
> >
> > Review strategy:
> > The code generator RBs (#1 & #2 in the 3 part series) are probably easier
> > to review looking at samples of the generated code.  Both the thrift
> > codegen and MyBatis codegen samples are conveniently contained in the
> > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The unit
> > test uses resource files that contain both the thrift codegen inputs the
> > annotation processor runs over and the annotation processor expected
> > outputs  - the MyBatis peer classes.  So have a look there if you need
> > something concrete and don't want to patch the RBs in and actually run
> the
> > codegen (`./gradlew api:compileJava`).
> > The conversion RB (#3) is large but the changes are mainly mechanical
> > conversions from the current mutable thrift + I* wrappers to pure
> immutable
> > thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
> > of note are to the portions of the codebase tightly tied to thrift as a
> > technology:
> > + Gson/thrift converters
> > + Shiro annotated auth param interception
> > + Thrift/Servlet binding
> >
> > [1]
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > [2] https://issues.apache.org/jira/browse/AURORA-987
> > [3]
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > [4] https://github.com/facebook/swift
> > [5] https://github.com/square/javapoet
> > [6] https://github.com/google/compile-testing
> >
>
> --
> Zameer Manji
>
>

Re: [PROPOSAL] Change java thrift code gen

Posted by John Sirois <js...@apache.org>.
On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <js...@apache.org> wrote:

> Context: Aurora uses the official Apache Thrift compiler today plus a
> home-grown python code generator [1] for immutable "entity" (I*) wrappers.
>
> The proposal is to switch from using the Apache Thrift code generator to a
> home grown generator.  The proposal comes with a concrete example in the
> form of the actual RBs to effect this change:
> 1. A custom java thrift code generator:
> https://reviews.apache.org/r/42748/
> 2. A custom MyBatis binding code generator powered by 1 above:
> https://reviews.apache.org/r/42749/
> 3. Integration of 1 & 2 above into the Aurora codebase:
> https://reviews.apache.org/r/42756/
>
> Since the RBs are large, I wanted to provide some extra context on the
> idea at a higher level.  I provide rationale, pros and cons below for those
> interested in the idea but wary of diving in on code review until the idea
> itself passes a sniff test.
> Thanks in advance for your feedback - and if we get there - for your
> review effort.
>

I just added wfarner and zmanji as reviewers for the 3 RBs above since
they've expressed direct interest.  Happy to add others, just speak up or
else just comment on the reviews as you see fit.
I'll formally only submit if 1st this email thread reaches consensus, and
second, reviews are approved.

==
>
> In the course of an initial run at creating a first-class REST-like
> scheduler interface [2] I came to the conclusion generating the json API
> from the thrift one might be a good path.  That idea has been scrapped with
> community feedback, but an initial experiment in custom thrift code-gen for
> java that accompanied that idea seemed worth pursuing for its own
> independent benefits, chief among these being 1st class immutable thrift
> structs and the ability to leverage thrift annotations.
>
> Immutability:
> The benefits of having an immutable by default data model are the standard
> ones; namely, its trivial to reason about safety of concurrent operations
> on the data model, stability of collections containing data model entities
> and it opens up straight-forward optimizations that are easy to reason
> about.
> An example optimization is caching hashCodes for the immutable thrift
> structs.  This was done after comparing jmh benchmarks run against master
> and then again against the proposal branch.  Perf was comparable - within
> 10% plus and minus depending on the benchmark, but with the optimization
> added many benchmarks showed pronounced improvement in the proposal branch
> [3].  The optimization is clearly safe and was quick and easy to
> implement.  Further optimizations can be experimented with in a
> straightforward way.
>
> Thrift Annotations:
> The thrift IDL grammar has supported these for quite some time, but they
> are not plumbed to the generated java code.  Uses are many and varied.  I
> initially had my eye on annotation of thrift services with REST verbs,
> routes, etc - but immediately we can leverage these annotations to kill
> AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> needs to be maintained.
>
> There are a few downsides to switching to our own java thrift code gen:
> 1. We own more code to maintain:  Even though we have the custom python
> "immutable" wrapper generator [1] today, this new generator - even with the
> python generator removed - represents a 5-6x increase in line count of
> custom code (~4.1k lines of code and tests in the new custom gen, ~700
> lines in the existing python custom gen)
> 2. We conceptually fork from a sibling Apache project.
>
> The fork could be mitigated by turning our real experience iterating the
> custom code generator into a well-founded patch back into the Apache Thrift
> project, but saying we'll do that is easier than following through and
> actually doing it.
>
> ==
> Review guide / details:
>
> The technology stack:
> The thrift IDL parsing and thrift wire parsing are both handled by the
> Facebook swift project [4].  We only implement the middle bit that
> generates java code stubs.  This gives higher confidence that the tricky
> bits out at either edge are done right.
> The thrift struct code generation is done using Square's javapoet [5] in
> favor of templating for the purpose of easier to read generator code.  This
> characterization is debatable though and template are certainly more
> flexible the minute you need to gen a second language (say we like this and
> want to do javascript codegen this way too for example).
> The MyBatis codegen is forced by the thrift codegen for technical
> reasons.  In short, there is no simple way to teach MyBatis to read and
> write immutable objects with builders.  So the MyBatis code is generated
> via an annotation processor that runs after thrift code gen, but reading
> thrift annotations that survive that codegen process.
> The codegen unit testing is done with the help of Google's compile-tester
> [6].  NB that this has an expected output comparison that checks the
> generated AST and not the text, so its fairly lenient.  Whitepsace and
> comments certainly don't matter.
>
> Review strategy:
> The code generator RBs (#1 & #2 in the 3 part series) are probably easier
> to review looking at samples of the generated code.  Both the thrift
> codegen and MyBatis codegen samples are conveniently contained in the
> MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The unit
> test uses resource files that contain both the thrift codegen inputs the
> annotation processor runs over and the annotation processor expected
> outputs  - the MyBatis peer classes.  So have a look there if you need
> something concrete and don't want to patch the RBs in and actually run the
> codegen (`./gradlew api:compileJava`).
> The conversion RB (#3) is large but the changes are mainly mechanical
> conversions from the current mutable thrift + I* wrappers to pure immutable
> thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
> of note are to the portions of the codebase tightly tied to thrift as a
> technology:
> + Gson/thrift converters
> + Shiro annotated auth param interception
> + Thrift/Servlet binding
>
> [1]
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> [2] https://issues.apache.org/jira/browse/AURORA-987
> [3]
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> [4] https://github.com/facebook/swift
> [5] https://github.com/square/javapoet
> [6] https://github.com/google/compile-testing
>