Posted to dev@cassandra.apache.org by Pierre Devops <pi...@gmail.com> on 2015/04/02 10:31:08 UTC

Re: [discuss] Modernization of Cassandra build system

Hi all,

Not a cassandra contributor here, but I'm working on the cassandra sources
too.

This big Cassandra source root caused me trouble too. Firstly, it was not
easy to import into an IDE; try importing the Cassandra sources into
NetBeans, it's a headache.

It would be great if we had more small modules/projects in separate POMs.
It would be easier to work on a small part of the project and, as a
consequence, I'm sure you would get more external contributions to this
project.

I know Cassandra devs are used to the ant build model, but it's like a
thread I opened asking for updated and more complete documentation about
sstable structures. The answer I got was that this is not needed to
understand how to use Cassandra, and that the only way to learn about it is
to rtfcode. Because people working on Cassandra already know what the
sstable structures look like, providing up-to-date documentation is not
considered necessary.
So it will take me a very long time to read and understand all the
serialization code in Cassandra to understand the sstable structure before
I can work on the code. Up-to-date documentation about internals would have
given me the knowledge I need to contribute much quicker.

Here we have the same problem: we have a complex, non-modular build system,
and core Cassandra devs are used to it, so making something more flexible
is not seen as needed, even if it could facilitate external contributions.



2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
belliottsmith@datastax.com>:

> I think the problem is everyone currently contributing is comfortable with
> ant, and as much as it is imperfect, it isn't clear maven is going to be
> better. Having the requisite maven functionality linked under the hood
> doesn't seem particularly preferable to the inverse. The status quo has the
> bonus of zero upheaval for the project and its contributors, though, so it
> would have to be a very clear win to justify the change in my opinion.
>
>
> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <lu...@code-house.org>
> wrote:
>
> > Hey Tyler,
> > Thank you very much for coming back. I had already lost faith that I
> > would get a reply. :-) I am fine with code relocations. Moving constants
> > into one place where they cause no circular dependencies is cool, I’m
> > all for doing such a thing.
> >
> > Currently Cassandra uses ant for some Maven functionality (such as
> > deploying pom.xml into repositories with dependency information), and it
> > also uses Maven-type artifact repositories. This can easily be flipped:
> > Maven can call ant tasks for those parts which cannot be done with
> > existing Maven plugins. Here is the simplest example:
> > http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see
> > an ant task definition embedded in a Maven pom.xml.
> >
> > Most things can be done at this moment via Maven plugins:
> > apache-rat-plugin:
> > http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
> > maven-thrift-plugin:
> > http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> > antlr4-maven-plugin:
> > http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5
> > or antlr3-maven-plugin:
> > http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
> > maven-gpg-plugin:
> > http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> > cobertura-maven-plugin:
> > http://mojo.codehaus.org/cobertura-maven-plugin/ (but these days jacoco
> > with java agent instrumentation performs better)
> > .. and so on
> >
> > I have already made some evaluation of the impact and it is big. Code
> > has to be separated into different source roots. That is not easy even
> > when keeping the current artifact structure: cassandra-all,
> > cassandra-thrift and clientutil (because of cyclic dependencies). What I
> > can do is prepare these source roots with the dependencies declared for
> > them and push that to my Cassandra fork, so you will be able to verify
> > it and continue with relocations if you like the new build. Creating new
> > modules (source roots) with Maven is simple, so you could possibly
> > extract more than these 3 predefined artifacts/package roots.
> > Just let me know if you are interested.
> >
> > Kind regards,
> > Lukasz
> >
> >
> > > Message written by Tyler Hobbs <ty...@datastax.com> on 31 Mar 2015,
> > > at 21:57:
> > >
> > > Hi Łukasz,
> > >
> > > I'm not very familiar with the build system, but I'll try to respond.
> > >
> > > The Serializer dependencies on org.apache.cassandra.transport are
> > > almost certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.
> > > These are constants that represent the native protocol version in use,
> > > which affects how certain types are serialized.  These constants could
> > > easily be moved.
> > >
> > > The o.a.c.marshal dependency in MapSerializer is on AbstractType, but
> > > could easily be replaced with java.util.Comparator.
> > >
> > > In any case, I'm not necessarily opposed to improving the build system
> > > to make these errors more apparent.  Would your proposal still allow
> > > us to build with ant (and just change the way those artifacts are
> > > built)?
> > >
> > > On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <luke@code-house.org>
> > > wrote:
> > >
> > >> Dear Cassandra committers and development process followers,
> > >> I would like to bring up an important topic: the build process of
> > >> Cassandra. I am an external user from the community point of view;
> > >> however, I have been walking around various projects close to
> > >> Cassandra over the past year or even more. What worries me a lot is
> > >> how Cassandra publishes artifacts and how many problems are reported
> > >> because of that.
> > >>
> > >> First of all - I want to note that I am not a born enemy of Ant
> > >> itself. I never used it. I am also aware of problems with custom
> > >> builds made with Maven; however, I don’t really want to discuss any
> > >> particular replacement. Still, I want to note that the Cassandra JIRA
> > >> project contains about 116 issues related somehow to maven
> > >> (http://bit.ly/1GRoXl5, project=CASSANDRA, text ~ maven). Depending on
> > >> the point of view it might be a lot or a little. By simple statistics
> > >> it is around 21 issues a year or almost 2 issues a month, many of them
> > >> breaking maintenance/major releases from the user point of view. On
> > >> the other hand, it’s not bad considering how the project is being
> > >> built.
> > >>
> > >> The current structure has a very big disadvantage - ONE source root
> > >> for multiple artifacts published in maven repositories, with classes
> > >> copied to jars AFTER they are compiled. Obviously the ant copy task
> > >> doesn’t follow import statements and does not include dependent
> > >> classes. For example, just by making test relocations and extracting
> > >> the clientutil jar on the master branch into a separate source root I
> > >> found a bug where ListSerializer depends on the
> > >> org.apache.cassandra.transport package. Moreover, clientutil
> > >> (MapSerializer) depends on the org.apache.cassandra.db.marshal
> > >> package, which means it cannot be used without cassandra-all present
> > >> on the classpath.
> > >> Luckily for Cassandra, CQL as a new interface reduces thrift and
> > >> clientutil usage, reducing the number of issues reported around
> > >> these; however, this just hides the real problem from the previous
> > >> paragraph. I have found a handy tool and made a graph of circular
> > >> dependencies in cassandra-all.jar. The graph of results can be found
> > >> here: http://grab.by/FRnO. As you can see, this graph has multiple
> > >> levels and solving it is not a simple task.
> > >> I am afraid the current way of building and packaging Cassandra can
> > >> create huge hiccups when it comes to code refactorings, because the
> > >> entire Cassandra will become a house of cards.
> > >> Restructuring the project into smaller pieces is also beneficial for
> > >> the community, since solving bugs in smaller units is definitely
> > >> easier.
> > >>
> > >> At the end of this mail I would like to propose moving the Cassandra
> > >> build system forward, regardless of which tool is chosen for it.
> > >> Personally I can volunteer for maven-related changes to extract
> > >> cassandra-thrift, cassandra-clientutil and cassandra-all to make a
> > >> regular maven build. It might be seen as a switch from one big XML
> > >> into a couple of smaller ones. :-) All this depends on the Cassandra
> > >> developers' decision whether to divide the source roots or not.
> > >>
> > >> Kind regards,
> > >> Łukasz Dywicki
> > >> —
> > >> luke@code-house.org
> > >> Twitter: ldywicki
> > >> Blog: http://dywicki.pl
> > >> Code-House - http://code-house.org
> > >>
> > >>
> > >
> > >
> > > --
> > > Tyler Hobbs
> > > DataStax <http://datastax.com/>
> >
> >
>

Re: [discuss] Modernization of Cassandra build system

Posted by Benedict Elliott Smith <be...@datastax.com>.
>
> every second minor release was fixing maven artifacts OR every second
> release was broken due to the maven artifacts


Well, it's also possible just one release had 116 build artefact problems?
Obviously that's the absurd extreme end, but that's the reason I was asking
if you had any idea, since you'd done the counting.

> I don’t feel myself responsible for doing any advocating for Maven itself.
> It’s up to you what you choose.


This is a community process, and I'm trying (and apparently failing) to
help you understand at least how *I* understand it to work, and the
problems I see with what you're proposing. The silence on the list suggests
there is significant inertia and no other strong advocates for this change.
This could be for myriad reasons, from people simply not caring, to
thinking there are roughly equal pros and cons, to also just hoping the
conversation will go away because they're against it. Without advocacy, the
inertia is not overcome, and since you're the only person so far to express
a desire for this change, it is unfortunately up to you to convince us. I,
and I'm sure the rest of the community, are very appreciative of the offer
of your time. We really are. Unfortunately that isn't enough to warrant
utilising it, but we *are* open to discussion and advocacy on the topic.

The crux of the problem is that Cassandra has a lot of important work being
done to it, work that I personally perceive (and suspect others do also) as
more important than the admitted inadequacy of our modularisation and,
perhaps, our build system (I plead ignorance here). This work is currently
surpassing the labour we have to address it. If this upheaval hinders that
work, that is bad, and that is what I mean when I say "warrants" - is the
upheaval small enough, or the yield really great (modularisation doesn't
always pan out, so we may not even get a good result, but still have the
significant pain)?

I don't want to give you the impression I am either a gatekeeper or
shooting down your proposal. I'm just attempting to explain my perception
of the view of the existing contributors.


On Mon, Apr 13, 2015 at 9:31 PM, Łukasz Dywicki <lu...@code-house.org> wrote:

> Hey Benedict,
> My replies in line
>
>
> >> According to some recordings from DataStax there is a plan to support in
> >> Cassandra multiple kinds of store - document, graph so it won’t get
> >> easier with the time but rather harder - ask yourself do you really want
> >> to mess all these things together?
> > Well, these certainly won't live in the same repository, so I wouldn't
> > worry about that
> That’s good. That’s very good, because it will force separation. If you
> do that, please consider using another build system so you don’t repeat
> the mistakes which are present now in the main Cassandra build.
>
> >> As I briefly counted in my earlier mail there were 116 issues related to
> >> artifacts published by the build process.
> > That does sound like a lot of bugs. How many actual maintenance releases
> > were necessary, did you happen to also count? This is something that
> > could be raised at the new retrospective that Ariel has begun, to see if
> > there's anything that can be done to reduce their incidence and risk.
> There have been 159 minor releases of Cassandra
> (git tag --list | egrep -v rc | egrep -v beta | wc -l). I did not track
> exactly what the correlation with the bug ratio is. These 116 vs 159 are
> just numbers. From my understanding there are 116 unnecessary issues which
> could have been avoided. You can read these numbers in two different ways -
> every second minor release was fixing maven artifacts OR every second
> release was broken due to the maven artifacts. It seems you prefer the
> first one while users usually observe the second.
>
>
> >> however it gives a real boost when it comes to community donations,
> >> tool development, or even debugging
> > You're conflating the task of upgrading the build system with
> > modularisation, which is a bad idea if you want to make progress on
> > either one, since they're each a different and difficult discussion,
> > even if they relate.
> I do that because this is a typical chicken-and-egg problem. One thing
> cannot be done without the other; it’s just a question of which one comes
> first. Code modularization/package separation without strict bounds is
> hard to follow. However, nothing prevents doing this in reverse order - by
> solving code issues first and then introducing a new build tool. It’s up
> to the Cassandra developers to decide.
>
> > On the topic of the build system: if you can justify why you think Maven
> > has a significant chance of reducing our bug burden here, a case can
> > perhaps be made, and I will defer to the members of this list with more
> > experience of our build system for that in depth discussion. At the
> > moment, it seems to be taken as a given this would occur, but I don't
> > yet see a clear reason that we should expect this to occur.
> You see - I don’t have to justify Maven. I have offered you help with
> it. I also gave you a couple of reasons why Ant is not a first-choice
> tool these days. I don’t feel myself responsible for doing any advocating
> for Maven itself. It’s up to you what you choose. The major thing, the
> major problem which modern tools solve for you, is build-time classpath
> management (both compile & test) and separate javac executions for both
> of these. Take what you prefer - gradle, sbt, leiningen. Anything which
> does the things from the previous sentence. Do your own evaluation. Take
> what works for you, not only for me.
>
> > On the topic of modularisation: Like I said previously, everyone on this
> > list is sympathetic to that goal, I think. However the practical reality
> > is likely to be too confounding. But that doesn't mean it is absolutely
> > a losing battle, if you can demonstrate a sufficiently painless and
> > worthwhile transition.
> I don't quite get you at this point. On one side you suppose everyone is
> for taking such a step, on the other you ask for proof. In the case of
> code relocation there are always multiple ways. What you have currently
> forces you to solve multiple problems. You can start on any of them (e.g.
> the circular dependencies I mentioned in the earlier conversation don’t
> require changing a tool). From the place where you stand at this moment
> there will be no such thing as a painless transition. As said earlier -
> it will only get harder over time.
> To give an example from my life: we do use Cassandra. We have plenty of
> mid-level integration tests which verify end-to-end functionality,
> starting from the frontend or messaging layer up to data persistence. Now
> each of our tests, even if it involves a small amount of data, hits IO
> on multiple levels - starting from the socket and ending on disk. We do
> not test consistency levels in such cases, as that is assumed to be
> tested by Cassandra itself - we are ensuring that incoming data passes
> the storage interface and can be retrieved back via the same interface.
> With what Cassandra is now we cannot make our tests run fast. People are
> prisoners of cassandra-unit because embedding Cassandra is impossible,
> even though it’s written in a portable language. It has too many inner
> and outer dependencies. On the other hand we have, for example, ActiveMQ,
> which has lots of options. Even with all of these it can be embedded
> with no stress, making people use it for tests even if in production
> they use a different messaging provider. Because it’s dead easy.
> Looking at things such as netty or the jackson json processor, which
> consisted of just two or three modules in its 1.x version, you can find
> fasterxml-jackson 2.x continuing the library's evolution in a much wider
> way. It provides a more customizable approach, supports pluggable data
> formats, data types and so on. Library users did suffer a bit from the
> changes, package renaming and all the crazy stuff which was going on,
> but now only legacy projects depend on the old 1.x version.
> Please don’t get me wrong - I don't want to compare a library with a
> database - I am just showing an approach which is affecting popular
> software. Also, as mentioned above, even entire systems which are older
> and have a similar complexity level to Cassandra are doing better these
> days than you. All because they have several jars more. From the
> assembly point of view, for users who just download a ZIP and unpack it,
> it doesn’t change anything whether you have cassandra-all only or have
> divided it into 10 pieces, but from the developers' point of view it
> makes a huge difference, because these people can decide what parts of
> Cassandra they actually need and in which configuration.
>
> Kind regards,
> Lukasz
>
> > On Sat, Apr 11, 2015 at 11:12 AM, Łukasz Dywicki <lu...@code-house.org>
> > wrote:
> >
> >> Sorry for not coming back to the topic for a long time.
> >>
> >> You are right that what the Cassandra project has currently does work,
> >> and keeping package scoping discipline in such a big development
> >> community as Cassandra is clearly impossible without tool support (if
> >> you insist on keeping ant, please try to separate the javac tasks for
> >> logical parts in the current build to verify that). I clearly pointed
> >> out that it doesn’t work in a reliable way, causing trouble with the
> >> artifacts uploaded to maven central. As I briefly counted in my earlier
> >> mail there were 116 issues related to artifacts published by the build
> >> process. It is a lot, and these changes require further maintenance
> >> releases to fix, for example, one or another bytecode-level dependency
> >> causing NoClassDefFoundErrors with invalid artifacts. According to
> >> some recordings from DataStax there is a plan to support in Cassandra
> >> multiple kinds of store - document, graph so it won’t get easier with
> >> the time but rather harder - ask yourself do you really want to mess
> >> all these things together?
> >>
> >> Starting from 2.x Cassandra supports triggers, but writing even the
> >> simplest trigger which will drop a log message or publish a UDP packet
> >> requires the entire Cassandra and all its dependencies to be present
> >> during development.
> >> The fact that everything sits in one big ant build.xml is caused by the
> >> trouble ant itself has supporting multiple build modules, placeholders
> >> and so on, not because it’s convenient to do so.
> >>
> >> Modernization of the build and internal dependencies is not something
> >> which brings a huge benefit at first, because now your frontend is CQL;
> >> however it gives a real boost when it comes to community donations,
> >> tool development, or even debugging. Sadly, keeping the current Ant
> >> build is a silent agreement to keep the internal mess and the rickety
> >> architecture of the project. Ant was already a legacy tool when
> >> Cassandra was launched. The longer you stay with it, the more trouble
> >> you will get with it over time.
> >>
> >> Kind regards,
> >> Lukasz
> >>
> >>
> >>> Message written by Robert Stupp <sn...@snazy.de> on 2 Apr 2015, at
> >>> 14:51:
> >>>
> >>> TL;DR - Benedict is right.
> >>>
> >>> IMO Maven is a nice, straight-forward tool if you know what you’re
> >>> doing and start on a _new_ project.
> >>> But Maven easily becomes a pita if you want to do something that’s not
> >>> supported out-of-the-box.
> >>> I bet that Maven would just not work for the C* source tree with all
> >>> the little nice features that C*’s build.xml offers (just look at the
> >>> scripted stuff in build.xml).
> >>>
> >>> Eventually gradle would be an option; I proposed to switch to gradle
> >>> several months ago. Same story (although gradle is better than Maven
> >>> ;) ).
> >>> But… you need to know that build.xml is not just used to build the
> >>> code and artifacts. It is also used in CI, ccm, cstar-perf and some
> >>> other custom systems that exist and just work. So - if we would
> >>> exchange ant with something else, it would force a lot of effort to
> >>> change several tools and systems. And there must be a guarantee that
> >>> everything works like it did before.
> >>>
> >>> Regarding IDEs: I’m using IDEA every day and it works like a charm
> >>> with C*. Eclipse is ”supported natively” by
> >>> ”ant generate-eclipse-files”. TBH I don’t know NetBeans.
> >>>
> >>> As Benedict pointed out, the code has improved and still improves a
> >>> lot - in structure, in inline-doc, in nomenclature and whatever else.
> >>> As soon as we can get rid of Thrift in the tree, there’s another big
> >>> opportunity to clean up more stuff.
> >>>
> >>> TBH I don’t think that (beside the tools) there would be a need to
> >>> generate multiple artifacts for the C* daemon - you can do ”separation
> >>> of concerns” (via packages) even with discipline and then measure it.
> >>> IMO the only artifact worth extracting out of the C* tree, and useful
> >>> for a (limited) set of 3rd party code, is something like
> >>> ”cassandra-jmx-interfaces.jar”
> >>>
> >>> Robert
> >>>
> >>>> On 02.04.2015 at 11:30, Benedict Elliott Smith
> >>>> <belliottsmith@datastax.com> wrote:
> >>>>
> >>>> There are three distinct problems you raise: code structure,
> >>>> documentation, and build system.
> >>>>
> >>>> The build system, as far as I can tell, is a matter of personal
> >>>> preference. I personally dislike the few interactions I've had with
> >>>> maven, but gratefully my interactions with build system innards have
> >>>> been fairly limited. I mostly just use them. Unless a concrete and
> >>>> significant benefit is delivered by maven, though, it just doesn't
> >>>> seem worth the upheaval to me. If you can make the argument that it
> >>>> actually improves the project in a way that justifies the upheaval,
> >>>> it will certainly be considered, but so far no justification has been
> >>>> made.
> >>>>
> >>>> The documentation problem is common to many projects, though: out of
> >>>> codebase documentation gets stale very rapidly. When we say to "read
> >>>> the code" we mean "read the code and its inline documentation" - the
> >>>> quality of this documentation has itself generally been substandard,
> >>>> but has been improving significantly over the past year or so, and we
> >>>> are endeavouring to improve with every change. In the meantime, there
> >>>> are videos from a recent bootcamp we've run for both internal and
> >>>> external contributors
> >>>> http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
> >>>>
> >>>> The code structure would be great to modularise, but the reality is
> >>>> that it is not currently modular. There are no good clear dividing
> >>>> lines for much of the project. The problem with refactoring the
> >>>> entire codebase to create separate projects is that it is a
> >>>> significant undertaking that makes maintenance of the project across
> >>>> versions significantly more costly. This creates a net drag on all
> >>>> productivity in the project. Such a major change requires strong
> >>>> consensus, and strong evidence justifying it. So the question is:
> >>>> would this create more new work than it loses? The evidence isn't
> >>>> there that it would. It might, but I personally guess that it would
> >>>> not, judging by the results of our other attempts to drive up
> >>>> contributions to the project. Perhaps we can have a wider dialogue
> >>>> about the endeavour, though, and see if a consensus can in fact be
> >>>> built.
> >>>>
> >>>>>>>> On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <
> >> luke@code-house.org
> >>>>>>> <ma...@code-house.org>> wrote:
> >>>>>>>>
> >>>>>>>>> Dear cassandra commiters and development process followers,
> >>>>>>>>> I would like to bring an important topic off build process of
> >>>>>>> cassandra. I
> >>>>>>>>> am an external user from community point of view, however I been
> >>>>>> walking
> >>>>>>>>> around various  projects close to cassandra over past year or
> even
> >>>>>> more.
> >>>>>>>>> What is worrying me a lot is how cassandra is publishing
> artifacts
> >>>>> and
> >>>>>>> how
> >>>>>>>>> many problems are reported due that.
> >>>>>>>>>
> >>>>>>>>> First of all - I want to note that I am not born enemy of Ant
> >>>>> itself.
> >>>>>> I
> >>>>>>>>> never used it. I am also aware of problems with custom builds
> made
> >>>>>> with
> >>>>>>>>> Maven, however I don’t really want to discuss any particular
> >>>>>>> replacement,
> >>>>>>>>> yet I want to note that Cassandra JIRA project contains about 116
> >>>>>> issues
> >>>>>>>>> related somehow to maven (http://bit.ly/1GRoXl5 <
> >>>>>> http://bit.ly/1GRoXl5>
> >>>>>>> <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
> >>>>>>>>> project=CASSANDRA, text ~ maven). Depends on the point of view it
> >>>>>> might
> >>>>>>> be
> >>>>>>>>> a lot or a little. By simple statistics it is around 21 issues a
> >>>>> year
> >>>>>> or
> >>>>>>>>> almost 2 issues a month, many of them breaking maintanance/major
> >>>>>>> releases
> >>>>>>>>> from user point of view. From other hand it’s not bad considering
> >>>>> how
> >>>>>>>>> project is being built.
> >>>>>>>>>
> >>>>>>>>> Current structure has a very big disadvantage - ONE source root
> for
> >>>>>>>>> multiple artifacts published in maven repositories and copying
> >>>>> classes
> >>>>>>> to
> >>>>>>>>> jar AFTER they are compiled. Obviously ant copy task doesn’t
> follow
> >>>>>>> import
> >>>>>>>>> statements and does not include dependant classes. For example
> just
> >>>>> by
> >>>>>>>>> making test relocations and extraction of clientutil jar on
> master
> >>>>>>> branch
> >>>>>>>>> into separate source root I have found a bug where ListSerializer
> >>>>>>> depends
> >>>>>>>>> on org.apache.cassandra.transpor package. More over clientutil
> >>>>>>>>> (MapSerializer) does depends on org.apache.cassandra.db.marshal
> >>>>>> package
> >>>>>>>>> leading to the fact that it can not be used without cassandra-all
> >>>>>>> present
> >>>>>>>>> at classpath.
> >>>>>>>>> Luckily for cassandra CQL as a new interface reduces thrift and
> >>>>>>> clientutil
> >>>>>>>>> usage reducing amount of issues reported around these, however
> this
> >>>>>> just
> >>>>>>>>> hides a real problem in previous paragraph. I have found a handy
> >>>>> tool
> >>>>>>> and
> >>>>>>>>> made a graph of circular dependencies in cassandra-all.jar. Graph
> >> of
> >>>>>>>>> results can found here: http://grab.by/FRnO <http://grab.by/FRnO
> >
> >> <
> >>>>>>> http://grab.by/FRnO <http://grab.by/FRnO>>. As you
> >>>>>>>>> can see this graph has multiple levels and solving it is not a
> >>>>> simple
> >>>>>>> task.
> >>>>>>>>> I am afraid a current way of building and packaging cassandra can
> >>>>>> create
> >>>>>>>>> huge hiccups when it will come to code rafactorings cause entire
> >>>>>>> cassandra
> >>>>>>>>> will become a house of cards.
> >>>>>>>>> Restructuring project into smaller pieces is also beneficiary for
> >>>>>>>>> community since solving bugs in smaller units is definitelly
> >> easier.
> >>>>>>>>>
> >>>>>>>>> At the end of this mail I would like to propose moving Cassandra
> >>>>> build
> >>>>>>>>> system forward, regardless of tool which will be choosen for it.
> >>>>>>> Personally
> >>>>>>>>> I can volunteer in maven related changes to extract
> >>>>> cassandra-thrift,
> >>>>>>>>> cassandra-clientutil and cassandra-all to make regular maven
> build.
> >>>>> It
> >>>>>>>>> might be seen as a switch from one big XML into couple smaller.
> :-)
> >>>>>> All
> >>>>>>>>> this depends on Cassandra developers decission to devide source
> >>>>> roots
> >>>>>> or
> >>>>>>>>> not.
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>> Łukasz Dywicki
> >>>>>>>>> —
> >>>>>>>>> luke@code-house.org
> >>>>>>>>> Twitter: ldywicki
> >>>>>>>>> Blog: http://dywicki.pl
> >>>>>>>>> Code-House - http://code-house.org
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Tyler Hobbs
> >>>>>>>> DataStax <http://datastax.com/ <http://datastax.com/>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>> —
> >>> Robert Stupp
> >>> @snazy
> >>>
> >>
> >>
>
>

Re: [discuss] Modernization of Cassandra build system

Posted by Łukasz Dywicki <lu...@code-house.org>.
Hey Benedict,
My replies in line


>> According to some recordings from DataStax there is a plan to support in
>> Cassandra multiple kinds of store - document, graph so it won’t get easier
>> with the time but rather harder - ask yourself do you really want to mess
>> all these things together?
> Well, these certainly won't live in the same repository, so I wouldn't
> worry about that
That’s good. That’s very good, because it will force separation. If you do that, please consider using a different build system so that you don’t repeat the mistakes which are present now in the main Cassandra build.

>> As I briefly counted in my earlier mail there were 116 issues related to
>> artifacts published by build process.
> That does sound like a lot of bugs. How many actual maintenance releases
> were necessary, did you happen to also count? This is something that could
> be raised at the new retrospective that Ariel has begun, to see if there's
> anything that can be done to reduce their incidence and risk.
There have been 159 minor releases of Cassandra (git tag --list | egrep rc | egrep beta | wc -l). I did not track exactly how the bugs correlate with the releases. These 116 vs 159 are just numbers. From my understanding there are 116 unnecessary issues which could have been avoided. You can read these numbers in two different ways - every second minor release was fixing maven artifacts OR every second release was broken due to the maven artifacts. It seems you prefer the first reading, while users usually observe the second.


>> however it gives real boost when it comes to community donations, tool
>> development, or even debugging
> You're conflating the task of upgrading the build system with
> modularisation, which is a bad idea if you want to make progress on either
> one, since they're each a different and difficult discussion, even if they
> relate.
I do that because this is a typical chicken-and-egg problem. One thing cannot be done without the other; it’s just a question of which one comes first. Code modularization/package separation without strictly enforced boundaries is hard to maintain. However, nothing prevents doing this in reverse order - by solving the code issues first and then introducing a new build tool. It’s up to the Cassandra developers to decide.

> On the topic of the build system: if you can justify why you think Maven
> has a significant chance of reducing our bug burden here, a case can
> perhaps be made, and I will defer to the members of this list with more
> experience of our build system for that in depth discussion. At the moment,
> it seems to be taken as a given this would occur, but I don't yet see a
> clear reason that we should expect this to occur.
You see - I don’t have to justify Maven. I have offered you help with it. I also gave you a couple of reasons why Ant is not a first-choice tool these days. I don’t feel responsible for advocating Maven itself. It’s up to you what you choose. The major thing which modern tools do for you is build-time classpath management (both compile & test) and separate javac executions for both of these. Take what you prefer - gradle, sbt, leiningen. Anything which does the things from the previous sentence. Do your own evaluation. Take what works for you, not only for me.
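
To make the point about separate javac executions concrete, here is a minimal sketch. It is not taken from the Cassandra build: the demo.* packages, the class names and the constant value are made up for illustration. It uses the JDK's javax.tools API to run one javac invocation per "module", each with its own explicit classpath. Compiled on its own, the clientutil class is expected to fail because of its hidden dependency on the transport class; it should compile only once the transport classes are declared on its classpath.

```java
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SeparateJavacDemo {

    /** An in-memory source file, so the sketch needs no source tree on disk. */
    static final class Source extends SimpleJavaFileObject {
        private final String code;

        Source(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs one javac invocation with an explicit classpath; returns true if it compiled. */
    static boolean compile(Path classpath, Path outputDir, Source source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system compiler available, a JDK is required");
        }
        List<String> options = Arrays.asList(
                "-classpath", classpath.toString(), // only what this "module" declares
                "-d", outputDir.toString(),
                "-proc:none");
        // Diagnostics are collected (and ignored here) instead of being printed to stderr.
        return javac.getTask(null, null, new DiagnosticCollector<JavaFileObject>(),
                             options, null, Collections.singletonList(source)).call();
    }

    /** Returns {clientutil alone, transport alone, clientutil with transport on the classpath}. */
    static boolean[] demo() {
        // Hypothetical stand-ins for a transport constant and a serializer using it.
        Source server = new Source("demo.transport.Server",
                "package demo.transport;"
                + " public class Server { public static final int CURRENT_VERSION = 3; }");
        Source serializer = new Source("demo.clientutil.ListSerializer",
                "package demo.clientutil;"
                + " public class ListSerializer {"
                + "   public int version() { return demo.transport.Server.CURRENT_VERSION; } }");
        try {
            Path emptyClasspath = Files.createTempDirectory("empty-classpath");
            Path transportOut = Files.createTempDirectory("transport-classes");
            Path clientutilOut = Files.createTempDirectory("clientutil-classes");

            // Undeclared dependency: this invocation is expected to fail.
            boolean alone = compile(emptyClasspath, clientutilOut, serializer);
            // transport has no dependencies of its own.
            boolean transport = compile(emptyClasspath, transportOut, server);
            // Declared dependency: transport classes are on the classpath.
            boolean withDependency = compile(transportOut, clientutilOut, serializer);
            return new boolean[] { alone, transport, withDependency };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("clientutil alone:          " + result[0]);
        System.out.println("transport alone:           " + result[1]);
        System.out.println("clientutil with transport: " + result[2]);
    }
}
```

With a single javac run over one source root, both classes would compile together and the dependency would only surface later, when the jar is used without the other classes. Any tool that compiles each artifact against its declared classpath gives the same check; the sketch does not depend on Maven.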

> On the topic of modularisation: Like I said previously, everyone on this
> list is sympathetic to that goal, I think. However the practical reality is
> likely to be too confounding. But that doesn't mean it is absolutely a
> losing battle, if you can demonstrate a sufficiently painless and
> worthwhile transition.
I don't quite get you at this point. On one hand you suppose everyone is in favour of taking such a step, on the other you ask for proof. In the case of code relocation there are always multiple ways. What you have currently forces you to solve multiple problems. You can start with any of them (e.g. the circular dependencies I mentioned earlier in this conversation don’t require changing the tool). From where you stand at this moment there will be no such thing as a painless transition. As said earlier - it will only get harder over time.
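
The circular dependencies mentioned above can be looked for without touching the build tool at all. A minimal sketch, assuming the package-level dependencies have already been extracted into a map. The class name PackageCycles is hypothetical. The edges from serializers to transport and db.marshal are the ones reported in this thread, while the edge back from db.marshal to serializers is assumed here purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PackageCycles {

    /** Returns one dependency cycle as a list of packages, or an empty list if there is none. */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    /** Depth-first search; 'path' holds the packages on the current chain of dependencies. */
    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Set<String> done, List<String> path) {
        int seenAt = path.indexOf(pkg);
        if (seenAt >= 0) {
            // pkg is already on the current path, so the tail of the path forms a cycle.
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        if (done.contains(pkg)) {
            return Collections.emptyList(); // fully explored earlier, no cycle through it
        }
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(pkg);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        // Reported in this thread: the serializers use transport and db.marshal.
        deps.put("org.apache.cassandra.serializers", new LinkedHashSet<>(Arrays.asList(
                "org.apache.cassandra.transport", "org.apache.cassandra.db.marshal")));
        // Assumed for illustration only: db.marshal depends back on the serializers.
        deps.put("org.apache.cassandra.db.marshal", new LinkedHashSet<>(Arrays.asList(
                "org.apache.cassandra.serializers")));

        System.out.println(findCycle(deps));
    }
}
```

The search stops at the first cycle it finds, which is enough to pick a place to start; it does not enumerate every cycle in the graph.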
To give an example from my own experience: we use Cassandra. We have plenty of mid-level integration tests which verify end-to-end functionality, starting from the frontend or messaging layer down to data persistence. Each of our tests, even if it involves a small amount of data, hits IO on multiple levels - starting from the socket and ending on disk. We do not test consistency levels in such cases, as that is assumed to be tested by Cassandra itself - we ensure that incoming data passes through the storage interface and can be retrieved back via the same interface. With Cassandra as it is now, we cannot make our tests run fast. People are prisoners of cassandra-unit because embedding Cassandra is impossible, even though it’s written in a portable language. It has too many inner and outer dependencies. On the other hand we have, for example, ActiveMQ, which has lots of options. Even with all of these it can be embedded with no stress, so people use it for tests even if they use a different messaging provider in production. Because it’s dead easy.
Looking at things such as netty or the jackson json processor, which consisted of just two or three modules in its 1.x version, you can see fasterxml-jackson 2.x continuing the library's evolution in a much broader way. It provides a more customizable approach, supports pluggable data formats, data types and so on. Library users did suffer a bit from the changes, the package renaming and all the crazy stuff which was going on, but now only legacy projects depend on the old 1.x version.
Please don’t get me wrong - I don't want to compare a library with a database - I am just showing an approach which is shaping popular software. Also, as mentioned above, even entire systems which are older and have a complexity level similar to Cassandra's are doing better than you these days. All because they have several more jars. From the assembly point of view, for users who just download a ZIP and unpack it, it doesn’t change anything whether you have only cassandra-all or have divided it into 10 pieces, but from the developers' point of view it makes a huge difference, because these people can decide which parts of Cassandra they actually need and in which configuration.

Kind regards,
Lukasz

> On Sat, Apr 11, 2015 at 11:12 AM, Łukasz Dywicki <lu...@code-house.org>
> wrote:
> 
>> Sorry for not coming back to topic for long time.
>> 
>> You are right that what the Cassandra project has currently does work, and
>> keeping package scoping discipline in a development community as big as
>> Cassandra's is clearly impossible without tool support (if you insist on
>> keeping ant, please try to separate the javac tasks for the logical parts of
>> the current build to verify that). I clearly pointed out that it doesn’t work
>> in a reliable way, causing trouble with the artifacts uploaded to maven
>> central. As I briefly counted in my earlier mail, there were 116 issues
>> related to artifacts published by the build process. That is a lot, and these
>> issues require further maintenance releases to fix, for example, one or
>> another bytecode-level dependency causing NoClassDefFoundError with invalid
>> artifacts. According to some recordings from DataStax there is a plan to
>> support multiple kinds of store in Cassandra - document, graph - so it won’t
>> get easier with time but rather harder - ask yourself, do you really want to
>> mess all these things together?
>> 
>> Starting from 2.x Cassandra supports triggers, but writing even the simplest
>> trigger, one which drops a log message or publishes a UDP packet, requires
>> the entire cassandra and all its dependencies to be present during
>> development. The fact that everything sits in one big ant build.xml is caused
>> by the trouble ant itself creates when supporting multiple build modules,
>> placeholders and so on, not because it is an elegant way to do it.
>> 
>> Modernization of the build and internal dependencies is not something which
>> brings a huge benefit at first, because your frontend is now CQL; however, it
>> gives a real boost when it comes to community donations, tool development, or
>> even debugging. Sadly, keeping the current Ant build is a silent agreement to
>> keep the internal mess and the rickety architecture of the project. Ant was
>> already a legacy tool when Cassandra was launched. The longer you stay with
>> it, the more trouble you will get from it over time.
>> 
>> Kind regards,
>> Lukasz
>> 
>> 
>>> On 2 Apr 2015, at 14:51, Robert Stupp <sn...@snazy.de> wrote:
>>> 
>>> TL;DR - Benedict is right.
>>> 
>>> IMO Maven is a nice, straight-forward tool if you know what you’re doing
>> and start on a _new_ project.
>>> But Maven easily becomes a pita if you want to do something that’s not
>> supported out-of-the-box.
>>> I bet that Maven would just not work for C* source tree with all the
>> little nice features that C*’s build.xml offers (just look at the scripted
>> stuff in build.xml).
>>> 
>>> Eventually gradle would be an option; I proposed to switch to gradle
>> several months ago. Same story (although gradle is better than Maven ;) ).
>>> But… you need to know that build.xml is not just used to build the code
>> and artifacts. It is also used in CI, ccm, cstar-perf and a some other
>> custom systems that exist and just work. So - if we would exchange ant with
>> something else, it would force a lot of effort to change several tools and
>> systems. And there must be a guarantee that everything works like it did
>> before.
>>> 
>>> Regarding IDEs: i’m using IDEA every day and it works like a charm with
>> C*. Eclipse is ”supported natively” by ”ant generate-eclipse-files”. TBH I
>> don’t know NetBeans.
>>> 
>>> As Benedict pointed out, the code has improved and still improves a lot
>> - in structure, in inline-doc, in nomenclature and whatever else. As soon
>> as we can get rid of Thrift in the tree, there’s another big opportunity to
>> cleanup more stuff.
>>> 
>>> TBH I don’t think that (beside the tools) there would be a need to
>> generate multiple artifacts for C* daemon - you can do ”separation of
>> concerns” (via packages) even with discipline and then measure it.
>>> IMO The only artifact worth to extract out of C* tree, and useful for a
>> (limited) set of 3rd party code, is something like
>> ”cassandra-jmx-interfaces.jar”
>>> 
>>> Robert
>>> 
>>>> On 2 Apr 2015, at 11:30, Benedict Elliott Smith <belliottsmith@datastax.com> wrote:
>>>> 
>>>> There are three distinct problems you raise: code structure,
>> documentation,
>>>> and build system.
>>>> 
>>>> The build system, as far as I can tell, is a matter of personal
>> preference.
>>>> I personally dislike the few interactions I've had with maven, but
>>>> gratefully my interactions with build system innards have been fairly
>>>> limited. I mostly just use them. Unless a concrete and significant
>> benefit
>>>> is delivered by maven, though, it just doesn't seem worth the upheaval
>> to
>>>> me. If you can make the argument that it actually improves the project
>> in a
>>>> way that justifies the upheaval, it will certainly be considered, but so
>>>> far no justification has been made.
>>>> 
>>>> The documentation problem is common to many projects, though: out of
>>>> codebase documentation gets stale very rapidly. When we say to "read the
>>>> code" we mean "read the code and its inline documentation" - the
>> quality of
>>>> this documentation has itself generally been substandard, but has been
>>>> improving significantly over the past year or so, and we are
>> endeavouring
>>>> to improve with every change. In the meantime, there are videos from a
>>>> recent bootcamp we've run for both internal and external contributors
>>>> http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
>>>> 
>>>> The code structure would be great to modularise, but the reality is
>> that it
>>>> is not currently modular. There are no good clear dividing lines for
>> much
>>>> of the project. The problem with refactoring the entire codebase to
>> create
>>>> separate projects is that it is a significant undertaking that makes
>>>> maintenance of the project across versions significantly more costly.
>> This
>>>> create a net drag on all productivity in the project. Such a major
>> change
>>>> requires strong consensus, and strong evidence justifying it. So the
>>>> question is: would this create more new work than it loses? The evidence
>>>> isn't there that it would. It might, but I personally guess that it
>> would
>>>> not, judging by the results of our other attempts to drive up
>> contributions
>>>> to the project. Perhaps we can have a wider dialogue about the
>> endeavour,
>>>> though, and see if a consensus can in fact be built.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops <pi...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Not a cassandra contributor here, but I'm working on the cassandra
>> sources
>>>>> too.
>>>>> 
>>>>> This big cassandra source root caused me trouble too, firstly it was
>> not
>>>>> easy to import in an IDE, try to import cassandra sources in netbeans,
>> it's
>>>>> a headcache.
>>>>> 
>>>>> It would be great if we had more small modules/projects in separate
>> POM. It
>>>>> will be more easier to work on small part of the project, and as a
>>>>> consequences, I'm sure you will have more external contribution to this
>>>>> project.
>>>>> 
>>>>> I know cassandra devs are used to ant build model, but it's like a
>> thread I
>>>>> opened about updated and more complete documentation about sstable
>>>>> structures. I got answer that it was not needed to understand how to
>> use
>>>>> Cassandra, and the only way to learn about that is to rtfcode. Because
>>>>> people working on cassandra already know how sstable structure are,
>> it's
>>>>> not needed to provide up to date documentation.
>>>>> So it will take me a very long time to read and understand all the
>>>>> serialization code in cassandra to understand the sttable structure
>> before
>>>>> I can work on the code. Up to date documentation about internals would
>> have
>>>>> gave me the knowledge I need to contribute much quicker.
>>>>> 
>>>>> Here we have the same problem, we have a complex non modular build
>> system,
>>>>> and core cassandra dev are used to it, so it's not needed to make
>> something
>>>>> more flexible, even if it could facilite external contribution.
>>>>> 
>>>>> 
>>>>> 
>>>>> 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
>>>>> belliottsmith@datastax.com>:
>>>>> 
>>>>>> I think the problem is everyone currently contributing is comfortable
>>>>> with
>>>>>> ant, and as much as it is imperfect, it isn't clear maven is going to
>> be
>>>>>> better. Having the requisite maven functionality linked under the hood
>>>>>> doesn't seem particularly preferable to the inverse. The status quo
>> has
>>>>> the
>>>>>> bonus of zero upheaval for the project and its contributors, though,
>> so
>>>>> it
>>>>>> would have to be a very clear win to justify the change in my opinion.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <luke@code-house.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hey Tyler,
>>>>>>> Thank you very much for coming back. I already lost faith that I will
>>>>> get
>>>>>>> reply. :-) I am fine with code relocations. Moving constants into one
>>>>>> place
>>>>>>> where they cause no circular dependencies is cool, I’m all for doing
>>>>> such
>>>>>>> thing.
>>>>>>> 
>>>>>>> Currently Cassandra uses ant for doing some of maven functionalities
>>>>>> (such
>>>>>>> deploying POM.xml into repositories with dependency information), it
>>>>> uses
>>>>>>> also maven type of artifact repositories. This can be easily flipped.
>>>>>> Maven
>>>>>>> can call ant tasks for these parts which can not be made with
>> existing
>>>>>>> maven plugins. Here is simplest example:
>>>>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin <
>>>>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin> - you can
>>>>> see
>>>>>>> ant task definition embedded in maven pom.xml.
>>>>>>> 
>>>>>>> Most of things can be made at this moment via maven plugins:
>>>>>>> apache-rat-plugin:
>>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
>>>>>> <
>>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11>
>>>>>>> maven-thrift-plugin:
>>>>>>> 
>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>>>>> <
>>>>>>> 
>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>>>>>> 
>>>>>>> antlr4-maven-plugin:
>>>>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5
>> <
>>>>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5>
>>>>> or
>>>>>>> antlr3-maven-plugin:
>>>>>>> 
>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
>>>>> <
>>>>>>> 
>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2>
>>>>>>> maven-gpg-plugin:
>>>>>>> 
>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>>>>> <
>>>>>>> 
>>>>>> 
>>>>> 
>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>>>>>> 
>>>>>>> maven-cobertura-plugin:
>>>>> http://mojo.codehaus.org/cobertura-maven-plugin/
>>>>>> <
>>>>>>> http://mojo.codehaus.org/cobertura-maven-plugin/> (but these days
>>>>> jacoco
>>>>>>> with java agent instrumentation perfoms better)
>>>>>>> .. and so on
>>>>>>> 
>>>>>>> I already made some evaluation of impact and it is big. Code has to
>> be
>>>>>>> separated into different source roots. It’s not easy even for keeping
>>>>>>> current artifact structure: cassandra-all, cassandra-thrift and
>>>>>> clientutil
>>>>>>> (cause of cyclic dependencies). What I can do is prepare of these src
>>>>>> roots
>>>>>>> with dependencies which are declared for them and push that to my
>>>>>> cassandra
>>>>>>> fork so you will be able to verify that and continue with relocations
>>>>> if
>>>>>>> you will like new build. Creating new modules (source roots) with
>> maven
>>>>>> is
>>>>>>> simple so you could possibly extract more than these 3 predefined
>>>>>>> artifacts/package roots.
>>>>>>> Just let me know if you are interested.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Lukasz
>>>>>>> 
>>>>>>> 
>>>>>>>> On 31 Mar 2015, at 21:57, Tyler Hobbs <ty...@datastax.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Łukasz,
>>>>>>>> 
>>>>>>>> I'm not very familiar with the build system, but I'll try to
>> respond.
>>>>>>>> 
>>>>>>>> The Serializer dependencies on org.apache.cassandra.transport are
>>>>>> almost
>>>>>>>> certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.
>> These
>>>>>> are
>>>>>>>> constants that represent the native protocol version in use, which
>>>>>>> affects
>>>>>>>> how certain types are serialized.  These constants could easily be
>>>>>> moved.
>>>>>>>> 
>>>>>>>> The o.a.c.marshal dependency in MapSerializer is on AbstractType,
>> but
>>>>>>> could
>>>>>>>> easily be replaced with java.util.Comparator.
>>>>>>>> 
>>>>>>>> In any case, I'm not necessarily opposed to improving the build
>>>>> system
>>>>>> to
>>>>>>>> make these errors more apparent.  Would your proposal still allow us
>>>>> to
>>>>>>>> build with ant (and just change the way those artifacts are built)?
>>>>>>>> 
>>>>>>>> On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <
>> luke@code-house.org
>>>>>>> <ma...@code-house.org>> wrote:
>>>>>>>> 
>>>>>>>>> Dear cassandra commiters and development process followers,
>>>>>>>>> I would like to bring an important topic off build process of
>>>>>>> cassandra. I
>>>>>>>>> am an external user from community point of view, however I been
>>>>>> walking
>>>>>>>>> around various  projects close to cassandra over past year or even
>>>>>> more.
>>>>>>>>> What is worrying me a lot is how cassandra is publishing artifacts
>>>>> and
>>>>>>> how
>>>>>>>>> many problems are reported due that.
>>>>>>>>> 
>>>>>>>>> First of all - I want to note that I am not born enemy of Ant
>>>>> itself.
>>>>>> I
>>>>>>>>> never used it. I am also aware of problems with custom builds made
>>>>>> with
>>>>>>>>> Maven, however I don’t really want to discuss any particular
>>>>>>> replacement,
>>>>>>>>> yet I want to note that Cassandra JIRA project contains about 116
>>>>>> issues
>>>>>>>>> related somehow to maven (http://bit.ly/1GRoXl5 <
>>>>>> http://bit.ly/1GRoXl5>
>>>>>>> <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
>>>>>>>>> project=CASSANDRA, text ~ maven). Depends on the point of view it
>>>>>> might
>>>>>>> be
>>>>>>>>> a lot or a little. By simple statistics it is around 21 issues a
>>>>> year
>>>>>> or
>>>>>>>>> almost 2 issues a month, many of them breaking maintanance/major
>>>>>>> releases
>>>>>>>>> from user point of view. From other hand it’s not bad considering
>>>>> how
>>>>>>>>> project is being built.
>>>>>>>>> 
>>>>>>>>> Current structure has a very big disadvantage - ONE source root for
>>>>>>>>> multiple artifacts published in maven repositories and copying
>>>>> classes
>>>>>>> to
>>>>>>>>> jar AFTER they are compiled. Obviously ant copy task doesn’t follow
>>>>>>> import
>>>>>>>>> statements and does not include dependant classes. For example just
>>>>> by
>>>>>>>>> making test relocations and extraction of clientutil jar on master
>>>>>>> branch
>>>>>>>>> into separate source root I have found a bug where ListSerializer
>>>>>>> depends
>>>>>>>>> on org.apache.cassandra.transpor package. More over clientutil
>>>>>>>>> (MapSerializer) does depends on org.apache.cassandra.db.marshal
>>>>>> package
>>>>>>>>> leading to the fact that it can not be used without cassandra-all
>>>>>>> present
>>>>>>>>> at classpath.
>>>>>>>>> Luckily for cassandra CQL as a new interface reduces thrift and
>>>>>>> clientutil
>>>>>>>>> usage reducing amount of issues reported around these, however this
>>>>>> just
>>>>>>>>> hides a real problem in previous paragraph. I have found a handy
>>>>> tool
>>>>>>> and
>>>>>>>>> made a graph of circular dependencies in cassandra-all.jar. Graph
>> of
>>>>>>>>> results can found here: http://grab.by/FRnO <http://grab.by/FRnO>
>> <
>>>>>>> http://grab.by/FRnO <http://grab.by/FRnO>>. As you
>>>>>>>>> can see this graph has multiple levels and solving it is not a
>>>>> simple
>>>>>>> task.
>>>>>>>>> I am afraid a current way of building and packaging cassandra can
>>>>>> create
>>>>>>>>> huge hiccups when it will come to code rafactorings cause entire
>>>>>>> cassandra
>>>>>>>>> will become a house of cards.
>>>>>>>>> Restructuring project into smaller pieces is also beneficiary for
>>>>>>>>> community since solving bugs in smaller units is definitelly
>> easier.
>>>>>>>>> 
>>>>>>>>> At the end of this mail I would like to propose moving Cassandra
>>>>> build
>>>>>>>>> system forward, regardless of tool which will be choosen for it.
>>>>>>> Personally
>>>>>>>>> I can volunteer in maven related changes to extract
>>>>> cassandra-thrift,
>>>>>>>>> cassandra-clientutil and cassandra-all to make regular maven build.
>>>>> It
>>>>>>>>> might be seen as a switch from one big XML into couple smaller. :-)
>>>>>> All
>>>>>>>>> this depends on Cassandra developers decission to devide source
>>>>> roots
>>>>>> or
>>>>>>>>> not.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Łukasz Dywicki
>>>>>>>>> —
>>>>>>>>> luke@code-house.org
>>>>>>>>> Twitter: ldywicki
>>>>>>>>> Blog: http://dywicki.pl
>>>>>>>>> Code-House - http://code-house.org
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Tyler Hobbs
>>>>>>>> DataStax <http://datastax.com/ <http://datastax.com/>>
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> —
>>> Robert Stupp
>>> @snazy
>>> 
>> 
>> 


Re: [discuss] Modernization of Cassandra build system

Posted by Benedict Elliott Smith <be...@datastax.com>.
>
> According to some recordings from DataStax there is a plan to support in
> Cassandra multiple kinds of store - document, graph so it won’t get easier
> with the time but rather harder - ask yourself do you really want to mess
> all these things together?


Well, these certainly won't live in the same repository, so I wouldn't
worry about that

> As I briefly counted in my earlier mail there were 116 issues related to
> artifacts published by build process.


That does sound like a lot of bugs. How many actual maintenance releases
were necessary, did you happen to also count? This is something that could
be raised at the new retrospective that Ariel has begun, to see if there's
anything that can be done to reduce their incidence and risk.

> however it gives real boost when it comes to community donations, tool
> development, or even debugging


You're conflating the task of upgrading the build system with
modularisation, which is a bad idea if you want to make progress on either
one, since they're each a different and difficult discussion, even if they
relate.

On the topic of the build system: if you can justify why you think Maven
has a significant chance of reducing our bug burden here, a case can
perhaps be made, and I will defer to the members of this list with more
experience of our build system for that in depth discussion. At the moment,
it seems to be taken as a given this would occur, but I don't yet see a
clear reason that we should expect this to occur.

On the topic of modularisation: Like I said previously, everyone on this
list is sympathetic to that goal, I think. However the practical reality is
likely to be too confounding. But that doesn't mean it is absolutely a
losing battle, if you can demonstrate a sufficiently painless and
worthwhile transition.


On Sat, Apr 11, 2015 at 11:12 AM, Łukasz Dywicki <lu...@code-house.org>
wrote:

> Sorry for not coming back to topic for long time.
>
> You are right that what the Cassandra project has currently does work, and
> keeping package scoping discipline in a development community as big as
> Cassandra's is clearly impossible without tool support (if you insist on
> keeping ant, please try to separate the javac tasks for the logical parts of
> the current build to verify that). I clearly pointed out that it doesn’t work
> in a reliable way, causing trouble with the artifacts uploaded to maven
> central. As I briefly counted in my earlier mail, there were 116 issues
> related to artifacts published by the build process. That is a lot, and these
> issues require further maintenance releases to fix, for example, one or
> another bytecode-level dependency causing NoClassDefFoundError with invalid
> artifacts. According to some recordings from DataStax there is a plan to
> support multiple kinds of store in Cassandra - document, graph - so it won’t
> get easier with time but rather harder - ask yourself, do you really want to
> mess all these things together?
>
> Starting from 2.x Cassandra supports triggers but writing even a simplest
> trigger which will drop a log message or publish UDP packet requires entire
> cassandra and all it’s dependencies to be present during development.
> Fact that everything sits in one big ant build.xml is caused by troubles
> generated by ant itself to support multiple build modules, placeholders and
> so on, not because it’s handsome to do such.
>
> Modernization of build and internal dependencies is not something which
> brings huge benefit in first run cause now your frontend is CQL, however it
> gives real boost when it comes to community donations, tool development, or
> even debugging. Sadly keeping current Ant build is silent agreement to keep
> mess internally and rickety architecture of project. Ant was already legacy
> tool when Cassandra has been launched. The longer you will stay with it the
> more troubles you will get with it over time.
>
> Kind regards,
> Lukasz
>
>
> > Message written by Robert Stupp <sn...@snazy.de> on 2 Apr
> 2015, at 14:51:
> >
> > TL;DR - Benedict is right.
> >
> > IMO Maven is a nice, straight-forward tool if you know what you’re doing
> and start on a _new_ project.
> > But Maven easily becomes a pita if you want to do something that’s not
> supported out-of-the-box.
> > I bet that Maven would just not work for C* source tree with all the
> little nice features that C*’s build.xml offers (just look at the scripted
> stuff in build.xml).
> >
> > Eventually gradle would be an option; I proposed to switch to gradle
> several months ago. Same story (although gradle is better than Maven ;) ).
> > But… you need to know that build.xml is not just used to build the code
> and artifacts. It is also used in CI, ccm, cstar-perf and a some other
> custom systems that exist and just work. So - if we would exchange ant with
> something else, it would force a lot of effort to change several tools and
> systems. And there must be a guarantee that everything works like it did
> before.
> >
> > Regarding IDEs: i’m using IDEA every day and it works like a charm with
> C*. Eclipse is ”supported natively” by ”ant generate-eclipse-files”. TBH I
> don’t know NetBeans.
> >
> > As Benedict pointed out, the code has improved and still improves a lot
> - in structure, in inline-doc, in nomenclature and whatever else. As soon
> as we can get rid of Thrift in the tree, there’s another big opportunity to
> cleanup more stuff.
> >
> > TBH I don’t think that (beside the tools) there would be a need to
> generate multiple artifacts for C* daemon - you can do ”separation of
> concerns” (via packages) even with discipline and then measure it.
> > IMO The only artifact worth to extract out of C* tree, and useful for a
> (limited) set of 3rd party code, is something like
> ”cassandra-jmx-interfaces.jar”
> >
> > Robert
> >
> >> On 02.04.2015 at 11:30, Benedict Elliott Smith <
> belliottsmith@datastax.com> wrote:
> >>
> >> There are three distinct problems you raise: code structure,
> documentation,
> >> and build system.
> >>
> >> The build system, as far as I can tell, is a matter of personal
> preference.
> >> I personally dislike the few interactions I've had with maven, but
> >> gratefully my interactions with build system innards have been fairly
> >> limited. I mostly just use them. Unless a concrete and significant
> benefit
> >> is delivered by maven, though, it just doesn't seem worth the upheaval
> to
> >> me. If you can make the argument that it actually improves the project
> in a
> >> way that justifies the upheaval, it will certainly be considered, but so
> >> far no justification has been made.
> >>
> >> The documentation problem is common to many projects, though: out of
> >> codebase documentation gets stale very rapidly. When we say to "read the
> >> code" we mean "read the code and its inline documentation" - the
> quality of
> >> this documentation has itself generally been substandard, but has been
> >> improving significantly over the past year or so, and we are
> endeavouring
> >> to improve with every change. In the meantime, there are videos from a
> >> recent bootcamp we've run for both internal and external contributors
> >> http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
> >>
> >> The code structure would be great to modularise, but the reality is
> that it
> >> is not currently modular. There are no good clear dividing lines for
> much
> >> of the project. The problem with refactoring the entire codebase to
> create
> >> separate projects is that it is a significant undertaking that makes
> >> maintenance of the project across versions significantly more costly.
> This
> >> create a net drag on all productivity in the project. Such a major
> change
> >> requires strong consensus, and strong evidence justifying it. So the
> >> question is: would this create more new work than it loses? The evidence
> >> isn't there that it would. It might, but I personally guess that it
> would
> >> not, judging by the results of our other attempts to drive up
> contributions
> >> to the project. Perhaps we can have a wider dialogue about the
> endeavour,
> >> though, and see if a consensus can in fact be built.
> >>
> >>
> >>
> >> On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops <pi...@gmail.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Not a cassandra contributor here, but I'm working on the cassandra
> sources
> >>> too.
> >>>
> >>> This big cassandra source root caused me trouble too, firstly it was
> not
> >>> easy to import in an IDE, try to import cassandra sources in netbeans,
> it's
> >>> a headcache.
> >>>
> >>> It would be great if we had more small modules/projects in separate
> POM. It
> >>> will be more easier to work on small part of the project, and as a
> >>> consequences, I'm sure you will have more external contribution to this
> >>> project.
> >>>
> >>> I know cassandra devs are used to ant build model, but it's like a
> thread I
> >>> opened about updated and more complete documentation about sstable
> >>> structures. I got answer that it was not needed to understand how to
> use
> >>> Cassandra, and the only way to learn about that is to rtfcode. Because
> >>> people working on cassandra already know how sstable structure are,
> it's
> >>> not needed to provide up to date documentation.
> >>> So it will take me a very long time to read and understand all the
> >>> serialization code in cassandra to understand the sttable structure
> before
> >>> I can work on the code. Up to date documentation about internals would
> have
> >>> gave me the knowledge I need to contribute much quicker.
> >>>
> >>> Here we have the same problem, we have a complex non modular build
> system,
> >>> and core cassandra dev are used to it, so it's not needed to make
> something
> >>> more flexible, even if it could facilite external contribution.
> >>>
> >>>
> >>>
> >>> 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
> >>> belliottsmith@datastax.com>:
> >>>
> >>>> I think the problem is everyone currently contributing is comfortable
> >>> with
> >>>> ant, and as much as it is imperfect, it isn't clear maven is going to
> be
> >>>> better. Having the requisite maven functionality linked under the hood
> >>>> doesn't seem particularly preferable to the inverse. The status quo
> has
> >>> the
> >>>> bonus of zero upheaval for the project and its contributors, though,
> so
> >>> it
> >>>> would have to be a very clear win to justify the change in my opinion.
> >>>>
> >>>>
> >>>> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <luke@code-house.org
> >
> >>>> wrote:
> >>>>
> >>>>> Hey Tyler,
> >>>>> Thank you very much for coming back. I already lost faith that I will
> >>> get
> >>>>> reply. :-) I am fine with code relocations. Moving constants into one
> >>>> place
> >>>>> where they cause no circular dependencies is cool, I’m all for doing
> >>> such
> >>>>> thing.
> >>>>>
> >>>>> Currently Cassandra uses ant for doing some of maven functionalities
> >>>> (such
> >>>>> deploying POM.xml into repositories with dependency information), it
> >>> uses
> >>>>> also maven type of artifact repositories. This can be easily flipped.
> >>>> Maven
> >>>>> can call ant tasks for these parts which can not be made with
> existing
> >>>>> maven plugins. Here is simplest example:
> >>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin <
> >>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin> - you can
> >>> see
> >>>>> ant task definition embedded in maven pom.xml.
> >>>>>
> >>>>> Most of things can be made at this moment via maven plugins:
> >>>>> apache-rat-plugin:
> >>>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
> >>>> <
> >>>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11>
> >>>>> maven-thrift-plugin:
> >>>>>
> >>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> >>>>> <
> >>>>>
> >>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> >>>>>>
> >>>>> antlr4-maven-plugin:
> >>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5
> <
> >>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5>
> >>> or
> >>>>> antlr3-maven-plugin:
> >>>>>
> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
> >>> <
> >>>>>
> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2>
> >>>>> maven-gpg-plugin:
> >>>>>
> >>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> >>>>> <
> >>>>>
> >>>>
> >>>
> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> >>>>>>
> >>>>> maven-cobertura-plugin:
> >>> http://mojo.codehaus.org/cobertura-maven-plugin/
> >>>> <
> >>>>> http://mojo.codehaus.org/cobertura-maven-plugin/> (but these days
> >>> jacoco
> >>>>> with java agent instrumentation perfoms better)
> >>>>> .. and so on
> >>>>>
> >>>>> I already made some evaluation of impact and it is big. Code has to
> be
> >>>>> separated into different source roots. It’s not easy even for keeping
> >>>>> current artifact structure: cassandra-all, cassandra-thrift and
> >>>> clientutil
> >>>>> (cause of cyclic dependencies). What I can do is prepare of these src
> >>>> roots
> >>>>> with dependencies which are declared for them and push that to my
> >>>> cassandra
> >>>>> fork so you will be able to verify that and continue with relocations
> >>> if
> >>>>> you will like new build. Creating new modules (source roots) with
> maven
> >>>> is
> >>>>> simple so you could possibly extract more than these 3 predefined
> >>>>> artifacts/package roots.
> >>>>> Just let me know if you are interested.
> >>>>>
> >>>>> Kind regards,
> >>>>> Lukasz
> >>>>>
> >>>>>
> >>>>>> Message written by Tyler Hobbs <ty...@datastax.com> on 31 Mar 2015, at 21:57:
> >>>>>>
> >>>>>> Hi Łukasz,
> >>>>>>
> >>>>>> I'm not very familiar with the build system, but I'll try to
> respond.
> >>>>>>
> >>>>>> The Serializer dependencies on org.apache.cassandra.transport are
> >>>> almost
> >>>>>> certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.
> These
> >>>> are
> >>>>>> constants that represent the native protocol version in use, which
> >>>>> affects
> >>>>>> how certain types are serialized.  These constants could easily be
> >>>> moved.
> >>>>>>
> >>>>>> The o.a.c.marshal dependency in MapSerializer is on AbstractType,
> but
> >>>>> could
> >>>>>> easily be replaced with java.util.Comparator.
> >>>>>>
> >>>>>> In any case, I'm not necessarily opposed to improving the build
> >>> system
> >>>> to
> >>>>>> make these errors more apparent.  Would your proposal still allow us
> >>> to
> >>>>>> build with ant (and just change the way those artifacts are built)?
> >>>>>>
> >>>>>> On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <
> luke@code-house.org
> >>>>> <ma...@code-house.org>> wrote:
> >>>>>>
> >>>>>>> Dear cassandra commiters and development process followers,
> >>>>>>> I would like to bring an important topic off build process of
> >>>>> cassandra. I
> >>>>>>> am an external user from community point of view, however I been
> >>>> walking
> >>>>>>> around various  projects close to cassandra over past year or even
> >>>> more.
> >>>>>>> What is worrying me a lot is how cassandra is publishing artifacts
> >>> and
> >>>>> how
> >>>>>>> many problems are reported due that.
> >>>>>>>
> >>>>>>> First of all - I want to note that I am not born enemy of Ant
> >>> itself.
> >>>> I
> >>>>>>> never used it. I am also aware of problems with custom builds made
> >>>> with
> >>>>>>> Maven, however I don’t really want to discuss any particular
> >>>>> replacement,
> >>>>>>> yet I want to note that Cassandra JIRA project contains about 116
> >>>> issues
> >>>>>>> related somehow to maven (http://bit.ly/1GRoXl5 <
> >>>> http://bit.ly/1GRoXl5>
> >>>>> <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
> >>>>>>> project=CASSANDRA, text ~ maven). Depends on the point of view it
> >>>> might
> >>>>> be
> >>>>>>> a lot or a little. By simple statistics it is around 21 issues a
> >>> year
> >>>> or
> >>>>>>> almost 2 issues a month, many of them breaking maintanance/major
> >>>>> releases
> >>>>>>> from user point of view. From other hand it’s not bad considering
> >>> how
> >>>>>>> project is being built.
> >>>>>>>
> >>>>>>> Current structure has a very big disadvantage - ONE source root for
> >>>>>>> multiple artifacts published in maven repositories and copying
> >>> classes
> >>>>> to
> >>>>>>> jar AFTER they are compiled. Obviously ant copy task doesn’t follow
> >>>>> import
> >>>>>>> statements and does not include dependant classes. For example just
> >>> by
> >>>>>>> making test relocations and extraction of clientutil jar on master
> >>>>> branch
> >>>>>>> into separate source root I have found a bug where ListSerializer
> >>>>> depends
> >>>>>>> on org.apache.cassandra.transpor package. More over clientutil
> >>>>>>> (MapSerializer) does depends on org.apache.cassandra.db.marshal
> >>>> package
> >>>>>>> leading to the fact that it can not be used without cassandra-all
> >>>>> present
> >>>>>>> at classpath.
> >>>>>>> Luckily for cassandra CQL as a new interface reduces thrift and
> >>>>> clientutil
> >>>>>>> usage reducing amount of issues reported around these, however this
> >>>> just
> >>>>>>> hides a real problem in previous paragraph. I have found a handy
> >>> tool
> >>>>> and
> >>>>>>> made a graph of circular dependencies in cassandra-all.jar. Graph
> of
> >>>>>>> results can found here: http://grab.by/FRnO <http://grab.by/FRnO>
> <
> >>>>> http://grab.by/FRnO <http://grab.by/FRnO>>. As you
> >>>>>>> can see this graph has multiple levels and solving it is not a
> >>> simple
> >>>>> task.
> >>>>>>> I am afraid a current way of building and packaging cassandra can
> >>>> create
> >>>>>>> huge hiccups when it will come to code rafactorings cause entire
> >>>>> cassandra
> >>>>>>> will become a house of cards.
> >>>>>>> Restructuring project into smaller pieces is also beneficiary for
> >>>>>>> community since solving bugs in smaller units is definitelly
> easier.
> >>>>>>>
> >>>>>>> At the end of this mail I would like to propose moving Cassandra
> >>> build
> >>>>>>> system forward, regardless of tool which will be choosen for it.
> >>>>> Personally
> >>>>>>> I can volunteer in maven related changes to extract
> >>> cassandra-thrift,
> >>>>>>> cassandra-clientutil and cassandra-all to make regular maven build.
> >>> It
> >>>>>>> might be seen as a switch from one big XML into couple smaller. :-)
> >>>> All
> >>>>>>> this depends on Cassandra developers decission to devide source
> >>> roots
> >>>> or
> >>>>>>> not.
> >>>>>>>
> >>>>>>> Kind regards,
> >>>>>>> Łukasz Dywicki
> >>>>>>> —
> >>>>>>> luke@code-house.org
> >>>>>>> Twitter: ldywicki
> >>>>>>> Blog: http://dywicki.pl
> >>>>>>> Code-House - http://code-house.org
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Tyler Hobbs
> >>>>>> DataStax <http://datastax.com/ <http://datastax.com/>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >
> > —
> > Robert Stupp
> > @snazy
> >
>
>

Re: [discuss] Modernization of Cassandra build system

Posted by Łukasz Dywicki <lu...@code-house.org>.
Sorry for not coming back to this topic for a long time.

You are right that what the Cassandra project has currently does work. However, keeping package-scoping discipline in a development community as big as Cassandra’s is clearly impossible without tool support (if you insist on keeping Ant, please try separating the javac tasks for the logical parts of the current build to verify that). I clearly pointed out that the current build does not work reliably, and that this causes trouble with the artifacts uploaded to Maven Central. As I briefly counted in my earlier mail, there were 116 issues related to artifacts published by the build process. That is a lot, and such problems require additional maintenance releases to fix, for example, one or another bytecode-level dependency that causes NoClassDefFoundError with invalid artifacts. According to some recordings from DataStax, there is a plan to support multiple kinds of store in Cassandra (document, graph), so it won’t get easier with time but rather harder. Ask yourself: do you really want to mix all these things together?
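
As a rough illustration of what such tool support might look like, here is a minimal sketch in Python. It is not part of any existing Cassandra tooling. It scans the import statements of a Java source file and reports project-internal imports that fall outside a whitelist for a given artifact. The module name "clientutil", the ALLOWED table, the function name forbidden_imports and the sample source are all assumptions made for the example, and the check only sees explicit imports, not fully qualified references in the code body.

```python
import re

# Hypothetical rule table: package prefixes each artifact may import from the
# project itself. Illustrative only, not the real Cassandra build rules.
ALLOWED = {
    "clientutil": (
        "org.apache.cassandra.serializers.",
        "org.apache.cassandra.utils.",
    ),
}

# Matches "import a.b.C;", "import static a.b.C.x;" and "import a.b.*;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE
)


def forbidden_imports(module, java_source, project_prefix="org.apache.cassandra."):
    """Return project-internal imports that the given module is not allowed to use."""
    allowed = ALLOWED[module]
    violations = []
    for imported in IMPORT_RE.findall(java_source):
        if not imported.startswith(project_prefix):
            continue  # JDK and third-party imports are out of scope for this check
        if not imported.startswith(allowed):
            violations.append(imported)
    return violations


# Made-up source text shaped like the ListSerializer case from my earlier mail.
sample = """
package org.apache.cassandra.serializers;

import java.nio.ByteBuffer;
import org.apache.cassandra.transport.Server;
import org.apache.cassandra.utils.ByteBufferUtil;
"""

print(forbidden_imports("clientutil", sample))
```

A compiler-enforced boundary (separate javac tasks or separate source roots) would be stricter than this kind of text scan, since it also catches references that do not go through an import statement.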

Starting from 2.x, Cassandra supports triggers, but writing even the simplest trigger, one that writes a log message or publishes a UDP packet, requires all of Cassandra and all its dependencies to be present during development.
The fact that everything sits in one big Ant build.xml is caused by the trouble Ant itself has supporting multiple build modules, placeholders and so on, not because it is an elegant way to do it.

Modernization of the build and internal dependencies is not something that brings a huge benefit at first, because your frontend is now CQL. However, it gives a real boost when it comes to community contributions, tool development, or even debugging. Sadly, keeping the current Ant build is a silent agreement to keep the internal mess and the rickety architecture of the project. Ant was already a legacy tool when Cassandra was launched. The longer you stay with it, the more trouble you will have with it over time.

Kind regards,
Lukasz


> Message written by Robert Stupp <sn...@snazy.de> on 2 Apr 2015, at 14:51:
> 
> TL;DR - Benedict is right.
> 
> IMO Maven is a nice, straight-forward tool if you know what you’re doing and start on a _new_ project.
> But Maven easily becomes a pita if you want to do something that’s not supported out-of-the-box.
> I bet that Maven would just not work for C* source tree with all the little nice features that C*’s build.xml offers (just look at the scripted stuff in build.xml).
> 
> Eventually gradle would be an option; I proposed to switch to gradle several months ago. Same story (although gradle is better than Maven ;) ).
> But… you need to know that build.xml is not just used to build the code and artifacts. It is also used in CI, ccm, cstar-perf and a some other custom systems that exist and just work. So - if we would exchange ant with something else, it would force a lot of effort to change several tools and systems. And there must be a guarantee that everything works like it did before.
> 
> Regarding IDEs: i’m using IDEA every day and it works like a charm with C*. Eclipse is ”supported natively” by ”ant generate-eclipse-files”. TBH I don’t know NetBeans.
> 
> As Benedict pointed out, the code has improved and still improves a lot - in structure, in inline-doc, in nomenclature and whatever else. As soon as we can get rid of Thrift in the tree, there’s another big opportunity to cleanup more stuff.
> 
> TBH I don’t think that (beside the tools) there would be a need to generate multiple artifacts for C* daemon - you can do ”separation of concerns” (via packages) even with discipline and then measure it.
> IMO The only artifact worth to extract out of C* tree, and useful for a (limited) set of 3rd party code, is something like ”cassandra-jmx-interfaces.jar”
> 
> Robert
> 
>> On 02.04.2015 at 11:30, Benedict Elliott Smith <be...@datastax.com> wrote:
>> 
>> There are three distinct problems you raise: code structure, documentation,
>> and build system.
>> 
>> The build system, as far as I can tell, is a matter of personal preference.
>> I personally dislike the few interactions I've had with maven, but
>> gratefully my interactions with build system innards have been fairly
>> limited. I mostly just use them. Unless a concrete and significant benefit
>> is delivered by maven, though, it just doesn't seem worth the upheaval to
>> me. If you can make the argument that it actually improves the project in a
>> way that justifies the upheaval, it will certainly be considered, but so
>> far no justification has been made.
>> 
>> The documentation problem is common to many projects, though: out of
>> codebase documentation gets stale very rapidly. When we say to "read the
>> code" we mean "read the code and its inline documentation" - the quality of
>> this documentation has itself generally been substandard, but has been
>> improving significantly over the past year or so, and we are endeavouring
>> to improve with every change. In the meantime, there are videos from a
>> recent bootcamp we've run for both internal and external contributors
>> http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
>> 
>> The code structure would be great to modularise, but the reality is that it
>> is not currently modular. There are no good clear dividing lines for much
>> of the project. The problem with refactoring the entire codebase to create
>> separate projects is that it is a significant undertaking that makes
>> maintenance of the project across versions significantly more costly. This
>> create a net drag on all productivity in the project. Such a major change
>> requires strong consensus, and strong evidence justifying it. So the
>> question is: would this create more new work than it loses? The evidence
>> isn't there that it would. It might, but I personally guess that it would
>> not, judging by the results of our other attempts to drive up contributions
>> to the project. Perhaps we can have a wider dialogue about the endeavour,
>> though, and see if a consensus can in fact be built.
>> 
>> 
>> 
>> On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops <pi...@gmail.com>
>> wrote:
>> 
>>> Hi all,
>>> 
>>> Not a cassandra contributor here, but I'm working on the cassandra sources
>>> too.
>>> 
>>> This big cassandra source root caused me trouble too, firstly it was not
>>> easy to import in an IDE, try to import cassandra sources in netbeans, it's
>>> a headcache.
>>> 
>>> It would be great if we had more small modules/projects in separate POM. It
>>> will be more easier to work on small part of the project, and as a
>>> consequences, I'm sure you will have more external contribution to this
>>> project.
>>> 
>>> I know cassandra devs are used to ant build model, but it's like a thread I
>>> opened about updated and more complete documentation about sstable
>>> structures. I got answer that it was not needed to understand how to use
>>> Cassandra, and the only way to learn about that is to rtfcode. Because
>>> people working on cassandra already know how sstable structure are, it's
>>> not needed to provide up to date documentation.
>>> So it will take me a very long time to read and understand all the
>>> serialization code in cassandra to understand the sttable structure before
>>> I can work on the code. Up to date documentation about internals would have
>>> gave me the knowledge I need to contribute much quicker.
>>> 
>>> Here we have the same problem, we have a complex non modular build system,
>>> and core cassandra dev are used to it, so it's not needed to make something
>>> more flexible, even if it could facilite external contribution.
>>> 
>>> 
>>> 
>>> 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
>>> belliottsmith@datastax.com>:
>>> 
>>>> I think the problem is everyone currently contributing is comfortable
>>> with
>>>> ant, and as much as it is imperfect, it isn't clear maven is going to be
>>>> better. Having the requisite maven functionality linked under the hood
>>>> doesn't seem particularly preferable to the inverse. The status quo has
>>> the
>>>> bonus of zero upheaval for the project and its contributors, though, so
>>> it
>>>> would have to be a very clear win to justify the change in my opinion.
>>>> 
>>>> 
>>>> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <lu...@code-house.org>
>>>> wrote:
>>>> 
>>>>> Hey Tyler,
>>>>> Thank you very much for coming back. I already lost faith that I will
>>> get
>>>>> reply. :-) I am fine with code relocations. Moving constants into one
>>>> place
>>>>> where they cause no circular dependencies is cool, I’m all for doing
>>> such
>>>>> thing.
>>>>> 
>>>>> Currently Cassandra uses ant for doing some of maven functionalities
>>>> (such
>>>>> deploying POM.xml into repositories with dependency information), it
>>> uses
>>>>> also maven type of artifact repositories. This can be easily flipped.
>>>> Maven
>>>>> can call ant tasks for these parts which can not be made with existing
>>>>> maven plugins. Here is simplest example:
>>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin <
>>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin> - you can
>>> see
>>>>> ant task definition embedded in maven pom.xml.
>>>>> 
>>>>> Most of things can be made at this moment via maven plugins:
>>>>> apache-rat-plugin:
>>>>> 
>>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
>>>> <
>>>>> 
>>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11>
>>>>> maven-thrift-plugin:
>>>>> 
>>>> 
>>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>>> <
>>>>> 
>>>> 
>>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>>>> 
>>>>> antlr4-maven-plugin:
>>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 <
>>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5>
>>> or
>>>>> antlr3-maven-plugin:
>>>>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
>>> <
>>>>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2>
>>>>> maven-gpg-plugin:
>>>>> 
>>>> 
>>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>>> <
>>>>> 
>>>> 
>>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>>>> 
>>>>> maven-cobertura-plugin:
>>> http://mojo.codehaus.org/cobertura-maven-plugin/
>>>> <
>>>>> http://mojo.codehaus.org/cobertura-maven-plugin/> (but these days
>>> jacoco
>>>>> with java agent instrumentation perfoms better)
>>>>> .. and so on
>>>>> 
>>>>> I already made some evaluation of impact and it is big. Code has to be
>>>>> separated into different source roots. It’s not easy even for keeping
>>>>> current artifact structure: cassandra-all, cassandra-thrift and
>>>> clientutil
>>>>> (cause of cyclic dependencies). What I can do is prepare of these src
>>>> roots
>>>>> with dependencies which are declared for them and push that to my
>>>> cassandra
>>>>> fork so you will be able to verify that and continue with relocations
>>> if
>>>>> you will like new build. Creating new modules (source roots) with maven
>>>> is
>>>>> simple so you could possibly extract more than these 3 predefined
>>>>> artifacts/package roots.
>>>>> Just let me know if you are interested.
>>>>> 
>>>>> Kind regards,
>>>>> Lukasz
>>>>> 
>>>>> 
>>>>>> Message written by Tyler Hobbs <ty...@datastax.com> on 31 Mar 2015, at 21:57:
>>>>>> 
>>>>>> Hi Łukasz,
>>>>>> 
>>>>>> I'm not very familiar with the build system, but I'll try to respond.
>>>>>> 
>>>>>> The Serializer dependencies on org.apache.cassandra.transport are
>>>> almost
>>>>>> certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These
>>>> are
>>>>>> constants that represent the native protocol version in use, which
>>>>> affects
>>>>>> how certain types are serialized.  These constants could easily be
>>>> moved.
>>>>>> 
>>>>>> The o.a.c.marshal dependency in MapSerializer is on AbstractType, but
>>>>> could
>>>>>> easily be replaced with java.util.Comparator.
>>>>>> 
>>>>>> In any case, I'm not necessarily opposed to improving the build
>>> system
>>>> to
>>>>>> make these errors more apparent.  Would your proposal still allow us
>>> to
>>>>>> build with ant (and just change the way those artifacts are built)?
>>>>>> 
>>>>>> On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <luke@code-house.org
>>>>> <ma...@code-house.org>> wrote:
>>>>>> 
>>>>>>> Dear cassandra commiters and development process followers,
>>>>>>> I would like to bring an important topic off build process of
>>>>> cassandra. I
>>>>>>> am an external user from community point of view, however I been
>>>> walking
>>>>>>> around various  projects close to cassandra over past year or even
>>>> more.
>>>>>>> What is worrying me a lot is how cassandra is publishing artifacts
>>> and
>>>>> how
>>>>>>> many problems are reported due that.
>>>>>>> 
>>>>>>> First of all - I want to note that I am not born enemy of Ant
>>> itself.
>>>> I
>>>>>>> never used it. I am also aware of problems with custom builds made
>>>> with
>>>>>>> Maven, however I don’t really want to discuss any particular
>>>>> replacement,
>>>>>>> yet I want to note that Cassandra JIRA project contains about 116
>>>> issues
>>>>>>> related somehow to maven (http://bit.ly/1GRoXl5 <
>>>> http://bit.ly/1GRoXl5>
>>>>> <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
>>>>>>> project=CASSANDRA, text ~ maven). Depends on the point of view it
>>>> might
>>>>> be
>>>>>>> a lot or a little. By simple statistics it is around 21 issues a
>>> year
>>>> or
>>>>>>> almost 2 issues a month, many of them breaking maintanance/major
>>>>> releases
>>>>>>> from user point of view. From other hand it’s not bad considering
>>> how
>>>>>>> project is being built.
>>>>>>> 
>>>>>>> Current structure has a very big disadvantage - ONE source root for
>>>>>>> multiple artifacts published in maven repositories and copying
>>> classes
>>>>> to
>>>>>>> jar AFTER they are compiled. Obviously ant copy task doesn’t follow
>>>>> import
>>>>>>> statements and does not include dependant classes. For example just
>>> by
>>>>>>> making test relocations and extraction of clientutil jar on master
>>>>> branch
>>>>>>> into separate source root I have found a bug where ListSerializer
>>>>> depends
>>>>>>> on org.apache.cassandra.transpor package. More over clientutil
>>>>>>> (MapSerializer) does depends on org.apache.cassandra.db.marshal
>>>> package
>>>>>>> leading to the fact that it can not be used without cassandra-all
>>>>> present
>>>>>>> at classpath.
>>>>>>> Luckily for cassandra CQL as a new interface reduces thrift and
>>>>> clientutil
>>>>>>> usage reducing amount of issues reported around these, however this
>>>> just
>>>>>>> hides a real problem in previous paragraph. I have found a handy
>>> tool
>>>>> and
>>>>>>> made a graph of circular dependencies in cassandra-all.jar. Graph of
>>>>>>> results can found here: http://grab.by/FRnO <http://grab.by/FRnO> <
>>>>> http://grab.by/FRnO <http://grab.by/FRnO>>. As you
>>>>>>> can see this graph has multiple levels and solving it is not a
>>> simple
>>>>> task.
>>>>>>> I am afraid a current way of building and packaging cassandra can
>>>> create
>>>>>>> huge hiccups when it will come to code rafactorings cause entire
>>>>> cassandra
>>>>>>> will become a house of cards.
>>>>>>> Restructuring project into smaller pieces is also beneficiary for
>>>>>>> community since solving bugs in smaller units is definitelly easier.
>>>>>>> 
>>>>>>> At the end of this mail I would like to propose moving Cassandra
>>> build
>>>>>>> system forward, regardless of tool which will be choosen for it.
>>>>> Personally
>>>>>>> I can volunteer in maven related changes to extract
>>> cassandra-thrift,
>>>>>>> cassandra-clientutil and cassandra-all to make regular maven build.
>>> It
>>>>>>> might be seen as a switch from one big XML into couple smaller. :-)
>>>> All
>>>>>>> this depends on Cassandra developers decission to devide source
>>> roots
>>>> or
>>>>>>> not.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Łukasz Dywicki
>>>>>>> —
>>>>>>> luke@code-house.org
>>>>>>> Twitter: ldywicki
>>>>>>> Blog: http://dywicki.pl
>>>>>>> Code-House - http://code-house.org
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/ <http://datastax.com/>>
>>>>> 
>>>>> 
>>>> 
>>> 
> 
> —
> Robert Stupp
> @snazy
> 


Re: [discuss] Modernization of Cassandra build system

Posted by Robert Stupp <sn...@snazy.de>.
TL;DR - Benedict is right.

IMO Maven is a nice, straightforward tool if you know what you’re doing and start on a _new_ project.
But Maven easily becomes a PITA if you want to do something that’s not supported out of the box.
I bet that Maven would just not work for the C* source tree with all the nice little features that C*’s build.xml offers (just look at the scripted stuff in build.xml).

Gradle might be an option; I proposed switching to Gradle several months ago. Same story (although Gradle is better than Maven ;) ).
But… you need to know that build.xml is not just used to build the code and artifacts. It is also used in CI, ccm, cstar-perf and some other custom systems that exist and just work. So if we replaced ant with something else, it would take a lot of effort to change several tools and systems. And there must be a guarantee that everything works like it did before.

Regarding IDEs: I’m using IDEA every day and it works like a charm with C*. Eclipse is “supported natively” by “ant generate-eclipse-files”. TBH I don’t know NetBeans.

As Benedict pointed out, the code has improved and still improves a lot - in structure, in inline-doc, in nomenclature and whatever else. As soon as we can get rid of Thrift in the tree, there’s another big opportunity to clean up more stuff.

TBH I don’t think that (besides the tools) there would be a need to generate multiple artifacts for the C* daemon - you can achieve “separation of concerns” (via packages) with discipline alone and then measure it.
IMO the only artifact worth extracting out of the C* tree, and useful for a (limited) set of 3rd party code, is something like “cassandra-jmx-interfaces.jar”.
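
As a rough illustration of what "measuring" package-level separation could look like, here is a minimal sketch that reports groups of packages which depend on each other cyclically. It is not anything that exists in the C* tree or build.xml. The function name find_package_cycles and the example edges are hypothetical and chosen only for illustration: the real dependency map would have to be extracted from the compiled classes by some other tool, and Python is used here only to keep the sketch short.

```python
def find_package_cycles(deps):
    """Return groups of packages that depend on each other cyclically.

    deps maps a package name to the set of packages it imports from.
    Each group is a sorted list: a strongly connected component with
    more than one member, or a single package that imports itself.
    Uses Tarjan's strongly-connected-components algorithm.
    """
    index = {}        # discovery order of each visited package
    lowlink = {}      # lowest discovery index reachable from a package
    stack = []        # packages in the component currently being built
    on_stack = set()
    cycles = []

    def visit(pkg):
        index[pkg] = lowlink[pkg] = len(index)
        stack.append(pkg)
        on_stack.add(pkg)
        for dep in deps.get(pkg, ()):
            if dep not in index:
                visit(dep)
                lowlink[pkg] = min(lowlink[pkg], lowlink[dep])
            elif dep in on_stack:
                lowlink[pkg] = min(lowlink[pkg], index[dep])
        if lowlink[pkg] == index[pkg]:
            # pkg is the root of a component: pop all of its members
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == pkg:
                    break
            if len(group) > 1 or pkg in deps.get(pkg, ()):
                cycles.append(sorted(group))

    for pkg in sorted(deps):
        if pkg not in index:
            visit(pkg)
    return sorted(cycles)


# Hypothetical edges, for illustration only (not measured from the C* tree).
example_deps = {
    "serializers": {"transport", "db.marshal"},
    "transport": {"db.marshal"},
    "db.marshal": {"serializers"},
    "utils": set(),
}

if __name__ == "__main__":
    for group in find_package_cycles(example_deps):
        print("cycle:", ", ".join(group))
```

Run periodically (for example from CI), a check like this would turn "discipline" into a number that can be tracked: the count and size of the cyclic groups should shrink over time rather than grow.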

Robert

> Am 02.04.2015 um 11:30 schrieb Benedict Elliott Smith <be...@datastax.com>:
> 
> There are three distinct problems you raise: code structure, documentation,
> and build system.
> 
> The build system, as far as I can tell, is a matter of personal preference.
> I personally dislike the few interactions I've had with maven, but
> gratefully my interactions with build system innards have been fairly
> limited. I mostly just use them. Unless a concrete and significant benefit
> is delivered by maven, though, it just doesn't seem worth the upheaval to
> me. If you can make the argument that it actually improves the project in a
> way that justifies the upheaval, it will certainly be considered, but so
> far no justification has been made.
> 
> The documentation problem is common to many projects, though: out of
> codebase documentation gets stale very rapidly. When we say to "read the
> code" we mean "read the code and its inline documentation" - the quality of
> this documentation has itself generally been substandard, but has been
> improving significantly over the past year or so, and we are endeavouring
> to improve with every change. In the meantime, there are videos from a
> recent bootcamp we've run for both internal and external contributors
> http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
> 
> The code structure would be great to modularise, but the reality is that it
> is not currently modular. There are no good clear dividing lines for much
> of the project. The problem with refactoring the entire codebase to create
> separate projects is that it is a significant undertaking that makes
> maintenance of the project across versions significantly more costly. This
> create a net drag on all productivity in the project. Such a major change
> requires strong consensus, and strong evidence justifying it. So the
> question is: would this create more new work than it loses? The evidence
> isn't there that it would. It might, but I personally guess that it would
> not, judging by the results of our other attempts to drive up contributions
> to the project. Perhaps we can have a wider dialogue about the endeavour,
> though, and see if a consensus can in fact be built.
> 
> 
> 
> On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops <pi...@gmail.com>
> wrote:
> 
>> Hi all,
>> 
>> Not a cassandra contributor here, but I'm working on the cassandra sources
>> too.
>> 
>> This big cassandra source root caused me trouble too, firstly it was not
>> easy to import in an IDE, try to import cassandra sources in netbeans, it's
>> a headcache.
>> 
>> It would be great if we had more small modules/projects in separate POM. It
>> will be more easier to work on small part of the project, and as a
>> consequences, I'm sure you will have more external contribution to this
>> project.
>> 
>> I know cassandra devs are used to ant build model, but it's like a thread I
>> opened about updated and more complete documentation about sstable
>> structures. I got answer that it was not needed to understand how to use
>> Cassandra, and the only way to learn about that is to rtfcode. Because
>> people working on cassandra already know how sstable structure are, it's
>> not needed to provide up to date documentation.
>> So it will take me a very long time to read and understand all the
>> serialization code in cassandra to understand the sttable structure before
>> I can work on the code. Up to date documentation about internals would have
>> gave me the knowledge I need to contribute much quicker.
>> 
>> Here we have the same problem, we have a complex non modular build system,
>> and core cassandra dev are used to it, so it's not needed to make something
>> more flexible, even if it could facilite external contribution.
>> 
>> 
>> 
>> 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
>> belliottsmith@datastax.com>:
>> 
>>> I think the problem is everyone currently contributing is comfortable
>> with
>>> ant, and as much as it is imperfect, it isn't clear maven is going to be
>>> better. Having the requisite maven functionality linked under the hood
>>> doesn't seem particularly preferable to the inverse. The status quo has
>> the
>>> bonus of zero upheaval for the project and its contributors, though, so
>> it
>>> would have to be a very clear win to justify the change in my opinion.
>>> 
>>> 
>>> On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <lu...@code-house.org>
>>> wrote:
>>> 
>>>> Hey Tyler,
>>>> Thank you very much for coming back. I already lost faith that I will
>> get
>>>> reply. :-) I am fine with code relocations. Moving constants into one
>>> place
>>>> where they cause no circular dependencies is cool, I’m all for doing
>> such
>>>> thing.
>>>> 
>>>> Currently Cassandra uses ant for doing some of maven functionalities
>>> (such
>>>> deploying POM.xml into repositories with dependency information), it
>> uses
>>>> also maven type of artifact repositories. This can be easily flipped.
>>> Maven
>>>> can call ant tasks for these parts which can not be made with existing
>>>> maven plugins. Here is simplest example:
>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin <
>>>> http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin> - you can
>> see
>>>> ant task definition embedded in maven pom.xml.
>>>> 
>>>> Most of things can be made at this moment via maven plugins:
>>>> apache-rat-plugin:
>>>> 
>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
>>> <
>>>> 
>> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11>
>>>> maven-thrift-plugin:
>>>> 
>>> 
>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>> <
>>>> 
>>> 
>> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
>>>>> 
>>>> antlr4-maven-plugin:
>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 <
>>>> http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5>
>> or
>>>> antlr3-maven-plugin:
>>>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
>> <
>>>> http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2>
>>>> maven-gpg-plugin:
>>>> 
>>> 
>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>> <
>>>> 
>>> 
>> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
>>>>> 
>>>> maven-cobertura-plugin:
>> http://mojo.codehaus.org/cobertura-maven-plugin/
>>> <
>>>> http://mojo.codehaus.org/cobertura-maven-plugin/> (but these days
>> jacoco
>>>> with java agent instrumentation perfoms better)
>>>> .. and so on
>>>> 
>>>> I already made some evaluation of impact and it is big. Code has to be
>>>> separated into different source roots. It’s not easy even for keeping
>>>> current artifact structure: cassandra-all, cassandra-thrift and
>>> clientutil
>>>> (cause of cyclic dependencies). What I can do is prepare of these src
>>> roots
>>>> with dependencies which are declared for them and push that to my
>>> cassandra
>>>> fork so you will be able to verify that and continue with relocations
>> if
>>>> you will like new build. Creating new modules (source roots) with maven
>>> is
>>>> simple so you could possibly extract more than these 3 predefined
>>>> artifacts/package roots.
>>>> Just let me know if you are interested.
>>>> 
>>>> Kind regards,
>>>> Lukasz
>>>> 
>>>> 
>>>>> Wiadomość napisana przez Tyler Hobbs <ty...@datastax.com> w dniu 31
>>> mar
>>>> 2015, o godz. 21:57:
>>>>> 
>>>>> Hi Łukasz,
>>>>> 
>>>>> I'm not very familiar with the build system, but I'll try to respond.
>>>>> 
>>>>> The Serializer dependencies on org.apache.cassandra.transport are
>>> almost
>>>>> certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These
>>> are
>>>>> constants that represent the native protocol version in use, which
>>>> affects
>>>>> how certain types are serialized.  These constants could easily be
>>> moved.
>>>>> 
>>>>> The o.a.c.marshal dependency in MapSerializer is on AbstractType, but
>>>> could
>>>>> easily be replaced with java.util.Comparator.
>>>>> 
>>>>> In any case, I'm not necessarily opposed to improving the build
>> system
>>> to
>>>>> make these errors more apparent.  Would your proposal still allow us
>> to
>>>>> build with ant (and just change the way those artifacts are built)?
>>>>> 
>>>>> On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <luke@code-house.org
>>>> <ma...@code-house.org>> wrote:
>>>>> 
>>>>>> Dear cassandra commiters and development process followers,
>>>>>> I would like to bring an important topic off build process of
>>>> cassandra. I
>>>>>> am an external user from community point of view, however I been
>>> walking
>>>>>> around various  projects close to cassandra over past year or even
>>> more.
>>>>>> What is worrying me a lot is how cassandra is publishing artifacts
>> and
>>>> how
>>>>>> many problems are reported due that.
>>>>>> 
>>>>>> First of all - I want to note that I am not born enemy of Ant
>> itself.
>>> I
>>>>>> never used it. I am also aware of problems with custom builds made
>>> with
>>>>>> Maven, however I don’t really want to discuss any particular
>>>> replacement,
>>>>>> yet I want to note that Cassandra JIRA project contains about 116
>>> issues
>>>>>> related somehow to maven (http://bit.ly/1GRoXl5 <
>>> http://bit.ly/1GRoXl5>
>>>> <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
>>>>>> project=CASSANDRA, text ~ maven). Depends on the point of view it
>>> might
>>>> be
>>>>>> a lot or a little. By simple statistics it is around 21 issues a
>> year
>>> or
>>>>>> almost 2 issues a month, many of them breaking maintanance/major
>>>> releases
>>>>>> from user point of view. From other hand it’s not bad considering
>> how
>>>>>> project is being built.
>>>>>> 
>>>>>> Current structure has a very big disadvantage - ONE source root for
>>>>>> multiple artifacts published in maven repositories and copying
>> classes
>>>> to
>>>>>> jar AFTER they are compiled. Obviously ant copy task doesn’t follow
>>>> import
>>>>>> statements and does not include dependant classes. For example just
>> by
>>>>>> making test relocations and extraction of clientutil jar on master
>>>> branch
>>>>>> into separate source root I have found a bug where ListSerializer
>>>> depends
>>>>>> on org.apache.cassandra.transpor package. More over clientutil
>>>>>> (MapSerializer) does depends on org.apache.cassandra.db.marshal
>>> package
>>>>>> leading to the fact that it can not be used without cassandra-all
>>>> present
>>>>>> at classpath.
>>>>>> Luckily for cassandra CQL as a new interface reduces thrift and
>>>> clientutil
>>>>>> usage reducing amount of issues reported around these, however this
>>> just
>>>>>> hides a real problem in previous paragraph. I have found a handy
>> tool
>>>> and
>>>>>> made a graph of circular dependencies in cassandra-all.jar. Graph of
>>>>>> results can found here: http://grab.by/FRnO <http://grab.by/FRnO> <
>>>> http://grab.by/FRnO <http://grab.by/FRnO>>. As you
>>>>>> can see this graph has multiple levels and solving it is not a
>> simple
>>>> task.
>>>>>> I am afraid a current way of building and packaging cassandra can
>>> create
>>>>>> huge hiccups when it will come to code rafactorings cause entire
>>>> cassandra
>>>>>> will become a house of cards.
>>>>>> Restructuring project into smaller pieces is also beneficiary for
>>>>>> community since solving bugs in smaller units is definitelly easier.
>>>>>> 
>>>>>> At the end of this mail I would like to propose moving Cassandra
>> build
>>>>>> system forward, regardless of tool which will be choosen for it.
>>>> Personally
>>>>>> I can volunteer in maven related changes to extract
>> cassandra-thrift,
>>>>>> cassandra-clientutil and cassandra-all to make regular maven build.
>> It
>>>>>> might be seen as a switch from one big XML into couple smaller. :-)
>>> All
>>>>>> this depends on Cassandra developers decission to devide source
>> roots
>>> or
>>>>>> not.
>>>>>> 
>>>>>> Kind regards,
>>>>>> Łukasz Dywicki
>>>>>> —
>>>>>> luke@code-house.org
>>>>>> Twitter: ldywicki
>>>>>> Blog: http://dywicki.pl
>>>>>> Code-House - http://code-house.org
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Tyler Hobbs
>>>>> DataStax <http://datastax.com/ <http://datastax.com/>>
>>>> 
>>>> 
>>> 
>> 

—
Robert Stupp
@snazy


Re: [discuss] Modernization of Cassandra build system

Posted by Benedict Elliott Smith <be...@datastax.com>.
There are three distinct problems you raise: code structure, documentation,
and build system.

The build system, as far as I can tell, is a matter of personal preference.
I personally dislike the few interactions I've had with maven, but
thankfully my interactions with build system innards have been fairly
limited. I mostly just use them. Unless a concrete and significant benefit
is delivered by maven, though, it just doesn't seem worth the upheaval to
me. If you can make the argument that it actually improves the project in a
way that justifies the upheaval, it will certainly be considered, but so
far no justification has been made.

The documentation problem is common to many projects, though:
out-of-codebase documentation gets stale very rapidly. When we say to "read the
code" we mean "read the code and its inline documentation" - the quality of
this documentation has itself generally been substandard, but has been
improving significantly over the past year or so, and we are endeavouring
to improve with every change. In the meantime, there are videos from a
recent bootcamp we've run for both internal and external contributors
http://www.datastax.com/dev/blog/deep-into-cassandra-internals.

The code structure would be great to modularise, but the reality is that it
is not currently modular. There are no good clear dividing lines for much
of the project. The problem with refactoring the entire codebase to create
separate projects is that it is a significant undertaking that makes
maintenance of the project across versions significantly more costly. This
creates a net drag on all productivity in the project. Such a major change
requires strong consensus, and strong evidence justifying it. So the
question is: would this create more new work than it loses? The evidence
isn't there that it would. It might, but I personally guess that it would
not, judging by the results of our other attempts to drive up contributions
to the project. Perhaps we can have a wider dialogue about the endeavour,
though, and see if a consensus can in fact be built.



On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops <pi...@gmail.com>
wrote:

> Hi all,
>
> Not a cassandra contributor here, but I'm working on the cassandra sources
> too.
>
> This big cassandra source root caused me trouble too, firstly it was not
> easy to import in an IDE, try to import cassandra sources in netbeans, it's
> a headcache.
>
> It would be great if we had more small modules/projects in separate POM. It
> will be more easier to work on small part of the project, and as a
> consequences, I'm sure you will have more external contribution to this
> project.
>
> I know cassandra devs are used to ant build model, but it's like a thread I
> opened about updated and more complete documentation about sstable
> structures. I got answer that it was not needed to understand how to use
> Cassandra, and the only way to learn about that is to rtfcode. Because
> people working on cassandra already know how sstable structure are, it's
> not needed to provide up to date documentation.
> So it will take me a very long time to read and understand all the
> serialization code in cassandra to understand the sttable structure before
> I can work on the code. Up to date documentation about internals would have
> gave me the knowledge I need to contribute much quicker.
>
> Here we have the same problem, we have a complex non modular build system,
> and core cassandra dev are used to it, so it's not needed to make something
> more flexible, even if it could facilite external contribution.
>
>
>
> 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith <
> belliottsmith@datastax.com>:
>
> > I think the problem is everyone currently contributing is comfortable
> with
> > ant, and as much as it is imperfect, it isn't clear maven is going to be
> > better. Having the requisite maven functionality linked under the hood
> > doesn't seem particularly preferable to the inverse. The status quo has
> the
> > bonus of zero upheaval for the project and its contributors, though, so
> it
> > would have to be a very clear win to justify the change in my opinion.
> >
> >
> > On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki <lu...@code-house.org>
> > wrote:
> >
> > > Hey Tyler,
> > > Thank you very much for coming back. I already lost faith that I will
> get
> > > reply. :-) I am fine with code relocations. Moving constants into one
> > place
> > > where they cause no circular dependencies is cool, I’m all for doing
> such
> > > thing.
> > >
> > > Currently Cassandra uses ant for doing some of maven functionalities
> > (such
> > > deploying POM.xml into repositories with dependency information), it
> uses
> > > also maven type of artifact repositories. This can be easily flipped.
> > Maven
> > > can call ant tasks for these parts which can not be made with existing
> > > maven plugins. Here is simplest example:
> > > http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin <
> > > http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin> - you can
> see
> > > ant task definition embedded in maven pom.xml.
> > >
> > > Most of things can be made at this moment via maven plugins:
> > > apache-rat-plugin:
> > >
> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
> > <
> > >
> http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11>
> > > maven-thrift-plugin:
> > >
> >
> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> > > <
> > >
> >
> http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
> > > >
> > > antlr4-maven-plugin:
> > > http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 <
> > > http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5>
> or
> > > antlr3-maven-plugin:
> > > http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
> <
> > > http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2>
> > > maven-gpg-plugin:
> > >
> >
> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> > > <
> > >
> >
> http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
> > > >
> > > maven-cobertura-plugin:
> http://mojo.codehaus.org/cobertura-maven-plugin/
> > <
> > > http://mojo.codehaus.org/cobertura-maven-plugin/> (but these days
> jacoco
> > > with java agent instrumentation perfoms better)
> > > .. and so on
> > >
> > > I already made some evaluation of impact and it is big. Code has to be
> > > separated into different source roots. It’s not easy even for keeping
> > > current artifact structure: cassandra-all, cassandra-thrift and
> > clientutil
> > > (cause of cyclic dependencies). What I can do is prepare of these src
> > roots
> > > with dependencies which are declared for them and push that to my
> > cassandra
> > > fork so you will be able to verify that and continue with relocations
> if
> > > you will like new build. Creating new modules (source roots) with maven
> > is
> > > simple so you could possibly extract more than these 3 predefined
> > > artifacts/package roots.
> > > Just let me know if you are interested.
> > >
> > > Kind regards,
> > > Lukasz
> > >
> > >
> > > > Wiadomość napisana przez Tyler Hobbs <ty...@datastax.com> w dniu 31
> > mar
> > > 2015, o godz. 21:57:
> > > >
> > > > Hi Łukasz,
> > > >
> > > > I'm not very familiar with the build system, but I'll try to respond.
> > > >
> > > > The Serializer dependencies on org.apache.cassandra.transport are
> > almost
> > > > certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These
> > are
> > > > constants that represent the native protocol version in use, which
> > > affects
> > > > how certain types are serialized.  These constants could easily be
> > moved.
> > > >
> > > > The o.a.c.marshal dependency in MapSerializer is on AbstractType, but
> > > could
> > > > easily be replaced with java.util.Comparator.
> > > >
> > > > In any case, I'm not necessarily opposed to improving the build
> system
> > to
> > > > make these errors more apparent.  Would your proposal still allow us
> to
> > > > build with ant (and just change the way those artifacts are built)?
> > > >
> > > > On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki <luke@code-house.org
> > > <ma...@code-house.org>> wrote:
> > > >
> > > >> Dear cassandra commiters and development process followers,
> > > >> I would like to bring an important topic off build process of
> > > cassandra. I
> > > >> am an external user from community point of view, however I been
> > walking
> > > >> around various  projects close to cassandra over past year or even
> > more.
> > > >> What is worrying me a lot is how cassandra is publishing artifacts
> and
> > > how
> > > >> many problems are reported due that.
> > > >>
> > > >> First of all - I want to note that I am not born enemy of Ant
> itself.
> > I
> > > >> never used it. I am also aware of problems with custom builds made
> > with
> > > >> Maven, however I don’t really want to discuss any particular
> > > replacement,
> > > >> yet I want to note that Cassandra JIRA project contains about 116
> > issues
> > > >> related somehow to maven (http://bit.ly/1GRoXl5 <
> > http://bit.ly/1GRoXl5>
> > > <http://bit.ly/1GRoXl5 <http://bit.ly/1GRoXl5>>,
> > > >> project=CASSANDRA, text ~ maven). Depends on the point of view it
> > might
> > > be
> > > >> a lot or a little. By simple statistics it is around 21 issues a
> year
> > or
> > > >> almost 2 issues a month, many of them breaking maintanance/major
> > > releases
> > > >> from user point of view. From other hand it’s not bad considering
> how
> > > >> project is being built.
> > > >>
> > > >> Current structure has a very big disadvantage - ONE source root for
> > > >> multiple artifacts published in maven repositories and copying
> classes
> > > to
> > > >> jar AFTER they are compiled. Obviously ant copy task doesn’t follow
> > > import
> > > >> statements and does not include dependant classes. For example just
> by
> > > >> making test relocations and extraction of clientutil jar on master
> > > branch
> > > >> into separate source root I have found a bug where ListSerializer
> > > depends
> > > >> on org.apache.cassandra.transpor package. More over clientutil
> > > >> (MapSerializer) does depends on org.apache.cassandra.db.marshal
> > package
> > > >> leading to the fact that it can not be used without cassandra-all
> > > present
> > > >> at classpath.
> > > >> Luckily for cassandra CQL as a new interface reduces thrift and
> > > clientutil
> > > >> usage reducing amount of issues reported around these, however this
> > just
> > > >> hides a real problem in previous paragraph. I have found a handy
> tool
> > > and
> > > >> made a graph of circular dependencies in cassandra-all.jar. Graph of
> > > >> results can found here: http://grab.by/FRnO <http://grab.by/FRnO> <
> > > http://grab.by/FRnO <http://grab.by/FRnO>>. As you
> > > >> can see this graph has multiple levels and solving it is not a
> simple
> > > task.
> > > >> I am afraid a current way of building and packaging cassandra can
> > create
> > > >> huge hiccups when it will come to code rafactorings cause entire
> > > cassandra
> > > >> will become a house of cards.
> > > >> Restructuring project into smaller pieces is also beneficiary for
> > > >> community since solving bugs in smaller units is definitelly easier.
> > > >>
> > > >> At the end of this mail I would like to propose moving Cassandra
> build
> > > >> system forward, regardless of tool which will be choosen for it.
> > > Personally
> > > >> I can volunteer in maven related changes to extract
> cassandra-thrift,
> > > >> cassandra-clientutil and cassandra-all to make regular maven build.
> It
> > > >> might be seen as a switch from one big XML into couple smaller. :-)
> > All
> > > >> this depends on Cassandra developers decission to devide source
> roots
> > or
> > > >> not.
> > > >>
> > > >> Kind regards,
> > > >> Łukasz Dywicki
> > > >> —
> > > >> luke@code-house.org
> > > >> Twitter: ldywicki
> > > >> Blog: http://dywicki.pl
> > > >> Code-House - http://code-house.org
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Tyler Hobbs
> > > > DataStax <http://datastax.com/ <http://datastax.com/>>
> > >
> > >
> >
>