Posted to dev@maven.apache.org by Robert Scholte <rf...@apache.org> on 2017/01/16 09:37:08 UTC

Advice + proposals regarding automodule naming

This is a message from Robert Scholte and Brian Fox. We have both been
talking about this topic for several weeks with other Maven developers and
came to the conclusion that we should warn the Jigsaw team about their
current approach to automodules. We will share our experiences,
thoughts and conclusions, and will suggest two proposals.

Traditionally, the Java ecosystem has been very mature in terms of naming
and namespacing. The reverse fqdn introduced into the Java package was a
great choice to ensure classes don’t conflict. Popular build tools such as
Maven and nearly all those that followed built upon this key concept
with the introduction of “GroupId”, which also uses the fqdn as part of the
name to ensure the coordinates are properly namespaced.

We’ve seen some ecosystems diverge from this, leading to new challenges
that ultimately had to be reversed. A great example can be seen in the
“tragic mistake from npm creators” [1], which was to launch without a
namespace concept. Eventually, npm started running out of useful names and
had to backtrack to introduce “scopes”, which are really just namespaces
[2]. The real problem here is that the major change in namespacing was
baked in after several years of momentum without it. It has taken a long
time for tooling and best practice to catch up to scopes, and in the
interim people have been left with a dual-mode situation, some packages
namespaced and some not, that has created chaos. [3]

The real issue at hand here, as we consider behaviors in the Jigsaw
automodule, revolves around two well-studied concepts.

The most important is the “Default effect” [4], which states that whatever
the default behavior is will become the most prominent best practice. A
default that uses a filename to generate a very short, un-namespaced
module id effectively sets the behavior to create generic names that will
eventually conflict, which is exactly what we’ve seen in npm.

Additionally, the switching cost of moving from a default
un-namespaced module id to one with a unique namespace is significant
once you consider all the potential users. This is why API change is hard,
and changing the module id after the fact from the default is effectively
an API change.

The second principle at hand is the “Principle of least astonishment” [5].
We want to find a default that doesn’t violate what most users would
consider to be the most obvious. One could argue the current automodule
algorithm doesn’t violate this principle, but it’s important to consider
alternate suggestions in this light.

First, let’s explore the potential downsides if the default effect takes
hold with the currently generated automodule id. In Apache Maven, the
artifact id is the part of the coordinate that generates the filename.
This means that com.somecompany:artifact:version will become
artifact-version.jar, which would result in automodule id “artifact”.
Armed with this understanding, what does an analysis of the Maven
ecosystem have to say about potential conflicts in the automodule id?

If we ignore the groupid and version of all the components in the Maven
Central repository, we end up with over 13,500 conflicts (7% of the total
group:artifact combinations). This does not consider conflicts
across other repositories or within customer portfolios, yet it is pretty
telling. Conflicts will happen. In some cases, the number of conflicts on
the same common name is well above 100. The list of conflicts as of
October 2016 can be seen at [6].
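
To make the idea concrete, here is a minimal sketch of how such a count
could be done. It is not the tooling used for the Central analysis: the
class name, the sample coordinates and the simplified automoduleId rule
(file name minus ".jar" and minus the "-version" suffix) are assumptions
made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ArtifactIdConflicts {

    /** Maven's default file name for a coordinate: artifactId-version.jar. */
    static String fileName(String artifactId, String version) {
        return artifactId + "-" + version + ".jar";
    }

    /**
     * Simplified stand-in for a filename-based rule: drop ".jar" and the
     * trailing "-version". Assumption: any further transformations of the
     * real algorithm are ignored here.
     */
    static String automoduleId(String fileName, String version) {
        String base = fileName.substring(0, fileName.length() - ".jar".length());
        return base.substring(0, base.length() - version.length() - 1);
    }

    /** Maps each derived id to its groupIds, keeping only ids claimed by two or more groups. */
    static Map<String, Set<String>> conflicts(List<String[]> coordinates) {
        Map<String, Set<String>> groupsById = new TreeMap<>();
        for (String[] gav : coordinates) { // gav = {groupId, artifactId, version}
            String id = automoduleId(fileName(gav[1], gav[2]), gav[2]);
            groupsById.computeIfAbsent(id, k -> new TreeSet<>()).add(gav[0]);
        }
        groupsById.values().removeIf(groups -> groups.size() < 2);
        return groupsById;
    }

    /** Hypothetical coordinates, invented for this example. */
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"com.somecompany", "artifact", "1.0"},
            new String[] {"org.example", "artifact", "2.3"},
            new String[] {"org.example", "other", "1.0"});
    }

    public static void main(String[] args) {
        System.out.println(conflicts(sample()));
    }
}
```

In this made-up sample, the two components that share the artifactId
"artifact" collapse into one derived id although their groupIds differ,
while "other" stays unique.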

At this point, hopefully we’ve made the case for at least establishing a
default module id that
1. Uses namespaces to minimize id conflicts when possible
2. Leverages the default effect to create a de facto best practice
3. Follows the principle of least astonishment

We have two potential proposals that meet these goals.

Proposal 1: Leverage existing coordinates when available.

Maven is inarguably the most popular build system for Java components,
with Maven Central being the default and largest repository of Java
components in the world. By default, every jar built by Maven
automatically gets a simple properties file inserted into it with its
unique coordinates. Now, not every jar in Central was built with Maven;
however, 94% of them were, as we can find the pom.properties file in
1,806,023 of the 1,913,561 Central components. Talk about the default
effect in action!

It’s also important to recognize that when a jar carries a pom.properties
declaring coordinates, the project itself has chosen those
coordinates as its own name. In other words, this is how the project
refers to itself, even if other consumers may not be using Maven directly.

If automodule were able to peek inside a jar and generate the default id
using the groupid and artifactid present in the file, this would
eliminate nearly all instances of id conflict, because a significant
portion of the Java ecosystem is in fact built with Maven. Additionally,
the fact that 1.8 million (and counting) modules would have a namespace as
the default behavior means we’ve taken a huge step in setting the best
practice of picking module ids with a namespace. Furthermore, since the
project itself has chosen these coordinates and uses them as its primary
distribution mechanism, this follows the principle of least astonishment
for consumers regardless of their chosen build system. Finally, since all
of the above are true, it’s unlikely the project would need to migrate to
a new module id when it adopts Jigsaw natively, thus avoiding an API
switching cost for its users.
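
As a rough illustration of what “peeking inside a jar” could look like,
the sketch below reads the pom.properties that Maven places under
META-INF/maven/<groupId>/<artifactId>/ and builds a candidate id from it.
The class and method names are hypothetical, and joining groupId and
artifactId with a dot is only an assumption: this proposal does not fix
the exact format, nor how characters that are not legal in a module name
would be mapped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class PomPropertiesModuleId {

    /** Assumed format: groupId + "." + artifactId; null if either is missing. */
    static String candidateId(String groupId, String artifactId) {
        if (groupId == null || artifactId == null) {
            return null;
        }
        return groupId + "." + artifactId;
    }

    /** Builds a candidate id from the text content of a pom.properties file. */
    static String fromPropertiesText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return candidateId(props.getProperty("groupId"), props.getProperty("artifactId"));
    }

    /**
     * Returns a candidate id from the first pom.properties found in the jar,
     * or null if there is none. A jar that bundles other artifacts may
     * contain several such files; this sketch does not handle that case.
     */
    static String fromJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.startsWith("META-INF/maven/") && name.endsWith("/pom.properties")) {
                    Properties props = new Properties();
                    try (InputStream in = jar.getInputStream(entry)) {
                        props.load(in);
                    }
                    return candidateId(props.getProperty("groupId"),
                                       props.getProperty("artifactId"));
                }
            }
        }
        return null;
    }
}
```

With this approach, two jars that share an artifactId but come from
different groupIds would no longer receive the same default id.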

Proposal 2: Drop automodules

Right now Jigsaw tries to calculate a module name based solely on the name
of the jar file, which already causes issues. Besides the fact that
the module name, unlike the Maven coordinate, is not guaranteed to be
unique, there are extra transformations which make uniqueness even less
likely: e.g. dashes are replaced by dots (both are valid artifactId
characters), and in some cases a number and the characters following it
are stripped off. For artifacts like jboss-servlet-api_4.0_spec this makes
sense; however, we already see issues where commons-lang, commons-lang2
and commons-lang3 get the same module name, even though they have
different artifactIds and contain different packages. Choosing different
artifactIds and packages was a very wise decision by the authors, because
it made it possible for these jars to live next to each other. Removing
that separation is a very unwise decision.

Another known example is the jsrNNN jars, which now all get jsr as the  
module name.
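
The sketch below shows how lossy such transformations are. Note that the
rule used here (cut the name at the first digit, then turn every
non-alphanumeric character into a dot) is an assumption chosen only
because it reproduces the collisions mentioned above; it is not claimed to
be the exact Jigsaw algorithm. The class name is hypothetical and the
version numbers in the file names are illustrative.

```java
final class LossyNameSketch {

    /** Derives a module name from a jar file name using the assumed rule. */
    static String derive(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - ".jar".length())
                : jarFileName;
        // Cut at the first digit: this drops the version, but also any
        // digits that are part of the artifactId itself.
        for (int i = 0; i < name.length(); i++) {
            if (Character.isDigit(name.charAt(i))) {
                name = name.substring(0, i);
                break;
            }
        }
        return name.replaceAll("[^A-Za-z0-9]", ".") // dashes, underscores -> dots
                   .replaceAll("\\.{2,}", ".")       // collapse repeated dots
                   .replaceAll("^\\.|\\.$", "");     // trim leading/trailing dot
    }

    public static void main(String[] args) {
        String[] jars = {
            "commons-lang-2.6.jar", "commons-lang3-3.5.jar",
            "jsr305-3.0.2.jar", "jsr250-api-1.0.jar",
            "jboss-servlet-api_4.0_spec-1.0.0.jar"
        };
        for (String jar : jars) {
            System.out.println(jar + " -> " + derive(jar));
        }
    }
}
```

Under this assumed rule, both commons-lang jars end up as "commons.lang"
and both jsr jars as "jsr", while the jboss artifact gets the reasonable
name "jboss.servlet.api".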

It is highly unlikely that there is one single rule that captures all the
use cases and always results in a module name we can work with.

For that reason the other proposal is to simply drop automodules. Don’t
try to come up with a name for unnamed jars. It might look like the
feature of automodules makes migrating easier because every dependency
will get a name, so you can complete your module-info for all
requirements, but we expect that once Jigsaw gets up to speed the invalid
module names will actually block further development due to name
collisions or forced renaming by transitive modular jars.

The advantage of this proposal is that library builders are not forced to
keep the proposed module name in order to maintain backwards compatibility
with the default. Instead, library builders can pick a more suitable
module name. The module system doesn’t allow the same package to be
exported by multiple jars (and automodules export every package). Library
builders can fix this in their new jars; however, if end users
require both jars because they were specified as requirements in different
transitive jars, the project cannot be compiled. There’s just no
dependency-excludes like Maven has, because “requires” in the module-info
really means requires. Dropping automodules will prevent these kinds of
issues, because a package can only be exported by a named module.
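
The package clash described above can be pictured with a small check over
the packages each jar contains. This is only a sketch: the class name, the
module names and the package names are hypothetical, and a real check
would have to list the packages from the jars themselves.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SplitPackageCheck {

    /**
     * Given the packages found in each module (an automodule exports all
     * of them), returns the packages that appear in more than one module,
     * mapped to the modules that contain them.
     */
    static Map<String, Set<String>> duplicatedPackages(Map<String, Set<String>> packagesByModule) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : packagesByModule.entrySet()) {
            for (String pkg : entry.getValue()) {
                owners.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        owners.values().removeIf(modules -> modules.size() < 2);
        return owners;
    }

    /** Hypothetical input: an old and a new jar of the same library. */
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> packagesByModule = new TreeMap<>();
        packagesByModule.put("legacy.lib",
            new TreeSet<>(Arrays.asList("com.example.util", "com.example.legacy")));
        packagesByModule.put("modern.lib",
            new TreeSet<>(Arrays.asList("com.example.util", "com.example.modern")));
        return packagesByModule;
    }

    public static void main(String[] args) {
        System.out.println(duplicatedPackages(sample()));
    }
}
```

Any package reported by such a check is one that would stop a project
requiring both jars from compiling.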

Sure, this means that end users cannot refer to every jar in
their module-info. But at least if they add a “requires” to their
module-info, they can ensure that it’ll always refer to the intended  
modular jar. With build tools like Maven the chance of missing artifacts  
on the classpath has already been reduced a lot. In general builds have  
become quite stable, so we don’t expect that developers will translate all  
dependencies to the module-info file, especially if we warn them about the  
possible consequences of depending on automodules. Only referring to named  
modules and even a single “requires” is already a gain. There’s no reason  
to try to speed this up and give the developer the false impression that  
it’ll keep working when upgrading to real modular jars. Focus should be on
the target, not on the path to reach it.

Dropping the automodules will prevent a lot of discussions about what is  
the correct way to select a module name and will give the responsibility  
for the name back to the place where it belongs: the developer.

[1] http://stackoverflow.com/questions/22053381/lack-of-available-module-names-on-npm
[2] http://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
[3] The fact that so much of the npm ecosystem is effectively
not namespaced has actually created potential build-time malware
injection possibilities. If I know of a package in use by a company
through log analysis, bug report analysis etc., I could potentially go
register the same name in the default repo with a very high semver and
know that it’s very likely this would be picked up over the intended
internally developed module, because there’s no namespace.
[4] https://en.wikipedia.org/wiki/Default_effect_(psychology)
[5] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
[6] https://docs.google.com/spreadsheets/d/1TVR5uTpDYw0827AlvPRu8l95zHnFPL_g61TdPtnjQ5M/edit?usp=sharing
[7] http://openjdk.java.net/jeps/261 (section “Risks and Assumptions”)
[8] https://www.mail-archive.com/jigsaw-dev@openjdk.java.net/msg06623.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Advice + proposals regarding automodule naming

Posted by Robert Scholte <rf...@apache.org>.
Hi,

a small update:
The discussion has started. For those interested, please subscribe to
jpms-spec-observers [1] or view the threads online [2].
However, if you want to respond to these topics, you must subscribe to
jpms-spec-comments [3] or inform me (and Brian). We will gather all
evidence and keep our proposals up to date.

thanks,
Robert

[1] http://mail.openjdk.java.net/mailman/listinfo/jpms-spec-observers
[2] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/
[3] http://mail.openjdk.java.net/mailman/listinfo/jpms-spec-comments
