Posted to mime4j-dev@james.apache.org by David Leangen <ap...@leangen.net> on 2020/05/15 05:31:26 UTC

public vs. private packages

Hi,

I am a big fan and long-time user of OSGi. I got a little excited when I noticed that these projects are compiled as bundles. However, my excitement quickly waned when I noticed that there does not really seem to be any good separation between API and implementation. Is there a particular reason for this?

If not, perhaps it would be nice to separate the public packages from the private packages. This is a good practice that OSGi helps with a lot. I don’t mind doing this because the project does not seem to be too large. If the code is not too tangled, it should be easy to do. I have not yet looked at the code, but I wanted to ask opinions here first as to whether or not it is worthwhile to think about this.

Does anybody object to moving around the implementation classes a bit to achieve separation of public / private packages?


Cheers,
=David

Re: public vs. private packages

Posted by David Leangen <ap...@leangen.net>.
> I'm not against keeping them together or separate. I am lazy enough to
> not change things that work and for which there is no incentive to do
> the work.

Yeah… I kinda agree.

I’m really beginning to question myself about why I am even spending time on this, and taking up others’ valuable time as well. :-)


Thanks a lot for the comments. I think I’ll put this on ice for now and see what happens in a few weeks.


Cheers,
=David


Re: public vs. private packages

Posted by Eugen Stan <st...@gmail.com>.
On 18.05.2020 03:02, David Leangen wrote:
>> I think it is a good initiative. Since mime4j is a library, and a lower
>> level one, we can announce the intended change on the mailing list.
>
> Ok, thanks.
>
> Since we’re on the topic…
>
> What is the purpose of having multiple jars? Why is the project organized the way it is?
>
> I am saying this without knowing much about Mime4j… but IMO a library jar should be really simple to integrate. It would be simpler to have one and only one release jar, which includes the API, implementation, and all the dependencies bundled together. By the way, ideally there should not be any transitive dependencies. Those are definitely the best libraries!
>
> In a nutshell, as a library we should be thinking more about the users, and less about the developers, as there are likely many more users than developers. The users just want to integrate the library with as few headaches as possible, so reducing dependencies is critical, i.e. only a single released jar with zero transitive dependencies should be the goal.

Hi,

Very good points. I will try to answer as best I can. Please use this in
the documentation we are building.

From Maven we get:

[INFO] Reactor Build Order:
[INFO]
[INFO] Apache James :: Mime4j :: Project        [pom]
[INFO] Apache James :: Mime4j :: Core           [bundle]
[INFO] Apache James :: Mime4j :: DOM            [bundle]
[INFO] Apache James :: Mime4j :: Storage        [bundle]
[INFO] Apache James :: Mime4j :: Benchmarks     [bundle]
[INFO] Apache James :: Mime4j :: Mbox Iterator  [bundle]
[INFO] Apache James :: Mime4j :: Code Examples  [bundle]
[INFO] Apache James :: Mime4j :: Assembly       [pom]
[INFO] Apache James :: Mime4j :: James utils    [bundle]


== Apache James :: Mime4j :: Project

Contains common configuration for the other projects. Groups the other
modules together.

== Apache James :: Mime4j :: Core

Contains the core API for working with email content and the individual
email parts.

You can find code and structures to work with headers, bodies, fields,
the MIME format, quoted-printable encoding and other email-specific
elements (dictated by RFCs).

Contains low-level utilities and code, mostly for parsing the MIME
format.
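
To illustrate the kind of low-level work Core does: quoted-printable is the transfer encoding defined in RFC 2045, section 6.7. The sketch below is a hypothetical, JDK-only decoder (the class name QuotedPrintableSketch is made up for this example); it is not mime4j's own codec, and it skips edge cases such as stripping trailing whitespace from encoded lines.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not mime4j's API: decodes RFC 2045 quoted-printable text.
final class QuotedPrintableSketch {

    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '=') {
                // Literal character; RFC 2045 only allows 7-bit ASCII here, so one char is one byte.
                out.write(c);
                i++;
            } else if (encoded.startsWith("\r\n", i + 1)) {
                // Soft line break "=CRLF": the encoder wrapped a long line, so drop it.
                i += 3;
            } else if (encoded.startsWith("\n", i + 1)) {
                // Leniently accept a soft line break with a bare LF as well.
                i += 2;
            } else if (i + 2 < encoded.length()) {
                // "=XY" escape: two hex digits encode one octet.
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] decoded = decode("Caf=C3=A9 au lait, soft=\r\nbreak");
        // The escaped octets in this sample are UTF-8, so decode them as such.
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

With the sample input in main, the decoded string is "Café au lait, softbreak". Note that the decoder returns raw bytes: which charset to apply comes from the part's Content-Type, not from the transfer encoding.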

== Apache James :: Mime4j :: DOM

Contains higher-level structures. Here we deal with email addresses,
different body types (text, binary, single, multipart) and the email
message. We also have concepts of mailboxes, groups, domain lists, etc.

This parses a message in MIME format and builds a document model that
exposes the email elements, instead of the low-level data format used to
transfer email.
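
To make "document model" concrete, here is a deliberately tiny, JDK-only sketch of the idea: it turns raw header lines into named fields (unfolding continuation lines as described in RFC 5322) and separates out the body. TinyMessage and its methods are hypothetical names, not mime4j's DOM classes, and the sketch ignores MIME multiparts and encoded words.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified document model; not mime4j's DOM classes.
// Single-part messages only: no multipart splitting and no encoded-word decoding.
final class TinyMessage {
    final List<Map.Entry<String, String>> fields = new ArrayList<>();
    String body = "";

    static TinyMessage parse(String raw) {
        TinyMessage msg = new TinyMessage();
        String[] lines = raw.split("\r?\n", -1);
        int i = 0;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                i++; // The first empty line separates the header section from the body.
                break;
            }
            boolean folded = line.startsWith(" ") || line.startsWith("\t");
            if (folded && !msg.fields.isEmpty()) {
                // Unfolding (RFC 5322, section 2.2.3): a line starting with whitespace continues the previous field.
                int last = msg.fields.size() - 1;
                Map.Entry<String, String> prev = msg.fields.get(last);
                msg.fields.set(last, new AbstractMap.SimpleEntry<>(prev.getKey(), prev.getValue() + line));
            } else {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    msg.fields.add(new AbstractMap.SimpleEntry<>(
                            line.substring(0, colon).trim(), line.substring(colon + 1).trim()));
                }
            }
        }
        // Everything after the separator line is the (undecoded) body.
        msg.body = String.join("\n", Arrays.asList(lines).subList(i, lines.length));
        return msg;
    }

    // Header field names are case-insensitive, so compare them that way.
    String header(String name) {
        for (Map.Entry<String, String> field : fields) {
            if (field.getKey().equalsIgnoreCase(name)) {
                return field.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyMessage m = parse("From: alice@example.com\r\nSubject: Hello\r\n world\r\n\r\nHi Bob!");
        System.out.println(m.header("subject") + " / " + m.body); // prints "Hello world / Hi Bob!"
    }
}
```

The point of such a model is that callers ask for "the Subject" instead of scanning CRLF-delimited text themselves; a real DOM additionally parses structured fields (addresses, content types) and nested multipart bodies.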

== Apache James :: Mime4j :: Storage

Provides different providers to store and retrieve emails. It deals only
with a single email, so it's not a replacement for the Apache James
Mailbox API.

It deals with storing parts of the email in memory and on disk,
transparent encryption for emails during storage and other related
strategies.
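
A minimal sketch of one such strategy, assuming a simple size threshold: small parts stay in a byte array, larger ones are spilled to a temporary file. StoredPart and ThresholdStore are hypothetical names for illustration, not the Storage module's actual classes, and I/O errors are wrapped in UncheckedIOException only to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical strategy: keep parts up to thresholdBytes in memory, spill larger ones to a temp file.
final class ThresholdStore {
    private final int thresholdBytes;

    ThresholdStore(int thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    StoredPart store(InputStream in) {
        try {
            // Read at most one byte past the threshold to learn which side of it the part is on.
            byte[] head = in.readNBytes(thresholdBytes + 1);
            if (head.length <= thresholdBytes) {
                return new StoredPart() {
                    public boolean inMemory() { return true; }
                    public byte[] readAll() { return head.clone(); }
                    public void delete() { }
                };
            }
            Path file = Files.createTempFile("part-", ".bin");
            try (OutputStream out = Files.newOutputStream(file)) {
                out.write(head);
                in.transferTo(out); // Copy the remainder without holding it all in memory.
            }
            return new StoredPart() {
                public boolean inMemory() { return false; }
                public byte[] readAll() {
                    try {
                        return Files.readAllBytes(file);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                public void delete() {
                    try {
                        Files.deleteIfExists(file);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StoredPart part = new ThresholdStore(1024).store(new ByteArrayInputStream(new byte[4096]));
        System.out.println("in memory: " + part.inMemory()); // prints "in memory: false"
        part.delete();
    }
}

// Hypothetical handle to a stored part; not the Storage module's API.
interface StoredPart {
    boolean inMemory();
    byte[] readAll();
    void delete();
}
```

Other strategies mentioned above, such as transparent encryption, could wrap the file streams in the same place without callers noticing, since they only see the StoredPart handle.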

== Apache James :: Mime4j :: Mbox Iterator

I wrote this module because I needed a fast way to iterate over
thousands of emails in a single mbox file. It uses memory-mapped files
and is very efficient.
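
The general approach (map the file into memory, then scan for the "From " separator lines that start each mbox message) can be sketched with the JDK alone. MboxSketch is a hypothetical class, not the Mbox Iterator's API; unlike a real iterator it splits the whole file into strings up front.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Mbox Iterator module's API.
final class MboxSketch {
    // Each mbox message begins with a "From " line; writers escape body lines
    // that would start with "From " (e.g. as ">From "), so they do not match here.
    private static final Pattern FROM_LINE = Pattern.compile("^From ", Pattern.MULTILINE);

    // Splits decoded mbox content into messages, each including its "From " line.
    static List<String> split(CharSequence mbox) {
        List<String> messages = new ArrayList<>();
        Matcher m = FROM_LINE.matcher(mbox);
        int start = -1;
        while (m.find()) {
            if (start >= 0) {
                messages.add(mbox.subSequence(start, m.start()).toString());
            }
            start = m.start();
        }
        if (start >= 0) {
            messages.add(mbox.subSequence(start, mbox.length()).toString());
        }
        return messages;
    }

    // Maps the file instead of streaming it (a single mapping is limited to 2 GB).
    static List<String> split(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // ISO-8859-1 maps every byte to exactly one char, so no input is rejected.
            // decode() copies the data; a production version would decode in chunks.
            CharBuffer chars = StandardCharsets.ISO_8859_1.decode(mapped);
            return split(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            System.out.println(split(Path.of(args[0])).size() + " messages");
        }
    }
}
```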

It's a utility for when you need to import/export mbox files. It should
not be packaged with the library.


The other projects are self explanatory:

- Benchmarks: simple benchmarks for mime4j parsing speed. Should not be
bundled with the library IMO. Keep separate.

- Code examples: How to use the libraries. Examples should be added to docs.

- Assembly - builds the binary distributions for release: zip and others.


>
> I am not very familiar with Maven. Is it possible to bundle together everything as a single jar even if it is separated into different projects?
>
>   —> If yes: we could keep the same project structure, but only deploy a single jar.
>   —> If no: we could restructure the project. This would only affect developers.

Yes, it is possible to bundle a single jar out of the others. I've been
off Maven for some time myself, but if I remember correctly, the Maven
Assembly plugin could work:
https://maven.apache.org/plugins/maven-assembly-plugin/index.html

The Maven Shade plugin is another option (better suited, I think):
https://maven.apache.org/plugins/maven-shade-plugin/

>
> The question is: do others agree about my assessment?
>
It's debatable. It's very easy for them to build the assembly if they
need to.

We could offer something like this and/or write documentation on how
they can do it. I'm not the best person to ask about what the users of
mime4j need. And if nobody has complained about it, I guess it's not
that big of a deal in practice.

IMO: Provide docs, give the users options, be as lazy as possible :).

> With regards to the currently released jars and the project organization:
>
>  * core - ideally deployed as mime4j (drop the “core” as there would be only a single jar)
>  * dom - I understand that there is a cohesion thing going on as this is all related to “dom”…
>     —> But is this even useful on its own? Why is it separate from the core?
>  * utils - really? There is only a single class!!
>  * examples - why deploy this as a jar?? People who really want a jar could just build it.
>  * mbox-iterator - again, really? There are only 3 classes? Why create so much pain?
>  * storage - There are a few more classes here, but the real question is: what does this do?
>      —> Depending on the purpose, it may or may not be preferable to deploy separately
>  * benchmark - what is this for? Perhaps it could be deployed, but does anybody even use this?
>
> As part of my documentation project, if I can understand these better, I could help to create some basic documentation. Since the project is tiny enough, I could help reorganize in a push for a 1.0.0 release.
>
> Please let me know what you think.

I'm not against keeping them together or separate. I am lazy enough to
not change things that work and for which there is no incentive to do
the work.

Bundling them together means that people will get some extra classes,
even if they don't use them. The project is small so that might not be a
big issue.


If we go the change route, here are my thoughts:

Storage can be merged with dom. It brings in commons-io; we might ditch
or shade commons-io to avoid transitive dependencies.

Mbox can be merged in core or storage perhaps?

Utils !??

Examples, benchmarks should stay separate.


If we decide for unification, we might end up with:

- mime4j: core + dom + storage + utils + mbox

- examples

- benchmarks

- assembly


I hope this helps.

> Cheers,
> =David
>
>


Re: public vs. private packages

Posted by David Leangen <ap...@leangen.net>.
> I think it is a good initiative. Since mime4j is a library, and a lower
> level one, we can announce the intended change on the mailing list.


Ok, thanks.

Since we’re on the topic…

What is the purpose of having multiple jars? Why is the project organized the way it is?

I am saying this without knowing much about Mime4j… but IMO a library jar should be really simple to integrate. It would be simpler to have one and only one release jar, which includes the API, implementation, and all the dependencies bundled together. By the way, ideally there should not be any transitive dependencies. Those are definitely the best libraries!

In a nutshell, as a library we should be thinking more about the users, and less about the developers, as there are likely many more users than developers. The users just want to integrate the library with as few headaches as possible, so reducing dependencies is critical, i.e. only a single released jar with zero transitive dependencies should be the goal.

I am not very familiar with Maven. Is it possible to bundle together everything as a single jar even if it is separated into different projects?

  —> If yes: we could keep the same project structure, but only deploy a single jar.
  —> If no: we could restructure the project. This would only affect developers.

The question is: do others agree about my assessment?


With regards to the currently released jars and the project organization:

 * core - ideally deployed as mime4j (drop the “core” as there would be only a single jar)
 * dom - I understand that there is a cohesion thing going on as this is all related to “dom”…
    —> But is this even useful on its own? Why is it separate from the core?
 * utils - really? There is only a single class!!
 * examples - why deploy this as a jar?? People who really want a jar could just build it.
 * mbox-iterator - again, really? There are only 3 classes? Why create so much pain?
 * storage - There are a few more classes here, but the real question is: what does this do?
     —> Depending on the purpose, it may or may not be preferable to deploy separately
 * benchmark - what is this for? Perhaps it could be deployed, but does anybody even use this?

As part of my documentation project, if I can understand these better, I could help to create some basic documentation. Since the project is tiny enough, I could help reorganize in a push for a 1.0.0 release.

Please let me know what you think.


Cheers,
=David



Re: public vs. private packages

Posted by Eugen Stan <st...@gmail.com>.
Thanks,

I think it is a good initiative. Since mime4j is a library, and a lower
level one, we can announce the intended change on the mailing list.

After that, when changes are ready to release, we can issue a series of
1-2 Release Candidates and give people a chance to test them out and
provide some feedback.

If no feedback comes in within a defined period of time (1-2 weeks), we
can make the release.

Regards,

Eugen


Re: public vs. private packages

Posted by David Leangen <ap...@leangen.net>.
Hi,

I made a PR as promised:

  —> https://github.com/apache/james-mime4j/pull/31

As I commented in the PR:

> Based on my simplistic analysis, it was not very difficult to separate what appears to be "public" from what ought to be "private".
> 
> The main advantage of separating public packages (API) from private packages (implementation) is that it decreases the surface area of the API. It makes versioning much simpler, provided that the API was well designed. It also helps keep users out of trouble, as it becomes more obvious what they should and should not be using.
> 
> Used together with semantic versioning, this makes it much easier to provide releases and a nice upgrade path for users.
> 
> I cannot comment on the design quality of the API itself because I have not yet used it. It does appear to me, however, that there are many classes that would be best not to expose to users of the library.
> 
> This would of course be a major, breaking change. However, it is also just a refactoring (change of package and, ideally, ceasing to directly instantiate classes that reside in "private" packages), so it's not really all that bad.
> 
> My suggestion would be to version this as 0.9.0. Once it is confirmed to be "stable", then it ought to be released as version 1.0.0, according to semantic versioning guidelines.
> 
> If this idea is acceptable to the community, then I could investigate how to make this work both within and outside of an OSGi framework, and could also investigate the other modules.
> 
> If this idea is not well received, I can easily drop it. I am only suggesting to try to be helpful, not because I really need this. :-)
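
For reference, the semantic versioning rules mentioned above can be encoded in a few lines. This is a hypothetical helper (SemVer, next and isSafeUpgrade are made-up names, not project code), and treating every 0.y.z upgrade as unsafe is a deliberately conservative reading of the spec's "anything MAY change at any time" clause for major version zero.

```java
// Hypothetical helper illustrating the semantic versioning rules (semver.org); not project code.
final class SemVer {
    enum Change { BREAKING, FEATURE, FIX }

    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    // Breaking API changes bump MAJOR, compatible additions bump MINOR, bug fixes bump PATCH.
    SemVer next(Change change) {
        switch (change) {
            case BREAKING: return new SemVer(major + 1, 0, 0);
            case FEATURE: return new SemVer(major, minor + 1, 0);
            default: return new SemVer(major, minor, patch + 1);
        }
    }

    // 0.y.z promises nothing, so only forward moves within the same major version >= 1 count as safe.
    static boolean isSafeUpgrade(SemVer from, SemVer to) {
        return from.major >= 1 && from.major == to.major && compare(to, from) >= 0;
    }

    private static int compare(SemVer a, SemVer b) {
        if (a.major != b.major) return Integer.compare(a.major, b.major);
        if (a.minor != b.minor) return Integer.compare(a.minor, b.minor);
        return Integer.compare(a.patch, b.patch);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }

    public static void main(String[] args) {
        SemVer v = parse("0.9.0");
        System.out.println(v + " -> " + v.next(Change.BREAKING)); // prints "0.9.0 -> 1.0.0"
    }
}
```

Once 1.0.0 is out, moving an implementation class that users were never supposed to touch would not force a major bump, whereas any change to the exported interfaces would.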

Cheers,
=David



Re: public vs. private packages

Posted by David Leangen <ap...@leangen.net>.
Hi Eugen,

> Regarding the OSGi correctness of mime4j packages: Have you found any
> issues that prevent you from using the library as it is?

My personal issue is that I found the library a little difficult to understand. I wanted to document it, but noticed:

 1. That there are several projects/modules (not sure what you want to call them), but I don’t understand what they are or why they are not all in the same project/module
 2. That the difference between the API and the implementation is not clear to me, so it’s not immediately clear what classes I should care about and which ones I can safely ignore

If somebody can answer those questions, that would be great.


> MessageBuilderFactory/MessageBuilder - decouple DOM from Message:
> https://issues.apache.org/jira/browse/MIME4J-175

This is *exactly* the type of problem that virtually goes away when the API is properly separated from the implementation.


> If the improvements are substantial and provide benefits to users, they
> may be more inclined to accept a breaking change.

Agree. As I mentioned to Philip, I will first run a little test to see if it is easily doable or not. I probably should have done that first. I may discover that my suggestion is completely irrelevant, but that’s also why I wanted to ask here first in order to get some advice. :-)


> We can make it easier for them if we provide a migration guide as to what packages / classes
> they should rename.

Agree.


> A few years ago I was very eager to update and use newest versions. Now,
> while having to maintain software in production, I appreciate stability
> and boring libraries.

Of course. That is what semantic versioning is supposed to help with. Unfortunately this project is not using semantic versioning, apparently.


Cheers,
=David



Re: public vs. private packages

Posted by Eugen Stan <st...@gmail.com>.
Hi,


Regarding the OSGi correctness of mime4j packages: Have you found any
issues that prevent you from using the library as it is?

Mime4j has a small API surface. You should be OK. Worst case, you could
bundle all the packages together. Not OSGi-nice, but it works in practice.


There are open issues with MIME4j that may require a version bump. Some
changes that might require API changes are:

MessageBuilderFactory/MessageBuilder - decouple DOM from Message:
https://issues.apache.org/jira/browse/MIME4J-175

Extension mechanism for parsing messages:
https://issues.apache.org/jira/browse/MIME4J-117

Efficient read access to parts of parsed document:
https://issues.apache.org/jira/browse/MIME4J-114

Java 11 compilation: https://issues.apache.org/jira/browse/MIME4J-290

Java 8 library update:
https://issues.apache.org/jira/projects/MIME4J/issues/MIME4J-287


If the improvements are substantial and provide benefits to users, they
may be more inclined to accept a breaking change. We can make it easier
for them if we provide a migration guide as to what packages / classes
they should rename.

A few years ago I was very eager to update and use newest versions. Now,
while having to maintain software in production, I appreciate stability
and boring libraries.

Of course, I do upgrade periodically, and I feel it most of the time,
especially when dealing with front-end Node.js libraries.


Regards,

Eugen




Re: public vs. private packages

Posted by David Leangen <ap...@leangen.net>.
Hi Philip,

Thanks for the comment. You ask very pertinent questions.

>> This is a good practice that OSGi helps with a lot.

> Can you be more specific about private packages? Is this entirely an OSGi idea?

Well, yes and no.

No in the sense that it is a fairly well-known good practice (to the extent that any practice is debatable according to the circumstances). I will write more about this below.

Yes in the sense that OSGi enforces the separation. Public packages are available to any other bundle that imports those packages, while private packages are “hidden” from other bundles, and are only available to the bundle in which they reside. (“Private” means “private to its own bundle.”) It just helps keep larger systems under control.

Also no in the sense that just because a module follows these practices and is “OSGi-enabled”, it does not at all force anybody to use OSGi if they don’t want to. All you would need to do is (one way or another) instantiate the implementation class directly rather than having the OSGi framework do it for you. Whatever you are doing now would not have to change.

Note that I am making the big assumption that there is a well-known interface to implement in Mime4j, and that the implementation classes *can* be hidden. If this is a library that is actually intended to be completely exposed, then my comment about separating interface from implementation is not valid. I have only taken a cursory glance, and my initial impression is that moving out the implementation could be useful.

> Can you also be specific about perceived benefits. I assume this would involve lots of bloat in terms of new wrappers and refactoring downstream projects to match the new API with no functional benefit.

Actually, assuming that the “API” consists mostly of interfaces, there should be no change at all to those interfaces. It is only the implementation classes that would get moved to a different package. As I mention above, if the intent was to directly use the implementation classes, then this is a “library” and my proposal is not valid. However, if people are directly using classes that normally they should not be using, then making this separation would make it clearer what the public API is, and what is intended to be private.

In that case, if people are using the public API, there ought to be no change at all. Otherwise, people actually **ought** to stop directly using implementation classes, so they **ought** to refactor their code to use the API properly.
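
A minimal, JDK-only sketch of the separation described above: callers depend only on an interface and a factory, the implementation class cannot be named from outside, and without OSGi the factory either discovers a provider with java.util.ServiceLoader or instantiates the default directly. All type names are hypothetical; this is not how mime4j is structured today, and a single file can only approximate a private package with a private nested class.

```java
import java.util.Locale;
import java.util.ServiceLoader;

// Hypothetical public entry point. In a bundle this would live in an exported package.
final class FieldNameNormalizers {
    private FieldNameNormalizers() { }

    // Prefer a provider registered under META-INF/services; otherwise
    // fall back to instantiating the hidden default implementation directly.
    static FieldNameNormalizer get() {
        return ServiceLoader.load(FieldNameNormalizer.class)
                .findFirst()
                .orElseGet(DefaultNormalizer::new);
    }

    // Implementation detail. In a bundle this would sit in a private (non-exported) package;
    // here a private nested class gives the same "callers cannot name it" effect.
    private static final class DefaultNormalizer implements FieldNameNormalizer {
        @Override
        public String normalize(String fieldName) {
            return fieldName.trim().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(get().normalize("  Content-Type "));
    }
}

// Hypothetical public API type. In a bundle this would also live in an exported package.
interface FieldNameNormalizer {
    String normalize(String fieldName);
}
```

Because callers only ever mention FieldNameNormalizer and FieldNameNormalizers, the default implementation could be moved to another package or replaced without touching their code, which is the "no change for users of the public API" case above.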

But I am making big statements here without knowing Mime4j well enough. Again, I would have to evaluate the code. You ask great questions, and I realize that maybe I should have investigated more first before shooting out my question here. :-)

What I will do before writing any more is try a sample of the kind of changes I am referring to. It would be much easier to take a look at the actual changes than to have an abstract discussion without referring to code.


Thanks for engaging. I’ll get back to you soon either with some code changes, or with the understanding that my proposal is wrong.


Cheers,
=David


Re: public vs. private packages

Posted by Philip Whitehouse <ph...@whiuk.com>.
Can you be more specific about private packages? Is this entirely an OSGi idea?

Can you also be specific about perceived benefits. I assume this would involve lots of bloat in terms of new wrappers and refactoring downstream projects to match the new API with no functional benefit.

Best,

Philip Whitehouse

> On 15 May 2020, at 06:31, David Leangen <ap...@leangen.net> wrote:
> 
> This is a good practice that OSGi helps with a lot.