You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2015/01/22 17:30:40 UTC

[compress] Preparations for 1.10

Hi all

I've had good experience with creating a site build before cutting a
release candidate, so here we go:

        <http://people.apache.org/~bodewig/commons-compress/>

Please have a look and identify stuff that looks as if I'd have to
reroll a new RC should it come to a vote with the current code base.

Thanks

        Stefan

BTW I'll fix the unused import PMD reports; the "unused private method"
warnings are false positives we've seen since upgrading the PMD plugin.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Gary Gregory <ga...@gmail.com>.
Here is an easy one: "Avoid unused imports such as 'java.io.InputStream".

Gary

On Thu, Jan 22, 2015 at 11:30 AM, Stefan Bodewig <bo...@apache.org> wrote:

> Hi all
>
> I've had good experience with creating a site build before cutting a
> release candidate, so here we go:
>
>         <http://people.apache.org/~bodewig/commons-compress/>
>
> Please have a look and identify stuff that looks as if I'd have to
> reroll a new RC should it come to a vote with the current code base.
>
> Thanks
>
>         Stefan
>
> BTW I'll fix the unused import PMD reports; the "unused private method"
> warnings are false positives we've seen since upgrading the PMD plugin.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [compress] Preparations for 1.10

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-25, Emmanuel Bourg wrote:

> Le 25/01/2015 11:38, Benedikt Ritter a écrit :

>> I'm not a compress developer, but IMHO exceptions should be packaged by their API and not by their nature.

> Thank you for moving the exception Stefan, however I tend to agree with
> Benedikt on this, exceptions are better placed in the base package. So
> either at the o.a.c.c.archivers level, or even above if this could be
> used by compressors (I don't know if there are compression formats with
> password protection though).

Honestly I don't know either.  I've moved it to the base package just to
be sure.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Gary Gregory <ga...@gmail.com>.
On Sun, Jan 25, 2015 at 4:37 PM, Emmanuel Bourg <eb...@apache.org> wrote:

> Le 25/01/2015 11:38, Benedikt Ritter a écrit :
>
> > I'm not a compress developer, but IMHO exceptions should be packaged by
> their API and not by their nature.
>
> Thank you for moving the exception Stefan, however I tend to agree with
> Benedikt on this, exceptions are better placed in the base package. So
> either at the o.a.c.c.archivers level, or even above if this could be
> used by compressors (I don't know if there are compression formats with
> password protection though).
>

+1. Put an exception in the parent package that uses it.

Gary

>
> Emmanuel Bourg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [compress] Preparations for 1.10

Posted by Emmanuel Bourg <eb...@apache.org>.
Le 25/01/2015 11:38, Benedikt Ritter a écrit :

> I'm not a compress developer, but IMHO exceptions should be packaged by their API and not by their nature.

Thank you for moving the exception Stefan, however I tend to agree with
Benedikt on this, exceptions are better placed in the base package. So
either at the o.a.c.c.archivers level, or even above if this could be
used by compressors (I don't know if there are compression formats with
password protection though).

Emmanuel Bourg


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Benedikt Ritter <be...@gmail.com>.
Hi,

> Am 24.01.2015 um 13:18 schrieb Stefan Bodewig <bo...@apache.org>:
> 
>> On 2015-01-23, Emmanuel Bourg wrote:
>> 
>> Le 22/01/2015 17:30, Stefan Bodewig a écrit :
> 
>>> Please have a look and identify stuff that looks as if I'd have to
>>> reroll a new RC should it come to a vote with the current code base.
> 
>> I reviewed the API changes, here are my comments:
> 
>> - PasswordRequiredException: the exception is in the sevenz package, do
>> we want to move it elsewhere so it can be used later for other formats
>> like zip (if that makes sense).
> 
> Moved to a new exceptions package.

I'm not a compress developer, but IMHO exceptions should be packaged by their API and not by their nature. Is the exception thrown by compressors only? Put it in the compressor package. Can it be thrown by compressors and archivers? Put it in the top level o.a.c.compress package. A package called "exception" is a design smell. You wouldn't create a package called "Interface" and put your interfaces there, would you?

Just my 2 cents :-)

Bene

> 
>> - BitInputStream: why not using a long cache instead of an int, like
>> BitStream before the refactoring? It might be interesting to do some
>> benchmarks and see if it make a difference.
> 
> We never needed more than 31 bits, but you are right, I'll try to
> experiment a little.
> 
> Stefan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-25, sebb wrote:

> On 25 January 2015 at 05:11, Stefan Bodewig <bo...@apache.org> wrote:
>> On 2015-01-24, Kristian Rosenvold wrote:

>>> After an intense few minutes discussing the color of the bike shed
>>> with myself (package name) I moved the zip-unspecific parallel stuff
>>> to org.apache.commons.compress.parallel in r1654572.

>> Maybe add a package-info.java?

>> I'll need to add one to the exceptions pakage as well.

> If the p-i files are only documentation (no annotations) you can
> exclude them from the compiler plugin, and thereby avoid unnecessary
> recompiles of all the code (not just p-i files).

Oh, I would have expected the compiler plugin to know how to handle them
(granted, it took us three or four iterations in Ant's <javac>). :-)

> See Commons NET pom for an example (search for package-info)
> See also https://jira.codehaus.org/browse/MCOMPILER-205.

Thanks, I'll take a look

        Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by sebb <se...@gmail.com>.
On 25 January 2015 at 05:11, Stefan Bodewig <bo...@apache.org> wrote:
> On 2015-01-24, Kristian Rosenvold wrote:
>
>> After an intense few minutes discussing the color of the bike shed
>> with myself (package name) I moved the zip-unspecific parallel stuff
>> to org.apache.commons.compress.parallel in r1654572.
>
> Maybe add a package-info.java?
>
> I'll need to add one to the exceptions pakage as well.

If the p-i files are only documentation (no annotations) you can
exclude them from the compiler plugin, and thereby avoid unnecessary
recompiles of all the code (not just p-i files).
See Commons NET pom for an example (search for package-info)
See also https://jira.codehaus.org/browse/MCOMPILER-205.


> Thanks
>
>         Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-24, Kristian Rosenvold wrote:

> After an intense few minutes discussing the color of the bike shed
> with myself (package name) I moved the zip-unspecific parallel stuff
> to org.apache.commons.compress.parallel in r1654572.

Maybe add a package-info.java?

I'll need to add one to the exceptions pakage as well.

Thanks

        Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-24, Kristian Rosenvold wrote:

> I suppose the "what's new section" on the site also needs to be
> updated to 1.10 ?

Yes, that would be good.

> Anything else I can assist with before the release ?

Thanks.

I intend to spend a bit of time with BitInputStream and cut the RC
tomorrow, at latest.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Kristian Rosenvold <kr...@apache.org>.
After an intense few minutes discussing the color of the bike shed
with myself (package name) I moved the zip-unspecific parallel stuff
to org.apache.commons.compress.parallel in r1654572. This concludes
the changes Emmanuel suggested. I have already responded to a few of
the comments that did not lead to code change; if anyone wants to
discuss these matters further now would be a good time :)

I also further refined some javadocs a bit; I'll be reading through
the full site for the new stuff once more tonight.

I suppose the "what's new section" on the site also needs to be
updated to 1.10 ?

Anything else I can assist with before the release ?

Kristian






2015-01-24 13:18 GMT+01:00 Stefan Bodewig <bo...@apache.org>:
> On 2015-01-23, Emmanuel Bourg wrote:
>
>> Le 22/01/2015 17:30, Stefan Bodewig a écrit :
>
>>> Please have a look and identify stuff that looks as if I'd have to
>>> reroll a new RC should it come to a vote with the current code base.
>
>> I reviewed the API changes, here are my comments:
>
>> - PasswordRequiredException: the exception is in the sevenz package, do
>> we want to move it elsewhere so it can be used later for other formats
>> like zip (if that makes sense).
>
> Moved to a new exceptions package.
>
>> - BitInputStream: why not using a long cache instead of an int, like
>> BitStream before the refactoring? It might be interesting to do some
>> benchmarks and see if it make a difference.
>
> We never needed more than 31 bits, but you are right, I'll try to
> experiment a little.
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[compress] BitInputStream (was Preparations for 1.10)

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-24, Stefan Bodewig wrote:

> On 2015-01-23, Emmanuel Bourg wrote:

>> - BitInputStream: why not using a long cache instead of an int, like
>> BitStream before the refactoring? It might be interesting to do some
>> benchmarks and see if it make a difference.

> We never needed more than 31 bits, but you are right, I'll try to
> experiment a little.

I've changed ot to use a long internally but it won't have any effect.

The way it is implemented we never read more bytes than are needed to
fulfill the current readBits request.  This means there will be no
difference in I/O between using a long or an int as cache.  There may be
a small difference between the bit operations but my micro-benchmarks
are not conclusive.  Since CC never uses more than 31 bits, we will
never use the upper four bytes of the cache.

We may want to change the implementation to eagerly try to cache more in
a future release.

Still I switched to long now because this allowed me to change the
return type of readBits to long.  The class is new, so we can still
change the API.  If it said int now, we wouldn't be able to change it
later.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-23, Emmanuel Bourg wrote:

> Le 22/01/2015 17:30, Stefan Bodewig a écrit :

>> Please have a look and identify stuff that looks as if I'd have to
>> reroll a new RC should it come to a vote with the current code base.

> I reviewed the API changes, here are my comments:

> - PasswordRequiredException: the exception is in the sevenz package, do
> we want to move it elsewhere so it can be used later for other formats
> like zip (if that makes sense).

Moved to a new exceptions package.

> - BitInputStream: why not using a long cache instead of an int, like
> BitStream before the refactoring? It might be interesting to do some
> benchmarks and see if it make a difference.

We never needed more than 31 bits, but you are right, I'll try to
experiment a little.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Kristian Rosenvold <kr...@gmail.com>.
2015-01-23 11:52 GMT+01:00 Emmanuel Bourg <eb...@apache.org>:

> - PasswordRequiredException: the exception is in the sevenz package, do
> we want to move it elsewhere so it can be used later for other formats
> like zip (if that makes sense).

Makes sense, where should we put that ?

> - there are some missing @since tags on the new classes
> (ScatterGatherBackingStoreSupplier, ParallelScatterZipCreator,
> InputStreamSupplier, ZipArchiveEntryRequest, ZipArchiveEntryPredicate)

Fixed r1654291

> - InputStreamSupplier could have been replaced with
> Supplier<InputStream> if we were using Java 8. But with the current JDK
> I wonder if we could use Callable<InputStream> instead to spare an
> interface.

Discussed in separate mail; leaving as-is for now.


> - If we had a kind of DeferredInputStream (not yet in commons-io sadly)
> the API could be slightly simpler. ZipArchiveEntryRequest and
> InputStreamSupplier could go away. DeferredInputStream would open an
> underlying stream only when a read() method is called.

I think the overall memory model constraints in a parallel context
favour the "Supplier" approach, since you avoid the entire need for a
thread-safe DeferredInputStream.

> - ParallelScatterZipCreator has two public methods with a
> Callable<Object> in the signature, but it's not clear what this Object
> actually is.

Fixed in r1654291

>
> - Is ParallelScatterZipCreator.getStatisticsMessage() intended to remain
> in the final API or was it just there for benchmarking the
> implementation during the development? If it is meant to stay maybe a
> proper ScatterStatistics class would be cleaner.

Introduced class in r1654291
>
> - ParallelScatterZipCreator.createCallable: the message of the
> IllegalArgumentException could mention the name of the entry involved.

Also fixed.

> - The javadoc of the default ParallelScatterZipCreator constructor could
> explain how the thread pool is sized by default.

Also fixed.

> - Could ScatterGatherBackingStoreSupplier be avoided? The public
> ParallelScatterZipCreator constructor that uses it could accept a
> ScatterGatherBackingStore instead, we would just have to ensure that the
> backing store doesn't open resources until it's used.

Concurrent running creates multiple instances, so that won't work.

> - ScatterGatherBackingStore, FileBasedScatterGatherBackingStore and
> ScatterGatherBackingStoreSupplier are in the zip package but have
> nothing specific to the zip format. Should we move them elsewhere so
> they can be used later to implement parallel compression for other formats?

Good idea; something like org.apache.commons.compress.archivers.concurrent or
org.apache.commons.compress.concurrent ?

> - I think an example demonstrating the parallel zip creation would be a
> nice addition to the site.

I'll add some code later tonight.

Thanks a lot for some great comments !

Kristian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Kristian Rosenvold <kr...@apache.org>.
Thanks for lots of great comments ! Just a few comments about some of the items;

Regarding the *Supplier interfaces, the basic idea is that once
commons-compress reaches jdk 1.8 language level (somewhere in the late
2030's :-) some of such interfaces may be modified to extend
java.util.function.Supplier. And if the supplier throws an exception
it'll be well defined on the interface with clear semantics. I somehow
think these interfaces serve to define the contract a little better
than just a straight callable throwing "Exception". It seems  like a
decent way to retromount some jdk8 idioms into jdk[567] compatible
libraries. I'm not really married to this solution, but I think it's
nicer than callable.

ZipArchiveEntryRequest is created for one specific purpose:
ZipArchiveEntry is created on a different thread that has no clear
happens-before relationship to the compressor thread, which means we
cannot really access any fields in that object unless we make them all
volatile. For a long time I figured I could get away without
ZipArchiveEntryRequest but I decided that the threading model would be
too hair-splitting and way too subtle for my liking; I prefer code
that does not require black-belt knowledge in memory model semantics.

I'll look at the specific code stuff for all the parallel code in the weekend.

Kristian


2015-01-23 11:52 GMT+01:00 Emmanuel Bourg <eb...@apache.org>:
> Le 22/01/2015 17:30, Stefan Bodewig a écrit :
>
>> Please have a look and identify stuff that looks as if I'd have to
>> reroll a new RC should it come to a vote with the current code base.
>
> I reviewed the API changes, here are my comments:
>
> - PasswordRequiredException: the exception is in the sevenz package, do
> we want to move it elsewhere so it can be used later for other formats
> like zip (if that makes sense).
>
> - there are some missing @since tags on the new classes
> (ScatterGatherBackingStoreSupplier, ParallelScatterZipCreator,
> InputStreamSupplier, ZipArchiveEntryRequest, ZipArchiveEntryPredicate)
>
> - InputStreamSupplier could have been replaced with
> Supplier<InputStream> if we were using Java 8. But with the current JDK
> I wonder if we could use Callable<InputStream> instead to spare an
> interface.
>
> - If we had a kind of DeferredInputStream (not yet in commons-io sadly)
> the API could be slightly simpler. ZipArchiveEntryRequest and
> InputStreamSupplier could go away. DeferredInputStream would open an
> underlying stream only when a read() method is called.
>
> - ParallelScatterZipCreator has two public methods with a
> Callable<Object> in the signature, but it's not clear what this Object
> actually is.
>
> - Is ParallelScatterZipCreator.getStatisticsMessage() intended to remain
> in the final API or was it just there for benchmarking the
> implementation during the development? If it is meant to stay maybe a
> proper ScatterStatistics class would be cleaner.
>
> - ParallelScatterZipCreator.createCallable: the message of the
> IllegalArgumentException could mention the name of the entry involved.
>
> - The javadoc of the default ParallelScatterZipCreator constructor could
> explain how the thread pool is sized by default.
>
> - Could ScatterGatherBackingStoreSupplier be avoided? The public
> ParallelScatterZipCreator constructor that uses it could accept a
> ScatterGatherBackingStore instead, we would just have to ensure that the
> backing store doesn't open resources until it's used.
>
> - ScatterGatherBackingStore, FileBasedScatterGatherBackingStore and
> ScatterGatherBackingStoreSupplier are in the zip package but have
> nothing specific to the zip format. Should we move them elsewhere so
> they can be used later to implement parallel compression for other formats?
>
> - BitInputStream: why not using a long cache instead of an int, like
> BitStream before the refactoring? It might be interesting to do some
> benchmarks and see if it make a difference.
>
> - I think an example demonstrating the parallel zip creation would be a
> nice addition to the site.
>
> Emmanuel Bourg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Preparations for 1.10

Posted by Emmanuel Bourg <eb...@apache.org>.
Le 22/01/2015 17:30, Stefan Bodewig a écrit :

> Please have a look and identify stuff that looks as if I'd have to
> reroll a new RC should it come to a vote with the current code base.

I reviewed the API changes, here are my comments:

- PasswordRequiredException: the exception is in the sevenz package, do
we want to move it elsewhere so it can be used later for other formats
like zip (if that makes sense).

- there are some missing @since tags on the new classes
(ScatterGatherBackingStoreSupplier, ParallelScatterZipCreator,
InputStreamSupplier, ZipArchiveEntryRequest, ZipArchiveEntryPredicate)

- InputStreamSupplier could have been replaced with
Supplier<InputStream> if we were using Java 8. But with the current JDK
I wonder if we could use Callable<InputStream> instead to spare an
interface.

- If we had a kind of DeferredInputStream (not yet in commons-io sadly)
the API could be slightly simpler. ZipArchiveEntryRequest and
InputStreamSupplier could go away. DeferredInputStream would open an
underlying stream only when a read() method is called.

- ParallelScatterZipCreator has two public methods with a
Callable<Object> in the signature, but it's not clear what this Object
actually is.

- Is ParallelScatterZipCreator.getStatisticsMessage() intended to remain
in the final API or was it just there for benchmarking the
implementation during the development? If it is meant to stay maybe a
proper ScatterStatistics class would be cleaner.

- ParallelScatterZipCreator.createCallable: the message of the
IllegalArgumentException could mention the name of the entry involved.

- The javadoc of the default ParallelScatterZipCreator constructor could
explain how the thread pool is sized by default.

- Could ScatterGatherBackingStoreSupplier be avoided? The public
ParallelScatterZipCreator constructor that uses it could accept a
ScatterGatherBackingStore instead, we would just have to ensure that the
backing store doesn't open resources until it's used.

- ScatterGatherBackingStore, FileBasedScatterGatherBackingStore and
ScatterGatherBackingStoreSupplier are in the zip package but have
nothing specific to the zip format. Should we move them elsewhere so
they can be used later to implement parallel compression for other formats?

- BitInputStream: why not using a long cache instead of an int, like
BitStream before the refactoring? It might be interesting to do some
benchmarks and see if it make a difference.

- I think an example demonstrating the parallel zip creation would be a
nice addition to the site.

Emmanuel Bourg


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org