Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2009/03/26 11:58:16 UTC

[compress] potential bzip2 improvement

Hi folks,

first of all I don't know enough about the bzip2 format to understand
the existing code, nor the one I'll be pointing at further down.

The current code in compress is the one from Ant 1.7.1 and from versions
prior to 1.7.0.  Ant 1.7.0 shipped with a completely rewritten version
that was a lot faster:

https://issues.apache.org/bugzilla/show_bug.cgi?id=24798

Unfortunately that version created corrupt archives under certain
circumstances

https://issues.apache.org/bugzilla/show_bug.cgi?id=41596

and the change was reverted in Ant 1.7.1.

The Hadoop folks have been using the version of Ant 1.7.0, ran into
the same corruption but had somebody around who actually understood
the code and fixed it.  So Ant's Bugzilla now contains a patch to
BZip2OutputStream that has the potential to be a lot faster, isn't
really any less understandable than the existing code (which is
impossible anyway) and is claimed to be tested by Hadoop.

Do we want to use Hadoop's code (provided it passes the existing unit
tests) for the 1.0 release or do we want to stick to the current code
and try an upgrade after the release?

Since I understand neither code base, I don't really have any
preference.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] potential bzip2 improvement

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-03-26, Torsten Curdt <tc...@apache.org> wrote:

> If the code has been tested by the folks at Hadoop I trust it very
> much (they do extensive testing!) and would be all for replacing the
> current code base.

I just did it 8-)

svn revision 759143

Stefan



Re: [compress] potential bzip2 improvement

Posted by Torsten Curdt <tc...@apache.org>.
If the code has been tested by the folks at Hadoop I trust it very
much (they do extensive testing!) and would be all for replacing the
current code base. But IMO that's no blocker. In the end the algorithm
should be a black box for everyone using compress. So changing this in
a later release should be no problem.

I would strongly argue against shipping two code bases for the same
algorithm. It would cause more confusion than it would help IMO.

On Thu, Mar 26, 2009 at 12:47, sebb <se...@gmail.com> wrote:
> On 26/03/2009, Jochen Wiedmann <jo...@gmail.com> wrote:
>> On Thu, Mar 26, 2009 at 12:36 PM, sebb <se...@gmail.com> wrote:
>>
>>  > Would it be a silly idea to include both variants?
>>
>>
>> I'd clearly tend to avoid that. It would likely cause confusion.
>
> Surely that depends on how well it is documented and on the API?
>
> I've not looked into it, but perhaps the implementation can be chosen
> at construction-time via a parameter.
>
>>  On a related matter: Stefan, wouldn't it be possible for the others to
>>  start using compress?
>>
>>  Jochen
>>
>>
>>
>>  --
>>  I have always wished for my computer to be as easy to use as my
>>  telephone; my wish has come true because I can no longer figure out
>>  how to use my telephone.
>>
>>     -- (Bjarne Stroustrup,
>>  http://www.research.att.com/~bs/bs_faq.html#really-say-that
>>        My guess: Nokia E50)
>>



Re: [compress] potential bzip2 improvement

Posted by sebb <se...@gmail.com>.
On 26/03/2009, Jochen Wiedmann <jo...@gmail.com> wrote:
> On Thu, Mar 26, 2009 at 12:36 PM, sebb <se...@gmail.com> wrote:
>
>  > Would it be a silly idea to include both variants?
>
>
> I'd clearly tend to avoid that. It would likely cause confusion.

Surely that depends on how well it is documented and on the API?

I've not looked into it, but perhaps the implementation can be chosen
at construction-time via a parameter.
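
A minimal sketch of what such a construction-time switch might look
like. Every name in it (SelectableOutputStream, Variant, the two
encoder classes) is hypothetical and not taken from compress or Ant,
and the two encoders are plain pass-through stand-ins rather than real
bzip2 code. It only shows the shape of the API: one stream class, with
the implementation picked by a constructor argument.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these types exist in commons-compress or Ant.
final class SelectableBZip2Sketch {

    /** Made-up names for the two code bases discussed in this thread. */
    enum Variant { CLASSIC, REWRITTEN }

    /** Stand-in for the existing encoder; passes bytes through unchanged. */
    static final class ClassicEncoder extends FilterOutputStream {
        ClassicEncoder(OutputStream out) { super(out); }
    }

    /** Stand-in for the faster rewrite; also a plain pass-through here. */
    static final class RewrittenEncoder extends FilterOutputStream {
        RewrittenEncoder(OutputStream out) { super(out); }
    }

    /** The one stream callers see; the constructor argument picks the encoder. */
    static final class SelectableOutputStream extends FilterOutputStream {
        private final Variant variant;

        SelectableOutputStream(OutputStream out, Variant variant) {
            super(create(out, variant));
            this.variant = variant;
        }

        Variant variant() { return variant; }

        // Maps the requested variant to the encoder that wraps the target stream.
        private static OutputStream create(OutputStream out, Variant variant) {
            switch (variant) {
                case CLASSIC:   return new ClassicEncoder(out);
                case REWRITTEN: return new RewrittenEncoder(out);
                default: throw new IllegalArgumentException("unknown variant: " + variant);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello bzip2".getBytes(StandardCharsets.UTF_8);
        for (Variant v : Variant.values()) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = new SelectableOutputStream(sink, v)) {
                out.write(data);
            }
            System.out.println(v + " wrote " + sink.size() + " bytes");
        }
    }
}
```

Callers would only ever hold an OutputStream, so the choice stays
confined to the constructor call. Whether that is worth the extra
documentation burden is the open question.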

>  On a related matter: Stefan, wouldn't it be possible for the others to
>  start using compress?
>
>  Jochen
>
>
>
>  --
>  I have always wished for my computer to be as easy to use as my
>  telephone; my wish has come true because I can no longer figure out
>  how to use my telephone.
>
>     -- (Bjarne Stroustrup,
>  http://www.research.att.com/~bs/bs_faq.html#really-say-that
>        My guess: Nokia E50)
>



Re: [compress] potential bzip2 improvement

Posted by Torsten Curdt <tc...@apache.org>.
>> On a related matter: Stefan, wouldn't it be possible for the others to
>> start using compress?
>
> Not sure who "the others" may be.
>
> As for Ant, I don't expect it to move away from its own classes.
> Creating jars is a core requirement for a build tool for Java and the
> Ant community probably won't want to depend on an external library to
> do that (I know that I don't).

I so don't get why you guys don't look into something like
jarjar/minijar ...but that's a different story :)

cheers
--
Torsten



Re: [compress] potential bzip2 improvement

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-03-26, Jochen Wiedmann <jo...@gmail.com> wrote:

> On a related matter: Stefan, wouldn't it be possible for the others to
> start using compress?

Not sure who "the others" may be.

As for Ant, I don't expect it to move away from its own classes.
Creating jars is a core requirement for a build tool for Java and the
Ant community probably won't want to depend on an external library to
do that (I know that I don't).

As for Hadoop, yes, they probably can and should - it will be better
than depending on Ant just for the compression libraries.  I intend to
add Javadocs to Ant's classes pointing to commons-compress as the
preferred alternative once we have a release over here.

Stefan



Re: [compress] potential bzip2 improvement

Posted by Jochen Wiedmann <jo...@gmail.com>.
On Thu, Mar 26, 2009 at 12:36 PM, sebb <se...@gmail.com> wrote:

> Would it be a silly idea to include both variants?

I'd clearly tend to avoid that. It would likely cause confusion.

On a related matter: Stefan, wouldn't it be possible for the others to
start using compress?

Jochen


-- 
I have always wished for my computer to be as easy to use as my
telephone; my wish has come true because I can no longer figure out
how to use my telephone.

    -- (Bjarne Stroustrup,
http://www.research.att.com/~bs/bs_faq.html#really-say-that
       My guess: Nokia E50)



Re: [compress] potential bzip2 improvement

Posted by sebb <se...@gmail.com>.
On 26/03/2009, Stefan Bodewig <bo...@apache.org> wrote:
> Hi folks,
>
>  first of all I don't know enough about the bzip2 format to understand
>  the existing code, nor the one I'll be pointing at further down.
>
>  The current code in compress is the one from Ant 1.7.1 and from versions
>  prior to 1.7.0.  Ant 1.7.0 shipped with a completely rewritten version
>  that was a lot faster:
>
>  https://issues.apache.org/bugzilla/show_bug.cgi?id=24798
>
>  Unfortunately that version created corrupt archives under certain
>  circumstances
>
>  https://issues.apache.org/bugzilla/show_bug.cgi?id=41596
>
>  and the change was reverted in Ant 1.7.1.
>
>  The Hadoop folks have been using the version of Ant 1.7.0, ran into
>  the same corruption but had somebody around who actually understood
>  the code and fixed it.  So Ant's Bugzilla now contains a patch to
>  BZip2OutputStream that has the potential to be a lot faster, isn't
>  really any less understandable than the existing code (which is
>  impossible anyway) and is claimed to be tested by Hadoop.
>
>  Do we want to use Hadoop's code (provided it passes the existing unit
>  tests) for the 1.0 release or do we want to stick to the current code
>  and try an upgrade after the release?
>
>  Since I understand neither code base, I don't really have any
>  preference.

Would it be a silly idea to include both variants?

>  Stefan
