Posted to common-user@hadoop.apache.org by Joydeep Sen Sarma <js...@facebook.com> on 2008/02/21 02:06:19 UTC

changes to compression interfaces in 0.15?

Hi developers,

 

In migrating to 0.15, I am noticing that the compression interfaces
have changed:

 

- the compression type for SequenceFile outputs used to be set by:
SequenceFile.setCompressionType()

- now it seems to be set using:
SequenceFileOutputFormat.setOutputCompressionType()

 

The change is for the better - but would it be possible to:

 

- remove old/dead interfaces. That would have been a
straightforward hint for applications to look for new interfaces.
(hadoop-default.xml also still has a setting for the old conf variable:
io.seqfile.compression.type)

- if possible, document the changed interfaces in the release
notes (there's no way we can find this out by looking at the long list
of Jiras).
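For reference, the stale entry in question probably looks something like this in hadoop-default.xml (sketched from memory of 0.15-era configs; the default value and description text may differ):

```xml
<property>
  <name>io.seqfile.compression.type</name>
  <value>RECORD</value>
  <description>The default compression type for SequenceFiles
  (NONE, RECORD or BLOCK).</description>
</property>
```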

 

As you can imagine, this causes a very subtle and harmful regression in
the behavior of existing apps. It does not cause failures - in our case
it switched from BLOCK to RECORD compression, meaning there's pretty
much no compression at all. I caught this by *pure* chance and now I
am living in absolute fear of what else lurks out there.

 

I am not sure how up to date the wiki is on the compression stuff (my
responsibility to update it) - but please do consider the impact of
changing interfaces on existing applications. (Maybe we should have a
JIRA tag to mark out bugs that change interfaces.)

 

As always - thanks for all the fish (err .. working code),

 

Joydeep

 


Re: changes to compression interfaces in 0.15?

Posted by Ted Dunning <td...@veoh.com>.
The principles are pretty simple:

If the semantics change significantly, then the name should change.

Conversely, if the name doesn't change the semantics shouldn't change.

That is, unless the changes fix seriously broken old semantics, or extend old
semantics in a way that leaves old calls unchanged.

Or unless the change is listed explicitly in the INCOMPATIBLE section in a
way that people understand (which is unlikely given how few people look at
the change notes).

On the other side of the fence, good monitoring and exception alerting
always help.  Output size is a key parameter to monitor and alert on.
Adaptive alerts aren't that hard to build (just alert on the ratio of today
/ yesterday or today / last week).
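Sketching that ratio check (the threshold and byte counts below are invented for illustration; wiring it to real job counters is left out):

```java
// A minimal sketch of the adaptive alert described above: flag a job whose
// output size swings by more than some factor against a baseline
// (yesterday's or last week's size). Thresholds here are invented.
public class OutputSizeAlert {

    // True when today's size differs from the baseline by more than
    // `factor` in either direction (sudden blowup or sudden shrinkage).
    static boolean shouldAlert(long todayBytes, long baselineBytes, double factor) {
        if (baselineBytes <= 0) {
            return todayBytes > 0; // no history: anything nonzero is news
        }
        double ratio = (double) todayBytes / (double) baselineBytes;
        return ratio > factor || ratio < 1.0 / factor;
    }

    public static void main(String[] args) {
        // A job that silently lost BLOCK compression might grow ~10x.
        System.out.println(shouldAlert(1_000_000_000_000L, 100_000_000_000L, 4.0)); // true
        // Normal day-to-day jitter stays quiet.
        System.out.println(shouldAlert(110L, 100L, 4.0)); // false
    }
}
```

The appeal of the ratio form is that it needs no model of what a "correct" output size is.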


On 2/21/08 12:41 PM, "Arun C Murthy" <ac...@yahoo-inc.com> wrote:

> We do need to be more diligent about listing config changes in
> CHANGES.txt for starters, and that point is taken. However, we can't
> start pulling out apis without deprecating them first.


Re: define backwards compatibility

Posted by Doug Cutting <cu...@apache.org>.
Joydeep Sen Sarma wrote:
> i find the confusion over what backwards compatibility means scary - and i am really hoping that the outcome of this thread is a clear definition from the committers/hadoop-board of what to reasonably expect (or not!) going forward.

The goal is clear: code that compiles and runs warning-free in one 
release should not have to be altered to try the next release.  It 
may generate warnings, and these should be addressed before another 
upgrade is attempted.

Sometimes it is not possible to achieve this.  In these cases 
applications should fail with a clear error message, either at 
compilation or runtime.

In both cases, incompatible changes should be well documented in the 
release notes.

This is described (in part) in http://wiki.apache.org/hadoop/Roadmap

That's the goal.  Implementing and enforcing it is another story.  For 
that we depend on developer and user vigilance.  The current issue seems 
a case of failure to implement the policy rather than a lack of policy.

Doug

define backwards compatibility (was: changes to compression interfaces in 0.15?)

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Arun - if you can't pull the API, then you must redirect it to the new call in a way that preserves its semantics.

In this case, had we re-implemented SequenceFile.setCompressionType in 0.15 to call SequenceFileOutputFormat.setOutputCompressionType(), it would have been a backwards compatible change, and deprecation would have served fair warning of the eventual pullout.

I find the confusion over what backwards compatibility means scary, and I am really hoping that the outcome of this thread is a clear definition from the committers/Hadoop board of what to reasonably expect (or not!) going forward.
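The redirect described here is the usual deprecate-and-delegate pattern. A toy sketch with stand-in classes (these are not the real Hadoop classes, and the config key is only illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of deprecate-and-delegate: the old entry point keeps
// working by forwarding to the new one, so existing callers keep their
// semantics while the @Deprecated tag warns them at compile time.
class Conf {
    final Map<String, String> props = new HashMap<>();
    void set(String k, String v) { props.put(k, v); }
    String get(String k) { return props.get(k); }
}

class NewApi {
    static void setOutputCompressionType(Conf conf, String type) {
        conf.set("mapred.output.compression.type", type); // illustrative key
    }
}

class OldApi {
    /** @deprecated use NewApi.setOutputCompressionType instead */
    @Deprecated
    static void setCompressionType(Conf conf, String type) {
        NewApi.setOutputCompressionType(conf, type); // preserve old semantics
    }
}

public class DelegationDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        OldApi.setCompressionType(conf, "BLOCK"); // legacy caller, still effective
        System.out.println(conf.get("mapred.output.compression.type")); // BLOCK
    }
}
```

The legacy call keeps its effect, the compiler emits a deprecation warning, and the method can be pulled in a later release.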





Re: changes to compression interfaces in 0.15?

Posted by Pete Wyckoff <pw...@facebook.com>.
If the API semantics are changing under you, you have to change your code
whether or not the API is pulled or deprecated.  Pulling it makes it more
obvious that the user has to change his/her code.

-- pete




Re: changes to compression interfaces in 0.15?

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Feb 21, 2008, at 12:20 PM, Joydeep Sen Sarma wrote:

>> To maintain backward compat, we cannot remove old apis - the standard
>> procedure is to deprecate them for the next release and remove them
>> in subsequent releases.
>
> you've got to be kidding.
>
> we didn't maintain backwards compatibility. my app broke. Simple  
> and straightforward. and the old interfaces are not deprecated (to  
> quote 0.15.3 on a 'deprecated' interface:
>

You are right, HADOOP-1851 didn't fix it right. I've filed HADOOP-2869.

We do need to be more diligent about listing config changes in  
CHANGES.txt for starters, and that point is taken. However, we can't  
start pulling out apis without deprecating them first.

Arun



RE: changes to compression interfaces in 0.15?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
> To maintain backward compat, we cannot remove old apis - the standard 
> procedure is to deprecate them for the next release and remove them 
> in subsequent releases.

you've got to be kidding.

We didn't maintain backwards compatibility. My app broke, simple and straightforward. And the old interfaces are not deprecated (to quote 0.15.3 on a 'deprecated' interface:

  /**
   * Set the compression type for sequence files.
   * @param job the configuration to modify
   * @param val the new compression type (none, block, record)
   */
  static public void setCompressionType(Configuration job,
                                        CompressionType val) {
)

I (and I would suspect any average user willing to recompile code) would much, much rather that we broke backwards compatibility immediately rather than carry over defunct APIs that insidiously break application behavior.

And of course, this does not address the point that the option strings themselves are deprecated. (Remember, people set options explicitly from XML files and streaming; not everyone goes through Java APIs.)

--

As one of my dear professors once said, put yourself in the other person's shoes. Consider that you were in my position and that a production app suddenly went from consuming 100G to 1TB, everything slowed down drastically, and it gave no sign that anything was amiss - everything looked golden on the outside. What would be your reaction if you found out after a week that the system was full and numerous processes had to be re-run? How would you have figured that was going to happen by looking at the INCOMPATIBLE section (which, btw, I did read carefully before sending my mail)?

(Fortunately I escaped the worst case - but I think this is a real call to action.)




Re: changes to compression interfaces in 0.15?

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Joydeep,

On Feb 20, 2008, at 5:06 PM, Joydeep Sen Sarma wrote:

> Hi developers,
>
> In migrating to 0.15 - i am noticing that the compression interfaces
> have changed:
>
> -          compression type for sequencefile outputs used to be set  
> by:
> SequenceFile.setCompressionType()
>
> -          now it seems to be set using:
> sequenceFileOutputFormat.setOutputCompressionType()
>
>

Yes, we added SequenceFileOutputFormat.setOutputCompressionType and  
deprecated the old api. (HADOOP-1851)

>
> The change is for the better - but would it be possible to:
>
> -          remove old/dead interfaces. That would have been a
> straightforward hint for applications to look for new interfaces.
> (hadoop-default.xml also still has setting for old conf variable:
> io.seqfile.compression.type)
>

To maintain backward compat, we cannot remove old apis - the standard  
procedure is to deprecate them for the next release and remove them  
in subsequent releases.

> -          if possible - document changed interfaces in the release
> notes (there's no way we can find this out by looking at the long list
> of Jiras).
>

Please look at the INCOMPATIBLE CHANGES section of CHANGES.txt;  
HADOOP-1851 is listed there. Admittedly we can do better, but that is  
a good place to look when upgrading to newer releases.
>
> i am not sure how updated the wiki is on the compression stuff (my
> responsibility to update it) - but please do consider the impact of

Please use the Forrest-based docs (on the Hadoop website, e.g.  
mapred_tutorial.html) rather than the wiki as the gold standard. The  
reason we moved away from the wiki is precisely this: it's harder to  
maintain docs per release, etc.

> changing interfaces on existing applications. (maybe we should have a
> JIRA tag to mark out bugs that change interfaces).
>
>

Again, CHANGES.txt and INCOMPATIBLE CHANGES section for now.

Arun

>
>
> As always - thanks for all the fish (err .. working code),
>
>
>
> Joydeep
>
>
>


RE: changes to compression interfaces in 0.15?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
To add to the litany of woes: LocalFileSystem.globPaths() returns file
paths with a 'file://' URI prefix (other file system list calls don't),
so toString() calls on these paths now have a different format from before.

OK - maybe the right thing to do was to use toUri().getPath() all along.
But still ..
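The toString()-vs-path distinction can be illustrated with plain java.net.URI (Hadoop's Path.toUri() behaves analogously; the path below is made up):

```java
import java.net.URI;

// Two URIs that print differently can still agree on the path component;
// comparing getPath() instead of toString() sidesteps the scheme prefix.
public class UriPathDemo {
    public static void main(String[] args) {
        URI withScheme = URI.create("file:///user/joydeep/part-00000");
        URI bare = URI.create("/user/joydeep/part-00000");

        System.out.println(withScheme);           // file:///user/joydeep/part-00000
        System.out.println(bare);                 // /user/joydeep/part-00000
        System.out.println(withScheme.getPath()); // /user/joydeep/part-00000
        System.out.println(withScheme.getPath().equals(bare.getPath())); // true
    }
}
```

Code that compared path strings via toString() sees a format change; code that compared path components does not.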



-----Original Message-----
From: Jason Venner [mailto:jason@attributor.com] 
Sent: Wednesday, February 20, 2008 6:13 PM
To: core-user@hadoop.apache.org
Subject: Re: changes to compression interfaces in 0.15?

I agree. I am in the midst of combing through the config files for 16 to
see what changes I have to retrofit into our jobs.
Support in the tools to inform of the use of deprecated or outright
removed keys would be wonderful.

Aaron Kimball wrote:
> As a general follow-up suggestion: Is there a mechanism to output a
> warning when the user sets deprecated JobConf keys? Given that you can
> set any arbitrary key name and it will simply be ignored, this might
> be a good idea.
>
> - Aaron
>
> Joydeep Sen Sarma wrote:
>> In addition:
>>
>> -          "mapred.output.compression.type" is now replaced with
>> "mapred.map.output.compression.type"
>>
>> -          the old implementation of the Java interface
>> setMapOutputCompressorClass() used to turn on map compression
>> automatically as a side-effect; the 0.15 one doesn't. Looks like one has
>> to call setCompressMapOutput() separately.
>>
>>  
>>
>> Aargh.
>>
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

RE: changes to compression interfaces in 0.15?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
/* will leave my comments in the JIRA as well */

The mechanism proposed here, by itself, will not prevent regressions. We can create a 'registry' of legal options, but that will not prevent a software change that adds a new option, moves the semantics of the old option to the new one, and then merrily leaves the old option behind in the registry.

(Exactly what happened here: the tree has good-looking code with references to deprecated options.)

We would need isolated regression tests against each registered option (a tall order indeed) for this to suffice.

Besides, even the Java interfaces were obsoleted or their semantics changed without the old interfaces being removed. That's not a 'lack of mechanism' issue - it's a pure software engineering issue.

--

Can't complain too much though - it's one or two errant changes among many good ones. Thanks to the Hadoop team for all the good work! I hope the team can articulate and adopt a strict policy of introducing backwards incompatibility when interface semantics change.
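For what it's worth, the warn-on-unknown-key idea from earlier in the thread is easy to sketch (the registered key names are illustrative, and, as argued above, the registry itself can still go stale):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Warn when a key under a reserved prefix is not in the known-options
// registry: this catches misspellings and removed keys, though not a stale
// registry entry whose semantics quietly moved elsewhere.
public class WarningConf {
    private static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "mapred.output.compression.type",
            "mapred.map.output.compression.type"));

    final List<String> warnings = new ArrayList<>();

    void set(String key, String value) {
        if (key.startsWith("mapred.") && !KNOWN.contains(key)) {
            warnings.add("unrecognized key (deprecated, removed, or misspelled?): " + key);
        }
        // ... store the value as usual ...
    }

    public static void main(String[] args) {
        WarningConf conf = new WarningConf();
        conf.set("mapred.output.compresion.type", "BLOCK"); // typo: warned
        conf.set("my.app.custom.flag", "on");               // other prefix: allowed silently
        System.out.println(conf.warnings.size()); // 1
    }
}
```

Per Ted's suggestion, users who want uncheked custom keys just pick a prefix outside the reserved one.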



-----Original Message-----
From: Aaron Kimball [mailto:ak@cs.washington.edu]
Sent: Wed 2/20/2008 8:13 PM
To: core-user@hadoop.apache.org
Subject: Re: changes to compression interfaces in 0.15?
 
I filed a JIRA for this issue:
https://issues.apache.org/jira/browse/HADOOP-2866

- Aaron

Aaron Kimball wrote:
> +1
> 
> Ted Dunning wrote:
>> Actually, it might just be good to have a warning spit out if you use ANY
>> unknown key that starts with mapred.* or any of the other hadoop-specific
>> parameters.
>>
>> That way misspellings would be caught as well as deprecations.
>>
>> If you want to set a value and not get a warning, just pick a different
>> prefix.
>>>>> responsibility to update it) - but please do consider the impact of
>>>>> changing interfaces on existing applications. (maybe we should have a
>>>>> JIRA tag to mark out bugs that change interfaces).
>>>>>
>>>>>  
>>>>>
>>>>> As always - thanks for all the fish (err .. working code),
>>>>>
>>>>>  
>>>>>
>>>>> Joydeep
>>>>>
>>>>>  
>>>>>
>>>>>
>>


Re: changes to compression interfaces in 0.15?

Posted by Aaron Kimball <ak...@cs.washington.edu>.
I filed a JIRA for this issue:
https://issues.apache.org/jira/browse/HADOOP-2866

- Aaron


Re: changes to compression interfaces in 0.15?

Posted by Aaron Kimball <ak...@cs.washington.edu>.
+1

Ted Dunning wrote:
> Actually, it might just be good to have a warning spit out if you use ANY
> unknown key that starts with mapred.* or any of the other hadoop-specific
> parameters.
> 
> That way mis-spellings would be caught as well as deprecations.
> 
> If you want to set a value and not get a warning, just pick a different
> prefix.
> 

Re: changes to compression interfaces in 0.15?

Posted by Ted Dunning <td...@veoh.com>.
Actually, it might just be good to have a warning spit out if you use ANY
unknown key that starts with mapred.* or any of the other hadoop-specific
parameters.

That way mis-spellings would be caught as well as deprecations.

If you want to set a value and not get a warning, just pick a different
prefix.
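Something along these lines would do it - a standalone sketch, not actual Hadoop code; the registry of known keys here is a made-up stand-in for what would really be generated from hadoop-default.xml:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnknownKeyChecker {
    // Prefixes Hadoop reserves for its own configuration (illustrative list).
    private static final String[] RESERVED_PREFIXES =
        { "mapred.", "io.", "fs.", "dfs." };

    // Returns, sorted, every configured key that uses a reserved prefix
    // but is not in the registry of known keys.
    static List<String> unknownKeys(Set<String> knownKeys,
                                    Set<String> configuredKeys) {
        List<String> suspect = new ArrayList<String>();
        for (String key : configuredKeys) {
            for (String prefix : RESERVED_PREFIXES) {
                if (key.startsWith(prefix) && !knownKeys.contains(key)) {
                    suspect.add(key);  // misspelled, deprecated, or removed
                    break;
                }
            }
        }
        Collections.sort(suspect);
        return suspect;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<String>(Arrays.asList(
            "mapred.map.output.compression.type",
            "io.seqfile.compression.type"));
        Set<String> configured = new HashSet<String>(Arrays.asList(
            "mapred.output.compression.type",  // removed name: gets flagged
            "my.app.custom.setting",           // user prefix: left alone
            "io.seqfile.compression.type"));   // known key: left alone
        for (String key : unknownKeys(known, configured)) {
            System.err.println("WARNING: unrecognized configuration key: " + key);
        }
    }
}
```

Keys under a user's own prefix pass through untouched, so nothing stops people from stashing their own values in the conf.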

On 2/20/08 6:13 PM, "Jason Venner" <ja...@attributor.com> wrote:

> I agree. I am in the midst of combing through the config files for 0.16 to
> see what changes I have to retrofit into our jobs.
> Support in the tools to inform of the use of deprecated or outright
> removed keys would be wonderful.
> 


Re: changes to compression interfaces in 0.15?

Posted by Jason Venner <ja...@attributor.com>.
I agree. I am in the midst of combing through the config files for 0.16 to
see what changes I have to retrofit into our jobs.
Support in the tools to inform of the use of deprecated or outright
removed keys would be wonderful.

Aaron Kimball wrote:
> As a general follow-up suggestion : Is there a mechanism to output a 
> warning when the user sets deprecated JobConf keys? Given that you can 
> set any arbitrary key name and it will simply be ignored, this might 
> be a good idea.
>
> - Aaron
>
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

Re: changes to compression interfaces in 0.15?

Posted by Aaron Kimball <ak...@cs.washington.edu>.
As a general follow-up suggestion : Is there a mechanism to output a 
warning when the user sets deprecated JobConf keys? Given that you can 
set any arbitrary key name and it will simply be ignored, this might be 
a good idea.
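Roughly what I have in mind, as a standalone sketch (not Hadoop code; the deprecation table below is illustrative, seeded with the one rename reported earlier in this thread):

```java
import java.util.HashMap;
import java.util.Map;

class DeprecatedKeyWarner {
    // Hypothetical table of old key -> new key; a real one would be
    // maintained alongside hadoop-default.xml as keys get renamed.
    private static final Map<String, String> DEPRECATED =
        new HashMap<String, String>();
    static {
        DEPRECATED.put("mapred.output.compression.type",
                       "mapred.map.output.compression.type");
    }

    // Returns the key a set() should actually write, warning if the
    // caller used a deprecated name.
    static String resolve(String key) {
        String replacement = DEPRECATED.get(key);
        if (replacement == null) {
            return key;
        }
        System.err.println("WARNING: configuration key '" + key
            + "' is deprecated; use '" + replacement + "' instead");
        return replacement;
    }

    public static void main(String[] args) {
        // Warns on stderr, then forwards to the new key name.
        System.out.println(resolve("mapred.output.compression.type"));
        // prints "mapred.map.output.compression.type"
    }
}
```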

- Aaron

Joydeep Sen Sarma wrote:
> In addition:
> 
> -          "mapred.output.compression.type" is now replaced with
> "mapred.map.output.compression.type"
> 
> -          the old implementation of the Java interface
> setMapOutputCompressorClass() used to turn map output compression on
> automatically as a side effect; the 0.15 one doesn't. Looks like one has
> to call setCompressMapOutput() separately.
> 
>  
> 
> Aargh.
> 
>  
> 

RE: changes to compression interfaces in 0.15?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
In addition:

-          "mapred.output.compression.type" is now replaced with
"mapred.map.output.compression.type"

-          the old implementation of the Java interface
setMapOutputCompressorClass() used to turn map output compression on
automatically as a side effect; the 0.15 one doesn't. Looks like one has
to call setCompressMapOutput() separately.

 

Aargh.
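For anyone else migrating - a hadoop-site.xml fragment using the renamed key would look something like this (assuming I have the 0.15 name right):

```xml
<!-- 0.15 name; this used to be mapred.output.compression.type -->
<property>
  <name>mapred.map.output.compression.type</name>
  <value>BLOCK</value>
</property>
```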

 

________________________________

From: hive-devel-bounces@lists.facebook.com
[mailto:hive-devel-bounces@lists.facebook.com] On Behalf Of Joydeep Sen
Sarma
Sent: Wednesday, February 20, 2008 5:06 PM
To: core-user@hadoop.apache.org
Subject: changes to compression interfaces in 0.15?

 
