Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/02/21 23:30:49 UTC

[jira] Created: (HADOOP-51) per-file replication counts

per-file replication counts
---------------------------

         Key: HADOOP-51
         URL: http://issues.apache.org/jira/browse/HADOOP-51
     Project: Hadoop
        Type: New Feature
  Components: dfs  
    Versions: 0.1    
    Reporter: Doug Cutting
     Fix For: 0.1


It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Resolved: (HADOOP-51) per-file replication counts

Posted by "Bryan A. Pendleton" <bp...@geekdom.net>.
I agree: Good work Konstantin!

I'll file new feature requests for the sticky bits from the discussion.

On 4/10/06, Doug Cutting (JIRA) <ji...@apache.org> wrote:
>
>      [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]
>
> Doug Cutting resolved HADOOP-51:
> --------------------------------
>
>     Resolution: Fixed
>
> I just committed this.  I fixed the comment on dfs.replication.min.  I
> also added a message to CHANGES.txt.
>
> Thanks, Konstantin!
>
> Bryan: I think the issues you raise bear further discussion that I do not
> wish to stifle.  For example, we may someday want to be able to specify
> more than 2^16 replications, and we may wish to handle replication requests
> outside of the configured limits differently.  But, for now, I think the
> patch fixes this bug and that those issues can be addressed through
> subsequent bugs as we gain experience.  So please file new bugs for any
> related issues that are important to you.
>
> > per-file replication counts
> > ---------------------------
> >
> >          Key: HADOOP-51
> >          URL: http://issues.apache.org/jira/browse/HADOOP-51
> >      Project: Hadoop
> >         Type: New Feature
>
> >   Components: dfs
> >     Versions: 0.2
> >     Reporter: Doug Cutting
> >     Assignee: Konstantin Shvachko
> >      Fix For: 0.2
> >  Attachments: Replication.patch
> >
> > It should be possible to specify different replication counts for
> different files.  Perhaps an option when creating a new file should be the
> desired replication count.  MapReduce should take advantage of this feature
> so that job.xml and job.jar files, which are frequently accessed by lots
> of machines, are more highly replicated than large data files.
>


--
Bryan A. Pendleton
Ph: (877) geek-1-bp

Re: [jira] Commented: (HADOOP-51) per-file replication counts

Posted by "Bryan A. Pendleton" <bp...@geekdom.net>.
I realize that the value will come from hadoop-default.xml when it's not
defined elsewhere. I was just suggesting that an explicit default makes the
code less clear in cases where the value *would* be provided from a
-default file, or the equivalent.

Having spent enough time reading others' code over the years, I know that
figuring out *why* a value gets set by walking the code is a tricky
operation, especially with a fancy configuration environment (yes,
hadoop's configuration is definitely in the "fancy" realm). My comment was
merely that we might want a better way of being explicit when things are
really "system defaults". The idiom makes plenty of sense for user-created
or rarely-set values which have no specific defaults. Just my $0.02,
though; perhaps no one else agrees.

On 4/9/06, Doug Cutting (JIRA) <ji...@apache.org> wrote:
>
>     [
> http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373803]
>
> Doug Cutting commented on HADOOP-51:
> ------------------------------------
>
> > the idiom of conf.getType("config.value",defaultValue) is good for
> user-defined values, but shouldn't the default be skipped for things that
> are defined in hadoop-default.xml, in general?
>
> The value from hadoop-default.xml is used in preference to the
> defaultValue parameter.  The parameter is only used as a last resort when no
> value is found in hadoop-default.xml or any other config file.
>
> > per-file replication counts
> > ---------------------------
> >
> >          Key: HADOOP-51
> >          URL: http://issues.apache.org/jira/browse/HADOOP-51
> >      Project: Hadoop
> >         Type: New Feature
>
> >   Components: dfs
> >     Versions: 0.2
> >     Reporter: Doug Cutting
> >     Assignee: Konstantin Shvachko
> >      Fix For: 0.2
> >  Attachments: Replication.patch
> >
> > It should be possible to specify different replication counts for
> different files.  Perhaps an option when creating a new file should be the
> desired replication count.  MapReduce should take advantage of this feature
> so that job.xml and job.jar files, which are frequently accessed by lots
> of machines, are more highly replicated than large data files.
>


--
Bryan A. Pendleton
Ph: (877) geek-1-bp

[jira] Commented: (HADOOP-51) per-file replication counts

Posted by "paul sutter (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373565 ] 

paul sutter commented on HADOOP-51:
-----------------------------------

+1 to that.

It might be easier to use on a per-directory basis, for example:

- a /tmp directory with replication count 2 (or 1!), a good place for the output of intermediate reduce steps
- a /cached directory with infinite replication count, a good place for lookup files used in mappers or reducers


> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature

>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Commented: (HADOOP-51) per-file replication counts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373803 ] 

Doug Cutting commented on HADOOP-51:
------------------------------------

> the idiom of conf.getType("config.value",defaultValue) is good for user-defined values, but shouldn't the default be skipped for things that are defined in hadoop-default.xml, in general?

The value from hadoop-default.xml is used in preference to the defaultValue parameter.  The parameter is only used as a last resort when no value is found in hadoop-default.xml or any other config file.
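
To make the lookup order concrete, here is a minimal sketch (the property
name and fallback value are purely illustrative, and this assumes the
Configuration class of the era):

    import org.apache.hadoop.conf.Configuration;

    public class DefaultLookup {
        public static void main(String[] args) {
            // Loads hadoop-default.xml first, then hadoop-site.xml on top.
            Configuration conf = new Configuration();
            // If any config file defines "dfs.replication", that value is
            // returned; the literal 1 below is used only when no file
            // defines the property at all.
            int replication = conf.getInt("dfs.replication", 1);
            System.out.println("effective replication: " + replication);
        }
    }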

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature

>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



Re: Bug fixes not promoted since Friday.

Posted by Doug Cutting <cu...@apache.org>.
Konstantin Shvachko wrote:
> Could you please take a look at three JIRA issues that have been sitting
> there for quite a while now:
> 
> http://issues.apache.org/jira/browse/HADOOP-42
> http://issues.apache.org/jira/browse/HADOOP-40
> http://issues.apache.org/jira/browse/HADOOP-33

I have now committed one and added comments to the other two.

Cheers,

Doug

Bug fixes not promoted since Friday.

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Doug,

Could you please take a look at three JIRA issues that have been sitting
there for quite a while now:

http://issues.apache.org/jira/browse/HADOOP-42
http://issues.apache.org/jira/browse/HADOOP-40
http://issues.apache.org/jira/browse/HADOOP-33

The first two are actual bugs that produce incorrect results as I write
this. The last one is an improvement which would not hurt any existing
functionality. It would be good to resolve them one way or another.

Thanks,

--Konst


Sameer Paranjpye (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]
>
>Sameer Paranjpye reassigned HADOOP-51:
>--------------------------------------
>
>    Assign To: Konstantin Shvachko  (was: Sameer Paranjpye)
>
>
>> per-file replication counts
>> ---------------------------
>>
>>          Key: HADOOP-51
>>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>>      Project: Hadoop
>>         Type: New Feature
>>   Components: dfs
>>     Versions: 0.1
>>     Reporter: Doug Cutting
>>     Assignee: Konstantin Shvachko
>>      Fix For: 0.1
>>
>> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.


[jira] Assigned: (HADOOP-51) per-file replication counts

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]

Sameer Paranjpye reassigned HADOOP-51:
--------------------------------------

    Assign To: Konstantin Shvachko  (was: Sameer Paranjpye)

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.1

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Updated: (HADOOP-51) per-file replication counts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]

Doug Cutting updated HADOOP-51:
-------------------------------

    Assign To: Sameer Paranjpye

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Sameer Paranjpye
>      Fix For: 0.1

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Updated: (HADOOP-51) per-file replication counts

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]

Sameer Paranjpye updated HADOOP-51:
-----------------------------------

    Fix Version: 0.2
                     (was: 0.1)
        Version: 0.2
                     (was: 0.1)

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Commented: (HADOOP-51) per-file replication counts

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12367456 ] 

Konstantin Shvachko commented on HADOOP-51:
-------------------------------------------

I propose to add two methods.
1) An additional create() method that takes an extra parameter specifying the
file's replication. This should be a new method so that existing programs
keep working.
2) A new setReplication() method for changing the replication of existing
files. This will also require an additional field in the INode class to store
the replication of each file on the namenode.
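
Roughly, the proposed additions might look like this (a sketch only; the
interface name and signatures are hypothetical, not the committed API):

    import java.io.IOException;
    import java.io.OutputStream;

    public interface ReplicationAwareFileSystem {
        /** New create() overload taking the desired replication for the file. */
        OutputStream create(String path, boolean overwrite, short replication)
            throws IOException;

        /** Change the replication of an existing file; the namenode records
         *  the new value in the file's INode. */
        boolean setReplication(String path, short replication)
            throws IOException;
    }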




> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.1

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Commented: (HADOOP-51) per-file replication counts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12367560 ] 

Doug Cutting commented on HADOOP-51:
------------------------------------

This sounds like a good plan to me.  +1

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.1

>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



Re: [jira] Commented: (HADOOP-51) per-file replication counts

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
"Full" replication is a good idea, but I suggest we file it as a new  
bug/enhancement.

Actually placing a copy of a file on every node is probably rarely  
the right thing to do for "full" replication.  One copy per switch  
would be my preferred default on our clusters (gigabit switches) and  
for .JAR files squareroot(numNodes) is probably the right answer.
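
As a back-of-the-envelope version of that heuristic (the class and method
are hypothetical, just to show the arithmetic):

    public class ReplicationHeuristic {
        /** Suggested replication for widely-read files such as job.jar:
         *  roughly sqrt(cluster size), but never below the default. */
        static short suggestedJarReplication(int numNodes, short defaultReplication) {
            short r = (short) Math.ceil(Math.sqrt(numNodes));
            return (short) Math.max(r, defaultReplication);
        }

        public static void main(String[] args) {
            // On a 900-node cluster this yields 30 replicas: far fewer than
            // one per node, but enough to spread the read load.
            System.out.println(suggestedJarReplication(900, (short) 3));
        }
    }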

e14

On Apr 8, 2006, at 12:16 PM, Bryan Pendleton (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-51? 
> page=comments#action_12373745 ]
>
> Bryan Pendleton commented on HADOOP-51:
> ---------------------------------------
>
> Great!
>
> A few comments from reading the patch (haven't tested with it yet):
> 1) The <description> for dfs.replication.min is wrong
> 2) This is a wider concern, but on coding style - the idiom of  
> conf.getType("config.value",defaultValue) is good for user-defined  
> values, but shouldn't the default be skipped for things that are  
> defined in hadoop-default.xml, in general? It takes away the value  
> of hadoop-default, and it also means changing that value might or  
> might not always have the desired system-wide results.
> 3) Wouldn't it be better to log at a severe level replications that  
> are set below minReplication, or greater than maxReplication, and  
> just set the replication to the nearest bound? Since replication is  
> set per-file by the application, but min and max are probably set  
> by the administrator of the hadoop cluster. Throwing an IOException  
> causes failure where degraded performance would be preferable.
> 4) I may be dense, but I didn't see any way to specify that  
> replication be "full", ie, a copy per datanode. I got the feeling  
> this was something that was desired of this functionality (ie, for  
> job.jar files, job configs, and lookup data used widely in a job)  
> Using a short means, if we ever scale to > 32k nodes, there'd be no  
> way to manually specify this. Just using Short.MAX_VALUE means  
> getting a lot of errors about not being able to replicate as fully  
> as desired.
>
> Otherwise, this looks like a wonderful patch!
>
>> per-file replication counts
>> ---------------------------
>>
>>          Key: HADOOP-51
>>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>>      Project: Hadoop
>>         Type: New Feature
>
>>   Components: dfs
>>     Versions: 0.2
>>     Reporter: Doug Cutting
>>     Assignee: Konstantin Shvachko
>>      Fix For: 0.2
>>  Attachments: Replication.patch
>>
>> It should be possible to specify different replication counts for  
>> different files.  Perhaps an option when creating a new file  
>> should be the desired replication count.  MapReduce should take  
>> advantage of this feature so that job.xml and job.jar files, which  
>> are frequently accessed by lots of machines, are more highly  
>> replicated than large data files.
>


[jira] Commented: (HADOOP-51) per-file replication counts

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373745 ] 

Bryan Pendleton commented on HADOOP-51:
---------------------------------------

Great!

A few comments from reading the patch (haven't tested with it yet):
1) The <description> for dfs.replication.min is wrong.
2) This is a wider concern about coding style: the idiom conf.getType("config.value", defaultValue) is good for user-defined values, but shouldn't the default be skipped for things that are defined in hadoop-default.xml, in general? It takes away the value of hadoop-default, and it also means changing that value might or might not have the desired system-wide results.
3) Wouldn't it be better to log replications that are set below minReplication, or above maxReplication, at a severe level, and just set the replication to the nearest bound? Replication is set per-file by the application, but min and max are probably set by the administrator of the hadoop cluster. Throwing an IOException causes failure where degraded performance would be preferable. (A sketch of this alternative follows below.)
4) I may be dense, but I didn't see any way to specify that replication be "full", i.e., a copy per datanode. I got the feeling this was something that was desired of this functionality (e.g., for job.jar files, job configs, and lookup data used widely in a job). Using a short means that if we ever scale to more than 32k nodes, there'd be no way to manually specify this, and just using Short.MAX_VALUE means getting a lot of errors about not being able to replicate as fully as desired.

Otherwise, this looks like a wonderful patch!
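
For point 3, here is a sketch of the clamp-and-log alternative (a
hypothetical helper, using java.util.logging purely for illustration):

    import java.util.logging.Logger;

    public class ReplicationClamp {
        /** Clamp a requested replication into the administrator's bounds,
         *  logging loudly instead of throwing an IOException. */
        static short clampReplication(short requested, short min, short max,
                                      Logger log) {
            if (requested < min) {
                log.severe("replication " + requested + " is below "
                           + "dfs.replication.min=" + min + "; using " + min);
                return min;
            }
            if (requested > max) {
                log.severe("replication " + requested + " is above "
                           + "dfs.replication.max=" + max + "; using " + max);
                return max;
            }
            return requested;
        }
    }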

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature

>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Updated: (HADOOP-51) per-file replication counts

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]

Konstantin Shvachko updated HADOOP-51:
--------------------------------------

    Attachment: Replication.patch

Here is a rather big patch. The changes are:

- The create methods include a new parameter, "short replication".
- If replication is not specified, the default replication is used.
- The namenode stores and maintains the replication of each file separately.
- File replication can be obtained from the namenode as a part of DFSFileInfo.
- Two new namenode config parameters,
   dfs.replication.max
   dfs.replication.min
are checked when a new file is created.
- The namenode image and edit log file formats are modified. Both now contain
a version number at the beginning. The versions are negative; I started from
version -1. When the namenode starts, your current dfs image will be loaded
and converted into the new format. All old files will get the same default
replication, equal to the value of dfs.replication in your config.
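
Condensed, the new creation-time check might read something like this (a
sketch with hypothetical names, not the patch verbatim):

    import java.io.IOException;

    public class ReplicationCheck {
        private final short minReplication;   // from dfs.replication.min
        private final short maxReplication;   // from dfs.replication.max

        public ReplicationCheck(short min, short max) {
            this.minReplication = min;
            this.maxReplication = max;
        }

        /** Called when a new file is created; rejects out-of-range values. */
        void verifyReplication(short replication) throws IOException {
            if (replication < minReplication || replication > maxReplication) {
                throw new IOException("file replication " + replication
                    + " is outside the configured range [" + minReplication
                    + ", " + maxReplication + "]");
            }
        }
    }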



> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature

>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.



[jira] Resolved: (HADOOP-51) per-file replication counts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-51?page=all ]
     
Doug Cutting resolved HADOOP-51:
--------------------------------

    Resolution: Fixed

I just committed this.  I fixed the comment on dfs.replication.min.  I also added a message to CHANGES.txt.

Thanks, Konstantin!

Bryan: I think the issues you raise bear further discussion that I do not wish to stifle.  For example, we may someday want to be able to specify more than 2^16 replications, and we may wish to handle replication requests outside of the configured limits differently.  But, for now, I think the patch fixes this bug and that those issues can be addressed through subsequent bugs as we gain experience.  So please file new bugs for any related issues that are important to you.

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature

>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different files.  Perhaps an option when creating a new file should be the desired replication count.  MapReduce should take advantage of this feature so that job.xml and job.jar files, which are frequently accessed by lots of machines, are more highly replicated than large data files.
