Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/05/12 17:46:09 UTC

[jira] Created: (HADOOP-212) allow changes to dfs block size

allow changes to dfs block size
-------------------------------

         Key: HADOOP-212
         URL: http://issues.apache.org/jira/browse/HADOOP-212
     Project: Hadoop
        Type: Improvement

  Components: dfs  
    Versions: 0.2    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 
    Priority: Critical
     Fix For: 0.3


Trying to change the DFS block size led to the realization that the value 32,000,000 was hard coded into the source code. I propose:
  1. Change the default block size to 64 * 1024 * 1024.
  2. Add a config variable, dfs.block.size, that sets the default block size.
  3. Add a parameter to the FileSystem, DFSClient, and ClientProtocol create methods that lets the user control the block size.
  4. Rename FileSystem.getBlockSize to getDefaultBlockSize.
  5. Add a new FileSystem.getBlockSize method that takes a pathname.
  6. Use long for the block size in the API, as before. Note, however, that the implementation will not work if the block size is set larger than 2**31.
  7. Have InputFormatBase use the block size of each file to determine the split size.

Thoughts?
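
For illustration, here is a minimal sketch of the API surface this proposal implies. All names and signatures below are assumptions made for the example, not the interface that was actually committed:

    import java.io.IOException;
    import java.io.OutputStream;

    // Illustrative sketch only: signatures are guesses drawn from the proposal
    // above, not the committed code.
    interface ProposedBlockSizeApi {

        /** Cluster-wide default, taken from the proposed dfs.block.size
         *  variable (64 * 1024 * 1024 if unset). */
        long getDefaultBlockSize();

        /** Per-file block size, fixed when the file was created (item 5). */
        long getBlockSize(String pathname) throws IOException;

        /** Create a file with a caller-chosen block size (item 3). */
        OutputStream create(String pathname, boolean overwrite, long blockSize)
            throws IOException;
    }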

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Closed: (HADOOP-212) allow changes to dfs block size

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-212?page=all ]
     
Doug Cutting closed HADOOP-212:
-------------------------------




[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12402418 ] 

Owen O'Malley commented on HADOOP-212:
--------------------------------------

Ok, I found the problem. I'll create a new patch.



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "alan wootton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12383319 ] 

alan wootton commented on HADOOP-212:
-------------------------------------

Ok, I get it now. Even though it's currently impossible for any block, except the last block of a file, to be anything other than 32 MB, it looks like the system would support it.

We need to remove all references to BLOCK_SIZE.

I see some problems. FSDataset doesn't know which file it's working with, so it always uses BLOCK_SIZE. DFSClient.DFSOutputStream.write() has the same problem.

I'll vote yes.







[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "alan wootton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12383219 ] 

alan wootton commented on HADOOP-212:
-------------------------------------

Are there two issues here?

I can see a need to change the default block size for a DFS. In my case, I'd like to write unit tests with small block sizes to check the dfs code for bugs.

I don't see the need for files to have their own block sizes. Doesn't that introduce another 'moving part' to the dfs, and even more possibilities for bugs?
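
As a purely hypothetical illustration of the unit-test use case, assuming the proposed dfs.block.size variable is in place (the class name and the 4 KB value are made up for the example):

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical test setup: shrink blocks to a few KB so that multi-block
    // code paths can be exercised with tiny files.
    public class SmallBlockTestConf {
        static Configuration smallBlockConf() {
            Configuration conf = new Configuration();
            conf.setInt("dfs.block.size", 4096);  // 4 KB instead of the 64 MB default
            return conf;
        }
    }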



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12402416 ] 

Doug Cutting commented on HADOOP-212:
-------------------------------------

Milind, I don't think these 'failed to create directory' messages are the problem.  That unit test succeeds without this patch and fails with it, and in either case it prints these messages.  I think the messages appear because the directories already exist, so new attempts to create them fail, but I have not yet looked closely at that.



[jira] Resolved: (HADOOP-212) allow changes to dfs block size

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-212?page=all ]
     
Doug Cutting resolved HADOOP-212:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Owen!



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12383297 ] 

Owen O'Malley commented on HADOOP-212:
--------------------------------------

Sure, look at DFSInputStream.blockSeekTo. You can't generate files that look like that, but the infrastructure supports them.

I'm only trying to expose setting the block size for a given file when it is being created. I don't want to expose changing it _within_ a file.



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "alan wootton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12383285 ] 

alan wootton commented on HADOOP-212:
-------------------------------------

Variable block sizes within a file? I don't see that at all. I can see some checks being made to ensure that the size being sent matches what a datanode thinks is the correct size, but as far as I can tell every Block has a maximum size of 32 MB everywhere.

Isn't adding a 'blocksize' parameter to the create method of DFSClient exposing variable block sizes to the client?

Are we talking about the same thing?



[jira] Updated: (HADOOP-212) allow changes to dfs block size

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-212?page=all ]

Owen O'Malley updated HADOOP-212:
---------------------------------

    Attachment: dfs-blocksize-2.patch

This adds an additional check for null on a file's block list.



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12402410 ] 

Milind Bhandarkar commented on HADOOP-212:
------------------------------------------

Doug, the problem occurs much earlier, before the NPE:

060515 123400 DIR* FSDirectory.mkdirs: failed to create directory /srcdat

It looks like your DFS config has a root directory that is not writable for you, because the test is trying to create dfs://localhost:65314/srcdat. I am using org.apache.hadoop.dfs.MiniDFSCluster for testing copying across dfs.




[jira] Updated: (HADOOP-212) allow changes to dfs block size

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-212?page=all ]

Doug Cutting updated HADOOP-212:
--------------------------------

    Attachment: TEST-org.apache.hadoop.fs.TestCopyFiles.txt

Overall, this looks great and is much needed.  Unfortunately I'm getting some null pointer exceptions running unit tests with this patch.  I've not yet tried to debug these...



[jira] Commented: (HADOOP-212) allow changes to dfs block size

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-212?page=comments#action_12383223 ] 

Owen O'Malley commented on HADOOP-212:
--------------------------------------

As you point out, it is possible to just make the block size a configuration variable and use it everywhere. The problem is that you then become very sensitive to differences in the configuration between nodes. It seemed less error prone to leave the client in charge of the block size and consistently use its setting.

Under the hood, dfs currently supports variable block sizes within a file, but I certainly do _not_ want to expose that in the user-visible APIs.
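
As a sketch of this "client in charge" idea (the helper below is hypothetical, and it assumes Configuration.getLong behaves as in later Hadoop versions): the writing client resolves the block size once from its own configuration and passes that value explicitly on create, so the namenode and datanodes never have to agree on a per-node setting.

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical helper, not part of the patch: the client picks the block
    // size from its own config, and the chosen value travels with the create call.
    class ClientBlockSizeChoice {
        static long chooseBlockSize(Configuration clientConf) {
            return clientConf.getLong("dfs.block.size", 64L * 1024 * 1024);
        }
    }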



[jira] Updated: (HADOOP-212) allow changes to dfs block size

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-212?page=all ]

Owen O'Malley updated HADOOP-212:
---------------------------------

    Attachment: dfs-blocksize.patch

Ok, here is the patch.

Changes the dfs block size from a compile-time constant to a parameter that is set when a file is created.

1. FileSystem.getBlockSize becomes getDefaultBlockSize
2. A new method, FileSystem.getBlockSize(path), finds the block size of a file.
3. Block size is added to FileSystem.create
4. InputFormatBase uses the block size of each file rather than the global constant (a rough sketch of this follows below).
5. I followed the convention of using DfsPath to cache meta-information values associated with the dfs file.
6. FileUnderConstruction records the block size
7. Removed the check that the block size was smaller than the global value.
8. Add a new value
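
A rough sketch of the split-sizing idea in item 4 (the class and method names are hypothetical; the real InputFormatBase logic differs in detail):

    import java.io.IOException;

    // Hypothetical helper: splits are sized from each file's own block size
    // rather than a single global constant, so files created with different
    // block sizes get appropriately sized map inputs.
    class PerFileSplitSizing {
        /** Number of splits for one file, given its length and per-file block size. */
        static long numSplits(long fileLength, long perFileBlockSize) throws IOException {
            if (perFileBlockSize <= 0) {
                throw new IOException("invalid block size: " + perFileBlockSize);
            }
            return (fileLength + perFileBlockSize - 1) / perFileBlockSize;  // ceiling division
        }
    }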
