Posted to common-dev@hadoop.apache.org by "Chuan Liu (JIRA)" <ji...@apache.org> on 2012/07/05 20:50:34 UTC

[jira] [Created] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Chuan Liu created HADOOP-8564:
---------------------------------

             Summary: Create a Windows native InputStream class to address datanode concurrent reading and writing issue
                 Key: HADOOP-8564
                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 1-win
            Reporter: Chuan Liu
            Assignee: Chuan Liu


HDFS files are made up of blocks. First, let’s look at writing. When data is written to a datanode, an active or temporary file is created to receive packets. After the last packet for the block is received, the datanode finalizes the block. One step of finalization is to rename (move) the block file into a new directory. The relevant code can be found via the call sequence: FSDataSet.finalizeBlockInternal -> FSDir.addBlock.
{code}
        // Move the meta file and the block file out of tmp. File.renameTo
        // reports failure by returning false rather than throwing.
        if ( ! metaData.renameTo( newmeta ) ||
            ! src.renameTo( dest ) ) {
          throw new IOException( "could not move files for " + b +
                                 " from tmp to " +
                                 dest.getAbsolutePath() );
        }
{code}

Let’s then switch to reading. In HDFS, clients are expected to be able to read these unfinished blocks. So when a read call from a client reaches the datanode, the datanode opens an input stream on the unfinished block file.

The problem arises when the file is open for reading while the datanode receives the last packet from the client and tries to rename the finished block file. This operation succeeds on Linux, but not on Windows. The behavior can be changed on Windows by opening the file with the FILE_SHARE_DELETE flag, i.e. sharing the delete (including rename) permission with other processes while the file is open. A Java bug ([id 6357433|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6357433]) was reported on this a while back. However, since this behavior has existed in Java on Windows since JDK 1.0, the Java developers do not want to break backward compatibility. Instead, a new file system API is proposed in JDK 7.
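
The scenario above can be reproduced with a minimal Java sketch, assuming JDK 7 or later. The class, method, and file names are hypothetical and are not taken from the Hadoop code base. The sketch opens the stream through java.nio.file rather than java.io.FileInputStream; the assumption, which should be verified on the target JDK, is that on Windows the java.nio.file provider opens files with FILE_SHARE_DELETE by default, so the rename can succeed while the reader is still active. On Linux the rename succeeds with either API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Hadoop.
final class RenameWhileReading {

    /**
     * Opens src for reading, renames it to dest while the stream is still
     * open, then finishes reading through the already-open stream.
     * Returns everything that was read, decoded as UTF-8.
     */
    public static String readAcrossRename(Path src, Path dest) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = Files.newInputStream(src)) {
            // The "client read": the reader now holds an open handle on src.
            int first = in.read();
            if (first != -1) {
                out.write(first);
            }
            // The "finalize" step: rename the file while the reader is active.
            Files.move(src, dest);
            // Keep reading through the handle that was opened before the rename.
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blk-demo");
        Path src = dir.resolve("blk_tmp");
        Path dest = dir.resolve("blk_final");
        Files.write(src, "packet data".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAcrossRename(src, dest));
        System.out.println("renamed: " + Files.exists(dest));
    }
}
```

Replacing Files.newInputStream with new FileInputStream(src.toFile()) in this sketch is one way to check whether a given JDK on Windows still rejects the rename.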


As outlined in the [Java forum|http://www.java.net/node/645421] by the Java developer (kbr), there are three ways to fix the problem:
# Use a different mechanism in the application for dealing with files.
# Create a new implementation of the InputStream abstract class using Windows native code.
# Patch the JDK with a private patch that alters FileInputStream behavior.

The third option cannot fix the problem for users running the Oracle JDK.

We discussed some options for the first approach. For example, one option is two-phase renaming: first create a hard link under the new name, then remove the old hard link once the read has finished. This option was thought to require rather pervasive changes. Another option discussed is to change the HDFS behavior on Windows so that clients cannot read unfinished blocks. However, this behavior change is thought to be problematic and may affect other applications built on top of HDFS.
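
For reference, the two-phase renaming idea can be sketched in Java as follows, assuming JDK 7 or later and a file system that supports hard links. The class and method names are hypothetical; this is an illustration of the idea only, not code from any patch attached to this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
final class TwoPhaseRename {

    /**
     * Phase 1: make the block visible under its final name without touching
     * the old name, so readers holding the old name open are not disturbed.
     */
    public static void linkIntoPlace(Path tmp, Path dest) throws IOException {
        // dest becomes a second directory entry for the same file contents.
        Files.createLink(dest, tmp);
    }

    /**
     * Phase 2: remove the old name. The caller is assumed to invoke this
     * only after the last reader of the old name has closed its stream.
     */
    public static void removeOldName(Path tmp) throws IOException {
        Files.delete(tmp);
    }
}
```

The sketch leaves out the bookkeeping needed to know when the last reader has finished, which is presumably where most of the additional complexity of this option would lie.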

For all the reasons discussed above, we will use the second approach to address the problem.

If there are better options to fix the problem, we would also like to hear about them.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Reopened] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reopened HADOOP-8564:
-------------------------------------

    
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win-newfiles.patch, HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch

[jira] [Reopened] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reopened HADOOP-8564:
-------------------------------------


Reopened because I had reverted the patch earlier due to build issue.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

    Resolution: Fixed

I committed the patch.

Thank you Chuan.
                

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1-win
     Hadoop Flags: Reviewed

+1 for the patch. I committed it to branch-1-win. Thank you Chuan.

Todd, please do post your comments. It will be addressed by a separate jira.
                

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

    Resolution: Fixed

I committed additional files missed in previous commit to branch-1-win.
                