Posted to common-issues@hadoop.apache.org by "Chuan Liu (JIRA)" <ji...@apache.org> on 2012/07/05 20:50:34 UTC

[jira] [Created] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Chuan Liu created HADOOP-8564:
---------------------------------

             Summary: Create a Windows native InputStream class to address datanode concurrent reading and writing issue
                 Key: HADOOP-8564
                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 1-win
            Reporter: Chuan Liu
            Assignee: Chuan Liu


HDFS files are made up of blocks. First, let's look at writing. When data is written to a datanode, an active or temporary file is created to receive packets. After the last packet for the block is received, we finalize the block. One step during finalization is to rename the block file into a new directory. The relevant code can be found via the call sequence: FSDataSet.finalizeBlockInternal -> FSDir.addBlock.
{code} 
        if ( ! metaData.renameTo( newmeta ) ||
            ! src.renameTo( dest ) ) {
          throw new IOException( "could not move files for " + b +
                                 " from tmp to " + 
                                 dest.getAbsolutePath() );
        }
{code}

Let's now switch to reading. On HDFS, clients are expected to be able to read these unfinished blocks, so when a client's read call reaches the datanode, the datanode opens an input stream on the unfinished block file.

The problem arises when the file is open for reading while the datanode receives the last packet and tries to rename the finished block file. This rename succeeds on Linux, but not on Windows. The behavior can be changed on Windows by opening the file with the FILE_SHARE_DELETE flag, i.e. sharing the delete (including rename) permission with other processes while the file is open. There is also a long-standing Java bug ([id 6357433|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6357433]) on this. However, since Java on Windows has behaved this way since JDK 1.0, the Java developers do not want to break backward compatibility. Instead, a new file system API was introduced in JDK 7.
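The JDK 7 file system API mentioned above can be sketched in a few lines. The class and file names below are invented for the demo; the key point is that java.nio.file's Files.newInputStream opens the file with FILE_SHARE_DELETE on Windows, so a concurrent rename can succeed where a plain FileInputStream would block it:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ShareDeleteDemo {
    // Returns the first byte read from a stream that stays open across
    // a rename of the underlying file.
    public static int readFirstByteAfterRename() throws IOException {
        Path src = Files.createTempFile("blk_", ".tmp");
        Files.write(src, new byte[] {1, 2, 3});
        Path dest = src.resolveSibling(src.getFileName() + ".finalized");

        // Unlike new FileInputStream(...), Files.newInputStream opens the
        // file with FILE_SHARE_DELETE on Windows, so the rename below can
        // proceed while the stream is still open.
        try (InputStream in = Files.newInputStream(src)) {
            Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);
            return in.read(); // the open handle still reads the renamed file
        } finally {
            Files.deleteIfExists(dest);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirstByteAfterRename());
    }
}
```

On branch-1 this API is not available, since the branch still targets JDK 6; the sketch only shows what the JDK itself eventually offered.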


As outlined in the [Java forum|http://www.java.net/node/645421] by the Java developer (kbr), there are three ways to fix the problem:
# Use a different mechanism in the application for dealing with files.
# Create a new implementation of the InputStream abstract class using Windows native code.
# Patch the JDK with a private patch that alters FileInputStream behavior.
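Option 2 amounts to a FileInputStream-like class whose open call is implemented in native code that passes FILE_SHARE_DELETE to CreateFile. As a rough, non-runnable sketch (all names here are invented; the native side would live in a separate DLL):

```java
// Non-runnable sketch: an InputStream backed by a native Windows file
// handle opened with a share mode that permits concurrent rename/delete.
public class WindowsFileInputStream extends InputStream {
    static { System.loadLibrary("WindowsFileInputStream"); }

    private long handle; // native HANDLE returned by CreateFile

    public WindowsFileInputStream(String path) {
        // Native side: CreateFile(path, GENERIC_READ,
        //   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, ...)
        handle = open0(path);
    }

    @Override
    public int read() throws IOException { return read0(handle); }

    @Override
    public void close() throws IOException { close0(handle); }

    private static native long open0(String path);
    private static native int read0(long handle);
    private static native void close0(long handle);
}
```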

The third option cannot fix the problem for users running a stock Oracle JDK.

We discussed some options for the first approach. One option is two-phase renaming: first create a hard link at the new location, then remove the old link once reading is finished. This option was considered rather invasive. Another option is to change HDFS behavior on Windows to disallow clients from reading unfinished blocks. However, this behavior change is thought to be problematic and may affect other applications built on top of HDFS.
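The two-phase renaming idea can be sketched with JDK 7's Files.createLink. This is a hypothetical illustration, not HDFS code; class and method names are invented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoPhaseRename {
    // Phase 1: create a hard link at the destination, so the block is
    // visible under its final name while readers keep using the old one.
    // Phase 2 (once readers are done): remove the temporary name.
    // Both names refer to the same underlying file data.
    public static void finalizeBlock(Path tmp, Path dest) throws IOException {
        Files.createLink(dest, tmp); // phase 1: dest now exists
        Files.delete(tmp);           // phase 2: drop the tmp name
    }

    // Small end-to-end demo of the two phases.
    public static int demo() throws IOException {
        Path tmp = Files.createTempFile("blk_", ".tmp");
        Files.write(tmp, new byte[] {42});
        Path dest = tmp.resolveSibling(tmp.getFileName() + ".fin");
        finalizeBlock(tmp, dest);
        int first = Files.readAllBytes(dest)[0];
        Files.delete(dest);
        return first;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Note the invasiveness the text mentions: phase 2 still deletes a name that a Windows reader may have open, so the datanode would have to track outstanding readers before dropping the tmp link.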

For all the reasons discussed above, we will use the second approach to address the problem.

If there are better options to fix the problem, we would also like to hear about them.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

    Resolution: Fixed

I committed the patch.

Thank you Chuan.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478658#comment-13478658 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

Suresh, sorry for the break.

This indeed depends on another JIRA, HADOOP-8763.

The FileUtil.setOwner() method was added in that JIRA.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408283#comment-13408283 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

{quote}
Can this be merged into the existing NativeIO JNI library? Or are the number of #ifdef WINDOWS macros required so numerous that we should just have two entirely separate libhadoops?
{quote}
The NativeIO JNI library is only available on Linux, while this class is only needed on Windows. I think it makes sense to create a separate native lib file. We don't necessarily need to name it libhadoop. For example, if the class is called 'WindowsFileInputStream', the new lib could be 'WindowsFileInputStream.dll'. Is there any concern about this? E.g., do you want to reduce the number of native library files exposed in Hadoop in general?
                
> Create a Windows native InputStream class to address datanode concurrent reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


       

[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409818#comment-13409818 ] 

Todd Lipcon commented on HADOOP-8564:
-------------------------------------

Yes, that makes sense. I was thinking that it makes sense to tackle #1 first -- even if it means that many of the native pieces are disabled for now on Windows. That way we only have to fix the build in one place, rather than adding a new build component and later merging the two builds.
                
> Create a Windows native InputStream class to address datanode concurrent reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


       

[jira] [Reopened] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reopened HADOOP-8564:
-------------------------------------

    
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win-newfiles.patch, HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Updated] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated HADOOP-8564:
------------------------------

    Attachment: HADOOP-8564-branch-1-win-newfiles.patch

I forgot to include two new Windows build files. Attaching a new patch with the two missing files.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win-newfiles.patch, HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478340#comment-13478340 ] 

Suresh Srinivas commented on HADOOP-8564:
-----------------------------------------

I will commit this patch, since there are other patches that depend on it.

Once you post your review comments, they can be addressed in another Jira.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>         Attachments: HADOOP-8564-branch-1-win.patch
>
>
> HDFS files are made up of blocks. First, let’s look at writing. When the data is written to datanode, an active or temporary file is created to receive packets. After the last packet for the block is received, we will finalize the block. One step during finalization is to rename the block file to a new directory. The relevant code can be found via the call sequence: FSDataSet.finalizeBlockInternal -> FSDir.addBlock.
> {code} 
>         if ( ! metaData.renameTo( newmeta ) ||
>             ! src.renameTo( dest ) ) {
>           throw new IOException( "could not move files for " + b +
>                                  " from tmp to " + 
>                                  dest.getAbsolutePath() );
>         }
> {code}
> Let’s then switch to reading. On HDFS, it is expected the client can also read these unfinished blocks. So when the read calls from client reach datanode, the datanode will open an input stream on the unfinished block file.
> The problem comes in when the file is opened for reading while the datanode receives last packet from client and try to rename the finished block file. This operation will succeed on Linux, but not on Windows .  The behavior can be modified on Windows to open the file with FILE_SHARE_DELETE flag on, i.e. sharing the delete (including renaming) permission with other processes while opening the file. There is also a Java bug ([id 6357433|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6357433]) reported a while back on this. However, since this behavior exists for Java on Windows since JDK 1.0, the Java developers do not want to break the backward compatibility on this behavior. Instead, a new file system API is proposed in JDK 7.
> As outlined in the [Java forum|http://www.java.net/node/645421] by the Java developer (kbr), there are three ways to fix the problem:
> # Use a different mechanism in the application for dealing with files.
> # Create a new implementation of InputStream abstract class using Windows native code.
> # Patch JDK with a private patch that alters FileInputStream behavior.
> The third option cannot fix the problem for users running the stock Oracle JDK.
> We discussed some options for the first approach. For example, one option is two-phase renaming: first create a hard link at the new location, then remove the old link when reading is finished. This option was thought to be rather invasive. Another option discussed was to change HDFS behavior on Windows by not allowing clients to read unfinished blocks. However, this behavior change is thought to be problematic and may affect other applications built on top of HDFS.
> For all the reasons discussed above, we will use the second approach to address the problem.
> If there are better options to fix the problem, we would also like to hear about them.
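As a concrete illustration of the JDK 7 file system API mentioned in the description: java.nio.file.Files.newInputStream requests FILE_SHARE_DELETE when opening files on Windows, unlike java.io.FileInputStream, so a concurrent rename of the open file succeeds. The sketch below is illustrative, not datanode code; on Linux the rename of an open file is always allowed, so the demo passes there trivially.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ShareDeleteDemo {
    // Open a "block" file for reading, rename it while the stream is open,
    // and return the number of bytes the open stream can still read.
    static int readAfterRename() throws IOException {
        Path dir = Files.createTempDirectory("blk");
        Path src = dir.resolve("blk_tmp");
        Files.write(src, "packet data".getBytes("UTF-8"));

        // Files.newInputStream (NIO.2) requests FILE_SHARE_DELETE on
        // Windows, so the rename below succeeds there as well; a plain
        // java.io.FileInputStream would make the rename fail on Windows.
        try (InputStream in = Files.newInputStream(src)) {
            Files.move(src, dir.resolve("blk_final"),
                       StandardCopyOption.ATOMIC_MOVE);
            byte[] buf = new byte[6];
            return in.read(buf); // the already-open stream still sees the data
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterRename() + " bytes read after rename");
    }
}
```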
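The two-phase renaming alternative discussed in the description (hard-link first, remove the old name later) can be sketched as follows. The file names and cleanup policy here are hypothetical, not Hadoop code; on Windows the second phase would have to be deferred until the last reader closes the old name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoPhaseRename {
    // Phase 1: publish the finished block under its final name via a hard
    // link, leaving the temporary name valid for any reader that still has
    // it open. Phase 2: retire the temporary name.
    static void finalizeBlock(Path tmp, Path finalized) throws IOException {
        Files.createLink(finalized, tmp); // same inode, two names
        Files.deleteIfExists(tmp);        // deferred until readers close, on Windows
    }

    static boolean demo() throws IOException {
        Path dir = Files.createTempDirectory("blk");
        Path tmp = dir.resolve("blk_tmp");
        Files.write(tmp, new byte[]{1, 2, 3});
        Path finalized = dir.resolve("blk_final");
        finalizeBlock(tmp, finalized);
        return Files.exists(finalized) && !Files.exists(tmp);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```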

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413853#comment-13413853 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

Hi Todd, I did some further investigation and coding. A basic port of NativeIO is now working on Windows. However, I did see some issues with a cross-platform native library. For example, the open() and fstat() methods have some flags that are mostly specific to the Linux or POSIX world. It may be difficult to find an exact mapping of those flags to a Windows equivalent. Does it make sense to create nested classes to separate the Unix/POSIX and Windows functions? E.g. we could have NativeIO.POSIX.open() and NativeIO.Windows.createFile(). However, this would break existing APIs. I am not sure what the best way to proceed is.
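The split suggested in this comment might look like the skeleton below. Everything in it is illustrative: the method names, signatures, and stub bodies are assumptions based on this comment, not the actual Hadoop API (the real methods would be JNI bindings to open(2) and CreateFile). Only O_RDONLY (Linux) and FILE_SHARE_DELETE (Win32) are real platform constants.

```java
public class NativeIOSketch {
    // POSIX-specific entry points and flags live in one nested class...
    public static class POSIX {
        public static final int O_RDONLY = 0; // value matches Linux <fcntl.h>
        public static int open(String path, int flags, int mode) {
            // Real code: a JNI binding to open(2).
            throw new UnsupportedOperationException("JNI stub");
        }
    }

    // ...and Windows-specific ones in another, so neither side has to fake
    // the other's flags.
    public static class Windows {
        public static final long FILE_SHARE_DELETE = 0x4L; // Win32 constant
        public static long createFile(String path, long desiredAccess,
                                      long shareMode, long creationDisposition) {
            // Real code: a JNI binding to CreateFile.
            throw new UnsupportedOperationException("JNI stub");
        }
    }

    public static void main(String[] args) {
        // Callers pick the platform class explicitly instead of relying on
        // one flat NativeIO API whose flags mean different things per OS.
        boolean win = System.getProperty("os.name").startsWith("Windows");
        System.out.println(win ? "Windows.createFile" : "POSIX.open");
    }
}
```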
                

[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408360#comment-13408360 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

Hi Todd, thanks for the clarification. I see your point now. However, I think there are three things here.

# Make the existing NativeIO work on Windows.
# Create new Windows native IO functionality that solves the above issue.
# Build and organize the code/lib so that we have a central place for the native code.

For this JIRA, we only intend to solve 2. I agree with you on 1. For 3, I can see both pros and cons, but once 1 is done, it should take only modest work to create a common lib for all the native code. Does this make sense to you?
                

[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472989#comment-13472989 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

Attaching a patch and updating the JIRA title to reflect the change.

We port and extend Hadoop native libraries to Windows.

The POSIX native functions and flags are moved under the new nested class NativeIO.POSIX. The Windows functions and flags are created under NativeIO.Windows.

We spent some time investigating how to map the POSIX APIs, especially all the flags, to Windows equivalents. However, this seems very difficult, if possible at all, given all the I/O options and error codes.

Instead, we created some special I/O functions in NativeIO, i.e. getShareDeleteFileInputStream() and getCreateForWriteFileOutputStream(), that abstract the I/O usage pattern.

We changed the related datanode functions to use the new native library functions to get the desired I/O streams.

Some new test cases are added to TestNativeIO. TestFileConcurrentReader is fixed to test concurrent reading and writing scenarios.
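To illustrate, a caller such as the datanode's block reader might pick up the new stream factory roughly like this. The NativeIO call is left as a comment because it needs the Windows native library; the surrounding code is an illustrative sketch of the usage pattern, not the actual patch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockStreams {
    // Return an input stream over a block file that tolerates a concurrent
    // rename of that file.
    static InputStream openBlockForRead(File blockFile) throws IOException {
        if (System.getProperty("os.name").startsWith("Windows")) {
            // With the patch applied (assumption based on the comment above):
            // return NativeIO.getShareDeleteFileInputStream(blockFile);
        }
        // On POSIX systems a plain FileInputStream already allows the block
        // to be renamed while it is open.
        return new FileInputStream(blockFile);
    }

    static int demo() throws IOException {
        File f = File.createTempFile("blk", null);
        try (InputStream in = openBlockForRead(f)) {
            return in.read(); // empty file: returns end-of-stream
        } finally {
            f.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```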
                

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1-win
     Hadoop Flags: Reviewed

+1 for the patch. I committed it to branch-1-win. Thank you Chuan.

Todd, please do post your comments. They will be addressed by a separate JIRA.
                

[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407508#comment-13407508 ] 

Todd Lipcon commented on HADOOP-8564:
-------------------------------------

Can this be merged into the existing NativeIO JNI library? Or are the number of {{#ifdef WINDOWS}} macros required so numerous that we should just have two entirely separate libhadoops?
                

[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477648#comment-13477648 ] 

Suresh Srinivas commented on HADOOP-8564:
-----------------------------------------

Todd, please post if you have any comments. Otherwise I am going to commit this tomorrow.
                

[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478207#comment-13478207 ] 

Todd Lipcon commented on HADOOP-8564:
-------------------------------------

Sorry, I missed that the new patch was uploaded. Can I have a couple days to review it? It's a big patch. If you want to go ahead and commit to the branch, that's OK so long as review feedback can be addressed afterwards.
                

[jira] [Resolved] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HADOOP-8564.
-------------------------------------

    Resolution: Fixed

I committed additional files missed in the previous commit to branch-1-win.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win-newfiles.patch, HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408323#comment-13408323 ] 

Todd Lipcon commented on HADOOP-8564:
-------------------------------------

bq. The NativeIO JNI library is only available on Linux, while this class is only needed on Windows. I think it makes sense to create a separate native lib file. We don't necessarily need to name it libhadoop. For example, if the class is called 'WindowsFileInputStream', the new lib could be 'WindowsFileInputStream.dll'. Is there any concern over this? E.g. do you want to reduce the number of native library files exposed in Hadoop in general?

Currently NativeIO JNI is Linux-only, but I think all of the stuff found there is useful on Windows as well. For example:
- Native CRC32 computation: the SSE instructions probably need slightly different syntax for the Windows C++ compiler, but they are necessary for good performance
- Various other flags to open() needed for race-condition-free security support: these probably need different APIs on Windows, but equivalents are likely available
- Compression: Windows equally needs fast compression libraries, etc.

So, I think it makes sense to get libhadoop compiling on Windows in general and make it the central place for native dependency code.
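For context on the CRC32 point: the SSE-accelerated native code mentioned above is meant to outperform the JDK's pure-Java java.util.zip.CRC32. A baseline sketch follows; note that Hadoop's native checksum path may use a different polynomial (e.g. CRC32C), so this is only an illustration, not the Hadoop implementation.

```java
import java.util.zip.CRC32;

public class CrcBaseline {
    // Pure-Java CRC-32 over a byte buffer; a native SSE implementation
    // exists to beat this on large block checksums.
    public static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) throws Exception {
        // Standard CRC-32 check value for the ASCII string "123456789".
        System.out.printf("%08X%n", crcOf("123456789".getBytes("UTF-8")));
    }
}
```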
                
> Create a Windows native InputStream class to address datanode concurrent reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


[jira] [Reopened] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reopened HADOOP-8564:
-------------------------------------


Reopened because I had reverted the patch earlier due to a build issue.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Updated] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated HADOOP-8564:
------------------------------

    Attachment: HADOOP-8564-branch-1-win.patch
    
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>         Attachments: HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478620#comment-13478620 ] 

Chuan Liu commented on HADOOP-8564:
-----------------------------------

Let me try it on my machine.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413867#comment-13413867 ] 

Todd Lipcon commented on HADOOP-8564:
-------------------------------------

Hi Chuan. Thanks for taking a look into that.

I wouldn't be concerned about API compatibility here, since these are private-facing (internal) APIs. We can change them between versions without breaking any contracts with downstream projects.

I think your idea of separating the Windows calls from the POSIX ones makes sense. But we should probably also enumerate the uses of the POSIX calls and figure out what the equivalents are on the Windows side - for example, we use fstat and open(O_EXCL) for a lot of security reasons. I don't know for sure whether Windows has equivalent APIs or whether we need to take another route entirely in those situations.
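On the open(O_EXCL) point: for what it's worth, the JDK 7 file API exposes an equivalent through StandardOpenOption.CREATE_NEW, which fails atomically if the file already exists. A hedged sketch of race-free exclusive creation follows; the class and method names are illustrative, not from the Hadoop code base.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExclusiveCreate {
    // Race-free "create only if absent", analogous to POSIX
    // open(path, O_CREAT | O_EXCL): CREATE_NEW fails atomically
    // when the file already exists.
    public static boolean createExclusive(Path p) throws IOException {
        try {
            Files.newOutputStream(p, StandardOpenOption.CREATE_NEW,
                                     StandardOpenOption.WRITE).close();
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;  // lost the race: another process created it first
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("excl", ".txt");
        Files.delete(p);                        // leave only the fresh name
        System.out.println(createExclusive(p)); // file did not exist
        System.out.println(createExclusive(p)); // file exists now
        Files.delete(p);
    }
}
```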
                
> Create a Windows native InputStream class to address datanode concurrent reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


[jira] [Updated] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated HADOOP-8564:
------------------------------

    Summary: Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue  (was: ort and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue)
    
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


[jira] [Commented] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478538#comment-13478538 ] 

Suresh Srinivas commented on HADOOP-8564:
-----------------------------------------

I reverted the commit. 

Chuan, the patch fails the build:
{noformat}
    [javac] .../src/test/org/apache/hadoop/io/nativeio/TestNativeIO.java:83: cannot find symbol
    [javac] symbol  : method setOwner(java.io.File,java.lang.String,<nulltype>)
    [javac] location: class org.apache.hadoop.fs.FileUtil
    [javac]     FileUtil.setOwner(testFile, username, null);
{noformat}

Is this patch dependent on any other jira?
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>


[jira] [Updated] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated HADOOP-8564:
------------------------------

    Summary: Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue  (was: Create a Windows native InputStream class to address datanode concurrent reading and writing issue)
    
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


[jira] [Commented] (HADOOP-8564) Create a Windows native InputStream class to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407491#comment-13407491 ] 

Suresh Srinivas commented on HADOOP-8564:
-----------------------------------------

+1 for the second option. This will also allow adding future optimizations at the stream level on Windows, similar to those done for Linux.
                
> Create a Windows native InputStream class to address datanode concurrent reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>


[jira] [Updated] (HADOOP-8564) Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-8564:
------------------------------------

    Attachment: HADOOP-8564-branch-1-win.patch

Attaching the patch with indentation changed to spaces and CRLF line endings changed to LF.
                
> Port and extend Hadoop native libraries for Windows to address datanode concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>         Attachments: HADOOP-8564-branch-1-win.patch, HADOOP-8564-branch-1-win.patch
>
>
