Posted to common-dev@hadoop.apache.org by "he yongqiang (JIRA)" <ji...@apache.org> on 2009/03/09 06:58:52 UTC

[jira] Created: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Merge FileSystem.create and FileSystem.append
---------------------------------------------

                 Key: HADOOP-5438
                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: he yongqiang


Currently, when a user wants to modify a file, the user first calls exists() to check whether the file is already there, and then calls create() or append() depending on the result.
The code looks like:
{code}
FSDataOutputStream out_1 = null;
if (fs.exists(path_1))
   out_1 = fs.append(path_1);
else
   out_1 = fs.create(path_1);
{code}
On the performance side, this involves two RPCs. On the ease-of-use side, it is not very convenient compared with the traditional open interface.
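For comparison, here is a minimal sketch of what a single "open for writing" call looks like in the traditional style. It uses the JDK's java.nio.file API rather than Hadoop's FileSystem, and the names OpenForAppendSketch, openForAppend, appendText and demo are hypothetical. It only illustrates the interface shape being asked for, not how HDFS would implement it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Hadoop: one call that creates the file
// if it is missing and appends to it if it already exists.
class OpenForAppendSketch {

    static OutputStream openForAppend(Path path) throws IOException {
        // CREATE: create the file when absent; APPEND: write at the end.
        return Files.newOutputStream(path,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Writes text through openForAppend and closes the stream.
    static void appendText(Path path, String text) throws IOException {
        try (OutputStream out = openForAppend(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Demo: append twice to a path that does not exist yet and return
    // the final content of the file.
    static String demo() {
        try {
            Path p = Files.createTempFile("open-sketch", ".txt");
            Files.delete(p);              // start from "file does not exist"
            appendText(p, "first\n");     // creates the file
            appendText(p, "second\n");    // appends to the existing file
            String content =
                    new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            Files.delete(p);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller never asks whether the file exists; the open options carry that decision into a single call.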

It gets more complicated if an overwrite parameter is specified. I do not know whether there is a bug in 'overwrite' in 0.19, but sometimes it takes a long time for an overwriting create to return. So I made the file-writing code with the overwrite parameter work like this:
{code}
boolean exists = fs.exists(name);
if (overwrite) {
    // Deleting first works around the slow overwrite described below.
    if (exists)
        fs.delete(name, true);
    this.out = fs.create(name, overwrite, bufferSize, replication,
                         blockSize, progress);
    this.currentRowID = 0;
} else {
    if (!exists)
        this.out = fs.create(name, overwrite, bufferSize,
                             replication, blockSize, progress);
    else
        this.out = fs.append(name, bufferSize, progress);
}
{code}

Some of the statements there are redundant, especially the delete(). But without deleting first, the overwrite takes a long time to return.

BTW, I will create another issue about the overwrite problem. If it is not a bug at all, or is a duplicate, someone please close it.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689179#action_12689179 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

+1 for Konstantin's proposal with a minor modification.

{code}
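enum CreateFlag {
    OVERWRITE ((short)0x01),
    APPEND    ((short)0x02);
    private short mode;

    // Constructor required by the constant declarations above.
    private CreateFlag(short mode) {
      this.mode = mode;
    }
    // As posted: masks the stored mode with the given bits.
    private void setCreateFlag(short mode) {
      this.mode &= mode;
    }
    short getMode() {return mode;}
}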

{code}



> Then C-code will be able to use them, rather than converting.

Is this really true? Doesn't it actually depend on the internal JVM implementation and byte ordering?
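One way to look at the byte-ordering question: inside the JVM a short is just a numeric value, and byte order only becomes visible once the value is serialized. The sketch below (FlagWireFormat and encode are hypothetical names) writes a flag value with java.io.DataOutputStream, which always emits the high byte first regardless of platform. Whether the C side would read flags through such a stream is an assumption here, not something stated in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: shows the bytes a flag value turns into on the wire.
class FlagWireFormat {

    // Serializes a flag value the way java.io.DataOutput does:
    // two bytes, high byte first.
    static byte[] encode(short mode) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(mode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A C reader on a little-endian machine would still have to swap those two bytes, so fixed numeric values help with agreeing on meaning but not with byte order.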




[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689229#action_12689229 ] 

Jakob Homan commented on HADOOP-5438:
-------------------------------------

The Java-centric way to do C-style bit packing is with EnumSets (http://www.jakobhoman.com/2008/08/javas-enumset-fun-for-whole-family.html), which is future-proof but a bit verbose.
{code}
  enum CreateFlag {
    OVERWRITE ((short)0x01),
    APPEND    ((short)0x02);

    private final short mode;

    CreateFlag(short mode) { this.mode = mode; }
    CreateFlag() { this.mode = (short)0x01; }

    public short getMode() { return mode; }
  }

  public static void createFile(Path p, EnumSet<CreateFlag> flags) {
    if (flags.contains(CreateFlag.APPEND)) {
      // blah
    }
  }
{code}
We ran into a similar issue with create flags on Zookeeper (ZOOKEEPER-132), and ended up using them but had to enumerate all the possible values for creating actual integer flags.
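A minimal sketch of how an EnumSet could be packed into a single integer mask and unpacked again, so that the combined values never have to be enumerated by hand. The enum name CreateFlagBits and the helpers toMask and fromMask are hypothetical; the flag values simply reuse the ones posted in this thread.

```java
import java.util.EnumSet;

// Illustrative only: mirrors the flag values posted in this thread.
enum CreateFlagBits {
    OVERWRITE((short) 0x01),
    APPEND((short) 0x02);

    private final short mode;

    CreateFlagBits(short mode) { this.mode = mode; }

    short getMode() { return mode; }

    // Pack a set of flags into one short by OR-ing the individual bits.
    static short toMask(EnumSet<CreateFlagBits> flags) {
        short mask = 0;
        for (CreateFlagBits f : flags) {
            mask |= f.getMode();
        }
        return mask;
    }

    // Unpack a short into the set of flags whose bit is set in it.
    static EnumSet<CreateFlagBits> fromMask(short mask) {
        EnumSet<CreateFlagBits> flags = EnumSet.noneOf(CreateFlagBits.class);
        for (CreateFlagBits f : values()) {
            if ((mask & f.getMode()) != 0) {
                flags.add(f);
            }
        }
        return flags;
    }
}
```

With this shape the Java API can stay typed (EnumSet) while the wire or C side sees a plain integer.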




[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688224#action_12688224 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

I prefer adding a new parameter to append(). It would require little modification to user applications, and the changes to the HDFS code would be only a few lines.



[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Patch Available  (was: Open)

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689178#action_12689178 ] 

Konstantin Shvachko commented on HADOOP-5438:
---------------------------------------------

You can use exact values, like
{code}
enum CreateFlag {
    OVERWRITE ((short)0x01),
    APPEND    ((short)0x02);
    private short mode;

    private CreateFlag(short mode) {
      this.mode = mode;
    }
    short getMode() {return mode;}
}
{code}
Then C-code will be able to use them, rather than converting.



[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711619#action_12711619 ] 

Hadoop QA commented on HADOOP-5438:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408434/Hadoop-5438-2009-05-19.patch
  against trunk revision 777019.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/console

This message is automatically generated.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689002#action_12689002 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Is an enum like this OK?
{code}
static enum CreateFlag {
    CREATE, OVERWRITE, APPEND, TRUNCATE;
}
{code}

And then change FileSystem.create to:
{code}
public FSDataOutputStream create(Path f, FsPermission permission,
    boolean overwrite, int bufferSize, short replication,
    long blockSize, Progressable progress) throws IOException {
  return this.create(f, permission,
      overwrite ? CreateFlag.OVERWRITE : CreateFlag.CREATE,
      bufferSize, replication, blockSize, progress);
}

public abstract FSDataOutputStream create(Path f, FsPermission permission,
    CreateFlag flag, int bufferSize, short replication, long blockSize,
    Progressable progress) throws IOException;
{code}



[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688534#action_12688534 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Thanks, dhruba.
Yeah, Option 2 is generally more like the POSIX way. But it seems HDFS only provides two ways of writing: append or overwrite. And I think TRUNCATE has the same meaning as overwrite, is that right?



[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693861#action_12693861 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

1. You have to bump the versionID number in ClientProtocol.
2. Please leave the original public method signatures in FileSystem.java as they are (deprecate them). This is to ensure that existing apps do not break. Then create a new create method signature and make the old methods invoke the new create method internally.
3. And, of course, we would like a unit test that invokes the new create method with all supported flags. One option would be to add a new method to TestFileCreation.java.
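The second and third points can be sketched with a toy in-memory stand-in. Everything here is hypothetical (ToyFileSystem, its Flag enum, and the exact semantics chosen for each flag); it is not the Hadoop FileSystem API, only an illustration of keeping a deprecated boolean overload that delegates to a new flag-based method.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a file system; all names and semantics are assumptions.
class ToyFileSystem {
    enum Flag { CREATE, OVERWRITE, APPEND }

    private final Map<String, StringBuilder> files = new HashMap<>();

    /** Old signature, kept so existing callers still compile. */
    @Deprecated
    StringBuilder create(String path, boolean overwrite) {
        return create(path, overwrite
                ? EnumSet.of(Flag.CREATE, Flag.OVERWRITE)
                : EnumSet.of(Flag.CREATE));
    }

    /** New signature: the flags decide what happens to an existing file. */
    StringBuilder create(String path, EnumSet<Flag> flags) {
        StringBuilder existing = files.get(path);
        if (existing == null) {
            if (!flags.contains(Flag.CREATE)) {
                throw new IllegalStateException("no such file: " + path);
            }
            StringBuilder fresh = new StringBuilder();
            files.put(path, fresh);
            return fresh;
        }
        if (flags.contains(Flag.APPEND)) {
            return existing;                 // keep the current content
        }
        if (flags.contains(Flag.OVERWRITE)) {
            existing.setLength(0);           // discard the current content
            return existing;
        }
        throw new IllegalStateException("file exists: " + path);
    }

    String read(String path) {
        StringBuilder sb = files.get(path);
        return sb == null ? null : sb.toString();
    }
}
```

A unit test in this style would call the new method once per supported flag combination and also exercise the deprecated overload, to confirm both paths behave the same.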

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693410#action_12693410 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Will users feel comfortable with the verbose CreateFlag operations? I think it would be better if we made CreateFlag a static class on which users can perform the | operation, like CreateFlag.CREATE|CreateFlag.APPEND. It would be much like the code in my initial proposal.
So far, we have all focused on the external user interface. I think it is the most important part, and once it is settled, this issue can be fixed quickly.
Would making it an enum be better than a class? I mean, not everyone is familiar with the Java EnumSet (at least I am not :) ).
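A minimal sketch of the constants-class style, for comparison with the EnumSet version. The class name CreateMode, the bit values, and the rule that OVERWRITE and APPEND cannot be combined are all assumptions made for illustration.

```java
// Hypothetical constants class in the style suggested above; not Hadoop API.
final class CreateMode {
    static final int CREATE = 0x01;
    static final int OVERWRITE = 0x02;
    static final int APPEND = 0x04;

    private CreateMode() { }

    // True if the given flag bit is set in mode.
    static boolean has(int mode, int flag) {
        return (mode & flag) != 0;
    }

    // Rejects bits outside the known flags and the contradictory
    // OVERWRITE + APPEND pair (an assumed rule, see the note above).
    static boolean isValid(int mode) {
        int known = CREATE | OVERWRITE | APPEND;
        if ((mode & ~known) != 0) {
            return false;
        }
        return !(has(mode, OVERWRITE) && has(mode, APPEND));
    }
}
```

The call site is short, for example CreateMode.CREATE | CreateMode.APPEND, but the compiler accepts any int there, so validation has to happen at run time. With an enum, EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND) is longer to type but can only contain declared flags.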




[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "he yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680366#action_12680366 ] 

he yongqiang commented on HADOOP-5438:
--------------------------------------

I guess the existing open and create interfaces can stay for compatibility; can we add a new interface for open? With the new open interface, all the work can be done with only one RPC, and atomicity can be guaranteed on the server side. That's great, since atomicity cannot be guaranteed on the client side.

Atomicity is a great point, thanks Hong.
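The atomicity argument can be sketched with a toy in-memory "server". ToyNameServer, openForAppend and concurrentDemo are hypothetical names, and a ConcurrentHashMap stands in for whatever locking the real namenode would use. The point is only that the existence check and the create happen in one step on the server.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy "server" holding file contents in memory; names are hypothetical.
class ToyNameServer {
    private final ConcurrentMap<String, StringBuffer> files =
            new ConcurrentHashMap<>();

    // Single entry point: the existence check and the create happen as one
    // atomic step, so two clients cannot race between them.
    StringBuffer openForAppend(String path) {
        return files.computeIfAbsent(path, p -> new StringBuffer());
    }

    String read(String path) {
        StringBuffer sb = files.get(path);
        return sb == null ? null : sb.toString();
    }

    // Demo: several threads open the same path at once and each appends
    // one character; returns the final length of the file.
    static int concurrentDemo(int writers) {
        ToyNameServer server = new ToyNameServer();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(
                    () -> server.openForAppend("/log").append('x'));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return server.read("/log").length();
    }
}
```

With the client-side exists() then create() pattern, two writers could both see "missing" and both create the file, and one of them would lose its data; here every writer ends up appending to the same buffer.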

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680554#action_12680554 ] 

Konstantin Shvachko commented on HADOOP-5438:
---------------------------------------------

{{FileSystem.append()}} semantically simply opens a file for append, and {{open()}} usually creates a file if it does not exist (with appropriate flags). So it seems to me reasonable to merge these methods to make them more POSIX friendly.
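
For comparison, POSIX open() expresses "append, creating the file if needed" by combining O_CREAT and O_APPEND in a single call. The sketch below shows the same idea with the JDK's own java.nio.file API on a local file; it is only an analogy for the semantics discussed here, not the Hadoop FileSystem API, and `OpenFlagsDemo` and `appendLine` are names made up for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OpenFlagsDemo {
    // Opens the file for append in one call, creating it first if it is missing.
    // No separate exists() check is needed on the caller's side.
    static void appendLine(Path file, String line) throws IOException {
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Calling `appendLine` twice on a path that does not exist yet creates the file on the first call and appends on the second.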

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-03-31.patch

1) Attached an updated version according to dhruba's suggestions. Thanks, dhruba.
2) Created HADOOP-5596 for the EnumSet problem. I have tried adding EnumSet support to ObjectWritable; it does not seem to need much work.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698619#action_12698619 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Yeah, the import statement changes in DFSClient.java are because some wildcard imports (import .....*) were expanded and replaced with the exact classes in that package.
Yes, we should add some enhancements around combinations of the OVERWRITE flag; more tests are always better :)

BTW, it seems Hadoop-5438(2009-04-06).patch already contains the patch I made for HADOOP-5596, so should I close HADOOP-5596 or remove the HADOOP-5596 code from Hadoop-5438(2009-04-06).patch?

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Open  (was: Patch Available)

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695274#action_12695274 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

This issue is now blocked by HADOOP-5596, which is to make ObjectWritable support the EnumSet data type. EnumSet is used in the NameNode's newly added create() interface.
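
As a rough illustration of what sending an EnumSet over the wire involves, the sketch below encodes a set of flags as an int bitmask on a DataOutputStream and rebuilds it on the other side. This is an assumption made for illustration, not the approach taken in HADOOP-5596, and `SketchCreateFlag` and `EnumSetCodec` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.EnumSet;

// Hypothetical flag names, used only for this example.
enum SketchCreateFlag { CREATE, OVERWRITE, APPEND }

class EnumSetCodec {
    // Writes the set as a bitmask: bit i is set when the constant with ordinal i is present.
    static void write(DataOutputStream out, EnumSet<SketchCreateFlag> flags) throws IOException {
        int mask = 0;
        for (SketchCreateFlag f : flags) {
            mask |= 1 << f.ordinal();
        }
        out.writeInt(mask);
    }

    // Rebuilds the set from the bitmask; the element type is known to the reader,
    // so an empty set round-trips without any extra type information.
    static EnumSet<SketchCreateFlag> read(DataInputStream in) throws IOException {
        int mask = in.readInt();
        EnumSet<SketchCreateFlag> flags = EnumSet.noneOf(SketchCreateFlag.class);
        for (SketchCreateFlag f : SketchCreateFlag.values()) {
            if ((mask & (1 << f.ordinal())) != 0) {
                flags.add(f);
            }
        }
        return flags;
    }

    // Convenience helper: serialize to a byte array and read it back.
    static EnumSet<SketchCreateFlag> roundTrip(EnumSet<SketchCreateFlag> flags) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), flags);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A generic serializer has a harder job than this fixed-type codec, since it must also record which enum class the set belongs to.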

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Open  (was: Patch Available)

It seems the code has changed a lot and the patch can no longer be applied. I will submit a new patch against the trunk code.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-5.patch


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5438:
-------------------------------------

    Status: Open  (was: Patch Available)

This patch does not merge with trunk. Can you please resubmit this patch? Thanks.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "he yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683268#action_12683268 ] 

he yongqiang commented on HADOOP-5438:
--------------------------------------

How about we add a new boolean param to append()?
{code}
  public abstract FSDataOutputStream append(Path f, int bufferSize, boolean create, Progressable progress) throws IOException;
{code}

Since this is an abstract method on FileSystem, every FileSystem implementation would need to be modified to support it. I have checked all the implementations, and it seems that currently only DistributedFileSystem and RawLocalFileSystem support append; the others just throw a Not Supported exception.
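
A minimal sketch of how such a flag might behave, using simplified stand-in classes rather than the real FileSystem hierarchy. `SketchFs`, `SketchLocalFs`, and `SketchUnsupportedFs` are hypothetical, and the bufferSize and progress parameters are left out to keep the example short.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical, simplified stand-in for FileSystem; only the proposed flag is modeled.
abstract class SketchFs {
    abstract OutputStream append(File f, boolean create) throws IOException;
}

// A local implementation that supports append.
class SketchLocalFs extends SketchFs {
    @Override
    OutputStream append(File f, boolean create) throws IOException {
        if (!create && !f.exists()) {
            throw new FileNotFoundException("File " + f + " not found");
        }
        // FileOutputStream in append mode creates the file when it is absent.
        return new FileOutputStream(f, true);
    }
}

// An implementation without append support keeps rejecting the call, whatever the flag says.
class SketchUnsupportedFs extends SketchFs {
    @Override
    OutputStream append(File f, boolean create) throws IOException {
        throw new IOException("Not supported");
    }
}
```

Note that the local check for `create == false` is itself a check-then-act sequence; in a distributed file system that decision would have to be made on the server to be atomic.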

Any comments?

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Patch Available  (was: Open)

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689189#action_12689189 ] 

Konstantin Shvachko commented on HADOOP-5438:
---------------------------------------------

Dhruba, this will not compile. A constructor is required.
I am not sure you can combine bits in an enum that way, but it would be nice.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-05-15.patch

A new patch against the trunk code.
Thanks, dhruba.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-5.patch


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688793#action_12688793 ] 

Konstantin Shvachko commented on HADOOP-5438:
---------------------------------------------

+1 for Option 2, even though it requires more changes.
Can you also make Flag (or maybe CreateFlag) a real enum rather than a class?

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5438:
-------------------------------------------

    Component/s: fs
       Assignee: He Yongqiang

As we discussed in HADOOP-5596, we should not change ObjectWritable for supporting EnumSet.  We should define a new class EnumSetWritable.  We may declare FileSystem.create(..) with EnumSet and ClientProtocol.create(..) with EnumSetWritable.
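To make the idea of a dedicated EnumSet serializer concrete, here is a minimal sketch that round-trips an EnumSet through java.io's DataOutput and DataInput. It is not the EnumSetWritable class from the patch: the class name EnumSetCodecSketch, the CreateFlag constants, and the wire layout (element count followed by constant names) are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.EnumSet;

public class EnumSetCodecSketch {
  /** Hypothetical flag names; the real CreateFlag values are defined by the patch. */
  enum CreateFlag { CREATE, OVERWRITE, APPEND }

  /** Writes the element count, then the name of each constant (assumed layout). */
  static <E extends Enum<E>> void write(DataOutput out, EnumSet<E> set) throws IOException {
    out.writeInt(set.size());
    for (E e : set) {
      out.writeUTF(e.name());
    }
  }

  /** Reads back a set produced by write(); the reader must supply the element type. */
  static <E extends Enum<E>> EnumSet<E> read(DataInput in, Class<E> type) throws IOException {
    EnumSet<E> set = EnumSet.noneOf(type);
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      set.add(Enum.valueOf(type, in.readUTF()));
    }
    return set;
  }

  public static void main(String[] args) throws IOException {
    EnumSet<CreateFlag> flags = EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    write(new DataOutputStream(bytes), flags);
    EnumSet<CreateFlag> copy =
        read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), CreateFlag.class);
    System.out.println(copy.equals(flags));
  }
}
```

One design point the sketch exposes: an empty EnumSet carries no usable element type on the wire, so the reader has to know the enum class in advance or the writer has to send it separately.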

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-03-30.patch

A quick fix for this issue. It uses an enum for CreateFlag.
All FileSystem implementations are modified.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693986#action_12693986 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Thanks, dhruba.
1. done
2. done (the previous patch was like what you suggested.)
3. done

Before I attach the new patch, there is one serious problem: the patch does not pass the tests.
It seems EnumSet cannot be serialized by ObjectWritable. Any ideas?

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710588#action_12710588 ] 

Hadoop QA commented on HADOOP-5438:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408214/Hadoop-5438-2009-05-15.patch
  against trunk revision 776148.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/353/console

This message is automatically generated.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-05-19.patch

The earlier patch no longer applies to the trunk code.
Hopefully this one applies cleanly when Hudson runs it, so that it can be merged to trunk.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712527#action_12712527 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

+1 Code looks good. I will commit it tomorrow.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680074#action_12680074 ] 

Zheng Shao commented on HADOOP-5438:
------------------------------------

You can create a wrapper method such as createOrAppend to provide fopen "a+" semantics.

However we should not change the semantics of existing functions.
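A wrapper of this kind could look roughly like the sketch below. To keep it runnable without Hadoop on the classpath, it is written against a hypothetical MiniFs interface that mirrors only the exists(), create() and append() calls used in this thread; MiniFs, MemFs and CreateOrAppendSketch are illustration-only names, not Hadoop classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

public class CreateOrAppendSketch {
  /** Hypothetical stand-in for the three FileSystem calls discussed in this issue. */
  interface MiniFs {
    boolean exists(String path) throws IOException;
    OutputStream create(String path) throws IOException;
    OutputStream append(String path) throws IOException;
  }

  /** Wrapper with fopen "a+"-like behavior; create() and append() keep their own semantics. */
  static OutputStream createOrAppend(MiniFs fs, String path) throws IOException {
    return fs.exists(path) ? fs.append(path) : fs.create(path);
  }

  /** In-memory implementation, present only so that the sketch can run. */
  static class MemFs implements MiniFs {
    final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public boolean exists(String path) {
      return files.containsKey(path);
    }

    public OutputStream create(String path) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      files.put(path, out);
      return out;
    }

    public OutputStream append(String path) throws IOException {
      ByteArrayOutputStream out = files.get(path);
      if (out == null) {
        throw new IOException("no such file: " + path);
      }
      return out;
    }
  }

  public static void main(String[] args) throws IOException {
    MemFs fs = new MemFs();
    createOrAppend(fs, "/log").write("a".getBytes());  // file is created
    createOrAppend(fs, "/log").write("b".getBytes());  // file is appended to
    System.out.println(fs.files.get("/log").toString());
  }
}
```

Note that a client-side wrapper like this still issues two calls per open (exists, then create or append), so it addresses the ease-of-use complaint but not the two-RPC cost raised in the issue description.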


> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707505#action_12707505 ] 

Hadoop QA commented on HADOOP-5438:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407209/Hadoop-5438-2009-05-5.patch
  against trunk revision 772960.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/301/console

This message is automatically generated.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689330#action_12689330 ] 

Jakob Homan commented on HADOOP-5438:
-------------------------------------

bq. The reason I have not assigned a short mode to each flag is that I think the flags in CreateFlag cannot coexist with each other. For example, you cannot specify Append together with Overwrite.

Right, that's what EnumSets are for:
{code}
EnumSet<CreateFlag> flags = EnumSet.of(CreateFlag.APPEND, CreateFlag.OVERWRITE);
{code}
But, like I said: verbose.
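If some combinations are indeed meant to be illegal, an EnumSet still lets the callee check that in one place. The sketch below assumes the rule stated in the quoted comment (APPEND and OVERWRITE must not be combined) and additionally assumes that an empty set is rejected; the flag names and the validate() helper are hypothetical and not taken from the patch.

```java
import java.util.EnumSet;

public class CreateFlagCheckSketch {
  /** Hypothetical flag names, used only for illustration. */
  enum CreateFlag { CREATE, OVERWRITE, APPEND }

  /** Assumed rules: at least one flag is required; APPEND and OVERWRITE contradict each other. */
  static void validate(EnumSet<CreateFlag> flags) {
    if (flags.isEmpty()) {
      throw new IllegalArgumentException("no create flag given");
    }
    if (flags.contains(CreateFlag.APPEND) && flags.contains(CreateFlag.OVERWRITE)) {
      throw new IllegalArgumentException("APPEND and OVERWRITE cannot be combined");
    }
  }

  public static void main(String[] args) {
    validate(EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND));  // accepted
    try {
      validate(EnumSet.of(CreateFlag.APPEND, CreateFlag.OVERWRITE));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```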

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689237#action_12689237 ] 

Jakob Homan commented on HADOOP-5438:
-------------------------------------

You can also use these to combine modes, similar to what Dhruba was working on:
{code}
public static int combineModes(EnumSet<CreateFlag> flags) {
  int combinedMode = 0;
  for (CreateFlag f : flags) {
    combinedMode |= f.getMode();
  }
  return combinedMode;
}
{code}
for passing on to C.
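For completeness, here is a self-contained version of that helper together with an enum that supplies getMode(). The flag names and especially the bit values (0x01, 0x02, 0x04) are assumptions chosen for illustration; the thread does not say which values the patch or the C side uses.

```java
import java.util.EnumSet;

public class CombineModesSketch {
  /** Hypothetical flags; the bit values are illustrative, not taken from the patch. */
  enum CreateFlag {
    CREATE(0x01), OVERWRITE(0x02), APPEND(0x04);

    private final int mode;

    CreateFlag(int mode) {
      this.mode = mode;
    }

    int getMode() {
      return mode;
    }
  }

  /** ORs the per-flag bits into a single int, e.g. for handing to native code. */
  static int combineModes(EnumSet<CreateFlag> flags) {
    int combinedMode = 0;
    for (CreateFlag f : flags) {
      combinedMode |= f.getMode();
    }
    return combinedMode;
  }

  public static void main(String[] args) {
    int mode = combineModes(EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND));
    System.out.println(Integer.toHexString(mode));
  }
}
```

Giving each constant its own bit keeps the combined int unambiguous, so the receiving side can test individual flags with a bitwise AND.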

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5438:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks He!

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.21.0
>
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707867#action_12707867 ] 

Hadoop QA commented on HADOOP-5438:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407719/Hadoop-5438-2009-05-10.patch
  against trunk revision 772960.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/console

This message is automatically generated.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683330#action_12683330 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

Yeah, it would be more POSIX-like to add this to FileSystem.open(). But Hadoop's FileSystem.open() opens an InputStream for reading, while FileSystem.create() and FileSystem.append() open an OutputStream for writing. I think it would be easier to add this to create() or append().
I think we have two options here:
1) Add a boolean flag to append() to indicate whether the file should be created if it does not exist.
2) Add an enum type (let's call it Flag or something like that) and pass this mode parameter to create(), like:
{code}
public abstract FSDataOutputStream create(Path f,
    FsPermission permission,
    // boolean overwrite,
    short mode,
    int bufferSize,
    short replication,
    long blockSize,
    Progressable progress) throws IOException;

static class Flag {
  // Bit masks that can be OR-ed together into a single mode value.
  public static final short OVERWRITE = (short) 0x01;
  public static final short APPEND = (short) 0x02;
  private final short mode;

  public Flag(short mode) {
    this.mode = mode;
  }

  public boolean isOverwrite() {
    return (this.mode & OVERWRITE) != 0;
  }

  public boolean isAppend() {
    return (this.mode & APPEND) != 0;
  }
}
{code}

1) and 2) have the same effect. 2) is more POSIX-like, but almost every supported file system has implemented create(), so it needs more modification to the code.

With 1), we only need to add a few lines to FSNamesystem's startFileInternal, like this:

{code}
 if (append) {
        if (myFile == null) {
+           if (create) {
+              startFileInternal(src, permissions, holder, clientMachine, false, false, replication, blockSize);  // create the file
+           } else {
               throw new FileNotFoundException("failed to append to non-existent file "
                                       + src + " on client " + clientMachine);
+           }
        } else if (myFile.isDirectory()) {
          throw new IOException("failed to append to directory " + src 
                                +" on client " + clientMachine);
        }
      } else if (!dir.isValidToCreate(src)) {
        if (overwrite) {
          delete(src, true);
        } else {
          throw new IOException("failed to create file " + src 
                                +" on client " + clientMachine
                                +" either because the filename is invalid or the file exists");
        }
      }
{code}

Please comment on which option we should choose, or suggest other options.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688674#action_12688674 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

> And I think TUNCATE is of the same meaning of overwrite, i

TRUNCATE should keep the original file metadata (user permissions, block size, etc)... 
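
To make the distinction concrete, here is a minimal sketch using a toy in-memory namespace. Everything in it (the class, the Entry type, the method names) is hypothetical and exists only for illustration; it is not Hadoop code, and TRUNCATE is only a possible future flag in this discussion.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory namespace; every name here is hypothetical. */
final class TruncateVsOverwriteSketch {

    /** Minimal per-file record: two pieces of metadata plus the contents. */
    static final class Entry {
        final String permission;
        final long blockSize;
        final byte[] data;

        Entry(String permission, long blockSize, byte[] data) {
            this.permission = permission;
            this.blockSize = blockSize;
            this.data = data;
        }
    }

    private final Map<String, Entry> files = new HashMap<>();

    void put(String path, Entry entry) { files.put(path, entry); }

    Entry get(String path) { return files.get(path); }

    /** Overwrite: the old entry is discarded and the caller's new metadata applies. */
    void overwrite(String path, String permission, long blockSize) {
        files.put(path, new Entry(permission, blockSize, new byte[0]));
    }

    /** Truncate: only the contents are dropped; permission and block size survive. */
    void truncate(String path) {
        Entry old = files.get(path);
        if (old == null) {
            throw new IllegalStateException("no such file: " + path);
        }
        files.put(path, new Entry(old.permission, old.blockSize, new byte[0]));
    }
}
```

In this model, overwrite behaves like delete followed by create, while truncate leaves the entry's metadata untouched, which is the difference described above.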

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688227#action_12688227 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

I am fine either way, but I strongly like Option 2 because it manages the FileSystem API better and allows setting up other flags in the future. For example, we might want to support a TRUNCATE flag later.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Patch Available  (was: Open)

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718610#action_12718610 ] 

Hudson commented on HADOOP-5438:
--------------------------------

Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])
    

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.21.0
>
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-15.patch, Hadoop-5438-2009-05-19.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-03-31-2.patch

Attached Hadoop-5438-2009-03-31-2.patch, which passes the local TestFileCreation test when HADOOP-5596 is applied.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680320#action_12680320 ] 

Hong Tang commented on HADOOP-5438:
-----------------------------------

fopen with mode "a+" or open with flag=O_CREAT|O_APPEND actually has the extra atomicity guarantee while the check-existence-then-create-or-append does not.
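
As a local-file-system analogy only (this is java.nio, not the Hadoop FileSystem API), the sketch below opens a file with CREATE and APPEND in a single call instead of checking for existence first. On POSIX platforms this typically maps to one open call with O_CREAT|O_APPEND, so there is no window between the existence check and the open. The class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: create-or-append on a local file in one open call. */
final class CreateOrAppendSketch {
    private CreateOrAppendSketch() {}

    static void appendLine(Path file, String line) throws IOException {
        // CREATE makes the file if it is missing; APPEND positions every write at the end.
        // No separate exists() call is needed, so no other writer can slip in between
        // a check and the open.
        try (OutputStream out = Files.newOutputStream(
                file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Calling appendLine twice on a path that does not exist yet creates the file on the first call and appends on the second.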

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683279#action_12683279 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

I like the idea, but isn't it better if, instead of adding a boolean flag, we introduce a bitmap (i.e. an enum type) so that various flags can be passed into the "append" method? Also, maybe it is appropriate to call it FileSystem.open() to make it more POSIX-like.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698865#action_12698865 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

I would vote for closing HADOOP-5596 and marking it as related to HADOOP-5438.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438(2009-04-06).patch

Hadoop-5438(2009-04-06).patch adds javadoc and the Apache license header for CreateFlag.
Summary:
1. Added an enum type called CreateFlag for specifying the semantics of create().
2. Currently CreateFlag only includes OVERWRITE, APPEND and CREATE.
3. Semantics:
(1) CREATE: create the file if it does not exist; throw an exception if it already exists.
(2) OVERWRITE: overwrite the file if it already exists, and create it if it does not exist.
(3) APPEND: append to the file if it exists; throw an exception if it does not exist.
4. Combining OVERWRITE with either CREATE or APPEND does the same as using OVERWRITE alone.
5. Combining CREATE and APPEND has the semantics: create the file if it does not exist, and append to it if it already exists.
6. Users can combine modes with EnumSet.of(CreateFlag, CreateFlag...).
7. Because the current RPC system cannot serialize EnumSet, this issue depends on HADOOP-5596, which adds support for serializing EnumSet.
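
The rules in the summary can be written down as a small decision function. The sketch below is a stand-alone illustration, not code from the patch: CreateFlagSketch stands in for CreateFlag, and WriteAction and CreateFlagSemantics are hypothetical names introduced only for this example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.EnumSet;

/** Hypothetical stand-in for the CreateFlag enum described in the summary. */
enum CreateFlagSketch { CREATE, OVERWRITE, APPEND }

/** What the file system would end up doing; illustrative only. */
enum WriteAction { CREATE_NEW, REPLACE_EXISTING, APPEND_TO_EXISTING }

final class CreateFlagSemantics {
    private CreateFlagSemantics() {}

    /** Maps a flag combination and the file's current state to an action. */
    static WriteAction resolve(EnumSet<CreateFlagSketch> flags, boolean exists)
            throws IOException {
        if (flags.isEmpty()) {
            throw new IllegalArgumentException("at least one flag is required");
        }
        // Rule 4: OVERWRITE wins over whatever it is combined with.
        if (flags.contains(CreateFlagSketch.OVERWRITE)) {
            return exists ? WriteAction.REPLACE_EXISTING : WriteAction.CREATE_NEW;
        }
        boolean create = flags.contains(CreateFlagSketch.CREATE);
        boolean append = flags.contains(CreateFlagSketch.APPEND);
        if (exists) {
            if (append) {
                return WriteAction.APPEND_TO_EXISTING;   // APPEND, or CREATE + APPEND
            }
            throw new IOException("file already exists"); // CREATE alone
        }
        if (create) {
            return WriteAction.CREATE_NEW;                // CREATE, or CREATE + APPEND
        }
        throw new FileNotFoundException("cannot append to a non-existent file"); // APPEND alone
    }
}
```

With this shape, the exists()-then-create()-or-append() pattern from the issue description collapses to a single call with EnumSet.of(CREATE, APPEND).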

Thanks, dhruba borthakur, Konstantin Shvachko and Jakob Homan. All your suggestions are really helpful!


> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689315#action_12689315 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

The reason I have not assigned a short mode to each flag is that I think the flags in CreateFlag cannot coexist with each other. For example, you cannot specify APPEND together with OVERWRITE.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708098#action_12708098 ] 

He Yongqiang commented on HADOOP-5438:
--------------------------------------

The failed contrib test is not related to this patch.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689310#action_12689310 ] 

Konstantin Shvachko commented on HADOOP-5438:
---------------------------------------------

Sounds like a reasonable idea.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698327#action_12698327 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

This is continuing to look better now. What do folks think about completely deprecating FileSystem.append?

There seem to be lots of changes in the "import ..." statements for DFSClient.java; is this a code cleanup?

Does the unit test need to be enhanced to use a combination of flags along with the OVERWRITE flag?



> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-05-10.patch

The new patch file against the trunk code.

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-10.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Commented: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683750#action_12683750 ] 

dhruba borthakur commented on HADOOP-5438:
------------------------------------------

I am ok with adding yet another parameter to FileSystem.create(). This will be a parameter of type FileSystem.OpenFlags or something like that. 

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Attachment: Hadoop-5438-2009-05-5.patch

H-5596 has been committed by Nicholas.
Hadoop-5438-2009-05-5.patch is the corresponding patch, which relies on H-5596.


> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-5.patch
>
>


[jira] Updated: (HADOOP-5438) Merge FileSystem.create and FileSystem.append

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HADOOP-5438:
---------------------------------

    Status: Patch Available  (was: Open)

> Merge FileSystem.create and FileSystem.append
> ---------------------------------------------
>
>                 Key: HADOOP-5438
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5438
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: Hadoop-5438(2009-04-06).patch, Hadoop-5438-2009-03-30.patch, Hadoop-5438-2009-03-31-2.patch, Hadoop-5438-2009-03-31.patch, Hadoop-5438-2009-05-5.patch
>
>