Posted to common-user@hadoop.apache.org by Arpit Agarwal <aa...@hortonworks.com> on 2013/04/02 00:17:39 UTC

Re: Is FileSystem thread-safe?

Hi John,

DistributedFileSystem is intended to be thread-safe, true to its name.

Metadata operations are handled by the NameNode server, which synchronizes
concurrent client requests via locks (you can look at the FSNamesystem
class).

Some discussion on the thread-safety aspects of HDFS:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

-Arpit
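
[Illustration added to the archived thread] The locking idea described above can be sketched with a toy model. This is not the actual FSNamesystem code: ToyNamespace, its methods, and the paths are hypothetical names, and the sketch assumes a single read/write lock guarding an in-memory set of paths. It only shows why concurrent clients can issue metadata requests without corrupting shared state.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy stand-in for a server-side namespace; NOT Hadoop code.
public class ToyNamespace {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> paths = new TreeSet<>(); // not thread-safe on its own

    /** Returns true if the path was created, false if it already existed. */
    public boolean create(String path) {
        lock.writeLock().lock(); // mutations take the exclusive lock
        try {
            return paths.add(path);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Read-only query; many readers may hold the read lock at once. */
    public boolean exists(String path) {
        lock.readLock().lock();
        try {
            return paths.contains(path);
        } finally {
            lock.readLock().unlock();
        }
    }

    public int size() {
        lock.readLock().lock();
        try {
            return paths.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyNamespace ns = new ToyNamespace();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // Eight "clients" create 1000 distinct paths each, concurrently.
        for (int t = 0; t < 8; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    ns.create("/data/t" + id + "/f" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("paths created: " + ns.size());
    }
}
```

Without the lock, concurrent add() calls on the TreeSet could lose entries or corrupt it; with it, every request is applied atomically, which is the property a caller sharing one client object across threads relies on.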


On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yu...@gmail.com> wrote:

> If you look at the DistributedFileSystem source code, you will see that it
> delegates most operations to its DFSClient field.
> Requests to the NameNode are then made through ClientProtocol.
>
> An HDFS committer would be able to give you a definitive answer.
>
>
> On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <jo...@redpoint.net> wrote:
>
>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>> Subject: Re: Is FileSystem thread-safe?
>>
>> >> FileSystem is an abstract class, what concrete class are you using
>> (DistributedFileSystem, etc) ?
>>
>> Good point.  I am calling FileSystem.get(URI uri, Configuration conf)
>> with a URI like “hdfs://server:port/…” on a remote server, so I assume it
>> is creating a DistributedFileSystem.  However, I am not finding any
>> documentation discussing its thread-safety (or lack thereof); perhaps you
>> can point me to it?
>>
>> Thanks, john
>>
>
>

Re: Is FileSystem thread-safe?

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
I see. The lots-of-part-files pattern is what most of us end up using.

Thanks,
+Vinod Kumar Vavilapalli
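
[Illustration added to the archived thread] The part-files pattern mentioned above can be sketched as follows. To stay self-contained, this sketch uses the local filesystem through java.nio.file as a stand-in for HDFS; the class name PartFilesSketch, the helper names, and the zero-padded part-NNNNN naming are assumptions made for illustration. Each task writes only its own file, so no two writers ever touch the same file, and a reader unifies the parts afterwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem sketch of "one output file per task"; NOT Hadoop code.
public class PartFilesSketch {

    /** Each task writes its records to its own file in the shared output directory. */
    public static Path writePart(Path dir, int taskId, List<String> records) throws IOException {
        // Zero-padding keeps lexicographic order equal to numeric order.
        Path part = dir.resolve(String.format("part-%05d", taskId));
        return Files.write(part, records, StandardCharsets.UTF_8);
    }

    /** A reader unifies the output by reading the parts in name order. */
    public static List<String> readAll(Path dir) throws IOException {
        List<Path> parts;
        try (Stream<Path> listing = Files.list(dir)) {
            parts = listing
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> all = new ArrayList<>();
        for (Path p : parts) {
            all.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        // Tasks may finish in any order; each one owns exactly one file.
        writePart(dir, 1, Arrays.asList("record-from-task-1"));
        writePart(dir, 0, Arrays.asList("record-from-task-0"));
        for (String line : readAll(dir)) {
            System.out.println(line);
        }
    }
}
```

The design choice is that ordering and merging are deferred to read time, so writers never need to coordinate with each other.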

On May 17, 2013, at 10:16 AM, John Lilley wrote:

> Vinod,
> Thanks, I was mostly asking in the context of attempting to unify the output of multiple tasks.  I’ve seen that in most cases, users opt to output a folder full of file parts into HDFS and then read them directly or unify them later.
> John
>  
>  
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com] 
> Sent: Friday, May 17, 2013 11:14 AM
> To: user@hadoop.apache.org
> Subject: Re: Is FileSystem thread-safe?
>  
>  
> As of today, there is no atomic append, so no, what you say isn't possible. FWIU, it is one appender at a time - achieved through a lease per file, and multiple concurrent leases aren't allowed for any given file.
>  
> Thanks,
> +Vinod Kumar Vavilapalli
>  
> On May 17, 2013, at 6:40 AM, John Lilley wrote:
> 
> 
> Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved?
> john
>  
> From: Arpit Agarwal [mailto:aagarwal@hortonworks.com] 
> Sent: Monday, April 01, 2013 4:18 PM
> To: user@hadoop.apache.org
> Subject: Re: Is FileSystem thread-safe?
>  
> Hi John,
> 
> DistributedFileSystem is intended to be thread-safe, true to its name. 
> 
> Metadata operations are handled by the NameNode server, which synchronizes concurrent client requests via locks (you can look at the FSNamesystem class).
> 
> Some discussion on the thread-safety aspects of HDFS:
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
> 
> -Arpit
> 
> 
> 
> On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yu...@gmail.com> wrote:
> If you look at DistributedFileSystem source code, you would see that it calls the DFSClient field member for most of the actions.
> Requests to Namenode are then made through ClientProtocol.
>  
> An hdfs committer would be able to give you affirmative answer.
>  
> 
> On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <jo...@redpoint.net> wrote:
> From: Ted Yu [mailto:yuzhihong@gmail.com] 
> Subject: Re: Is FileSystem thread-safe?
> >>FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc) ?
> Good point.  I am calling FileSystem.get(URI uri, Configuration conf) with an URI like “hdfs://server:port/…” on a remote server, so I assume it is creating a DistributedFileSystem.  However I am not finding any documentation discussing its thread-safety (or lack thereof), perhaps you can point me to it?
> Thanks, john
>  


RE: Is FileSystem thread-safe?

Posted by John Lilley <jo...@redpoint.net>.
Vinod,
Thanks, I was mostly asking in the context of attempting to unify the output of multiple tasks.  I've seen that in most cases, users opt to output a folder full of file parts into HDFS and then read them directly or unify them later.
John


From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Friday, May 17, 2013 11:14 AM
To: user@hadoop.apache.org
Subject: Re: Is FileSystem thread-safe?


As of today, there is no atomic append, so no, what you say isn't possible. FWIU, it is one appender at a time - achieved through a lease per file, and multiple concurrent leases aren't allowed for any given file.

Thanks,
+Vinod Kumar Vavilapalli

On May 17, 2013, at 6:40 AM, John Lilley wrote:


Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved?
john

From: Arpit Agarwal [mailto:aagarwal@hortonworks.com]
Sent: Monday, April 01, 2013 4:18 PM
To: user@hadoop.apache.org
Subject: Re: Is FileSystem thread-safe?

Hi John,

DistributedFileSystem is intended to be thread-safe, true to its name.

Metadata operations are handled by the NameNode server, which synchronizes concurrent client requests via locks (you can look at the FSNamesystem class).

Some discussion on the thread-safety aspects of HDFS:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

-Arpit


On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yu...@gmail.com> wrote:
If you look at DistributedFileSystem source code, you would see that it calls the DFSClient field member for most of the actions.
Requests to Namenode are then made through ClientProtocol.

An hdfs committer would be able to give you affirmative answer.

On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <jo...@redpoint.net> wrote:
From: Ted Yu [mailto:yuzhihong@gmail.com]
Subject: Re: Is FileSystem thread-safe?
>>FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc) ?
Good point.  I am calling FileSystem.get(URI uri, Configuration conf) with an URI like "hdfs://server:port/..." on a remote server, so I assume it is creating a DistributedFileSystem.  However I am not finding any documentation discussing its thread-safety (or lack thereof), perhaps you can point me to it?
Thanks, john



Re: Is FileSystem thread-safe?

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
As of today, there is no atomic append, so no, what you say isn't possible. FWIU, it is one appender at a time - achieved through a lease per file, and multiple concurrent leases aren't allowed for any given file.

Thanks,
+Vinod Kumar Vavilapalli
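
[Illustration added to the archived thread] The one-appender-at-a-time rule described above can be modeled with a small lease table. This is a toy sketch, not the HDFS lease manager: ToyLeaseTable, tryAcquire, release, and the client and path names are hypothetical, and real leases also expire and can be recovered, which this model leaves out.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of "one write lease per file"; NOT Hadoop code.
public class ToyLeaseTable {
    // Maps a file path to the client currently holding its lease.
    private final ConcurrentHashMap<String, String> holders = new ConcurrentHashMap<>();

    /** Grants the lease only if no client currently holds one for this path. */
    public boolean tryAcquire(String path, String client) {
        // putIfAbsent is atomic: exactly one of several racing callers sees null.
        return holders.putIfAbsent(path, client) == null;
    }

    /** Releases the lease only if this client is the current holder. */
    public boolean release(String path, String client) {
        return holders.remove(path, client);
    }

    public static void main(String[] args) {
        ToyLeaseTable leases = new ToyLeaseTable();
        String file = "/logs/app.log";

        System.out.println("A acquires: " + leases.tryAcquire(file, "clientA"));
        // A second appender is refused while the first still holds the lease.
        System.out.println("B acquires: " + leases.tryAcquire(file, "clientB"));
        leases.release(file, "clientA");
        // Once the first appender is done, another client can take over.
        System.out.println("B retries:  " + leases.tryAcquire(file, "clientB"));
    }
}
```

Because the second caller is rejected rather than queued, appends from different clients are never interleaved within one file; combining output from many writers has to happen some other way.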

On May 17, 2013, at 6:40 AM, John Lilley wrote:

> Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved?
> john
>  
> From: Arpit Agarwal [mailto:aagarwal@hortonworks.com] 
> Sent: Monday, April 01, 2013 4:18 PM
> To: user@hadoop.apache.org
> Subject: Re: Is FileSystem thread-safe?
>  
> Hi John,
> 
> DistributedFileSystem is intended to be thread-safe, true to its name. 
> 
> Metadata operations are handled by the NameNode server, which synchronizes concurrent client requests via locks (you can look at the FSNamesystem class).
> 
> Some discussion on the thread-safety aspects of HDFS:
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
> 
> -Arpit
> 
> 
> On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yu...@gmail.com> wrote:
> If you look at DistributedFileSystem source code, you would see that it calls the DFSClient field member for most of the actions.
> Requests to Namenode are then made through ClientProtocol.
>  
> An hdfs committer would be able to give you affirmative answer.
>  
> 
> On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <jo...@redpoint.net> wrote:
> From: Ted Yu [mailto:yuzhihong@gmail.com] 
> Subject: Re: Is FileSystem thread-safe?
> >>FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc) ?
> Good point.  I am calling FileSystem.get(URI uri, Configuration conf) with an URI like “hdfs://server:port/…” on a remote server, so I assume it is creating a DistributedFileSystem.  However I am not finding any documentation discussing its thread-safety (or lack thereof), perhaps you can point me to it?
> Thanks, john
>  


RE: Is FileSystem thread-safe?

Posted by John Lilley <jo...@redpoint.net>.
Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved?
john

From: Arpit Agarwal [mailto:aagarwal@hortonworks.com]
Sent: Monday, April 01, 2013 4:18 PM
To: user@hadoop.apache.org
Subject: Re: Is FileSystem thread-safe?

Hi John,

DistributedFileSystem is intended to be thread-safe, true to its name.

Metadata operations are handled by the NameNode server, which synchronizes concurrent client requests via locks (you can look at the FSNamesystem class).

Some discussion on the thread-safety aspects of HDFS:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

-Arpit

On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yu...@gmail.com> wrote:
If you look at DistributedFileSystem source code, you would see that it calls the DFSClient field member for most of the actions.
Requests to Namenode are then made through ClientProtocol.

An hdfs committer would be able to give you affirmative answer.

On Sun, Mar 31, 2013 at 11:27 AM, John Lilley <jo...@redpoint.net> wrote:
From: Ted Yu [mailto:yuzhihong@gmail.com]
Subject: Re: Is FileSystem thread-safe?
>>FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc) ?
Good point.  I am calling FileSystem.get(URI uri, Configuration conf) with an URI like "hdfs://server:port/..." on a remote server, so I assume it is creating a DistributedFileSystem.  However I am not finding any documentation discussing its thread-safety (or lack thereof), perhaps you can point me to it?
Thanks, john



Re: Is FileSystem thread-safe?

Posted by Matthew Farrellee <ma...@redhat.com>.
If you're interested in the semantics of FileSystem operations, have a
look at HADOOP-9371 [0].

Depending on what you're trying to do, the thread-safety of a particular 
FS implementation in a single JVM instance may not be as important as 
the semantics you get across JVM instances.
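
For what it's worth, here is a minimal sketch of the one-writer-per-file approach, which avoids depending on how concurrent appends to a single file behave across JVM instances. It deliberately uses plain java.nio against a local directory as a stand-in, not the Hadoop FileSystem API, so treat it as an illustration of the pattern only. The class name PartFiles, the method names, and the part-%05d naming are hypothetical choices made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartFiles {

    // Each task writes only its own file, so no two writers ever share a file.
    public static void writeParts(Path dir, List<String> payloads)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(dir);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < payloads.size(); i++) {
                final int task = i;
                pending.add(pool.submit(() -> {
                    // Zero-padded index keeps the parts in task order when sorted by name.
                    Path part = dir.resolve(String.format("part-%05d", task));
                    try {
                        Files.write(part, payloads.get(task).getBytes(StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any write failure
            }
        } finally {
            pool.shutdown();
        }
    }

    // A single reader unifies the parts afterwards, in file-name order.
    public static String mergeParts(Path dir) throws IOException {
        List<Path> parts;
        try (Stream<Path> listing = Files.list(dir)) {
            parts = listing
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        StringBuilder merged = new StringBuilder();
        for (Path part : parts) {
            merged.append(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        }
        return merged.toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("parts-demo");
        writeParts(dir, Arrays.asList("alpha\n", "beta\n", "gamma\n"));
        System.out.print(mergeParts(dir));
    }
}
```

The point of the design is that correctness never depends on two writers touching the same file, whether they are threads in one JVM or processes in different ones; the only shared step is the merge, which a single reader performs once the writers have finished.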

Best,


matt

[0] https://issues.apache.org/jira/browse/HADOOP-9371


On 04/01/2013 06:17 PM, Arpit Agarwal wrote:
> Hi John,
>
> DistributedFileSystem is intended to be thread-safe, true to its name.
>
> Metadata operations are handled by the NameNode server which
> synchronizes concurrent client requests via locks (you can look at the
> FSNameSystem class).
>
> Some discussion on the thread-safety aspects of HDFS:
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
>
> -Arpit
>
>
> On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>     If you look at DistributedFileSystem source code, you would see that
>     it calls the DFSClient field member for most of the actions.
>     Requests to Namenode are then made through ClientProtocol.
>
>     An hdfs committer would be able to give you affirmative answer.
>
>
>     On Sun, Mar 31, 2013 at 11:27 AM, John Lilley
>     <john.lilley@redpoint.net> wrote:
>
>         From: Ted Yu [mailto:yuzhihong@gmail.com]
>         Subject: Re: Is FileSystem thread-safe?
>
>         >>FileSystem is an abstract class, what concrete class are you
>         using (DistributedFileSystem, etc) ?
>
>         Good point.  I am calling FileSystem.get(URI uri, Configuration
>         conf) with an URI like “hdfs://server:port/…” on a remote
>         server, so I assume it is creating a DistributedFileSystem.
>         However I am not finding any documentation discussing its
>         thread-safety (or lack thereof), perhaps you can point me to it?
>
>         Thanks, john
>
>
>

