Posted to common-user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2013/04/09 22:49:07 UTC

When copying a file to HDFS, how to control what nodes that file will reside on?

Hey all,

I'm dealing with kind of a bizarre use case where I need to make sure that
File A is local to Machine A, File B is local to Machine B, etc.  When
copying a file to HDFS, is there a way to control which machines that file
will reside on?  I know that any given file will be replicated across three
machines, but I need to be able to say "File A will DEFINITELY exist on
Machine A".  I don't really care about the other two machines -- they could
be any machines on my cluster.

Thank you.

Re: When copying a file to HDFS, how to control what nodes that file will reside on?

Posted by Patrick Angeles <pa...@cloudera.com>.
If the client is on machine A (e.g., you execute "hadoop fs -put xxxx" from
A), then the first copy will be on machine A.
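The placement rule above can be illustrated with a toy sketch: when the writing client runs on a datanode, that node gets the first replica; otherwise the namenode picks a node for it. This is an illustrative model only, not Hadoop's actual placement code (which, in Hadoop 2.x, lives in the namenode's `org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault`).

```python
import random

def first_replica_target(datanodes, client_host):
    """Toy model: pick the node that stores the first replica of a new block."""
    if client_host in datanodes:
        return client_host              # write locality: the local node wins
    return random.choice(datanodes)     # off-cluster client: any node may be chosen

nodes = ["machineA", "machineB", "machineC"]
print(first_replica_target(nodes, "machineA"))  # prints machineA
```

So for the original use case, running the upload from Machine A itself guarantees a copy lands there (as long as Machine A is a datanode with free space).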



On Tue, Apr 9, 2013 at 4:49 PM, jeremy p <at...@gmail.com> wrote:

> Hey all,
>
> I'm dealing with kind of a bizarre use case where I need to make sure that
> File A is local to Machine A, File B is local to Machine B, etc.  When
> copying a file to HDFS, is there a way to control which machines that file
> will reside on?  I know that any given file will be replicated across three
> machines, but I need to be able to say "File A will DEFINITELY exist on
> Machine A".  I don't really care about the other two machines -- they could
> be any machines on my cluster.
>
> Thank you.
>

Re: When copying a file to HDFS, how to control what nodes that file will reside on?

Posted by Mohammad Mustaqeem <3m...@gmail.com>.
Which Java file is responsible for replication?
Which file chooses a random datanode from the same rack, and which chooses a
random rack?


On Wed, Apr 10, 2013 at 3:26 AM, Raj Vishwanathan <ra...@yahoo.com> wrote:

> You could use the following facts:
> 1. Files are stored in blocks, so make your block size bigger than the
> largest file.
> 2. The first replica of each block is stored on the local node.
>
> Raj
>
>   ------------------------------
> *From:* jeremy p <at...@gmail.com>
> *To:* user@hadoop.apache.org
> *Sent:* Tuesday, April 9, 2013 1:49 PM
> *Subject:* When copying a file to HDFS, how to control what nodes that
> file will reside on?
>
> Hey all,
>
> I'm dealing with kind of a bizarre use case where I need to make sure that
> File A is local to Machine A, File B is local to Machine B, etc.  When
> copying a file to HDFS, is there a way to control which machines that file
> will reside on?  I know that any given file will be replicated across three
> machines, but I need to be able to say "File A will DEFINITELY exist on
> Machine A".  I don't really care about the other two machines -- they could
> be any machines on my cluster.
>
> Thank you.
>
>
>


-- 
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270

Re: When copying a file to HDFS, how to control what nodes that file will reside on?

Posted by Raj Vishwanathan <ra...@yahoo.com>.
You could use the following facts:
1. Files are stored in blocks, so make your block size bigger than the largest file.
2. The first replica of each block is stored on the local node.
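The two facts above can be combined: pick a per-file block size just large enough to hold the whole file, so the entire file sits in one block on the local node. One constraint to note (an assumption worth verifying for your Hadoop version) is that HDFS requires the block size to be a multiple of the checksum chunk size, `io.bytes.per.checksum`, 512 bytes by default. A minimal sketch:

```python
def one_block_size(file_size_bytes, checksum_chunk=512):
    """Smallest block size that (a) holds the whole file in a single block
    and (b) is a multiple of the checksum chunk size (512 bytes by default)."""
    chunks = -(-file_size_bytes // checksum_chunk)   # ceiling division
    return max(chunks, 1) * checksum_chunk

size = 150 * 1024 * 1024 + 7        # e.g. a file slightly over 150 MB
print(one_block_size(size))
```

You could then pass that value per upload, e.g. `hadoop fs -D dfs.blocksize=<size> -put file /dest` (the property is `dfs.blocksize` in Hadoop 2.x, `dfs.block.size` in 1.x).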

Raj



>________________________________
> From: jeremy p <at...@gmail.com>
>To: user@hadoop.apache.org 
>Sent: Tuesday, April 9, 2013 1:49 PM
>Subject: When copying a file to HDFS, how to control what nodes that file will reside on?
> 
>
>Hey all,
>
>
>I'm dealing with kind of a bizarre use case where I need to make sure that File A is local to Machine A, File B is local to Machine B, etc.  When copying a file to HDFS, is there a way to control which machines that file will reside on?  I know that any given file will be replicated across three machines, but I need to be able to say "File A will DEFINITELY exist on Machine A".  I don't really care about the other two machines -- they could be any machines on my cluster.
>
>
>Thank you.
>
>
