Posted to user@spark.apache.org by Tomer Benyamini <to...@gmail.com> on 2014/09/07 15:42:26 UTC

distcp on ec2 standalone spark cluster

Hi,

I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
running on the cluster - I'm getting the exception below.

Is there a way to activate it, or is there a spark alternative to distcp?

Thanks,
Tomer

mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
Invalid "mapreduce.jobtracker.address" configuration value for
LocalJobRunner : "XXX:9001"

ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
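[Editorial note: the two log lines above explain the failure. DistCp submits an ordinary MapReduce job, and Hadoop's Cluster.initialize tries each registered ClientProtocolProvider in turn. In the Hadoop version used here, the LocalClientProtocolProvider appears to accept only the literal value "local" for mapreduce.jobtracker.address, so "XXX:9001" is rejected; and since no JobTracker is actually running, no other provider succeeds either. A minimal sketch of that config check follows; the XML fragment is hypothetical, not taken from the cluster in question.]

```python
# Sketch: why LocalClientProtocolProvider rejects the config in the log
# above. The local job runner only accepts mapreduce.jobtracker.address
# set to "local"; any host:port value (like "XXX:9001") makes it bail
# out, and with no JobTracker running, Cluster.initialize then fails
# with "Cannot initialize Cluster". The XML below is a hypothetical
# mapred-site.xml fragment.
import xml.etree.ElementTree as ET

MAPRED_SITE = """
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>XXX:9001</value>
  </property>
</configuration>
"""

def jobtracker_address(conf_xml):
    """Return the configured mapreduce.jobtracker.address, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapreduce.jobtracker.address":
            return prop.findtext("value")
    return None

addr = jobtracker_address(MAPRED_SITE)
# The local job runner is only usable when the address is literally "local".
print("local runner usable:", addr == "local")  # local runner usable: False
```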


Re: distcp on ec2 standalone spark cluster

Posted by Frank Austin Nothaft <fn...@berkeley.edu>.
Tomer,

To use distcp, you need to have a Hadoop MapReduce cluster up; start-dfs.sh only restarts HDFS. I don't have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that will launch the Hadoop MapReduce cluster that you need for distcp.
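[Editorial note: the suggestion above can be checked mechanically before assuming the script exists. A sketch, not a definitive recipe; the ~/ephemeral-hdfs path comes from this thread's spark-ec2 layout, so adjust SBIN for your install.]

```shell
# Look for whichever MapReduce start script this Hadoop layout ships.
SBIN="$HOME/ephemeral-hdfs/sbin"
for script in start-mapred.sh start-all.sh; do
  if [ -x "$SBIN/$script" ]; then
    echo "found $SBIN/$script"
  fi
done
# If one exists, run it, then use jps to confirm the JobTracker/TaskTracker
# (or ResourceManager/NodeManager) daemons actually came up.
```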

Regards,

Frank Austin Nothaft
fnothaft@berkeley.edu
fnothaft@eecs.berkeley.edu
202-340-0466

On Sep 8, 2014, at 12:28 AM, Tomer Benyamini <to...@gmail.com> wrote:

> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
> 
> I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
> ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
> when trying to run distcp:
> 
> [stack trace trimmed]
> 
> Any idea?
> 
> Thanks!
> Tomer
> 
> On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen <ro...@gmail.com> wrote:
>> If I recall, you should be able to start Hadoop MapReduce using
>> ~/ephemeral-hdfs/sbin/start-mapred.sh.
>>> [original message trimmed]
> 


Re: distcp on ec2 standalone spark cluster

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you follow these steps? https://wiki.apache.org/hadoop/AmazonS3  Also
make sure your jobtracker/mapreduce processes are running fine.

Thanks
Best Regards

On Sun, Mar 8, 2015 at 7:32 AM, roni <ro...@gmail.com> wrote:

> Did you get this to work?
> I got past the cluster-not-started problem, but I am now having a problem
> where distcp with an s3 URI says incorrect folder path, and s3n:// hangs.
> Stuck for 2 days :(
> Thanks
> -R

Re: distcp on ec2 standalone spark cluster

Posted by roni <ro...@gmail.com>.
Did you get this to work?
I got past the cluster-not-started problem, but I am now having a problem
where distcp with an s3 URI says incorrect folder path, and s3n:// hangs.
Stuck for 2 days :(
Thanks
-R
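[Editorial note: "incorrect folder path" complaints from distcp often come down to the shape of the source URI. A quick sanity check, with a hypothetical bucket name:]

```python
# Validate the pieces of an s3/s3n source URI before handing it to distcp.
# The bucket and key prefix below are hypothetical.
from urllib.parse import urlparse

def check_s3_uri(uri):
    """Return (scheme, bucket, key_prefix); raise if the shape is off."""
    p = urlparse(uri)
    if p.scheme not in ("s3", "s3n"):
        raise ValueError(f"unexpected scheme: {p.scheme!r}")
    if not p.netloc:
        # A single slash (s3n:/bucket/...) leaves the bucket empty.
        raise ValueError("missing bucket (did you write s3n:/bucket ?)")
    return p.scheme, p.netloc, p.path.lstrip("/")

print(check_s3_uri("s3n://my-log-bucket/logs/2014-09-07/"))
# → ('s3n', 'my-log-bucket', 'logs/2014-09-07/')
```

If the URI shape is fine but s3n:// still hangs, also check that fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are set in core-site.xml, as the Hadoop AmazonS3 wiki page linked earlier in this thread describes.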



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-spark-cluster-tp13652p21957.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: distcp on ec2 standalone spark cluster

Posted by Ye Xianjin <ad...@gmail.com>.
Well, this means you didn't start a compute cluster, most likely because the wrong value of mapreduce.jobtracker.address prevents the slave node from starting the node manager. (I am not familiar with the ec2 script, so I don't know whether the slave node has a node manager installed or not.)
Can you check the hadoop daemon log on the slave node to see whether the nodemanager started and then failed, or there was no nodemanager to start? The log file location defaults to
/var/log/hadoop-xxx if my memory is correct.

Sent from my iPhone

> On September 9, 2014, at 0:08, Tomer Benyamini <to...@gmail.com> wrote:
> 
> No tasktracker or nodemanager. This is what I see:
> 
> On the master:
> 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
> org.apache.hadoop.hdfs.server.namenode.NameNode
> 
> On the data node (slave):
> 
> org.apache.hadoop.hdfs.server.datanode.DataNode
> 
> 
> 
>> On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin <ad...@gmail.com> wrote:
>> what did you see in the log? was there anything related to mapreduce?
>> can you log into your hdfs (data) node, use jps to list all java process and
>> confirm whether there is a tasktracker process (or nodemanager) running with
>> datanode process
>> [earlier quoted messages trimmed]



Re: distcp on ec2 standalone spark cluster

Posted by Tomer Benyamini <to...@gmail.com>.
No tasktracker or nodemanager. This is what I see:

On the master:

org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.namenode.NameNode

On the data node (slave):

org.apache.hadoop.hdfs.server.datanode.DataNode



On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin <ad...@gmail.com> wrote:
> what did you see in the log? was there anything related to mapreduce?
> can you log into your hdfs (data) node, use jps to list all java process and
> confirm whether there is a tasktracker process (or nodemanager) running with
> datanode process
>
> --
> Ye Xianjin
> Sent with Sparrow
> [earlier quoted messages trimmed]



Re: distcp on ec2 standalone spark cluster

Posted by Ye Xianjin <ad...@gmail.com>.
What did you see in the log? Was there anything related to mapreduce?
Can you log into your hdfs (data) node, use jps to list all java processes, and confirm whether there is a tasktracker process (or nodemanager) running alongside the datanode process?
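[Editorial note: the check described above can be scripted. A sketch that classifies jps output; the sample output below is made up, and matches the symptom later in the thread where only a DataNode is running.]

```python
# Classify jps output from a slave node: is a compute daemon (TaskTracker
# for MRv1, NodeManager for YARN) running alongside the DataNode?
# SAMPLE_JPS is hypothetical example output, not from a real node.
SAMPLE_JPS = """\
2351 DataNode
2710 Jps
"""

COMPUTE_DAEMONS = {"TaskTracker", "NodeManager"}

def running_daemons(jps_output):
    """Return the set of daemon class names listed by jps."""
    return {line.split(maxsplit=1)[1]
            for line in jps_output.splitlines() if " " in line}

daemons = running_daemons(SAMPLE_JPS)
print("compute daemon present:", bool(daemons & COMPUTE_DAEMONS))
# → compute daemon present: False
```

In practice you would feed this the real output, e.g. via subprocess running jps on each node over ssh.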


-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:

> Still no luck, even when running stop-all.sh followed by start-all.sh.
> [earlier quoted messages trimmed]
> 



Re: distcp on ec2 standalone spark cluster

Posted by Tomer Benyamini <to...@gmail.com>.
Still no luck, even when running stop-all.sh followed by start-all.sh.

On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
<ni...@gmail.com> wrote:
> Tomer,
>
> Did you try start-all.sh? It worked for me the last time I tried using
> distcp, and it worked for this guy too.
>
> Nick
>
>
> [earlier quoted messages trimmed]



Re: distcp on ec2 standalone spark cluster

Posted by Nicholas Chammas <ni...@gmail.com>.
Tomer,

Did you try start-all.sh? It worked for me the last time I tried using
distcp, and it worked for this guy too
<http://stackoverflow.com/a/18083790/877069>.

Nick

On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini <to...@gmail.com> wrote:

> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
>
> I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
> ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
> when trying to run distcp:
>
> [stack trace and earlier quoted messages trimmed]

Re: distcp on ec2 standalone spark cluster

Posted by Tomer Benyamini <to...@gmail.com>.
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;

I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
when trying to run distcp:

ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

Any idea?

Thanks!
Tomer

On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen <ro...@gmail.com> wrote:
> If I recall, you should be able to start Hadoop MapReduce using
> ~/ephemeral-hdfs/sbin/start-mapred.sh.
>
> On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini <to...@gmail.com> wrote:
>> [original message trimmed]



Re: distcp on ec2 standalone spark cluster

Posted by Josh Rosen <ro...@gmail.com>.
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.
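Josh's suggestion can be wrapped in a small script. The sketch below only builds the DistCp command line so it can be inspected before running; the `HADOOP_HOME` path and the example bucket are assumptions based on the spark-ec2 layout discussed in this thread, not verified values, and `run_copy` is defined but not invoked because both calls need a live cluster.

```python
import subprocess

# Assumed spark-ec2 layout (see thread): Hadoop lives under ~/ephemeral-hdfs.
HADOOP_HOME = "/root/ephemeral-hdfs"

def build_distcp_command(src, dst, hadoop_home=HADOOP_HOME):
    """Return the argv list for a DistCp run from src to dst."""
    return [hadoop_home + "/bin/hadoop", "distcp", src, dst]

def run_copy(src, dst):
    """Start the MapReduce daemons, then run DistCp.

    DistCp submits a MapReduce job, so the JobTracker must be up first.
    Both calls require a running cluster, which is why this function is
    only defined here, not called.
    """
    subprocess.check_call([HADOOP_HOME + "/sbin/start-mapred.sh"])
    subprocess.check_call(build_distcp_command(src, dst))
```

For example, `build_distcp_command("s3n://my-bucket/logs", "hdfs:///logs")` yields the same argv you would type at the shell after starting MapReduce.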

On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini <to...@gmail.com> wrote:

> Hi,
>
> I would like to copy log files from s3 to the cluster's
> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> running on the cluster - I'm getting the exception below.
>
> Is there a way to activate it, or is there a spark alternative to distcp?
>
> Thanks,
> Tomer
>
> mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> Invalid "mapreduce.jobtracker.address" configuration value for
> LocalJobRunner : "XXX:9001"
>
> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
>
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>
> at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>
>
>

Re: distcp on ec2 standalone spark cluster

Posted by Nicholas Chammas <ni...@gmail.com>.
I think you need to run start-all.sh or something similar on the EC2
cluster. MR is installed but is not running by default on EC2 clusters spun
up by spark-ec2.

On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini <to...@gmail.com>
wrote:

> I've installed a spark standalone cluster on ec2 as defined here -
> https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if
> mr1/2 is part of this installation.
>
>
> On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin <ad...@gmail.com> wrote:
> > Distcp requires a mr1(or mr2) cluster to start. Do you have a mapreduce
> > cluster on your hdfs?
> > And from the error message, it seems that you didn't specify your
> jobtracker
> > address.
> >
> > --
> > Ye Xianjin
> > Sent with Sparrow
> >
> > On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
> >
> > Hi,
> >
> > I would like to copy log files from s3 to the cluster's
> > ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> > running on the cluster - I'm getting the exception below.
> >
> > Is there a way to activate it, or is there a spark alternative to distcp?
> >
> > Thanks,
> > Tomer
> >
> > mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
> > org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> > Invalid "mapreduce.jobtracker.address" configuration value for
> > LocalJobRunner : "XXX:9001"
> >
> > ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
> >
> > java.io.IOException: Cannot initialize Cluster. Please check your
> > configuration for mapreduce.framework.name and the correspond server
> > addresses.
> >
> > at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
> >
> > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
> >
> > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
> >
> > at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
> >
> > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
> >
> > at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
> >
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >
> > at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
> >
> >
> >
>
>
>

Re: distcp on ec2 standalone spark cluster

Posted by Tomer Benyamini <to...@gmail.com>.
I've installed a Spark standalone cluster on EC2 as described here -
https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if
MR1/MR2 is part of this installation.


On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin <ad...@gmail.com> wrote:
> DistCp requires an MR1 (or MR2) cluster to start. Do you have a MapReduce
> cluster on your HDFS?
> And from the error message, it seems that you didn't specify your JobTracker
> address.
>
> --
> Ye Xianjin
> Sent with Sparrow
>
> On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
>
> Hi,
>
> I would like to copy log files from s3 to the cluster's
> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> running on the cluster - I'm getting the exception below.
>
> Is there a way to activate it, or is there a spark alternative to distcp?
>
> Thanks,
> Tomer
>
> mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> Invalid "mapreduce.jobtracker.address" configuration value for
> LocalJobRunner : "XXX:9001"
>
> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
>
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>
> at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>
>
>



Re: distcp on ec2 standalone spark cluster

Posted by Ye Xianjin <ad...@gmail.com>.
DistCp requires an MR1 (or MR2) cluster to start. Do you have a MapReduce cluster on your HDFS?
And from the error message, it seems that you didn't specify your JobTracker address.
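On the original question of a Spark alternative: since DistCp needs a MapReduce cluster, one workaround is to drive the copy from Spark itself by parallelizing the list of S3 paths and copying each one inside a task (roughly `sc.parallelize(pairs).foreach(copy_one)`). The sketch below simulates that fan-out pattern with the standard library only, because it assumes no live cluster; the thread pool stands in for Spark tasks, and local files stand in for S3 and HDFS.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_one(src_dst):
    """Copy a single file. On a real cluster, each Spark task would do
    this against the S3 and HDFS filesystems instead of the local disk."""
    src, dst = src_dst
    shutil.copy(src, dst)
    return dst

def parallel_copy(pairs, workers=4):
    """Fan the (src, dst) pairs out over a pool, mimicking what
    sc.parallelize(pairs).foreach(copy_one) would do on a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, pairs))

# Demo against temporary local files in place of s3:// and hdfs:// paths.
src_dir = Path(tempfile.mkdtemp())
dst_dir = Path(tempfile.mkdtemp())
pairs = []
for i in range(3):
    p = src_dir / f"log{i}.txt"
    p.write_text(f"line {i}\n")
    pairs.append((str(p), str(dst_dir / p.name)))

copied = parallel_copy(pairs)
```

Note this pattern gives none of DistCp's retry, checksum, or update semantics; it is only a sketch of how a Spark job could replace the MapReduce dependency.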


-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:

> Hi,
> 
> I would like to copy log files from s3 to the cluster's
> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> running on the cluster - I'm getting the exception below.
> 
> Is there a way to activate it, or is there a spark alternative to distcp?
> 
> Thanks,
> Tomer
> 
> mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> Invalid "mapreduce.jobtracker.address" configuration value for
> LocalJobRunner : "XXX:9001"
> 
> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
> 
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> 
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
> 
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
> 
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
> 
> at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
> 
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
> 
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
> 
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
> 
> 
>