Posted to common-user@hadoop.apache.org by sangroya <sa...@gmail.com> on 2012/06/13 17:28:12 UTC

How to use GridMix3

Hi,

I want to use the Gridmix3 benchmark with Hadoop version 1.0.0.

I am following this page:

http://hadoop.apache.org/mapreduce/docs/current/gridmix.html

It says that, to run Gridmix3, I need data, a users-list and traces.

Can anyone suggest where I can get them?

Is there any detailed documentation or tutorial on the Gridmix3 benchmark? Can
somebody point me to it? In particular:

1) How do I use the Gridmix3 benchmark?

2) What are the inputs and outputs of the Gridmix3 benchmark? For example,
what is meant by traces, data, and users-list?

3) How can I analyze the outputs of the Gridmix3 benchmark?



Thanks in advance,

Amit


-----
Sangroya
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-use-GridMix3-tp3989438.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: How to use GridMix3

Posted by Ravi Prakash <ra...@gmail.com>.

Hi Amit,

I doubt that this is a problem with a particular version of Hadoop. Could you
please post the stack trace if it's available, along with the output of
"hadoop version"? How did you set up your cluster? Did you set up mapred and
hdfs users? It seems more likely to me that the user trying to run the job
(mapred, maybe?) is not able to chmod that directory (which might be owned
by you or by some other user).
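
A minimal sketch of how those details could be collected, assuming Hadoop
1.0.0 is run from its install directory so that bin/hadoop resolves. The
function name gridmix_diag and the HADOOP variable are hypothetical and not
part of Hadoop. Because "fs -ls" on a directory lists its children, the
sketch lists the parent of the failing path so that the owner and mode of
the path itself show up in the output.

```shell
# Hypothetical wrapper; override HADOOP if the launcher lives elsewhere.
HADOOP="${HADOOP:-bin/hadoop}"

# gridmix_diag PATH
# Prints the Hadoop version, then lists PATH's parent directory so that the
# owner, group and mode of PATH itself are visible in the listing.
gridmix_diag() {
  "$HADOOP" version || return 1
  "$HADOOP" fs -ls "$(dirname "$1")"
}
```

For the path in the quoted error this would be called as
"gridmix_diag /user/sangroya/TestGridmix/gridmix", run as the same user that
launches Gridmix.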

Ravi

On Tue, Jun 19, 2012 at 7:42 AM, sangroya <sa...@gmail.com> wrote:

> 12/06/19 14:38:08 ERROR security.UserGroupInformation:
> PriviledgedActionException as:sangroya cause:java.io.IOException: Failed to
> set permissions of path: /user/sangroya/TestGridmix/gridmix to 0777

Re: How to use GridMix3

Posted by sangroya <sa...@gmail.com>.

Hi Ravi,

Yes, I was using a local path instead of HDFS.

I corrected this.

But now I am getting the following error:


12/06/19 14:38:08 INFO gridmix.SubmitterUserResolver:  Current user resolver is SubmitterUserResolver
12/06/19 14:38:08 WARN gridmix.Gridmix: Resource null ignored
12/06/19 14:38:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/19 14:38:08 ERROR security.UserGroupInformation: PriviledgedActionException as:sangroya cause:java.io.IOException: Failed to set permissions of path: /user/sangroya/TestGridmix/gridmix to 0777


This seems to be a file permission issue.

I granted 777 to this folder by:

 bin/hadoop dfs -chmod -R 777 /user/sangroya

But I still get the same error.

Searching for the error on the web suggests that it is a known issue in some
Hadoop versions.

Do you or anyone else have any more ideas on this?

Thanks in advance,
Amit






Re: How to use GridMix3

Posted by Ravi Prakash <ra...@gmail.com>.

Hi Amit,

In your command, the iopath directory (~/Desktop/test_gridmix_output)
doesn't seem to be an HDFS location. I believe it needs to be on HDFS.
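
A minimal sketch of the same invocation with an HDFS iopath, reusing the
$JAR_CLASSPATH variable from Amit's command. The helper names is_hdfs_uri
and run_gridmix, the JAVA variable, and the namenode address in the usage
line are hypothetical. The check is deliberately stricter than Hadoop
itself: an unqualified path also resolves to HDFS when the client
configuration names an HDFS default filesystem, so this only guards against
passing a local path by accident.

```shell
# Hypothetical wrapper; override JAVA if the JVM launcher lives elsewhere.
JAVA="${JAVA:-java}"

# is_hdfs_uri PATH
# Succeeds only if PATH carries an explicit hdfs:// scheme.
is_hdfs_uri() {
  case "$1" in
    hdfs://*) return 0 ;;
    *) return 1 ;;
  esac
}

# run_gridmix SIZE IOPATH TRACE
# Refuses to start unless IOPATH is an hdfs:// URI, then launches Gridmix
# with the same argument order as the original command.
run_gridmix() {
  if ! is_hdfs_uri "$2"; then
    echo "iopath is not an hdfs:// URI: $2" >&2
    return 1
  fi
  "$JAVA" -classpath "$JAR_CLASSPATH" \
    org.apache.hadoop.mapred.gridmix.Gridmix -generate "$1" "$2" "$3"
}
```

Usage, with a made-up namenode address:
"run_gridmix 10m hdfs://localhost:9000/user/username/gridmix /home/username/Desktop/test_rumen_output/job-trace.json".
The job_local_0001 id in Amit's log also suggests the job went to the local
job runner, so it may be worth checking that the cluster's conf directory is
on $JAR_CLASSPATH as well.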

HTH
Ravi.


On Mon, Jun 18, 2012 at 11:16 AM, sangroya <sa...@gmail.com> wrote:

> java -classpath $JAR_CLASSPATH org.apache.hadoop.mapred.gridmix.Gridmix
> -generate 10m ~/Desktop/test_gridmix_output
> /home/username/Desktop/test_rumen_output/job-trace.json

Re: How to use GridMix3

Posted by sangroya <sa...@gmail.com>.

Hello Ravi, 

Thanks for your response.

I got started by running Rumen and generating the required trace file.

However, when I try to run Gridmix with the following command,

java -classpath $JAR_CLASSPATH org.apache.hadoop.mapred.gridmix.Gridmix
-generate 10m ~/Desktop/test_gridmix_output
/home/username/Desktop/test_rumen_output/job-trace.json


I get the following error when starting Gridmix:

12/06/18 17:32:34 ERROR gridmix.Gridmix: Startup failed

Here, I am assuming that:

~/Desktop/test_gridmix_output is the output data directory.
/home/username/Desktop/test_rumen_output/job-trace.json is the output file of Rumen.
10m is the size of the data to be generated.
$JAR_CLASSPATH contains the path of Gridmix.jar, among other dependencies.


I am running the test on a single node.

Please let me know if there are any other Hadoop-related settings that I need
to configure.

My version of Hadoop is 1.0.0.

Why does Gridmix raise this exception?

"org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
/home/username/Desktop/test already exists"

I do not have this directory beforehand.


Here is the complete error trace:



12/06/18 17:32:23 INFO gridmix.SubmitterUserResolver:  Current user resolver is SubmitterUserResolver
12/06/18 17:32:23 WARN gridmix.Gridmix: Resource null ignored
12/06/18 17:32:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/18 17:32:23 INFO gridmix.Gridmix:  Submission policy is STRESS
12/06/18 17:32:24 INFO gridmix.Gridmix: Generating 1.0m of test data...
12/06/18 17:32:24 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-username/mapred/staging/username446004824/.staging/job_local_0001
12/06/18 17:32:24 ERROR security.UserGroupInformation: PriviledgedActionException as:username cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/username/Desktop/test already exists
12/06/18 17:32:24 ERROR security.UserGroupInformation: PriviledgedActionException as:username cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/username/Desktop/test already exists
12/06/18 17:32:24 WARN gridmix.JobSubmitter: Failed to submit GRIDMIX_GENDATA as username
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/username/Desktop/test already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
	at org.apache.hadoop.mapred.gridmix.GenerateData$1.run(GenerateData.java:116)
	at org.apache.hadoop.mapred.gridmix.GenerateData$1.run(GenerateData.java:101)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
	at org.apache.hadoop.mapred.gridmix.GenerateData.call(GenerateData.java:101)
	at org.apache.hadoop.mapred.gridmix.GenerateData.call(GenerateData.java:57)
	at org.apache.hadoop.mapred.gridmix.JobSubmitter$SubmitTask.run(JobSubmitter.java:106)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)
12/06/18 17:32:24 INFO gridmix.JobMonitor:  Job submission failed notify if anyone is waiting org.apache.hadoop.mapreduce.Job@7bcd107f
12/06/18 17:32:34 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-username/mapred/staging/username-2025454801/.staging/job_local_0002
12/06/18 17:32:34 ERROR security.UserGroupInformation: PriviledgedActionException as:username cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/username/Desktop/test already exists
12/06/18 17:32:34 ERROR gridmix.Gridmix: Startup failed
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/username/Desktop/test already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
	at org.apache.hadoop.mapred.gridmix.Gridmix.writeInputData(Gridmix.java:118)
	at org.apache.hadoop.mapred.gridmix.Gridmix.start(Gridmix.java:283)
	at org.apache.hadoop.mapred.gridmix.Gridmix.runJob(Gridmix.java:263)
	at org.apache.hadoop.mapred.gridmix.Gridmix.access$000(Gridmix.java:55)
	at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:217)
	at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:215)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
	at org.apache.hadoop.mapred.gridmix.Gridmix.run(Gridmix.java:215)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.mapred.gridmix.Gridmix.main(Gridmix.java:390)


Thanks a lot!
Amit


Re: How to use GridMix3

Posted by Ravi Prakash <ra...@gmail.com>.

Hi Amit,

Gridmix generates its own data. You can get a trace by

1. Running a set of jobs on your own cluster
2. Copying the jobhistory files (usually in /mapred/history/done in HDFS) to
local disk
3. Running Rumen on the history files (which is also easy:
http://hadoop.apache.org/mapreduce/docs/current/rumen.html )

OR, if you have a compelling research requirement, from Yahoo's WebScope:
http://webscope.sandbox.yahoo.com/catalog.php?datatype=s

I don't think you need a users-list if you run without security.
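
Steps 2 and 3 above can be sketched as a shell function. This is a sketch
under assumptions, not a tested recipe: it assumes Hadoop 1.0.0 is run from
its install directory, that the Rumen classes are on the classpath that
bin/hadoop uses (for example through HADOOP_CLASSPATH), and that the second
argument is an absolute local path. The function name make_trace, the HADOOP
variable and the output file names are hypothetical.

```shell
# Hypothetical wrapper; override HADOOP if the launcher lives elsewhere.
HADOOP="${HADOOP:-bin/hadoop}"

# make_trace HISTORY_DIR_IN_HDFS LOCAL_WORK_DIR
# LOCAL_WORK_DIR must be an absolute path so the file:// URIs are valid.
make_trace() {
  history_dir="$1"
  work_dir="$2"
  mkdir -p "$work_dir" || return 1
  # Step 2: copy the job history files from HDFS to local disk.
  "$HADOOP" fs -copyToLocal "$history_dir" "$work_dir/history" || return 1
  # Step 3: run Rumen's TraceBuilder.
  # Arguments: <job trace output> <topology output> <input history location>
  "$HADOOP" org.apache.hadoop.tools.rumen.TraceBuilder \
    "file://$work_dir/job-trace.json" \
    "file://$work_dir/topology.json" \
    "file://$work_dir/history"
}
```

For example, "make_trace /mapred/history/done /home/username/rumen" would
leave job-trace.json in /home/username/rumen, which is the file to pass to
Gridmix as the trace.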

Hope this helps
Ravi

On Wed, Jun 13, 2012 at 8:28 AM, sangroya <sa...@gmail.com> wrote:

> Here, it is mentioned that, to run Gridmix3, I need
>
> data, users-list and traces