Posted to dev@whirr.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2010/12/01 06:33:10 UTC
[jira] Created: (WHIRR-148) Hadoop jobs fail on large EC2 instances
Hadoop jobs fail on large EC2 instances
---------------------------------------
Key: WHIRR-148
URL: https://issues.apache.org/jira/browse/WHIRR-148
Project: Whirr
Issue Type: Bug
Components: service/hadoop
Affects Versions: 0.3.0
Reporter: Tom White
Assignee: Tom White
When using an m1.large or c1.xlarge hardware-id, jobs fail with an error like:
{noformat}
FAILED
java.io.IOException: Task process exit with nonzero status of 134.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
{noformat}
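For context on the exit status: 134 is 128 + 6, i.e. the task's child JVM was killed by signal 6 (SIGABRT), which typically means the process aborted at the native level (for example on a heap or memory-mapping failure) rather than exiting from Java code. This is not stated in the report itself, just the standard POSIX shell convention, which a quick sketch can demonstrate:

```shell
# A child killed by SIGABRT is reported by the shell as exit status
# 128 + 6 = 134, matching the "nonzero status of 134" in the stack trace.
sh -c 'kill -ABRT $$'
echo "exit status: $?"
```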
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (WHIRR-148) Hadoop jobs fail on large EC2 instances, possibly RHEL6 related
Posted by "Lars George (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars George updated WHIRR-148:
------------------------------
Summary: Hadoop jobs fail on large EC2 instances, possibly RHEL6 related (was: Hadoop jobs fail on large EC2 instances)
[jira] Commented: (WHIRR-148) Hadoop jobs fail on large EC2 instances
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965572#action_12965572 ]
Tom White commented on WHIRR-148:
---------------------------------
This seems not to be a problem with the JVM (the failure occurs with 1.6.0_17 and 1.6.0_21), but the AMI. The failure was seen with amzn-ami-0.9.9-beta.x86_64-S3 (ami-827185eb) which was automatically picked by Whirr/jclouds, but when setting {{whirr.image-id=us-east-1/ami-da0cf8b3}} (Ubuntu 10.04 LTS Lucid) the job succeeds.
I think we should test configurations using WHIRR-92 and document them using WHIRR-145.
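The workaround described above can be expressed in a Whirr recipe. This is a sketch: {{whirr.image-id}} and the Ubuntu AMI are taken from the comment, while the hardware-id line simply mirrors the instance types from the bug report.

```
# Pin the image so jclouds does not auto-select the failing Amazon Linux AMI
# (amzn-ami-0.9.9-beta.x86_64-S3 / ami-827185eb).
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.hardware-id=m1.large
```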
[jira] Commented: (WHIRR-148) Hadoop jobs fail on large EC2 instances, possibly RHEL6 related
Posted by "Lars George (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/WHIRR-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977424#action_12977424 ]
Lars George commented on WHIRR-148:
-----------------------------------
This happened again in a different context, and it seems it may be caused by RHEL6. RHEL6 ships glibc 2.11, which includes a new allocator that uses per-thread arenas for fast allocation. By default it maps 64 MB chunks, with up to 8 * num_cores arenas on a 64-bit system, so you can expect around 4 GB of virtual memory usage in any highly threaded app. You can constrain the number of allocation arenas by setting MALLOC_ARENA_MAX=4, for example, or even lower. We may simply add this to the "init" scripts and try?
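One hedged way the suggestion above could be wired in (exact placement in the Whirr init scripts or in conf/hadoop-env.sh is an assumption, not something the thread has settled on):

```shell
# Sketch: cap glibc 2.11+ per-thread malloc arenas before launching the
# Hadoop daemons, so each highly threaded JVM does not reserve up to
# 8 * num_cores * 64 MB of virtual memory. The value 4 is the example
# from the comment above, not a tuned recommendation.
export MALLOC_ARENA_MAX=4
echo "MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX}"
```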