Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/07/26 08:17:20 UTC

[jira] [Commented] (PIG-4937) Pigmix hangs when generating data after rows is set as 625000000 in test/perf/pigmix/conf/config.sh

    [ https://issues.apache.org/jira/browse/PIG-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393408#comment-15393408 ] 

liyunzhang_intel commented on PIG-4937:
---------------------------------------

[~rohini] and [~daijy]:
 After generating all the test data (1 TB), I have run the first round of tests in MR mode.
The cluster has 8 nodes (each node has 40 cores and 60 GB of memory; 28 cores and 56 GB are assigned to the NodeManager on each node). In total the cluster offers 224 cores and 448 GB of memory to YARN.

The snippet of yarn-site.xml:
{code}
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>57344</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>28</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>57344</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
  </property>
{code}
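To double-check that these limits actually took effect on the NodeManagers, the YARN CLI can report per-node capacity. A minimal check, assuming the ResourceManager is reachable from the shell; the node id below is a placeholder to be taken from the -list output:
{code}
# List all NodeManagers registered with the ResourceManager.
yarn node -list -all

# Show one node's reported capacity; the memory and vCores it reports should
# match yarn.nodemanager.resource.memory-mb (57344) and
# yarn.nodemanager.resource.cpu-vcores (28) above.
# <nodemanager-host:port> is a placeholder taken from the -list output.
yarn node -status <nodemanager-host:port>
{code}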
The snippet of mapred-site.xml:
{code}
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>820</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>1200000</value>
  </property>
{code}
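A quick sanity check of the container math implied by the two snippets above (plain shell arithmetic over the configured values, assuming nothing else consumes NodeManager memory):
{code}
# Values taken from the yarn-site.xml and mapred-site.xml snippets above.
NODE_MB=57344        # yarn.nodemanager.resource.memory-mb
MAP_MB=2048          # mapreduce.map.memory.mb
REDUCE_MB=4096       # mapreduce.reduce.memory.mb

# 57344 / 2048 = 28 concurrent map containers per node, matching the 28 vcores;
# 8 nodes => 224 map containers cluster-wide.
echo "$(( NODE_MB / MAP_MB )) map containers per node"
echo "$(( 8 * NODE_MB / MAP_MB )) map containers in the cluster"

# 57344 / 4096 = 14 concurrent reduce containers per node.
echo "$(( NODE_MB / REDUCE_MB )) reduce containers per node"

# Heap vs. container size: -Xmx1638m is ~0.8 x 2048 MB and -Xmx3276m is
# ~0.8 x 4096 MB, leaving headroom for non-heap JVM memory.
{code}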

The snippet of hdfs-site.xml:
{code}
  <property>
    <name>dfs.blocksize</name>
    <value>1124217344</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.socket.timeout</name>
    <value>1200000</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>1200000</value>
  </property>
{code}
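For reference, this block size works out to roughly 1 GB, so the 1 TB of generated input splits into on the order of a thousand map tasks. A rough estimate, assuming splits follow the block size and using the 224 concurrent map containers computed above:
{code}
# dfs.blocksize = 1124217344 bytes, i.e. about 1072 MB per block/split.
echo "$(( 1124217344 / 1024 / 1024 )) MB per block"

# ~1 TB of input / ~1072 MB per split => about 978 splits,
# i.e. roughly 4-5 waves of maps over 224 concurrent map containers.
echo "$(( 1099511627776 / 1124217344 )) splits for 1 TB of input"
{code}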
The results of the last PigMix run in MR mode are below (L9, L10, L13, L14, and L17 fail). The average time spent per script is nearly 6 hours. I don't know whether it really needs this much time to run L1~L17. Can you share your configuration and expected results with me if you have experience with this?
||Script||MR (seconds)||
|L_1|21544|
|L_2|20482|
|L_3|21629|
|L_4|20905|
|L_5|20738|
|L_6|24131|
|L_7|21983|
|L_8|24549|
|L_9|6585 (Fail)|
|L_10|22286 (Fail)|
|L_11|21849|
|L_12|21266|
|L_13|11099 (Fail)|
|L_14|43 (Fail)|
|L_15|23808|
|L_16|42889|
|L_17|10 (Fail)|
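As a cross-check of the "nearly 6 hours" figure, converting the reported seconds to hours (plain shell arithmetic; bc assumed to be available):
{code}
# L_1: 21544 s / 3600 ~= 5.98 hours, consistent with the ~6 hour average above;
# L_16 at 42889 s is closer to 12 hours.
echo "scale=2; 21544 / 3600" | bc
echo "scale=2; 42889 / 3600" | bc
{code}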




> Pigmix hangs when generating data after rows is set as 625000000 in test/perf/pigmix/conf/config.sh
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4937
>                 URL: https://issues.apache.org/jira/browse/PIG-4937
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>         Attachments: pigmix1.PNG, pigmix2.PNG
>
>
> Use the default settings in test/perf/pigmix/conf/config.sh and generate data with
> "ant -v -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23  pigmix-deploy >ant.pigmix.deploy"
> It hangs with the following log:
> {code}
>  [exec] Generating mapping file for column d:1:100000:z:5 into hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp-786100428
>      [exec] processed 99%.
>      [exec] Generating input files into hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp595036324
>      [exec] Submit hadoop job...
>      [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to ResourceManager at bdpe41/10.239.47.137:8032
>      [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to ResourceManager at bdpe41/10.239.47.137:8032
>      [exec] 16/06/25 23:06:32 INFO mapred.FileInputFormat: Total input paths to process : 90
>      [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: number of splits:90
>      [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466776148247_0034
>      [exec] 16/06/25 23:06:33 INFO impl.YarnClientImpl: Submitted application application_1466776148247_0034
>      [exec] 16/06/25 23:06:33 INFO mapreduce.Job: The url to track the job: http://bdpe41:8088/proxy/application_1466776148247_0034/
>      [exec] 16/06/25 23:06:33 INFO mapreduce.Job: Running job: job_1466776148247_0034
>      [exec] 16/06/25 23:06:38 INFO mapreduce.Job: Job job_1466776148247_0034 running in uber mode : false
>      [exec] 16/06/25 23:06:38 INFO mapreduce.Job:  map 0% reduce 0%
>      [exec] 16/06/25 23:06:53 INFO mapreduce.Job:  map 2% reduce 0%
>      [exec] 16/06/25 23:06:59 INFO mapreduce.Job:  map 26% reduce 0%
>      [exec] 16/06/25 23:07:00 INFO mapreduce.Job:  map 61% reduce 0%
>      [exec] 16/06/25 23:07:02 INFO mapreduce.Job:  map 62% reduce 0%
>      [exec] 16/06/25 23:07:03 INFO mapreduce.Job:  map 64% reduce 0%
>      [exec] 16/06/25 23:07:04 INFO mapreduce.Job:  map 79% reduce 0%
>      [exec] 16/06/25 23:07:05 INFO mapreduce.Job:  map 86% reduce 0%
>      [exec] 16/06/25 23:07:06 INFO mapreduce.Job:  map 92% reduce 0%
> {code}
> When I use 625000 as the rows value in test/perf/pigmix/conf/config.sh, generating the test data succeeds. So is the problem limited resources (disk size or something else)? My environment is a 3-node cluster (each node has about one 830 GB disk), and I assign memory and CPU in yarn-site.xml as follows:
> {code}
>  yarn.nodemanager.resource.memory-mb=56G
>  yarn.nodemanager.resource.cpu-vcores=28
> {code}


