Posted to mapreduce-issues@hadoop.apache.org by "Evan Pollan (Updated) (JIRA)" <ji...@apache.org> on 2012/02/01 00:40:04 UTC

[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evan Pollan updated MAPREDUCE-3583:
-----------------------------------

    Priority: Critical  (was: Major)

This became a critical blocking issue for me today.  It is preventing distcp commands from completing successfully on two different CDH3 update 2 environments I'm using, meaning I cannot do any offline log processing/analytics.

I think the above analysis of the failure is a bit off -- it's not actually the ppid that's blowing up the number parsing:  it's one of the (presumed) longs.  The code is extracting capture groups 7, 8, 10, and 11, parsing them as signed 64-bit longs, and interpreting them as utime, stime, vsize, and rss, respectively.

Here's an example of the contents of a /proc/X/stat file on one of my affected systems, listed alongside the field names the man page uses:

| pid | 1686 |
| comm | (ssh) |
| state | S |
| ppid | 1685 |
| pgrp | 1672 |
| session | 1415 |
| tty_nr | 34816 |
| tpgid | 4884 |
| flags | 4202496 |
| minflt | 1922 |
| cminflt | 0 |
| majflt | 3 |
| cmajflt | 0 |
| utime | 67 |
| stime | 82 |
| cutime | 0 |
| cstime | 0 |
| priority | 20 |
| nice | 0 |
| num_threads | 1 |
| itrealvalue | 0 |
| starttime | 144184 |
| vsize | 62341120 |
| rss | 1120 |
| rsslim | 18,446,744,073,709,500,000 |
| startcode | 139,935,780,638,720 |
| endcode | 139,935,781,007,452 |
| startstack | 140,735,070,560,080 |
| kstkesp | 140,735,070,553,640 |
| kstkeip | 139,935,743,316,835 |
| signal | 0 |
| blocked | 0 |
| sigignore | 4102 |
| sigcatch | 134234113 |
| wchan | 18,446,744,071,579,900,000 |
| nswap | 0 |
| cnswap | 0 |
| exit_signal | 17 |
| processor | 0 |
| rt_priority | 0 |
| policy | 0 |
| delayacct_blkio_ticks | 2 |
| guest_time | 0 |
| cguest_time | 0 |

As I said, I'm using Cloudera CDH3U2, and the relevant regexp pattern used to capture /proc/X/stat fields is:

{code}
  private static final Pattern PROCFS_STAT_FILE_FORMAT = Pattern
      .compile("^([0-9-]+)\\s([^\\s]+)\\s[^\\s]\\s([0-9-]+)\\s([0-9-]+)\\s([0-9-]+)\\s([0-9-]+\\s){16}([0-9]+)(\\s[0-9-]+){16}");
{code}
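
As a rough check of what this pattern captures, here is a minimal sketch that runs it against the stat line for pid 3115 (java) from the process listing in this comment. The class name, the group() helper and the use of find() are my own assumptions for illustration, not the Hadoop code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatPatternDemo {
    // Pattern copied from the CDH3U2 source quoted above (split only for line length).
    static final Pattern PROCFS_STAT_FILE_FORMAT = Pattern.compile(
        "^([0-9-]+)\\s([^\\s]+)\\s[^\\s]\\s([0-9-]+)\\s([0-9-]+)\\s([0-9-]+)\\s"
        + "([0-9-]+\\s){16}([0-9]+)(\\s[0-9-]+){16}");

    // Stat line for pid 3115 (java), taken from the listing in this comment.
    static final String SAMPLE =
        "3115 (java) S 3106 2772 2772 0 -1 4202496 45170 166286 0 1 451 "
        + "18446744073709551522 1666 10 20 0 42 0 12058 1450819584 31461 "
        + "18446744073709551615 1073741824 1073778416 0 0 0 0 0 1 16800974 "
        + "18446744073709551615 0 0 17 3 0 0 0 0 0";

    // Hypothetical helper: returns the text of one capture group,
    // or null if the pattern is not found in the line.
    static String group(String statLine, int group) {
        Matcher m = PROCFS_STAT_FILE_FORMAT.matcher(statLine);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        System.out.println("name    = " + group(SAMPLE, 2));
        System.out.println("ppid    = " + group(SAMPLE, 3));
        System.out.println("pgrpId  = " + group(SAMPLE, 4));
        System.out.println("session = " + group(SAMPLE, 5));
        // Group 6 repeats 16 times (fields 7 through 22), so group 7 is field 23.
        System.out.println("group 7 = " + group(SAMPLE, 7));
    }
}
```

With this pattern, group 7 lands on the 23rd field of the line, which the field table above labels vsize.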

The parsing code is:

{code}
        // Set ( name ) ( ppid ) ( pgrpId ) (session ) (vsize )
        pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)), Integer
            .parseInt(m.group(4)), Integer.parseInt(m.group(5)), Long
            .parseLong(m.group(7)));
{code}

The thing that's baffling me is that the field the Long.parseLong is choking on is nowhere to be found in the contents of any /proc/X/stat file that exists while the job is running.  For example:

{code}
12/01/31 23:31:03 INFO tools.DistCp: sourcePathsCount=1
12/01/31 23:31:03 INFO tools.DistCp: filesToCopyCount=1
12/01/31 23:31:03 INFO tools.DistCp: bytesToCopyCount=122.0k
12/01/31 23:31:03 INFO mapred.JobClient: Running job: job_201201312321_0002
12/01/31 23:31:04 INFO mapred.JobClient:  map 0% reduce 0%
12/01/31 23:31:08 INFO mapred.JobClient: Task Id : attempt_201201312321_0002_m_000002_0, Status : FAILED
java.lang.NumberFormatException: For input string: "18446744073709551532"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Long.parseLong(Long.java:422)
	at java.lang.Long.parseLong(Long.java:468)
	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
	at org.apache.hadoop.mapred.Task.initialize(Task.java:532)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
{code} 
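
To make the failure mode concrete, here is a small sketch showing that the string from the exception is a valid unsigned 64-bit decimal but lies outside the range of a signed Java long. The class and method names are hypothetical. BigInteger is used here only for illustration; it is not what ProcfsBasedProcessTree does.

```java
import java.math.BigInteger;

public class UnsignedStatValue {
    // Parses a decimal field that may exceed Long.MAX_VALUE.
    static BigInteger parse(String field) {
        return new BigInteger(field);
    }

    // A value fits in a signed 64-bit long exactly when its minimal
    // two's-complement form needs at most 63 bits plus the sign bit.
    static boolean fitsInSignedLong(String field) {
        return parse(field).bitLength() <= 63;
    }

    public static void main(String[] args) {
        String field = "18446744073709551532"; // value from the stack trace above
        try {
            Long.parseLong(field);
            System.out.println("Long.parseLong accepted it");
        } catch (NumberFormatException e) {
            System.out.println("Long.parseLong rejected it: " + e.getMessage());
        }
        System.out.println("fits in signed long: " + fitsInSignedLong(field));
        // longValue() keeps only the low 64 bits, i.e. the signed reinterpretation.
        System.out.println("as signed 64-bit:    " + parse(field).longValue());
    }
}
```

The value is 2^64 - 84, so the signed reinterpretation is -84, which looks more like a small negative counter printed as unsigned than like a genuinely huge number.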

Here's what the entire set of /proc/X/stat files looks like while this job is running (I'm looking at the /proc file system on the only task tracker/data node in the cluster).  If Long.parseLong were going to fail, I would have assumed it would choke on '18446744073709551615':
{code}
10 (async/mgr) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
11 (xenwatch) S 2 0 0 0 -1 2149613888 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
120 (upstart-udev-br) S 1 119 119 0 -1 4202560 215 0 0 0 6 0 0 0 20 0 1 0 234 17444864 239 18446744073709551615 140724289748992 140724289787412 0 0 0 0 0 4097 81920 18446744073709551615 0 0 17 1 0 0 0 0 0
122 (udevd) S 1 122 122 0 -1 4202816 636 23063 0 13 1 3 105 15 16 -4 1 0 235 17289216 164 18446744073709551615 140382903398400 140382903499908 0 0 0 0 2147221247 0 0 18446744073709551615 0 0 17 0 0 0 0 0 0
12 (xenbus) S 2 0 0 0 -1 2149613632 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
14 (migration/1) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 -100 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 99 1 0 0 0
15 (ksoftirqd/1) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
16 (watchdog/1) S 2 0 0 0 -1 2216722752 0 0 0 0 0 0 0 0 -100 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 99 1 0 0 0
1719 (avahi-daemon) S 1 1718 1718 0 -1 4202816 446 0 0 0 2 0 0 0 20 0 1 0 5195 34873344 418 18446744073709551615 4194304 4307028 0 0 0 0 0 3674112 16903 18446744073709551615 0 0 17 0 0 0 0 0 0
1720 (avahi-daemon) S 1719 1720 1720 0 -1 4202560 90 0 0 0 0 0 0 0 20 0 1 0 5195 34742272 143 18446744073709551615 4194304 4307028 0 0 0 0 0 3670016 0 18446744073709551615 0 0 17 1 0 0 0 0 0
17 (events/1) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
188 (udevd) S 122 122 122 0 -1 4202816 91 0 0 0 0 0 0 0 18 -2 1 0 241 17285120 160 18446744073709551615 140382903398400 140382903499908 0 0 0 0 2147196671 0 24576 18446744073709551615 0 0 17 2 0 0 0 0 0
189 (udevd) S 122 122 122 0 -1 4202816 89 0 0 0 0 0 0 0 18 -2 1 0 241 17285120 159 18446744073709551615 140382903398400 140382903499908 0 0 0 0 2147196671 0 24576 18446744073709551615 0 0 17 3 0 0 0 0 0
18 (migration/2) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 -100 0 1 0 126 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 99 1 0 0 0
19 (ksoftirqd/2) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 126 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
1 (init) S 0 1 1 0 -1 4202752 5633 1150472 28 692 8 15 -5995191823955592639 -3228180212899177193 20 0 1 0 93 24281088 475 18446744073709551615 140133429972992 140133430091084 0 0 0 0 0 4096 536946211 18446744073709551615 0 0 0 0 0 0 0 0 0
20 (watchdog/2) S 2 0 0 0 -1 2216722752 0 0 0 0 0 0 0 0 -100 0 1 0 126 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 99 1 0 0 0
21 (events/2) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 126 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
22 (migration/3) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 -100 0 1 0 158 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 99 1 0 0 0
23 (ksoftirqd/3) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 158 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
24 (watchdog/3) S 2 0 0 0 -1 2216722752 0 0 0 0 0 0 0 0 -100 0 1 0 158 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 99 1 0 0 0
25 (events/3) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 158 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
26 (sync_supers) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
27 (bdi-default) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
28 (kintegrityd/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
2928 (su) S 1 2772 2772 0 -1 4202752 853 0 1 0 1 0 0 0 20 0 1 0 11368 48869376 442 18446744073709551615 4194304 4224396 0 0 0 0 2147196671 1 16384 18446744073709551615 0 0 17 0 0 0 0 0 0
2937 (java) S 2928 2772 2772 0 -1 4202496 32799 579 0 1 233 19 2 0 20 0 40 0 11383 1438593024 22308 18446744073709551615 1073741824 1073778416 0 0 0 0 0 1 16800974 18446744073709551615 0 0 17 3 0 0 0 0 0
29 (kintegrityd/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
2 (kthreadd) S 0 0 0 0 -1 2149613632 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 0 2 0 0 0 0 0
30 (kintegrityd/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
3106 (su) S 1 2772 2772 0 -1 4202752 853 0 0 0 1 0 0 0 20 0 1 0 12056 48869376 443 18446744073709551615 4194304 4224396 0 0 0 0 2147196671 1 16384 18446744073709551615 0 0 17 0 0 0 0 0 0
3115 (java) S 3106 2772 2772 0 -1 4202496 45170 166286 0 1 451 18446744073709551522 1666 10 20 0 42 0 12058 1450819584 31461 18446744073709551615 1073741824 1073778416 0 0 0 0 0 1 16800974 18446744073709551615 0 0 17 3 0 0 0 0 0
319 (dhclient3) S 1 319 319 0 -1 4202560 59 0 0 0 0 0 0 0 20 0 1 0 599 6713344 85 18446744073709551615 140272284925952 140272285354020 0 0 0 0 0 0 0 18446744073709551615 0 0 17 0 0 0 0 0 0
31 (kintegrityd/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
3222 (sshd) S 474 3222 3222 0 -1 4202752 1152 26602 0 0 2 0 76 16 20 0 1 0 22284 83111936 879 18446744073709551615 139945989926912 139945990366548 0 0 0 0 0 4096 16387 18446744073709551615 0 0 17 0 0 0 0 0 0
32 (kblockd/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
3301 (sshd) S 3222 3222 3222 0 -1 4202816 295 0 0 0 1 0 0 0 20 0 1 0 22441 83111936 415 18446744073709551615 139945989926912 139945990366548 0 0 0 0 0 4096 65536 18446744073709551615 0 0 17 0 0 0 0 0 0
3302 (bash) S 3301 3302 3302 34816 3826 4202496 8064 78426 1 2 3 12 291 31 20 0 1 0 22442 19914752 553 18446744073709551615 4194304 5087404 140734376532864 18446744073709551615 139668755529598 0 65536 3686404 1266761467 18446744071579111781 0 0 17 3 0 0 0 0 0
33 (kblockd/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
34 (kblockd/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
35 (kblockd/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
36 (kseriod) S 2 0 0 0 -1 2149580864 0 0 0 0 0 0 0 0 20 0 1 0 194 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
377 (flush-1:0) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
378 (flush-1:1) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
379 (flush-1:2) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
380 (flush-1:3) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
381 (flush-1:4) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
382 (flush-1:5) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
383 (flush-1:6) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
384 (flush-1:7) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
385 (flush-1:8) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
386 (flush-1:9) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
387 (flush-1:10) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
388 (flush-1:11) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
389 (flush-1:12) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
390 (flush-1:13) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
391 (flush-1:14) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
392 (flush-1:15) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
393 (flush-8:1) S 2 0 0 0 -1 2157973568 0 0 0 0 1 14 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
394 (flush-8:16) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
395 (flush-8:32) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
396 (flush-8:48) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
397 (flush-8:64) S 2 0 0 0 -1 2157973568 0 0 0 0 0 0 0 0 20 0 1 0 701 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
3 (migration/0) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 -100 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 99 1 0 0 0
41 (khungtaskd) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
43 (kswapd0) S 2 0 0 0 -1 2158233664 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
44 (aio/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
458 (rsyslogd) S 1 426 426 0 -1 4202816 386 0 1 0 1 2 0 0 20 0 4 0 851 133304320 395 18446744073709551615 4194304 4462780 0 0 0 0 0 16781830 85025 18446744073709551615 0 0 17 0 0 0 0 0 0
45 (aio/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
46 (aio/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
474 (sshd) S 1 474 474 0 -1 4202816 229 63238 0 40 1 0 283 28 20 0 1 0 854 50442240 271 18446744073709551615 140147055022080 140147055461716 0 0 0 0 0 4096 81925 18446744073709551615 0 0 17 1 0 0 0 0 0
475 (dbus-daemon) S 1 475 475 0 -1 4202816 368 54 0 0 2 0 0 0 20 0 1 0 855 24141824 342 18446744073709551615 140490573344768 140490573663756 0 0 0 0 0 4096 16385 18446744073709551615 0 0 17 0 0 0 0 0 0
478 (kjournald) S 2 0 0 0 -1 2149613632 0 0 0 0 1 0 0 0 20 0 1 0 856 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
47 (aio/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
48 (jfsIO) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
49 (jfsCommit) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
4 (ksoftirqd/0) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
503 (atd) S 1 503 503 0 -1 4202560 85 0 0 0 0 0 0 0 20 0 1 0 886 19337216 116 18446744073709551615 4194304 4210820 0 0 0 0 0 0 81923 18446744073709551615 0 0 17 0 0 0 0 0 0
504 (cron) S 1 504 504 0 -1 4202560 257 0 0 0 1 0 0 0 20 0 1 0 886 21581824 254 18446744073709551615 4194304 4228572 0 0 0 0 0 0 65537 18446744073709551615 0 0 17 0 0 0 0 0 0
50 (jfsCommit) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
51 (jfsCommit) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
52 (jfsCommit) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
53 (jfsSync) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
54 (xfs_mru_cache) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
558 (getty) S 1 558 558 1025 558 4202496 199 0 1 0 0 1 0 0 20 0 1 0 931 6225920 162 18446744073709551615 4194304 4210980 0 0 0 0 0 0 0 18446744073709551615 0 0 17 1 0 0 0 0 0
55 (xfslogd/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
563 (console-kit-dae) S 1 475 475 0 -1 4202752 2810 8142 26 7 4611686018427387902 0 22 2 20 0 3 0 1288 327077888 1000 18446744073709551615 4194304 4326916 0 0 0 0 0 4096 66048 18446744073709551615 0 0 17 0 0 0 0 0 0
56 (xfslogd/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
57 (xfslogd/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
58 (xfslogd/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
59 (xfsdatad/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
5 (watchdog/0) S 2 0 0 0 -1 2216722752 0 0 0 0 0 0 0 0 -100 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 99 1 0 0 0
60 (xfsdatad/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
61 (xfsdatad/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
62 (xfsdatad/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
63 (xfsconvertd/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
64 (xfsconvertd/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
65 (xfsconvertd/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
66 (xfsconvertd/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
67 (glock_workqueue) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
68 (glock_workqueue) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
69 (glock_workqueue) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
6 (events/0) S 2 0 0 0 -1 2216722496 0 0 0 0 1 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
70 (glock_workqueue) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
71 (delete_workqueu) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
72 (delete_workqueu) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
73 (delete_workqueu) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
74 (delete_workqueu) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
75 (kslowd000) S 2 0 0 0 -1 2149580864 0 0 0 0 0 0 0 0 15 -5 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
76 (kslowd001) S 2 0 0 0 -1 2149580864 0 0 0 0 0 0 0 0 15 -5 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
77 (crypto/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
78 (crypto/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
79 (crypto/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
7 (cpuset) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
80 (crypto/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 195 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
83 (net_accel/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 200 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
84 (net_accel/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 200 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
85 (net_accel/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 200 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
86 (net_accel/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 200 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
87 (sfc_netfront/0) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 203 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
88 (sfc_netfront/1) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 203 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
89 (sfc_netfront/2) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 203 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
8 (khelper) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
90 (sfc_netfront/3) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 20 0 1 0 203 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 0 0 0 0 0
91 (kstriped) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 203 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 0 0 0 0 0
92 (kjournald) S 2 0 0 0 -1 2149613632 0 0 0 0 6 0 0 0 20 0 1 0 216 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 0 0 0 0 0
9 (netns) S 2 0 0 0 -1 2149613632 0 0 0 0 0 0 0 0 20 0 1 0 93 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 0 0 0 0 0
3916 (cat) R 3302 3916 3302 34816 3916 4202496 248 0 0 0 0 0 0 0 20 0 1 0 26900 5636096 182 18446744073709551615 4194304 4247204 140736357334480 18446744073709551615 139795312997536 0 0 0 0 0 0 0 17 1 0 0 0 0 0
{code}
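
One way to hunt through a dump like this is to flag every field that is too large for a signed long. The following is a minimal sketch under my own assumptions: the class and method names are hypothetical, and it splits on whitespace, which only works when the comm field contains no spaces (true for the lines above).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class StatOverflowScan {
    static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Stat line for pid 3115 (java), taken from the listing above.
    static final String SAMPLE =
        "3115 (java) S 3106 2772 2772 0 -1 4202496 45170 166286 0 1 451 "
        + "18446744073709551522 1666 10 20 0 42 0 12058 1450819584 31461 "
        + "18446744073709551615 1073741824 1073778416 0 0 0 0 0 1 16800974 "
        + "18446744073709551615 0 0 17 3 0 0 0 0 0";

    // Returns the 1-based positions of fields whose unsigned decimal value
    // is larger than Long.MAX_VALUE. Non-numeric and negative fields are skipped.
    static List<Integer> overflowingFields(String statLine) {
        List<Integer> hits = new ArrayList<Integer>();
        String[] fields = statLine.trim().split("\\s+");
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].matches("[0-9]+")
                    && new BigInteger(fields[i]).compareTo(LONG_MAX) > 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("fields exceeding Long.MAX_VALUE: " + overflowingFields(SAMPLE));
    }
}
```

For the pid 3115 line this flags positions 15, 25 and 35, which the field table earlier in this comment names stime, rsslim and wchan.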

This is using the Ubuntu 10.04 64 bit AMI (us-east-1/ami-da0cf8b3), cluster created by whirr-0.6.0-incubating.

Any ideas here?  I'm dead in the water.  I'm going to take a stab at using CDH3U3, which just came out, but since this defect isn't yet resolved, I'm not holding out much hope.
                
> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0
>         Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Long.parseLong(Long.java:422)
> 	at java.lang.Long.parseLong(Long.java:468)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
> 	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
> 	at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE:
> {code}
>         // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>          pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like that the ppid is a 64-bit positive integer but Java long is signed and so only works with 63-bit positive integers.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is a NFE. 
> {noformat}
> I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem by avoiding parsing large integer.
