Posted to dev@nutch.apache.org by "Mitesh Singh Jat (JIRA)" <ji...@apache.org> on 2013/09/18 17:52:00 UTC
[jira] [Comment Edited] (NUTCH-1640) OOM in ParseSegment Phase
[ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770902#comment-13770902 ]
Mitesh Singh Jat edited comment on NUTCH-1640 at 9/18/13 3:51 PM:
------------------------------------------------------------------
When a test run was done with 65312 records, the ParseSegment MR job status was:
{noformat}
13/09/18 21:02:13 INFO mapred.JobClient: Job complete: job_201308301130_0517
13/09/18 21:02:13 INFO mapred.JobClient: Counters: 32
13/09/18 21:02:13 INFO mapred.JobClient: ParserStatus
13/09/18 21:02:13 INFO mapred.JobClient: failed=96
13/09/18 21:02:13 INFO mapred.JobClient: success=52393
13/09/18 21:02:13 INFO mapred.JobClient: Job Counters
13/09/18 21:02:13 INFO mapred.JobClient: Launched reduce tasks=1
13/09/18 21:02:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=402103
13/09/18 21:02:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/18 21:02:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/18 21:02:13 INFO mapred.JobClient: Launched map tasks=3
13/09/18 21:02:13 INFO mapred.JobClient: Data-local map tasks=3
13/09/18 21:02:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=125910
13/09/18 21:02:13 INFO mapred.JobClient: File Input Format Counters
13/09/18 21:02:13 INFO mapred.JobClient: Bytes Read=169842629
13/09/18 21:02:13 INFO mapred.JobClient: File Output Format Counters
13/09/18 21:02:13 INFO mapred.JobClient: Bytes Written=0
13/09/18 21:02:13 INFO mapred.JobClient: FileSystemCounters
13/09/18 21:02:13 INFO mapred.JobClient: FILE_BYTES_READ=79839439
13/09/18 21:02:13 INFO mapred.JobClient: HDFS_BYTES_READ=169847735
13/09/18 21:02:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=137342604
13/09/18 21:02:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=67039214
13/09/18 21:02:13 INFO mapred.JobClient: Map-Reduce Framework
13/09/18 21:02:13 INFO mapred.JobClient: Map output materialized bytes=57153030
13/09/18 21:02:13 INFO mapred.JobClient: Map input records=65312
13/09/18 21:02:13 INFO mapred.JobClient: Reduce shuffle bytes=57153030
13/09/18 21:02:13 INFO mapred.JobClient: Spilled Records=125780
13/09/18 21:02:13 INFO mapred.JobClient: Map output bytes=202984117
13/09/18 21:02:13 INFO mapred.JobClient: Total committed heap usage (bytes)=682557440
13/09/18 21:02:13 INFO mapred.JobClient: CPU time spent (ms)=398190
13/09/18 21:02:13 INFO mapred.JobClient: Map input bytes=169837445
13/09/18 21:02:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=435
13/09/18 21:02:13 INFO mapred.JobClient: Combine input records=0
13/09/18 21:02:13 INFO mapred.JobClient: Reduce input records=52489
13/09/18 21:02:13 INFO mapred.JobClient: Reduce input groups=52489
13/09/18 21:02:13 INFO mapred.JobClient: Combine output records=0
13/09/18 21:02:13 INFO mapred.JobClient: Physical memory (bytes) snapshot=1939349504
13/09/18 21:02:13 INFO mapred.JobClient: Reduce output records=52489
13/09/18 21:02:13 INFO mapred.JobClient: Virtual memory (bytes) snapshot=32182784000
13/09/18 21:02:13 INFO mapred.JobClient: Map output records=52489
13/09/18 21:02:13 INFO parse.ParseSegment: ParseSegment: finished at 2013-09-18 21:02:13, elapsed: 00:05:04
{noformat}
jstat -gccause results:
{noformat}
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
...
10.1 50.30 2.15 100.00 75.54 96.81 7 0.079 0 0.000 0.079 unknown GCCause Allocation Failure
32.3 0.00 12.16 93.16 80.23 99.30 32 0.468 0 0.000 0.468 unknown GCCause Allocation Failure
38.1 34.72 0.00 98.54 81.65 99.99 43 0.709 0 0.000 0.709 unknown GCCause Allocation Failure
40.7 33.33 0.00 100.00 82.19 99.35 46 0.813 0 0.000 0.813 unknown GCCause Allocation Failure
43.9 0.00 37.50 98.02 82.88 99.48 52 0.956 0 0.000 0.956 unknown GCCause Allocation Failure
45.7 35.00 45.45 100.00 83.28 99.57 54 1.014 0 0.000 1.014 unknown GCCause Allocation Failure
49.2 0.00 58.34 97.81 83.67 99.64 58 1.167 0 0.000 1.167 unknown GCCause Allocation Failure
53.2 75.00 0.00 99.88 84.31 99.75 62 1.349 0 0.000 1.349 unknown GCCause Allocation Failure
55.2 0.00 96.49 97.67 84.71 99.98 66 1.460 0 0.000 1.460 unknown GCCause Allocation Failure
57.7 69.45 38.89 100.00 85.26 99.35 69 1.569 0 0.000 1.569 unknown GCCause Allocation Failure
59.7 44.44 69.45 100.00 85.68 99.40 72 1.685 0 0.000 1.685 unknown GCCause Allocation Failure
61.4 0.00 91.67 96.20 85.81 99.44 74 1.773 0 0.000 1.773 unknown GCCause Allocation Failure
63.7 0.00 68.75 98.69 86.08 99.47 76 1.861 0 0.000 1.861 unknown GCCause Allocation Failure
66.5 0.00 56.25 100.00 86.65 99.53 80 2.028 0 0.000 2.028 unknown GCCause Allocation Failure
69.0 43.75 62.50 100.00 87.04 99.78 82 2.125 0 0.000 2.125 unknown GCCause Allocation Failure
71.0 78.13 0.00 99.98 87.34 99.81 85 2.271 0 0.000 2.271 unknown GCCause Allocation Failure
74.7 0.00 64.29 100.00 88.05 99.87 90 2.548 0 0.000 2.548 unknown GCCause Allocation Failure
76.2 0.00 57.14 99.99 88.33 99.90 92 2.652 0 0.000 2.652 unknown GCCause Allocation Failure
77.5 57.14 75.00 100.00 88.73 99.93 94 2.744 0 0.000 2.744 unknown GCCause Allocation Failure
89.4 42.86 71.43 100.00 90.75 99.41 110 3.681 0 0.000 3.681 unknown GCCause Allocation Failure
90.2 0.00 71.43 0.00 91.05 99.42 111 3.768 1 0.000 3.768 Allocation Failure unknown GCCause
90.4 0.00 71.43 0.00 91.05 99.42 111 3.768 1 0.000 3.768 Allocation Failure unknown GCCause
91.2 0.00 0.00 100.00 91.23 49.74 111 3.768 2 0.492 4.261 Allocation Failure unknown GCCause
91.9 0.00 0.00 100.00 91.46 44.44 111 3.768 3 0.856 4.624 Allocation Failure unknown GCCause
92.2 0.00 0.00 100.00 91.46 44.44 111 3.768 3 0.856 4.624 Allocation Failure unknown GCCause
93.0 0.00 0.00 100.00 91.60 45.88 111 3.768 4 1.181 4.949 Allocation Failure unknown GCCause
93.2 0.00 0.00 100.00 91.60 45.27 111 3.768 4 1.181 4.949 Allocation Failure unknown GCCause
94.0 50.00 0.00 100.00 87.67 47.83 112 3.768 4 1.571 5.340 unknown GCCause Allocation Failure
95.2 0.00 100.00 87.74 87.73 47.86 114 3.907 4 1.571 5.478 unknown GCCause Allocation Failure
96.0 75.00 0.00 99.23 87.96 47.89 115 3.961 4 1.571 5.533 unknown GCCause Allocation Failure
98.3 97.50 0.00 100.00 88.26 47.93 119 4.191 4 1.571 5.763 unknown GCCause Allocation Failure
98.8 77.27 72.73 100.00 88.52 47.94 120 4.246 4 1.571 5.817 unknown GCCause Allocation Failure
102.2 100.00 0.00 99.96 88.78 47.97 125 4.572 4 1.571 6.144 unknown GCCause Allocation Failure
102.8 0.00 98.53 96.96 88.87 47.98 126 4.631 4 1.571 6.202 unknown GCCause Allocation Failure
103.3 92.65 0.00 96.77 89.11 47.98 127 4.687 4 1.571 6.259 unknown GCCause Allocation Failure
105.0 92.59 98.91 100.00 89.11 48.01 130 4.852 4 1.571 6.424 unknown GCCause Allocation Failure
106.0 92.42 98.40 100.00 89.11 48.02 132 4.968 4 1.571 6.540 unknown GCCause Allocation Failure
107.0 88.42 97.86 100.00 89.11 48.03 134 5.073 4 1.571 6.644 unknown GCCause Allocation Failure
107.7 89.63 64.89 100.00 89.11 48.03 135 5.148 4 1.571 6.719 unknown GCCause Allocation Failure
115.5 0.00 62.10 95.20 90.55 48.09 148 6.034 4 1.571 7.605 unknown GCCause Allocation Failure
116.0 64.41 0.00 94.67 90.68 48.09 149 6.093 4 1.571 7.665 unknown GCCause Allocation Failure
116.8 60.34 62.50 100.00 90.92 48.10 150 6.145 4 1.571 7.717 unknown GCCause Allocation Failure
117.0 66.67 0.00 0.00 90.93 48.10 150 6.268 5 1.571 7.839 Allocation Failure unknown GCCause
117.3 66.67 0.00 0.00 90.93 48.10 150 6.268 5 1.571 7.839 Allocation Failure unknown GCCause
117.5 66.67 0.00 0.00 90.93 48.10 150 6.268 5 1.571 7.839 Allocation Failure unknown GCCause
118.8 0.00 0.00 100.00 92.68 49.94 150 6.268 6 2.293 8.560 Allocation Failure unknown GCCause
119.6 0.00 0.00 100.00 92.76 52.56 150 6.268 7 2.673 8.941 Allocation Failure unknown GCCause
119.8 0.00 0.00 100.00 92.76 52.56 150 6.268 7 2.673 8.941 Allocation Failure unknown GCCause
120.3 0.00 0.00 99.90 92.70 55.38 150 6.268 8 3.015 9.282 Allocation Failure unknown GCCause
120.6 0.00 0.00 100.00 92.70 55.38 150 6.268 8 3.015 9.282 Allocation Failure unknown GCCause
122.3 12.08 14.75 100.00 87.68 59.13 153 6.361 8 3.366 9.727 unknown GCCause Allocation Failure
122.8 17.74 95.00 100.00 87.68 59.13 154 6.422 8 3.366 9.788 unknown GCCause Allocation Failure
123.3 18.55 0.00 91.70 87.68 59.14 155 6.480 8 3.366 9.846 unknown GCCause Allocation Failure
127.2 52.50 0.00 96.64 87.68 59.20 161 6.841 8 3.366 10.208 unknown GCCause Allocation Failure
133.6 72.12 69.81 100.00 87.93 59.27 167 7.231 8 3.366 10.597 unknown GCCause Allocation Failure
136.0 67.86 66.07 100.00 88.42 59.31 171 7.469 8 3.366 10.836 unknown GCCause Allocation Failure
137.7 65.45 69.20 100.00 88.64 59.33 174 7.662 8 3.366 11.028 unknown GCCause Allocation Failure
138.2 70.91 0.00 100.00 88.77 59.33 175 7.731 8 3.366 11.097 unknown GCCause Allocation Failure
138.7 0.00 69.09 100.00 88.89 59.34 176 7.792 8 3.366 11.158 unknown GCCause Allocation Failure
140.5 4.72 71.30 100.00 89.12 59.35 178 7.910 8 3.366 11.276 unknown GCCause Allocation Failure
141.7 62.50 72.64 100.00 89.36 59.35 180 8.070 8 3.366 11.437 unknown GCCause Allocation Failure
144.5 79.59 76.00 100.00 90.09 59.37 185 8.398 8 3.366 11.764 unknown GCCause Allocation Failure
145.0 6.00 80.00 100.00 90.09 59.38 186 8.510 8 3.366 11.876 unknown GCCause Allocation Failure
145.5 83.85 0.00 100.00 90.21 59.39 187 8.571 8 3.366 11.938 unknown GCCause Allocation Failure
146.0 0.00 83.67 96.51 90.33 59.39 188 8.642 8 3.366 12.008 unknown GCCause Allocation Failure
148.5 85.64 0.00 0.00 90.93 59.40 192 8.976 9 3.366 12.342 Allocation Failure unknown GCCause
149.5 0.00 0.00 100.00 92.77 62.11 192 8.976 10 3.729 12.704 Allocation Failure unknown GCCause
150.5 0.00 0.00 100.00 92.90 65.32 192 8.976 11 4.101 13.076 Allocation Failure unknown GCCause
150.7 0.00 0.00 100.00 92.90 65.32 192 8.976 11 4.101 13.076 Allocation Failure unknown GCCause
151.2 0.00 0.00 100.00 92.89 68.47 192 8.976 12 4.438 13.414 Allocation Failure unknown GCCause
151.5 0.00 0.00 100.00 92.89 68.47 192 8.976 12 4.438 13.414 Allocation Failure unknown GCCause
152.0 0.00 5.21 100.00 87.95 71.52 193 8.976 12 4.756 13.732 unknown GCCause Allocation Failure
154.2 21.55 0.00 93.09 87.95 71.66 197 9.204 12 4.756 13.960 unknown GCCause Allocation Failure
...
165.8 0.00 22.03 23.28 87.95 72.33 197 9.265 12 4.756 14.021 unknown GCCause No GC
166.0 0.00 22.03 35.03 87.95 72.54 197 9.265 12 4.756 14.021 unknown GCCause No GC
{noformat}
When the same set of records was given to the patched ParseSegment:
{noformat}
13/09/18 21:16:27 INFO mapred.JobClient: Job complete: job_201308301130_0519
13/09/18 21:16:27 INFO mapred.JobClient: Counters: 32
13/09/18 21:16:27 INFO mapred.JobClient: ParserStatus
13/09/18 21:16:27 INFO mapred.JobClient: failed=96
13/09/18 21:16:27 INFO mapred.JobClient: success=52393
13/09/18 21:16:27 INFO mapred.JobClient: Job Counters
13/09/18 21:16:27 INFO mapred.JobClient: Launched reduce tasks=1
13/09/18 21:16:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=228650
13/09/18 21:16:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/18 21:16:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/18 21:16:27 INFO mapred.JobClient: Launched map tasks=3
13/09/18 21:16:27 INFO mapred.JobClient: Data-local map tasks=3
13/09/18 21:16:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=94147
13/09/18 21:16:27 INFO mapred.JobClient: File Input Format Counters
13/09/18 21:16:27 INFO mapred.JobClient: Bytes Read=169842629
13/09/18 21:16:27 INFO mapred.JobClient: File Output Format Counters
13/09/18 21:16:27 INFO mapred.JobClient: Bytes Written=0
13/09/18 21:16:27 INFO mapred.JobClient: FileSystemCounters
13/09/18 21:16:27 INFO mapred.JobClient: FILE_BYTES_READ=79839439
13/09/18 21:16:27 INFO mapred.JobClient: HDFS_BYTES_READ=169847735
13/09/18 21:16:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=137342636
13/09/18 21:16:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=67039060
13/09/18 21:16:27 INFO mapred.JobClient: Map-Reduce Framework
13/09/18 21:16:27 INFO mapred.JobClient: Map output materialized bytes=57153030
13/09/18 21:16:27 INFO mapred.JobClient: Map input records=65312
13/09/18 21:16:27 INFO mapred.JobClient: Reduce shuffle bytes=57153030
13/09/18 21:16:27 INFO mapred.JobClient: Spilled Records=125780
13/09/18 21:16:27 INFO mapred.JobClient: Map output bytes=202984117
13/09/18 21:16:27 INFO mapred.JobClient: Total committed heap usage (bytes)=679673856
13/09/18 21:16:27 INFO mapred.JobClient: CPU time spent (ms)=271210
13/09/18 21:16:27 INFO mapred.JobClient: Map input bytes=169837445
13/09/18 21:16:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=435
13/09/18 21:16:27 INFO mapred.JobClient: Combine input records=0
13/09/18 21:16:27 INFO mapred.JobClient: Reduce input records=52489
13/09/18 21:16:27 INFO mapred.JobClient: Reduce input groups=52489
13/09/18 21:16:27 INFO mapred.JobClient: Combine output records=0
13/09/18 21:16:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=1009508352
13/09/18 21:16:27 INFO mapred.JobClient: Reduce output records=52489
13/09/18 21:16:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4003405824
13/09/18 21:16:27 INFO mapred.JobClient: Map output records=52489
13/09/18 21:16:27 INFO parse.ParseSegment: ParseSegment: finished at 2013-09-18 21:16:27, elapsed: 00:03:20
{noformat}
jstat -gccause results (only one sampled line reported a current GC cause other than "No GC"):
{noformat}
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
...
29.1 0.00 75.00 99.16 77.81 99.76 20 0.197 0 0.000 0.197 unknown GCCause Allocation Failure
...
94.4 0.00 50.00 94.38 79.28 99.63 199 1.065 0 0.000 1.065 unknown GCCause No GC
94.4 0.00 50.00 94.38 79.28 99.68 199 1.065 0 0.000 1.065 unknown GCCause No GC
{noformat}
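To put the two runs side by side, the relative reductions can be worked out from the figures above. The following is only a small arithmetic sketch: the class and method names are made up for illustration, the job figures are copied from the two JobClient logs, and the GC time is the GCT value of the last jstat sample shown for each run (a single JVM, so it is indicative rather than a job-wide total).

```java
import java.util.Locale;

final class RunComparison {

    /** Percentage by which {@code after} is smaller than {@code before}. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the unpatched (before) and patched (after) runs.
        String[] names = {
            "elapsed time (s)",            // 00:05:04 vs 00:03:20
            "CPU time spent (ms)",
            "SLOTS_MILLIS_MAPS",
            "physical memory (bytes)",
            "virtual memory (bytes)",
            "jstat GCT, last sample (s)"
        };
        double[] before = {304.0, 398190.0, 402103.0, 1939349504.0, 32182784000.0, 14.021};
        double[] after  = {200.0, 271210.0, 228650.0, 1009508352.0, 4003405824.0, 1.065};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-28s %5.1f%% lower%n",
                    names[i], reductionPercent(before[i], after[i]));
        }
    }
}
```

The parse counters (failed=96, success=52393) and the record counts are identical in both runs, so the differences come from resource usage and not from the amount of work done. The full GC count (FGC) in the last jstat samples also drops from 12 to 0.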
> OOM in ParseSegment Phase
> -------------------------
>
> Key: NUTCH-1640
> URL: https://issues.apache.org/jira/browse/NUTCH-1640
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.7
> Environment: RHEL 6.2 x86_64
> Reporter: Mitesh Singh Jat
> Attachments: NUTCH-1640.patch
>
>
> The Nutch ParseSegment phase fails after 2 runs on the same TaskTracker, with the following exception:
> {noformat}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:640)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:553)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
> at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
> at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:802)
> at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3315)
> at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3287)
> at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2316)
> at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3710)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
> at org.apache.hadoop.ipc.Client.call(Client.java:1118)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at $Proxy1.fatalError(Unknown Source)
> at org.apache.hadoop.mapred.Child.main(Child.java:310)
> {noformat}
> Similar parsing done in the Nutch Fetcher phase (fetcher.parse=true, fetcher.store.content=false) does not show this issue.
> Hence, on analysing the code of Fetcher and ParseSegment, the issue seems to be
> related to how parseResult is created for each URL in ParseSegment.java.
> {code}
> ParseResult parseResult = null;
> try {
>   parseResult = new ParseUtil(getConf()).parse(content); // <*****
> } catch (Exception e) {
>   LOG.warn("Error parsing: " + key + ": " + StringUtils.stringifyException(e));
>   return;
> }
> {code}
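The contents of NUTCH-1640.patch are not shown in this message, so the following is only a sketch of one plausible remedy: build the parsing helper once per task and reuse it for every record, instead of constructing it inside the per-URL call. It uses no Nutch classes. `CostlyParser` and `ReusingMapper` are hypothetical stand-ins, and the assumption that the helper owns a thread pool is an illustration chosen because it would fit the "unable to create new native thread" error above.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a parsing helper that owns a thread pool (assumption).
final class CostlyParser {
    static final AtomicInteger CREATED = new AtomicInteger();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    CostlyParser() {
        CREATED.incrementAndGet(); // count how many helpers get constructed
    }

    String parse(String content) {
        Callable<String> task = () -> content.trim(); // placeholder for real parsing
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        pool.shutdown(); // release the worker thread
    }
}

// Hypothetical mapper: the helper is created once in configure(), not once per record.
final class ReusingMapper {
    private CostlyParser parser;

    void configure() {
        parser = new CostlyParser();
    }

    String map(String content) {
        return parser.parse(content); // reuse the same helper for every URL
    }

    void close() {
        parser.close();
    }
}

final class ParserReuseDemo {
    public static void main(String[] args) {
        ReusingMapper mapper = new ReusingMapper();
        mapper.configure();
        for (String page : Arrays.asList(" page one ", " page two ", " page three ")) {
            mapper.map(page);
        }
        mapper.close();
        System.out.println("parsers created: " + CostlyParser.CREATED.get());
    }
}
```

With this shape the number of helpers, and of any threads they own, stays constant per task however many records are parsed. Constructing the helper inside `map` would instead create one per URL.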