Posted to user@nutch.apache.org by Bayu Widyasanyata <bw...@gmail.com> on 2013/11/01 00:54:05 UTC

Re: How to set JVM heap size on crawl script?

Hi,

One more question about NUTCH_OPTS:
is it only for additional Java options, or can we also pass Nutch options
such as topN, depth, etc.?

I'm asking because I couldn't find any documentation on the crawl script
beyond what is mentioned here [1].

[1] http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script


On Thu, Oct 31, 2013 at 8:43 PM, Bayu Widyasanyata
<bw...@gmail.com> wrote:

> Hi Sebastian,
>
> Thanks for the hint.
>
> ---
> wassalam,
> [bayu]
>
> /sent from Android phone/
> On Oct 30, 2013 7:54 PM, "Sebastian Nagel" <wa...@googlemail.com>
> wrote:
>
>> Hi,
>>
>> the script bin/crawl executes bin/nutch for every step (inject, fetch,
>> etc.).
>>
>> bin/nutch makes use of two environment variables (see comments in
>> bin/nutch
>> ):
>>  NUTCH_HEAPSIZE  (in MB)
>>  NUTCH_OPTS         Extra Java runtime options
>>
>>  export NUTCH_HEAPSIZE=2048
>> should work, but so does
>>  export NUTCH_OPTS="-Xmx2048m"
>>
>> The latter also allows you to add more Java options, separated by spaces.
>>
>> Sebastian
>>
>>
>> 2013/10/30 Bayu Widyasanyata <bw...@gmail.com>
>>
>> > Hi All,
>> >
>> > When I ran the crawl script [1] (not nutch's crawl command), I got a
>> > Java OOM (heap space) error:
>> >
>> > 2013-10-29 12:56:25,407 WARN  mapred.LocalJobRunner - job_local1484958909_0001
>> > java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
>> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> >        at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
>> >        at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
>> >        at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
>> >        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
>> >        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
>> >        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>> >        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:517)
>> >        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:399)
>> >        at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1698)
>> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
>> >        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>> >        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>> >        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >        at java.lang.Thread.run(Thread.java:744)
>> >
>> > 2013-10-29 12:56:25,787 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>> >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>> >        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>> >        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>> >
>> > I use Nutch 1.7 and JDK 1.7.0_45.
>> >
>> > How do I set the Java max heap size (the -Xmx option) for the crawl script?
>> >
>> > Thanks in advance.-
>> >
>> > [1]
>> > http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
>> >
>> > --
>> > wassalam,
>> > [bayu]
>> >
>>
>


-- 
wassalam,
[bayu]

Re: How to set JVM heap size on crawl script?

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

the arguments are:
 bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
- depth is numberOfRounds
- topN is fixed (50000, see the variable sizeFetchlist);
  you have to modify bin/crawl to your needs.
  That's intentional: bin/crawl is more an
  example than an all-purpose tool.
- etc.: it depends. If an option is set via command-line
  arguments to the bin/nutch commands, modify bin/crawl;
  otherwise set the corresponding properties
  in nutch-site.xml.
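
For illustration, a complete call (the paths, Solr URL, and round
count are made up):

 # 3 generate/fetch/parse/update rounds, crawl data under TestCrawl/
 bin/crawl urls TestCrawl http://localhost:8983/solr/ 3

To change topN you would edit the assignment of sizeFetchlist in
bin/crawl itself, roughly:

 # in bin/crawl; 10000 is an arbitrary example value
 sizeFetchlist=10000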

Sebastian

On 11/03/2013 02:44 PM, Bayu Widyasanyata wrote:
> Then, how do we set the Nutch options topN, depth, etc. in the crawl script?
> 
> Thanks.-


Re: How to set JVM heap size on crawl script?

Posted by Bayu Widyasanyata <bw...@gmail.com>.
Then, how do we set the Nutch options topN, depth, etc. in the crawl script?

Thanks.-


On Fri, Nov 1, 2013 at 9:05 PM, Sebastian Nagel
<wa...@googlemail.com> wrote:

> Hi Bayu,
>
> the short answer is: no.


-- 
wassalam,
[bayu]

Re: How to set JVM heap size on crawl script?

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Bayu,

the short answer is: no.

The detailed answer:

NUTCH_OPTS is used to pass arguments to the Java VM.
This also includes Java system properties.
You could define Nutch/Hadoop configuration properties using variables,
which are then substituted with the values of Java system properties, e.g.
  <property>
   <name>my.prop</name>
   <value>${my.prop}</value>
  </property>
Setting
 NUTCH_OPTS="-Xmx2048m -Dmy.prop=XYZ"
would allow you to pass the desired value XYZ via the command line or NUTCH_OPTS.
See https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/conf/Configuration.html
for details about variable substitution.
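
Put together, a minimal sketch (my.prop is a made-up property name;
the heap size and the value XYZ are just examples):

In nutch-site.xml:

  <property>
   <name>my.prop</name>
   <value>${my.prop}</value>
  </property>

In the shell, before starting the crawl:

 export NUTCH_HEAPSIZE=2048
 export NUTCH_OPTS="-Dmy.prop=XYZ"
 bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>

Every bin/nutch call made by bin/crawl then runs with a 2048 MB heap
and sees my.prop resolved to XYZ.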

One remark: you may have seen commands like
 bin/nutch org.apache.nutch.indexer.IndexingJob -Dsolr.server.url="..." ...
That's not a Java system property, because the argument "-D..." comes after
the class to be run. Most (if not all) Nutch tools/commands use ToolRunner.run(),
which supports generic options (among them -Dproperty=value).
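
Side by side, with the same placeholders as above:

 # JVM system property: must come before the main class, so with
 # bin/nutch it has to go into NUTCH_OPTS
 export NUTCH_OPTS="-Dmy.prop=XYZ"

 # Hadoop generic option: comes after the command/class and is
 # parsed by ToolRunner, not by the JVM
 bin/nutch org.apache.nutch.indexer.IndexingJob -Dsolr.server.url="..." ...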

Sebastian

On 11/01/2013 12:54 AM, Bayu Widyasanyata wrote:
> Hi,
> 
> One more question about NUTCH_OPTS:
> is it only for additional Java options, or can we also pass Nutch options
> such as topN, depth, etc.?