Posted to dev@nutch.apache.org by "Jeremy Huylebroeck (JIRA)" <ji...@apache.org> on 2006/08/02 02:06:13 UTC

[jira] Created: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file

Fetcher ignores the fetcher.parse value configured in config file
-----------------------------------------------------------------

                 Key: NUTCH-337
                 URL: http://issues.apache.org/jira/browse/NUTCH-337
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.8, 0.9
            Reporter: Jeremy Huylebroeck
            Priority: Trivial


When Fetcher is called from the command line with the noParsing parameter, everything is fine.
If noParsing is not given, the value from nutch-site.xml (or nutch-default.xml) should be used, but "true" is always passed to the call to fetch.
The value should come from the conf.
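
The expected precedence could be sketched roughly as follows. This is only an illustration, not the actual Fetcher code or the attached patch: the class `ParseFlagSketch`, the method `resolveParsing`, the plain `Map` standing in for the Nutch configuration, the `-noParsing` spelling of the flag, and the fallback to `true` when the key is missing are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in, not Nutch's Fetcher: it only shows the intended precedence. */
class ParseFlagSketch {

    /**
     * Decide whether the fetcher should parse.
     * conf stands in for the merged nutch-default.xml / nutch-site.xml values.
     */
    static boolean resolveParsing(Map<String, String> conf, String[] args) {
        // Start from the configured value; fall back to true only if the key is absent (assumed default).
        String configured = conf.get("fetcher.parse");
        boolean parsing = configured == null || Boolean.parseBoolean(configured.trim());

        // An explicit noParsing flag on the command line overrides the configuration.
        for (String arg : args) {
            if ("-noParsing".equals(arg)) {
                parsing = false;
            }
        }
        return parsing;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.parse", "false");
        // No flag given: the configured value should win.
        System.out.println(resolveParsing(conf, new String[] {"segment"}));
    }
}
```

The point of the sketch is that the hard-coded `true` is replaced by a lookup, so the command-line flag only narrows what the configuration already says.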


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-337?page=all ]

Stefan Groschupf updated NUTCH-337:
-----------------------------------

    Attachment: respectFetcherParsePropertyV1.patch

Hi Jeremy, thanks for catching this. I attached a fix. It should be easy for a contributor to commit this to trunk.


        

[jira] Updated: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-337?page=all ]

Stefan Groschupf updated NUTCH-337:
-----------------------------------

    Priority: Major  (was: Trivial)


        

[jira] Closed: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-337?page=all ]

Andrzej Bialecki  closed NUTCH-337.
-----------------------------------

    Fix Version/s: 0.8.1
                   0.9.0
       Resolution: Fixed

Patch applied to branch-0.8 and trunk. Thanks!


        

RE: nutch

Posted by an...@orbita1.ru.
My settings:
....
<property>
  <name>mapred.local.dir</name>
  <value>/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of
  directories on different devices in order to spread disk i/o.
  </description>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.
  </description>
</property>
....

The device mounted on "/" has 115G of free space.

[root@xxxxx /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             133G   13G  113G  11% /

Does anybody have other ideas?

-----Original Message-----
From: Sami Siren [mailto:ssiren@gmail.com] 
Sent: Wednesday, August 02, 2006 6:01 PM
To: nutch-dev@lucene.apache.org
Subject: Re: nutch
Importance: High

Most probably you have run out of space in the tmp (local) filesystem.

Use properties like

<property>
  <name>mapred.system.dir</name>
  <value><!-- path to a fs that has lots of space --></value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value><!-- path to a fs that has lots of space --></value>
</property>

in hadoop-site.xml to get over this problem.


anton@orbita1.ru wrote:

>I forgot.... ;-) One more question:
>Is this a problem with Nutch or Hadoop?
>
>-----Original Message-----
>From: anton@orbita1.ru [mailto:anton@orbita1.ru] 
>Sent: Wednesday, August 02, 2006 11:38 AM
>To: nutch-dev@lucene.apache.org
>Subject: nutch
>Importance: High
>
>I use Nutch 0.8 (mapred). Nutch is running on 3 servers.
>When Nutch tries to index a segment, I get an error on the tasktracker:
><skipped>




Re: nutch

Posted by Sami Siren <ss...@gmail.com>.
Most probably you have run out of space in the tmp (local) filesystem.

Use properties like

<property>
  <name>mapred.system.dir</name>
  <value><!-- path to a fs that has lots of space --></value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value><!-- path to a fs that has lots of space --></value>
</property>

in hadoop-site.xml to get over this problem.
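
Since the error comes from the local filesystem, it may help to check how much space is usable on the partition that actually holds each local directory, on every tasktracker node. The following is a minimal sketch under assumptions: the class `LocalDirSpaceSketch` and its methods are hypothetical helpers, the directory list is passed in by hand rather than read from hadoop-site.xml, and `/hadoop/mapred/local` is just the example value from this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports usable space for each directory in a comma-separated list. */
class LocalDirSpaceSketch {

    /** Split a comma-separated directory list, the form mapred.local.dir accepts. */
    static List<File> parseDirs(String value) {
        List<File> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        return dirs;
    }

    /** Usable bytes on the partition holding dir; walks up to the nearest existing ancestor. */
    static long usableBytes(File dir) {
        File f = dir.getAbsoluteFile();
        while (f != null && !f.exists()) {
            f = f.getParentFile();
        }
        return f == null ? 0L : f.getUsableSpace();
    }

    public static void main(String[] args) {
        // Pass the value of mapred.local.dir as the first argument; the default is only an example.
        String value = args.length > 0 ? args[0] : "/hadoop/mapred/local";
        for (File dir : parseDirs(value)) {
            System.out.printf("%s: %d MB usable%n", dir, usableBytes(dir) / (1024 * 1024));
        }
    }
}
```

If a node reports far less space than expected, the directory the tasks write to is probably on a different partition than the one checked with df.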


anton@orbita1.ru wrote:

>I forgot.... ;-) One more question:
>Is this a problem with Nutch or Hadoop?
>
>-----Original Message-----
>From: anton@orbita1.ru [mailto:anton@orbita1.ru] 
>Sent: Wednesday, August 02, 2006 11:38 AM
>To: nutch-dev@lucene.apache.org
>Subject: nutch
>Importance: High
>
>I use Nutch 0.8 (mapred). Nutch is running on 3 servers.
>When Nutch tries to index a segment, I get an error on the tasktracker:
><skipped>


RE: nutch

Posted by an...@orbita1.ru.
I forgot.... ;-) One more question:
Is this a problem with Nutch or Hadoop?

-----Original Message-----
From: anton@orbita1.ru [mailto:anton@orbita1.ru] 
Sent: Wednesday, August 02, 2006 11:38 AM
To: nutch-dev@lucene.apache.org
Subject: nutch
Importance: High

I use Nutch 0.8 (mapred). Nutch is running on 3 servers.
When Nutch tries to index a segment, I get an error on the tasktracker:
<skipped>

nutch

Posted by an...@orbita1.ru.
I use Nutch 0.8 (mapred). Nutch is running on 3 servers.
When Nutch tries to index a segment, I get an error on the tasktracker:
060727 215111 task_0025_r_000000_1  SEVERE FSError from child
060727 215111 task_0025_r_000000_1 org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:152)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.fs.FSDataOutputStream$Summer.write(FSDataOutputStream.java:69)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:98)
060727 215111 task_0025_r_000000_1      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
060727 215111 task_0025_r_000000_1      at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
060727 215111 task_0025_r_000000_1      at java.io.DataOutputStream.write(DataOutputStream.java:90)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:192)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:873)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:760)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.io.SequenceFile$Sorter.mergePass(SequenceFile.java:696)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:522)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:755)
060727 215111 task_0025_r_000000_1 Caused by: java.io.IOException: No space left on device
060727 215111 task_0025_r_000000_1      at java.io.FileOutputStream.writeBytes(Native Method)
060727 215111 task_0025_r_000000_1      at java.io.FileOutputStream.write(FileOutputStream.java:260)
060727 215111 task_0025_r_000000_1      at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
060727 215111 task_0025_r_000000_1      ... 12 more


But on the server with the tasktracker, free space on the HDD is 115G. I tried
getting the segment from DFS. The segment occupies 2.4G on the HDD. Why do I get these errors?
Can anybody help me solve this problem?