Posted to dev@nutch.apache.org by "Chris Schneider (JIRA)" <ji...@apache.org> on 2006/08/05 17:04:13 UTC

[jira] Created: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

Nutch commands log to nutch/logs/hadoop.logs by default
-------------------------------------------------------

                 Key: NUTCH-342
                 URL: http://issues.apache.org/jira/browse/NUTCH-342
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Chris Schneider
            Priority: Minor


If (by default) Nutch commands are going to send their output to a file named "hadoop.log", then it seems like the default location for this file should be the same location where Hadoop is putting its hadoop.log file (i.e., $HADOOP_LOG_DIR). Currently, if I set HADOOP_LOG_DIR to a special location (via hadoop-env.sh), this has no effect on where Nutch commands send their output.

Some would probably suggest that I could just set NUTCH_LOG_DIR to $HADOOP_LOG_DIR myself. I still think that it should be defaulted this way in the nutch script. However, I'm unaware of an elegant way to modify such Nutch environment variables anyway. The hadoop-env.sh file provides a convenient place to modify Hadoop environment variables, but doing the same for Nutch environment variables presumably requires you to modify .bash_profile or a similar user script file (which is the way I used to accomplish this kind of thing with Nutch 0.7).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

Posted by "Chris Schneider (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/NUTCH-342?page=all ]

Chris Schneider updated NUTCH-342:
----------------------------------

    Attachment: NUTCH-342.patch

Here's a patch that defaults NUTCH_LOG_DIR to $HADOOP_LOG_DIR and NUTCH_LOGFILE to $HADOOP_LOG_FILE.
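The attachment itself is not reproduced in this thread, so the following is only a sketch of the kind of defaulting the comment describes, not the contents of NUTCH-342.patch. The function names nutch_log_dir and nutch_logfile are hypothetical, the fallbacks to $NUTCH_HOME/logs and hadoop.log are assumptions taken from the issue description, and the variable name HADOOP_LOG_FILE follows the spelling used in this message. It only has an effect if the Hadoop variables are actually set in the environment the nutch script runs in.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from NUTCH-342.patch.

# Resolve the log directory: an explicit NUTCH_LOG_DIR wins, then
# HADOOP_LOG_DIR, then an assumed fallback of $NUTCH_HOME/logs.
nutch_log_dir() {
  echo "${NUTCH_LOG_DIR:-${HADOOP_LOG_DIR:-${NUTCH_HOME:-.}/logs}}"
}

# Resolve the log file name the same way; the hadoop.log fallback
# is an assumption based on the file name mentioned in the issue.
nutch_logfile() {
  echo "${NUTCH_LOGFILE:-${HADOOP_LOG_FILE:-hadoop.log}}"
}

echo "log dir:  $(nutch_log_dir)"
echo "log file: $(nutch_logfile)"
```

The nested ${VAR:-default} expansions keep any value the user has already chosen, so setting NUTCH_LOG_DIR by hand still overrides the Hadoop setting.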

some questions

Posted by an...@orbita1.ru.

I plan to use Nutch 0.8 on several computers with DFS, but I'm worried
about Nutch's requirements for free HDD space.

For example, suppose I have

1)     a server with the job tracker and namenode
2)     5 servers with task trackers and 20 GB HDDs
3)     5 servers with datanodes and 20 GB HDDs as well (DFS, with a
replication factor of 1)

I have some questions:

1) Is this HDD space enough to run the task trackers?

2) How can I estimate the free HDD space needed for the servers with
task trackers and for the server with the job tracker and namenode?

3) Will I be able to increase the data storage space by increasing the
number of servers with datanodes? Or will adding datanodes alone not be
enough?

[jira] Commented: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/NUTCH-342?page=comments#action_12428922 ] 
            
Stefan Groschupf commented on NUTCH-342:
----------------------------------------

We should clean up logging in Nutch in general ASAP!
The way things are configured today is anything but elegant or clean. :-(

[jira] Commented: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

Posted by "Chris Schneider (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/NUTCH-342?page=comments#action_12426039 ] 
            
Chris Schneider commented on NUTCH-342:
---------------------------------------

I apologize for my confusion. I had been thinking that hadoop-env.sh was getting sourced when a Nutch command was run; it is not. Thus, $HADOOP_LOG_DIR and $HADOOP_LOG_FILE are not set when executing Nutch commands. For now, I think it makes most sense for me to set NUTCH_LOG_DIR and NUTCH_LOGFILE to the same locations as $HADOOP_LOG_DIR and $HADOOP_LOG_FILE via .bash_profile, etc. I consider this awkward, but am unsure about how best to address this design problem. I'm beginning to think that NUTCH_LOGFILE should default to something like "nutch-$USER-$COMMAND-`hostname`.log", which would seem more appropriate to find within the $NUTCH_HOME/logs directory.
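As a rough illustration of the naming scheme suggested above, a per-command log file name could be assembled along these lines. This is a sketch only: the function name nutch_default_logfile is hypothetical, the command name is passed in as an argument rather than read from a COMMAND variable, and the "unknown" fallback for an unset USER is an assumption.

```shell
#!/bin/sh
# Hypothetical helper; not part of the nutch script.
# Builds "nutch-$USER-$COMMAND-`hostname`.log" for the given command.
nutch_default_logfile() {
  command_name="$1"   # the Nutch command being run, e.g. "crawl"
  echo "nutch-${USER:-unknown}-${command_name}-$(hostname).log"
}

# Example: choose a default only when NUTCH_LOGFILE is not already set.
NUTCH_LOGFILE="${NUTCH_LOGFILE:-$(nutch_default_logfile crawl)}"
echo "$NUTCH_LOGFILE"
```

Because the name includes the user, the command, and the host, logs from different Nutch commands would no longer share a single file called hadoop.log.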
