Posted to dev@nutch.apache.org by "Jason Calabrese (JIRA)" <ji...@apache.org> on 2006/03/19 03:45:58 UTC

[jira] Created: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

PdfParser and RSSParser Log4j appender redirection
--------------------------------------------------

         Key: NUTCH-236
         URL: http://issues.apache.org/jira/browse/NUTCH-236
     Project: Nutch
        Type: Bug
    Versions: 0.8-dev    
 Environment: Linux, Nutch embedded in another application
    Reporter: Jason Calabrese
    Priority: Minor


I just found a bug in the way the log messages from Hadoop LogFormatter are 
added as a new appender to the Log4j rootLogger in the PdfParser and RSSParser.

Since a new Log4j appender is created and added to the root logger each time 
these classes are loaded, log messages start getting repeated.

I'm using Nutch/Hadoop inside another application, so others may not be seeing 
this problem.

I think the simple fix is as easy as setting a name for the new appender 
before adding it and then, at the beginning of the constructor, checking to see 
if it's already been added.
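
The name-and-check idea above can be sketched roughly as follows. This is only an illustration, not the code in PdfParser or RSSParser: it uses java.util.logging from the JDK as a stand-in for Log4j so that it runs without extra jars, and a marker handler class plays the role of the appender name. RootHandlerGuard, RedirectHandler and ensureRedirectHandler are hypothetical names. With Log4j 1.x the same guard would presumably call setName on the appender and check Logger.getRootLogger().getAppender(name) before calling addAppender.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;

/** Hypothetical helper: attaches a redirect handler to the root logger at most once. */
final class RootHandlerGuard {

    /** Marker subclass so the handler can be recognised later (stands in for an appender name). */
    static final class RedirectHandler extends ConsoleHandler {}

    private RootHandlerGuard() {}

    /** Adds the handler only if none is registered yet; returns true if it was added. */
    static synchronized boolean ensureRedirectHandler() {
        Logger root = Logger.getLogger(""); // the empty name is the JUL root logger
        for (Handler h : root.getHandlers()) {
            if (h instanceof RedirectHandler) {
                return false; // already registered, so do not add a second copy
            }
        }
        root.addHandler(new RedirectHandler());
        return true;
    }

    /** Counts how many redirect handlers are currently attached to the root logger. */
    static synchronized int countRedirectHandlers() {
        int count = 0;
        for (Handler h : Logger.getLogger("").getHandlers()) {
            if (h instanceof RedirectHandler) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Simulate the parser classes being constructed several times.
        for (int i = 0; i < 3; i++) {
            ensureRedirectHandler();
        }
        System.out.println("redirect handlers attached: " + countRedirectHandlers());
    }
}
```

Calling the guard from a constructor or static initializer any number of times should then leave a single handler on the root logger, so each message is written only once.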

Also, as the comment says in both the PdfParser and RSSParser, this code should 
be moved to a common place.

I'd be happy to make these changes and submit a patch, but I wanted to know if 
the change would be welcome first.  Also, does anyone know a good place for 
the new util method?  Maybe a new static method on LogFormatter, but then the 
log4j jar would need to be added to the common lib and the classpath.

It would also be good to create a property in nutch-site.xml that could disable this logging appender redirection.
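
A switch like that might look roughly like the sketch below. It is only a guess at the shape: java.util.Properties stands in for the Nutch configuration so the example runs on the JDK alone, and both the class name RedirectSwitch and the property name parse.log.redirect.enabled are hypothetical, not existing Nutch settings. Defaulting to true is an assumption made here to keep the current behaviour when the property is absent.

```java
import java.util.Properties;

/** Hypothetical helper that decides whether the appender redirection should be installed. */
final class RedirectSwitch {

    /** Hypothetical property name; not an actual Nutch configuration key. */
    static final String KEY = "parse.log.redirect.enabled";

    private RedirectSwitch() {}

    /** Returns true unless the property is present and set to something other than "true". */
    static boolean redirectEnabled(Properties conf) {
        // Default to "true" so that existing behaviour is kept when the property is missing.
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default: " + redirectEnabled(conf));
        conf.setProperty(KEY, "false");
        System.out.println("after disabling: " + redirectEnabled(conf));
    }
}
```

The parsers would consult such a check once, before touching the root logger, and skip the redirection entirely when it returns false.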

As I said above, I'd be more than happy to do this work; I'll just need some guidance to follow the project's conventions.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12414599 ] 

Chris A. Mattmann commented on NUTCH-236:
-----------------------------------------

Hi Jason, 

   I'll have a patch prepared for this issue shortly, and I'll attach it to JIRA by this Sunday night.

Thanks,
  Chris





[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-236?page=all ]

Chris A. Mattmann updated NUTCH-236:
------------------------------------

    Attachment: NUTCH-236.Mattmann.060806.patch.txt

Okay, a bit late, as usual with me :-)

This patch implements Jason's suggestion for the following two issues:

1. Move log4j root logger redirection and appender code to common place (moved to utility method in org.apache.nutch.parse.ParseUtil)

2. Rename appender before adding it, and make sure it hasn't been added already before adding it

Jason's original suggestion was to move the common root logger redirection code to LogFormatter in Hadoop, but I chose not to do that, in order to keep the code within Nutch and avoid a patch that spans the two projects. If there is a pressing need to have the utility code within Hadoop, however, I can probably move the method to LogFormatter there. Additionally, I only ran unit-level tests on this; I didn't run a full system test in an environment where the behavior that caused this issue has already been seen. It would be great if someone like Jason could test this in his own environment and see if it fixes the issue.






[jira] Closed: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Jerome Charron (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-236?page=all ]
     
Jerome Charron closed NUTCH-236:
--------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed

As a side effect, this issue is solved by NUTCH-303, since Nutch now uses Jakarta Commons Logging with log4j as the default implementation.





[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-236?page=all ]

Chris A. Mattmann updated NUTCH-236:
------------------------------------

    Due Date: 05/Jun/06




[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12371664 ] 

Chris A. Mattmann commented on NUTCH-236:
-----------------------------------------

> I'd be happy to make these changes and submit a patch, but I wanted to know if
> the change would be welcome first.

The change makes sense to me. +1

> Also does anyone know a good place for the new util method?

I believe there is a generic lib-log4j plugin that currently just contains the common log4j jars that other plugins depend on. Maybe that would be a good place to put it. What do others think?  I don't think any log4j jar would need to be added to the common lib or the classpath in that case.

> It would also be good to create a property in nutch-site.xml that could disable this logging appender redirection. 

Yup, I think so. 

> Like I said above I'd be more than happy to do this work, I'll just need some guidance to follow the project's conventions.

I think the steps to create a patch are roughly as follows (paraphrased from Doug a while back):

1. svn checkout [latest nutch revision]
2. make changes in that checked out version
3. if you added any new files, type:

svn add /path/to/new/files

4. type svn status to make sure that your changes are being seen by svn

5. type svn diff > mypatch.txt

As for coding standards, I believe that Nutch uses Sun's coding standards. More info about how to contribute to Nutch is available on the Wiki at this page:

http://wiki.apache.org/nutch/HowToContribute









[jira] Assigned: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-236?page=all ]

Chris A. Mattmann reassigned NUTCH-236:
---------------------------------------

    Assign To: Chris A. Mattmann

