Posted to dev@nutch.apache.org by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2007/05/12 21:50:15 UTC

[jira] Created: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Change HtmlParseFilter 's to return ParseResult object instead of Parse object
------------------------------------------------------------------------------

                 Key: NUTCH-485
                 URL: https://issues.apache.org/jira/browse/NUTCH-485
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.0.0
         Environment: All
            Reporter: Gal Nitzan
             Fix For: 1.0.0


The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.

A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.
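To make the request concrete, here is a minimal Java sketch of a filter chain that passes a multi-entry result from filter to filter. All names (FilterChainSketch, SimpleParse, SimpleParseResult, Filter, runChain) are hypothetical, simplified stand-ins rather than Nutch's actual classes. Stopping at the first unsuccessful result is a policy assumed only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a chain of filters that pass a multi-entry result along. */
class FilterChainSketch {

  /** Simplified stand-in for a per-URL parse: extracted text plus a success flag. */
  static final class SimpleParse {
    final String text;
    final boolean success;
    SimpleParse(String text, boolean success) { this.text = text; this.success = success; }
  }

  /** Simplified stand-in for ParseResult: URL -> parse, in insertion order. */
  static final class SimpleParseResult {
    final Map<String, SimpleParse> parses = new LinkedHashMap<>();
    boolean isSuccess() {
      for (SimpleParse p : parses.values()) {
        if (!p.success) return false;
      }
      return true;
    }
  }

  /** A filter may modify existing entries and add entries for newly found URLs. */
  interface Filter {
    SimpleParseResult filter(String url, SimpleParseResult result);
  }

  /** Runs each filter in turn, stopping at the first unsuccessful result. */
  static SimpleParseResult runChain(List<Filter> filters, String url, SimpleParseResult result) {
    for (Filter f : filters) {
      result = f.filter(url, result);
      if (!result.isSuccess()) {
        break; // later filters never see a failed result
      }
    }
    return result;
  }

  public static void main(String[] args) {
    SimpleParseResult initial = new SimpleParseResult();
    initial.parses.put("http://example.com/", new SimpleParse("page text", true));

    List<Filter> filters = new ArrayList<>();
    // A filter that "discovers" a sub-document and adds it to the result.
    filters.add((url, r) -> {
      r.parses.put(url + "#video1", new SimpleParse("video caption", true));
      return r;
    });

    SimpleParseResult out = runChain(filters, "http://example.com/", initial);
    System.out.println(out.parses.keySet()); // prints [http://example.com/, http://example.com/#video1]
  }
}
```

Because every filter sees and returns the whole result, a filter can add entries for newly discovered URLs, which a filter that returns a single parse cannot do.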

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gal Nitzan updated NUTCH-485:
-----------------------------

    Attachment: NUTCH-485.200705130928.patch

Following Andrzej's advice, here is much cleaner code :)

Attached...

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gal Nitzan updated NUTCH-485:
-----------------------------

    Attachment: NUTCH-485.200705130945.patch

Yet another update, with cleaner code.

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



RE: [jira] Resolved: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by Gal Nitzan <ga...@gmail.com>.
Thanks Doğacan, much obliged.

Gal.

> -----Original Message-----
> From: Doğacan Güney (JIRA) [mailto:jira@apache.org]
> Sent: Sunday, June 17, 2007 11:29 PM
> To: nutch-dev@lucene.apache.org
> Subject: [jira] Resolved: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object



[jira] Resolved: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney resolved NUTCH-485.
---------------------------------

    Resolution: Fixed

Committed in rev 548103 with two modifications:

1) Fix whitespace issues.

2) The original patch changed CCParseFilter to return the original parse result if it failed. Now, if CCParseFilter fails with an exception, it returns an empty parse created from that exception.
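A minimal Java sketch of the revised failure behaviour, under stated assumptions: the types are simplified, hypothetical stand-ins rather than Nutch's ParseResult or ParseStatus API, and the Work interface stands in for whatever the filter actually does. On an exception, the filter returns a fresh result holding one empty, failed parse for the page URL, with the exception recorded, instead of passing the original result through.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of "on exception, return an empty parse built from the exception". */
class FailureHandlingSketch {

  /** Simplified parse: text, success flag and an optional failure message. */
  static final class SimpleParse {
    final String text;
    final boolean success;
    final String failureMessage;

    SimpleParse(String text, boolean success, String failureMessage) {
      this.text = text;
      this.success = success;
      this.failureMessage = failureMessage;
    }

    /** Builds an empty, failed parse that records why it failed. */
    static SimpleParse emptyFrom(Exception e) {
      return new SimpleParse("", false, e.toString());
    }
  }

  /** Simplified stand-in for ParseResult. */
  static final class SimpleParseResult {
    final Map<String, SimpleParse> parses = new LinkedHashMap<>();
  }

  /** Hypothetical stand-in for the filter's real work (e.g. reading license metadata). */
  interface Work {
    void apply(SimpleParseResult result) throws Exception;
  }

  static SimpleParseResult filter(String url, SimpleParseResult result, Work work) {
    try {
      work.apply(result);
      return result; // success: the (possibly modified) original result
    } catch (Exception e) {
      // Do not hand back the original result: replace it with an empty,
      // failed parse for this URL so the failure is visible downstream.
      SimpleParseResult failed = new SimpleParseResult();
      failed.parses.put(url, SimpleParse.emptyFrom(e));
      return failed;
    }
  }
}
```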

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505456 ] 

Doğacan Güney commented on NUTCH-485:
-------------------------------------

If no one has any objections, I want to commit this one. 

However, I have a question about patches. The latest patch has a couple of places where it removes an empty line (without adding anything else), or removes an empty line and adds another one (because of indentation). What is the policy on these? Personally, I think these are OK, but I would like to know what others think.

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501914 ] 

Gal Nitzan commented on NUTCH-485:
----------------------------------

Could one of the committers please review this patch and, if it looks good, commit it?


The patch touches several locations, and with so many changes happening right now, it may become harder to update it later...

Thanks

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gal Nitzan updated NUTCH-485:
-----------------------------

    Attachment: NUTCH-485.200705131241.patch

Thanks Doğacan, I missed it :( 

Thanks to all reviewers.
 
Yet another patch...

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495410 ] 

Doğacan Güney commented on NUTCH-485:
-------------------------------------

I have two more minor nits:

1) ParseResult.isSuccess returns true only if all parses are successful. This makes sense, but I think you should make it more obvious by mentioning it in the method's javadoc.

2) There seem to be some whitespace issues. For example, some indents are 4 spaces; all indents should be 2 spaces.
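Regarding the first nit, the following hypothetical Java sketch shows the kind of javadoc that would make the all-or-nothing semantics of isSuccess() explicit. ParseResultSketch is a simplified stand-in that stores only a success flag per URL; treating an empty result as successful is a property of this sketch, not a claim about Nutch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for ParseResult, showing a javadoc that spells out isSuccess(). */
class ParseResultSketch {
  /** URL -> whether the parse for that URL succeeded (simplified). */
  private final Map<String, Boolean> statusByUrl = new LinkedHashMap<>();

  void put(String url, boolean success) {
    statusByUrl.put(url, success);
  }

  /**
   * Returns {@code true} only if <em>every</em> parse in this result is
   * successful. A single failed entry, including one added by a filter for a
   * sub-URL, makes the whole result unsuccessful. An empty result is reported
   * as successful, because it contains no failed entry.
   *
   * @return true if all contained parses succeeded
   */
  boolean isSuccess() {
    for (boolean ok : statusByUrl.values()) {
      if (!ok) {
        return false;
      }
    }
    return true;
  }
}
```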

Anyway, I don't know if my vote counts, but, besides these two issues, I am +1 on this patch.

I think this may be very useful for image search. After parsing a page, one can traverse the DOM, add image srcs as URLs, and use the text immediately around each image as its parse text (plus whatever other data can be gathered as parse data). Of course, this doesn't automatically make Nutch an image search engine, but it is a good first step.
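The image-search idea can be sketched with the JDK's own DOM API. This is illustrative only: ImageExtractorSketch and its methods are hypothetical, the input is assumed to be well-formed XHTML (a real filter would walk the DocumentFragment it receives), and the "text around an image" is approximated by the text of the image's parent element.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: collect image URLs plus nearby text from a DOM. */
class ImageExtractorSketch {

  /** Maps each absolute image URL to the trimmed text of the element containing it. */
  static Map<String, String> extractImages(Document doc, String pageUrl) {
    Map<String, String> images = new LinkedHashMap<>();
    URI base = URI.create(pageUrl);
    NodeList imgs = doc.getElementsByTagName("img");
    for (int i = 0; i < imgs.getLength(); i++) {
      Element img = (Element) imgs.item(i);
      String src = img.getAttribute("src");
      if (src.isEmpty()) {
        continue; // DOM getAttribute returns "" when the attribute is absent
      }
      String absolute;
      try {
        absolute = base.resolve(src).toString(); // relative src -> absolute URL
      } catch (IllegalArgumentException e) {
        continue; // skip src values that are not valid URI references
      }
      Node parent = img.getParentNode();
      String nearby = parent.getTextContent(); // null if the parent is the Document
      images.put(absolute, nearby == null ? "" : nearby.trim());
    }
    return images;
  }

  /** Parses well-formed XHTML; real-world HTML would need a tolerant parser first. */
  static Document parseXhtml(String xhtml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("not well-formed XHTML", e);
    }
  }

  public static void main(String[] args) {
    String page = "<html><body><p>A red bicycle <img src=\"img/bike.jpg\"/> by the sea.</p></body></html>";
    Map<String, String> found = extractImages(parseXhtml(page), "http://example.com/gallery/index.html");
    found.forEach((url, text) -> System.out.println(url + " -> " + text));
  }
}
```

Each (image URL, text) pair could then be added to the ParseResult as a separate entry, which is the kind of use this change is meant to allow.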

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505598 ] 

Andrzej Bialecki  commented on NUTCH-485:
-----------------------------------------

If whitespace changes are really needed, they should be committed as a separate patch; otherwise a patch should not introduce purely whitespace changes. This is not dogma, but following this rule makes it easier later on to see what the patch actually changes.

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gal Nitzan updated NUTCH-485:
-----------------------------

    Attachment: NUTCH-485.200705122151.patch

Attached patch for this issue.

Comments are welcome.

This patch touches a few plugins; please review...

Thanks,

Gal

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495319 ] 

Andrzej Bialecki  commented on NUTCH-485:
-----------------------------------------

I think a more natural change would be this:

ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc);

That is, HtmlParseFilter would receive and return an instance of ParseResult (possibly the same instance), adding or removing entries as needed. Existing plugins could function as before; they would just need to work on the Parse instance that corresponds to Content.getUrl().
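A minimal Java sketch of the proposed shape, using simplified stand-ins: Content, Parse, ParseResult, HTMLMetaTags and the HtmlParseFilter interface below are hypothetical reductions of the Nutch types, not their real definitions (only org.w3c.dom.DocumentFragment is a real JDK type). The example filter behaves like an existing plugin would under the proposal: it looks up the entry for content.getUrl(), modifies it in place, and returns the same result instance.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import org.w3c.dom.DocumentFragment;

/** Hypothetical sketch of the proposed signature, using simplified stand-in types. */
class ProposedFilterSketch {

  /** Stand-in for Nutch's Content: only the URL matters here. */
  static final class Content {
    private final String url;
    Content(String url) { this.url = url; }
    String getUrl() { return url; }
  }

  /** Stand-in for a single Parse: just mutable text. */
  static final class Parse {
    String text;
    Parse(String text) { this.text = text; }
  }

  /** Stand-in for ParseResult: URL -> Parse, populated from parsed pieces. */
  static final class ParseResult {
    private final Map<String, Parse> parses = new LinkedHashMap<>();
    Parse get(String url) { return parses.get(url); }
    void put(String url, String text) { parses.put(url, new Parse(text)); }
  }

  /** Stand-in for HTMLMetaTags; unused in this sketch. */
  static final class HTMLMetaTags {}

  /** Shape of the proposal: takes a ParseResult, returns one (possibly the same instance). */
  interface HtmlParseFilter {
    ParseResult filter(Content content, ParseResult parseResult,
                       HTMLMetaTags metaTags, DocumentFragment doc);
  }

  /** A "legacy-style" filter: it only touches the Parse for content.getUrl(). */
  static final class UppercaseTextFilter implements HtmlParseFilter {
    @Override
    public ParseResult filter(Content content, ParseResult parseResult,
                              HTMLMetaTags metaTags, DocumentFragment doc) {
      Parse parse = parseResult.get(content.getUrl());
      if (parse != null) {
        parse.text = parse.text.toUpperCase(Locale.ROOT);
      }
      return parseResult; // same instance, modified in place
    }
  }
}
```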

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Assigned: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney reassigned NUTCH-485:
-----------------------------------

    Assignee: Doğacan Güney

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505502 ] 

Chris A. Mattmann commented on NUTCH-485:
-----------------------------------------

Doğacan, +1.

As for your question, IMO, this kind of minor formatting change is somewhat unnecessary. However, if such changes are sparse and really unnoticeable, I personally wouldn't have an issue with them. Some of the folks who've been around longer could probably provide more guidance, but again, my +1 for committing this.

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495350 ] 

Doğacan Güney commented on NUTCH-485:
-------------------------------------

You probably should not add "put(String/Text key, Parse parse)" methods to ParseResult. ParseResult deliberately has no direct method for adding a Parse object, so that it can itself check whether a parse comes from the real URL or from a sub-URL.
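A minimal Java sketch of that design point, assuming simplified hypothetical types (CanonicalPutSketch, Entry, ParseResult): callers hand over only the parsed pieces, and the result itself decides whether an entry belongs to the original URL or to a sub-URL by comparing keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: the result, not the caller, decides whether an entry is canonical. */
class CanonicalPutSketch {

  /** Simplified parse entry; "canonical" means it belongs to the fetched URL itself. */
  static final class Entry {
    final String text;
    final boolean canonical;
    Entry(String text, boolean canonical) { this.text = text; this.canonical = canonical; }
  }

  static final class ParseResult {
    private final String originalUrl;
    private final Map<String, Entry> entries = new LinkedHashMap<>();

    ParseResult(String originalUrl) { this.originalUrl = originalUrl; }

    /**
     * Adds the parsed pieces for a URL. There is deliberately no put(key, Entry):
     * the canonical flag is computed here, so callers cannot mislabel a
     * sub-URL as the original document (or vice versa).
     */
    void put(String url, String text) {
      entries.put(url, new Entry(text, url.equals(originalUrl)));
    }

    Entry get(String url) { return entries.get(url); }
  }
}
```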

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.



[jira] Updated: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gal Nitzan updated NUTCH-485:
-----------------------------

    Attachment: NUTCH-485.200705140001.patch

Thanks Doğacan for taking the time to review the code.

I agree with your comments on the usage. I run a video search engine, and this is sure to help. The ability to "discover" content and add it to the segment "on the fly" while parsing is a long-awaited feature, and it was all made possible by NUTCH-443... :)


And one more update, with a better description in the javadoc and some indentation fixes.

> Change HtmlParseFilter 's to return ParseResult object instead of Parse object
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-485
>                 URL: https://issues.apache.org/jira/browse/NUTCH-485
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Gal Nitzan
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-485.200705122151.patch, NUTCH-485.200705130928.patch, NUTCH-485.200705130945.patch, NUTCH-485.200705131241.patch, NUTCH-485.200705140001.patch
>
>
> The current implementation of HtmlParseFilters.java doesn't allow a filter to add parse objects to the ParseResult object.
> A change to HtmlParseFilter is needed so that a filter can return a ParseResult and, of course, a corresponding change to HtmlParseFilters.
