Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/10/15 05:13:05 UTC

[jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content

    [ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958233#comment-14958233 ] 

ASF GitHub Bot commented on NUTCH-2141:
---------------------------------------

GitHub user balajig17 opened a pull request:

    https://github.com/apache/nutch/pull/77

    fix for NUTCH-2141 contributed by Balaji Gurumurthy

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/balajig17/nutch NUTCH-2141

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/77.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #77
    
----
commit d9486a5567ceb9a6c77e6fe3994350f37a433510
Author: Balaji <ba...@gmail.com>
Date:   2015-10-15T03:10:16Z

    fix for NUTCH-2141 contributed by Balaji Gurumurthy

----


> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently provides methods to manipulate the page content, and the HTTPResponse class reads the page content from the driver. This limits the amount of HTML content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is particularly helpful in cases such as handling pagination, where the content of multiple pages can be appended and returned from the handler.
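
The proposal in the issue description can be illustrated with a minimal sketch of a handler whose processDriver returns a String, here appending the content of several paginated pages. Only the method name processDriver and the String return type come from the issue text. Everything else is an assumption made for illustration: PageDriver is a hypothetical stand-in for Selenium's WebDriver (so the sketch compiles without dependencies), and ContentReturningHandler, PaginationHandler, FakeDriver, clickNext and maxPages are invented names, not the plugin's actual API or the code in the pull request.

```java
import java.util.List;

// Hypothetical stand-in for Selenium's WebDriver; not the real interface.
interface PageDriver {
    String getPageSource();   // HTML of the page currently loaded
    boolean clickNext();      // move to the next page; false if there is none
}

// Assumed shape of the handler after the change: processDriver returns
// the content instead of returning nothing.
interface ContentReturningHandler {
    String processDriver(PageDriver driver);
}

// Illustrative handler that appends the content of each paginated page.
class PaginationHandler implements ContentReturningHandler {
    private final int maxPages; // hypothetical safety limit on pages visited

    PaginationHandler(int maxPages) {
        this.maxPages = maxPages;
    }

    @Override
    public String processDriver(PageDriver driver) {
        StringBuilder content = new StringBuilder(driver.getPageSource());
        int visited = 1;
        // Keep advancing until the limit is reached or no next page exists.
        while (visited < maxPages && driver.clickNext()) {
            content.append(driver.getPageSource());
            visited++;
        }
        return content.toString();
    }
}

// In-memory driver used only to exercise the sketch without a browser.
class FakeDriver implements PageDriver {
    private final List<String> pages;
    private int index = 0;

    FakeDriver(List<String> pages) {
        this.pages = pages;
    }

    @Override
    public String getPageSource() {
        return pages.get(index);
    }

    @Override
    public boolean clickNext() {
        if (index + 1 >= pages.size()) {
            return false;
        }
        index++;
        return true;
    }
}
```

With a handler of this shape, the caller would use the returned String as the page content rather than reading it from the driver afterwards, which is what would let a handler return more than the single page the driver currently holds.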



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)