Posted to dev@nutch.apache.org by MJJoyce <gi...@git.apache.org> on 2015/07/21 18:37:19 UTC

[GitHub] nutch pull request: NUTCH-2062 - Interactive Selenium Plugin

GitHub user MJJoyce opened a pull request:

    https://github.com/apache/nutch/pull/46

    NUTCH-2062 - Interactive Selenium Plugin

    - Extend lib-selenium to allow for external interaction with the WebDriver.
    - Add Interactive Selenium plugin so users can create a Selenium Handler that performs custom interaction with the page being fetched. Handlers implement a simple interface and can then be included in crawls by adjusting the configuration.
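
To illustrate the handler idea described above, here is a minimal sketch in plain Java. It is not the plugin's actual interface: `PageDriver`, `PageHandler`, `BodyCaptureHandler`, and `HandlerChain` are hypothetical names, and `PageDriver` is a stand-in for Selenium's WebDriver so that the example compiles without Selenium on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Selenium's WebDriver (assumption, not the real API).
interface PageDriver {
    String currentUrl();
    String bodyHtml();
}

// Hypothetical handler contract; the interface shipped in the plugin may differ.
interface PageHandler {
    // Decide whether this handler applies to the page at the given URL.
    boolean shouldProcess(String url);

    // Interact with the loaded page and return the content to keep.
    String process(PageDriver driver);
}

// Example handler: no interaction, just captures the body of matching pages.
class BodyCaptureHandler implements PageHandler {
    private final String hostFragment;

    BodyCaptureHandler(String hostFragment) {
        this.hostFragment = hostFragment;
    }

    @Override
    public boolean shouldProcess(String url) {
        return url.contains(hostFragment);
    }

    @Override
    public String process(PageDriver driver) {
        return driver.bodyHtml();
    }
}

// Runs every registered handler that applies to the current page, in order.
class HandlerChain {
    private final List<PageHandler> handlers = new ArrayList<>();

    HandlerChain add(PageHandler handler) {
        handlers.add(handler);
        return this;
    }

    String run(PageDriver driver) {
        StringBuilder out = new StringBuilder();
        for (PageHandler handler : handlers) {
            if (handler.shouldProcess(driver.currentUrl())) {
                out.append(handler.process(driver));
            }
        }
        return out.toString();
    }
}
```

In a real crawl the list of handlers would presumably come from the Nutch configuration rather than from explicit `add` calls; that wiring is omitted here.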


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MJJoyce/nutch NUTCH-2062

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/46.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #46
    
----
commit e1be2cf55b06d7e17e83ef74a53587807024adf4
Author: Michael Joyce <ml...@gmail.com>
Date:   2015-07-20T16:00:44Z

    NUTCH-2062 - lib-selenium interaction extension
    
    - Add ability for lib-selenium to pass off driver handling to caller.
      getDriverForPage loads a WebDriver for a given page and returns it to
      the caller. getHTMLContent takes a WebDriver and returns the body
      content to the caller. These changes will allow a plugin to control
      the interaction with the WebDriver to get at the data required for a
      particular page.

commit c12eb9ae88d91fd6f9e6dcebd6dc0dd04d12a9ae
Author: Michael Joyce <ml...@gmail.com>
Date:   2015-07-20T17:17:49Z

    NUTCH-2062 - Add default lib-selenium timeout to config

commit 2df485b1c1a6c5b4df22882f709de4f4c1b6732a
Author: Michael Joyce <ml...@gmail.com>
Date:   2015-07-20T17:18:46Z

    NUTCH-2062 - Add configurable wait to lib-selenium
    
    - You can now configure the delay that Selenium waits for a page to load
      by configuring the libselenium.page.load.delay parameter in
      nutch-default. The setting defaults to 3 seconds in lib-selenium if
      the parameter isn't available.
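
The fallback behaviour described in this commit message can be sketched as follows. This is an illustration under assumptions, not the committed code: Nutch reads settings through Hadoop's Configuration, while this sketch uses `java.util.Properties` as a stand-in so it runs on its own. The class name `PageLoadDelay` is hypothetical, and falling back to the default on a malformed value is an added assumption that the commit message does not state.

```java
import java.util.Properties;

// Hypothetical helper; java.util.Properties stands in for the Nutch configuration.
class PageLoadDelay {
    static final String KEY = "libselenium.page.load.delay";
    static final int DEFAULT_SECONDS = 3; // default named in the commit message

    // Returns the configured delay in seconds, or the default when the key is absent.
    static int seconds(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_SECONDS;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Assumption: treat an unparsable value like a missing one.
            return DEFAULT_SECONDS;
        }
    }
}
```

With an empty `Properties` object, `PageLoadDelay.seconds` returns 3; setting `libselenium.page.load.delay` to `10` makes it return 10.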

commit 8737084752ff8e92c4c4eef668e6ce0ca612f7fb
Author: Michael Joyce <ml...@gmail.com>
Date:   2015-07-21T16:16:42Z

    Add interactive Selenium plugin

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nutch pull request: NUTCH-2062 - Interactive Selenium Plugin

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/46

