Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/09/18 12:35:04 UTC

[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

    [ https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805366#comment-14805366 ] 

Lewis John McGibbney commented on NUTCH-2106:
---------------------------------------------

[~kwhitehall] let's touch base on this and try to include </excludes> within the Selenium dependency definition. This is Maven magic so maybe we can print out 

{code}
ant report
{code}
.. that way we can see how many transitive dependencies come from Selenium.
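
As a rough sketch of the exclusion idea, assuming the Selenium dependency is declared in the plugin's ivy.xml: Ivy allows {{<exclude>}} children inside a {{<dependency>}}. The revision is taken from the jar name in the issue description, and the excluded module below is purely illustrative, not one we have decided to drop.

{code:xml}
<dependency org="org.seleniumhq.selenium" name="selenium-java" rev="2.44.0">
  <!-- hypothetical: replace with whichever transitive module the report shows is not needed -->
  <exclude org="org.seleniumhq.selenium" module="selenium-htmlunit-driver"/>
</dependency>
{code}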

[~wastl-nagel], tbh this was (and still is) an underlying concern for plugin dependencies... e.g. we recently introduced Apache Mahout. These libraries are non-trivial by any means. We have the same issue.

I would encourage all additions to evaluate existing compatibility and where new functionality fits in. We do not want to break new features as old folks. :)


> Runtime to contain Selenium and dependencies only once
> ------------------------------------------------------
>
>                 Key: NUTCH-2106
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2106
>             Project: Nutch
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.11
>            Reporter: Sebastian Nagel
>             Fix For: 1.11
>
>         Attachments: NUTCH-2106.patch
>
>
> All Selenium-based plugins contain the same dependent jars, which significantly affects the size of the runtime and bin package:
> {noformat}
> % du -hs runtime/local/plugins/*selenium/ runtime/deploy/*.job
> 25M runtime/local/plugins/lib-selenium/
> 25M runtime/local/plugins/protocol-interactiveselenium/
> 25M runtime/local/plugins/protocol-selenium/
> 182M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Since all plugins depend on the same Selenium version we could bundle the dependencies in lib-selenium and let the other plugins load it from there:
> - let lib-selenium export all dependent libs, e.g.:
> {code:xml|title=lib-selenium/plugin.xml}
> <runtime>
>   ...
>   <library name="selenium-java-2.44.0.jar">
>     <export name="*"/>
>   </library>
> {code}
> - both protocol plugins already import lib-selenium: the dependencies in ivy.xml can be removed
> As expected, these changes make the runtime smaller:
> {noformat}
> 25M runtime/local/plugins/lib-selenium/
> 20K runtime/local/plugins/protocol-interactiveselenium/
> 16K runtime/local/plugins/protocol-selenium/
> 138M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Open points:
> - I've tested only protocol-selenium using chromedriver. Should we also test protocol-interactiveselenium?
> - What about phantomjsdriver-1.2.1.jar? It was contained in lib-selenium and protocol-selenium but not protocol-interactiveselenium. Is there a reason for this?


