Posted to dev@lucene.apache.org by tballison <gi...@git.apache.org> on 2016/06/16 16:57:20 UTC

[GitHub] lucene-solr pull request #44: SOLR-8981

GitHub user tballison opened a pull request:

    https://github.com/apache/lucene-solr/pull/44

    SOLR-8981

    SOLR-8981 upgrade to Tika 1.13

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tballison/lucene-solr SOLR-8981

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/44.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #44
    
----
commit ba0e71703464849198b384aa6e92962db8a04b51
Author: tballison <ta...@mitre.org>
Date:   2016-06-16T16:56:45Z

    SOLR-8981 upgrade to Tika 1.13

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Hello,
    please also update the SHA1 hashes of all files. Please run "ant precommit" from the root folder of Lucene/Solr; it will report everything that is missing.
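A minimal sketch of what such a hash check involves, assuming each jar has a companion file named `<jar>.sha1` holding its hex digest. The directory arguments and the helper names `sha1_of_file` and `stale_checksums` are hypothetical and only illustrate the idea; "ant precommit" remains the authoritative check.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_checksums(jar_dir, sha_dir):
    """List jar names whose recorded <jar>.sha1 file is missing or does not match.

    Assumption: checksum files are named "<jar file name>.sha1" and contain
    only the hex digest (surrounding whitespace is ignored).
    """
    stale = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        recorded = Path(sha_dir) / (jar.name + ".sha1")
        if not recorded.exists() or recorded.read_text().strip() != sha1_of_file(jar):
            stale.append(jar.name)
    return stale
```

Calling `stale_checksums(jar_dir, sha_dir)` after a dependency upgrade would list the jars whose recorded hashes still need regenerating.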




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    I merged everything successfully, but I get one test failure in solr/contrib/extraction:
    
    [junit4] FAILURE 0.05s J0 | ExtractingRequestHandlerTest.testXPath <<<
    [junit4]    > Throwable #1: org.junit.ComparisonFailure: expected:<[News]> but was:<[]>
    [junit4]    >        at __randomizedtesting.SeedInfo.seed([404BA07016F1FB57:3E1A6EE30E469911]:0)
    
    I have the feeling I have seen this before. Weren't you running the extraction tests?
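The failure output above carries the randomized-testing seed, which is what makes the run repeatable. A small sketch, assuming the log format shown in this message, that pulls out the test class, method, and master seed (the first value inside `SeedInfo.seed([...])`) and assembles an `ant test` invocation. The helper name `repro_command` is hypothetical, and the resulting command should be compared with the "reproduce with" line that the build prints.

```python
import re

# Master seed is the first hex value inside SeedInfo.seed([...]).
SEED_RE = re.compile(r"SeedInfo\.seed\(\[([0-9A-F]+)(?::[0-9A-F]+)*\]")
# Matches lines such as: FAILURE 0.05s J0 | SomeTest.someMethod <<<
FAILURE_RE = re.compile(r"FAILURE\s+\S+\s+\S+\s+\|\s+(\w+)\.(\w+)\s+<<<")


def repro_command(log_text):
    """Build an 'ant test' command from junit4 failure output, or return None."""
    failure = FAILURE_RE.search(log_text)
    seed = SEED_RE.search(log_text)
    if not (failure and seed):
        return None
    test_class, method = failure.groups()
    return "ant test -Dtestcase={} -Dtests.method={} -Dtests.seed={}".format(
        test_class, method, seed.group(1)
    )


LOG = """
[junit4] FAILURE 0.05s J0 | ExtractingRequestHandlerTest.testXPath <<<
[junit4]    > Throwable #1: org.junit.ComparisonFailure: expected:<[News]> but was:<[]>
[junit4]    >        at __randomizedtesting.SeedInfo.seed([404BA07016F1FB57:3E1A6EE30E469911]:0)
"""

print(repro_command(LOG))
```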




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Hi, I have applied some other fixes and will push soon. The ASF currently has some problems with pushing:
    
    git.exe push --progress "origin" master:master
    
    Counting objects: 121, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (66/66), done.
    Writing objects: 100% (121/121), 8.90 KiB | 0 bytes/s, done.
    Total 121 (delta 55), reused 17 (delta 2)
    remote: You are not authorized to edit this repository.
    remote:
    To https://git-wip-us.apache.org/repos/asf/lucene-solr.git
    ! [remote rejected] master -> master (pre-receive hook declined)
    error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/lucene-solr.git'





[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    This is our bug, introduced in TIKA-995.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    @uschindler yep, we've seen this before. I have no idea what is going on here. I'll look into it again today. Can someone point out the exact code that does the XPath magic?




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    OK, I will merge again later. So I will revert my checkout once you have fixed that. Otherwise all looks fine.
    
    BTW: Can you remove the assumeFalse on Java 9, now that PDFBox is fixed? It was added because PDFBox failed in its static initializer (clinit) on Java 9 with a version number parsing failure.
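To illustrate the kind of version number parsing failure mentioned here, the sketch below contrasts a parser that assumes the pre-Java 9 `1.<minor>...` scheme with one that also accepts strings such as `9` or `9-ea`. This is not PDFBox's code; the exact parsing it used is not shown in this thread, and both function names are hypothetical.

```python
def minor_version_naive(java_version):
    """Assume the old "1.<minor>.<patch>_<update>" form and read <minor>.

    Raises IndexError for a bare "9" because there is no second component.
    """
    return int(java_version.split(".")[1])


def feature_version(java_version):
    """Read the feature release from either "1.8.0_91" or "9"-style strings."""
    # Drop suffixes such as "-ea" and "_91" before splitting on dots.
    numeric = java_version.split("-")[0].split("_")[0]
    parts = numeric.split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])  # old scheme: 1.8.0 means Java 8
    return int(parts[0])      # new scheme: 9 means Java 9
```

When parsing like the naive variant runs in a static initializer, the exception surfaces at class-loading time, which matches the "failed in clinit" description.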




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Were you able to fix the test or should I look into it?




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Yes, I did run the extraction tests. That was the error we were getting initially, but it disappeared (without explanation) on my most recent integration attempt.




Re: Error parsing javascript with selenium (solr 6.0.0 & nutch 1.11 & firefox 47.0)

Posted by Erick Erickson <er...@gmail.com>.
This isn't much help, but I'd advise asking on the Nutch users' list, as
this appears to be a Nutch issue, not a Solr one.

Best,
Erick

On Mon, Jun 20, 2016 at 1:41 AM, <li...@yahoo.com.invalid> wrote:

>
> ------------------------------
> * From: * liviuchristian@yahoo.com.INVALID
> <li...@yahoo.com.INVALID>;
> * To: * dev@lucene.apache.org <de...@lucene.apache.org>;
> dev@lucene.apache.org <de...@lucene.apache.org>; git@git.apache.org <
> git@git.apache.org>;
> * Subject: * Error parsing javascript with selenium (solr 6.0.0 & nutch
> 1.11 & firefox 47.0)
> * Sent: * Fri, Jun 17, 2016 9:38:53 PM
>
> Hi,
> I'm trying to use selenium (*solr 6.0.0 &* *nutch 1.11 & firefox 47.0*)
> to parse javascript pages
> *I'm using this configuration for nutch-site:*
> <property>
>   <name>plugin.includes</name>
>
> <value>protocol-(httpclient|interactiveselenium|selenium)|urlfilter-(automaton|regex)|parse-(metatags|ext|html|js|swf|tika|zip)|index-(metadata|basic|anchor|geoip|dummy|links|more|replace|static)|scoring-opic|indexer-solr|urlnormalizer-(pass|regex|basic|ajax)|creativecommons|feed|headings|language-identifier|lib-nekohtml|lib-xml|microformats-reltag|mimetype-filter|nutch-extensionpoints|lib-selenium|subcollection|tld|parserfilter-naivebayes</value>
>   <description>...</description>
> </property>
> *and this configuration for parse-plugins.xml*
> <parse-plugins>
>
>   <!--  by default if the mimeType is set to *, or
>         if it can't be determined, use parse-tika -->
>     <mimeType name="*">
>         <plugin id="parse-metatags"/>
>         <plugin id="protocol-interactiveselenium"/>
>         <plugin id="protocol-selenium"/>
>         <plugin id="lib-selenium"/>
>         <plugin id="nutch-extensionpoints"/>
>         <plugin id="parse-js"/>
>         <plugin id="parse-tika" />
>         <plugin id="feed"/>
>         <plugin id="parse-html"/>
>         <plugin id="parse-js"/>
>         <plugin id="parse-html" />
>     </mimeType>
>
>     <mimeType name="application/rss+xml">
>         <plugin id="parse-tika" />
>         <plugin id="feed" />
>     </mimeType>
>
>     <mimeType name="application/x-bzip2">
>         <!--  try and parse it with the zip parser -->
>         <plugin id="parse-zip" />
>     </mimeType>
>
>     <mimeType name="application/x-gzip">
>         <!--  try and parse it with the zip parser -->
>         <plugin id="parse-zip" />
>     </mimeType>
>
>     <mimeType name="application/x-javascript">
>         <plugin id="parse-js" />
>         <plugin id="protocol-interactiveselenium"/>
>         <plugin id="protocol-selenium"/>
>         <plugin id="lib-selenium"/>
>         <plugin id="nutch-extensionpoints"/>
>         <plugin id="parse-metatags"/>
>         <!--<plugin id="parse-ext"/>-->
>         <plugin id="parse-tika" />
>     </mimeType>
>
>     <mimeType name="application/x-shockwave-flash">
>         <plugin id="parse-swf" />
>     </mimeType>
>
>     <mimeType name="application/zip">
>         <plugin id="parse-zip" />
>     </mimeType>
>
>     <!--<mimeType name="text/html">
>         <plugin id="parse-html" />
>     </mimeType>-->
>
>     <mimeType name="text/html">
>         <plugin id="parse-metatags"/>
>         <plugin id="protocol-interactiveselenium"/>
>         <plugin id="protocol-selenium"/>
>         <plugin id="lib-selenium"/>
>         <plugin id="nutch-extensionpoints"/>
>         <!--<plugin id="parse-ext"/>-->
>         <!--<plugin id="parse-js"/>-->
>         <plugin id="parse-html" />
>         <plugin id="parse-tika" />
>     </mimeType>
>
>     <mimeType name="application/xhtml+xml">
>         <plugin id="parse-metatags"/>
>         <plugin id="protocol-interactiveselenium"/>
>         <plugin id="protocol-selenium"/>
>         <plugin id="lib-selenium"/>
>         <plugin id="nutch-extensionpoints"/>
>         <plugin id="parse-tika" />
>         <plugin id="feed" />
>         <plugin id="parse-html" />
>     </mimeType>
>
>     <mimeType name="text/xml">
>         <plugin id="parse-metatags"/>
>         <plugin id="protocol-interactiveselenium"/>
>         <plugin id="protocol-selenium"/>
>         <plugin id="lib-selenium"/>
>         <plugin id="parse-tika" />
>         <plugin id="feed" />
>     </mimeType>
>
>
>
> *The Firefox window pops up with a message about private browsing on it.*
> *However, I get the error below and the job crashes and burns:*
>
> 17 18:44:13,029 INFO  api.HttpRobotRulesParser - Couldn't get robots.txt
> for http://findjobs.mashable.com/: java.lang.RuntimeException:
> org.openqa.selenium.WebDriverException: Unable to bind to locking port 7054
> within 45000 ms
> Build info: version: '2.48.2', revision:
> '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
> System info: host: 'solr', ip: '127.0.1.1', os.name: 'Linux', os.arch:
> 'amd64', os.version: '3.19.0-39-generic', java.version: '1.8.0_91'
> Driver info: driver.version: FirefoxDriver
> 2016-06-17 18:44:13,129 ERROR selenium.Http - Failed to get protocol output
> *java.lang.RuntimeException: org.openqa.selenium.WebDriverException:
> Failed to connect to binary FirefoxBinary(/usr/bin/firefox) on port 7055;
> process output follows: *
> ения Firefox для Ubuntu","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["sl"],"name":"Ubuntu
> Modifications","description":"Ubuntu razširitve za
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["sv-SE"],"name":"Ubuntu
> Modifications","description":"Ubuntu-paket för
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["uk"],"name":"Ubuntu
> Modifications","description":"Убунтівські доповнення до
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["zh-CN"],"name":"Ubuntu
> Modifications","description":"Ubuntu 火狐扩展包.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["zh-TW"],"name":"Ubuntu
> Modifications","description":"Ubuntu Firefox 擴充包。","creator":"Canonical
> Ltd.","homepageURL":null}],"targetApplications":[{"id":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","minVersion":"9.0","maxVersion":"37.0a1"}],"targetPlatforms":[],"multiprocessCompatible":false,"signedState":2,"seen":true}
> 1466178208570    DeferredSave.extensions.json    DEBUG    Save changes
> 1466178208570    addons.xpi    DEBUG    Updating database with changes to
> installed add-ons
> 1466178208570    addons.xpi-utils    DEBUG    Updating add-on states
> 1466178208571    addons.xpi-utils    DEBUG    Writing add-ons list
> 1466178208575    addons.xpi    DEBUG    Registering manifest for
> /usr/lib/firefox/browser/features/firefox@getpocket.com.xpi
> 1466178208576    addons.xpi    DEBUG    Calling bootstrap method startup
> on firefox@getpocket.com version 1.0.2
> 1466178208578    addons.xpi    DEBUG    Registering manifest for
> /usr/lib/firefox/browser/features/e10srollout@mozilla.org.xpi
> 1466178208578    addons.xpi    DEBUG    Calling bootstrap method startup
> on e10srollout@mozilla.org version 1.0
> 1466178208578    addons.xpi    DEBUG    Registering manifest for
> /usr/lib/firefox/browser/features/loop@mozilla.org.xpi
> 1466178208579    addons.xpi    DEBUG    Calling bootstrap method startup
> on loop@mozilla.org version 1.3.2
> 1466178208610    addons.manager    DEBUG    Registering shutdown blocker
> for XPIProvider
> 1466178208610    addons.manager    DEBUG    Provider finished startup:
> XPIProvider
> 1466178208610    addons.manager    DEBUG    Starting provider:
> LightweightThemeManager
> 1466178208611    addons.manager    DEBUG    Registering shutdown blocker
> for LightweightThemeManager
> 1466178208612    addons.manager    DEBUG    Provider finished startup:
> LightweightThemeManager
> 1466178208613    addons.manager    DEBUG    Starting provider: GMPProvider
> 1466178208621    addons.manager    DEBUG    Registering shutdown blocker
> for GMPProvider
> 1466178208622    addons.manager    DEBUG    Provider finished startup:
> GMPProvider
> 1466178208622    addons.manager    DEBUG    Starting provider:
> PluginProvider
> 1466178208622    addons.manager    DEBUG    Registering shutdown blocker
> for PluginProvider
> 1466178208622    addons.manager    DEBUG    Provider finished startup:
> PluginProvider
> 1466178208623    addons.manager    DEBUG    Completed startup sequence
> 1466178209011    addons.manager    DEBUG    Starting provider:
> <unnamed-provider>
> 1466178209011    addons.manager    DEBUG    Registering shutdown blocker
> for <unnamed-provider>
> 1466178209012    addons.manager    DEBUG    Provider finished startup:
> <unnamed-provider>
> 1466178209202    DeferredSave.extensions.json    DEBUG    Write succeeded
> 1466178209202    addons.xpi-utils    DEBUG    XPI Database saved, setting
> schema version preference to 17
> 1466178209202    DeferredSave.extensions.json    DEBUG    Starting timer
> 1466178209229    DeferredSave.extensions.json    DEBUG    Starting write
> 1466178209237    addons.repository    DEBUG    No addons.json found.
> 1466178209238    DeferredSave.addons.json    DEBUG    Save changes
> 1466178209242    DeferredSave.addons.json    DEBUG    Starting timer
> 1466178209309    addons.manager    DEBUG    Starting provider:
> PreviousExperimentProvider
> 1466178209310    addons.manager    DEBUG    Registering shutdown blocker
> for PreviousExperimentProvider
> 1466178209310    addons.manager    DEBUG    Provider finished startup:
> PreviousExperimentProvider
> 1466178209317    DeferredSave.addons.json    DEBUG    Starting write
> 1466178209329    DeferredSave.extensions.json    DEBUG    Write succeeded
> 1466178209357    DeferredSave.addons.json    DEBUG    Write succeeded
>
> (firefox:3352): Gtk-CRITICAL **: gtk_clipboard_set_with_data: assertion
> 'targets != NULL' failed
>
> Build info: version: '2.48.2', revision:
> '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
> System info: host: 'solr', ip: '127.0.1.1', os.name: 'Linux', os.arch:
> 'amd64', os.version: '3.19.0-39-generic', java.version: '1.8.0_91'
> Driver info: driver.version: FirefoxDriver
>     at
> org.apache.nutch.protocol.selenium.HttpWebClient.getDriverForPage(HttpWebClient.java:118)
>     at
> org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:155)
>     at
> org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:244)
>     at
> org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:168)
>     at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
>     at
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:290)
> *Caused by: org.openqa.selenium.WebDriverException: Failed to connect to
> binary FirefoxBinary(/usr/bin/firefox) on port 7055; process output
> follows: *
> ения Firefox для Ubuntu","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["sl"],"name":"Ubuntu
> Modifications","description":"Ubuntu razširitve za
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["sv-SE"],"name":"Ubuntu
> Modifications","description":"Ubuntu-paket för
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["uk"],"name":"Ubuntu
> Modifications","description":"Убунтівські доповнення до
> Firefox.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["zh-CN"],"name":"Ubuntu
> Modifications","description":"Ubuntu 火狐扩展包.","creator":"Canonical
> Ltd.","homepageURL":null},{"locales":["zh-TW"],"name":"Ubuntu
> Modifications","description":"Ubuntu Firefox 擴充包。","creator":"Canonical
> Ltd.","homepageURL":null}],"targetApplications":[{"id":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","minVersion":"9.0","maxVersion":"37.0a1"}],"targetPlatforms":[],"multiprocessCompatible":false,"signedState":2,"seen":true}
> 1466178208570    DeferredSave.extensions.json    DEBUG    Save changes
> 1466178208570    addons.xpi    DEBUG    Updating database with changes to
> installed add-ons
> 1466178208570    addons.xpi-utils    DEBUG    Updating add-on states
> 1466178208571    addons.xpi-utils    DEBUG    Writing add-ons list
>
>
> *I have found some comments on this issue but nothing helpful:*
> Remote driver & Firefox: Unable to bind to locking port 7054 within 45000
> ms · Issue #7272 · SeleniumHQ/selenium-google-code-issue-archive
> <https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/7272>
>
> Remote driver & Firefox: Unable to bind to locking port 7054 within 45...
> Originally reported on Google Code with ID 7272 Hi All, I'm experiencing
> some sporadic issues with Remote ...
>
> <https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/7272>
> In Firefox Browser:Unable to bind to locking port 7054 within 45000ms ·
> Issue #6760 · SeleniumHQ/selenium-google-code-issue-archive
> <https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/6760>
>
> In Firefox Browser:Unable to bind to locking port 7054 within 45000ms ·
> Iss...
> Originally reported on Google Code with ID 6760 selenium: 2.32.0,
> OS:Windows XP firefox version: 26.0. steps:...
>
> <https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/6760>
>
> Unable to bind to locking port 7054 within 45000 ms : webdriver firefox
> <http://stackoverflow.com/questions/13992986/unable-to-bind-to-locking-port-7054-within-45000-ms-webdriver-firefox>
>
> Unable to bind to locking port 7054 within 45000 ms : webdriver firefox
> i'm new to selenium webdriver i'm trying to run a simple test : i'm using
> firefox 17.0.1 and seleni...
>
> <http://stackoverflow.com/questions/13992986/unable-to-bind-to-locking-port-7054-within-45000-ms-webdriver-firefox>
>
>
>
> *Please advise,*
>
>
> *Much obliged,*
>
> *Christian Fotache*
> Tel: 0728.297.207
>
>
>
>
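
The log quoted above reports "Unable to bind to locking port 7054" followed by a failed connection on port 7055. One possible cause, among others, is that another process (for example a leftover Firefox or driver instance) already holds those ports. The sketch below only checks whether the two ports from the log can be bound locally; the helper names are hypothetical and the check does not diagnose Selenium or Firefox version problems.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports=(7054, 7055)):
    """Return the subset of ports that cannot currently be bound."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    print("ports in use:", busy_ports())
```

If either port shows up as busy before the crawl starts, finding and stopping the process that holds it would be a reasonable first step.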

Error parsing javascript with selenium (solr 6.0.0 & nutch 1.11 & firefox 47.0)

Posted by li...@yahoo.com.INVALID.
Hi, 
I'm trying to use selenium (solr 6.0.0 & nutch 1.11 & firefox 47.0) to parse javascript pages
I'm using this configuration for nutch-site:
<property>
  <name>plugin.includes</name>
  <value>protocol-(httpclient|interactiveselenium|selenium)|urlfilter-(automaton|regex)|parse-(metatags|ext|html|js|swf|tika|zip)|index-(metadata|basic|anchor|geoip|dummy|links|more|replace|static)|scoring-opic|indexer-solr|urlnormalizer-(pass|regex|basic|ajax)|creativecommons|feed|headings|language-identifier|lib-nekohtml|lib-xml|microformats-reltag|mimetype-filter|nutch-extensionpoints|lib-selenium|subcollection|tld|parserfilter-naivebayes</value>
  <description>...</description>
</property>
and this configuration for parse-plugins.xml
<parse-plugins>

  <!--  by default if the mimeType is set to *, or 
        if it can't be determined, use parse-tika -->
    <mimeType name="*">
        <plugin id="parse-metatags"/>
        <plugin id="protocol-interactiveselenium"/>
        <plugin id="protocol-selenium"/>
        <plugin id="lib-selenium"/>
        <plugin id="nutch-extensionpoints"/>
        <plugin id="parse-js"/>
        <plugin id="parse-tika" />
        <plugin id="feed"/>
        <plugin id="parse-html"/>
        <plugin id="parse-js"/>
        <plugin id="parse-html" />
    </mimeType>
 
    <mimeType name="application/rss+xml">
        <plugin id="parse-tika" />
        <plugin id="feed" />
    </mimeType>

    <mimeType name="application/x-bzip2">
        <!--  try and parse it with the zip parser -->
        <plugin id="parse-zip" />
    </mimeType>

    <mimeType name="application/x-gzip">
        <!--  try and parse it with the zip parser -->
        <plugin id="parse-zip" />
    </mimeType>

    <mimeType name="application/x-javascript">
        <plugin id="parse-js" />
        <plugin id="protocol-interactiveselenium"/>
        <plugin id="protocol-selenium"/>
        <plugin id="lib-selenium"/>
        <plugin id="nutch-extensionpoints"/>
        <plugin id="parse-metatags"/>
        <!--<plugin id="parse-ext"/>-->
        <plugin id="parse-tika" />
    </mimeType>

    <mimeType name="application/x-shockwave-flash">
        <plugin id="parse-swf" />
    </mimeType>

    <mimeType name="application/zip">
        <plugin id="parse-zip" />
    </mimeType>

    <!--<mimeType name="text/html">
        <plugin id="parse-html" />
    </mimeType>-->

    <mimeType name="text/html">
        <plugin id="parse-metatags"/>
        <plugin id="protocol-interactiveselenium"/>
        <plugin id="protocol-selenium"/>
        <plugin id="lib-selenium"/>
        <plugin id="nutch-extensionpoints"/>
        <!--<plugin id="parse-ext"/>-->
        <!--<plugin id="parse-js"/>-->
        <plugin id="parse-html" />
        <plugin id="parse-tika" />
    </mimeType>

    <mimeType name="application/xhtml+xml">
        <plugin id="parse-metatags"/>
        <plugin id="protocol-interactiveselenium"/>
        <plugin id="protocol-selenium"/>
        <plugin id="lib-selenium"/>
        <plugin id="nutch-extensionpoints"/>
        <plugin id="parse-tika" />
        <plugin id="feed" />
        <plugin id="parse-html" />
    </mimeType>

    <mimeType name="text/xml">
        <plugin id="parse-metatags"/>
        <plugin id="protocol-interactiveselenium"/>
        <plugin id="protocol-selenium"/>
        <plugin id="lib-selenium"/>
        <plugin id="parse-tika" />
        <plugin id="feed" />
    </mimeType> 
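
In the configuration above, the `*` mapping lists `parse-js` and `parse-html` twice. A small sketch for spotting such repeats, assuming the file is well-formed XML with a closing `</parse-plugins>` tag; the `SAMPLE` text is a shortened stand-in for the real file and `duplicate_plugins` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened stand-in for parse-plugins.xml (an assumption, not the full file).
SAMPLE = """
<parse-plugins>
    <mimeType name="*">
        <plugin id="parse-js"/>
        <plugin id="parse-tika" />
        <plugin id="parse-html"/>
        <plugin id="parse-js"/>
        <plugin id="parse-html" />
    </mimeType>
    <mimeType name="application/zip">
        <plugin id="parse-zip" />
    </mimeType>
</parse-plugins>
"""


def duplicate_plugins(xml_text):
    """Map each mimeType name to the plugin ids it lists more than once."""
    root = ET.fromstring(xml_text)
    report = {}
    for mime in root.iter("mimeType"):
        counts = Counter(plugin.get("id") for plugin in mime.iter("plugin"))
        repeated = sorted(pid for pid, n in counts.items() if n > 1)
        if repeated:
            report[mime.get("name")] = repeated
    return report


print(duplicate_plugins(SAMPLE))
```

The same function could be pointed at the full file contents to list every mapping that repeats a plugin id.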
The Firefox window pops up with a message about private browsing on it.
However, I get the error below and the job crashes and burns:
17 18:44:13,029 INFO  api.HttpRobotRulesParser - Couldn't get robots.txt for http://findjobs.mashable.com/: java.lang.RuntimeException: org.openqa.selenium.WebDriverException: Unable to bind to locking port 7054 within 45000 ms
Build info: version: '2.48.2', revision: '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
System info: host: 'solr', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'amd64', os.version: '3.19.0-39-generic', java.version: '1.8.0_91'
Driver info: driver.version: FirefoxDriver
2016-06-17 18:44:13,129 ERROR selenium.Http - Failed to get protocol output
java.lang.RuntimeException: org.openqa.selenium.WebDriverException: Failed to connect to binary FirefoxBinary(/usr/bin/firefox) on port 7055; process output follows: 
ения Firefox для Ubuntu","creator":"Canonical Ltd.","homepageURL":null},{"locales":["sl"],"name":"Ubuntu Modifications","description":"Ubuntu razširitve za Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["sv-SE"],"name":"Ubuntu Modifications","description":"Ubuntu-paket för Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["uk"],"name":"Ubuntu Modifications","description":"Убунтівські доповнення до Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["zh-CN"],"name":"Ubuntu Modifications","description":"Ubuntu 火狐扩展包.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["zh-TW"],"name":"Ubuntu Modifications","description":"Ubuntu Firefox 擴充包。","creator":"Canonical Ltd.","homepageURL":null}],"targetApplications":[{"id":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","minVersion":"9.0","maxVersion":"37.0a1"}],"targetPlatforms":[],"multiprocessCompatible":false,"signedState":2,"seen":true}
1466178208570    DeferredSave.extensions.json    DEBUG    Save changes
1466178208570    addons.xpi    DEBUG    Updating database with changes to installed add-ons
1466178208570    addons.xpi-utils    DEBUG    Updating add-on states
1466178208571    addons.xpi-utils    DEBUG    Writing add-ons list
1466178208575    addons.xpi    DEBUG    Registering manifest for /usr/lib/firefox/browser/features/firefox@getpocket.com.xpi
1466178208576    addons.xpi    DEBUG    Calling bootstrap method startup on firefox@getpocket.com version 1.0.2
1466178208578    addons.xpi    DEBUG    Registering manifest for /usr/lib/firefox/browser/features/e10srollout@mozilla.org.xpi
1466178208578    addons.xpi    DEBUG    Calling bootstrap method startup on e10srollout@mozilla.org version 1.0
1466178208578    addons.xpi    DEBUG    Registering manifest for /usr/lib/firefox/browser/features/loop@mozilla.org.xpi
1466178208579    addons.xpi    DEBUG    Calling bootstrap method startup on loop@mozilla.org version 1.3.2
1466178208610    addons.manager    DEBUG    Registering shutdown blocker for XPIProvider
1466178208610    addons.manager    DEBUG    Provider finished startup: XPIProvider
1466178208610    addons.manager    DEBUG    Starting provider: LightweightThemeManager
1466178208611    addons.manager    DEBUG    Registering shutdown blocker for LightweightThemeManager
1466178208612    addons.manager    DEBUG    Provider finished startup: LightweightThemeManager
1466178208613    addons.manager    DEBUG    Starting provider: GMPProvider
1466178208621    addons.manager    DEBUG    Registering shutdown blocker for GMPProvider
1466178208622    addons.manager    DEBUG    Provider finished startup: GMPProvider
1466178208622    addons.manager    DEBUG    Starting provider: PluginProvider
1466178208622    addons.manager    DEBUG    Registering shutdown blocker for PluginProvider
1466178208622    addons.manager    DEBUG    Provider finished startup: PluginProvider
1466178208623    addons.manager    DEBUG    Completed startup sequence
1466178209011    addons.manager    DEBUG    Starting provider: <unnamed-provider>
1466178209011    addons.manager    DEBUG    Registering shutdown blocker for <unnamed-provider>
1466178209012    addons.manager    DEBUG    Provider finished startup: <unnamed-provider>
1466178209202    DeferredSave.extensions.json    DEBUG    Write succeeded
1466178209202    addons.xpi-utils    DEBUG    XPI Database saved, setting schema version preference to 17
1466178209202    DeferredSave.extensions.json    DEBUG    Starting timer
1466178209229    DeferredSave.extensions.json    DEBUG    Starting write
1466178209237    addons.repository    DEBUG    No addons.json found.
1466178209238    DeferredSave.addons.json    DEBUG    Save changes
1466178209242    DeferredSave.addons.json    DEBUG    Starting timer
1466178209309    addons.manager    DEBUG    Starting provider: PreviousExperimentProvider
1466178209310    addons.manager    DEBUG    Registering shutdown blocker for PreviousExperimentProvider
1466178209310    addons.manager    DEBUG    Provider finished startup: PreviousExperimentProvider
1466178209317    DeferredSave.addons.json    DEBUG    Starting write
1466178209329    DeferredSave.extensions.json    DEBUG    Write succeeded
1466178209357    DeferredSave.addons.json    DEBUG    Write succeeded

(firefox:3352): Gtk-CRITICAL **: gtk_clipboard_set_with_data: assertion 'targets != NULL' failed

Build info: version: '2.48.2', revision: '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
System info: host: 'solr', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'amd64', os.version: '3.19.0-39-generic', java.version: '1.8.0_91'
Driver info: driver.version: FirefoxDriver
    at org.apache.nutch.protocol.selenium.HttpWebClient.getDriverForPage(HttpWebClient.java:118)
    at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:155)
    at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:244)
    at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:168)
    at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
    at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:290)
Caused by: org.openqa.selenium.WebDriverException: Failed to connect to binary FirefoxBinary(/usr/bin/firefox) on port 7055; process output follows: 
ения Firefox для Ubuntu","creator":"Canonical Ltd.","homepageURL":null},{"locales":["sl"],"name":"Ubuntu Modifications","description":"Ubuntu razširitve za Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["sv-SE"],"name":"Ubuntu Modifications","description":"Ubuntu-paket för Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["uk"],"name":"Ubuntu Modifications","description":"Убунтівські доповнення до Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["zh-CN"],"name":"Ubuntu Modifications","description":"Ubuntu 火狐扩展包.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["zh-TW"],"name":"Ubuntu Modifications","description":"Ubuntu Firefox 擴充包。","creator":"Canonical Ltd.","homepageURL":null}],"targetApplications":[{"id":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","minVersion":"9.0","maxVersion":"37.0a1"}],"targetPlatforms":[],"multiprocessCompatible":false,"signedState":2,"seen":true}
1466178208570    DeferredSave.extensions.json    DEBUG    Save changes
1466178208570    addons.xpi    DEBUG    Updating database with changes to installed add-ons
1466178208570    addons.xpi-utils    DEBUG    Updating add-on states
1466178208571    addons.xpi-utils    DEBUG    Writing add-ons list


I have found some comments on this issue but nothing helpful:
Remote driver & Firefox: Unable to bind to locking port 7054 within 45000 ms · Issue #7272 · SeleniumHQ/selenium-google-code-issue-archive

In Firefox Browser:Unable to bind to locking port 7054 within 45000ms · Issue #6760 · SeleniumHQ/selenium-google-code-issue-archive

Unable to bind to locking port 7054 within 45000 ms : webdriver firefox


Please advise,
Much obliged,

Christian Fotache 
Tel: 0728.297.207


  

[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    There will likely be some conflicts with bouncy castle.  
    
    Tika 1.13:
    bcmail-jdk15on	1.54	
    bcprov-jdk15on	1.54
    
    vs. Solr:
    org.bouncycastle.version = 1.45
    /org.bouncycastle/bcmail-jdk15 = ${org.bouncycastle.version}
    /org.bouncycastle/bcprov-jdk15 = ${org.bouncycastle.version}
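
When two versions of a library might both end up on the classpath, one way to see which jar actually wins is to ask the JVM where a class was loaded from. The following is a minimal sketch, not part of the PR: the class name `JarLocator` and its `locate` helper are made up for illustration, and the Bouncy Castle provider class is looked up by name so the sketch compiles without the library present.

```java
import java.security.CodeSource;

final class JarLocator {
    /** Returns where the named class was loaded from, or a short note if that cannot be determined. */
    static String locate(String className) {
        try {
            // Look the class up by name (without initializing it) so this compiles
            // even when the library is not on the classpath.
            Class<?> clazz = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(no code source available)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Prints the jar location if Bouncy Castle is present, otherwise a note.
        System.out.println(locate("org.bouncycastle.jce.provider.BouncyCastleProvider"));
    }
}
```

Running this inside the application whose classpath is in question would show whether the 1.45 or the 1.54 jar is the one being picked up.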





[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    OK, the tests pass for me successfully. Should I remove the jackcess-encrypt package from your PR after merging (you said you will be away this weekend)?




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    > will take a look. The test passed if you assumed that the html had two bodies, but that's crazy...
    
    I hope this test does not download the internet? It should all run locally! I have not looked into it.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    The XHTMLContentHandler adds <body> and </body>.  In out-of-the-box Tika with the DefaultHtmlMapper, "body" tags are not in the list of "SAFE_ELEMENTS", which means that the html's "body" tag is never passed through...so we don't see the doubling in Tika.
    
    The solution is to suppress the body tag in Solr's MostlyPassthroughHtmlMapper.
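
The doubling described above can be sketched without Tika on the classpath. The example below is only an illustration under stated assumptions: `BodySuppressionSketch`, `mapElement` and `emittedTags` are hypothetical names, and the real fix lives in Solr's MostlyPassthroughHtmlMapper, whose code is not shown in this thread. The idea is that the output handler always writes its own body element, so a pass-through mapper has to drop the source document's body tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class BodySuppressionSketch {
    /** Hypothetical mapper: returns the tag name to emit, or null to drop the tag. */
    static String mapElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        // The handler is assumed to write its own <body>, so the source <body> is suppressed.
        return "body".equals(lower) ? null : lower;
    }

    /** Simulates a handler that always emits one <body> and then the mapped source tags. */
    static List<String> emittedTags(List<String> sourceTags) {
        List<String> out = new ArrayList<>();
        out.add("body"); // added by the handler itself, not by the source document
        for (String tag : sourceTags) {
            String mapped = mapElement(tag);
            if (mapped != null) {
                out.add(mapped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emittedTags(Arrays.asList("BODY", "p", "div")));
    }
}
```

If `mapElement` passed "body" through instead of returning null, the emitted list would contain two body entries, which is the situation the test originally had to assume.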




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    I also only have Windows :)
    
    I would leave out image format, but MS Access looks fine. Could we leave out updating bouncycastle then?




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    > I also only have Windows :)
    
    How can you live with the failed builds?!?  I wanted to help with [morphlines](https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201606.mbox/%3CCY1PR09MB1115F9A08E97879D959D3CDCC7570%40CY1PR09MB1115.namprd09.prod.outlook.com%3E), but I can't easily do much...




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Let's pick option 2 for now. Maybe update the rest of Solr after some review.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Ah OK, so no problem on my side. I'll wait a bit.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    I think I got it...  ant precommit worked in Linux with these modifications.  I kept getting hangs with ant jar-checksums in Windows.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Did you check with Java 9 or should I do it? I am not sure about the last assume removed, because there is another SOLR issue in the assume message, not just the PDFBOX one.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    For me it still happens. I just merged the PR.




[GitHub] lucene-solr pull request #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
GitHub user tballison reopened a pull request:

    https://github.com/apache/lucene-solr/pull/44

    SOLR-8981

    SOLR-8981 upgrade to Tika 1.13

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tballison/lucene-solr SOLR-8981

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/44.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #44
    
----
commit ba0e71703464849198b384aa6e92962db8a04b51
Author: tballison <ta...@mitre.org>
Date:   2016-06-16T16:56:45Z

    SOLR-8981 upgrade to Tika 1.13

commit 1706b92790011f3ec5a85915adad3834e87d8970
Author: tballison <ta...@mitre.org>
Date:   2016-06-16T19:36:52Z

    SOLR-8981 clean up license and sha1 info

commit 31c091b4856081f2d1b302499a436e5953779e5e
Author: tballison <ta...@mitre.org>
Date:   2016-06-17T13:47:53Z

    SOLR-8981 clean up new lines, upgrade isoparser, add notice in CHANGES.txt

----




[GitHub] lucene-solr pull request #44: SOLR-8981

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/lucene-solr/pull/44




[GitHub] lucene-solr pull request #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison closed the pull request at:

    https://github.com/apache/lucene-solr/pull/44




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Grep for that one and remove them. Tests should pass then with latest Java 9:
    `assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155)", Constants.JRE_IS_MINIMUM_JAVA9);`





[GitHub] lucene-solr pull request #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/44#discussion_r67575579
  
    --- Diff: solr/contrib/morphlines-cell/src/test/org/apache/solr/morphlines/cell/SolrCellMorphlineTest.java ---
    @@ -42,8 +42,6 @@
       @BeforeClass
       public static void beforeClass2() {
         assumeFalse("FIXME: Morphlines currently has issues with Windows paths", Constants.WINDOWS);
    -    assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155, https://issues.apache.org/jira/browse/SOLR-8876)",
    --- End diff --
    
    This should stay, because Hadoop related stuff also fails with Java 9. Maybe only remove the PDFBOX issue number.
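
For readers unfamiliar with this kind of guard, a version-gated assumption boils down to a check on the running JVM's major version. The sketch below is an assumption about how such a check could be written and is not Lucene's `Constants` implementation; `JavaVersionSketch`, `majorVersion` and `isMinimumJava9` are hypothetical names.

```java
final class JavaVersionSketch {
    /** Parses a java.specification.version value such as "1.8", "9" or "11" into a major version. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the value had the form "1.x"; from Java 9 on it is just the major number.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** True when the given specification version is Java 9 or newer. */
    static boolean isMinimumJava9(String specVersion) {
        return majorVersion(specVersion) >= 9;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java " + majorVersion(spec)
                + ", skip Java 9 sensitive tests: " + isMinimumJava9(spec));
    }
}
```

A test guarded this way is skipped rather than failed on the affected JVMs, which is why the guard has to stay as long as any of the issues named in its message is still open.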




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    What file formats are these? Documents? Otherwise please leave them out.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    > I think this should work... ant precommit worked in Linux with these modifications. I kept getting hangs with ant jar-checksums in Windows.
    
    If you check out with Git on Windows using auto-EOL, it fails. The reason is that Git treats sha1 files as text and converts their line endings.
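
Why a converted line ending matters can be shown with a small, self-contained sketch. It is not how the `jar-checksums` task is implemented (that code is not shown in this thread); `Sha1LineEndingSketch`, `strictMatch` and `trimmedMatch` are hypothetical names, and the assumption is simply that a checksum file holds the hex digest followed by a single `\n`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1LineEndingSketch {
    /** Computes the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a required algorithm on the Java platform
        }
    }

    /** Byte-exact check: the checksum file must be exactly "<hex>\n". */
    static boolean strictMatch(String checksumFileContent, byte[] jarBytes) {
        return checksumFileContent.equals(sha1Hex(jarBytes) + "\n");
    }

    /** Tolerant check: surrounding whitespace, including "\r\n", is ignored. */
    static boolean trimmedMatch(String checksumFileContent, byte[] jarBytes) {
        return checksumFileContent.trim().equals(sha1Hex(jarBytes));
    }

    public static void main(String[] args) {
        byte[] jar = "abc".getBytes(StandardCharsets.UTF_8); // stand-in for real jar bytes
        String converted = sha1Hex(jar) + "\r\n";            // what an EOL conversion would produce
        System.out.println("strict:  " + strictMatch(converted, jar));
        System.out.println("trimmed: " + trimmedMatch(converted, jar));
    }
}
```

A byte-exact comparison rejects the converted file even though the digest itself is unchanged, so keeping Git from rewriting line endings in these files avoids the problem at its source.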




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Just found it.  Confirming that fix doesn't break anything else.
    





[GitHub] lucene-solr issue #44: SOLR-8981

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Yes the server is buggered. Good work folks.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Git (well, it was my fault, don't get me wrong) added the \r\n somehow.  I had turned off autocrlf earlier.
    
    > C:\...>git config --get core.autocrlf
    input
    
    I realized I forgot to update the isoparser, and I cleaned up the Jackcess notice.
    
    Let me know how this looks now.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    LOL. So is this a bug in Solr or in TIKA? Because it did not happen previously.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    No, it is a self-contained test with a test file. +1 on local and _only_ local.




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    argh...
    
    will take a look.  The test passed if you assumed that the html had two bodies, but that's crazy...




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    WebP is an image format.
    Jackcess encrypt is the library that allows users to decrypt MSAccess files.
    
    Please give it a go with Java 9.  I can't easily test the morphlines stuff on my main dev box (Windows ... :( ).




[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    If we leave out updating bouncycastle, I'm fairly confident that users will run into problems at run time if they try to decrypt MSAccess and probably PDF and doc.
    
    We had a binary incompatibility between 1.52 and 1.54 with Jackcess: https://sourceforge.net/p/jackcessencrypt/feature-requests/2/
    
    IIRC, the exception was thrown on any encrypted MSAccess file, not just those for which the user had a password.
    
    I see two options: 
    
    1) upgrade bouncycastle and hope we don't break other parts of Solr
    2) announce decryption of Jackcess/POI/PDFBox as unsupported
    





[GitHub] lucene-solr issue #44: SOLR-8981

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/44
  
    Not willing to point fingers... :)
    
    I'd like to track down the change in our history between 1.7 and 1.13 so that I actually understand what happened.

