Posted to dev@manifoldcf.apache.org by "Karl Wright (Commented) (JIRA)" <ji...@apache.org> on 2011/11/30 03:05:42 UTC
[jira] [Commented] (CONNECTORS-298) Web connector: SSL does not use
custom SSL socket factory in all cases
[ https://issues.apache.org/jira/browse/CONNECTORS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159743#comment-13159743 ]
Karl Wright commented on CONNECTORS-298:
----------------------------------------
When I attempt to crawl the above site, I get the following history:
{code}
11-29-2011 20:57:30.199 job end 1322618211892(test) 0 1
11-29-2011 20:57:27.566 fetch https://learningresources.nga.gov/ -11 0 1 robots.txt says so
11-29-2011 20:57:27.551 robots parse https:learningresources.nga.gov:443 SUCCESS 0 1
11-29-2011 20:57:27.147 fetch https://learningresources.nga.gov/robots.txt 200 82 391
11-29-2011 20:57:22.147 fetch http://learningresources.nga.gov 301 242 137
11-29-2011 20:57:17.177 fetch http://learningresources.nga.gov/robots.txt -103 0 729 java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
11-29-2011 20:57:10.150 job start 1322618211892(test) 0 1
{code}
Note that the initial fetch of robots.txt over http failed with the exception (result code -103), but the subsequent fetch of robots.txt over https succeeded with no errors. It appears to me that the problem may be that the site itself is using SSL even when the incoming request is not https.
Note also that the crawl stopped because robots.txt prohibited it.
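For reference, the exception text in the history above can be reproduced outside the crawler. The following is only a minimal sketch, assuming the message comes from PKIX validation being set up against a trust store that contains no trusted certificates, which is what would be expected if the default Sun trust manager were used without a populated trust store. The class and method names (TrustAnchorsDemo, emptyTrustStoreMessage) are hypothetical and are not part of ManifoldCF.

```java
import java.security.InvalidAlgorithmParameterException;
import java.security.KeyStore;
import java.security.cert.PKIXParameters;

// Hypothetical demo class; not part of ManifoldCF or commons-httpclient.
public class TrustAnchorsDemo {

    // Builds PKIX parameters from an empty in-memory keystore and returns
    // the message of the resulting exception, or null if none was thrown.
    public static String emptyTrustStoreMessage() throws Exception {
        KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
        empty.load(null, null); // initialise the keystore with no entries
        try {
            // PKIXParameters requires at least one trusted certificate entry.
            new PKIXParameters(empty);
            return null;
        } catch (InvalidAlgorithmParameterException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(emptyTrustStoreMessage());
    }
}
```

If this assumption holds, it would be consistent with the connection in the failing fetch having been validated against an empty trust store rather than against the one supplied through the ManifoldCF socket factory.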
> Web connector: SSL does not use custom SSL socket factory in all cases
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-298
> URL: https://issues.apache.org/jira/browse/CONNECTORS-298
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 0.3
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.4
>
>
> When crawling learningresources.nga.gov, the web connector gets a strange exception from certificate verification logic in Sun's SSL implementation. The stack trace indicates that the ManifoldCF secure socket factory may not have been used to set up the stream either. Here's the trace:
> INFO 2011-11-29 20:13:33,397 (Thread-535) - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> DEBUG 2011-11-29 20:13:33,397 (Thread-535) - java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1612)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1595)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1521)
> at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:64)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(Unknown Source)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(Unknown Source)
> at org.apache.commons.httpclient.HttpMethodBase.writeRequest(Unknown Source)
> at org.apache.commons.httpclient.HttpMethodBase.execute(Unknown Source)
> at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(Unknown Source)
> at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection$ExecuteMethodThread.run(ThrottledFetcher.java:1244)
> Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> at sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:57)
> at sun.security.validator.Validator.getInstance(Validator.java:161)
> at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:108)
> at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:204)
> at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
> at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1185)
> at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
> at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
> at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:893)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:632)
> at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> ... 10 more