Posted to dev@manifoldcf.apache.org by "Karl Wright (Commented) (JIRA)" <ji...@apache.org> on 2011/11/30 03:05:42 UTC

[jira] [Commented] (CONNECTORS-298) Web connector: SSL does not use custom SSL socket factory in all cases

    [ https://issues.apache.org/jira/browse/CONNECTORS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159743#comment-13159743 ] 

Karl Wright commented on CONNECTORS-298:
----------------------------------------

When I attempt to crawl the above site, I get the following history:

{code}
11-29-2011 20:57:30.199 	job end 	1322618211892(test) 		0 	1
11-29-2011 20:57:27.566 	fetch 	https://learningresources.nga.gov/ 	-11 	0 	1 	robots.txt says so
11-29-2011 20:57:27.551 	robots parse 	https:learningresources.nga.gov:443 	SUCCESS 	0 	1
11-29-2011 20:57:27.147 	fetch 	https://learningresources.nga.gov/robots.txt 	200 	82 	391
11-29-2011 20:57:22.147 	fetch 	http://learningresources.nga.gov 	301 	242 	137
11-29-2011 20:57:17.177 	fetch 	http://learningresources.nga.gov/robots.txt 	-103 	0 	729 	java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
11-29-2011 20:57:10.150 	job start 	1322618211892(test) 		0 	1
{code}

The history is listed newest first.  Note that the initial fetch of robots.txt via http failed with the trustAnchors exception (result code -103), while the subsequent fetch of robots.txt with the protocol specified as https succeeded (result code 200) with no errors.  It appears to me that the problem may be that the site itself uses SSL even when the incoming request is not https.
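
As a side note on the exception text itself: the JDK raises this message when certificate-path validation is set up with a trust store that contains no trusted certificates. The following is a minimal sketch that reproduces the message outside of ManifoldCF, assuming only the standard JDK classes; the class name EmptyTrustStoreDemo and the method probeEmptyTrustStore are hypothetical and do not come from the connector code.

```java
import java.security.InvalidAlgorithmParameterException;
import java.security.KeyStore;
import java.security.cert.PKIXParameters;

// Hypothetical demo class, not part of ManifoldCF.
public class EmptyTrustStoreDemo {

    /**
     * Builds PKIX validation parameters from an empty key store and
     * returns the resulting exception message, or null if no
     * exception is thrown.
     */
    static String probeEmptyTrustStore() throws Exception {
        KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
        empty.load(null, null); // initialize the store with no entries
        try {
            // PKIXParameters requires at least one trust anchor.
            new PKIXParameters(empty);
            return null;
        } catch (InvalidAlgorithmParameterException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probeEmptyTrustStore());
    }
}
```

If the same message shows up during a fetch, it suggests the handshake ran against a trust store with no usable entries, which would be consistent with the connection not having gone through the expected socket factory.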

Note also that the crawl stopped because robots.txt prohibited the fetch of https://learningresources.nga.gov/ (result code -11).

                
> Web connector: SSL does not use custom SSL socket factory in all cases
> ----------------------------------------------------------------------
>
>                 Key: CONNECTORS-298
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-298
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.3
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 0.4
>
>
> When crawling learningresources.nga.gov, the web connector gets a strange exception from certificate verification logic in Sun's SSL implementation.  The stack trace indicates that the ManifoldCF secure socket factory may not have been used to set up the stream either.  Here's the trace:
>  INFO 2011-11-29 20:13:33,397 (Thread-535) - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> DEBUG 2011-11-29 20:13:33,397 (Thread-535) - java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> 	at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1612)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1595)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1521)
> 	at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:64)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> 	at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(Unknown Source)
> 	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(Unknown Source)
> 	at org.apache.commons.httpclient.HttpMethodBase.writeRequest(Unknown Source)
> 	at org.apache.commons.httpclient.HttpMethodBase.execute(Unknown Source)
> 	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)
> 	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(Unknown Source)
> 	at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
> 	at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection$ExecuteMethodThread.run(ThrottledFetcher.java:1244)
> Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> 	at sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:57)
> 	at sun.security.validator.Validator.getInstance(Validator.java:161)
> 	at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:108)
> 	at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:204)
> 	at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
> 	at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1185)
> 	at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
> 	at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
> 	at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:893)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
> 	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:632)
> 	at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
> 	... 10 more
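
For readers unfamiliar with the term, a "custom SSL socket factory" here means a socket factory whose trust decisions come from an application-supplied trust store rather than the JVM default. The following is a minimal sketch using plain JSSE, assuming only standard JDK classes; the class name CustomFactorySketch and the method buildFactory are hypothetical, and this is not the ManifoldCF implementation.

```java
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical sketch class, not part of ManifoldCF.
public class CustomFactorySketch {

    /**
     * Returns a socket factory that validates server certificates
     * against the supplied (already loaded) trust store.
     */
    static SSLSocketFactory buildFactory(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        // No client keys; trust managers from our store; default randomness.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx.getSocketFactory();
    }

    public static void main(String[] args) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        store.load(null, null); // empty store, for illustration only
        System.out.println(buildFactory(store) != null);
    }
}
```

Only sockets created through such a factory pick up the supplied trust store. A connection that is opened through some other path is validated by whatever trust manager that path supplies instead.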
