Posted to user@nutch.apache.org by Yoniel Jorge Thomas Sosa <yj...@uci.cu> on 2015/01/19 17:01:54 UTC

Problems with web sites using HTTPS in Nutch 1.9

Hi, I am using Nutch 1.9, but I have a problem with the certificates of sites served over HTTPS. I have activated the protocol-httpclient plugin, but that has not fixed the problem. The output is shown below:

Injector: starting at 2015-01-19 10:13:05 
Injector: crawlDb: crawl/crawldb 
Injector: urlDir: urls 
Injector: Converting injected urls to crawl db entries. 
Injector: overwrite: false 
Injector: update: false 
Injector: Total number of urls rejected by filters: 0 
Injector: Total number of urls after normalization: 9 
Injector: Total new urls injected: 9 
Injector: finished at 2015-01-19 10:13:18, elapsed: 00:00:13 
lun ene 19 10:13:18 CST 2015 : Iteration 1 of 1 
Generating a new segment 
Generator: starting at 2015-01-19 10:13:32 
Generator: Selecting best-scoring urls due for fetch. 
Generator: filtering: false 
Generator: normalizing: true 
Generator: topN: 100 
Generator: Partitioning selected urls for politeness. 
Generator: segment: crawl/segments/20150119101334 
Generator: finished at 2015-01-19 10:13:35, elapsed: 00:00:03 
Operating on segment : 20150119101334 
Fetching : 20150119101334 
Fetcher: starting at 2015-01-19 10:13:35 
Fetcher: segment: crawl/segments/20150119101334 
Fetcher Timelimit set for : 1421691215995 
Using queue mode : byHost 
Fetcher: threads: 50 
Fetcher: time-out divisor: 2 
QueueFeeder finished: total 9 records + hit by time limit :0 
fetch of https://facultad6.uci.cu/ failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target 
fetch of https://dragones.uci.cu/ failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target 
fetch of https://php.uci.cu/news.php failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target 
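
The "PKIX path building failed" message above means that the JVM could not build a chain from the server's certificate to any certificate authority in its trust store. As a rough diagnostic sketch (not part of Nutch; the class name TrustStoreCheck and its method are hypothetical names chosen for illustration), the following lists the CA certificates that the JVM's default trust store accepts, so one can check whether the issuer of the failing sites is among them:

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper for illustration only; it is not a Nutch class.
public class TrustStoreCheck {

    // Returns the CA certificates trusted by the JVM's default trust store.
    static X509Certificate[] defaultTrustedIssuers() {
        try {
            TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                    TrustManagerFactory.getDefaultAlgorithm());
            // Passing a null KeyStore asks for the JVM's default trust store.
            tmf.init((KeyStore) null);
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return ((X509TrustManager) tm).getAcceptedIssuers();
                }
            }
            return new X509Certificate[0];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load default trust store", e);
        }
    }

    public static void main(String[] args) {
        X509Certificate[] issuers = defaultTrustedIssuers();
        System.out.println("Trusted CA certificates: " + issuers.length);
        for (X509Certificate cert : issuers) {
            // Print each trusted CA's subject name for manual comparison.
            System.out.println(cert.getSubjectX500Principal().getName());
        }
    }
}
```

If the CA that signed the sites' certificates (for example, an internal or self-signed CA) does not appear in this list, the handshake is expected to fail in this way. A commonly used remedy is to import that CA certificate into the trust store of the JVM that runs Nutch with keytool -importcert; the right trust store path and alias depend on the local installation.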

Re: Problems with web sites using HTTPS in Nutch 1.9

Posted by karamveer <ka...@classicinformatics.com>.
Hi, 

We're using Nutch 2.3 with a MongoDB database. It is working fine and
fetching records from third-party domains.

However, I get exceptions when I crawl an HTTPS (SSL) domain. Can anyone
suggest a solution to this error?


fetch of https://xxxx.com/s/topiccatalog failed with:
javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target 
10/10 spinwaiting/active, 1 pages, 1 errors, 0.2 0 pages/s, 0 0 kb/s, 2 URLs
in 1 queues 
* queue: https://xxxxx.com
  maxThreads    = 1 
  inProgress    = 0 


Thanks, 
Karamveer Singh