Posted to user@nutch.apache.org by Yoniel Jorge Thomas Sosa <yj...@uci.cu> on 2015/01/19 17:01:54 UTC
Problems with web sites using HTTPS in Nutch 1.9
Hi, I am using Nutch 1.9, but fetching sites over HTTPS fails because of their certificates. I have enabled the protocol-httpclient plugin, but the problem persists. The output is shown below:
Injector: starting at 2015-01-19 10:13:05
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: overwrite: false
Injector: update: false
Injector: Total number of urls rejected by filters: 0
Injector: Total number of urls after normalization: 9
Injector: Total new urls injected: 9
Injector: finished at 2015-01-19 10:13:18, elapsed: 00:00:13
lun ene 19 10:13:18 CST 2015 : Iteration 1 of 1
Generating a new segment
Generator: starting at 2015-01-19 10:13:32
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 100
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl/segments/20150119101334
Generator: finished at 2015-01-19 10:13:35, elapsed: 00:00:03
Operating on segment : 20150119101334
Fetching : 20150119101334
Fetcher: starting at 2015-01-19 10:13:35
Fetcher: segment: crawl/segments/20150119101334
Fetcher Timelimit set for : 1421691215995
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 9 records + hit by time limit :0
fetch of https://facultad6.uci.cu/ failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
fetch of https://dragones.uci.cu/ failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
fetch of https://php.uci.cu/news.php failed with: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
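The PKIX error above is raised by the JVM's certificate validation when it cannot build a chain from the server certificate to a trusted root, so the failure can be reproduced outside Nutch. The following is only a minimal sketch under that assumption: the class PkixCheck and its methods are hypothetical helpers, not part of Nutch, and it uses only standard javax.net.ssl calls. It attempts a handshake against the JVM's default trust store and reports whether the failure is a trust-chain problem.

```java
import java.io.IOException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical diagnostic helper (not part of Nutch). */
public class PkixCheck {

    /** Returns true if any exception in the cause chain reports a PKIX path failure. */
    public static boolean isPkixFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && (msg.contains("PKIX path building failed")
                    || msg.contains("unable to find valid certification path"))) {
                return true;
            }
        }
        return false;
    }

    /** Attempts a TLS handshake using the JVM's default trust store. */
    public static String probe(String host, int port) {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake(); // throws if the chain is not trusted
            return "trusted";
        } catch (SSLHandshakeException e) {
            return isPkixFailure(e)
                    ? "untrusted chain (PKIX)"
                    : "handshake failed: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Host taken from the log above; pass another host and port as arguments.
        String host = args.length > 0 ? args[0] : "facultad6.uci.cu";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 443;
        System.out.println(host + ":" + port + " -> " + probe(host, port));
    }
}
```

If this reports an untrusted chain when run with the same JVM that runs Nutch, the issue probably lies in that JVM's trust store (for example, a certificate issued by an internal CA) rather than in the crawl configuration itself.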
Re: Problems with web sites using HTTPS in Nutch 1.9
Posted by karamveer <ka...@classicinformatics.com>.
Hi,
We're using Nutch 2.3 with a MongoDB database. It works fine and
fetches records from third-party domains.
However, I get exceptions when I crawl an HTTPS (SSL) domain. Can anyone
suggest a solution for this error?
fetch of https://xxxx.com/s/topiccatalog failed with:
javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target
10/10 spinwaiting/active, 1 pages, 1 errors, 0.2 0 pages/s, 0 0 kb/s, 2 URLs
in 1 queues
* queue: https://xxxxx.com
maxThreads = 1
inProgress = 0
Thanks,
Karamveer Singh