Posted to user@nutch.apache.org by KRIS MUSSHORN <mu...@comcast.net> on 2016/09/06 17:29:17 UTC

indexing metatags with Nutch 1.12

https://wiki.apache.org/nutch/IndexMetatags 

As soon as I switch to nutch-site_v2, Nutch throws "protocol not found" errors during the crawl:

2016-09-06 12:23:53,102 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1 
2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms) 
2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https 
at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84) 
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257) 
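The stack trace shows ProtocolFactory.getProtocol raising ProtocolNotFound for the scheme "https". That error means no active plugin registered a protocol implementation for that scheme. The set of active plugins comes from the plugin.includes regular expression in nutch-site.xml. So one plausible explanation, though not confirmed here, is that the v2 configuration's plugin.includes no longer matches any plugin that handles https.

The Java sketch below is not Nutch code. It mimics that filtering step so you can reason about a given plugin.includes value. Several parts are assumptions:

- The plugin-to-scheme mapping is hypothetical.
- Both plugin.includes strings are hypothetical and are not copied from the wiki page.
- The sketch assumes the regex must match the whole plugin id.

Check the protocolName parameters in each plugin's plugin.xml for your Nutch version before relying on the mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class SchemeCoverage {

    // Returns the URL schemes handled by plugins whose ids fully match the
    // plugin.includes regex (full match is an assumption of this sketch).
    static Set<String> coveredSchemes(String pluginIncludes,
                                      Map<String, Set<String>> schemesByPlugin) {
        Pattern includes = Pattern.compile(pluginIncludes);
        Set<String> covered = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : schemesByPlugin.entrySet()) {
            if (includes.matcher(e.getKey()).matches()) {
                covered.addAll(e.getValue());
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL mapping for illustration; read the real values from
        // the protocolName parameters in each plugin's plugin.xml.
        Map<String, Set<String>> schemes = new LinkedHashMap<>();
        schemes.put("protocol-file", Set.of("file"));
        schemes.put("protocol-http", Set.of("http")); // ASSUMPTION: check your version
        schemes.put("protocol-httpclient", Set.of("http", "https"));

        // HYPOTHETICAL plugin.includes values, not taken from the wiki page.
        String before = "protocol-httpclient|parse-(html|tika)|index-(basic|anchor)";
        String after = "protocol-http|parse-(html|tika|metatags)|index-(basic|anchor|metadata)";

        System.out.println("before: " + coveredSchemes(before, schemes));
        System.out.println("after:  " + coveredSchemes(after, schemes));
    }
}
```

With these made-up inputs the program prints "before: [http, https]" and "after:  [http]". Because matches() requires a full match, "protocol-http" does not also select "protocol-httpclient". A find()-style match would select both.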

How can I fix this?

Kris