Posted to user@nutch.apache.org by "Musshorn, Kris T CTR USARMY RDECOM ARL (US)" <kr...@mail.mil> on 2016/07/27 13:52:20 UTC
progress (UNCLASSIFIED)
CLASSIFICATION: UNCLASSIFIED
I have made it past the stupid crap I was doing that was causing errors and have gotten to the point in the tutorial where I am trying to index the resources into Solr.
I am not completely sure the integration is set up correctly, but I tried:
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20160727090259/ -filter - normalize
Segment dir is complete: crawl/segments/20160727090259.
The input path at - is not a segment... skipping
The input path at normalize is not a segment... skipping
Indexer: starting at 2016-07-27 09:50:58
Indexer: deleting gone documents: false
Indexer: URL filtering: true
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance
    solr.zookeeper.hosts : URL of the Zookeeper quorum
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : username for authentication
    solr.auth.password : password for authentication
Indexing 1/1 documents
Deleting 0 documents
Indexing 1/1 documents
Deleting 0 documents
Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
What do I need to do to get this to run?
Thanks,
Kris
~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.
US Army Research Lab
Aberdeen Proving Ground
Application Management & Development Branch
410-278-7251
kris.t.musshorn.ctr@mail.mil
~~~~~~~~~~~~~~~~~~~~~~~~~~
RE: progress (UNCLASSIFIED)
Posted by Markus Jelsma <ma...@openindex.io>.
Hello, can you check the logs? There may be a problem with some libraries, as someone else recently noticed.
Markus
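
For anyone following along, here is a minimal sketch of one way to check the logs. It assumes Nutch 1.x running in local mode, where the log is normally written to logs/hadoop.log under the directory bin/nutch is run from; the function name show_nutch_errors is hypothetical and the path may differ on your setup. The console output only reports "Job failed!", so the underlying cause, if it was logged, should appear in this file.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent error-related lines from a Nutch log.
# Assumption: local mode, so the log defaults to logs/hadoop.log.
show_nutch_errors() {
    log_file="${1:-logs/hadoop.log}"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 1
    fi
    # Keep only lines that mention an error or an exception, newest 40 at most.
    grep -E 'ERROR|Exception|Caused by' "$log_file" | tail -n 40
}

# Usage (run from the directory that contains bin/nutch):
#   show_nutch_errors
#   show_nutch_errors /path/to/other/hadoop.log
```

Separately, the output in the original message shows "-" and "normalize" being read as segment paths and "URL normalizing: false", so the trailing "- normalize" was probably meant to be the single flag -normalize. That may or may not be related to the job failure.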