
Posted to user@nutch.apache.org by "Musshorn, Kris T CTR USARMY RDECOM ARL (US)" <kr...@mail.mil> on 2016/07/27 13:52:20 UTC

progress (UNCLASSIFIED)

CLASSIFICATION: UNCLASSIFIED

I have gotten past the mistakes that were causing my earlier errors and reached the point in the tutorial where I am trying to index the resources into Solr.
I am not completely sure the integration is set up correctly, but I tried:

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20160727090259/ -filter - normalize
Segment dir is complete: crawl/segments/20160727090259.
The input path at - is not a segment... skipping
The input path at normalize is not a segment... skipping
Indexer: starting at 2016-07-27 09:50:58
Indexer: deleting gone documents: false
Indexer: URL filtering: true
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance
        solr.zookeeper.hosts : URL of the Zookeeper quorum
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication


Indexing 1/1 documents
Deleting 0 documents
Indexing 1/1 documents
Deleting 0 documents
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)


What do I need to do to get this to run?


Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.      
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn.ctr@mail.mil
~~~~~~~~~~~~~~~~~~~~~~~~~~



RE: progress (UNCLASSIFIED)

Posted by Markus Jelsma <ma...@openindex.io>.

Hello, can you check the logs? There may be a problem with some libraries; someone else recently ran into that as well.
Markus
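
As a rough sketch of the log check suggested above: assuming Nutch 1.x running in local mode, where the job log is normally written to logs/hadoop.log under the Nutch runtime directory, a small helper like the one below pulls out the error lines that the console message "Job failed!" hides. The function name show_indexer_errors and the 5-line context window are illustrative choices, not part of Nutch.

```shell
# Sketch only: print recent error lines from a Nutch log file.
# Assumption: local mode, with the log at logs/hadoop.log unless a path is given.
# show_indexer_errors is a hypothetical helper name, not a Nutch command.
show_indexer_errors() {
  log_file="${1:-logs/hadoop.log}"

  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi

  # Show lines mentioning ERROR or an exception, with 5 lines of trailing
  # context each, and keep only the last 40 lines of that output.
  grep -n -A 5 -E 'ERROR|Exception' "$log_file" | tail -n 40
}
```

From the Nutch runtime directory, run show_indexer_errors with no arguments, or pass another log path as the first argument. Separately, the console output in the original message reports "-" and "normalize" as skipped input paths and shows "URL normalizing: false", which suggests the space in "- normalize" split the flag. Writing it as -normalize should at least clear those two warnings, though it may not be what makes the job fail.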

 
 