Posted to user@nutch.apache.org by "Richardson, Jacquelyn F." <fl...@ornl.gov> on 2015/01/05 14:56:03 UTC

RE: Nutch 1.9 error

Hi Markus,

Thanks for the reply.  I followed the link, which led me to this thread: http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html and from there to another link: http://sourceforge.net/projects/sitemap-parser/

The last link is a page where you can download the SitemapParser0.9.jar file.  Nothing on the page says where to put the file or how to tell Nutch to use it.

Have you used this utility or do you know the answers to any of the questions above?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
Sent: Friday, December 19, 2014 6:20 AM
To: user@nutch.apache.org
Subject: RE: Nutch 1.9 error

No, I am wrong. Nutch 1.x has a patch for sitemap processing; please see:
https://issues.apache.org/jira/browse/NUTCH-1465

 
 
-----Original message-----
> From:Markus Jelsma <ma...@openindex.io>
> Sent: Friday 19th December 2014 12:17
> To: user@nutch.apache.org
> Subject: RE: Nutch 1.9 error
> 
> No, unfortunately not. 
>  
>  
> -----Original message-----
> > From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> > Sent: Friday 19th December 2014 5:16
> > To: user@nutch.apache.org
> > Subject: RE: Nutch 1.9 error
> > 
> > Is it possible to crawl a sitemap.xml file with Nutch 1.x?
> > 
> > -----Original Message-----
> > From: Markus Jelsma [mailto:markus.jelsma@openindex.io]
> > Sent: Thursday, December 18, 2014 3:09 PM
> > To: user@nutch.apache.org
> > Subject: RE: Nutch 1.9 error
> > 
> > Hi - the sitemap command is not part of Nutch 1.x, nor does Nutch 1.x have a HostDB. I suspect you are using Nutch 2.x commands.
> >  
> > -----Original message-----
> > > From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> > > Sent: Thursday 18th December 2014 20:30
> > > To: user@nutch.apache.org
> > > Subject: Nutch 1.9 error
> > > 
> > > I am using Nutch 1.9.  I am trying to crawl our sitemap.xml file.
> > > 
> > > > When I submit the following command to Nutch:
> > > > bin/nutch sitemap crawl -hostdb hostdb -threads 2
> > > > I receive the following error:
> > > Error: Could not find or load main class sitemap
> > > 
> > > Any help you can give will be greatly appreciated.
> > > 
> > > Jackie Richardson
> > > 
> > > 
> > > 
> > 
> 

RE: Nutch 1.9 error

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - you should try out the patches on that Jira page.
Markus
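
For anyone who only needs the URLs listed in a sitemap.xml as crawl seeds, here is a minimal Python sketch of one possible workaround. It is not part of Nutch or of the NUTCH-1465 patches; the function names and the seed file path are hypothetical, and it assumes the sitemap uses the standard sitemaps.org 0.9 namespace. It collects every <loc> entry and writes one URL per line, which is the layout a Nutch seed list uses.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text):
    """Return the text of every <loc> element in a sitemap document.

    Note: for a sitemap index file this returns the locations of the
    child sitemaps, not page URLs.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def write_seed_file(urls, path):
    """Write one URL per line to a plain-text seed file."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the real sitemap.xml instead.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/about.html</loc></url>"
        "</urlset>"
    )
    for url in extract_sitemap_urls(sample):
        print(url)
```

The resulting file could then be placed in a seed directory and passed to the regular inject step. This only seeds the listed URLs; it does not use sitemap metadata such as lastmod or changefreq.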
 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> Sent: Monday 5th January 2015 15:01
> To: user@nutch.apache.org
> Subject: RE: Nutch 1.9 error
> 
> Hi Markus,
> 
> Thanks for the reply.  I followed the link, which led me to this thread: http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html and from there to another link: http://sourceforge.net/projects/sitemap-parser/
> 
> The last link is a page where you can download the SitemapParser0.9.jar file.  Nothing on the page says where to put the file or how to tell Nutch to use it.
> 
> Have you used this utility or do you know the answers to any of the questions above?
> 
> Jackie
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
> Sent: Friday, December 19, 2014 6:20 AM
> To: user@nutch.apache.org
> Subject: RE: Nutch 1.9 error
> 
> No, I am wrong. Nutch 1.x has a patch for sitemap processing; please see:
> https://issues.apache.org/jira/browse/NUTCH-1465
> 
>  
>  
> -----Original message-----
> > From:Markus Jelsma <ma...@openindex.io>
> > Sent: Friday 19th December 2014 12:17
> > To: user@nutch.apache.org
> > Subject: RE: Nutch 1.9 error
> > 
> > No, unfortunately not. 
> >  
> >  
> > -----Original message-----
> > > From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> > > Sent: Friday 19th December 2014 5:16
> > > To: user@nutch.apache.org
> > > Subject: RE: Nutch 1.9 error
> > > 
> > > Is it possible to crawl a sitemap.xml file with Nutch 1.x?
> > > 
> > > -----Original Message-----
> > > From: Markus Jelsma [mailto:markus.jelsma@openindex.io]
> > > Sent: Thursday, December 18, 2014 3:09 PM
> > > To: user@nutch.apache.org
> > > Subject: RE: Nutch 1.9 error
> > > 
> > > Hi - the sitemap command is not part of Nutch 1.x, nor does Nutch 1.x have a HostDB. I suspect you are using Nutch 2.x commands.
> > >  
> > > -----Original message-----
> > > > From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> > > > Sent: Thursday 18th December 2014 20:30
> > > > To: user@nutch.apache.org
> > > > Subject: Nutch 1.9 error
> > > > 
> > > > I am using Nutch 1.9.  I am trying to crawl our sitemap.xml file.
> > > > 
> > > > > When I submit the following command to Nutch:
> > > > > bin/nutch sitemap crawl -hostdb hostdb -threads 2
> > > > > I receive the following error:
> > > > Error: Could not find or load main class sitemap
> > > > 
> > > > Any help you can give will be greatly appreciated.
> > > > 
> > > > Jackie Richardson
> > > > 
> > > > 
> > > > 
> > > 
> > 
>