Posted to user@nutch.apache.org by Fred Zimmerman <wf...@nimblebooks.com> on 2011/10/09 02:22:24 UTC

solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Hi -- I am having trouble with the solrindexer parameters -- I see that
Lewis had similar problems a few months ago. Any idea what I am doing wrong?

bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> crawl/linkdb crawl/segments/*
> SolrIndexer: starting at 2011-10-09 00:13:24
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
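>
> A quick way to see what each of those segment directories actually contains, and whether Nutch can read them, is something like the following (paths taken from the output above; double-check the readseg options against your install):
>
> for s in crawl/segments/*/; do echo "== $s"; ls "$s"; done
> bin/nutch readseg -list crawl/segments/20110922143907   # ask Nutch itself what it sees in one segment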



-----------------------------------------------------
Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
monthly updates



On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi guys,
>
> I have been watching this thread intently and I am very happy to see that
> there is some progress :0)
>
> Radim,
>
> Can I ask that you open a JIRA issue and submit a patch, this way we can
> not
> only track it, but it will also give the community a chance to test and
> validate the patch prior to integration into the source.
>
> Thanks
>
> Lewis
>
> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> Rajesh.Ramanathapuram@turner.com> wrote:
>
> > Hi Radim,
> >
> >  Thank you so much for this. I am not familiar with commit process to the
> > core.
> >  Is there someone who can help us get this committed and help resolve
> this
> > issue?
> >
> > Thanks for all your help.
> >
> > Rajesh Ramana
> >
> > -----Original Message-----
> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> > Sent: Thursday, October 06, 2011 2:18 PM
> > To: user@nutch.apache.org
> > Subject: Re: Nutch not crawling URLs with spanish accented characters (
> ñ)
> >
> > - The REGEX normalizer transforms the special characters, but fails to
> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> >  - The fetcher is having trouble interpreting the links with special
> > character ‘ñ’.
> >
> > i can add this transformation to basic-url normalizer if somebody is
> > willing to commit it.
> >
>
>
>
> --
> *Lewis*
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Markus Jelsma <ma...@openindex.io>.
Besides, the -linkdb param is 1.4, not 1.3;
that's what's wrong here. Bai explicitly mentioned 1.4.
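
Roughly, the difference is (a sketch only; the Solr URL below is a placeholder, the crawl paths are the ones used earlier in this thread):

# Nutch 1.3: the linkdb is a plain positional argument
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
# Nutch 1.4: the linkdb is optional and must be passed via -linkdb (NUTCH-1054)
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*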

> Hi Fred,
> 
> Please ensure that the linkdb command was executed successfully. The output
> logs do not indicate this.
> Looks like you've got a '-' minus character in front of the relative linkdb
> directory as well.
> 
> HTH
> 
> On Wed, Oct 26, 2011 at 1:27 AM, Fred Zimmerman <zi...@gmail.com>wrote:
> > I'm still having trouble with this in 1.3. It looks as if there's something
> > dumb with the syntax or file structure, but I can't get it.
> > 
> > $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> > -linkdb crawl/linkdb crawl/segments/*
> > 
> > SolrIndexer: starting at 2011-10-25 23:26:02
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not
> > exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
> > 
> > 
> > On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma
> > 
> > <ma...@openindex.io>wrote:
> > > From the changelog:
> > > http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
> > > 
> > > 111     * NUTCH-1054 LinkDB optional during indexing (jnioche)
> > > 
> > > With your command, the given linkdb is interpreted as a segment.
> > > 
> > > https://issues.apache.org/jira/browse/NUTCH-1054
> > > 
> > > This is the new command:
> > > 
> > > Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... |
> > > -dir <segments>) [-noCommit]
> > > 
> > > On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > > > I'm having a similar issue.  I'm using 1.4 and getting these errors
> > 
> > with
> > 
> > > > linkdb.  The segments seem fine.
> > > > 
> > > > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer:
> > > > starting
> > 
> > at
> > 
> > > > 2011-10-25 10:10:20
> > > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> > > 
> > > IndexerMapReduce:
> > > > crawldb: crawl/crawldb
> > > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> > > 
> > > IndexerMapReduces:
> > > > adding segment: crawl/linkdb
> > > > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce -
> > > 
> > > IndexerMapReduces:
> > > > adding segment: crawl/segments/20111025095216
> > > > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce -
> > > 
> > > IndexerMapReduces:
> > > > adding segment: crawl/segments/20111025100004
> > > > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> > > > org.apache.hadoop.mapred.InvalidInputException: Input path does not
> > > 
> > > exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> > > > 
> > > > 
> > > > Did something change with 1.4?
> > > > 
> > > > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <
> > > > 
> > > > lewis.mcgibbney@gmail.com> wrote:
> > > > > Hi Fred,
> > > > > 
> > > > > How many individual directories do you have under
> > > > > /runtime/local/crawl/segments/
> > > > > ?
> > > > > 
> > > > > Another thing that raises alarms is the nohup.out dir's! Are these
> > > > > intentional? Interestingly, missing segment data is not the same
> > > > > with these dir's.
> > > > > 
> > > > > Does your log output indicate any discrepancies between various
> > 
> > command
> > 
> > > > > transitions?
> > > > > 
> > > > > 
> > > > > 
> > > > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > > > > >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> > > > > >> crawl/linkdb crawl/segments/*
> > > > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > > > >
> > > > > > -----------------------------------------------------
> > > > > > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > > > > > monthly updates
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> > > > > > 
> > > > > > lewis.mcgibbney@gmail.com> wrote:
> > > > > >> Hi guys,
> > > > > >> 
> > > > > >> I have been watching this thread intently and I am very happy to
> > 
> > see
> > 
> > > > > that
> > > > > 
> > > > > >> there is some progress :0)
> > > > > >> 
> > > > > >> Radim,
> > > > > >> 
> > > > > >> Can I ask that you open a JIRA issue and submit a patch, this
> > > > > >> way
> > 
> > we
> > 
> > > > > >> can not
> > > > > >> only track it, but it will also give the community a chance to
> > 
> > test
> > 
> > > > > >> and validate the patch prior to integration into the source.
> > > > > >> 
> > > > > >> Thanks
> > > > > >> 
> > > > > >> Lewis
> > > > > >> 
> > > > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> > > > > >> 
> > > > > >> Rajesh.Ramanathapuram@turner.com> wrote:
> > > > > >> > Hi Radim,
> > > > > >> > 
> > > > > >> >  Thank you so much for this. I am not familiar with commit
> > 
> > process
> > 
> > > > > >> >  to
> > > > > >> 
> > > > > >> the
> > > > > >> 
> > > > > >> > core.
> > > > > >> > 
> > > > > >> >  Is there someone who can help us get this committed and help
> > > > > >> >  resolve
> > > > > >> 
> > > > > >> this
> > > > > >> 
> > > > > >> > issue?
> > > > > >> > 
> > > > > >> > Thanks for all your help.
> > > > > >> > 
> > > > > >> > Rajesh Ramana
> > > > > >> > 
> > > > > >> > -----Original Message-----
> > > > > >> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> > > > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > > > >> > To: user@nutch.apache.org
> > > > > >> > Subject: Re: Nutch not crawling URLs with spanish accented
> > > > > >> > characters
> > > > > 
> > > > > (
> > > > > 
> > > > > >> ñ)
> > > > > >> 
> > > > > >> > - The REGEX normalizer transforms the special characters, but
> > > 
> > > fails
> > > 
> > > > > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > > > > >> > 
> > > > > >> >  - The fetcher is having trouble interpreting the links with
> > > 
> > > special
> > > 
> > > > > >> > character ‘ñ’.
> > > > > >> > 
> > > > > >> > i can add this transformation to basic-url normalizer if
> > 
> > somebody
> > 
> > > is
> > > 
> > > > > >> > willing to commit it.
> > > > > >> 
> > > > > >> --
> > > > > >> *Lewis*
> > > > > 
> > > > > --
> > > > > *Lewis*
> > > 
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Fred Zimmerman <zi...@gmail.com>.
will do.  Of course I have already googled these terms without much luck.
 Fred

On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Fred,
>
> These are clearly Solr-aimed questions, which I would observe are specific
> to your schema. Maybe try the Solr archives for keywords, or else try the
> Solr user lists. I think that you are much more likely to get a
> substantiated response there.
>
> Thank you
>
> On Wed, Oct 26, 2011 at 3:31 PM, Fred Zimmerman <zimzaz.wfz@gmail.com
> >wrote:
>
> > I added just the <content> field ... I have already modified solr's
> > schema.xml to accommodate some other data types.
> >
> > Now when starting solr ...
> >
> > INFO: SolrUpdateServlet.init() done
> > 2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
> > 2011-10-26 13:30:23.129:WARN::/solr/admin/
> > java.lang.IllegalStateException: STREAM
> >        at org.mortbay.jetty.Response.getWriter(Response.java:616) etc ...
> >
> >
> > On Wed, Oct 26, 2011 at 9:16 AM, Markus Jelsma
> > <ma...@openindex.io>wrote:
> >
> > > Add the schema.xml from nutch/conf to your Solr core.
> > >
> > > btw: be careful with your host and port in the mailing lists. If it's
> > > open....
> > >
> > > On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> > > > that's it.
> > > >
> > > > org.apache.solr.common.SolrException: ERROR:unknown field 'content'
> > > >
> > > > *ERROR:unknown field 'content'*
> > > >
> > > > request: http://url/solr/update?wt=javabin&version=2
> > > >         at
> > > >
> > >
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > > > SolrServer.java:436) at
> > > >
> > >
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > > > SolrServer.java:245) at
> > > >
> > >
> >
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(Abstract
> > > > UpdateRequest.java:105) at
> > > > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at
> > > > org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> > > >         at
> > > >
> > >
> >
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.ja
> > > > va:48) at
> > > >
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> > > >         at
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > > >         at
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > > > 2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException:
> > Job
> > > > failed!
> > > >
> > > >
> > > > On Wed, Oct 26, 2011 at 9:03 AM, Markus Jelsma
> > > >
> > > > <ma...@openindex.io>wrote:
> > > > > Check your hadoop.log and Solr log. If that happens there's usually a
> > > > > field mismatch when indexing.
> > > > >
> > > > > On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> > > > > > OK, I've fixed the problem with the parameters giving incorrect
> > paths
> > > > > > to the files. Now I get this:
> > > > > >
> > > > > > $ bin/nutch solrindex
> > > > > > http://search.zimzaz.com:8983/solr crawl/crawldb
> > > > > > crawl/linkdb crawl/segments/*
> > > > > > SolrIndexer: starting at 2011-10-26 12:57:57
> > > > > > java.io.IOException: Job failed!
> > > > >
> > > > > --
> > > > > Markus Jelsma - CTO - Openindex
> > > > > http://www.linkedin.com/in/markus17
> > > > > 050-8536620 / 06-50258350
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
> > >
> >
>
>
>
> --
> *Lewis*
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Fred,

These are clearly Solr-aimed questions, which I would observe are specific
to your schema. Maybe try the Solr archives for keywords, or else try the
Solr user lists. I think that you are much more likely to get a substantiated
response there.

Thank you

On Wed, Oct 26, 2011 at 3:31 PM, Fred Zimmerman <zi...@gmail.com>wrote:

> I added just the <content> field ... I have already modified solr's
> schema.xml to accommodate some other data types.
>
> Now when starting solr ...
>
> INFO: SolrUpdateServlet.init() done
> 2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
> 2011-10-26 13:30:23.129:WARN::/solr/admin/
> java.lang.IllegalStateException: STREAM
>        at org.mortbay.jetty.Response.getWriter(Response.java:616) etc ...
>
>
> On Wed, Oct 26, 2011 at 9:16 AM, Markus Jelsma
> <ma...@openindex.io>wrote:
>
> > Add the schema.xml from nutch/conf to your Solr core.
> >
> > btw: be careful with your host and port in the mailing lists. If it's
> > open....
> >
> > On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> > > that's it.
> > >
> > > org.apache.solr.common.SolrException: ERROR:unknown field 'content'
> > >
> > > *ERROR:unknown field 'content'*
> > >
> > > request: http://url/solr/update?wt=javabin&version=2
> > >         at
> > >
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > > SolrServer.java:436) at
> > >
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > > SolrServer.java:245) at
> > >
> >
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(Abstract
> > > UpdateRequest.java:105) at
> > > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at
> > > org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> > >         at
> > >
> >
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.ja
> > > va:48) at
> > > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> > >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> > >         at
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > > 2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException:
> Job
> > > failed!
> > >
> > >
> > > On Wed, Oct 26, 2011 at 9:03 AM, Markus Jelsma
> > >
> > > <ma...@openindex.io>wrote:
> > > > Check your hadoop.log and Solr log. If that happens there's usually a
> > > > field mismatch when indexing.
> > > >
> > > > On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> > > > > OK, I've fixed the problem with the parameters giving incorrect
> paths
> > > > > to the files. Now I get this:
> > > > >
> > > > > $ bin/nutch solrindex
> > > > > http://search.zimzaz.com:8983/solr crawl/crawldb
> > > > > crawl/linkdb crawl/segments/*
> > > > > SolrIndexer: starting at 2011-10-26 12:57:57
> > > > > java.io.IOException: Job failed!
> > > >
> > > > --
> > > > Markus Jelsma - CTO - Openindex
> > > > http://www.linkedin.com/in/markus17
> > > > 050-8536620 / 06-50258350
> >
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
> >
>



-- 
*Lewis*

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Fred Zimmerman <zi...@gmail.com>.
I added just the <content> field ... I have already modified solr's
schema.xml to accommodate some other data types.

Now when starting solr ...

INFO: SolrUpdateServlet.init() done
2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
2011-10-26 13:30:23.129:WARN::/solr/admin/
java.lang.IllegalStateException: STREAM
        at org.mortbay.jetty.Response.getWriter(Response.java:616) etc ...


On Wed, Oct 26, 2011 at 9:16 AM, Markus Jelsma
<ma...@openindex.io>wrote:

> Add the schema.xml from nutch/conf to your Solr core.
>
> btw: be careful with your host and port in the mailing lists. If it's
> open....
>
> On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> > that's it.
> >
> > org.apache.solr.common.SolrException: ERROR:unknown field 'content'
> >
> > *ERROR:unknown field 'content'*
> >
> > request: http://url/solr/update?wt=javabin&version=2
> >         at
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > SolrServer.java:436) at
> >
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> > SolrServer.java:245) at
> >
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(Abstract
> > UpdateRequest.java:105) at
> > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at
> > org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> >         at
> >
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.ja
> > va:48) at
> > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >         at
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > 2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException: Job
> > failed!
> >
> >
> > On Wed, Oct 26, 2011 at 9:03 AM, Markus Jelsma
> >
> > <ma...@openindex.io>wrote:
> > > Check your hadoop.log and Solr log. If that happens there's usually a
> > > field mismatch when indexing.
> > >
> > > On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> > > > OK, I've fixed the problem with the parameters giving incorrect paths
> > > > to the files. Now I get this:
> > > >
> > > > $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> > > > crawl/linkdb crawl/segments/*
> > > > SolrIndexer: starting at 2011-10-26 12:57:57
> > > > java.io.IOException: Job failed!
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Markus Jelsma <ma...@openindex.io>.
Add the schema.xml from nutch/conf to your Solr core.

btw: be careful with your host and port in the mailing lists. If it's open....
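
In practice that is roughly (a sketch; the destination path assumes the stock Solr example layout, adjust to wherever your core actually lives):

cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/example/solr/conf/schema.xml   # overwrite the core's schema with Nutch's
# then restart Solr (or reload the core) so fields such as 'content' are known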

On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> that's it.
> 
> org.apache.solr.common.SolrException: ERROR:unknown field 'content'
> 
> *ERROR:unknown field 'content'*
> 
> request: http://url/solr/update?wt=javabin&version=2
>         at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> SolrServer.java:436) at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttp
> SolrServer.java:245) at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(Abstract
> UpdateRequest.java:105) at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49) at
> org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>         at
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.ja
> va:48) at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException: Job
> failed!
> 
> 
> On Wed, Oct 26, 2011 at 9:03 AM, Markus Jelsma
> 
> <ma...@openindex.io>wrote:
> > Check your hadoop.log and Solr log. If that happens there's usually a
> > field mismatch when indexing.
> > 
> > On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> > > OK, I've fixed the problem with the parameters giving incorrect paths
> > > to the files. Now I get this:
> > > 
> > > $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> > > crawl/linkdb crawl/segments/*
> > > SolrIndexer: starting at 2011-10-26 12:57:57
> > > java.io.IOException: Job failed!
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Fred Zimmerman <zi...@gmail.com>.
that's it.

org.apache.solr.common.SolrException: ERROR:unknown field 'content'

*ERROR:unknown field 'content'*

request: http://search.zimzaz.com:8983/solr/update?wt=javabin&version=2
        at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
        at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
        at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at
org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
        at
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-10-26 12:58:20,596 ERROR solr.SolrIndexer - java.io.IOException: Job
failed!


On Wed, Oct 26, 2011 at 9:03 AM, Markus Jelsma
<ma...@openindex.io>wrote:

> Check your hadoop.log and Solr log. If that happens there's usually a field
> mismatch when indexing.
>
> On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> > OK, I've fixed the problem with the parameters giving incorrect paths to
> > the files. Now I get this:
> >
> > $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> > crawl/linkdb crawl/segments/*
> > SolrIndexer: starting at 2011-10-26 12:57:57
> > java.io.IOException: Job failed!
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Markus Jelsma <ma...@openindex.io>.
Check your hadoop.log and Solr log. If that happens there's usually a field
mismatch when indexing.
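
In a local run the indexer writes to logs/hadoop.log under runtime/local; something like this usually surfaces the underlying exception (path assumed, adjust to your install):

grep -B 2 -A 10 ERROR runtime/local/logs/hadoop.log | tail -n 40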

On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> OK, I've fixed the problem with the parameters giving incorrect paths to
> the files. Now I get this:
> 
> $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> crawl/linkdb crawl/segments/*
> SolrIndexer: starting at 2011-10-26 12:57:57
> java.io.IOException: Job failed!

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Fred Zimmerman <zi...@gmail.com>.
OK, I've fixed the problem with the parameters giving incorrect paths to the
files. Now I get this:

$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-26 12:57:57
java.io.IOException: Job failed!

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Fred,

Please ensure that the linkdb command was executed successfully. The output
logs do not indicate this.
Looks like you've got a '-' minus character in front of the relative linkdb
directory as well.

HTH
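
If the linkdb was never built, rebuilding it is cheap; a sketch using the paths from earlier in this thread (check bin/nutch invertlinks on your install for the exact options):

bin/nutch invertlinks crawl/linkdb -dir crawl/segments
ls crawl/linkdb   # a healthy linkdb has a 'current' subdirectory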

On Wed, Oct 26, 2011 at 1:27 AM, Fred Zimmerman <zi...@gmail.com>wrote:

> I'm still having trouble with this in 1.3. It looks as if there's something
> dumb with the syntax or file structure, but I can't get it.
>
> $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
> -linkdb crawl/linkdb crawl/segments/*
>
> SolrIndexer: starting at 2011-10-25 23:26:02
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
>
>
> On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma
> <ma...@openindex.io>wrote:
>
> > From the changelog:
> > http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
> >
> > 111     * NUTCH-1054 LinkDB optional during indexing (jnioche)
> >
> > With your command, the given linkdb is interpreted as a segment.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1054
> >
> > This is the new command:
> >
> > Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... |
> > -dir <segments>) [-noCommit]
> >
> > On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > > I'm having a similar issue.  I'm using 1.4 and getting these errors
> with
> > > linkdb.  The segments seem fine.
> > >
> > > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting
> at
> > > 2011-10-25 10:10:20
> > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> > IndexerMapReduce:
> > > crawldb: crawl/crawldb
> > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> > IndexerMapReduces:
> > > adding segment: crawl/linkdb
> > > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce -
> > IndexerMapReduces:
> > > adding segment: crawl/segments/20111025095216
> > > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce -
> > IndexerMapReduces:
> > > adding segment: crawl/segments/20111025100004
> > > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not
> > exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> > >
> > >
> > > Did something change with 1.4?
> > >
> > > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <
> > >
> > > lewis.mcgibbney@gmail.com> wrote:
> > > > Hi Fred,
> > > >
> > > > How many individual directories do you have under
> > > > /runtime/local/crawl/segments/
> > > > ?
> > > >
> > > > Another thing that raises alarms is the nohup.out dir's! Are these
> > > > intentional? Interestingly, missing segment data is not the same with
> > > > these dir's.
> > > >
> > > > Does your log output indicate any discrepancies between various
> command
> > > > transitions?
> > > >
> > > >
> > > >
> > > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > > > >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> > > > >> crawl/linkdb crawl/segments/*
> > > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > > >
> > > > > -----------------------------------------------------
> > > > > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > > > > monthly updates
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> > > > >
> > > > > lewis.mcgibbney@gmail.com> wrote:
> > > > >> Hi guys,
> > > > >>
> > > > >> I have been watching this thread intently and I am very happy to
> see
> > > >
> > > > that
> > > >
> > > > >> there is some progress :0)
> > > > >>
> > > > >> Radim,
> > > > >>
> > > > >> Can I ask that you open a JIRA issue and submit a patch, this way
> we
> > > > >> can not
> > > > >> only track it, but it will also give the community a chance to
> test
> > > > >> and validate the patch prior to integration into the source.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> Lewis
> > > > >>
> > > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> > > > >>
> > > > >> Rajesh.Ramanathapuram@turner.com> wrote:
> > > > >> > Hi Radim,
> > > > >> >
> > > > >> >  Thank you so much for this. I am not familiar with commit
> process
> > > > >> >  to
> > > > >>
> > > > >> the
> > > > >>
> > > > >> > core.
> > > > >> >
> > > > >> >  Is there someone who can help us get this committed and help
> > > > >> >  resolve
> > > > >>
> > > > >> this
> > > > >>
> > > > >> > issue?
> > > > >> >
> > > > >> > Thanks for all your help.
> > > > >> >
> > > > >> > Rajesh Ramana
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> > > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > > >> > To: user@nutch.apache.org
> > > > >> > Subject: Re: Nutch not crawling URLs with spanish accented
> > > > >> > characters
> > > >
> > > > (
> > > >
> > > > >> ñ)
> > > > >>
> > > > >> > - The REGEX normalizer transforms the special characters, but
> > fails
> > > > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > > > >> >
> > > > >> >  - The fetcher is having trouble interpreting the links with
> > special
> > > > >> >
> > > > >> > character ‘ñ’.
> > > > >> >
> > > > >> > i can add this transformation to basic-url normalizer if
> somebody
> > is
> > > > >> > willing to commit it.
> > > > >>
> > > > >> --
> > > > >> *Lewis*
> > > >
> > > > --
> > > > *Lewis*
> >
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
> >
>



-- 
*Lewis*

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Fred Zimmerman <zi...@gmail.com>.
I'm still having trouble with this in 1.3. It looks as if there's something
dumb with the syntax or file structure, but I can't get it.

$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
-linkdb crawl/linkdb crawl/segments/*

SolrIndexer: starting at 2011-10-25 23:26:02
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist:
file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist:
file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
Input path does not exist:
file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
Input path does not exist:
file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
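
That last path is the give-away: 1.3 does not understand the -linkdb flag, so it reads the arguments positionally, takes the literal string '-linkdb' as the linkdb (hence the missing -linkdb/current) and treats crawl/linkdb as a segment. On 1.3 the linkdb is simply the third positional argument, so the command would be:

bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*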


On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> From the changelog:
> http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
>
> 111     * NUTCH-1054 LinkDB optional during indexing (jnioche)
>
> With your command, the given linkdb is interpreted as a segment.
>
> https://issues.apache.org/jira/browse/NUTCH-1054
>
> This is the new command:
>
> Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... |
> -dir <segments>) [-noCommit]
>
> On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > I'm having a similar issue.  I'm using 1.4 and getting these errors with
> > linkdb.  The segments seem fine.
> >
> > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at
> > 2011-10-25 10:10:20
> > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> IndexerMapReduce:
> > crawldb: crawl/crawldb
> > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce -
> IndexerMapReduces:
> > adding segment: crawl/linkdb
> > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce -
> IndexerMapReduces:
> > adding segment: crawl/segments/20111025095216
> > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce -
> IndexerMapReduces:
> > adding segment: crawl/segments/20111025100004
> > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not
> exist:
> > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > Input path does not exist:
> > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > Input path does not exist:
> > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > Input path does not exist:
> > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> >
> >
> > Did something change with 1.4?
> >
> > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <
> >
> > lewis.mcgibbney@gmail.com> wrote:
> > > Hi Fred,
> > >
> > > How many individual directories do you have under
> > > /runtime/local/crawl/segments/
> > > ?
> > >
> > > Another thing that raises alarms is the nohup.out dir's! Are these
> > > intentional? Interestingly, missing segment data is not the same with
> > > these dir's.
> > >
> > > Does your log output indicate any discrepancies between various command
> > > transitions?
> > >
> > >
> > >
> > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > > >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> > > >> crawl/linkdb crawl/segments/*
> > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > >> Input path does not exist:
> > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > >
> > > > -----------------------------------------------------
> > > > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > > > monthly updates
> > > >
> > > >
> > > >
> > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> > > >
> > > > lewis.mcgibbney@gmail.com> wrote:
> > > >> Hi guys,
> > > >>
> > > >> I have been watching this thread intently and I am very happy to see
> > >
> > > that
> > >
> > > >> there is some progress :0)
> > > >>
> > > >> Radim,
> > > >>
> > > >> Can I ask that you open a JIRA issue and submit a patch, this way we
> > > >> can not
> > > >> only track it, but it will also give the community a chance to test
> > > >> and validate the patch prior to integration into the source.
> > > >>
> > > >> Thanks
> > > >>
> > > >> Lewis
> > > >>
> > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> > > >>
> > > >> Rajesh.Ramanathapuram@turner.com> wrote:
> > > >> > Hi Radim,
> > > >> >
> > > >> >  Thank you so much for this. I am not familiar with commit process
> > > >> >  to
> > > >>
> > > >> the
> > > >>
> > > >> > core.
> > > >> >
> > > >> >  Is there someone who can help us get this committed and help
> > > >> >  resolve
> > > >>
> > > >> this
> > > >>
> > > >> > issue?
> > > >> >
> > > >> > Thanks for all your help.
> > > >> >
> > > >> > Rajesh Ramana
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > >> > To: user@nutch.apache.org
> > > >> > Subject: Re: Nutch not crawling URLs with spanish accented
> > > >> > characters
> > >
> > > (
> > >
> > > >> ñ)
> > > >>
> > > >> > - The REGEX normalizer transforms the special characters, but
> fails
> > > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > > >> >
> > > >> >  - The fetcher is having trouble interpreting the links with
> special
> > > >> >
> > > >> > character ‘ñ’.
> > > >> >
> > > >> > i can add this transformation to basic-url normalizer if somebody
> is
> > > >> > willing to commit it.
> > > >>
> > > >> --
> > > >> *Lewis*
> > >
> > > --
> > > *Lewis*
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Markus Jelsma <ma...@openindex.io>.
From the changelog:
http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup

111 	* NUTCH-1054 LinkDB optional during indexing (jnioche) 

With your command, the given linkdb is interpreted as a segment. 

https://issues.apache.org/jira/browse/NUTCH-1054

This is the new command:

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... |
-dir <segments>) [-noCommit]
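
Concretely, with the crawl layout used earlier in this thread that becomes something like (the Solr URL is a placeholder):

bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
# or point it at the whole segments directory instead of listing segments
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments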

On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> I'm having a similar issue.  I'm using 1.4 and getting these errors with
> linkdb.  The segments seem fine.
> 
> 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at
> 2011-10-25 10:10:20
> 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
> crawldb: crawl/crawldb
> 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
> adding segment: crawl/linkdb
> 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
> adding segment: crawl/segments/20111025095216
> 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
> adding segment: crawl/segments/20111025100004
> 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist:
> file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist:
> file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist:
> file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> 
> 
> Did something change with 1.4?
> 
> On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <
> 
> lewis.mcgibbney@gmail.com> wrote:
> > Hi Fred,
> > 
> > How many individual directories do you have under
> > /runtime/local/crawl/segments/
> > ?
> > 
> > Another thing that raises alarms is the nohup.out dir's! Are these
> > intentional? Interestingly, missing segment data is not the same with
> > these dir's.
> > 
> > Does your log output indicate any discrepancies between various command
> > transitions?
> > 
> > 
> > 
> > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > 
> > >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> > >> crawl/linkdb crawl/segments/*
> > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > >> Input path does not exist:
> > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > 
> > > -----------------------------------------------------
> > > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > > monthly updates
> > > 
> > > 
> > > 
> > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> > > 
> > > lewis.mcgibbney@gmail.com> wrote:
> > >> Hi guys,
> > >> 
> > >> I have been watching this thread intently and I am very happy to see
> > 
> > that
> > 
> > >> there is some progress :0)
> > >> 
> > >> Radim,
> > >> 
> > >> Can I ask that you open a JIRA issue and submit a patch, this way we
> > >> can not
> > >> only track it, but it will also give the community a chance to test
> > >> and validate the patch prior to integration into the source.
> > >> 
> > >> Thanks
> > >> 
> > >> Lewis
> > >> 
> > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> > >> 
> > >> Rajesh.Ramanathapuram@turner.com> wrote:
> > >> > Hi Radim,
> > >> > 
> > >> >  Thank you so much for this. I am not familiar with commit process
> > >> >  to
> > >> 
> > >> the
> > >> 
> > >> > core.
> > >> > 
> > >> >  Is there someone who can help us get this committed and help
> > >> >  resolve
> > >> 
> > >> this
> > >> 
> > >> > issue?
> > >> > 
> > >> > Thanks for all your help.
> > >> > 
> > >> > Rajesh Ramana
> > >> > 
> > >> > -----Original Message-----
> > >> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > >> > To: user@nutch.apache.org
> > >> > Subject: Re: Nutch not crawling URLs with spanish accented
> > >> > characters
> > 
> > (
> > 
> > >> ñ)
> > >> 
> > >> > - The REGEX normalizer transforms the special characters, but fails
> > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > >> > 
> > >> >  - The fetcher is having trouble interpreting the links with special
> > >> > 
> > >> > character ‘ñ’.
> > >> > 
> > >> > i can add this transformation to basic-url normalizer if somebody is
> > >> > willing to commit it.
> > >> 
> > >> --
> > >> *Lewis*
> > 
> > --
> > *Lewis*

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by Bai Shen <ba...@gmail.com>.
I'm having a similar issue.  I'm using 1.4 and getting these errors with
linkdb.  The segments seem fine.

2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at
2011-10-25 10:10:20
2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
crawldb: crawl/crawldb
2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
adding segment: crawl/linkdb
2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
adding segment: crawl/segments/20111025095216
2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
adding segment: crawl/segments/20111025100004
2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist:
file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist:
file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
Input path does not exist:
file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text


Did something change with 1.4?

On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Fred,
>
> How many individual directories do you have under
> /runtime/local/crawl/segments/
> ?
>
> Another thing that raises alarms is the nohup.out dir's! Are these
> intentional? Interestingly, missing segment data is not the same with these
> dir's.
>
> Does your log output indicate any discrepancies between various command
> transitions?
>
>
>
> bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
> >> crawl/linkdb crawl/segments/*
> >> SolrIndexer: starting at 2011-10-09 00:13:24
> >> org.apache.hadoop.mapred.InvalidInputException: Input path does not
> exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> >> Input path does not exist:
> >>
> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> >
> >
> >
> > -----------------------------------------------------
> > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > monthly updates
> >
> >
> >
> > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> > lewis.mcgibbney@gmail.com> wrote:
> >
> >> Hi guys,
> >>
> >> I have been watching this thread intently and I am very happy to see
> that
> >> there is some progress :0)
> >>
> >> Radim,
> >>
> >> Can I ask that you open a JIRA issue and submit a patch, this way we can
> >> not
> >> only track it, but it will also give the community a chance to test and
> >> validate the patch prior to integration into the source.
> >>
> >> Thanks
> >>
> >> Lewis
> >>
> >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
> >> Rajesh.Ramanathapuram@turner.com> wrote:
> >>
> >> > Hi Radim,
> >> >
> >> >  Thank you so much for this. I am not familiar with commit process to
> >> the
> >> > core.
> >> >  Is there someone who can help us get this committed and help resolve
> >> this
> >> > issue?
> >> >
> >> > Thanks for all your help.
> >> >
> >> > Rajesh Ramana
> >> >
> >> > -----Original Message-----
> >> > From: Radim Kolar [mailto:hsn@sendmail.cz]
> >> > Sent: Thursday, October 06, 2011 2:18 PM
> >> > To: user@nutch.apache.org
> >> > Subject: Re: Nutch not crawling URLs with spanish accented characters
> (
> >> ñ)
> >> >
> >> > - The REGEX normalizer transforms the special characters, but fails to
> >> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> >> >  - The fetcher is having trouble interpreting the links with special
> >> > character ‘ñ’.
> >> >
> >> > i can add this transformation to basic-url normalizer if somebody is
> >> > willing to commit it.
> >> >
> >>
> >>
> >>
> >> --
> >> *Lewis*
> >>
> >
> >
>
>
> --
> *Lewis*
>

Re: solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc.

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Fred,

How many individual directories do you have under
/runtime/local/crawl/segments/
?

Another thing that raises alarms is the nohup.out dirs! Are these
intentional? Interestingly, the missing segment data is not the same for
these dirs.

Does your log output indicate any discrepancies between various command
transitions?
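
A quick sanity check is to list what is actually under crawl/segments/; anything there that is not a segment directory (nohup.out, for example) will be picked up by the segments/* glob and produce the same kind of "Input path does not exist" errors:

ls -l crawl/segments/
for s in crawl/segments/*/; do echo "== $s"; ls "$s"; done   # each fetched and parsed segment should contain crawl_fetch, crawl_parse, parse_data, parse_text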



bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
>> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
>> crawl/linkdb crawl/segments/*
>> SolrIndexer: starting at 2011-10-09 00:13:24
>> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
>
>
>
> -----------------------------------------------------
> Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> monthly updates
>
>
>
> On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Hi guys,
>>
>> I have been watching this thread intently and I am very happy to see that
>> there is some progress :0)
>>
>> Radim,
>>
>> Can I ask that you open a JIRA issue and submit a patch, this way we can
>> not
>> only track it, but it will also give the community a chance to test and
>> validate the patch prior to integration into the source.
>>
>> Thanks
>>
>> Lewis
>>
>> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <
>> Rajesh.Ramanathapuram@turner.com> wrote:
>>
>> > Hi Radim,
>> >
>> >  Thank you so much for this. I am not familiar with commit process to
>> the
>> > core.
>> >  Is there someone who can help us get this committed and help resolve
>> this
>> > issue?
>> >
>> > Thanks for all your help.
>> >
>> > Rajesh Ramana
>> >
>> > -----Original Message-----
>> > From: Radim Kolar [mailto:hsn@sendmail.cz]
>> > Sent: Thursday, October 06, 2011 2:18 PM
>> > To: user@nutch.apache.org
>> > Subject: Re: Nutch not crawling URLs with spanish accented characters (
>> ñ)
>> >
>> > - The REGEX normalizer transforms the special characters, but fails to
>> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
>> >  - The fetcher is having trouble interpreting the links with special
>> > character ‘ñ’.
>> >
>> > i can add this transformation to basic-url normalizer if somebody is
>> > willing to commit it.
>> >
>>
>>
>>
>> --
>> *Lewis*
>>
>
>


-- 
*Lewis*