Posted to user@nutch.apache.org by zo tiger <zo...@hotmail.com> on 2009/09/02 12:36:15 UTC
Help me, No urls to fetch.
Hi,
I have installed Nutch 1.0 and Hadoop 1.9 successfully, with no errors.
I followed the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial
I am using a three-node cluster: one master node and two slave nodes.
But when I run a crawl, no data is fetched.
$ bin/nutch crawl urls -dir crawled -depth 3
crawl started in: crawled
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20090902102133
Generator: filtering: true
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawled
In the end I ran the bin/nutch crawl command, but it gives the error
"No URLs to fetch - check your seed list and URL filters."
I am sure there is no problem in the crawl URL filter or the other
configuration XML files.
Does anyone know what the problem might be?
Please help me.
--
View this message in context: http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25255142.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Help me, No urls to fetch.
Posted by zo tiger <zo...@hotmail.com>.
Hi 皮皮,
How can I check that the clocks of the namenode and datanodes are synchronized?
I checked the time zone on all of my nodes. They are all the same.
Please help me, 皮皮.
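One possible way to approach the question above, as a minimal sketch: compare each node's clock as seconds since the epoch, which does not depend on the time zone setting, so matching time zones alone do not show that the clocks agree. The node names, the readings, and the 5-second tolerance below are assumptions for illustration; the readings could be collected by running `date +%s` on each node at roughly the same moment.

```python
from typing import Dict


def max_clock_skew(epoch_by_node: Dict[str, float]) -> float:
    """Return the largest difference, in seconds, between any two node clocks."""
    if not epoch_by_node:
        raise ValueError("no timestamps given")
    values = list(epoch_by_node.values())
    return max(values) - min(values)


def nodes_in_sync(epoch_by_node: Dict[str, float],
                  tolerance_seconds: float = 5.0) -> bool:
    """True if all clocks agree within the (assumed) tolerance."""
    return max_clock_skew(epoch_by_node) <= tolerance_seconds


# Hypothetical readings (epoch seconds), one per node.
readings = {
    "master": 1252294378.0,
    "slave1": 1252294379.0,
    "slave2": 1252294980.0,
}
print(max_clock_skew(readings))  # 602.0
print(nodes_in_sync(readings))   # False
```

Because the readings are not taken at exactly the same instant, a skew of a second or two is expected; only a much larger difference suggests the clocks have drifted apart.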
皮皮 wrote:
>
> Check that the clocks of the namenode and datanodes are synchronized.
>
Re: Help me, No urls to fetch.
Posted by 皮皮 <pi...@gmail.com>.
Check that the clocks of the namenode and datanodes are synchronized.
2009/9/3 MilleBii <mi...@gmail.com>
> Is there more information in the logs/hadoop file?
>
> What is your plug-in list?
>
Re: Help me, No urls to fetch.
Posted by Futebol DotInfo <fu...@yahoo.com>.
unsubscribe
--- On Mon, 9/7/09, zo tiger <zo...@hotmail.com> wrote:
From: zo tiger <zo...@hotmail.com>
Subject: Re: Help me, No urls to fetch.
To: nutch-user@lucene.apache.org
Date: Monday, September 7, 2009, 3:31 AM
Oh, I resolved it. Nutch is running now. Great.
I forgot to copy the conf files to the slave nodes.
I had only set up the config files on the master node, not on the slave nodes.
Thanks to Paul Tomblin, MilleBii and 皮皮 for the help.
Thank you very much.
Re: Help me, No urls to fetch.
Posted by zo tiger <zo...@hotmail.com>.
Oh, I resolved it. Nutch is running now. Great.
I forgot to copy the conf files to the slave nodes.
I had only set up the config files on the master node, not on the slave nodes.
Thanks to Paul Tomblin, MilleBii and 皮皮 for the help.
Thank you very much.
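For anyone hitting the same problem, a minimal sketch of one way to spot a conf directory that differs between nodes: hash every file in a reference copy (for example the master's conf directory) and compare it with a copy of a slave's conf directory pulled to the same machine. The function names and the idea of comparing local copies are assumptions made for illustration; this is not part of Nutch or Hadoop.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_conf_dir(conf_dir: str) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    root = Path(conf_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_conf_dirs(reference_dir: str, other_dir: str) -> List[str]:
    """List files that are missing, extra, or different in other_dir."""
    ref = digest_conf_dir(reference_dir)
    other = digest_conf_dir(other_dir)
    problems = []
    for name in sorted(set(ref) | set(other)):
        if name not in other:
            problems.append(f"missing on other: {name}")
        elif name not in ref:
            problems.append(f"extra on other: {name}")
        elif ref[name] != other[name]:
            problems.append(f"differs: {name}")
    return problems
```

An empty list from `diff_conf_dirs(master_conf, slave_conf_copy)` would mean the two directories hold byte-identical files; any entry points at a file worth copying again.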
MilleBii wrote:
>
> Obviously you've checked the crawl-filter.txt rules.
> Beware of one nasty thing that can happen: make sure there is a line
> break (CR/LF) directly at the end of each rule. I recently had a problem
> because some "invisible" spaces were following one rule, so that rule
> never matched... it took me a while to figure out.
>
Re: Help me, No urls to fetch.
Posted by MilleBii <mi...@gmail.com>.
Obviously you've checked the crawl-filter.txt rules.
Beware of one nasty thing that can happen: make sure there is a line break
(CR/LF) directly at the end of each rule. I recently had a problem because
some "invisible" spaces were following one rule, so that rule never
matched... it took me a while to figure out.
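The pitfall described above is hard to see in an editor, so here is a minimal sketch of a check for it. It flags rule lines that end in spaces or tabs before the line break. The sample rules are hypothetical, and treating lines that start with "#" as comments is an assumption about the filter file's format.

```python
from typing import Iterable, List, Tuple


def find_trailing_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, repr of line) for rules ending in whitespace."""
    suspects = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")          # drop only the line break itself
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        if line != line.rstrip():           # spaces or tabs remain at the end
            suspects.append((number, repr(line)))
    return suspects


# Hypothetical filter rules; the second one has a trailing space.
sample = [
    "# accept the seed host\n",
    "+^http://lucene.apache.org/ \n",
    "-.\n",
]
for number, shown in find_trailing_whitespace(sample):
    print(number, shown)
```

Printing the `repr` of the line makes the stray space visible inside the quotes. To check a real file, the same function can be given the open file object, since iterating a file yields its lines.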
--
-MilleBii-
Re: Help me, No urls to fetch.
Posted by zo tiger <zo...@hotmail.com>.
This is my hadoop.log file's contents
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - XML Response Writer Plug-in (response-xml)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JavaScript Parser (parse-js)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - URL Query Filter (query-url)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - JSON Response Writer Plug-in (response-json)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Registered Extension-Points:
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2009-09-07 03:32:58,137 INFO plugin.PluginRepository - Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
2009-09-07 03:32:58,138 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2009-09-07 03:32:58,138 INFO plugin.PluginRepository - Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
MilleBii wrote:
>
> Is there more information in the logs/hadoop file?
>
> What is your plug-in list?
>
Re: Help me, No urls to fetch.
Posted by MilleBii <mi...@gmail.com>.
Is there more information in the logs/hadoop file?
What is your plug-in list?
2009/9/2 zo tiger <zo...@hotmail.com>
>
> Thank you for your reply.
>
> In the urls directory (exactly /nutch/search/urls), there is a file
> urllist.txt.
>
> Its content is as follows:
>
> http://lucene.apache.org
>
> I don't understand why Nutch cannot fetch any URL.
>
>
--
-MilleBii-
Re: Help me, No urls to fetch.
Posted by zo tiger <zo...@hotmail.com>.
Thank you for your reply.
In the urls directory (exactly /nutch/search/urls), there is a file urllist.txt.
Its content is as follows:
http://lucene.apache.org
I don't understand why Nutch cannot fetch any URL.
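A seed URL like the one above only reaches the fetcher if the URL filter accepts it, so it can help to test the seed against the filter rules by hand. The sketch below is an approximation in Python, under these assumptions: each rule is "+" (accept) or "-" (reject) followed by a regular expression, the first rule whose pattern is found in the URL decides, and a URL that matches no rule is rejected. Python's regex syntax is not identical to Java's, and the rule lists shown are hypothetical examples, not the contents of any real configuration.

```python
import re
from typing import Iterable, Optional


def first_matching_rule(url: str, rules: Iterable[str]) -> Optional[str]:
    """Return the first rule whose pattern is found in url, or None."""
    for raw in rules:
        rule = raw.rstrip("\r\n")
        if not rule or rule.startswith("#"):
            continue                      # skip blank lines and comments
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if re.search(pattern, url):
            return rule
    return None


def is_accepted(url: str, rules: Iterable[str]) -> bool:
    """True only if the first matching rule is an accept (+) rule."""
    rule = first_matching_rule(url, rules)
    return rule is not None and rule.startswith("+")


seed = "http://lucene.apache.org"

# Hypothetical rules with an unedited placeholder domain.
placeholder_rules = [
    r"-[?*!@=]",
    r"+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/",
    r"-.",
]
# The same rules with the domain filled in for the seed.
edited_rules = [
    r"-[?*!@=]",
    r"+^http://([a-z0-9]*\.)*apache\.org",
    r"-.",
]

print(is_accepted(seed, placeholder_rules))  # False
print(is_accepted(seed, edited_rules))       # True
```

With the placeholder rules the seed falls through to the final "-." rule and is rejected, which would leave the generator with nothing to select. Running the same check against the rules actually present on each node is one way to narrow down where a seed is being dropped.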
Paul Tomblin wrote:
>
> On Wed, Sep 2, 2009 at 6:36 AM, zo tiger<zo...@hotmail.com> wrote:
>>
>
>> In the end I ran the bin/nutch crawl command, but it gives the error
>>
>> "No URLs to fetch - check your seed list and URL filters."
>>
>> I am sure there is no problem in the crawl URL filter or the other
>> configuration XML files.
>>
>> Does anyone know what the problem might be?
>>
>
> What's in your url directory?
>
>
> --
> http://www.linkedin.com/in/paultomblin
>
>
Re: Help me, No urls to fetch.
Posted by Paul Tomblin <pt...@xcski.com>.
On Wed, Sep 2, 2009 at 6:36 AM, zo tiger<zo...@hotmail.com> wrote:
>
> In the end I ran the bin/nutch crawl command, but it gives the error
>
> "No URLs to fetch - check your seed list and URL filters."
>
> I am sure there is no problem in the crawl URL filter or the other
> configuration XML files.
>
> Does anyone know what the problem might be?
>
What's in your url directory?
--
http://www.linkedin.com/in/paultomblin