Posted to user@nutch.apache.org by zo tiger <zo...@hotmail.com> on 2009/09/02 12:36:15 UTC

Help me, No urls to fetch.

Hi,

I have installed Nutch 1.0 and Hadoop 0.19 successfully.

There were no errors.

I followed the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial

I am using a three-node cluster: one master node and two slave nodes.

But when I run a crawl, no data is fetched.

$ bin/nutch crawl urls -dir crawled -depth 3
crawl started in: crawled
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20090902102133
Generator: filtering: true
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawled

In the end, the bin/nutch crawl command only gives the error

No URLs to fetch - check your seed list and URL filters.

I am sure there is no problem in crawl-urlfilter.txt or the other configuration
XML files.

Does anyone know what the problem could be?

Please help.
-- 
View this message in context: http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25255142.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Help me, No urls to fetch.

Posted by zo tiger <zo...@hotmail.com>.
Hi 皮皮,

How can I check that the clocks of the namenode and the datanodes are synchronized?

I checked the time zone on all of my nodes. They are all the same.

Please help me, 皮皮.
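
One way to check this, as a rough sketch: a matching time zone does not show that the clocks agree, so compare each node's time as seconds since the Unix epoch, which does not depend on the time zone. The Python below assumes passwordless ssh from the master to every node and a date command that accepts +%s. The host names and the 5-second tolerance are made up for illustration.

```python
import subprocess


def remote_epoch(host):
    """Read a node's clock as Unix epoch seconds by running `date +%s` over ssh."""
    result = subprocess.run(
        ["ssh", host, "date", "+%s"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def clock_offsets(readings, reference):
    """Offset of every node's clock, in seconds, relative to the reference node."""
    base = readings[reference]
    return {host: seconds - base for host, seconds in readings.items()}


def out_of_sync(offsets, tolerance_seconds=5):
    """Hosts whose clock differs from the reference by more than the tolerance."""
    return sorted(h for h, off in offsets.items() if abs(off) > tolerance_seconds)


if __name__ == "__main__":
    # Hypothetical host names; replace them with the real master and slave names.
    hosts = ["master", "slave1", "slave2"]
    readings = {host: remote_epoch(host) for host in hosts}
    offsets = clock_offsets(readings, reference="master")
    print(offsets)
    print("out of sync:", out_of_sync(offsets))
```

The nodes are queried one after another, so each reading includes the ssh round trip. Differences of a second or two say little; differences of minutes are worth fixing.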


皮皮 wrote:
> 
> Check that the clocks of the namenode and the datanodes are synchronized.
> 


Re: Help me, No urls to fetch.

Posted by 皮皮 <pi...@gmail.com>.
Check that the clocks of the namenode and the datanodes are synchronized.

2009/9/3 MilleBii <mi...@gmail.com>

> Is there more information in the logs/hadoop.log file?
>
> What is your plug-in list?
>

Re: Help me, No urls to fetch.

Posted by Futebol DotInfo <fu...@yahoo.com>.
unsubscribe

--- On Mon, 9/7/09, zo tiger <zo...@hotmail.com> wrote:


From: zo tiger <zo...@hotmail.com>
Subject: Re: Help me, No urls to fetch.
To: nutch-user@lucene.apache.org
Date: Monday, September 7, 2009, 3:31 AM



Oh, I resolved it. Nutch is running now. Great.

I forgot to copy all the conf files to the slave nodes.

I had only set up the config files on the master node, not on the slave nodes.

Thanks to Paul Tomblin, MilleBii and 皮皮 for the help.

Thank you very much.



Re: Help me, No urls to fetch.

Posted by zo tiger <zo...@hotmail.com>.
Oh, I resolved it. Nutch is running now. Great.

I forgot to copy all the conf files to the slave nodes.

I had only set up the config files on the master node, not on the slave nodes.

Thanks to Paul Tomblin, MilleBii and 皮皮 for the help.

Thank you very much.
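
For anyone who hits the same problem, here is a minimal sketch of pushing the conf directory from the master to every slave. It assumes the slave host names are listed one per line in conf/slaves, that Nutch is installed at the same path on every node, and that rsync over passwordless ssh is available. The /nutch/search prefix is taken from the urls path given in this thread; the conf subdirectory and the function names are assumptions. The script only builds the commands so they can be reviewed before anything is run.

```python
from pathlib import Path


def read_hosts(slaves_text):
    """Host names from slaves-file text, ignoring blank lines and # comments."""
    hosts = []
    for line in slaves_text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def build_sync_commands(conf_dir, hosts, remote_conf_dir):
    """One rsync argument list per slave; the trailing slash copies the directory contents."""
    source = str(conf_dir).rstrip("/") + "/"
    target = remote_conf_dir.rstrip("/") + "/"
    return [["rsync", "-az", source, f"{host}:{target}"] for host in hosts]


if __name__ == "__main__":
    # Assumed layout: the same install path on the master and on all slaves.
    conf_dir = Path("/nutch/search/conf")
    hosts = read_hosts((conf_dir / "slaves").read_text())
    for command in build_sync_commands(conf_dir, hosts, "/nutch/search/conf"):
        print(" ".join(command))
```

Once the printed commands look right, each argument list can be handed to subprocess.run(command, check=True).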


MilleBii wrote:
> 
> Obviously you've checked the rules in crawl-urlfilter.txt.
> Beware of one nasty thing that can happen: make sure each rule is
> followed directly by the CR/LF line ending. I recently had a problem because
> some "invisible" spaces were following one rule, and so that rule
> never matched... it took me a while to figure out.
> 
> 


Re: Help me, No urls to fetch.

Posted by MilleBii <mi...@gmail.com>.
Obviously you've checked the rules in crawl-urlfilter.txt.
Beware of one nasty thing that can happen: make sure each rule is
followed directly by the CR/LF line ending. I recently had a problem because
some "invisible" spaces were following one rule, and so that rule
never matched... it took me a while to figure out.
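
A quick way to look for this kind of problem, as a sketch only: the Python below flags rule lines that end in spaces or tabs, and then shows with Python's re module why such a rule stops matching when the space is kept as part of the pattern. Nutch uses Java regular expressions rather than Python's, but a literal trailing space in a pattern has the same effect in both. The sample rules and the function name are made up for illustration.

```python
import re


def rules_with_trailing_whitespace(lines):
    """Return (line_number, rule) pairs for filter rules that end in spaces or tabs."""
    flagged = []
    for number, raw in enumerate(lines, start=1):
        rule = raw.rstrip("\r\n")
        # Skip blank lines and comment lines; only real rules are checked.
        if not rule.strip() or rule.lstrip().startswith("#"):
            continue
        if rule != rule.rstrip():
            flagged.append((number, rule))
    return flagged


# Hypothetical filter file contents; the second line ends in an invisible space.
sample = [
    "# accept hosts under apache.org\n",
    "+^http://([a-z0-9]*\\.)*apache.org/ \n",
    "-.\n",
]
flagged = rules_with_trailing_whitespace(sample)
print(flagged)

# Drop the leading "+" to get the regular expression itself.
pattern_with_space = sample[1].rstrip("\r\n")[1:]
url = "http://lucene.apache.org/"
print(re.search(pattern_with_space, url))           # None: the pattern demands a space after "/"
print(re.search(pattern_with_space.rstrip(), url))  # a match object once the space is removed
```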




-- 
-MilleBii-

Re: Help me, No urls to fetch.

Posted by zo tiger <zo...@hotmail.com>.
This is my hadoop.log file's contents


2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         XML Response Writer Plug-in (response-xml)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         JSON Response Writer Plug-in (response-json)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository - Registered Extension-Points:
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)


MilleBii wrote:
> 
> Is there more information in the logs/hadoop.log file?
> 
> What is your plug-in list?
> 


Re: Help me, No urls to fetch.

Posted by MilleBii <mi...@gmail.com>.
Is there more information in the logs/hadoop.log file?

What is your plug-in list?

2009/9/2 zo tiger <zo...@hotmail.com>

>
> Thank you for your reply.
>
> In the urls directory (exactly /nutch/search/urls), there is a file
> urllist.txt.
>
> Its content is as follows:
>
>      http://lucene.apache.org
>
> I don't understand why Nutch cannot fetch any URLs.
>
>


-- 
-MilleBii-

Re: Help me, No urls to fetch.

Posted by zo tiger <zo...@hotmail.com>.
Thank you for your reply.

In the urls directory (exactly /nutch/search/urls), there is a file urllist.txt.

Its content is as follows:

      http://lucene.apache.org

I don't understand why Nutch cannot fetch any URLs.
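
To rule out the seed list itself, here is a small sketch that checks the lines of a seed file the way a person would by hand: it strips surrounding whitespace, skips blank lines and lines starting with #, and reports anything that is not an absolute http or https URL. Whether Nutch 1.0 applies the same rules when injecting is not something this thread confirms, so treat it as a sanity check only. The function name and the second sample line are made up.

```python
from urllib.parse import urlparse


def check_seed_lines(lines):
    """Split seed-list lines into usable URLs and (line_number, text) problems."""
    urls, problems = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        parts = urlparse(text)
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(text)
        else:
            problems.append((number, text))
    return urls, problems


# The first line mirrors urllist.txt above, including its leading spaces.
sample_lines = [
    "      http://lucene.apache.org\n",
    "\n",
    "lucene.apache.org\n",  # hypothetical bad entry: no scheme
]
urls, problems = check_seed_lines(sample_lines)
print("usable:", urls)
print("problems:", problems)
```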


Paul Tomblin wrote:
> 
> What's in your url directory?


Re: Help me, No urls to fetch.

Posted by Paul Tomblin <pt...@xcski.com>.
On Wed, Sep 2, 2009 at 6:36 AM, zo tiger<zo...@hotmail.com> wrote:
>

> In the end, the bin/nutch crawl command only gives the error
>
> No URLs to fetch - check your seed list and URL filters.
>
> I am sure there is no problem in crawl-urlfilter.txt or the other configuration
> XML files.
>
> Does anyone know what the problem could be?
>

What's in your url directory?


-- 
http://www.linkedin.com/in/paultomblin