Posted to user@nutch.apache.org by "Sethi, Parampreet" <pa...@teamaol.com> on 2011/07/11 23:50:54 UTC
Nutch Novice help
Hi All,
Sorry for such a naïve question. I downloaded the Nutch 1.3 binary today and am trying to set it up as described in the tutorial at http://wiki.apache.org/nutch/NutchTutorial
However, I am not able to find crawl-urlfilter.txt inside the conf directory. Is there any other place where I should look for this file?
Thanks
Param
Re: Nutch Novice help
Posted by lewis john mcgibbney <le...@gmail.com>.
Hi, please see this tutorial [1] for the up-to-date Nutch 1.3 instructions on the wiki.
Please try it out, and take note of Markus' points regarding Nutch trunk, as the
problems you are experiencing are common with trunk as it stands.
[1] http://wiki.apache.org/nutch/RunningNutchAndSolr
--
*Lewis*
Re: Nutch Novice help
Posted by lewis john mcgibbney <le...@gmail.com>.
Have a good look at your hadoop.log, which should be created when you
initiate a crawl with Nutch; it will be extremely valuable. In addition,
there are various properties in nutch-site.xml which can be set to make
logging more verbose at various levels, e.g. fetching.
In order to root out errors you will need to get used to looking
through your logs. It is also advisable to include as much log data
as possible when posting queries on the user list, as this will greatly help
you get accurate and detailed help from the list in the future. You can find
more information about this here [1].
I would advise you to delete all crawled data and begin a fresh crawl; this
way you can try the above, looking at your logs, before we try to root out
where exactly the errors are stemming from.
HTH
[1] http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer#Becoming_a_Nutch_Developer
--
*Lewis*
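As a rough illustration of Lewis's advice, the sketch below pulls warning, error and exception lines out of a Nutch log so they can be reviewed or pasted into a post to the list. It assumes a local (non-distributed) Nutch 1.3 runtime run from `runtime/local`, as in this thread, with the log at `logs/hadoop.log`; that path and the helper name `nutch_log_problems` are assumptions for illustration, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print WARN/ERROR/exception lines from a Nutch log.
# The default path assumes you run it from runtime/local; pass another path if needed.
nutch_log_problems() {
  local log="${1:-logs/hadoop.log}"
  if [[ ! -f "$log" ]]; then
    echo "log not found: $log" >&2
    return 1
  fi
  # -n prefixes each match with its line number so it can be found in the full log.
  grep -nE 'WARN|ERROR|Exception' "$log"
}
```

For example, `nutch_log_problems logs/hadoop.log | tail -n 50` shows the most recent problems after a failed crawl.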
Re: Nutch Novice help
Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Hey Lewis, thanks for the quick reply. Looks like I am tangled now =)
I tried the tutorial mentioned at
http://wiki.apache.org/nutch/RunningNutchAndSolr
Step 3 is not working for me. Two of the directories that should exist
after step 3 completes are not created:
crawl/crawldb - created
crawl/linkdb - not created
crawl/segments - not created
Also, I changed the URL to http://nutch.apache.org, but I still get the same log
message: "Generator: 0 records selected for fetching, exiting ..."
Looks like I am missing some key step =(.
-param
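A quick way to see how far a crawl got is to check which of the three directories listed above were created. The sketch below is a hypothetical helper (the name `check_crawl_dirs` and the default `crawl` directory are assumptions matching the `-dir crawl` option used in this thread). In the logs posted here the crawl stops at depth 0 because the Generator selected no URLs, so it seems plausible that linkdb and segments are simply never written; an existing `crawldb` with no `segments` would then point back at the seed list or URL filters rather than at the fetcher.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which standard crawl directories exist.
# Assumes the layout produced by `bin/nutch crawl urls -dir crawl ...`.
check_crawl_dirs() {
  local dir="${1:-crawl}"
  local sub missing=0
  for sub in crawldb linkdb segments; do
    if [[ -d "$dir/$sub" ]]; then
      echo "$dir/$sub - created"
    else
      echo "$dir/$sub - not created"
      missing=1
    fi
  done
  # Non-zero status when anything is missing, so the helper can be used in scripts.
  return "$missing"
}
```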
Re: Nutch Novice help
Posted by lewis john mcgibbney <le...@gmail.com>.
Hi,
I think you are maybe getting tangled here. Please see the following
tutorial for Nutch 1.3 [1]
Please also note that the URL you provided is the old Nutch site and now
redirects to http://nutch.apache.org
[1] http://wiki.apache.org/nutch/RunningNutchAndSolr
On Tue, Jul 12, 2011 at 5:23 PM, Sethi, Parampreet <parampreet.sethi@teamaol.com> wrote:
> Thanks for updating the tutorial. I tried my setup and the crawl command is
> running, but none of the pages are being crawled.
> I created a urls directory inside the local folder and added a new file, nutch,
> with the URL, as described in the tutorial.
>
> (I also tried a file named urls inside the nutch/runtime/local directory. The
> contents of the urls file are http://lucene.apache.org/nutch/ )
>
> Here's the log:
>
> us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-12 12:22:12
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> Generator: starting at 2011-07-12 12:22:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
>
>
> Please help.
>
> Thanks
> Param
>
> On 7/12/11 5:52 AM, "Julien Nioche" <li...@gmail.com> wrote:
>
> > On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> wrote:
> >
> >>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
> >>>>> gone since the crawl command is still there. You can find the file in the 1.2
> >>>>> release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>>
> >>>> Crawl-urlfilter has been removed purposefully as it did not add anything
> >>>> to the other url filters (automaton | regex) in terms of functionality. By
> >>>> default the urlfilters contain (+.) which IIRC was what the
> >>>> Crawl-urlfilter used to do.
> >>>
> >>> That's reasonable. But now new users are unaware and don't know what to
> >>> do with this error message.
> >>
> >> Yep, the tutorial needs updating indeed
> >
> > done
> >
> >>>>>> Thanks for a quick reply.
> >>>>>>
> >>>>>> I searched in the nutch directory but still do not see that file :(.
> >>>>>> Here's the complete file list inside the runtime/local/conf directory.
> >>>>>>
> >>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>> automaton-urlfilter.txt domain-urlfilter.txt nutch-default.xml
> >>>>>> prefix-urlfilter.txt solrindex-mapping.xml
> >>>>>> configuration.xsl httpclient-auth.xml nutch-site.xml
> >>>>>> regex-normalize.xml subcollections.xml
> >>>>>> domain-suffixes.xml log4j.properties parse-plugins.dtd
> >>>>>> regex-urlfilter.txt suffix-urlfilter.txt
> >>>>>> domain-suffixes.xsd nutch-conf.xsl parse-plugins.xml
> >>>>>> schema.xml tika-mimetypes.xml
> >>>>>>
> >>>>>> By the way, I tried deploying the code by checking it out from the svn
> >>>>>> repository, but could not build it. I was getting the following error:
> >>>>>>
> >>>>>> resolve-default:
> >>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: problems summary ::
> >>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>> [ivy:resolve] module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>> [ivy:resolve] ==== local: tried
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>> [ivy:resolve] module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>> [ivy:resolve] ==== local: tried
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve] :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>> [ivy:resolve] :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>>
> >>>>>> BUILD FAILED
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>> resolve failed - see output for details
> >>>>>>
> >>>>>> -param
> >>>>>>
> >>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <jc...@inforeverse.com> wrote:
> >>>>>>> Look down a little further for the
> >>>>>>>
> >>>>>>> or
> >>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>>
> >>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>>
> >>>>>>> Jerry E. Craig, Jr.
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Sethi, Parampreet [mailto:parampreet.sethi@teamaol.com]
> >>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>> To: user@nutch.apache.org
> >>>>>>> Subject: Nutch Novice help
> >>>>>>>
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> Sorry for such a naïve question. I downloaded the Nutch 1.3 binary today and
> >>>>>>> am trying to set it up as described in the tutorial at
> >>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>>
> >>>>>>> However, I am not able to find crawl-urlfilter.txt inside the conf directory.
> >>>>>>> Is there any other place where I should look for this file?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Param
> >>
> >> --
> >> Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
--
*Lewis*
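For reference, the seed-list layout described in the quoted message (a `urls` directory containing a plain-text file with one URL per line, used from `runtime/local`) can be prepared as below. The helper name `make_seed_dir` is hypothetical; the `urls` and `nutch` names follow the message, and the example URL is the current site Lewis mentions rather than the old `lucene.apache.org/nutch/` address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a seed directory holding one URL per line.
# Usage: make_seed_dir <dir> <file> <url>...
make_seed_dir() {
  local dir="$1" file="$2"
  shift 2
  mkdir -p "$dir"
  # printf reuses the format for each argument, giving one URL per line.
  printf '%s\n' "$@" > "$dir/$file"
}

# Example, run from runtime/local before `bin/nutch crawl urls -dir crawl ...`:
# make_seed_dir urls nutch http://nutch.apache.org/
```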
Re: Nutch Novice help
Posted by Markus Jelsma <ma...@openindex.io>.
No URLs to fetch - check your seed list and URL filters
The error is quite clear. You injected URLs that did not pass your URL
filters. Check your URL filters, likely crawl-urlfilter since you seem to use the
crawl command.
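To check this diagnosis locally, the sketch below approximates how a rule file such as `conf/regex-urlfilter.txt` is applied, assuming the default `urlfilter-regex` plugin is the active filter: each non-comment line is a `+` or `-` followed by a regular expression, the first rule that matches a URL decides, and a URL that matches no rule is rejected. The thread notes that the default filters contain `+.`, which accepts anything not rejected earlier. This is only an approximation: Nutch evaluates the patterns as Java regular expressions, while bash's `=~` uses POSIX extended regular expressions, so patterns relying on Java-only syntax (for example back-references) may behave differently. The helper name `url_passes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating regex URL filter semantics:
# rules are "+regex" or "-regex", first match wins, no match means reject.
# Caveat: bash [[ =~ ]] uses POSIX ERE, not Java regex.
url_passes() {
  local rules="$1" url="$2" line sign pattern
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines and comments.
    [[ -z "$line" || "$line" == \#* ]] && continue
    sign="${line:0:1}"
    pattern="${line:1}"
    # Ignore lines that do not start with + or -.
    [[ "$sign" == "+" || "$sign" == "-" ]] || continue
    if [[ "$url" =~ $pattern ]]; then
      # Status 0 (accept) for "+", 1 (reject) for "-".
      [[ "$sign" == "+" ]]
      return
    fi
  done < "$rules"
  return 1  # no rule matched: rejected
}
```

For example, `url_passes conf/regex-urlfilter.txt http://nutch.apache.org/ && echo accepted` gives a quick yes/no for each seed URL.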
Re: Nutch Novice help
Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Thanks for updating the tutorial. I tried my setup and the crawl command
runs, but none of the pages are being crawled.
I created a urls directory inside the local folder and added a new file
named nutch to it, containing the URL as described in the tutorial.
(I also tried a file named urls inside the nutch/runtime/local directory.
The contents of the urls file are http://lucene.apache.org/nutch/ )
Here's the log:
us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-12 12:22:12
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
Generator: starting at 2011-07-12 12:22:15
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
Please help.
Thanks
Param
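The Generator output above ends with "No URLs to fetch - check your seed list and URL filters." As a first check on the seed side, here is a minimal Python sketch. It assumes the Nutch 1.x convention of a seed directory holding plain-text files with one URL per line. The helper name `read_seed_urls` is hypothetical, and skipping blank lines and lines starting with `#` is a simplification made for this sketch. It is not Nutch's own Injector code.

```python
import os
from urllib.parse import urlparse


def read_seed_urls(seed_dir):
    """Collect candidate seed URLs from every regular file in seed_dir.

    Assumption (not Nutch's exact Injector behaviour): one URL per line,
    blank lines and lines starting with '#' are skipped.
    Returns (usable_urls, suspicious_lines).
    """
    good, bad = [], []
    for name in sorted(os.listdir(seed_dir)):
        path = os.path.join(seed_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        with open(path, encoding="utf-8") as f:
            for line in f:
                url = line.strip()
                if not url or url.startswith("#"):
                    continue
                parsed = urlparse(url)
                # A usable seed needs an http(s) scheme and a host name.
                if parsed.scheme in ("http", "https") and parsed.netloc:
                    good.append(url)
                else:
                    bad.append((name, url))
    return good, bad
```

Calling `read_seed_urls("urls")` from runtime/local lists what the injector would most likely see. A seed without a scheme, such as a bare `lucene.apache.org`, shows up in the second list. If the seed list looks fine, the next place to look is the URL filter configuration.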
On 7/12/11 5:52 AM, "Julien Nioche" <li...@gmail.com> wrote:
> On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> wrote:
>
>>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
>>>>> gone since the crawl command is still there. You can find the file in
>>>>> the 1.2 release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>>>>
>>>> Crawl-urlfilter has been removed purposefully as it did not add anything
>>>> to the other url filters (automaton | regex) in terms of functionality.
>>>> By default the urlfilters contain (+.) which IIRC was what the
>>>> Crawl-urlfilter used to do.
>>>
>>> That's reasonable. But now new users are unaware and don't know what to
>>> do with this error message.
>>
>> Yep, the tutorial needs updating indeed
>
> done
>
Re: Nutch Novice help
Posted by Julien Nioche <li...@gmail.com>.
On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> wrote:
>
>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
>>>> gone since the crawl command is still there. You can find the file in
>>>> the 1.2 release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>>>
>>> Crawl-urlfilter has been removed purposefully as it did not add anything
>>> to the other url filters (automaton | regex) in terms of functionality.
>>> By default the urlfilters contain (+.) which IIRC was what the
>>> Crawl-urlfilter used to do.
>>
>> That's reasonable. But now new users are unaware and don't know what to
>> do with this error message.
>
> Yep, the tutorial needs updating indeed
>
done
--
*
*Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Nutch Novice help
Posted by Markus Jelsma <ma...@openindex.io>.
> > There seems to be no crawl-urlfilter file indeed. Don't know why it's
> > gone since
> > the crawl command is still there. You can find the file in the 1.2
> > release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>
> Crawl-urlfilter has been removed purposefully as it did not add anything
> to the other url filters (automaton | regex) in terms of functionality. By
> default the urlfilters contain (+.) which IIRC was what the
> Crawl-urlfilter used to do.
>
That's reasonable. But now new users are unaware and don't know what to do
with this error message.
Re: Nutch Novice help
Posted by Julien Nioche <li...@gmail.com>.
>
> There seems to be no crawl-urlfilter file indeed. Don't know why it's gone
> since
> the crawl command is still there. You can find the file in the 1.2 release:
> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>
Crawl-urlfilter has been removed on purpose, as it did not add anything to
the other URL filters (automaton | regex) in terms of functionality. By
default the URL filters contain (+.), which IIRC is what crawl-urlfilter
used to do.
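To see why a single `+.` rule leaves little for a separate crawl-urlfilter.txt to do, here is a simplified Python model of how a regex-urlfilter.txt style rule list is evaluated. The rules are read top to bottom. A leading `+` includes and a leading `-` excludes. The first rule whose pattern matches anywhere in the URL decides the outcome, and a URL that matches no rule is dropped. This is a sketch under those assumptions, not Nutch's RegexURLFilter source. Python's `re.search` stands in for Java regex matching, and `parse_rules`/`accepts` are hypothetical names.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines into (include, compiled_pattern) pairs.

    Blank lines and '#' comment lines are skipped.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern occurs anywhere in the URL decides."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # no rule matched: the URL is dropped
```

With example rules in the usual style, `-^(file|ftp|mailto):` followed by `+.`, every http URL falls through to the final `+.` and is accepted. To restrict a crawl under this model, you would replace that final `+.` with narrower include rules such as `+^http://lucene\.apache\.org/`. Adding narrower rules in front of `+.` would not restrict anything.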
--
*
*Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Nutch Novice help
Posted by Markus Jelsma <ma...@openindex.io>.
Building trunk is tricky and comes with issues. Don't use it in production
unless you know what you're doing. It's safer to check out 1.3 stable, although
1.4-dev runs fine as well and has some fixes for 1.3 issues that users reported
on the list.
There seems to be no crawl-urlfilter file indeed. I don't know why it's gone, since
the crawl command is still there. You can find the file in the 1.2 release:
http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
Re: Nutch Novice help
Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Thanks for the quick reply.
I searched in the nutch directory but still do not see that file :(. Here's the
complete file list inside the runtime/local/conf directory:
us137390:conf parampreetsethi$ pwd
/Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
us137390:conf parampreetsethi$ ls -t
automaton-urlfilter.txt domain-urlfilter.txt nutch-default.xml
prefix-urlfilter.txt solrindex-mapping.xml
configuration.xsl httpclient-auth.xml nutch-site.xml
regex-normalize.xml subcollections.xml
domain-suffixes.xml log4j.properties parse-plugins.dtd
regex-urlfilter.txt suffix-urlfilter.txt
domain-suffixes.xsd nutch-conf.xsl parse-plugins.xml schema.xml
tika-mimetypes.xml
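Nutch 1.3 ships several *-urlfilter.txt files but no crawl-urlfilter.txt. A short Python sketch can show which filter files a given conf directory holds. The function names and the glob pattern are illustrative assumptions. Which of these files actually gets consulted depends on the URL filter plugins enabled through the `plugin.includes` property.

```python
from pathlib import Path


def url_filter_files(conf_dir):
    """Return the sorted names of *-urlfilter.txt files in conf_dir."""
    return sorted(p.name for p in Path(conf_dir).glob("*-urlfilter.txt"))


def report(conf_dir):
    """Build a short human-readable summary of the URL filter files."""
    names = url_filter_files(conf_dir)
    lines = [f"URL filter files in {conf_dir}:"]
    lines += [f"  {name}" for name in names]
    if "crawl-urlfilter.txt" not in names:
        lines.append("  (no crawl-urlfilter.txt here; Nutch 1.3 no longer ships it)")
    return "\n".join(lines)
```

Run against the directory listed above, this should report automaton-, domain-, prefix-, regex- and suffix-urlfilter.txt, plus the note about the missing crawl-urlfilter.txt.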
By the way, I also tried building the code checked out from the svn repository,
but the build failed with the following error:
resolve-default:
[ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] module not found: org.apache.gora#gora-core;0.2-incubating
[ivy:resolve] ==== local: tried
[ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
[ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
[ivy:resolve] module not found: org.apache.gora#gora-sql;0.2-incubating
[ivy:resolve] ==== local: tried
[ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
[ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: org.apache.gora#gora-core;0.2-incubating: not found
[ivy:resolve] :: org.apache.gora#gora-sql;0.2-incubating: not found
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
BUILD FAILED
/Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
resolve failed - see output for details
-param
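In the warnings above, Ivy looks for each missing Gora module in the local repository under ~/.ivy2/local. It tries an ivys/ivy.xml descriptor first and then a jars/<module>.jar artifact. The Python sketch below rebuilds those two paths from an `org#module;revision` coordinate. The layout is read off this log, not taken from Ivy itself, since the real patterns come from ivysettings.xml. `parse_coordinate` and `local_repo_paths` are hypothetical helpers, intended for checking by hand whether a module was ever published locally.

```python
from pathlib import PurePosixPath


def parse_coordinate(coord):
    """Split an Ivy-style 'org#module;revision' string into its parts."""
    org, rest = coord.split("#", 1)
    module, revision = rest.split(";", 1)
    return org, module, revision


def local_repo_paths(coord, ivy_home):
    """Descriptor and jar paths in the local-repo layout seen in the log (assumed)."""
    org, module, revision = parse_coordinate(coord)
    base = PurePosixPath(ivy_home) / "local" / org / module / revision
    return base / "ivys" / "ivy.xml", base / "jars" / f"{module}.jar"
```

For `org.apache.gora#gora-core;0.2-incubating` with ivy_home `/Users/parampreetsethi/.ivy2`, this gives the two "tried" paths shown in the log. Wrapping them in `pathlib.Path(...).exists()` reports whether either file is present. In this thread, the practical way around the missing Gora artifacts was to build the 1.3 branch rather than trunk.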
On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <jc...@inforeverse.com> wrote:
> Look down a little further for the
>
> or
> runtime/local/bin/nutch (version >= 1.3)
>
> If you download the bin then it's in the runtime directory.
RE: Nutch Novice help
Posted by "Jerry E. Craig, Jr." <jc...@inforeverse.com>.
Look down a little further for the
or
runtime/local/bin/nutch (version >= 1.3)
If you download the bin then it's in the runtime directory.
Jerry E. Craig, Jr.
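The point here is that a 1.3 download keeps the runnable layout (bin/nutch and conf/) under runtime/local. A hedged Python sketch can pick the launcher and conf directory for an unpacked tree. It assumes the runtime/local layout for 1.3 and a top-level bin/ and conf/ for older releases. `nutch_layout` is a hypothetical helper name.

```python
from pathlib import Path


def nutch_layout(nutch_home):
    """Return (launcher, conf_dir) for an unpacked Nutch tree, or None.

    Assumption: 1.3+ uses runtime/local/bin/nutch and runtime/local/conf;
    older releases keep bin/nutch and conf at the top level.
    """
    home = Path(nutch_home)
    for root in (home / "runtime" / "local", home):
        launcher = root / "bin" / "nutch"
        if launcher.is_file():
            return launcher, root / "conf"
    return None  # no launcher found in either layout
```

Checking runtime/local first means a tree that has both layouts resolves to the 1.3 one. That is the directory where the URL filter files from this thread live.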
-----Original Message-----
From: Sethi, Parampreet [mailto:parampreet.sethi@teamaol.com]
Sent: Monday, July 11, 2011 2:51 PM
To: user@nutch.apache.org
Subject: Nutch Novice help
Hi All,
Sorry for such a naïve question. I downloaded the nutch 1.3 binary today and am trying to set it up as described in the Tutorial at http://wiki.apache.org/nutch/NutchTutorial
However, I am not able to find crawl-urlfilter.txt inside the conf directory. Is there any other place where I should look for this file?
Thanks
Param