Posted to user@nutch.apache.org by "Sethi, Parampreet" <pa...@teamaol.com> on 2011/07/11 23:50:54 UTC

Nutch Novice help

Hi All,

Sorry for such a naïve question. I downloaded the Nutch 1.3 binary today and am trying to set it up as described in the tutorial at http://wiki.apache.org/nutch/NutchTutorial

However, I am not able to find crawl-urlfilter.txt inside the conf directory. Is there any other place where I should look for this file?

Thanks
Param

Re: Nutch Novice help

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi, please see the tutorial at [1] for an up-to-date Nutch 1.3 tutorial on the wiki.

Please try it out and take on board Markus' points regarding Nutch trunk, as the
problems you are experiencing are usual with trunk as it stands.

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr




-- 
*Lewis*

Re: Nutch Novice help

Posted by lewis john mcgibbney <le...@gmail.com>.
Have a good look at your hadoop.log, which should be created when you
initiate a crawl with Nutch; it will be extremely valuable. In addition,
there are various properties in nutch-site.xml which can be set to make
logging more verbose at various levels, e.g. fetching.

In order to root out errors you will need to get used to looking
through your logs. It is also advisable to include as much log data
as possible when posting queries on the user list, as this will greatly
help you get accurate and detailed help from the list in the future.
You can find more information about this at [1].

I would advise you to delete all crawled data and begin a fresh crawl. This
way you can try the above, looking at your logs, before we try to root out
where exactly the errors are stemming from.

HTH

[1]
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer#Becoming_a_Nutch_Developer
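
As a rough illustration of the log check described above, the sketch below
pulls the WARN, ERROR and FATAL lines out of a log4j-style log so they can be
pasted into a list post. The default path logs/hadoop.log and the helper name
scan_log are assumptions made for this example, not something stated in the
thread; point it at wherever your Nutch run actually writes its log.

```shell
#!/bin/sh
# Hypothetical helper: print WARN/ERROR/FATAL lines from a log file.
# The default path logs/hadoop.log is an assumption; pass your own path.
scan_log() {
    log_file="${1:-logs/hadoop.log}"
    if [ ! -f "$log_file" ]; then
        echo "no log file at $log_file" >&2
        return 1
    fi
    # log4j-style lines carry the level as a separate word.
    grep -E '(^|[[:space:]])(WARN|ERROR|FATAL)([[:space:]]|$)' "$log_file"
}
```

Called as `scan_log logs/hadoop.log`, it prints only the matching lines; an
empty result means no line at those levels was found, which is itself worth
mentioning when asking for help.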



On Tue, Jul 12, 2011 at 7:31 PM, Sethi, Parampreet <
parampreet.sethi@teamaol.com> wrote:

> Hey Lewis, Thanks for the quick reply. Looks like I am tangled now =)
>
> I tried the tutorial mentioned at
> http://wiki.apache.org/nutch/RunningNutchAndSolr
>
> For me step 3 is not working. Two of the directories are not created (which
> should be there after step 3 is complete.)
>
> crawl/crawldb - Created
> crawl/linkdb - not created
> crawl/segments - not created
>
> Also, I changed the url to http://nutch.apache.org, but still same log
> message "Generator: 0 records selected for fetching, exiting ..."
>
> Looks like I am missing some key step =(.
>
> -param
>
> On 7/12/11 1:37 PM, "lewis john mcgibbney" <le...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I think you are maybe getting tangled here. Please see the following
> > tutorial for Nutch 1.3 [1]
> >
> > Please also note that the URL you provided is the old Nutch site and now
> > redirects to http://nutch.apache.org
> >
> > [1] http://wiki.apache.org/nutch/RunningNutchAndSolr
> >
> > On Tue, Jul 12, 2011 at 5:23 PM, Sethi, Parampreet <
> > parampreet.sethi@teamaol.com> wrote:
> >
> >> Thanks for updating the tutorial. I tried my setup, the crawl command is
> >> running. But none of the pages are being crawled.
> >> I created a urls directory inside the local folder and added a new file
> >> named nutch with the URL in it, as mentioned in the tutorial.
> >>
> >> (I also tried a file named urls inside the nutch/runtime/local directory.
> >> The contents of the urls file is http://lucene.apache.org/nutch/ )
> >>
> >> Here's the log:
> >>
> >> us137390:local parampreetsethi$  bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >> solrUrl is not set, indexing will be skipped...
> >> crawl started in: crawl
> >> rootUrlDir = urls
> >> threads = 10
> >> depth = 3
> >> solrUrl=null
> >> topN = 50
> >> Injector: starting at 2011-07-12 12:22:12
> >> Injector: crawlDb: crawl/crawldb
> >> Injector: urlDir: urls
> >> Injector: Converting injected urls to crawl db entries.
> >> Injector: Merging injected urls into crawl db.
> >> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> >> Generator: starting at 2011-07-12 12:22:15
> >> Generator: Selecting best-scoring urls due for fetch.
> >> Generator: filtering: true
> >> Generator: normalizing: true
> >> Generator: topN: 50
> >> Generator: jobtracker is 'local', generating exactly one partition.
> >> Generator: 0 records selected for fetching, exiting ...
> >> Stopping at depth=0 - no more URLs to fetch.
> >> No URLs to fetch - check your seed list and URL filters.
> >> crawl finished: crawl
> >>
> >>
> >> Please help.
> >>
> >> Thanks
> >> Param
> >>
> >> On 7/12/11 5:52 AM, "Julien Nioche" <li...@gmail.com> wrote:
> >>
> >>> On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> wrote:
> >>>
> >>>>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
> >>>>>>> gone since the crawl command is still there. You can find the file in
> >>>>>>> the 1.2 release:
> >>>>>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>>>>
> >>>>>> Crawl-urlfilter has been removed purposefully as it did not add
> >>>>>> anything to the other url filters (automaton | regex) in terms of
> >>>>>> functionality. By default the urlfilters contain (+.) which IIRC was
> >>>>>> what the Crawl-urlfilter used to do.
> >>>>>
> >>>>> That's reasonable. But now new users are unaware and don't know what
> >>>>> to do with this error message.
> >>>>
> >>>> Yep, the tutorial needs updating indeed
> >>>>
> >>>
> >>> done
> >>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>>>> Thanks for a quick reply.
> >>>>>>>>
> >>>>>>>> I searched in the nutch directory but still do not see that file :(.
> >>>>>>>> Here's complete file list inside runtime/local/conf directory.
> >>>>>>>>
> >>>>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>>>> automaton-urlfilter.txt    domain-urlfilter.txt    nutch-default.xml
> >>>>>>>> prefix-urlfilter.txt    solrindex-mapping.xml
> >>>>>>>> configuration.xsl    httpclient-auth.xml    nutch-site.xml
> >>>>>>>> regex-normalize.xml    subcollections.xml
> >>>>>>>> domain-suffixes.xml    log4j.properties    parse-plugins.dtd
> >>>>>>>> regex-urlfilter.txt    suffix-urlfilter.txt
> >>>>>>>> domain-suffixes.xsd    nutch-conf.xsl        parse-plugins.xml
> >>>>>>>> schema.xml tika-mimetypes.xml
> >>>>>>>>
> >>>>>>>> By the way, I tried deploying the code by checking out from svn
> >>>>>>>> repository, but could not build it. I was getting following error:
> >>>>>>>>
> >>>>>>>> resolve-default:
> >>>>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>>>> [ivy:resolve]
> >>>>>>>> [ivy:resolve] :: problems summary ::
> >>>>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>>>> [ivy:resolve]     ==== local: tried
> >>>>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>>>> [ivy:resolve]     ==== local: tried
> >>>>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>> [ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
> >>>>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>> [ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>>>> [ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>> [ivy:resolve]
> >>>>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>>>>
> >>>>>>>> BUILD FAILED
> >>>>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>>>>     resolve failed - see output for details
> >>>>>>>>
> >>>>>>>> -param
> >>>>>>>>
> >>>>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <jcraig@inforeverse.com> wrote:
> >>>>>>>>> Look down a little further for the
> >>>>>>>>>
> >>>>>>>>> or
> >>>>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>>>>
> >>>>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>>>>
> >>>>>>>>> Jerry E. Craig, Jr.
> >>>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Sethi, Parampreet [mailto:parampreet.sethi@teamaol.com]
> >>>>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>>>> To: user@nutch.apache.org
> >>>>>>>>> Subject: Nutch Novice help
> >>>>>>>>>
> >>>>>>>>> Hi All,
> >>>>>>>>>
> >>>>>>>>> Sorry for such a naïve question,  I downloaded nutch 1.3 binary today
> >>>>>>>>> and trying to set it up as mentioned in Tutorial at
> >>>>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>>>>
> >>>>>>>>> How ever I am not able to find crawl-urlfilter.txt inside conf
> >>>>>>>>> directory.
> >>>>>>>>> Is there any other place where I should look for this file?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Param
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> *
> >>>> *Open Source Solutions for Text Engineering
> >>>>
> >>>> http://digitalpebble.blogspot.com/
> >>>> http://www.digitalpebble.com
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
>


-- 
*Lewis*

Re: Nutch Novice help

Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Hey Lewis, thanks for the quick reply. Looks like I am tangled now =)

I tried the tutorial mentioned at
http://wiki.apache.org/nutch/RunningNutchAndSolr

For me step 3 is not working. Two of the directories that should exist
after step 3 completes are not created:

crawl/crawldb - created
crawl/linkdb - not created
crawl/segments - not created

Also, I changed the URL to http://nutch.apache.org, but I still get the same
log message "Generator: 0 records selected for fetching, exiting ..."

Looks like I am missing some key step =(.

-param
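
The directory check described in this message can be scripted. The sketch
below is only an illustration: the sub-directory names crawldb, linkdb and
segments come from the message itself, while the helper name
check_crawl_dirs and the default directory crawl are assumptions.

```shell
#!/bin/sh
# Report which of the expected crawl sub-directories exist.
# crawldb, linkdb and segments are the names listed in the message;
# check_crawl_dirs is a made-up helper name for this sketch.
check_crawl_dirs() {
    crawl_dir="${1:-crawl}"
    for sub in crawldb linkdb segments; do
        if [ -d "$crawl_dir/$sub" ]; then
            echo "$crawl_dir/$sub - created"
        else
            echo "$crawl_dir/$sub - not created"
        fi
    done
}
```

Run as `check_crawl_dirs crawl` from the directory where the crawl was
started. Seeing only crawldb would fit the "Generator: 0 records selected
for fetching" message: nothing was selected, so nothing was fetched.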



Re: Nutch Novice help

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi,

I think you are maybe getting tangled here. Please see the following
tutorial for Nutch 1.3 [1]

Please also note that the URL you provided is the old Nutch site and now
redirects to http://nutch.apache.org

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr
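
For reference, a seed list pointing at the current site can be written with a
few lines of shell. This is a minimal sketch: the directory name urls and the
file name nutch mirror what was described earlier in the thread, and
make_seed_list is a made-up helper name.

```shell
#!/bin/sh
# Write a one-URL seed list. The directory "urls" and file "nutch" mirror
# the setup described in the thread; make_seed_list is a hypothetical helper.
make_seed_list() {
    seed_dir="${1:-urls}"
    seed_url="${2:-http://nutch.apache.org/}"
    mkdir -p "$seed_dir" || return 1
    # One URL per line.
    printf '%s\n' "$seed_url" > "$seed_dir/nutch"
}
```

After `make_seed_list urls http://nutch.apache.org/`, the crawl command
already quoted in this thread (bin/nutch crawl urls -dir crawl -depth 3
-topN 50) would read its seeds from urls/nutch.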



-- 
*Lewis*

Re: Nutch Novice help

Posted by Markus Jelsma <ma...@openindex.io>.
No URLs to fetch - check your seed list and URL filters

The error is quite clear. You injected URLs that did not pass your URL
filters. Check your URL filters, likely crawl-urlfilter since you seem to use
the crawl command.
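
One way to check a seed URL against a filter file by hand is sketched below.
It only approximates the behaviour in plain shell, under these assumptions:
one rule per line, a leading + accepts and a leading - rejects, # starts a
comment, the first matching rule wins, and a URL matching no rule is
rejected. Nutch uses Java regular expressions, so patterns that grep -E
treats differently will not give the same answer. The helper name
url_passes_filter is made up for this example.

```shell
#!/bin/sh
# Approximate a first-match-wins +/- regex filter file in plain shell.
# Assumptions: '+' accepts, '-' rejects, '#' starts a comment, and the
# patterns are simple enough for grep -E (Nutch itself uses Java regexes).
url_passes_filter() {
    url="$1"
    filter_file="$2"
    while IFS= read -r rule || [ -n "$rule" ]; do
        case "$rule" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        sign=$(printf '%s' "$rule" | cut -c1)
        pattern=$(printf '%s' "$rule" | cut -c2-)
        if printf '%s\n' "$url" | grep -Eq -e "$pattern"; then
            [ "$sign" = "+" ] && return 0
            return 1
        fi
    done < "$filter_file"
    # No rule matched: treat the URL as rejected.
    return 1
}
```

For example, `url_passes_filter 'http://nutch.apache.org/' conf/regex-urlfilter.txt && echo accepted || echo rejected`
gives a quick indication of whether a seed would survive that one file. It
does not account for other active filter plugins or for URL normalization.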


> Thanks for updating the tutorial. I tried my setup, the crawl command is
> running. But none of the pages are being crawled.
> I created urls directory inside local folder and added new file nutch with
> url in the same as mentioned in tutorial.
> 
> (I also tried file named urls inside nutch/runtime/local diretcory. The
> contents of urls file is http://lucene.apache.org/nutch/ )
> 
> Here's the log:
> 
> us137390:local parampreetsethi$  bin/nutch crawl urls -dir crawl -depth 3
> -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-12 12:22:12
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> Generator: starting at 2011-07-12 12:22:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
> 
> 
> Please help.
> 
> Thanks
> Param
> 
> On 7/12/11 5:52 AM, "Julien Nioche" <li...@gmail.com> wrote:
> > On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> 
wrote:
> >>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
> >>>>> gone since
> >>>>> the crawl command is still there. You can find the file in the 1.2
> >>>>> release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>> 
> >>>> Crawl-urlfilter has been removed  purposefully as it did not add
> >>> 
> >>> anything
> >>> 
> >>>> to the other url filters (automaton | regex) in terms of
> >>>> functionality.
> >>> 
> >>> By
> >>> 
> >>>> default the urlfilters contain (+.) which IIRC was what the
> >>>> Crawl-urlfilter used to do.
> >>> 
> >>> That's reasonable. But now news users are unaware and don't know what
> >>> to do
> >>> with this error message.
> >> 
> >> Yep, the tutorial needs updating indeed
> > 
> > done
> > 
> >>>>>> Thanks for a quick reply.
> >>>>>> 
> >>>>>> I searched in the nutch directory but still do not see that file :(.
> >>>>> 
> >>>>> Here's
> >>>>> 
> >>>>>> complete file list inside runtime/local/conf directory.
> >>>>>> 
> >>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>> automaton-urlfilter.txt    domain-urlfilter.txt    nutch-default.xml
> >>>>>> prefix-urlfilter.txt    solrindex-mapping.xml
> >>>>>> configuration.xsl    httpclient-auth.xml    nutch-site.xml
> >>>>>> regex-normalize.xml    subcollections.xml
> >>>>>> domain-suffixes.xml    log4j.properties    parse-plugins.dtd
> >>>>>> regex-urlfilter.txt    suffix-urlfilter.txt
> >>>>>> domain-suffixes.xsd    nutch-conf.xsl        parse-plugins.xml
> >>>>>> schema.xml tika-mimetypes.xml
> >>>>>> 
> >>>>>> By the way, I tried deploying the code by checking out from svn
> >>>>>> repository, but could not build it. I was getting following error:
> >>>>>> 
> >>>>>> resolve-default:
> >>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: problems summary ::
> >>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>> 
> >>>>>> BUILD FAILED
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>>     resolve failed - see output for details
> >>>>>> 
> >>>>>> -param
> >>>>>> 
> >>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <jc...@inforeverse.com>
> >>>>> 
> >>>>> wrote:
> >>>>>>> Look down a little further for the
> >>>>>>> 
> >>>>>>> or
> >>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>> 
> >>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>> 
> >>>>>>> Jerry E. Craig, Jr.
> >>>>>>> 
> >>>>>>> -----Original Message-----
> >>>>>>> From: Sethi, Parampreet [mailto:parampreet.sethi@teamaol.com]
> >>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>> To: user@nutch.apache.org
> >>>>>>> Subject: Nutch Novice help
> >>>>>>> 
> >>>>>>> Hi All,
> >>>>>>> 
> >>>>>>> Sorry for such a naïve question, I downloaded nutch 1.3 binary today
> >>>>>>> and trying to set it up as mentioned in Tutorial at
> >>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>> 
> >>>>>>> However I am not able to find crawl-urlfilter.txt inside conf
> >>>>>>> directory.
> >>>>>>> Is there any other place where I should look for this file?
> >>>>>>> 
> >>>>>>> Thanks
> >>>>>>> Param
> >> 
> >> --
> >> *
> >> *Open Source Solutions for Text Engineering
> >> 
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com

Re: Nutch Novice help

Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Thanks for updating the tutorial. I tried my setup and the crawl command now
runs, but none of the pages are being crawled.
I created a urls directory inside the local folder and added a new file named
nutch containing the URL, as mentioned in the tutorial.

(I also tried a file named urls inside the nutch/runtime/local directory. The
content of the urls file is http://lucene.apache.org/nutch/ )

Here's the log:

us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-12 12:22:12
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
Generator: starting at 2011-07-12 12:22:15
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl


Please help.

Thanks
Param
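
The log above ends with "0 records selected for fetching" and the hint to check the seed list and URL filters. A minimal sketch for the first of those checks follows. It assumes the seed directory holds plain text files with one URL per line; the function name read_seeds and the http/https rule are illustrative choices, not part of Nutch.

```python
from pathlib import Path
from typing import List, Tuple
from urllib.parse import urlparse


def read_seeds(seed_dir: str) -> Tuple[List[str], List[str]]:
    """Return (valid_urls, rejected_lines) from every file in seed_dir.

    Assumption: one URL per line. A line counts as valid only if it has an
    http/https scheme and a host part.
    """
    valid: List[str] = []
    rejected: List[str] = []
    for path in sorted(Path(seed_dir).iterdir()):
        if not path.is_file():
            continue
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            parts = urlparse(line)
            if parts.scheme in ("http", "https") and parts.netloc:
                valid.append(line)
            else:
                rejected.append(line)
    return valid, rejected
```

Calling read_seeds("urls") from runtime/local would list what the seed directory actually contains. If the seed URL shows up as valid, the URL filters are the next place to look.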


Re: Nutch Novice help

Posted by Julien Nioche <li...@gmail.com>.
On 12 July 2011 10:30, Julien Nioche <li...@gmail.com> wrote:

>> > > There seems to be no crawl-urlfilter file indeed. Don't know why it's
>> > > gone since the crawl command is still there. You can find the file in
>> > > the 1.2 release:
>> > > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>> >
>> > Crawl-urlfilter has been removed purposefully as it did not add anything
>> > to the other url filters (automaton | regex) in terms of functionality.
>> > By default the urlfilters contain (+.) which IIRC was what the
>> > Crawl-urlfilter used to do.
>>
>> That's reasonable. But now new users are unaware and don't know what to
>> do with this error message.
>
> Yep, the tutorial needs updating indeed
>

done




-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Nutch Novice help

Posted by Markus Jelsma <ma...@openindex.io>.
> > There seems to be no crawl-urlfilter file indeed. Don't know why it's
> > gone since the crawl command is still there. You can find the file in
> > the 1.2 release:
> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> 
> Crawl-urlfilter has been removed purposefully as it did not add anything
> to the other url filters (automaton | regex) in terms of functionality. By
> default the urlfilters contain (+.) which IIRC was what the
> Crawl-urlfilter used to do.
> 

That's reasonable. But now new users are unaware of this and don't know what
to do with this error message.


Re: Nutch Novice help

Posted by Julien Nioche <li...@gmail.com>.
>
> There seems to be no crawl-urlfilter file indeed. Don't know why it's gone
> since the crawl command is still there. You can find the file in the 1.2
> release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>

Crawl-urlfilter has been removed purposefully, as it did not add anything to
the other URL filters (automaton | regex) in terms of functionality. By
default the URL filters contain the rule (+.), which IIRC is what
Crawl-urlfilter used to do.
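
To make the effect of a rule like (+.) concrete, here is a rough Python sketch of how such sign-prefixed regex rules can be evaluated. It is an illustration, not Nutch's code. It assumes each rule is a '+' (accept) or '-' (reject) followed by a regex, that the first matching rule decides, and that a URL matching no rule is dropped. The names parse_rules and filter_url and the apache.org restriction are made up for the example.

```python
import re
from typing import List, Optional, Tuple

# A rule is (accept, compiled_pattern).
Rule = Tuple[bool, re.Pattern]

# The catch-all mentioned above: accept any URL containing at least one character.
DEFAULT_RULES = "+.\n"

# Hypothetical tighter rule set, in the same syntax.
RESTRICTED_RULES = r"""
# skip common static resources
-\.(gif|jpg|png|css|js)$
# accept only hosts under apache.org (illustrative restriction)
+^http://([a-z0-9]*\.)*apache\.org/
"""


def parse_rules(text: str) -> List[Rule]:
    """Parse rules, skipping blank lines and '#' comments."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def filter_url(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the URL if the first matching rule accepts it, else None."""
    for accept, pattern in rules:
        if pattern.search(url):  # unanchored search, not a full match
            return url if accept else None
    return None  # no rule matched, so the URL is dropped
```

With DEFAULT_RULES every non-empty URL passes, so under these assumptions a catch-all (+.) on its own would not filter out a seed URL. With RESTRICTED_RULES only apache.org pages that are not static resources pass.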





-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Nutch Novice help

Posted by Markus Jelsma <ma...@openindex.io>.
Building trunk is tricky, and trunk has issues at runtime. Don't use it in
production unless you know what you're doing. It's safer to check out 1.3
stable, although 1.4-dev runs fine as well and has some fixes for 1.3 issues
that users mentioned on the list.

There is indeed no crawl-urlfilter file. I don't know why it's gone, since
the crawl command is still there. You can find the file in the 1.2 release:
http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/


Re: Nutch Novice help

Posted by "Sethi, Parampreet" <pa...@teamaol.com>.
Thanks for a quick reply.

I searched in the nutch directory but still do not see that file :(. Here's
the complete file list inside the runtime/local/conf directory.

us137390:conf parampreetsethi$ pwd
/Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
us137390:conf parampreetsethi$ ls -t
automaton-urlfilter.txt    domain-urlfilter.txt    nutch-default.xml    prefix-urlfilter.txt    solrindex-mapping.xml
configuration.xsl          httpclient-auth.xml     nutch-site.xml       regex-normalize.xml     subcollections.xml
domain-suffixes.xml        log4j.properties        parse-plugins.dtd    regex-urlfilter.txt     suffix-urlfilter.txt
domain-suffixes.xsd        nutch-conf.xsl          parse-plugins.xml    schema.xml              tika-mimetypes.xml

By the way, I tried deploying the code by checking it out from the svn
repository, but could not build it. I was getting the following error:

resolve-default:
[ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
[ivy:resolve]     ==== local: tried
[ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
[ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
[ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
[ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
[ivy:resolve]     ==== local: tried
[ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
[ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
[ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
[ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
[ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
[ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
/Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
    resolve failed - see output for details


-param
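
When posting a build failure like the one above to the list, it can help to boil the Ivy output down to the modules it could not resolve. The sketch below does that for pasted log text. The function name unresolved_modules is hypothetical, and the pattern is only inferred from the "module not found: org#module;revision" lines in this log.

```python
import re
from typing import List

# Matches Ivy's "organisation#module;revision" coordinates that follow
# "module not found:" (format inferred from the log in this message).
_NOT_FOUND = re.compile(r"module not found:\s*([\w.\-]+)#([\w.\-]+);([\w.\-]+)")


def unresolved_modules(log_text: str) -> List[str]:
    """Return the distinct 'org#module;rev' entries reported as not found."""
    # Mail clients wrap long lines, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    seen: List[str] = []
    for org, module, rev in _NOT_FOUND.findall(flat):
        coord = f"{org}#{module};{rev}"
        if coord not in seen:
            seen.append(coord)
    return seen
```

Run on the log in this message, it should report the two Gora modules at revision 0.2-incubating. That matches the UNRESOLVED DEPENDENCIES summary in the log itself.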



RE: Nutch Novice help

Posted by "Jerry E. Craig, Jr." <jc...@inforeverse.com>.
Look down a little further for the 

or
runtime/local/bin/nutch (version >= 1.3)

If you download the bin then it's in the runtime directory.

Jerry E. Craig, Jr.
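
As a small illustration of the layout described above, this sketch looks for a conf directory under a Nutch checkout. It is only a sketch: find_conf_dir is a hypothetical helper, and the order (runtime/local/conf first, then a top-level conf) is an assumption based on the paths mentioned in this thread.

```python
from pathlib import Path
from typing import Optional


def find_conf_dir(nutch_home: str) -> Optional[Path]:
    """Return the first existing conf directory under nutch_home, or None."""
    home = Path(nutch_home)
    # Assumed order: prefer the 1.3-style runtime layout, then fall back to
    # a top-level conf/ directory.
    for candidate in (home / "runtime" / "local" / "conf", home / "conf"):
        if candidate.is_dir():
            return candidate
    return None
```

For the checkout discussed in this thread, the result would be the runtime/local/conf directory, which is where regex-urlfilter.txt sits in the listing posted elsewhere in the thread.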

-----Original Message-----
From: Sethi, Parampreet [mailto:parampreet.sethi@teamaol.com] 
Sent: Monday, July 11, 2011 2:51 PM
To: user@nutch.apache.org
Subject: Nutch Novice help

Hi All,

Sorry for such a naïve question, I downloaded nutch 1.3 binary today and trying to set it up as mentioned in Tutorial at http://wiki.apache.org/nutch/NutchTutorial

However I am not able to find crawl-urlfilter.txt inside conf directory. Is there any other place where I should look for this file?

Thanks
Param