Posted to dev@nutch.apache.org by Bin Wang <bi...@gmail.com> on 2013/12/23 05:05:05 UTC

Step Through Nutch 1.7 Inside Eclipse Missing Argument

Hi there,

I was following the RunNutchInEclipse tutorial (Nutch 1.7 / trunk example).

After I set up the Java run configuration as the tutorial showed and
clicked Run, it did not show the injector process as in the tutorial.
Instead, it showed this error:

    Usage: Injector <crawldb> <url_dir>

I looked at the source code of the Injector class, and it does expect two
arguments:

    public void inject(Path crawlDb, Path urlDir) throws IOException..

I don't see how everyone else got through this step by passing only the URL
path to Eclipse.
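A minimal sketch of the argument check behind that usage message. The class InjectorArgsSketch and its checkArgs method are hypothetical stand-ins written for illustration, not Nutch's actual Injector code. They only mirror the "Usage: Injector <crawldb> <url_dir>" contract and show why a run configuration that passes only the URL directory is rejected, while one that passes a crawldb path first and the URL directory second gets through.

```java
/**
 * Hypothetical illustration of the Nutch 1.x Injector argument contract.
 * This is NOT the Nutch source; it only reproduces the usage check.
 */
public class InjectorArgsSketch {

    static final String USAGE = "Usage: Injector <crawldb> <url_dir>";

    /**
     * Returns the usage message when fewer than two arguments are given,
     * or null when both <crawldb> and <url_dir> are present.
     */
    static String checkArgs(String[] args) {
        if (args.length < 2) {
            return USAGE;
        }
        return null;
    }

    public static void main(String[] args) {
        // Only the seed URL directory, as in the 2.X example: rejected.
        System.out.println(checkArgs(new String[] {"urls"}));
        // crawldb path first, then the URL directory: accepted (prints null).
        System.out.println(checkArgs(new String[] {"crawl/crawldb", "urls"}));
    }
}
```

In Eclipse terms, this would mean the Program arguments field lists the crawldb path first and the URL directory second, for example crawl/crawldb urls (both names are only examples).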

Should I add something to the run configuration to also pass the crawldb
path as another argument? If so, where is the crawldb supposed to be
located?

(I am new to this mailing list and assumed that a question at the source
code level would be a better fit for the developer list, so let me know if
I am asking in the wrong place.)

/usr/bin

--- Here is some basic information about the platform I am working on ---
Mac OS X 10.8.4
Apache Ant(TM) version 1.8.2 compiled on June 16 2012
svn, version 1.6.18 (r1303927) compiled Feb  6 2013, 14:18:52
Juno Eclipse SDK Version: 4.2.2 Build id: M20130204-1200

System JAVA:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

Eclipse JAVA:
JVM 1.6.0.jdk

Re: Step Through Nutch 1.7 Inside Eclipse Missing Argument

Posted by Tejas Patil <te...@gmail.com>.
Hi Bin Wang,
You are welcome to edit the wiki and add your observations to it. Thanks
for your contribution.

~tejas


(In reply to Bin Wang's message of Mon, Dec 23, 2013 at 8:19 AM.)

> Hi Tejas,
>
> Thanks a lot for your confirmation! And it is working for me now!
>
> I will take you as the tutorial author Tejas and correct me I was wrong.
> The tutorial you have written is very helpful, most of your tutorial have
> the mentioned how to work with Nutch 1.7 (trunk) even it is targeted at
> 2.X.
> I am wondering should I(/can I )go to the Wiki and add this solution to
> the Wiki so your tutorial is both consistent for Nutch 2.X and 1.X..
>
> (thought I should contribute back when I got the help from the community.)
>
> Thanks,
> /usr/bin
>
>
>
> On Sun, Dec 22, 2013 at 10:44 PM, Tejas Patil <te...@gmail.com>wrote:
>
>> You are asking the right question at the right place.
>> The example shown in the tutorial was for Nutch 2.X series. The 1.X
>> Injector needs an extra param as input which is the location of the crawldb
>> to inject the urls into. (For first time, it would create a new one on the
>> location in the command).
>>
>> Thanks,
>> Tejas
>>
>>
>> On Sun, Dec 22, 2013 at 8:05 PM, Bin Wang <bi...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> I was following the RunNutchInEclipse tutorial (1.7 Nutch / trunk
>>> example).
>>>
>>> After I configured the java run configurations as the tutorial showed..
>>> and clicked run.  It did not show the injector process as shown in the
>>> tutorial, and instead, it showed error:
>>>
>>>     Usage: Injector <crawldb> <url_dir>
>>>
>>> I took a look at the source code of the injector class and obviously, it
>>> is somehow expecting two arguments.
>>>
>>>     public void inject(Path crawlDb, Path urlDir) throws IOException..
>>>
>>> I don't know how everyone got through this step only passing the URL
>>> path to eclipse.
>>>
>>> Show I add something more in the run configuration to also pass the
>>> crawldb path as another argument? If so, where should the crawldb suppose
>>> to locate?
>>>
>>> (I am new to this mail list and assuming, question at the source code
>>> level might be a better fit for the developer one. So let me know if I am
>>> asking the wrong question in the wrong place..)
>>>
>>> /usr/bin
>>>
>>> --- here is some basically information of the platform that I am working
>>> on ----
>>> Mac OS X 10.8.4
>>> Apache Ant(TM) version 1.8.2 compiled on June 16 2012
>>>  svn, version 1.6.18 (r1303927) compiled Feb  6 2013, 14:18:52
>>> Juno Eclipse SDK Version: 4.2.2 Build id: M20130204-1200
>>>
>>> System JAVA:
>>> java version "1.7.0_25"
>>> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
>>>
>>> Eclipse JAVA:
>>> JVM 1.6.0.jdk
>>>
>>>
>>>
>>
>

Re: Step Through Nutch 1.7 Inside Eclipse Missing Argument

Posted by Bin Wang <bi...@gmail.com>.
Hi Tejas,

Thanks a lot for the confirmation! It is working for me now.

I assume you are the tutorial author, Tejas; correct me if I am wrong.
The tutorial you wrote is very helpful, and most of it also explains how to
work with Nutch 1.7 (trunk) even though it targets 2.X.
Should I (may I) go to the wiki and add this solution, so that your
tutorial is consistent for both Nutch 2.X and 1.X?

(I thought I should contribute back since I got help from the community.)

Thanks,
/usr/bin



(In reply to Tejas Patil's message of Sun, Dec 22, 2013 at 10:44 PM.)


Re: Step Through Nutch 1.7 Inside Eclipse Missing Argument

Posted by Tejas Patil <te...@gmail.com>.
You are asking the right question in the right place.
The example shown in the tutorial is for the Nutch 2.X series. The 1.X
Injector needs an extra parameter: the location of the crawldb to inject the
URLs into. (The first time, it creates a new crawldb at the location given
on the command line.)
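A rough local-filesystem analogy of that first-run behavior, assuming a crawldb path such as crawl/crawldb (an example name, not one the tutorial prescribes). The class CrawlDbLocationSketch and its ensureCrawlDb helper are hypothetical and use java.nio.file rather than Hadoop's FileSystem API, so they only illustrate "create the crawldb if it is missing, otherwise reuse it"; the real Injector writes crawldb data, not just an empty directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical, local-filesystem analogy of the "create the crawldb on the
 * first run" behavior. Not Nutch code: the real Injector works through Hadoop.
 */
public class CrawlDbLocationSketch {

    /**
     * Creates the crawldb directory if it is missing.
     * Returns true if it was created, false if it already existed.
     */
    static boolean ensureCrawlDb(Path crawlDb) {
        if (Files.isDirectory(crawlDb)) {
            return false; // existing crawldb: reused on later runs
        }
        try {
            Files.createDirectories(crawlDb);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // Unique example location under the system temp directory.
        Path crawlDb = Paths.get(System.getProperty("java.io.tmpdir"),
                "nutch-sketch-" + System.nanoTime(), "crawl", "crawldb");
        System.out.println(ensureCrawlDb(crawlDb)); // true: first run creates it
        System.out.println(ensureCrawlDb(crawlDb)); // false: second run reuses it
    }
}
```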

Thanks,
Tejas


(In reply to Bin Wang's message of Sun, Dec 22, 2013 at 8:05 PM.)
