Posted to user@nutch.apache.org by Brian Cuttler <br...@wadsworth.org> on 2005/03/10 17:54:15 UTC

New nutch user, setup problems

Hello,

I'm running on a Solaris 9 system with Apache/Tomcat (Sun-specific
release).

Previously we used the Lucene example programs to build an internal
site index. We are now in the process of replacing our extranet
webserver and were looking for something beyond the Lucene examples,
and received (from Chris Hostetter, hossman-lucene@focit.org) the
recommendation to install/use Nutch.

I've downloaded, unzipped and untar'd the kit but have hit a couple
of possibly related snags. The version of Nutch is the current release,
0.6 (though I was sure I saw 0.7 embedded somewhere in it).

The first was a problem with the LongLink file: it was converted
to a regular file, and I don't know what information was lost (I unpacked
using gunzip and the Solaris tar utility).

When I try to run nutch I see the following:

> bin/nutch crawl urls -dir /tmp/nutch -depth 3
bin/nutch: IFS: cannot unset

We are interested (at this point) in indexing and being able to search
our external site. The server is currently internal; we will replace the
existing server only when services on the new platform are complete. The
site is relatively small: 1.5 GB, only a few thousand documents.

Thanks in advance for your help,

						Brian

---
   Brian R Cuttler                 brian.cuttler@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773


Re: New nutch user, setup problems

Posted by sub paul <su...@gmail.com>.
Just an addition to this.

I could not successfully unzip this file on Windows. I generally use
WinRAR, and it takes care of untarring and unzipping without an issue.

However, tar -xzf under Linux works great for me.
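
For what it's worth, the "LongLink" entry is most likely GNU tar's
long-file-name extension, which a non-GNU tar tends to write out as an
ordinary file. The sketch below is only an illustration under stated
assumptions: it builds a throwaway archive containing a path longer than
100 characters and unpacks it again with whatever tar is named in TAR.
The directory names are made up, and on Solaris GNU tar may be installed
as gtar, if it is installed at all.

```shell
#!/bin/sh
# Use GNU tar where it is installed under another name, e.g. TAR=gtar.
TAR=${TAR:-tar}

# Scratch area and a made-up path that is longer than 100 characters.
work=/tmp/longname-demo.$$
long=pkg/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/cccc
mkdir -p "$work/src/$long" "$work/out"
echo hello > "$work/src/$long/file.txt"

# Pack and compress, then decompress and unpack through a pipe.
# The second line is the same pattern as: gzip -dc kit.tar.gz | tar xf -
(cd "$work/src" && "$TAR" cf - pkg) | gzip > "$work/demo.tar.gz"
gzip -dc "$work/demo.tar.gz" | (cd "$work/out" && "$TAR" xf -)

# If the long name survived the round trip, the file is readable here.
restored=`cat "$work/out/$long/file.txt"`
echo "restored: $restored"
rm -rf "$work"
```

If the same tar both writes and reads the archive the round trip should
succeed; the trouble starts when a GNU-format archive is read by a tar
that does not know the extension.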

Paul



On Thu, 10 Mar 2005 13:17:57 -0500, Brian Cuttler <br...@wadsworth.org> wrote:
> 
> Follow up to my own post.
> =========================
> 
> While not completely satisfied, I knew I'd eventually google my
> way to a workaround. I don't know whether it's a good idea, but
> it boils down to "comment out the IFS lines in the nutch script".
> 
> The author of the article was on Solaris 8 with Nutch 0.5 but the
> workaround seems valid. Workaround posted by Tito Sierra, North
> Carolina State University.
> 
> I haven't (yet) found anything about the tar -xf error concerning
> the "LongLink" file.
> 
>                                                 thank you,
> 
>                                                 Brian

Re: New nutch user, setup problems

Posted by Brian Cuttler <br...@wadsworth.org>.
Follow up to my own post.
=========================

While not completely satisfied, I knew I'd eventually google my
way to a workaround. I don't know whether it's a good idea, but
it boils down to "comment out the IFS lines in the nutch script".

The author of the article was on Solaris 8 with Nutch 0.5 but the
workaround seems valid. Workaround posted by Tito Sierra, North
Carolina State University.
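
For anyone hitting the same error: as far as I know, the Solaris /bin/sh
(the original Bourne shell) refuses to unset IFS, which would explain the
"IFS: cannot unset" message. An alternative to commenting the lines out
is to save IFS and restore it by assignment. This is a minimal sketch and
is not taken from the actual nutch script; the directory and jar names
are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical library directory whose name contains a space.
lib_dir="/tmp/ifs demo.$$"
mkdir "$lib_dir"
touch "$lib_dir/a.jar" "$lib_dir/b c.jar"

saved_ifs=$IFS     # remember the default separators (space, tab, newline)
IFS=               # empty IFS: unquoted expansions are not split on spaces

classpath=
count=0
# $lib_dir is deliberately unquoted; the empty IFS keeps the path in one piece
# while the *.jar pattern is still expanded.
for f in $lib_dir/*.jar; do
  classpath=${classpath}:$f
  count=`expr $count + 1`
done

IFS=$saved_ifs     # restore by assignment instead of "unset IFS"
rm -rf "$lib_dir"

echo "jars found: $count"
```

Restoring by assignment leaves the shell with its usual separators, so
the rest of the script behaves as it did before the loop.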

I haven't (yet) found anything about the tar -xf error concerning
the "LongLink" file.

						thank you,

						Brian
---
   Brian R Cuttler                 brian.cuttler@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773


