Posted to user@nutch.apache.org by A Laxmi <a....@gmail.com> on 2014/02/07 23:18:00 UTC

Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Hi,

I am having issues building Nutch 2.2.1 behind my company firewall. My
build gets stuck here:

[ivy:resolve] :: loading settings :: file = ~/nutchtest/nutch/ivy/ivysettings.xml

When I contacted the hosting admin, they said - "Ant is trying to download
files from internet and it will have problems with our firewalls. You will
either have to download the files yourself and then scp/sftp them to the
machine. Unfortunately we don't have an http proxy."


From further digging, I could see that Ant is trying to access
http://ant.apache.org/ivy/. Could anyone please advise what I should do to
make Ant compile Nutch without accessing the internet? I can download the
required files from http://ant.apache.org/ivy/ and scp/sftp them to the
server, but I am not sure which files to download or where to put them.

Thanks for your help!!

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

Thanks Tejas and dk! I will give it a try and will let you guys know.


On Saturday, February 8, 2014, d_k <ma...@gmail.com> wrote:

> Tejas Patil is right, you should copy over the .ivy2 folder and it will
> work.
>
> You can extract it to some other location and run ant with the parameter
> "-Divy.cache.dir=/path/to/extraced/cache".
>
> In order to use the eclipse project behind a firewall you can either run
> 'ant eclipse' and copy over the .project and .classpath files, or download
> the ant-eclipse-1.0.bin.tar.bz2 file (the default URL is [0]) and then
> either point the ant-eclipse-download target in build.xml at a web server
> serving the copied tar over http, or change the ant-eclipse-download
> target in build.xml from a get task to something along the lines of:
>
> <copy file="/path/to/local/ant-eclipse-1.0.bin.tar.bz2"
> todir="${build.dir}" />
>
> [0]
> http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2
>
>
> On Sat, Feb 8, 2014 at 11:28 AM, Tejas Patil <tejas.patil.cs@gmail.com> wrote:
>
> > This has more to do with ant than with nutch. Here is a wild idea:
> >
> > Grab a linux box without any internet restrictions, download nutch onto
> > it and build it. In the user's home directory there will be a hidden
> > directory, ".ivy2", which is the local ivy cache. Create a tarball of it,
> > scp it over to your work machine, extract it in the home directory and
> > then run the nutch build.
> >
> > PS: I have never done this for ivy, only for maven, and it worked there.
> >
> > ~tejas

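The cache-transfer steps suggested in the replies above can be sketched as a
small shell script. This is only an illustration, not something posted in the
thread: the function names, the archive name ivy2-cache.tar.gz and the
user@build-host address are placeholders, and it assumes the Ivy cache sits in
the default .ivy2 directory under the home directory, as Tejas describes.

```shell
#!/bin/sh
# Sketch only: function names, archive name and host are placeholders.

# pack_ivy_cache SRC_HOME OUT_TARBALL
# Archives SRC_HOME/.ivy2 so it can be copied to the firewalled machine.
pack_ivy_cache() {
    tar -czf "$2" -C "$1" .ivy2
}

# unpack_ivy_cache TARBALL DEST_HOME
# Extracts the archive so the cache ends up at DEST_HOME/.ivy2.
unpack_ivy_cache() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Typical use (left as comments because it needs two machines):
#   pack_ivy_cache "$HOME" /tmp/ivy2-cache.tar.gz      # on the connected box, after a build
#   scp /tmp/ivy2-cache.tar.gz user@build-host:/tmp/   # copy it across
#   unpack_ivy_cache /tmp/ivy2-cache.tar.gz "$HOME"    # on the firewalled build host
#   ant runtime                                        # from the Nutch source directory
```

Extracting into the home directory of the user who runs ant keeps the cache in
the place Ivy looks by default, so no extra ant parameter should be needed.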
Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

Sure, d_k. I will post it as a new question.




Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by d_k <ma...@gmail.com>.

I think it would be better if you resent the last question to the list
under a new subject so that each thread revolves around a single topic. :-)
When threads span more than one topic, it's harder for someone searching
in the future to find answers.



Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

d_k:

I think I had a space when I tried it, and that could be the reason why it
didn't work! Nutch 2.2.1 has built successfully now. Thank you!!

Now I have run into a new issue: I tried to run my first crawl on the target
server, where the firewall prevents access to the internet, and I started
getting timeout errors. It looks like the firewall is blocking my Nutch
crawler from crawling any site. Please suggest what can be done. For the Ant
build, I could scp the .ivy2 folder to the target server and compile. I am
not at all sure how to get Nutch to crawl a website in such a
firewall-restricted environment.

Thanks!



Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by d_k <ma...@gmail.com>.

There shouldn't be a space between -D and ivy.cache.dir: it should be
"-Divy.cache.dir=..." and not "-D ivy.cache.dir=...". Also, I trust you
changed "/path/to/extraced/cache" to the correct path?



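The difference d_k points out comes down to how the shell splits the command
line before Ant ever sees it. A minimal sketch, where show_args is a made-up
helper and /opt/ivy-cache is a placeholder path rather than anything from the
thread:

```shell
#!/bin/sh
# show_args prints each argument it receives on its own line, in brackets.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# No space: the property name and value travel together as one argument.
show_args -Divy.cache.dir=/opt/ivy-cache

# With a space: "-D" arrives on its own, followed by a second argument.
show_args -D ivy.cache.dir=/opt/ivy-cache

# So the invocation would look something like this (not run here):
#   ant -Divy.cache.dir=/opt/ivy-cache runtime
```

With the space, "-D" reaches Ant as a separate argument, so ivy.cache.dir is
presumably not defined the way the writer intended, which would match the
behaviour reported in the thread.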
Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

d_k & Tejas:

Yay!! It worked!! I had to extract the uploaded .ivy2 folder to /root/.ivy2
and had to use "ant runtime". It ran very well and the BUILD was successful!

On a side note, I initially tried to put the .ivy2 folder in a different path
and used the parameter "-D ivy.cache.dir=/path/to/extraced/cache", which
didn't work; I am not sure why.

Thanks so much!!!




On Tue, Feb 11, 2014 at 11:52 AM, d_k <ma...@gmail.com> wrote:

> This should work as is. Copy them to the target server and try to compile.
>
>
> On Tue, Feb 11, 2014 at 6:18 PM, A Laxmi <a....@gmail.com> wrote:
>
> > Hi,
> >
> > I compiled nutch on a linux server connected to the internet, as Tejas
> > suggested, and found the .ivy2 folder. However, some files in that folder
> > have the server's own hostname as part of the filename. I am wondering how
> > I can scp this .ivy2 folder to another server that has a different
> > hostname. Can I just manually edit those filenames to match the other
> > server's hostname? Please advise.

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

Oh okay. Let me try that! Thanks d_k! I will get back to you once I try that.


On Tue, Feb 11, 2014 at 11:52 AM, d_k <ma...@gmail.com> wrote:

> This should work as is. Copy them to the target server and try to compile.

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by d_k <ma...@gmail.com>.

This should work as is. Copy them to the target server and try to compile.


On Tue, Feb 11, 2014 at 6:18 PM, A Laxmi <a....@gmail.com> wrote:

> Hi,
>
> I compiled Nutch on a Linux server connected to the internet, as Tejas
> suggested, and found the .ivy2 folder. However, some files in that
> folder have the server's own hostname as part of the filename. How can I
> scp this .ivy2 folder to the other server, which has a different
> hostname? Can I just manually edit those filenames to match the other
> server's hostname? Please advise.

Re: Nutch 2.2.1 can not index to solr

Posted by Gavin <27...@qq.com>.

My Solr is 4.6.1.
I followed the steps in https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup and https://wiki.apache.org/nutch/RunNutchInEclipse.
Maybe there is something wrong in Solr's schema.xml.
Please give me a working schema.xml! Thanks!
And where is the log file for Solr? I can't find any exception in the Solr console.

thanks a lot!



------------------ Original ------------------
From:  "d_k";<ma...@gmail.com>;
Date:  Wed, Feb 12, 2014 07:15 PM
To:  "user"<us...@nutch.apache.org>; 

Subject:  Re: Nutch 2.2.1 can not index to solr



Are you sure solr is not throwing any errors?
Did you make any changes to the schema? What schema does Solr use? What
version of Solr are you using?
You can turn on the debug logs by changing the logging level to DEBUG in
the log4j.properties file inside the conf dir in the
runtime/local dir. (I assume this is your setup; let me know if it's not.)
You can also try to debug nutch in eclipse as described here:
https://wiki.apache.org/nutch/RunNutchInEclipse



Re: Nutch 2.2.1 can not index to solr

Posted by Gavin <27...@qq.com>.

I turned on the debug logs and the output is:

[root@Gavin local]#  bin/nutch solrindex http://127.0.0.1:8983/solr -all
SolrIndexerJob: starting
Skipping http://www.tianya.cn/; different batch id (null)
Skipping http://www.163.com/; different batch id (null)
Skipping http://www.taobao.com/; different batch id (null)
Skipping http://nutch.apache.org/; different batch id (null)
SolrIndexerJob: done.

The index job was skipped!

What should I do to make it work?

Thank you!




------------------ Original ------------------
From:  "d_k";<ma...@gmail.com>;
Date:  Wed, Feb 12, 2014 07:15 PM
To:  "user"<us...@nutch.apache.org>; 

Subject:  Re: Nutch 2.2.1 can not index to solr



Are you sure solr is not throwing any errors?
Did you make any changes to the schema? What schema does Solr use? What
version of Solr are you using?
You can turn on the debug logs by changing the logging level to DEBUG in
the log4j.properties file inside the conf dir in the
runtime/local dir. (I assume this is your setup; let me know if it's not.)
You can also try to debug nutch in eclipse as described here:
https://wiki.apache.org/nutch/RunNutchInEclipse



Re: Nutch 2.2.1 can not index to solr

Posted by d_k <ma...@gmail.com>.

Are you sure solr is not throwing any errors?
Did you make any changes to the schema? What schema does Solr use? What
version of Solr are you using?
You can turn on the debug logs by changing the logging level to DEBUG in
the log4j.properties file inside the conf dir in the
runtime/local dir. (I assume this is your setup; let me know if it's not.)
You can also try to debug nutch in eclipse as described here:
https://wiki.apache.org/nutch/RunNutchInEclipse
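
As a rough sketch of the log-level change described above: the snippet below flips INFO to DEBUG on logger lines in a properties file. It works on a temporary stand-in file rather than the real runtime/local/conf/log4j.properties, and the two logger lines in it are assumed examples, not copied from an actual Nutch install, so check what your own file contains before applying anything similar.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for runtime/local/conf/log4j.properties; the two
# logger lines are assumed examples, not taken from a real Nutch install.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
log4j.rootLogger=INFO,DRFA
log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=INFO,cmdstdout
EOF

# Rewrite the level on every logger line from INFO to DEBUG, leaving the
# appender names after the comma untouched.
sed 's/^\(log4j\.[A-Za-z0-9_.]*=\)INFO/\1DEBUG/' "$CONF" > "$CONF.new"
mv "$CONF.new" "$CONF"

# Show the result so the change can be checked by eye.
cat "$CONF"
```

Writing to a second file and moving it into place avoids relying on in-place sed options, which differ between platforms.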


On Wed, Feb 12, 2014 at 11:31 AM, Gavin <27...@qq.com> wrote:

> And my Solr:
>
> Statistics
>
> Last Modified:
> Num Docs: 0
> Max Doc: 0
> Heap Memory Usage: 0
> Deleted Docs: 0
> Version: 1
> Segment Count: 0
> Optimized:
> Current:
>
>
>
> what is wrong?
>
> Thanks for your help!!!

Re: Nutch 2.2.1 can not index to solr

Posted by Gavin <27...@qq.com>.

And my Solr:

Statistics

Last Modified:
Num Docs: 0
Max Doc: 0
Heap Memory Usage: 0
Deleted Docs: 0
Version: 1
Segment Count: 0
Optimized:
Current:



what is wrong?

Thanks for your help!!!





------------------ Original ------------------
From:  "274614348";<27...@qq.com>;
Date:  Wed, Feb 12, 2014 05:24 PM
To:  "user"<us...@nutch.apache.org>; 

Subject:  Re: Nutch 2.2.1 can not index to solr



Here is my output:


[Gavin@Gavin local]$ bin/nutch  inject urls
InjectorJob: starting at 2014-02-12 17:16:20
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-02-12 17:16:25, elapsed: 00:00:04
[Gavin@Gavin local]$ bin/nutch generate -topN 5
GeneratorJob: starting at 2014-02-12 17:16:46
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: finished at 2014-02-12 17:16:51, time elapsed: 00:00:05
GeneratorJob: generated batch id: 1392196606-229189632
[Gavin@Gavin local]$ bin/nutch fetch -all
FetcherJob: starting
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 5 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://www.163.com/ (queue crawl delay=5000ms)
fetching http://nutch.apache.org/ (queue crawl delay=5000ms)
fetching http://www.tianya.cn/ (queue crawl delay=5000ms)
fetching http://www.taobao.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread5, activeThreads=8
-finishing thread FetcherThread6, activeThreads=8
-finishing thread FetcherThread4, activeThreads=7
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread2, activeThreads=5
fetching http://www.hao123.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread0, activeThreads=4
-finishing thread FetcherThread7, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 4 pages, 0 errors, 0.8 1 pages/s, 242 242 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done
[Gavin@Gavin local]$ bin/nutch parse -all
ParserJob: starting
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Parsing http://www.tianya.cn/
Parsing http://www.163.com/
Parsing http://www.hao123.com/
Parsing http://www.taobao.com/
Parsing http://nutch.apache.org/
ParserJob: success
[Gavin@Gavin local]$ bin/nutch solrindex http://127.0.0.1:8983/solr -all
SolrIndexerJob: starting
SolrIndexerJob: done.


Thank you!



Re: Nutch 2.2.1 can not index to solr

Posted by Gavin <27...@qq.com>.

Here is my output:


[Gavin@Gavin local]$ bin/nutch  inject urls
InjectorJob: starting at 2014-02-12 17:16:20
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-02-12 17:16:25, elapsed: 00:00:04
[Gavin@Gavin local]$ bin/nutch generate -topN 5
GeneratorJob: starting at 2014-02-12 17:16:46
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: finished at 2014-02-12 17:16:51, time elapsed: 00:00:05
GeneratorJob: generated batch id: 1392196606-229189632
[Gavin@Gavin local]$ bin/nutch fetch -all
FetcherJob: starting
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 5 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://www.163.com/ (queue crawl delay=5000ms)
fetching http://nutch.apache.org/ (queue crawl delay=5000ms)
fetching http://www.tianya.cn/ (queue crawl delay=5000ms)
fetching http://www.taobao.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread5, activeThreads=8
-finishing thread FetcherThread6, activeThreads=8
-finishing thread FetcherThread4, activeThreads=7
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread2, activeThreads=5
fetching http://www.hao123.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread0, activeThreads=4
-finishing thread FetcherThread7, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 4 pages, 0 errors, 0.8 1 pages/s, 242 242 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done
[Gavin@Gavin local]$ bin/nutch parse -all
ParserJob: starting
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Parsing http://www.tianya.cn/
Parsing http://www.163.com/
Parsing http://www.hao123.com/
Parsing http://www.taobao.com/
Parsing http://nutch.apache.org/
ParserJob: success
[Gavin@Gavin local]$ bin/nutch solrindex http://127.0.0.1:8983/solr -all
SolrIndexerJob: starting
SolrIndexerJob: done.


Thank you!


------------------ Original ------------------
From:  "d_k";<ma...@gmail.com>;
Date:  Wed, Feb 12, 2014 04:58 PM
To:  "user"<us...@nutch.apache.org>; 

Subject:  Re: Nutch 2.2.1 can not index to solr



What is the output of each of the steps when you execute them separately?
Did you edit regex-urlfilter.txt accordingly?

$ bin/nutch inject urls
$ bin/nutch generate -topN 5
$ bin/nutch fetch -all
$ bin/nutch parse -all

Taken from here:
https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup





Re: Nutch 2.2.1 can not index to solr

Posted by d_k <ma...@gmail.com>.

What is the output of each of the steps when you execute them separately?
Did you edit regex-urlfilter.txt accordingly?

$ bin/nutch inject urls
$ bin/nutch generate -topN 5
$ bin/nutch fetch -all
$ bin/nutch parse -all

Taken from here:
https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup
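
One way to capture the output of each step separately, sketched under some assumptions: the run_step helper, the LOG_DIR location, and the NUTCH variable are all made up for this illustration and are not part of Nutch. NUTCH defaults to echo so the sketch runs anywhere; inside runtime/local you would set NUTCH=bin/nutch to run the real commands listed above.

```shell
#!/bin/sh
set -eu

LOG_DIR=$(mktemp -d)   # hypothetical location for the per-step logs

# Run one step, save its stdout and stderr to its own log file, and stop
# the sequence at the first step that fails.
run_step() {
    name=$1
    shift
    if "$@" > "$LOG_DIR/$name.log" 2>&1; then
        echo "$name: ok"
    else
        echo "$name: FAILED, see $LOG_DIR/$name.log"
        return 1
    fi
}

# NUTCH defaults to 'echo' so this sketch is runnable without Nutch;
# set NUTCH=bin/nutch inside runtime/local to run the real commands.
NUTCH=${NUTCH:-echo}

run_step inject   $NUTCH inject urls
run_step generate $NUTCH generate -topN 5
run_step fetch    $NUTCH fetch -all
run_step parse    $NUTCH parse -all
```

Keeping one log file per step makes it easier to paste only the relevant output into a reply.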




On Wed, Feb 12, 2014 at 10:33 AM, Gavin <27...@qq.com> wrote:

> I compiled Nutch in Eclipse. My storage is HBase.
> After I run bin/crawl, there are two tables in HBase: "webpage" and
> "%crawl_ID%webpage",
> but there is no data in Solr and no exception.
> Why?
>
> (I can crawl and index to the Solr server using nutch1.7.bin, so I think my
> Solr server is OK.)

Nutch 2.2.1 can not index to solr

Posted by Gavin <27...@qq.com>.

I compiled Nutch in Eclipse. My storage is HBase.
After I run bin/crawl, there are two tables in HBase: "webpage" and "%crawl_ID%webpage",
but there is no data in Solr and no exception.
Why?

(I can crawl and index to the Solr server using nutch1.7.bin, so I think my Solr server is OK.)

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by A Laxmi <a....@gmail.com>.

Hi,

I compiled Nutch on a Linux server connected to the internet, as Tejas
suggested, and found the .ivy2 folder. However, some files in that
folder have the server's own hostname as part of the filename. How can I
scp this .ivy2 folder to the other server, which has a different
hostname? Can I just manually edit those filenames to match the other
server's hostname? Please advise.


On Sat, Feb 8, 2014 at 9:32 AM, d_k <ma...@gmail.com> wrote:

> Tejas Patil is right, you should copy over the .ivy2 folder and it will
> work.
>
> You can extract it to some other location and run ant with the parameter
> "-Divy.cache.dir=/path/to/extracted/cache".
>
> In order to use the eclipse project behind a firewall you can either run
> 'ant eclipse' and copy over the .project and .classpath files or download
> the ant-eclipse-1.0.bin.tar.bz2 file, the default url is [0] and then
> either edit the ant-eclipse-download target in build.xml to a web server
> serving the copied tar over http or change the build.xml
> ant-eclipse-download target from a get task to something along the lines
> of:
>
> <copy file="/path/to/local/ant-eclipse-1.0.bin.tar.bz2"
> todir="${build.dir}" />
>
> [0]
>
> http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2
>
>
> On Sat, Feb 8, 2014 at 11:28 AM, Tejas Patil <tejas.patil.cs@gmail.com> wrote:
>
> > This has more to do with Ant than with Nutch. Here is a wild idea:
> >
> > Grab a Linux box without any internet restrictions, download Nutch onto it,
> > and build it. In the user's home directory there will be a hidden directory,
> > ".ivy2", which is the local Ivy cache. Create a tarball of it, scp it over to
> > your work machine, extract it in the home directory, and then run the Nutch
> > build.
> >
> > PS: I have never done this for ivy but for maven and it had worked.
> >
> > ~tejas
> >
> >
> > On Fri, Feb 7, 2014 at 2:18 PM, A Laxmi <a....@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am having issues building Nutch 2.2.1 behind my company firewall. My
> > > build gets stuck here:
> > >
> > > [ivy:resolve] :: loading settings :: file =
> > > ~/nutchtest/nutch/ivy/ivysettings.xml
> > >
> > > When I contacted the hosting admin, they said - "Ant is trying to
> > download
> > > files from internet and it will have problems with our firewalls. You
> > will
> > > either have to download the files yourself and then scp/sftp them to
> the
> > > machine. Unfortunately we don't have an http proxy."
> > >
> > >
> > > From further digging, I could see Ant is trying to access this link
> > > http://ant.apache.org/ivy/. Could anyone please advise what I should
> do
> > to
> > > make Ant compile Nutch without accessing the internet? I can download
> > > required files from http://ant.apache.org/ivy/ and scp/sftp to the
> > server
> > > but I am not sure what files to download and where to put them?
> > >
> > > Thanks for your help!!
> > >
> >
>

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by d_k <ma...@gmail.com>.

Tejas Patil is right, you should copy over the .ivy2 folder and it will
work.

You can extract it to some other location and run ant with the parameter
"-Divy.cache.dir=/path/to/extracted/cache".
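
As a rough sketch of the invocation described above: the helper below only checks that the extracted cache directory exists and then passes ivy.cache.dir to ant. The function name and the ANT override variable are made up for illustration, and whether the property is honored depends on the project's ivysettings.xml, so treat this as a starting point rather than a guaranteed fix.

```shell
#!/bin/sh
# Hypothetical helper: run ant with Ivy pointed at an extracted cache directory.
# $1: path to the extracted cache; any remaining arguments are passed to ant.
# ANT may be overridden (e.g. ANT=echo for a dry run); it defaults to "ant".
build_with_ivy_cache() {
    cache_dir=$1
    shift
    if [ ! -d "$cache_dir" ]; then
        echo "cache directory not found: $cache_dir" >&2
        return 1
    fi
    # Note: no space between -D and the property name.
    "${ANT:-ant}" "-Divy.cache.dir=$cache_dir" "$@"
}

# Example (path is a placeholder):
#   build_with_ivy_cache /path/to/extracted/cache
```

Setting ANT=echo prints the arguments instead of running the build, which is a cheap way to check the command line before starting a long build.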

To use the Eclipse project behind a firewall, you can either run
'ant eclipse' on a connected machine and copy over the .project and
.classpath files, or download the ant-eclipse-1.0.bin.tar.bz2 file (the
default URL is [0]). In the latter case, either edit the
ant-eclipse-download target in build.xml to point to a web server serving
the copied tar over HTTP, or change the ant-eclipse-download target from a
get task to something along the lines of:

<copy file="/path/to/local/ant-eclipse-1.0.bin.tar.bz2"
todir="${build.dir}" />

[0]
http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2


On Sat, Feb 8, 2014 at 11:28 AM, Tejas Patil <te...@gmail.com> wrote:

> This has more to do with Ant than with Nutch. Here is a wild idea:
>
> Grab a Linux box without any internet restrictions, download Nutch onto it,
> and build it. In the user's home directory there will be a hidden directory,
> ".ivy2", which is the local Ivy cache. Create a tarball of it, scp it over to
> your work machine, extract it in the home directory, and then run the Nutch
> build.
>
> PS: I have never done this for Ivy, but I have for Maven, and it worked.
>
> ~tejas

Re: Nutch 2.2.1 Build stuck while trying to access http://ant.apache.org/ivy/

Posted by Tejas Patil <te...@gmail.com>.

This has more to do with Ant than with Nutch. Here is a wild idea:

Grab a Linux box without any internet restrictions, download Nutch onto it,
and build it. In the user's home directory there will be a hidden directory,
".ivy2", which is the local Ivy cache. Create a tarball of it, scp it over to
your work machine, extract it in the home directory, and then run the Nutch
build.

PS: I have never done this for Ivy, but I have for Maven, and it worked.

~tejas
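
For reference, the cache-copy idea above might be sketched as follows. The function names, the tarball path, and the "workserver" hostname are placeholders invented for this example, and it has the same caveat as the suggestion itself: it is untested against a real Nutch build.

```shell
#!/bin/sh
# Pack the Ivy cache (run on the machine that has internet access).
# $1: home directory that contains .ivy2; $2: tarball to create.
pack_ivy_cache() {
    tar -czf "$2" -C "$1" .ivy2
}

# Unpack the cache (run on the firewalled machine).
# $1: tarball; $2: directory to extract into, normally "$HOME".
unpack_ivy_cache() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Typical sequence (hostname and paths are placeholders):
#   pack_ivy_cache "$HOME" /tmp/ivy2-cache.tar.gz
#   scp /tmp/ivy2-cache.tar.gz user@workserver:/tmp/
#   # then, on workserver:
#   unpack_ivy_cache /tmp/ivy2-cache.tar.gz "$HOME"
#   cd ~/nutchtest/nutch && ant
```

Using tar's -C option keeps the archive paths relative to the home directory, so the cache lands at ~/.ivy2 on the target machine regardless of where the tarball is stored.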

On Fri, Feb 7, 2014 at 2:18 PM, A Laxmi <a....@gmail.com> wrote:

> Hi,
>
> I am having issues building Nutch 2.2.1 behind my company firewall. My
> build gets stuck here:
>
> [ivy:resolve] :: loading settings :: file =
> ~/nutchtest/nutch/ivy/ivysettings.xml
>
> When I contacted the hosting admin, they said - "Ant is trying to download
> files from internet and it will have problems with our firewalls. You will
> either have to download the files yourself and then scp/sftp them to the
> machine. Unfortunately we don't have an http proxy."
>
>
> From further digging, I could see Ant is trying to access this link
> http://ant.apache.org/ivy/. Could anyone please advise what I should do to
> make Ant compile Nutch without accessing the internet? I can download
> required files from http://ant.apache.org/ivy/ and scp/sftp to the server
> but I am not sure what files to download and where to put them?
>
> Thanks for your help!!
>