Posted to user@nutch.apache.org by Amit Sela <am...@infolinks.com> on 2013/02/18 14:07:22 UTC

Nutch stable version

Hi all,

I installed Nutch 2.1 with Gora and MySQL, and when I tried running the inject
job I got the following exception:

 org.apache.gora.util.GoraException: java.io.IOException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length
too big for column 'text' (max = 16383); use BLOB or TEXT instead

Then I found out it's a known bug:
NUTCH-970 <https://issues.apache.org/jira/browse/NUTCH-970>

So which version should I use for a stable crawler to parse about 12 million
URLs? I want to try it first on my laptop (with far fewer URLs to parse) and
then deploy on an existing Hadoop cluster.

Any suggestions?

Thanks,

Amit.

Re: Nutch stable version

Posted by Amit Sela <am...@infolinks.com>.
You are correct, my bad. I forgot to mention that my cluster runs
HBase 0.94.2, which makes it incompatible.
And there is the MySQL bug I mentioned.
So should I go for 1.6?

On Mon, Feb 18, 2013 at 3:31 PM, kiran chitturi
<ch...@gmail.com> wrote:

> Hi Amit,
>
> Nutch 2.1 with HBase is more stable than with MySQL as the backend. Please
> check the link here [0] on how to use HBase as the backend.
>
> [0] - http://wiki.apache.org/nutch/Nutch2Tutorial
>
> --
> Kiran Chitturi
>

Re: Nutch stable version

Posted by kiran chitturi <ch...@gmail.com>.
Hi Amit,

Nutch 2.1 with HBase is more stable than with MySQL as the backend. Please
check the link here [0] on how to use HBase as the backend.

[0] - http://wiki.apache.org/nutch/Nutch2Tutorial





-- 
Kiran Chitturi