Posted to user@nutch.apache.org by AC Nutch <ac...@gmail.com> on 2013/05/01 20:41:39 UTC

HBase 0.94.6 and Nutch 2.1

Hello All,

Has anyone gotten the latest version of HBase (0.94.6) to work with Nutch 2.1
on Ubuntu with Hadoop >= 1.0.X? I keep getting the error:

Exception in thread "main" org.apache.gora.util.GoraException:
java.lang.IllegalArgumentException: Not a host:port pair:

Googling around I saw the suggestion to replace the hbase-0.90.4 jar with
the hbase-0.94.6.jar from my hbase distro (btw I understand I'm trying to
do something that is unsupported by using the latest hbase version). The
suggestion didn't appear to work - I get the same error. Has anyone gotten
the latest HBase to work with Nutch 2.1 and if so, how did you get around
this error?
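
When a jar swap appears to change nothing, one thing that may be worth
ruling out is that the old jar is still the one being loaded. A minimal
sketch, assuming a hypothetical helper class JarLocator (not part of Nutch,
Gora, or HBase), that reports where a class was actually loaded from:

```java
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical diagnostic helper; not part of Nutch, Gora, or HBase. */
final class JarLocator {

    /** Returns the jar or directory a class was loaded from, if it can be determined. */
    static Optional<String> locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return Optional.empty(); // e.g. JDK bootstrap classes have no code source
            }
            return Optional.of(src.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty(); // class is not on the classpath or cannot be linked
        }
    }

    public static void main(String[] args) {
        // Default class name is an assumption: any class shipped in the hbase jar would do.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.hbase.HConstants";
        System.out.println(name + " -> " + locate(name).orElse("(not found on classpath)"));
    }
}
```

Run with the same classpath as the failing job; if the printed location
still names the hbase-0.90.4 jar, the replacement has not taken effect.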

As a little bit of background, the overall problem I'm trying to solve is
that I really want to use Nutch 2.1 as opposed to the 1.6 branch for what
will become a production application. However, I have the requirement of
using at least Hadoop 1.0.X which, as I understand it, is not supported by
HBase 0.90.x. On the other hand, Nutch 2.1 (or rather GORA) doesn't support
later HBase versions, which leaves me in quite the pickle - it seems that
either I use an older Hadoop (which I can't do) or I use Nutch 1.6 (which I
don't want to do). Any suggestions?
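
For readers wondering where a message like "Not a host:port pair" can come
from: one commonly reported cause (an assumption here, not confirmed in this
thread) is an older client reading a server address that a newer server
wrote in a different format. The sketch below is purely illustrative; the
class HostPort and the sample address values are hypothetical and this is
not HBase's actual parsing code.

```java
/** Hypothetical illustration of strict host:port parsing; NOT HBase's code. */
final class HostPort {
    final String host;
    final int port;

    private HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Parses "host:port" and rejects anything in another shape. */
    static HostPort parse(String s) {
        int colon = (s == null) ? -1 : s.lastIndexOf(':');
        // Reject: no colon, empty host, or nothing after the colon
        if (colon <= 0 || colon == s.length() - 1) {
            throw new IllegalArgumentException("Not a host:port pair: " + s);
        }
        try {
            int port = Integer.parseInt(s.substring(colon + 1));
            return new HostPort(s.substring(0, colon), port);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a host:port pair: " + s, e);
        }
    }

    public static void main(String[] args) {
        // The format this parser expects
        HostPort ok = parse("master.example.com:60000");
        System.out.println(ok.host + " / " + ok.port);

        // A value in some other format (made up for illustration) is rejected
        try {
            parse("master.example.com,60000,1367433699000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that is what is happening, swapping a single jar would not be expected
to help unless every client-side class that reads the address is also the
newer version.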

Re: HBase 0.94.6 and Nutch 2.1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
In short, Gora needs to upgrade its use of the HBase API to a more recent version.
If you are able and willing to do so, we would be very very happy to have
you contribute to Gora.
https://issues.apache.org/jira/browse/GORA-201


On Wed, May 1, 2013 at 11:41 AM, AC Nutch <ac...@gmail.com> wrote:

> Hello All,
>
> Has anyone gotten the latest version of HBase 0.94.6 to work with Nutch 2.1
> on Ubuntu with Hadoop >= 1.0.X. I keep getting the error:
>
> Exception in thread "main" org.apache.gora.util.GoraException:
> java.lang.IllegalArgumentException: Not a host:port pair:
>
> Googling around I saw the suggestion to replace the hbase-0.90.4 jar with
> the hbase-0.94.6.jar from my hbase distro (btw I understand I'm trying to
> do something that is unsupported by using the latest hbase version). The
> suggestion didn't appear to work - I get the same error. Has anyone gotten
> the latest HBase to work with Nutch 2.1 and if so, how did you get around
> this error?
>
> As a little bit of background, the overall problem I'm trying to solve is
> that I really want to use Nutch 2.1 as opposed to the 1.6 branch for what
> will become a production application. However, I have the requirement of
> using at least Hadoop 1.0.X which, as I understand it, is not supported by
> HBase 0.90.x. On the other hand, Nutch 2.1 (or rather GORA) doesn't support
> later HBase versions, which leaves me in quite the pickle - it seems that
> either I use an older Hadoop (which I can't do) or I use Nutch 1.6 (which I
> don't want to do). Any suggestions?
>



-- 
*Lewis*

Re: HBase 0.94.6 and Nutch 2.1

Posted by AC Nutch <ac...@gmail.com>.
Excellent, I'll take a look and see what we can do. Thanks!

Alex


On Thu, May 2, 2013 at 3:57 AM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

> See https://issues.apache.org/jira/browse/NUTCH-1047 which is in trunk for
> writing indexing plugins. You will have the same issues with the versions
> of HBase if you use GORA within your plugin so in your case a  more direct
> approach might be more appropriate. It would be good to help the GORA
> people with upgrading their version of HBase then using GORA within your
> custom indexing plugin, which would make it more generic and would indeed
> be a nice contribution.
>
> J.

Re: HBase 0.94.6 and Nutch 2.1

Posted by Julien Nioche <li...@gmail.com>.
See https://issues.apache.org/jira/browse/NUTCH-1047, which is in trunk, for
writing indexing plugins. You will have the same issues with the versions
of HBase if you use GORA within your plugin, so in your case a more direct
approach might be more appropriate. It would be good to help the GORA
people upgrade their version of HBase and then use GORA within your
custom indexing plugin, which would make it more generic and would indeed
be a nice contribution.
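
One way to keep both options open (direct HBase writes now, GORA later) is
to hide the storage backend behind a small interface inside the plugin. The
following is only a sketch under stated assumptions: DocumentSink,
InMemorySink, and IndexerSketch are hypothetical names, not the Nutch
indexing plugin API from NUTCH-1047, and the in-memory sink stands in for a
real HBase or GORA implementation so the example runs without either on the
classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical storage abstraction; not the Nutch indexing plugin API. */
interface DocumentSink {
    void write(String url, Map<String, String> fields);
    void close();
}

/** In-memory stand-in for a direct-HBase or GORA-backed implementation. */
final class InMemorySink implements DocumentSink {
    final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    boolean closed;

    @Override
    public void write(String url, Map<String, String> fields) {
        if (closed) {
            throw new IllegalStateException("sink is closed");
        }
        rows.put(url, new LinkedHashMap<>(fields)); // copy so callers cannot mutate stored rows
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class IndexerSketch {
    /** Sends every document to the sink and always closes it; returns the count written. */
    static int indexAll(Map<String, Map<String, String>> docs, DocumentSink sink) {
        int written = 0;
        try {
            for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
                sink.write(e.getKey(), e.getValue());
                written++;
            }
        } finally {
            sink.close();
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Example page");
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        docs.put("http://example.com/", fields);

        InMemorySink sink = new InMemorySink();
        System.out.println("documents written: " + indexAll(docs, sink));
    }
}
```

With this shape, moving from a direct HBase client to GORA would mean
adding another DocumentSink implementation rather than changing the
indexing logic.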

J.


On 2 May 2013 04:55, AC Nutch <ac...@gmail.com> wrote:

> Thanks a lot for the suggestion Julien, I suspected that might be the case
> and I really appreciate the recommendation of using 1.x for robustness.
>
> Also that sounds like a wonderful idea regarding extending the indexer. I
> think that's exactly what we'll do! Is this something you all would be
> interested in having as part of the 1.x code base? We would be glad to
> contribute it back to you all once we have done this.
>
> Alex



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: HBase 0.94.6 and Nutch 2.1

Posted by AC Nutch <ac...@gmail.com>.
Thanks a lot for the suggestion Julien, I suspected that might be the case
and I really appreciate the recommendation of using 1.x for robustness.

Also that sounds like a wonderful idea regarding extending the indexer. I
think that's exactly what we'll do! Is this something you all would be
interested in having as part of the 1.x code base? We would be glad to
contribute it back to you all once we have done this.

Alex


On Wed, May 1, 2013 at 4:25 PM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

> Nutch 1.x is definitely more tested and robust than 2.x. Loads of work is
> done for the latter but the former is probably a safer option in
> production. You could use the pluggable indexer and send the documents to
> HBase (ideally via GORA)? This would be an elegant way of migrating from
> 1.x to 2.x BTW.

Re: HBase 0.94.6 and Nutch 2.1

Posted by Julien Nioche <li...@gmail.com>.
Nutch 1.x is definitely more tested and robust than 2.x. Loads of work is
being done on the latter, but the former is probably a safer option in
production. You could use the pluggable indexer and send the documents to
HBase (ideally via GORA). This would be an elegant way of migrating from
1.x to 2.x, BTW.


On 1 May 2013 19:41, AC Nutch <ac...@gmail.com> wrote:

> Hello All,
>
> Has anyone gotten the latest version of HBase 0.94.6 to work with Nutch 2.1
> on Ubuntu with Hadoop >= 1.0.X. I keep getting the error:
>
> Exception in thread "main" org.apache.gora.util.GoraException:
> java.lang.IllegalArgumentException: Not a host:port pair:
>
> Googling around I saw the suggestion to replace the hbase-0.90.4 jar with
> the hbase-0.94.6.jar from my hbase distro (btw I understand I'm trying to
> do something that is unsupported by using the latest hbase version). The
> suggestion didn't appear to work - I get the same error. Has anyone gotten
> the latest HBase to work with Nutch 2.1 and if so, how did you get around
> this error?
>
> As a little bit of background, the overall problem I'm trying to solve is
> that I really want to use Nutch 2.1 as opposed to the 1.6 branch for what
> will become a production application. However, I have the requirement of
> using at least Hadoop 1.0.X which, as I understand it, is not supported by
> HBase 0.90.x. On the other hand, Nutch 2.1 (or rather GORA) doesn't support
> later HBase versions, which leaves me in quite the pickle - it seems that
> either I use an older Hadoop (which I can't do) or I use Nutch 1.6 (which I
> don't want to do). Any suggestions?
>