
Posted to dev@gora.apache.org by Julien Nioche <li...@gmail.com> on 2012/10/18 11:57:32 UTC

compilation error, links in release page and question about indices

Hi guys,

I've just pulled the latest version of GORA and it fails on gora-cassandra.
I see that the hector dependency is commented out in ivy.xml - is that a
known issue and, if so, is there a workaround?

The links in http://gora.apache.org/releases.html do not have the right
version numbers i.e.
*http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2-src.zip*

and should be

*http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.zip*

Finally a very simple question : some backends have indices (SQL of course
but Cassandra and HBase?) that could be used when querying.  Typically in
Nutch we'd want to retrieve all the docs having a specific flag like
fetched in order to parse them. Is this implemented? Am sure the answer is
in the code somewhere but it is good to have a trace on the mailing list
for future reference.

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: compilation error, links in release page and question about indices

Posted by Lewis John Mcgibbney <le...@gmail.com>.

Hi Julien,

On Fri, Oct 19, 2012 at 12:53 PM, Julien Nioche
<li...@gmail.com> wrote:

> Thanks. Sorry I had missed that. Are you planning to get rid of the ANT and
> Ivy related stuff? That would make things a bit clearer

No hassle, I would be happy to remove the ant + ivy stuff (and also
remove it from the tutorial), as I agree it does confuse things.
However, we've just not got around to it. It's outdated, broken and
quite honestly useless in its current state.

>

> yes I think that covers it. Basically what I want to check is whether we
> scan the whole dataset and filter on the fly or use queries on the back end
> side to return only what is needed. I think this would make a substantial
> difference in performance and would be a perfect illustration of what Nutch
> 2.x does that 1.x can't

+1 for this. I have not been using HBase or Accumulo, and currently
these two backends seem to be driving this issue. From what I can see,
both Ferdy and Keith were making good progress on it, but it seems to
have stalled somewhat recently. It would, however, be excellent to
drive better query performance within gora-core and subsequently all
backend modules.

Lewis

Re: compilation error, links in release page and question about indices

Posted by Julien Nioche <li...@gmail.com>.

Hi Lewis,

> We are not using ant and ivy for the build anymore. If you please try
> building with maven commands it will work fine.


Thanks. Sorry, I had missed that. Are you planning to get rid of the Ant and
Ivy related stuff? That would make things a bit clearer.


> >
> > Finally a very simple question : some backends have indices (SQL of course
> > but Cassandra and HBase?) that could be used when querying.  Typically in
> > Nutch we'd want to retrieve all the docs having a specific flag like
> > fetched in order to parse them. Is this implemented? Am sure the answer is
> > in the code somewhere but it is good to have a trace on the mailing list
> > for future reference.
> >
>
> Well, in gora-cassandra a field such as fetched (fetchedTime?) would be
> defined as a column in the database, so it would be possible to
> execute queries normally. However, I think you are maybe talking about
> something like GORA-119? Can you review and confirm?
>
> https://issues.apache.org/jira/browse/GORA-119
>
>
Yes, I think that covers it. Basically, what I want to check is whether we
scan the whole dataset and filter on the fly, or use queries on the backend
side to return only what is needed. I think this would make a substantial
difference in performance and would be a perfect illustration of what Nutch
2.x does that 1.x can't.
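
To make the distinction concrete, here is a minimal sketch in plain Python.
It is not Gora code: ToyStore, by_status, scan_and_filter, query_by_index and
the "status" field are hypothetical names made up for illustration, and the
in-memory dict only stands in for a real backend. It compares how many
records each approach has to read in order to find the "fetched" pages.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory stand-in for a backend; not a Gora API."""

    def __init__(self):
        self.rows = {}                     # key -> record (a dict)
        self.by_status = defaultdict(set)  # secondary index: status -> keys
        self.rows_read = 0                 # records touched by read calls

    def put(self, key, record):
        # Keep the secondary index in sync when a record is overwritten.
        old = self.rows.get(key)
        if old is not None:
            self.by_status[old["status"]].discard(key)
        self.rows[key] = record
        self.by_status[record["status"]].add(key)

    def scan_and_filter(self, status):
        """Read every record and filter on the client side."""
        hits = []
        for key, record in self.rows.items():
            self.rows_read += 1
            if record["status"] == status:
                hits.append(key)
        return sorted(hits)

    def query_by_index(self, status):
        """Use the index so that only matching keys are read."""
        keys = sorted(self.by_status.get(status, ()))
        self.rows_read += len(keys)
        return keys


store = ToyStore()
for i in range(1000):
    # Every tenth page is marked "fetched"; the rest are "unfetched".
    status = "fetched" if i % 10 == 0 else "unfetched"
    store.put(f"page{i:04d}", {"status": status})

scanned = store.scan_and_filter("fetched")
scan_cost = store.rows_read

store.rows_read = 0
indexed = store.query_by_index("fetched")
index_cost = store.rows_read

print(len(scanned), scan_cost)   # matches found, records read by the scan
print(len(indexed), index_cost)  # matches found, records read via the index
```

Both calls return the same keys, but the scan touches every record while the
indexed lookup touches only the matches. Whether a given backend can actually
do the second is exactly the question above.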

Thanks

Julien


Re: compilation error, links in release page and question about indices

Posted by Lewis John Mcgibbney <le...@gmail.com>.

Hi Julien,

On Thu, Oct 18, 2012 at 10:57 AM, Julien Nioche
<li...@gmail.com> wrote:

> I've just pulled the latest version of GORA and it fails on gora-cassandra.
> I see that the hector dependency is commented out in ivy.xml - is that a
> known issue and, if so, is there a workaround?

We are not using ant and ivy for the build anymore. If you try
building with the Maven commands, it will work fine.

>
> The links in http://gora.apache.org/releases.html do not have the right
> version numbers i.e.
> *http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2-src.zip*
>
> and should be
>
> *http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.zip*

Thanks for pointing this out Julien. I'll rebuild the site shortly to
reflect this.

>
> Finally a very simple question : some backends have indices (SQL of course
> but Cassandra and HBase?) that could be used when querying.  Typically in
> Nutch we'd want to retrieve all the docs having a specific flag like
> fetched in order to parse them. Is this implemented? Am sure the answer is
> in the code somewhere but it is good to have a trace on the mailing list
> for future reference.
>

Well, in gora-cassandra a field such as fetched (fetchedTime?) would be
defined as a column in the database, so it would be possible to
execute queries normally. However, I think you are maybe talking about
something like GORA-119? Can you review and confirm?

https://issues.apache.org/jira/browse/GORA-119

Thanks
Lewis