Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2008/09/27 07:17:09 UTC

Ocean and GData

Hi,

Here is one thing that's been confusing me.  http://wiki.apache.org/lucene-java/OceanRealtimeSearch?highlight=(GData) often mentions GData and relates it to real-time search (to Ocean), as if it is GData that provides real-time search functionality.  But isn't GData simply a communication protocol (Atom with some custom additions by Google)?  If so, are statements like "Ocean addresses this by providing the same functionality as GData open sourced for use in any project" really correct?  If GData is just a communication protocol, and Ocean is really primarily the search engine that is capable of real-time search, then is it really correct to compare Ocean with GData?  My feeling is that the thinking is:
"When I access Google's databases using GData I can see my changes to those databases immediately".
But that doesn't make GData this real-time thing, but rather the backend, no?


Please enlighten me if I'm misunderstanding what GData is.  Thanks,

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Ocean and GData

Posted by Jason Rutherglen <ja...@gmail.com>.
This is true; however, the realtime search portion does not really have
a name, which is why I chose to pin the tail on GData.  They store
the documents in BigTable, but BigTable does not provide search
capabilities.

On Sun, Sep 28, 2008 at 1:49 AM, J. Delgado <jo...@gmail.com> wrote:

> My understanding is that GBase is based on the infrastructure that Google is
> building for large scale distributed computing (Google File System,
> MapReduce, BigTable, GData, etc.) More specifically, BigTable, the column
> storage "database" which requires extremely high performance and
> reliability, but provides only weak guarantees on data consistency. There is
> plenty of documentation on these technologies.
>
> I agree with Otis that it is clearer to mention the characteristics of an RDBMS
> that real-time search displays, such as atomicity and transactionality.
>
> -- Joaquin
>
>



Re: Ocean and GData

Posted by "J. Delgado" <jo...@gmail.com>.
On Sat, Sep 27, 2008 at 5:03 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> Unlike MapReduce, there are no infrastructure whitepapers on
> how GData/Base works so I had to make a broad comparison rather than a
> specific one.

My understanding is that GBase is based on the infrastructure that Google is
building for large scale distributed computing (Google File System,
MapReduce, BigTable, GData, etc.) More specifically, BigTable, the column
storage "database" which requires extremely high performance and
reliability, but provides only weak guarantees on data consistency. There is
plenty of documentation on these technologies.

I agree with Otis that it is clearer to mention the characteristics of an RDBMS
that real-time search displays, such as atomicity and transactionality.

-- Joaquin

Re: Ocean and GData

Posted by Jason Rutherglen <ja...@gmail.com>.
Hello Otis,

GData and GBase sound to me like they are short for Google Database.
The goal with Ocean is to provide a Lucene-based search database that
provides out-of-the-box functionality like what Google Data/Base
offers.  Unlike MapReduce, there are no infrastructure whitepapers on
how GData/Base works, so I had to make a broad comparison rather than a
specific one.  Realtime seems like a feature a search database should
have to qualify as such, and so GData is mentioned as the only known
realtime solution (other than Twitter's Summize, which I found out about
later).  The service Google provides through the GData protocol seems
to also be referred to as GData, but could simply be called the
"infrastructure supporting Google's realtime search web services".

Jason

On Sat, Sep 27, 2008 at 1:17 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Hi,
>
> Here is one thing that's been confusing me.  http://wiki.apache.org/lucene-java/OceanRealtimeSearch?highlight=(GData) often mentions GData and relates it to real-time search (to Ocean), as if it is GData that provides real-time search functionality.  But isn't GData simply a communication protocol (Atom with some custom additions by Google)?  If so, are statements like "Ocean addresses this by providing the same functionality as GData open sourced for use in any project" really correct?  If GData is just a communication protocol, and Ocean is really primarily the search engine that is capable of real-time search, then is it really correct to compare Ocean with GData?  My feeling is that the thinking is:
> "When I access Google's databases using GData I can see my changes to those databases immediately".
> But that doesn't make GData this real-time thing, but rather the backend, no?
>
>
> Please enlighten me if I'm misunderstanding what GData is.  Thanks,
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
