Posted to user@gora.apache.org by Noora <no...@gmail.com> on 2014/04/30 14:25:25 UTC

How does Gora work?

Hi All,

I want to integrate MySQL and HDFS in my Hadoop project. I searched a lot
for different ways and found two approaches: real time using "mysql
applier for hadoop", and "apache sqoop" for non-real-time uses.

Then I found that Gora has this ability too but I could not find any
information about how it works.

Is Gora real time or not? What is the difference between Gora and the MySQL
Applier or Sqoop? If it is real time, is the db process blocking or not?
For integration of Hadoop and MySQL, does it need any NoSQL db as an interface?

thanx

Re: How does Gora work?

Posted by Lewis John Mcgibbney <le...@gmail.com>.

There are previous messages in the Nutch mailing list archives about hacking
various tools to push data to MySQL... please check there for previous
commentary.
The SQL module is disabled. We would love for someone who has time to rewrite
it ;)
On May 1, 2014 3:06 AM, "Noora" <no...@gmail.com> wrote:

> Thank you for your answer and welcoming :)
>
> Actually, I want to save the crawl datum of Nutch 1.7 in MySQL. Of course, the
> solution should be integrated with MR so that it writes to MySQL during the
> inject and update db execution. Writing to the HDFS file should still be done
> as before, because inserts are fast, with the insertion into the MySQL db
> occurring in parallel. So what is your suggestion?
>
> I had a look at Nutch 2.2 but I could not run it with MySQL. Lately I've
> read that the SQL module of Gora is disabled at the moment. Is that right?

Re: How does Gora work?

Posted by Noora <no...@gmail.com>.

Thank you for your answer and welcoming :)

Actually, I want to save the crawl datum of Nutch 1.7 in MySQL. Of course, the
solution should be integrated with MR so that it writes to MySQL during the
inject and update db execution. Writing to the HDFS file should still be done
as before, because inserts are fast, with the insertion into the MySQL db
occurring in parallel. So what is your suggestion?

I had a look at Nutch 2.2 but I could not run it with MySQL. Lately I've
read that the SQL module of Gora is disabled at the moment. Is that right?


On Wed, Apr 30, 2014 at 12:13 PM, Henry Saputra <he...@gmail.com>wrote:

> Hi Noora, welcome to Apache Gora in particular =)
>
> +1 well said about Apache Gora, Tim
>
> - Henry

Re: How does Gora work?

Posted by Henry Saputra <he...@gmail.com>.

Hi Noora, welcome to Apache Gora in particular =)

+1 well said about Apache Gora, Tim

- Henry

On Wed, Apr 30, 2014 at 6:06 AM, Tim Robertson
<ti...@gmail.com> wrote:
> Hi Noora,
>
> Welcome to the world of Hadoop - it is a vast ecosystem and is quite
> daunting at first.
>
> Perhaps if I summarize a few of the key technologies which build on each
> other it might help you navigate things:
>
> a) Hadoop DFS - the distributed file system
> b) Hadoop MapReduce (MR) - a distributed framework for processing where you
> write Maps and Reduces.  It is batch oriented, with 30+ sec latency to start
> even the smallest jobs, so not ideally suited to interactive operations
> c) Sqoop is a library that allows you to run MR jobs that either suck data
> from a DB to HDFS or vice versa.  It supports a variety of formats, such as
> Avro (a data format where the schema is embedded)
> d) You didn't mention it, but Hive is a SQL layer that allows you to run
> SQL as MR jobs.  A common use is MySQL -> Sqoop -> HDFS -> Hive
> e) HBase - a "big table" technology that gives you a column-oriented
> data store, where you can GET or PUT by key, or perform a limited set of
> other operations (scans, for example).
>
> So what is Gora?
> Gora is effectively an Object Relational Mapper that allows you to define
> the table definition using Avro format and provide a mapping of how each
> field is stored in the backend system. Gora then takes care of CRUD
> operations and mediation with the backend, without the caller actually
> needing to know how to use the backend API.  Various backends are
> supported.  Thus I can do Person p = new Person("Tim") and then
> "gora save Tim" (pseudocode) - Gora will then take care of saving my object
> in (e.g.) HBase.  There are connectors that allow you to run MR jobs over
> Gora stores as well.  Gora is similar to the likes of MyBatis if you are
> familiar with that, but supports "Hadoop technologies" as backends, and
> provides MR capability allowing you to run MR across various backends
> consistently.
>
> So is Gora real time or not? Yes, it is real time for CRUD, but MR-type jobs
> are batch operations, with reasonably high latency.
> Does Gora block? That depends on the backend. With HBase updates, for
> example, you typically either overwrite, or fail the update on a race
> condition, and scans are non-blocking.
>
> Perhaps if you explain what you are trying to do, the list can help advise
> you if Gora is a suitable option, or could suggest the appropriate Hadoop
> list to ask?
>
> I hope this helps,
> Tim

Re: How does Gora work?

Posted by Tim Robertson <ti...@gmail.com>.

Hi Noora,

Welcome to the world of Hadoop - it is a vast ecosystem and is quite
daunting at first.

Perhaps if I summarize a few of the key technologies which build on each
other it might help you navigate things:

a) Hadoop DFS - the distributed file system
b) Hadoop MapReduce (MR) - a distributed framework for processing where you
write Maps and Reduces.  It is batch oriented, with 30+ sec latency to
start even the smallest jobs, so not ideally suited to interactive
operations
c) Sqoop is a library that allows you to run MR jobs that either suck data
from a DB to HDFS or vice versa.  It supports a variety of formats, such as
Avro (a data format where the schema is embedded)
d) You didn't mention it, but Hive is a SQL layer that allows you to run
SQL as MR jobs.  A common use is MySQL -> Sqoop -> HDFS -> Hive
e) HBase - a "big table" technology that gives you a column-oriented
data store, where you can GET or PUT by key, or perform a limited set of
other operations (scans, for example).
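
To make the "Maps and Reduces" in (b) a bit more concrete, here is a minimal in-memory sketch of the map -> group -> reduce flow, using word counting as the job. It is plain Java and deliberately does not use the Hadoop API; the class and method names (MapReduceSketch, map, reduce, run) are made up for illustration, and the grouping step only stands in for what Hadoop does when it shuffles data between machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only. This is NOT the Hadoop API; all names are hypothetical.
class MapReduceSketch {

    // "Map" phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: combine every value that was emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs both phases; the grouping by key stands in for Hadoop's shuffle.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two tiny "input splits"; prints the per-word counts.
        System.out.println(run(Arrays.asList("gora stores data", "hadoop stores data")));
    }
}
```

In a real job the map and reduce calls run on different machines over files in HDFS, which is where the start-up latency mentioned above comes from.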

So what is Gora?
Gora is effectively an Object Relational Mapper that allows you to
define the table definition using Avro format and provide a mapping of how
each field is stored in the backend system. Gora then takes care of
CRUD operations and mediation with the backend, without the caller actually
needing to know how to use the backend API.  Various backends are supported.
Thus I can do Person p = new Person("Tim") and then "gora save Tim"
(pseudocode) - Gora will then take care of saving my object in (e.g.) HBase.
There are connectors that allow you to run MR jobs over Gora stores as well.
Gora is similar to the likes of MyBatis if you are familiar with that, but
supports "Hadoop technologies" as backends, and provides MR capability
allowing you to run MR across various backends consistently.
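
As a rough illustration of the "caller does not need to know the backend API" idea, here is a small self-contained sketch. Everything in it is hypothetical: the Store interface, InMemoryStore, and the hand-written Person class are stand-ins invented for this example and are not Gora's API (in Gora the persistent class would be generated from the Avro schema and the store would be obtained from Gora itself).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a backend-neutral store; NOT Gora's actual API.
class StoreSketch {

    // The object being persisted. Hand-written here purely for illustration.
    static final class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // The CRUD contract the caller programs against, whatever the backend is.
    interface Store<K, V> {
        void put(K key, V value);

        V get(K key);

        boolean delete(K key);
    }

    // One possible backend: a plain map standing in for HBase or similar.
    static final class InMemoryStore<K, V> implements Store<K, V> {
        private final Map<K, V> rows = new HashMap<>();

        @Override
        public void put(K key, V value) {
            rows.put(key, value);
        }

        @Override
        public V get(K key) {
            return rows.get(key);
        }

        @Override
        public boolean delete(K key) {
            return rows.remove(key) != null;
        }
    }

    public static void main(String[] args) {
        // The caller only sees Store; swapping the backend would not change this code.
        Store<String, Person> store = new InMemoryStore<>();
        Person p = new Person("Tim");
        store.put("tim", p);                        // the "gora save Tim" step
        System.out.println(store.get("tim").name);  // read it back by key
        System.out.println(store.delete("tim"));    // remove it again
    }
}
```

The point of the design is that only the class behind the interface changes when the backend changes, which is also what lets the same objects be fed to MR jobs across backends.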

So is Gora real time or not? Yes, it is real time for CRUD, but MR-type
jobs are batch operations, with reasonably high latency.
Does Gora block? That depends on the backend. With HBase updates, for
example, you typically either overwrite, or fail the update on a race
condition, and scans are non-blocking.
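
The "overwrite, or fail the update on a race condition" behaviour can be sketched without any locking. The example below is an in-memory analogy only, built on java.util.concurrent.ConcurrentHashMap rather than on the HBase client; the class and method names (ConditionalUpdateSketch, overwrite, updateIfUnchanged) are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory analogy only; this does not use the HBase client API.
class ConditionalUpdateSketch {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    // Blind write: nothing is checked, the last writer wins.
    void overwrite(String key, String value) {
        rows.put(key, value);
    }

    // Conditional write: applied only if the row still holds the value the
    // caller read earlier; otherwise it returns false instead of blocking.
    boolean updateIfUnchanged(String key, String expected, String value) {
        return rows.replace(key, expected, value);
    }

    String read(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        ConditionalUpdateSketch table = new ConditionalUpdateSketch();
        table.overwrite("row1", "v1");

        String seen = table.read("row1");   // this writer read "v1"
        table.overwrite("row1", "v2");      // a competing writer got in first

        // The stale update is rejected rather than made to wait.
        boolean applied = table.updateIfUnchanged("row1", seen, "v3");
        System.out.println(applied + " " + table.read("row1"));
    }
}
```

Neither path makes a writer wait for another: the blind write simply replaces the value, and the conditional write reports failure so the caller can re-read and retry.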

Perhaps if you explain what you are trying to do, the list can help advise
you if Gora is a suitable option, or could suggest the appropriate Hadoop
list to ask?

I hope this helps,
Tim

On Wed, Apr 30, 2014 at 2:25 PM, Noora <no...@gmail.com> wrote:

> Hi All,
>
> I want to integrate MySQL and HDFS in my Hadoop project. I searched a lot
> for different ways and found two approaches: real time using "mysql
> applier for hadoop", and "apache sqoop" for non-real-time uses.
>
> Then I found that Gora has this ability too but I could not find any
> information about how it works.
>
> Is Gora real time or not? What is the difference between Gora and the MySQL
> Applier or Sqoop? If it is real time, is the db process blocking or not?
> For integration of Hadoop and MySQL, does it need any NoSQL db as an
> interface?
>
> thanx
>