Posted to dev@beam.apache.org by Yash Sharma <ya...@gmail.com> on 2016/03/13 03:31:00 UTC

Newbie question

Hi All,
I have recently been reading about Apache Beam and am interested in
exploring how it fits into our stack.

We currently run Hive and Spark pipelines. We have issues with late-arriving
data and have to reprocess a couple of steps to ensure all the data is
consumed.

A couple of questions at the top of my mind:

1. Does Beam use the existing cluster, or does it need its own cluster?
2. How does Beam fit with the existing Hive and Spark jobs? What changes
might be required in the jobs to start using Beam?

Best,
Yash

Re: Newbie question

Posted by Yash Sharma <ya...@gmail.com>.
That's a great post. Thanks.

- Thanks, via mobile, excuse brevity.
On Mar 14, 2016 11:49 AM, "Jean-Baptiste Onofré" <jb...@nanthrax.net> wrote:

> Hi Yash,
>
> you can already take a look at the Google Dataflow examples and blog posts
> (http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/)
>
> Regards
> JB

Re: Newbie question

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Yash,

you can already take a look at the Google Dataflow examples and blog posts
(http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/)

Regards
JB

On 03/13/2016 11:46 PM, Yash Sharma wrote:
> Thanks, Jean.
> I am excited to see some 'getting started' examples for Beam once the
> bootstrap is complete.
>
> Best,
> yash

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Newbie question

Posted by Yash Sharma <ya...@gmail.com>.
Thanks, Jean.
I am excited to see some 'getting started' examples for Beam once the
bootstrap is complete.

Best,
yash



On Sun, Mar 13, 2016 at 4:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Yash,
>
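> Beam is an SDK, so it does not need its own cluster: a pipeline is executed
> by a runner on an existing engine, for example your Spark or Flink cluster.
>
> You design jobs as pipelines: it's a "programming model".
>
> For your issues with late-arriving data, Apache Falcon may help.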
>
> Regards
> JB

Re: Newbie question

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Yash,

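Beam is an SDK, so it does not need its own cluster: a pipeline is executed
by a runner on an existing engine, for example your Spark or Flink cluster.

You design jobs as pipelines: it's a "programming model".

For your issues with late-arriving data, Apache Falcon may help.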

Regards
JB
