Posted to dev@flink.apache.org by 杨洪波 <ho...@163.com> on 2014/10/16 04:27:12 UTC

about hdfs in flink cluster

Hey all,
I am new to Flink and have a question: I use a Flink cluster to read from a MySQL data source, and my business doesn't need HDFS. Do I have to set up HDFS in my Flink cluster? If yes, what does Flink use it for?
Thanks!

Re: Re: about hdfs in flink cluster

Posted by Robert Metzger <rm...@apache.org>.
You don't need to build Flink yourself if you want to use it without Hadoop.
The downloadable file contains the Hadoop client libraries for accessing
HDFS, but the system will only use them if you connect it to HDFS.
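Robert's point that the bundled Hadoop client libraries stay unused unless you connect Flink to HDFS can be pictured as a lazy, scheme-based file system lookup. The Java sketch below is hypothetical: `LazyFileSystems` and `FileSystemClient` are made-up names, not Flink classes. It only assumes, for illustration, that the file system is chosen from the URI scheme of a path (`file://` vs. `hdfs://`), so a job that never reads an `hdfs://` path never creates an HDFS client.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch (not Flink's API): file system clients are registered per
// URI scheme and only created the first time a path with that scheme is used.
class LazyFileSystems {
    /** Stand-in for a file system client; a real one would open streams, list files, etc. */
    interface FileSystemClient {
        String name();
    }

    private final Map<String, Supplier<FileSystemClient>> factories = new HashMap<>();
    private final Map<String, FileSystemClient> clients = new HashMap<>();

    /** Registering a factory is cheap: nothing is instantiated yet. */
    void register(String scheme, Supplier<FileSystemClient> factory) {
        factories.put(scheme, factory);
    }

    /** Returns the client for a path such as "file:///data/in" or "hdfs://namenode:9000/in". */
    FileSystemClient forPath(String path) {
        String scheme = URI.create(path).getScheme();
        String key = (scheme == null) ? "file" : scheme; // scheme-less paths are treated as local here
        Supplier<FileSystemClient> factory = factories.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("no file system registered for scheme '" + key + "'");
        }
        // computeIfAbsent invokes the factory only on the first request for this scheme.
        return clients.computeIfAbsent(key, k -> factory.get());
    }

    /** Schemes whose clients have actually been created so far. */
    Set<String> initializedSchemes() {
        return new HashSet<>(clients.keySet());
    }

    public static void main(String[] args) {
        LazyFileSystems fs = new LazyFileSystems();
        fs.register("file", () -> () -> "local");
        fs.register("hdfs", () -> {
            System.out.println("creating HDFS client"); // a real system would load Hadoop classes here
            return () -> "hdfs";
        });
        System.out.println(fs.forPath("file:///tmp/input.csv").name());
        System.out.println(fs.initializedSchemes()); // prints [file]: the HDFS factory never ran
    }
}
```

Registering the `hdfs` factory costs nothing; only the first `hdfs://` path would trigger it, and later `hdfs://` paths reuse the same client.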

You are right, the "Cluster Setup" guide explains how to set up Flink with
HDFS. But if you don't want to use HDFS, you can simply skip the entire HDFS
section and only follow the Flink setup steps for a cluster.
We should probably add a short note to our website saying that the HDFS
setup step is not required for running Flink.

Note: If you want to use Flink on YARN, you need to have HDFS.

On Thu, Oct 16, 2014 at 3:16 PM, 杨洪波 <ho...@163.com> wrote:

> Thanks. Maybe I need to build Flink myself and then set it up in cluster
> mode without HDFS.
>
> Why do we need the HDFS setup in "Cluster Setup"? (It says: "This involves
> two steps. First, installing and configuring Flink and second installing
> and configuring the Hadoop Distributed Filesystem (HDFS).")
>
> So the options are: local setup, cluster setup without HDFS, cluster setup
> with HDFS, and YARN setup?

Re: Re: about hdfs in flink cluster

Posted by Fabian Hueske <fh...@apache.org>.
That's a good point. The HDFS setup is definitely not a hard requirement.
However, Flink on a cluster only works together with a storage system
that all machines can access in parallel. HDFS is likely the most popular
choice for that.

The HDFS setup could be marked as optional in the cluster setup guide.
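Fabian's requirement, storage that can be accessed in parallel, becomes clearer if you picture how a single input file is shared among parallel tasks. The Java sketch below is a simplified illustration under assumed rules (equal-size byte ranges, a hypothetical `ByteRangeSplits` helper), not Flink's actual input split logic; real file input formats typically also have to respect record boundaries. The point is that every task opens the same path, so a file that lives only on one machine's local disk cannot be read this way by tasks running on other machines.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: divide one input file into contiguous byte ranges
// so that several parallel tasks can each read their own part of it.
class ByteRangeSplits {
    /** Half-open byte range [start, end) of the input file. */
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Splits fileLength bytes into `parallelism` ranges whose sizes differ by at most one byte. */
    static List<Split> split(long fileLength, int parallelism) {
        if (fileLength < 0 || parallelism <= 0) {
            throw new IllegalArgumentException("need fileLength >= 0 and parallelism > 0");
        }
        List<Split> splits = new ArrayList<>();
        long base = fileLength / parallelism;      // bytes every split gets
        long remainder = fileLength % parallelism; // the first `remainder` splits get one more byte
        long start = 0;
        for (int i = 0; i < parallelism; i++) {
            long size = base + (i < remainder ? 1 : 0);
            splits.add(new Split(start, start + size));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three tasks, possibly on three machines, each open the SAME path and seek
        // to their own offset; this only works if every machine can reach the file.
        System.out.println(split(10, 3)); // prints [[0, 4), [4, 7), [7, 10)]
    }
}
```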

2014-10-16 15:16 GMT+02:00 杨洪波 <ho...@163.com>:

> Thanks. Maybe I need to build Flink myself and then set it up in cluster
> mode without HDFS.
>
> Why do we need the HDFS setup in "Cluster Setup"? (It says: "This involves
> two steps. First, installing and configuring Flink and second installing
> and configuring the Hadoop Distributed Filesystem (HDFS).")
>
> So the options are: local setup, cluster setup without HDFS, cluster setup
> with HDFS, and YARN setup?

Re:Re: about hdfs in flink cluster

Posted by 杨洪波 <ho...@163.com>.

Thanks. Maybe I need to build Flink myself and then set it up in cluster mode without HDFS.

Why do we need the HDFS setup in "Cluster Setup"? (It says: "This involves two steps. First, installing and configuring Flink and second installing and configuring the Hadoop Distributed Filesystem (HDFS).")

So the options are: local setup, cluster setup without HDFS, cluster setup with HDFS, and YARN setup?

At 2014-10-16 18:27:08, "Kostas Tzoumas" <kt...@apache.org> wrote:
>No, you don't need to install HDFS. You can use Flink without HDFS.

Re: about hdfs in flink cluster

Posted by Kostas Tzoumas <kt...@apache.org>.
No, you don't need to install HDFS. You can use Flink without HDFS.

On Thu, Oct 16, 2014 at 12:17 PM, Robert Metzger <rm...@apache.org>
wrote:

> Hi,
>
> Good to hear that you are using Flink.
>
> HDFS is a distributed file system for reliably storing huge amounts of
> data. Many Flink users store all kinds of data in HDFS, including both
> the input data for their jobs and their results.
> Often, Flink and HDFS are installed next to each other in a cluster so that
> the same machines that contain the data also process it.
>
> For example, you could store the data from MySQL in HDFS, or you could
> join data from MySQL with data in HDFS.
>
>
> Let us know if you have more questions. We are happy to help!

Re: about hdfs in flink cluster

Posted by Robert Metzger <rm...@apache.org>.
Hi,

Good to hear that you are using Flink.

HDFS is a distributed file system for reliably storing huge amounts of
data. Many Flink users store all kinds of data in HDFS, including both
the input data for their jobs and their results.
Often, Flink and HDFS are installed next to each other in a cluster so that
the same machines that contain the data also process it.
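Co-locating Flink and HDFS pays off when work is sent to the machines that already hold the data. Below is a hypothetical Java sketch of locality-aware assignment: each HDFS block lists the hosts holding a replica, and the block goes to the least-loaded worker on one of those hosts, falling back to any worker (a remote read) when none is co-located. `LocalityAssigner` and the host names are made up; it illustrates the idea, not Flink's scheduler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of locality-aware work assignment (not Flink's scheduler).
class LocalityAssigner {
    /**
     * @param blockHosts block id -> hosts that store a replica of that block
     * @param workers    hosts that run a worker process
     * @return block id -> worker chosen to read that block
     */
    static Map<String, String> assign(Map<String, Set<String>> blockHosts, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        Map<String, Integer> load = new HashMap<>();
        for (String w : workers) {
            load.put(w, 0);
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> block : blockHosts.entrySet()) {
            // Prefer workers on a host that already stores this block (local read).
            List<String> candidates = new ArrayList<>();
            for (String w : workers) {
                if (block.getValue().contains(w)) {
                    candidates.add(w);
                }
            }
            if (candidates.isEmpty()) {
                candidates = workers; // no co-located worker: fall back to a remote read
            }
            // Among the candidates, pick the least-loaded one (first wins on ties).
            String best = candidates.get(0);
            for (String c : candidates) {
                if (load.get(c) < load.get(best)) {
                    best = c;
                }
            }
            assignment.put(block.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> blocks = new LinkedHashMap<>();
        blocks.put("block-1", Set.of("node1", "node2"));
        blocks.put("block-2", Set.of("node2"));
        blocks.put("block-3", Set.of("node9")); // stored on a machine without a worker
        System.out.println(assign(blocks, List.of("node1", "node2")));
        // prints {block-1=node1, block-2=node2, block-3=node1}
    }
}
```

Without co-location (e.g. a worker-only cluster reading from separate storage), every block takes the fallback path, which still works but moves the data over the network.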

For example, you could store the data from MySQL in HDFS, or you could
join data from MySQL with data in HDFS.
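Joining MySQL data with data stored in HDFS is, at its core, an equi-join on a shared key. A minimal hash-join sketch in Java shows what that computes; `Customer`, `Order` and `HashJoinSketch` are hypothetical, and both inputs are plain in-memory lists rather than real MySQL or HDFS sources. In an actual Flink job you would express this with the DataSet API's `join` operator instead of writing it by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory hash join; Customer stands in for rows read from MySQL,
// Order for records read from a file in HDFS.
class HashJoinSketch {
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class Order {
        final int customerId;
        final double amount;

        Order(int customerId, double amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    /** Inner equi-join on Customer.id == Order.customerId; emits "name,amount" lines. */
    static List<String> join(List<Customer> customers, List<Order> orders) {
        // Build phase: index the MySQL side by its key (assumed unique, like a primary key).
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id, c);
        }
        // Probe phase: look up each HDFS record; records without a match are dropped.
        List<String> joined = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customerId);
            if (c != null) {
                joined.add(c.name + "," + o.amount);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Customer> fromMySql = List.of(new Customer(1, "alice"), new Customer(2, "bob"));
        List<Order> fromHdfs = List.of(new Order(1, 12.5), new Order(3, 7.0), new Order(2, 3.0));
        System.out.println(join(fromMySql, fromHdfs)); // prints [alice,12.5, bob,3.0]
    }
}
```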


Let us know if you have more questions. We are happy to help!



On Thu, Oct 16, 2014 at 9:56 AM, Márton Balassi <ba...@gmail.com>
wrote:

> Flink does not require HDFS if you decide not to use it, so the version
> you are currently using should work for you. You might need a different
> version depending on whether or not you need YARN support.

Re: about hdfs in flink cluster

Posted by Márton Balassi <ba...@gmail.com>.
Flink does not require HDFS if you decide not to use it, so the version you
are currently using should work for you. You might need a different version
depending on whether or not you need YARN support.

On Thu, Oct 16, 2014 at 4:27 AM, 杨洪波 <ho...@163.com> wrote:

> Hey all,
> I am new to Flink and have a question: I use a Flink cluster to read from
> a MySQL data source, and my business doesn't need HDFS. Do I have to set up
> HDFS in my Flink cluster? If yes, what does Flink use it for?
> Thanks!

Re: about hdfs in flink cluster

Posted by Stephan Ewen <se...@apache.org>.
The short answer is: no, HDFS is purely optional.

Have a look at the JDBCInputFormat example:
https://github.com/apache/incubator-flink/blob/master/flink-addons/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/example/JDBCExample.java

Greetings,
Stephan



On Thu, Oct 16, 2014 at 4:27 AM, 杨洪波 <ho...@163.com> wrote:

> Hey all,
> I am new to Flink and have a question: I use a Flink cluster to read from
> a MySQL data source, and my business doesn't need HDFS. Do I have to set up
> HDFS in my Flink cluster? If yes, what does Flink use it for?
> Thanks!