Posted to user@flink.apache.org by Chiwan Park <ch...@apache.org> on 2016/01/04 12:52:03 UTC

Flink on EMR Question

Hi All,

I have run into some problems using Flink on an Amazon EMR cluster.

Q1. Sometimes the JobManager container still exists after I destroy the YARN session by pressing Ctrl+C. In that case, the Flink YARN application appears to have exited correctly in the YARN RM dashboard, but the dashboard still shows a running container. From the container's logs, I can tell that it is the JobManager.

I cannot kill the container because I do not have permission to restart the YARN RM on Amazon EMR. On my small Hadoop cluster (3 nodes), the problem does not appear.

Q2. I tried to use the S3 file system in Flink on EMR, but I cannot because of a version conflict with Apache HttpClient. By default, the S3 file system implementation on EMR is `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against a different version of Apache HttpClient.

As I wrote above, I cannot restart the Hadoop cluster after modifying conf-site.xml because I lack the permission. How can I solve this problem?

Regards,
Chiwan Park

Re: Flink on EMR Question

Posted by Stephan Ewen <se...@apache.org>.

Would it cause problems if I remove it from the "flink-runtime" pom?

Seems strange to have a dependency there that we do not even use...

On Wed, Jan 6, 2016 at 12:07 PM, Ufuk Celebi <uc...@apache.org> wrote:

> @Stephan: It was added to the dependency management section in order to
> enforce a higher version for S3 client, because it was causing problems
> earlier.

Re: Flink on EMR Question

Posted by Ufuk Celebi <uc...@apache.org>.

@Stephan: It was added to the dependency management section in order to enforce a higher version for the S3 client, because the older version was causing problems earlier.

> On 06 Jan 2016, at 11:14, Chiwan Park <ch...@apache.org> wrote:
> 
> Great! Thanks for addressing!

Re: Flink on EMR Question

Posted by Chiwan Park <ch...@apache.org>.

Great! Thanks for addressing this!

> On Jan 6, 2016, at 5:51 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> At a first look, I think that "flink-runtime" does not need Apache Httpclient at all. I'll try to simply remove that dependency...

Regards,
Chiwan Park

Re: Flink on EMR Question

Posted by Stephan Ewen <se...@apache.org>.

At a first look, I think that "flink-runtime" does not need Apache
Httpclient at all. I'll try to simply remove that dependency...

On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park <ch...@apache.org> wrote:

> Hi,
>
> Thanks for answering me!
>
> It is happy to hear the problem will be addressed. :)
>
> About question 2, flink-runtime uses Apache Httpclient 4.2.6 and S3 file
> system api implemented by Amazon uses 4.3.x. There are some API changes, so
> NoSuchMethodError exception occurs.

Re: Flink on EMR Question

Posted by Chiwan Park <ch...@apache.org>.

Hi,

Thanks for answering me!

I am happy to hear that the problem will be addressed. :)

About question 2, flink-runtime uses Apache HttpClient 4.2.6, while the S3 file system API implemented by Amazon uses 4.3.x. There are some API changes between the two versions, so a NoSuchMethodError is thrown.
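
For anyone trying to confirm this kind of conflict, a small diagnostic along the following lines may help. It is only a sketch: the class name `ClassOrigin` and its `describe` helper are made up for illustration, and it assumes the class of interest is visible to the class loader that runs it. It prints which jar a class was actually loaded from, which usually shows whether the 4.2.x or the 4.3.x HttpClient jar won on the classpath.

```java
import java.net.URL;
import java.security.CodeSource;

// Diagnostic sketch (hypothetical helper, not part of Flink or EMR):
// report which jar or directory a class was loaded from.
final class ClassOrigin {

    /** Returns a one-line description of where the named class was loaded from. */
    static String describe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> clazz = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes that ship with the JDK typically have no code source.
                return className + " -> (no code source; probably a JDK class)";
            }
            URL location = source.getLocation();
            return className + " -> " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to an HttpClient 4.x interface; pass other class names as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.http.client.HttpClient"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run with the same classpath as the failing job, the printed location should point at the jar that supplied the class, and the version can normally be read from the jar file name.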

> On Jan 5, 2016, at 11:59 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> Hi!
> 
> Concerning (1) We have seen that a few times. The JVMs / Threads do sometimes not properly exit in a graceful way, and YARN is not always able to kill the process (YARN bug). I am currently working on a refactoring of the YARN resource manager (to allow to easy addition of other frameworks) and have addressed this as part of that. Will be in the master in a bit.
> 
> Concerning (2) Do you know which component in Flink uses the HTTP client?
> 
> Greetings,
> Stephan

Regards,
Chiwan Park

Re: Flink on EMR Question

Posted by Stephan Ewen <se...@apache.org>.

Hi!

Concerning (1): We have seen that a few times. The JVMs / threads
sometimes do not exit gracefully, and YARN is not always able to kill
the process (YARN bug). I am currently working on a refactoring of the
YARN resource manager (to allow easy addition of other frameworks) and
have addressed this as part of that. It will be in the master in a bit.
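
As a generic illustration of the "JVM does not exit gracefully" problem (this is not the actual change in the refactoring mentioned above, and the names `BoundedShutdown`, `runWithTimeout`, and `exitProcess` are hypothetical), one common pattern is to give cleanup a bounded amount of time and then terminate the JVM regardless:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of a bounded shutdown: try to clean up, but never wait forever.
final class BoundedShutdown {

    /** Runs cleanup on a daemon thread; returns true if it finished within the timeout. */
    static boolean runWithTimeout(Runnable cleanup, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-cleanup");
            t.setDaemon(true); // a daemon thread cannot keep the JVM alive by itself
            return t;
        });
        Future<?> result = executor.submit(cleanup);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the stuck cleanup
            return false;
        } catch (ExecutionException e) {
            return false; // the cleanup itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Hypothetical exit path: attempt a graceful cleanup, then end the JVM either way. */
    static void exitProcess(Runnable cleanup, long timeoutMillis) {
        boolean clean = runWithTimeout(cleanup, timeoutMillis);
        // Runtime.halt skips shutdown hooks, so stuck threads or hooks cannot block the exit.
        Runtime.getRuntime().halt(clean ? 0 : 1);
    }
}
```

A process would call something like `BoundedShutdown.exitProcess(() -> releaseResources(), 10_000)` as its last step, where `releaseResources` stands for whatever cleanup the application needs. The trade-off is that `halt` skips any remaining shutdown hooks, so it only makes sense after the important cleanup has been attempted.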

Concerning (2) Do you know which component in Flink uses the HTTP client?

Greetings,
Stephan


On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <maximilian.bode@tngtech.com> wrote:

> Hi everyone,
>
> Regarding Q1, I believe I have witnessed a comparable phenomenon in a
> (3-node, non-EMR) YARN cluster. After shutting down the yarn session via
> `stop`, one container seems to linger around. `yarn application -list` is
> empty, whereas `bin/yarn-session.sh -q` lists the left-over container.
> Also, there is still one application shown as ‚running‘ in Ambari’s YARN
> pane under current applications. Then, after some time (order of a few
> minutes) it disappears and the resources are available again.
>
> I have not tested this behavior extensibly so far. Noticeably, I was not
> able to reproduce it by just starting a session and then ending it again
> right away without looking at the JobManager web interface. Maybe this
> produces some kind of lag as far as YARN containers are concerned?
>
> Cheers,
> Max

Re: Flink on EMR Question

Posted by Maximilian Bode <ma...@tngtech.com>.

Hi everyone,

Regarding Q1, I believe I have witnessed a comparable phenomenon in a (3-node, non-EMR) YARN cluster. After shutting down the YARN session via `stop`, one container seems to linger. `yarn application -list` is empty, whereas `bin/yarn-session.sh -q` lists the left-over container. Also, one application is still shown as "running" in Ambari's YARN pane under current applications. Then, after some time (on the order of a few minutes), it disappears and the resources are available again.

I have not tested this behavior extensively so far. Notably, I was not able to reproduce it by just starting a session and then ending it again right away without looking at the JobManager web interface. Maybe this produces some kind of lag as far as YARN containers are concerned?

Cheers,
Max

> On 04.01.2016 at 12:52, Chiwan Park <ch...@apache.org> wrote:
> 
> Hi All,
> 
> I have some problems using Flink on Amazon EMR cluster.
> 
> Q1. Sometimes, jobmanager container still exists after destroying yarn session by pressing Ctrl+C. In that case, Flink YARN app seems exited correctly in YARN RM dashboard. But there is a running container in the dashboard. From logs of the container, I realize that the container is jobmanager.
> 
> I cannot kill the container because there is no permission to restart YARN RM in Amazon EMR. In my small Hadoop Cluster (w/3 nodes), the problem doesn’t appear.
> 
> Q2. I tried to use S3 file system in Flink on EMR. But I can’t use it because of version conflict of Apache Httpclient. In default, implementation of S3 file system in EMR is `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem` which is linked with other version of Apache Httpclient.
> 
> As I wrote above, I cannot restart Hadoop cluster after modifying conf-site.xml because of lack of permission. How can I solve this problem?
> 
> Regards,
> Chiwan Park
> 
>