Posted to user@flink.apache.org by Navneeth Krishnan <re...@gmail.com> on 2017/09/26 05:14:03 UTC

Flink on EMR

Hello All,

I'm trying to deploy Flink on AWS EMR, and I'm very new to EMR. I'm running
into several issues and need some help.

*Issue 1:*

How did others resolve this multiple bindings issue?

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/appcache/application_1505848894978_0007/filecache/11/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/appcache/application_1505848894978_0007/filecache/12/location-compute-1.0-SNAPSHOT-all.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

*Issue 2:*

Running the command below starts the pipeline, but the task manager is
allocated only 5 GB of memory instead of 8 GB. Any reason why?

flink run -m yarn-cluster -yn 4 -yjm 2048 -ytm 8192 ./my-pipeline.jar

*Issue 3:*

How do I provide the checkpoint directory? Will just providing
"hdfs:///checkpoints/" work, or do I need to include the master
node's host name?

*Issue 4:*

How can I get the task manager logs? Should I use log aggregation in
Hadoop YARN or send them to CloudWatch?

Also, if there are any best practices for running Flink on YARN,
please let me know.

Thanks a lot.

Regards,

Navneeth

Re: Flink on EMR

Posted by Navneeth Krishnan <re...@gmail.com>.

Hi,

I’m using the default Flink package that comes with EMR. I’m facing these
issues while running my pipeline. Thanks.

On Mon, Sep 25, 2017 at 11:09 PM Jörn Franke <jo...@gmail.com> wrote:

> Amazon EMR already has a Flink package. You just need to check the
> checkbox. I would not install it on your own.
> I think you can find it in the advanced options.

Re: Flink on EMR

Posted by Jörn Franke <jo...@gmail.com>.

Amazon EMR already has a Flink package. You just need to check the checkbox. I would not install it on your own.
I think you can find it in the advanced options.

Re: Flink on EMR

Posted by Stefan Richter <s....@data-artisans.com>.

Hi,

For issue 1, you could delete the slf4j jar from Flink’s lib folder, but I wonder whether this is causing any problems beyond the warning itself?
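
One way to confirm which jars carry a binding is to search a jar's listing for the class that SLF4J names in the warning. This is only a sketch: it assumes `unzip` is installed on the node, the helper name `slf4j_bindings` is made up for illustration, and the jar name is the one from the warning.

```shell
# Filter a jar listing down to SLF4J binding classes.
# Reads the output of `unzip -l <jar>` on stdin and prints matching entries.
# The function name is hypothetical, chosen for this example.
slf4j_bindings() {
  grep 'org/slf4j/impl/StaticLoggerBinder.class'
}

# Example (run in the directory that holds the job jar):
#   unzip -l location-compute-1.0-SNAPSHOT-all.jar | slf4j_bindings
```

If the job jar shows up here, the binding is bundled into the fat jar itself, which deleting a jar from Flink's lib folder would not change.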

For issue 2, where did you see that only 5 GB have been allocated? Did you consider that Flink allocates only a fraction of the memory to the heap and another fraction to off-heap memory? This can be influenced with the memory fraction parameter.
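
To make the arithmetic concrete, here is a rough sketch of how a container size shrinks to a smaller heap once a cutoff is reserved for non-heap use. The 25% cutoff and the 600 MB minimum are assumptions for illustration; they are meant to mirror settings such as `containerized.heap-cutoff-ratio` and `containerized.heap-cutoff-min`, and should be checked against the configuration reference for the Flink version that EMR ships. The function name is hypothetical.

```shell
# Estimate the JVM heap left in a YARN container after a safety cutoff.
# Arguments: container size in MB, cutoff in percent, minimum cutoff in MB.
# All inputs are illustrative assumptions, not values read from a cluster.
heap_after_cutoff_mb() {
  container_mb="$1"
  cutoff_percent="$2"
  cutoff_min_mb="$3"
  cutoff_mb=$(( container_mb * cutoff_percent / 100 ))
  # The cutoff never drops below the configured minimum.
  if [ "$cutoff_mb" -lt "$cutoff_min_mb" ]; then
    cutoff_mb="$cutoff_min_mb"
  fi
  echo $(( container_mb - cutoff_mb ))
}

# -ytm 8192 from the command in the question, with the assumed cutoff values.
heap_after_cutoff_mb 8192 25 600
```

With these inputs the result is 6144 MB, so the cutoff alone does not account for a 5 GB reading; other reservations, such as network buffers, may reduce the figure further depending on version and configuration.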

About issue 3, I think this should work without providing the host name.
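
For reference, a URI with an empty authority such as hdfs:///checkpoints/ is resolved against the default filesystem in the Hadoop configuration (`fs.defaultFS`), which is why no host name is needed as long as Flink picks up the cluster's Hadoop configuration. A small sketch for checking this on the master node, assuming the `hdfs` CLI is on the PATH; the function name and the default path are illustrative:

```shell
# Show which filesystem "hdfs:///" resolves to, then make sure the
# checkpoint directory exists and can be listed.
# The function name and default path are assumptions for this example.
check_checkpoint_dir() {
  checkpoint_dir="${1:-hdfs:///checkpoints/}"
  hdfs getconf -confKey fs.defaultFS    # default filesystem from the Hadoop config
  hdfs dfs -mkdir -p "$checkpoint_dir"  # create the directory if it is missing
  hdfs dfs -ls "$checkpoint_dir"        # fails if the URI cannot be resolved
}

# Example:
#   check_checkpoint_dir hdfs:///checkpoints/
```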

Issue 4 is a matter of taste. If CloudWatch is a log aggregation service, it might be easier for you to use something like that.
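
With YARN log aggregation, the container logs (job manager and task managers) can be fetched with the `yarn logs` command, typically once the application has finished and provided aggregation is enabled on the cluster. A minimal sketch, assuming the `yarn` CLI is available on the master node; the function name is hypothetical, and the application ID in the example is the one from the SLF4J output in the original question:

```shell
# Fetch the aggregated logs of all containers of one YARN application.
# The function name is a hypothetical helper for this example.
fetch_yarn_logs() {
  yarn logs -applicationId "$1"
}

# Example:
#   fetch_yarn_logs application_1505848894978_0007 > flink-app.log
```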

Best,
Stefan

> On 27.09.2017 at 06:57, Navneeth Krishnan <re...@gmail.com> wrote:
> 
> Hi All,
> 
> Any suggestions?
> 
> Thanks.

Re: Flink on EMR

Posted by Navneeth Krishnan <re...@gmail.com>.

Hi All,

Any suggestions?

Thanks.
