Posted to user@flink.apache.org by aj <aj...@gmail.com> on 2020/10/16 18:15:14 UTC

Re: Flink 1.11 TaskManagerLogFileHandler -Exception

Hi Fanbin,

You do not have to copy log4j.properties to log4j2.properties.

log4j2 has a different format, so change your log4j.properties file so that it
uses the log4j2 formatting style.

Please replace your log4j.properties with this default file for reference:

https://github.com/apache/flink/blob/release-1.11/flink-dist/src/main/flink-bin/conf/log4j.properties
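
As a rough orientation, a log4j2-style properties file uses different keys than
log4j 1.x. This is a minimal sketch only, not the full default; use the linked
file as the actual replacement:

rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender

appender.main.name = MainAppender
appender.main.type = File
appender.main.append = false
appender.main.fileName = ${sys:log.file}
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

Old log4j 1.x keys such as "log4j.rootLogger=INFO, file" are not understood by
log4j2, which is why a plain copy of the old file does not work.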

On Fri, Oct 16, 2020 at 2:50 AM Fanbin Bu <fa...@coinbase.com> wrote:

> Hi AJ,
>
> How did you solve the issue? I ran into this and tried the following, but
> it didn't work:
>
> cp conf/log4j.properties conf/log4j2.properties
>
> Thanks,
> Fanbin
>
> On Mon, Sep 7, 2020 at 11:15 AM aj <aj...@gmail.com> wrote:
>
>> Thanks for all the help. I have found the issue and it is resolved now. It
>> was an issue with the EMR cluster:
>>
>> 1. The lib folder has jars corresponding to log4j 2, but the log4j.properties
>> file corresponds to log4j 1.
>> 2. The plugins folder is missing.
>>
>> I am not sure how these got changed in the EMR cluster's Flink 1.11 build. I
>> only noticed when I downloaded Flink 1.11 locally and compared the two; I am
>> not sure what they have done, but it is not what I was expecting.
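>>
>> For anyone hitting the same mismatch, a rough way to check it (a sketch only;
>> /usr/lib/flink is where EMR typically installs Flink, adjust for your setup):
>>
>> cd /usr/lib/flink
>> ls lib/ | grep -i log4j       # shows whether log4j 1.x or 2.x jars are bundled
>> ls plugins/                   # present in a stock Flink 1.11 distribution
>> head conf/log4j.properties    # log4j2 keys look like "rootLogger.level = ...",
>>                               # log4j 1.x keys look like "log4j.rootLogger = ..."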
>>
>>
>>
>> On Mon, Sep 7, 2020 at 5:27 PM aj <aj...@gmail.com> wrote:
>>
>>> @Yun Tang <my...@live.com>
>>>
>>> I have checked launch_container.sh, and it is using Flink's default
>>> log4j.properties file.
>>>
>>>
>>> *Jobmanager.out only prints the following, even when I enable DEBUG level:*
>>>
>>> WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
>>> will impact performance.
>>> 11:49:58.216 [flink-akka.actor.default-dispatcher-15] ERROR
>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>> - Failed to transfer file from TaskExecutor
>>> container_1599221902457_0014_01_000002.
>>> java.util.concurrent.CompletionException:
>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the
>>> TaskExecutor.
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> ~[?:?]
>>>         at java.lang.Thread.run(Thread.java:834) ~[?:?]
>>> Caused by: org.apache.flink.util.FlinkException: The file LOG does not
>>> exist on the TaskExecutor.
>>>         ... 5 more
>>> 11:49:58.226 [flink-akka.actor.default-dispatcher-15] ERROR
>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>> - Unhandled exception.
>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the
>>> TaskExecutor.
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> ~[?:?]
>>>         at java.lang.Thread.run(Thread.java:834) ~[?:?]
>>> 11:50:17.256 [flink-akka.actor.default-dispatcher-16] ERROR
>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>> - Failed to transfer file from TaskExecutor
>>> container_1599221902457_0014_01_000002.
>>> java.util.concurrent.CompletionException:
>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the
>>> TaskExecutor.
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> ~[?:?]
>>>         at java.lang.Thread.run(Thread.java:834) ~[?:?]
>>> Caused by: org.apache.flink.util.FlinkException: The file LOG does not
>>> exist on the TaskExecutor.
>>>         ... 5 more
>>> 11:50:17.257 [flink-akka.actor.default-dispatcher-16] ERROR
>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>> - Unhandled exception.
>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the
>>> TaskExecutor.
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> ~[?:?]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> ~[?:?]
>>>         at java.lang.Thread.run(Thread.java:834) ~[?:?]
>>>
>>>
>>> *Another thing: my Prometheus reporter is also giving an error:*
>>>
>>> WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
>>> will impact performance.
>>> 11:46:04.146 [main] ERROR org.apache.flink.runtime.metrics.ReporterSetup
>>> - Could not instantiate metrics reporter prom. Metrics might not be
>>> exposed/reported.
>>> java.lang.RuntimeException: Could not start PrometheusReporter HTTP
>>> server on any configured port. Ports: 9250
>>>         at
>>> org.apache.flink.metrics.prometheus.PrometheusReporter.open(PrometheusReporter.java:74)
>>> ~[flink-metrics-prometheus_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.runtime.metrics.ReporterSetup.createReporterSetup(ReporterSetup.java:130)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.runtime.metrics.ReporterSetup.lambda$setupReporters$1(ReporterSetup.java:239)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at java.util.Optional.ifPresent(Optional.java:183) ~[?:?]
>>>         at
>>> org.apache.flink.runtime.metrics.ReporterSetup.setupReporters(ReporterSetup.java:236)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.runtime.metrics.ReporterSetup.fromConfiguration(ReporterSetup.java:148)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:143)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:305)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.yarn.YarnTaskExecutorRunner.lambda$runTaskManagerSecurely$0(YarnTaskExecutorRunner.java:107)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>> ~[?:?]
>>>         at javax.security.auth.Subject.doAs(Subject.java:423) ~[?:?]
>>>         at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>>> ~[hadoop-common-3.2.1-amzn-1.jar:?]
>>>         at
>>> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.yarn.YarnTaskExecutorRunner.runTaskManagerSecurely(YarnTaskExecutorRunner.java:106)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>         at
>>> org.apache.flink.yarn.YarnTaskExecutorRunner.main(YarnTaskExecutorRunner.java:77)
>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>
>>> Even though I can see the node is listening on the port:
>>> netstat -l | grep "9250"
>>> tcp6       0      0 [::]:9250               [::]:*
>>>  LISTEN
>>>
>>>
>>> *And also, when I run the following, I do get metrics output:*
>>> *curl http://ip-172-20-5-234.ap-south-1.compute.internal:9250/metrics*
>>>
>>> flink_jobmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count{host="ip_172_20_5_234_ap_south_1_compute_internal",} 0.0
>>> # HELP flink_jobmanager_Status_JVM_Memory_Mapped_Count Count (scope: jobmanager_Status_JVM_Memory_Mapped)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Mapped_Count gauge
>>> flink_jobmanager_Status_JVM_Memory_Mapped_Count{host="ip_172_20_5_234_ap_south_1_compute_internal",} 0.0
>>> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Committed Committed (scope: jobmanager_Status_JVM_Memory_Heap)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Committed gauge
>>> flink_jobmanager_Status_JVM_Memory_Heap_Committed{host="ip_172_20_5_234_ap_south_1_compute_internal",} 4.69762048E8
>>> # HELP flink_jobmanager_job_numberOfInProgressCheckpoints numberOfInProgressCheckpoints (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_numberOfInProgressCheckpoints gauge
>>> flink_jobmanager_job_numberOfInProgressCheckpoints{job_id="f7e1d9f5ee7637142489b89fbd4381ad",host="ip_172_20_5_234_ap_south_1_compute_internal",job_name="test",} 0.0
>>> # HELP flink_jobmanager_Status_JVM_Memory_NonHeap_Max Max (scope: jobmanager_Status_JVM_Memory_NonHeap)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_NonHeap_Max gauge
>>> flink_jobmanager_Status_JVM_Memory_NonHeap_Max{host="ip_172_20_5_234_ap_south_1_compute_internal",} 7.80140544E8
>>> # HELP flink_jobmanager_job_restartingTime restartingTime (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_restartingTime gauge
>>> flink_jobmanager_job_restartingTime{job_id="f7e1d9f5ee7637142489b89fbd4381ad",host="ip_172_20_5_234_ap_south_1_compute_internal",job_name="test",} 0.0
>>>
>>> So I do not understand why this error is showing up in the task manager logs (one guess is noted below).
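>>>
>>> (If the JobManager runs on this same node, its reporter may already hold
>>> port 9250, which would explain both the netstat hit and the working curl
>>> while the TaskManager reporter still fails to bind. This is only a guess;
>>> the usual fix is to give the reporter a port range in flink-conf.yaml, e.g.:
>>>
>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>> metrics.reporter.prom.port: 9250-9260
>>>
>>> so that each JobManager/TaskManager process on a host picks the first free
>>> port in the range.)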
>>>
>>>
>>>
>>> *@Xintong*
>>>
>>> Could you check whether your jar files contain any log4j properties
>>> files? Accidentally including other log4j properties files in the classpath
>>> is one of the common causes of log issues.
>>>
>>> *I do not think this is the case, because I am using the same jar that was
>>> working on Flink 1.10.*
>>>
>>>
>>> On Mon, Sep 7, 2020 at 12:02 PM Xintong Song <to...@gmail.com>
>>> wrote:
>>>
>>>> Hi Anuj,
>>>>
>>>> Could you check whether your jar files contain any log4j properties
>>>> files? Accidentally including other log4j properties files in the classpath
>>>> is one of the common causes of log issues.
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Mon, Sep 7, 2020 at 12:36 PM Yun Tang <my...@live.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> You could check 'launch_container.sh' to see where the log4j.properties
>>>>> file is located and what is inside it, and you could also check
>>>>> 'jobmanager.out' or 'taskmanager.out' to see why the logs are not
>>>>> created.
>>>>> ------------------------------
>>>>> *From:* aj <aj...@gmail.com>
>>>>> *Sent:* Sunday, September 6, 2020 2:35
>>>>> *To:* Yun Tang <my...@live.com>
>>>>> *Cc:* user <us...@flink.apache.org>
>>>>> *Subject:* Re: Flink 1.11 TaskManagerLogFileHandler -Exception
>>>>>
>>>>> I am not able to understand what is happening. Please help me to
>>>>> resolve this issue so that I can migrate to Flink 1.11.
>>>>>
>>>>> On Sat, Sep 5, 2020 at 12:41 AM aj <aj...@gmail.com> wrote:
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> The location is correct, but these log files are not being generated.
>>>>>
>>>>> On Fri, Sep 4, 2020 at 11:00 PM Yun Tang <my...@live.com> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> Could you check the log4j.properties or related conf file used to
>>>>> generate logs to see if anything is unexpected?
>>>>> You could also log in to the machine and use 'ps -ef | grep java'
>>>>> to find the command that runs the taskmanager, as Flink prints the
>>>>> location of 'taskmanager.log' [1] via the property {log.file}.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/flink/blob/6b9cdd41743edd24a929074d62a57b84e7b2dd97/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/BootstrapTools.java#L419
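>>>>>
>>>>> For example, the relevant JVM argument can be extracted like this (the
>>>>> path in the comment is purely illustrative; YARN container directories
>>>>> differ per cluster):
>>>>>
>>>>> ps -ef | grep java | tr ' ' '\n' | grep 'log.file'
>>>>> # typical match: -Dlog.file=/path/to/yarn/containers/container_xxx/taskmanager.log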
>>>>>
>>>>> Best,
>>>>> Yun Tang
>>>>> ------------------------------
>>>>> *From:* aj <aj...@gmail.com>
>>>>> *Sent:* Friday, September 4, 2020 21:20
>>>>> *To:* user <us...@flink.apache.org>
>>>>> *Subject:* Re: Flink 1.11 TaskManagerLogFileHandler -Exception
>>>>>
>>>>> I have also checked the ports, and all ports from 0-65535 are open.
>>>>> Also, I do not see any taskmanager.log being generated under my
>>>>> container logs on the task machine.
>>>>>
>>>>> On Fri, Sep 4, 2020 at 2:58 PM aj <aj...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> I am trying to switch to Flink 1.11 with the new EMR release 6.1. I
>>>>> have created a 3-node EMR cluster with Flink 1.11. When I run my job
>>>>> it works fine; the only issue is that I am not able to see any logs in
>>>>> the job manager and task manager. I am seeing the below exception in
>>>>> the stdout of the job manager:
>>>>>
>>>>> 09:21:36.761 [flink-akka.actor.default-dispatcher-17] ERROR
>>>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>>>> - Failed to transfer file from TaskExecutor
>>>>> container_1599202660950_0009_01_000003.
>>>>> java.util.concurrent.CompletionException:
>>>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the
>>>>> TaskExecutor.
>>>>>     at
>>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>>>     at
>>>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>>>>> ~[?:1.8.0_252]
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>> ~[?:1.8.0_252]
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>> ~[?:1.8.0_252]
>>>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
>>>>> Caused by: org.apache.flink.util.FlinkException: The file LOG does not
>>>>> exist on the TaskExecutor.
>>>>>     ... 5 more
>>>>> 09:21:36.773 [flink-akka.actor.default-dispatcher-17] ERROR
>>>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>>>> - Unhandled exception.
>>>>> org.apache.flink.util.FlinkException: The file LOG does not exist on
>>>>> the TaskExecutor.
>>>>>     at
>>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742)
>>>>> ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>>>     at
>>>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>>>>> ~[?:1.8.0_252]
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>> ~[?:1.8.0_252]
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>> ~[?:1.8.0_252]
>>>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
>>>>> 09:21:57.399 [flink-akka.actor.default-dispatcher-15] ERROR
>>>>> org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler
>>>>> - Failed to transfer file from TaskExecutor
>>>>> container_1599202660950_0009_01_000003.
>>>>> TaskManagerLogFileHandler - Unhandled exception.
>>>>> org.apache.flink.util.FlinkException
>>>>>
>>>>> Is there any new setting I have to do in Flink 1.11? The same job is
>>>>> working fine on the previous version of EMR with Flink 1.10.
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anuj Jain
>>>>>
>>>>>
>>>>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anuj Jain
>>>>> Mob. : +91- 8588817877
>>>>> Skype : anuj.jain07
>>>>> <http://www.oracle.com/>
>>>>>
>>>>>
>>>>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anuj Jain
>>>>> Mob. : +91- 8588817877
>>>>> Skype : anuj.jain07
>>>>> <http://www.oracle.com/>
>>>>>
>>>>>
>>>>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anuj Jain
>>>>> Mob. : +91- 8588817877
>>>>> Skype : anuj.jain07
>>>>> <http://www.oracle.com/>
>>>>>
>>>>>
>>>>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>>>>
>>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Anuj Jain
>>> Mob. : +91- 8588817877
>>> Skype : anuj.jain07
>>> <http://www.oracle.com/>
>>>
>>>
>>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anuj Jain
>> Mob. : +91- 8588817877
>> Skype : anuj.jain07
>> <http://www.oracle.com/>
>>
>>
>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>
>

-- 
Thanks & Regards,
Anuj Jain
Mob. : +91- 8588817877
Skype : anuj.jain07
<http://www.oracle.com/>


<http://www.cse.iitm.ac.in/%7Eanujjain/>