Posted to users@zeppelin.apache.org by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2019/10/01 07:29:29 UTC

thrift.transport.TTransportException

Dear Zeppelin community,

I would like to ask for advice regarding an error I am having with thrift.

I am getting quite a lot of these errors while running my notebooks:

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

And these are the Spark driver application logs:
...
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
    SPARK_USER -> mansop
    PYTHONPATH -> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip

  command:
    LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \
      {{JAVA_HOME}}/bin/java \
      -server \
      -Xmx1024m \
      '-XX:+UseNUMA' \
      -Djava.io.tmpdir={{PWD}}/tmp \
      '-Dspark.history.ui.port=18081' \
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
      -XX:OnOutOfMemoryError='kill %p' \
      org.apache.spark.executor.CoarseGrainedExecutorBackend \
      --driver-url \
      spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
      --executor-id \
      <executorId> \
      --hostname \
      <hostname> \
      --cores \
      1 \
      --app-id \
      application_1568954689585_0052 \
      --user-class-path \
      file:$PWD/__app__.jar \
      1><LOG_DIR>/stdout \
      2><LOG_DIR>/stderr

  resources:
    __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar" } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
    __spark_conf__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip" } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
    sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
    log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties" } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
    pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
    __spark_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
    py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip" } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
    __hive_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size: 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC

===============================================================================
INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - Registering the ApplicationMaster
INFO [2019-09-30 10:42:37,454] ({main} Configuration.java[getConfResourceAsInputStream]:2756) - found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14} Logging.scala[logInfo]:54) - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - Submitted 2 unlocalized container requests.
INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000002 on host r640-1-12-mlx.mlx for executor with ID 1
INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000003 on host r640-1-13-mlx.mlx for executor with ID 2
INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) - Received 2 containers from YARN, launching executors on 2 of them.
INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62} Logging.scala[logInfo]:54) - Registering block manager r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1, r640-1-12-mlx.mlx, 33043, None)
INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11} Logging.scala[logInfo]:54) - Registering block manager r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2, r640-1-13-mlx.mlx, 34105, None)
INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2} SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2} Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at 127.0.0.1:36897
INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2} PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec: /home/mansop/anaconda2/bin/python
INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2} PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-12-mlx.mlx:42446
INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutting down all executors
INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
               at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
               at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
               at com.sun.proxy.$Proxy21.allocate(Unknown Source)
               at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
               at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
               at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
               at org.apache.hadoop.ipc.Client.call(Client.java:1443)
               at org.apache.hadoop.ipc.Client.call(Client.java:1353)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
               at com.sun.proxy.$Proxy20.allocate(Unknown Source)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
               ... 13 more
INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086

How can I continue troubleshooting in order to find out what this error means?

Thank you very much


Re: thrift.transport.TTransportException

Posted by James Srinivasan <ja...@gmail.com>.
I'm guessing you might have conflicting versions of libthrift on your
classpath.
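
One way to check (the paths below are assumptions based on the HDP 3.1
layout shown in the logs) is to list every thrift jar visible to Zeppelin
and to Spark and compare their versions, for example:

    find /usr/hdp/current/zeppelin-server /usr/hdp/current/spark2-client \
        -name '*thrift*.jar' 2>/dev/null

If more than one libthrift version ends up on the interpreter classpath,
that would be consistent with the broken thrift connection above.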

On Tue, 1 Oct 2019, 08:44 Jeff Zhang, <zj...@gmail.com> wrote:

> It looks like you are using pyspark. Could you try just starting the Scala
> Spark interpreter via `%spark`? First let's figure out whether it is
> related to pyspark.
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday,
> October 1, 2019 at 3:29 PM:
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: thrift.transport.TTransportException

Posted by Jeff Zhang <zj...@gmail.com>.
Another thing you can do is look at the YARN web UI or the ResourceManager
log. It is possible that YARN killed your driver because its memory usage
exceeded the limit.
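
If the application has already finished, you can pull the full aggregated
container logs with the YARN CLI (assuming log aggregation is enabled), for
example using the application id from the log quoted below:

    yarn logs -applicationId application_1570490897819_0016

A memory kill by YARN usually also leaves a NodeManager/ResourceManager
message along the lines of "... is running beyond physical memory limits ...
Killing container", which would confirm this theory.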

The following code seems to consume a large amount of memory (a rough size
estimate follows the snippet).

aList = []
for i in range(1000):
    aList.append(i**i*a)
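
For a rough sense of scale (a back-of-the-envelope sketch, not part of the
original note; it assumes `a = "bigword"`, i.e. a 7-character string, as in
the paragraph quoted further below, and only estimates sizes instead of
allocating them):

    a = "bigword"
    for i in range(12):
        repeats = i ** i                    # int * str repeats the string i**i times
        size_gb = repeats * len(a) / 1e9    # payload size in GB, ignoring object overhead
        print("i=%d  repeats=%d  ~%.1f GB" % (i, repeats, size_gb))

Already at i = 10 a single list element would be about 70 GB (10**10 * 7
bytes), so the driver would exceed any realistic container limit long before
i reaches 1000.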



Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Wednesday, October 9, 2019 at 11:58 AM:

> Got it,
>
>
>
> But I still can't see why the interpreter fails; logs below:
>
>
>
> DEBUG [2019-10-09 14:48:02,193] ({pool-6-thread-2}
> Interpreter.java[getProperty]:222) - key:
> zeppelin.PySparkInterpreter.precode, value: null
>
> DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2}
> RemoteInterpreterServer.java[jobRun]:632) - Script after hooks: a =
> "bigword"
>
> aList = []
>
> for i in range(1000):
>
>     aList.append(i**i*a)
>
> #print aList
>
>
>
> for word in aList:
>
>     print word
>
> __zeppelin__._displayhook()
>
> DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2}
> RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event:
> RemoteInterpreterEvent(type:META_INFOS, data:{"message":"Spark UI
> enabled","url":"http://r640-1-10-mlx.mlx:36423"})
>
> DEBUG [2019-10-09 14:48:02,195] ({pool-5-thread-2}
> RemoteInterpreterEventClient.java[pollEvent]:366) - Send event META_INFOS
>
> DEBUG [2019-10-09 14:48:04,720] ({Thread-33}
> RemoteInterpreterServer.java[onAppend]:789) - Output Append:
> /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179:
> UserWarning: Unable to load inline matplotlib backend, falling back to Agg
>
>
>
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33}
> RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event:
> RemoteInterpreterEvent(type:OUTPUT_APPEND,
> data:{"data":"/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179:
> UserWarning: Unable to load inline matplotlib backend, falling back to
> Agg\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
>
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33}
> RemoteInterpreterServer.java[onAppend]:789) - Output Append:
> warnings.warn("Unable to load inline matplotlib backend, "
>
>
>
> DEBUG [2019-10-09 14:48:04,722] ({pool-5-thread-2}
> RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
>
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33}
> RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event:
> RemoteInterpreterEvent(type:OUTPUT_APPEND, data:{"data":"
> warnings.warn(\"Unable to load inline matplotlib backend,
> \"\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
>
> DEBUG [2019-10-09 14:48:04,723] ({pool-5-thread-2}
> RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
>
> ERROR [2019-10-09 14:48:10,937] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> INFO [2019-10-09 14:48:10,981] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
>
> INFO [2019-10-09 14:48:11,002] ({shutdown-hook-0}
> AbstractConnector.java[doStop]:318) - Stopped Spark@3e8aac20
> {HTTP/1.1,[http/1.1]}{0.0.0.0:0}
>
> INFO [2019-10-09 14:48:11,006] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopped Spark web UI at
> http://r640-1-10-mlx.mlx:36423
>
> INFO [2019-10-09 14:48:11,057] ({dispatcher-event-loop-22}
> Logging.scala[logInfo]:54) - Driver requested a total number of 0
> executor(s).
>
> INFO [2019-10-09 14:48:11,059] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutting down all executors
>
> INFO [2019-10-09 14:48:11,061] ({dispatcher-event-loop-23}
> Logging.scala[logInfo]:54) - Asking each executor to shut down
>
> INFO [2019-10-09 14:48:11,070] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
>
> (serviceOption=None,
>
> services=List(),
>
> started=false)
>
> ERROR [2019-10-09 14:48:11,075] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1570490897819_0016_000001 doesn't exist in
> ApplicationMasterService cache.
>
>         at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>         at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>         at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
>         at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>
>         at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>
>         at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>
>         at
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>
>         at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>         at java.lang.reflect.Method.invoke(Method.java:498)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>
>         at com.sun.proxy.$Proxy21.allocate(Unknown Source)
>
>         at
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
>
>         at
> org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
>
>         at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
> Application attempt appattempt_1570490897819_0016_000001 doesn't exist in
> ApplicationMasterService cache.
>
>         at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>         at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>         at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>
>         at com.sun.proxy.$Proxy20.allocate(Unknown Source)
>
>         at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>
>         ... 13 more
>
> INFO [2019-10-09 14:48:11,084] ({Reporter} Logging.scala[logInfo]:54) -
> Final app status: FAILED, exitCode: 12, (reason: Application attempt
> appattempt_1570490897819_0016_000001 doesn't exist in
> ApplicationMasterService cache.
>
>         at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>         at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>         at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> )
>
> INFO [2019-10-09 14:48:11,086] ({dispatcher-event-loop-33}
> Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
>
> INFO [2019-10-09 14:48:11,119] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - MemoryStore cleared
>
> INFO [2019-10-09 14:48:11,120] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManager stopped
>
> INFO [2019-10-09 14:48:11,133] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManagerMaster stopped
>
> INFO [2019-10-09 14:48:11,138] ({dispatcher-event-loop-40}
> Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
>
> INFO [2019-10-09 14:48:11,159] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Successfully stopped SparkContext
>
> INFO [2019-10-09 14:48:11,163] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutdown hook called
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Wednesday, October 9, 2019 1:10 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> >>> I added ` log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the
> ` log4j_yarn_cluster.properties` file but nothing has changed, in fact
> the ` zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is
> not updated after running my notes
>
>
>
> In yarn-cluster mode, you should check the YARN application log file
> instead of the local log file.
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Wednesday,
> October 9, 2019 at 10:06 AM:
>
> Hi Jeff,
>
>
>
> Sorry for the late response.
>
>
>
> I ran yarn-cluster mode with this setup
>
>
>
> %spark2.conf
>
>
>
> master yarn
>
> spark.submit.deployMode cluster
>
> zeppelin.pyspark.python /home/mansop/anaconda2/bin/python
>
> spark.driver.memory 10g
>
>
>
> I added ` log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `
> log4j_yarn_cluster.properties` file but nothing has changed, in fact the
> ` zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not
> updated after running my notes
>
>
>
> This code works
>
>
>
> %pyspark
>
>
>
> print("Hello world!")
>
>
>
> However this one does not work:
>
>
>
> %pyspark
>
>
>
> a = "bigword"
>
> aList = []
>
> for i in range(1000):
>
>     aList.append(i**i*a)
>
> #print aList
>
>
>
> for word in aList:
>
>     print word
>
>
>
> which means I am still getting org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>
>
> and spark logs says:
>
> ERROR [2019-10-09 12:15:16,454] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> …
>
> ERROR [2019-10-09 12:15:16,609] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1570490897819_0013_000001 doesn't exist in
> ApplicationMasterService cache.
>
>
>
> Any idea?
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Friday, October 4, 2019 5:12 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> Then it looks like something is wrong with the python process. Do you run
> it in yarn-cluster mode or yarn-client mode?
>
> Try to add the following line to log4j.properties for yarn-client mode or
> log4j_yarn_cluster.properties for yarn-cluster mode
>
>
>
> log4j.logger.org.apache.zeppelin.interpreter=DEBUG
>
>
>
> And try it again; this time you will get more log info. I suspect the
> python process fails to start.
>
>
>
>
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, October 4,
> 2019 at 9:09 AM:
>
> Sorry for the late response,
>
>
>
> Yes, I have successfully run a few simple Scala snippets using the %spark
> interpreter in Zeppelin.
>
>
>
> What should I do next?
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Tuesday, October 1, 2019 5:44 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> It looks like you are using pyspark; could you try just starting the Scala
> Spark interpreter via `%spark`? First let's figure out whether it is
> related to pyspark.
>
>
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, October
> 1, 2019 at 3:29 PM:
>
> Dear Zeppelin community,
>
>
>
> I would like to ask for advice regarding an error I am having with thrift.
>
>
>
> I am getting quite a lot of these errors while running my notebooks
>
>
>
> org.apache.thrift.transport.TTransportException at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437) at
> org.apache.zeppelin.scheduler.Job.run(Job.java:188) at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> And this is the Spark driver application logs:
>
> …
>
>
> ===============================================================================
>
> YARN executor launch context:
>
>   env:
>
>     CLASSPATH ->
> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
>
>     SPARK_YARN_STAGING_DIR ->
> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
>
>     SPARK_USER -> mansop
>
>     PYTHONPATH ->
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip
>
>
>
>   command:
>
>
> LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH"
> \
>
>       {{JAVA_HOME}}/bin/java \
>
>       -server \
>
>       -Xmx1024m \
>
>       '-XX:+UseNUMA' \
>
>       -Djava.io.tmpdir={{PWD}}/tmp \
>
>       '-Dspark.history.ui.port=18081' \
>
>       -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
>
>       -XX:OnOutOfMemoryError='kill %p' \
>
>       org.apache.spark.executor.CoarseGrainedExecutorBackend \
>
>       --driver-url \
>
>       spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
>
>       --executor-id \
>
>       <executorId> \
>
>       --hostname \
>
>       <hostname> \
>
>       --cores \
>
>       1 \
>
>       --app-id \
>
>       application_1568954689585_0052 \
>
>       --user-class-path \
>
>       file:$PWD/__app__.jar \
>
>       1><LOG_DIR>/stdout \
>
>       2><LOG_DIR>/stderr
>
>
>
>   resources:
>
>     __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar"
> } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
>
>     __spark_conf__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip"
> } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
>
>     sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" }
> size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
>
>     log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties"
> } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
>
>     pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" }
> size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
>
>     __spark_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size:
> 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
>
>     py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip"
> } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
>
>     __hive_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size:
> 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC
>
>
>
>
> ===============================================================================
>
> INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133)
> - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
>
> INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) -
> Registering the ApplicationMaster
>
> INFO [2019-09-30 10:42:37,454] ({main}
> Configuration.java[getConfResourceAsInputStream]:2756) - found resource
> resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
>
> INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will
> request 2 executor container(s), each with 1 core(s) and 1408 MB memory
> (including 384 MB of overhead)
>
> INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14}
> Logging.scala[logInfo]:54) - ApplicationMaster registered as
> NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
>
> INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) -
> Submitted 2 unlocalized container requests.
>
> INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) -
> Started progress reporter thread with (heartbeat : 3000, initial allocation
> : 200) intervals
>
> INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000002 on host
> r640-1-12-mlx.mlx for executor with ID 1
>
> INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000003 on host
> r640-1-13-mlx.mlx for executor with ID 2
>
> INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) -
> Received 2 containers from YARN, launching executors on 2 of them.
>
> INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
>
> INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1,
> r640-1-12-mlx.mlx, 33043, None)
>
> INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
>
> INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling
> beginning after reached minRegisteredResourcesRatio: 0.8
>
> INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
>
> INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2,
> r640-1-13-mlx.mlx, 34105, None)
>
> INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2}
> SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
>
> INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2}
> Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at
> 127.0.0.1:36897
>
> INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2}
> PySparkInterpreter.java[createGatewayServerAndStartScript]:265) -
> pythonExec: /home/mansop/anaconda2/bin/python
>
> INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2}
> PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
>
> ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
>
> INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0}
> AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3
> {HTTP/1.1,[http/1.1]}{0.0.0.0:0}
>
> INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopped Spark web UI at
> http://r640-1-12-mlx.mlx:42446
>
> INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52}
> Logging.scala[logInfo]:54) - Driver requested a total number of 0
> executor(s).
>
> INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutting down all executors
>
> INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Asking each executor to shut down
>
> INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
>
> (serviceOption=None,
>
> services=List(),
>
> started=false)
>
> ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>                at
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>
>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>                at java.lang.reflect.Method.invoke(Method.java:498)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>
>                at com.sun.proxy.$Proxy21.allocate(Unknown Source)
>
>                at
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
>
>                at
> org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
>
>                at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
> Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>                at
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>
>                at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>
>                at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>
>                at com.sun.proxy.$Proxy20.allocate(Unknown Source)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>
>                ... 13 more
>
> INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) -
> Final app status: FAILED, exitCode: 12, (reason: Application attempt
> appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> )
>
> INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54}
> Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
>
> INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - MemoryStore cleared
>
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManager stopped
>
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManagerMaster stopped
>
> INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73}
> Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
>
> INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Successfully stopped SparkContext
>
> INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutdown hook called
>
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
>
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
>
> INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086
>
>
>
> How can I continue troubleshooting in order to find out what this error
> means?
>
>
>
> Thank you very much
>
>
>
>
>
>
>
> --
>
> Best Regards
>
> Jeff Zhang
>
>
>
>
>
> --
>
> Best Regards
>
> Jeff Zhang
>
>
>
>
>
> --
>
> Best Regards
>
> Jeff Zhang
>


-- 
Best Regards

Jeff Zhang

RE: thrift.transport.TTransportException

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
Got it,

But I still can’t see why the interpreter fails; logs below:

DEBUG [2019-10-09 14:48:02,193] ({pool-6-thread-2} Interpreter.java[getProperty]:222) - key: zeppelin.PySparkInterpreter.precode, value: null
DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2} RemoteInterpreterServer.java[jobRun]:632) - Script after hooks: a = "bigword"
aList = []
for i in range(1000):
    aList.append(i**i*a)
#print aList

for word in aList:
    print word
__zeppelin__._displayhook()
DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:META_INFOS, data:{"message":"Spark UI enabled","url":"http://r640-1-10-mlx.mlx:36423"})
DEBUG [2019-10-09 14:48:02,195] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event META_INFOS
DEBUG [2019-10-09 14:48:04,720] ({Thread-33} RemoteInterpreterServer.java[onAppend]:789) - Output Append: /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179: UserWarning: Unable to load inline matplotlib backend, falling back to Agg

DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:OUTPUT_APPEND, data:{"data":"/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179: UserWarning: Unable to load inline matplotlib backend, falling back to Agg\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterServer.java[onAppend]:789) - Output Append:   warnings.warn("Unable to load inline matplotlib backend, "

DEBUG [2019-10-09 14:48:04,722] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:OUTPUT_APPEND, data:{"data":"  warnings.warn(\"Unable to load inline matplotlib backend, \"\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
DEBUG [2019-10-09 14:48:04,723] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
ERROR [2019-10-09 14:48:10,937] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-10-09 14:48:10,981] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
INFO [2019-10-09 14:48:11,002] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@3e8aac20{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-10-09 14:48:11,006] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-10-mlx.mlx:36423
INFO [2019-10-09 14:48:11,057] ({dispatcher-event-loop-22} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-10-09 14:48:11,059] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutting down all executors
INFO [2019-10-09 14:48:11,061] ({dispatcher-event-loop-23} Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-10-09 14:48:11,070] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
ERROR [2019-10-09 14:48:11,075] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1570490897819_0016_000001 doesn't exist in ApplicationMasterService cache.
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy21.allocate(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
        at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1570490897819_0016_000001 doesn't exist in ApplicationMasterService cache.
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
        at org.apache.hadoop.ipc.Client.call(Client.java:1443)
        at org.apache.hadoop.ipc.Client.call(Client.java:1353)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy20.allocate(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
        ... 13 more
INFO [2019-10-09 14:48:11,084] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1570490897819_0016_000001 doesn't exist in ApplicationMasterService cache.
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-10-09 14:48:11,086] ({dispatcher-event-loop-33} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-10-09 14:48:11,119] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
INFO [2019-10-09 14:48:11,120] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
INFO [2019-10-09 14:48:11,133] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
INFO [2019-10-09 14:48:11,138] ({dispatcher-event-loop-40} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-10-09 14:48:11,159] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
INFO [2019-10-09 14:48:11,163] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Wednesday, October 9, 2019 1:10 PM
To: users
Subject: Re: thrift.transport.TTransportException

>>> I added `log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `log4j_yarn_cluster.properties` file but nothing has changed; in fact, the `zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not updated after running my notes

In yarn-cluster mode, you should check the YARN application log instead of the local log file.


Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Wednesday, October 9, 2019 at 10:06 AM:
Hi Jeff,

Sorry for the late response.

I ran yarn-cluster mode with this setup

%spark2.conf

master yarn
spark.submit.deployMode cluster
zeppelin.pyspark.python /home/mansop/anaconda2/bin/python
spark.driver.memory 10g
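
A minimal sanity check like the following, assuming the `sc` SparkContext that Zeppelin injects into %pyspark paragraphs, can confirm whether these settings were actually applied (this paragraph is purely illustrative and is not part of the failing note):

%pyspark

# illustrative check: verify the interpreter picked up the %spark2.conf values
print(sc.master)                                        # expected: yarn
print(sc.getConf().get("spark.submit.deployMode", ""))  # expected: cluster
print(sc.getConf().get("spark.driver.memory", ""))      # expected: 10g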

I added `log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `log4j_yarn_cluster.properties` file but nothing has changed; in fact, the `zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not updated after running my notes.

This code works

%pyspark

print("Hello world!")

However this one does not work:

%pyspark

a = "bigword"
aList = []
for i in range(1000):
    aList.append(i**i*a)
#print aList

for word in aList:
    print word

which means I am still getting org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)

and spark logs says:
ERROR [2019-10-09 12:15:16,454] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
…
ERROR [2019-10-09 12:15:16,609] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1570490897819_0013_000001 doesn't exist in ApplicationMasterService cache.

Any idea?

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 4, 2019 5:12 PM
To: users
Subject: Re: thrift.transport.TTransportException

Then it looks like something is wrong with the Python process. Do you run it in yarn-cluster mode or yarn-client mode?
Try to add the following line to log4j.properties for yarn-client mode or log4j_yarn_cluster.properties for yarn-cluster mode

log4j.logger.org.apache.zeppelin.interpreter=DEBUG

And try it again; this time you will get more log info. I suspect the Python process fails to start.
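
For reference, a minimal `log4j_yarn_cluster.properties` might look like the sketch below; the appender settings here are illustrative and may differ from the file shipped with your Zeppelin build, and only the last line is the addition:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
# raise Zeppelin interpreter logging to DEBUG (the line to add)
log4j.logger.org.apache.zeppelin.interpreter=DEBUG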




Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, October 4, 2019 at 9:09 AM:
Sorry for the late response,

Yes, I have successfully run a few simple Scala snippets using the %spark interpreter in Zeppelin.

What should I do next?

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Tuesday, October 1, 2019 5:44 PM
To: users
Subject: Re: thrift.transport.TTransportException

It looks like you are using pyspark; could you try just starting the Scala Spark interpreter via `%spark`? First let's figure out whether it is related to pyspark.



Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, October 1, 2019 at 3:29 PM:
Dear Zeppelin community,

I would like to ask for advice regarding an error I am having with thrift.

I am getting quite a lot of these errors while running my notebooks

org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228) at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437) at org.apache.zeppelin.scheduler.Job.run(Job.java:188) at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

And this is the Spark driver application logs:
…
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
    SPARK_USER -> mansop
    PYTHONPATH -> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip

  command:
    LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \
      {{JAVA_HOME}}/bin/java \
      -server \
      -Xmx1024m \
      '-XX:+UseNUMA' \
      -Djava.io.tmpdir={{PWD}}/tmp \
      '-Dspark.history.ui.port=18081' \
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
      -XX:OnOutOfMemoryError='kill %p' \
      org.apache.spark.executor.CoarseGrainedExecutorBackend \
      --driver-url \
      spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
      --executor-id \
      <executorId> \
      --hostname \
      <hostname> \
      --cores \
      1 \
      --app-id \
      application_1568954689585_0052 \
      --user-class-path \
      file:$PWD/__app__.jar \
      1><LOG_DIR>/stdout \
      2><LOG_DIR>/stderr

  resources:
    __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar" } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
    __spark_conf__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip" } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
    sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
    log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties" } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
    pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
    __spark_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
    py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip" } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
    __hive_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size: 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC

===============================================================================
INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - Registering the ApplicationMaster
INFO [2019-09-30 10:42:37,454] ({main} Configuration.java[getConfResourceAsInputStream]:2756) - found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14} Logging.scala[logInfo]:54) - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - Submitted 2 unlocalized container requests.
INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000002 on host r640-1-12-mlx.mlx for executor with ID 1
INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000003 on host r640-1-13-mlx.mlx for executor with ID 2
INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) - Received 2 containers from YARN, launching executors on 2 of them.
INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62} Logging.scala[logInfo]:54) - Registering block manager r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1, r640-1-12-mlx.mlx, 33043, None)
INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11} Logging.scala[logInfo]:54) - Registering block manager r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2, r640-1-13-mlx.mlx, 34105, None)
INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2} SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2} Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at 127.0.0.1:36897
INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2} PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec: /home/mansop/anaconda2/bin/python
INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2} PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-12-mlx.mlx:42446
INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutting down all executors
INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
               at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
               at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
               at com.sun.proxy.$Proxy21.allocate(Unknown Source)
               at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
               at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
               at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
               at org.apache.hadoop.ipc.Client.call(Client.java:1443)
               at org.apache.hadoop.ipc.Client.call(Client.java:1353)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
               at com.sun.proxy.$Proxy20.allocate(Unknown Source)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
               ... 13 more
INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086

How can I continue troubleshooting in order to find out what this error means?

Thank you very much

Re: thrift.transport.TTransportException

Posted by Jeff Zhang <zj...@gmail.com>.
>>> I added ` log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `
log4j_yarn_cluster.properties` file but nothing has changed, in fact the `
zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not
updated after running my notes

In yarn-cluster mode, you should check the YARN application log instead of the
local log file.
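For example, assuming YARN log aggregation is enabled on your cluster, something like the command below should dump the aggregated container logs (which include the interpreter output) for the application id shown in your driver log:

yarn logs -applicationId application_1570490897819_0013 | less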


Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Wednesday, October 9, 2019 at 10:06 AM:

> Hi Jeff,
>
>
>
> Sorry for the late response.
>
>
>
> I ran yarn-cluster mode with this setup
>
>
>
> %spark2.conf
>
>
>
> master yarn
>
> spark.submit.deployMode cluster
>
> zeppelin.pyspark.python /home/mansop/anaconda2/bin/python
>
> spark.driver.memory 10g
>
>
>
> I added ` log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `
> log4j_yarn_cluster.properties` file but nothing has changed, in fact the
> ` zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not
> updated after running my notes
>
>
>
> This code works
>
>
>
> %pyspark
>
>
>
> print("Hello world!")
>
>
>
> However this one does not work:
>
>
>
> %pyspark
>
>
>
> a = "bigword"
>
> aList = []
>
> for i in range(1000):
>
>     aList.append(i**i*a)
>
> #print aList
>
>
>
> for word in aList:
>
>     print word
>
>
>
> which means I am still getting org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>
>
> and spark logs says:
>
> ERROR [2019-10-09 12:15:16,454] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> …
>
> ERROR [2019-10-09 12:15:16,609] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1570490897819_0013_000001 doesn't exist in
> ApplicationMasterService cache.
>
>
>
> Any idea?
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Friday, October 4, 2019 5:12 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> Then it looks like something wrong with the python process. Do you run it
> in yarn-cluster mode or yarn-client mode ?
>
> Try to add the following line to log4j.properties for yarn-client mode or
> log4j_yarn_cluster.properties for yarn-cluster mode
>
>
>
> log4j.logger.org.apache.zeppelin.interpreter=DEBUG
>
>
>
> And try it again, this time you will get more log info, I suspect the
> python process fail to start
>
>
>
>
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, October 4, 2019 at 9:09 AM:
>
> Sorry for the late response,
>
>
>
> Yes, I have successfully ran few simple scala codes using %spark
> interpreter in zeppelin.
>
>
>
> What should I do next?
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Tuesday, October 1, 2019 5:44 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> It looks like you are using pyspark, could you try just start scala spark
> interpreter via `%spark` ? First let's figure out whether it is related
> with pyspark.
>
>
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, October 1, 2019 at 3:29 PM:
>
> Dear Zeppelin community,
>
>
>
> I would like to ask for advice in regards an error I am having with thrift.
>
>
>
> I am getting quite a lot of these errors while running my notebooks
>
>
>
> org.apache.thrift.transport.TTransportException at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437) at
> org.apache.zeppelin.scheduler.Job.run(Job.java:188) at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> And this is the Spark driver application logs:
>
> …
>
>
> ===============================================================================
>
> YARN executor launch context:
>
>   env:
>
>     CLASSPATH ->
> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
>
>     SPARK_YARN_STAGING_DIR ->
> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
>
>     SPARK_USER -> mansop
>
>     PYTHONPATH ->
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip
>
>
>
>   command:
>
>
> LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH"
> \
>
>       {{JAVA_HOME}}/bin/java \
>
>       -server \
>
>       -Xmx1024m \
>
>       '-XX:+UseNUMA' \
>
>       -Djava.io.tmpdir={{PWD}}/tmp \
>
>       '-Dspark.history.ui.port=18081' \
>
>       -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
>
>       -XX:OnOutOfMemoryError='kill %p' \
>
>       org.apache.spark.executor.CoarseGrainedExecutorBackend \
>
>       --driver-url \
>
>       spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
>
>       --executor-id \
>
>       <executorId> \
>
>       --hostname \
>
>       <hostname> \
>
>       --cores \
>
>       1 \
>
>       --app-id \
>
>       application_1568954689585_0052 \
>
>       --user-class-path \
>
>       file:$PWD/__app__.jar \
>
>       1><LOG_DIR>/stdout \
>
>       2><LOG_DIR>/stderr
>
>
>
>   resources:
>
>     __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar"
> } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
>
>     __spark_conf__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip"
> } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
>
>     sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" }
> size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
>
>     log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties"
> } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
>
>     pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" }
> size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
>
>     __spark_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size:
> 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
>
>     py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip"
> } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
>
>     __hive_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size:
> 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC
>
>
>
>
> ===============================================================================
>
> INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133)
> - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
>
> INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) -
> Registering the ApplicationMaster
>
> INFO [2019-09-30 10:42:37,454] ({main}
> Configuration.java[getConfResourceAsInputStream]:2756) - found resource
> resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
>
> INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will
> request 2 executor container(s), each with 1 core(s) and 1408 MB memory
> (including 384 MB of overhead)
>
> INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14}
> Logging.scala[logInfo]:54) - ApplicationMaster registered as
> NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
>
> INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) -
> Submitted 2 unlocalized container requests.
>
> INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) -
> Started progress reporter thread with (heartbeat : 3000, initial allocation
> : 200) intervals
>
> INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000002 on host
> r640-1-12-mlx.mlx for executor with ID 1
>
> INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000003 on host
> r640-1-13-mlx.mlx for executor with ID 2
>
> INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) -
> Received 2 containers from YARN, launching executors on 2 of them.
>
> INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
>
> INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1,
> r640-1-12-mlx.mlx, 33043, None)
>
> INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
>
> INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling
> beginning after reached minRegisteredResourcesRatio: 0.8
>
> INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
>
> INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2,
> r640-1-13-mlx.mlx, 34105, None)
>
> INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2}
> SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
>
> INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2}
> Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at
> 127.0.0.1:36897
>
> INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2}
> PySparkInterpreter.java[createGatewayServerAndStartScript]:265) -
> pythonExec: /home/mansop/anaconda2/bin/python
>
> INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2}
> PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
>
> ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
>
> INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0}
> AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3
> {HTTP/1.1,[http/1.1]}{0.0.0.0:0}
>
> INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopped Spark web UI at
> http://r640-1-12-mlx.mlx:42446
>
> INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52}
> Logging.scala[logInfo]:54) - Driver requested a total number of 0
> executor(s).
>
> INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutting down all executors
>
> INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Asking each executor to shut down
>
> INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
>
> (serviceOption=None,
>
> services=List(),
>
> started=false)
>
> ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>                at
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>
>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>                at java.lang.reflect.Method.invoke(Method.java:498)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>
>                at com.sun.proxy.$Proxy21.allocate(Unknown Source)
>
>                at
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
>
>                at
> org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
>
>                at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
> Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>                at
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>
>                at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>
>                at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>
>                at com.sun.proxy.$Proxy20.allocate(Unknown Source)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>
>                ... 13 more
>
> INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) -
> Final app status: FAILED, exitCode: 12, (reason: Application attempt
> appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> )
>
> INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54}
> Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
>
> INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - MemoryStore cleared
>
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManager stopped
>
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - BlockManagerMaster stopped
>
> INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73}
> Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
>
> INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Successfully stopped SparkContext
>
> INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutdown hook called
>
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
>
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
>
> INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Deleting directory
> /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086
>
>
>
> How can I continue troubleshooting in order to find out what this error
> means?
>
>
>
> Thank you very much
>
>
>


-- 
Best Regards

Jeff Zhang

RE: thrift.transport.TTransportException

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
Hi Jeff,

Sorry for the late response.

I ran in yarn-cluster mode with this setup:

%spark2.conf

master yarn
spark.submit.deployMode cluster
zeppelin.pyspark.python /home/mansop/anaconda2/bin/python
spark.driver.memory 10g

I added `log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `log4j_yarn_cluster.properties` file but nothing has changed; in fact the `zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not updated after running my notes.

This code works:

%pyspark

print("Hello world!")

However this one does not work:

%pyspark

a = "bigword"
aList = []
for i in range(1000):
    aList.append(i**i*a)
#print aList

for word in aList:
    print word

which means I am still getting org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)

and spark logs says:
ERROR [2019-10-09 12:15:16,454] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
…
ERROR [2019-10-09 12:15:16,609] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1570490897819_0013_000001 doesn't exist in ApplicationMasterService cache.

Any idea?

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 4, 2019 5:12 PM
To: users
Subject: Re: thrift.transport.TTransportException

Then it looks like something is wrong with the python process. Do you run it in yarn-cluster mode or yarn-client mode?
Try adding the following line to log4j.properties for yarn-client mode, or to log4j_yarn_cluster.properties for yarn-cluster mode:

log4j.logger.org.apache.zeppelin.interpreter=DEBUG

Then try it again; this time you will get more log info. I suspect the python process fails to start.




Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, October 4, 2019 at 9:09 AM:
Sorry for the late response,

Yes, I have successfully run a few simple Scala snippets using the %spark interpreter in Zeppelin.

What should I do next?

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com<ma...@gmail.com>]
Sent: Tuesday, October 1, 2019 5:44 PM
To: users
Subject: Re: thrift.transport.TTransportException

It looks like you are using pyspark; could you try just starting the Scala Spark interpreter via `%spark`? First let's figure out whether it is related to pyspark.
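For instance, a trivial paragraph such as the sketch below would be enough to confirm the Scala interpreter starts (sc is the SparkContext that Zeppelin provides):

%spark
// minimal check that the Scala Spark interpreter starts and the SparkContext is usable
println(sc.version)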



Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, October 1, 2019 at 3:29 PM:
Dear Zeppelin community,

I would like to ask for advice in regards an error I am having with thrift.

I am getting quite a lot of these errors while running my notebooks

org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228) at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437) at org.apache.zeppelin.scheduler.Job.run(Job.java:188) at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

And this is the Spark driver application logs:
…
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
    SPARK_USER -> mansop
    PYTHONPATH -> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip

  command:
    LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \
      {{JAVA_HOME}}/bin/java \
      -server \
      -Xmx1024m \
      '-XX:+UseNUMA' \
      -Djava.io.tmpdir={{PWD}}/tmp \
      '-Dspark.history.ui.port=18081' \
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
      -XX:OnOutOfMemoryError='kill %p' \
      org.apache.spark.executor.CoarseGrainedExecutorBackend \
      --driver-url \
      spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
      --executor-id \
      <executorId> \
      --hostname \
      <hostname> \
      --cores \
      1 \
      --app-id \
      application_1568954689585_0052 \
      --user-class-path \
      file:$PWD/__app__.jar \
      1><LOG_DIR>/stdout \
      2><LOG_DIR>/stderr

  resources:
    __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar" } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
    __spark_conf__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip" } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
    sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
    log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties" } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
    pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
    __spark_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
    py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip" } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
    __hive_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size: 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC

===============================================================================
INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - Registering the ApplicationMaster
INFO [2019-09-30 10:42:37,454] ({main} Configuration.java[getConfResourceAsInputStream]:2756) - found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14} Logging.scala[logInfo]:54) - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - Submitted 2 unlocalized container requests.
INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000002 on host r640-1-12-mlx.mlx for executor with ID 1
INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000003 on host r640-1-13-mlx.mlx for executor with ID 2
INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) - Received 2 containers from YARN, launching executors on 2 of them.
INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62} Logging.scala[logInfo]:54) - Registering block manager r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1, r640-1-12-mlx.mlx, 33043, None)
INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11} Logging.scala[logInfo]:54) - Registering block manager r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2, r640-1-13-mlx.mlx, 34105, None)
INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2} SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2} Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at 127.0.0.1:36897
INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2} PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec: /home/mansop/anaconda2/bin/python
INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2} PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-12-mlx.mlx:42446
INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutting down all executors
INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
               at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
               at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
               at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
               at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
               at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
               at com.sun.proxy.$Proxy21.allocate(Unknown Source)
               at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
               at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
               at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

               at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
               at org.apache.hadoop.ipc.Client.call(Client.java:1443)
               at org.apache.hadoop.ipc.Client.call(Client.java:1353)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
               at com.sun.proxy.$Proxy20.allocate(Unknown Source)
               at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
               ... 13 more
INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
               at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
               at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
               at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
               at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
               at java.security.AccessController.doPrivileged(Native Method)
               at javax.security.auth.Subject.doAs(Subject.java:422)
               at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
               at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086

How can I continue troubleshooting in order to find out what this error means?

Thank you very much


Re: thrift.transport.TTransportException

Posted by Jeff Zhang <zj...@gmail.com>.
Then it looks like something is wrong with the python process. Do you run it
in yarn-cluster mode or yarn-client mode?
Try adding the following line to log4j.properties for yarn-client mode, or to
log4j_yarn_cluster.properties for yarn-cluster mode:

log4j.logger.org.apache.zeppelin.interpreter=DEBUG

Then try it again; this time you will get more log info. I suspect the
python process fails to start.
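To make it concrete, a minimal sketch of the edit: the rootLogger and appender lines already present in log4j_yarn_cluster.properties stay as they are, and only the interpreter line is appended at the end.

# ... existing rootLogger / appender definitions, left unchanged ...
# extra DEBUG output from the Zeppelin interpreter classes
log4j.logger.org.apache.zeppelin.interpreter=DEBUG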




Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, October 4, 2019 at 9:09 AM:

> Sorry for the late response,
>
>
>
> Yes, I have successfully ran few simple scala codes using %spark
> interpreter in zeppelin.
>
>
>
> What should I do next?
>
>
>
> Manuel
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* Tuesday, October 1, 2019 5:44 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
>
>
> It looks like you are using pyspark, could you try just start scala spark
> interpreter via `%spark` ? First let's figure out whether it is related
> with pyspark.
>
>
>
>
>
>
>
> Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, October 1, 2019 at 3:29 PM:
>
> Dear Zeppelin community,
>
>
>
> I would like to ask for advice in regards an error I am having with thrift.
>
>
>
> I am getting quite a lot of these errors while running my notebooks
>
> […]


-- 
Best Regards

Jeff Zhang

RE: thrift.transport.TTransportException

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
Sorry for the late response,

Yes, I have successfully run a few simple Scala snippets using the %spark interpreter in Zeppelin.

What should I do next?

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Tuesday, October 1, 2019 5:44 PM
To: users
Subject: Re: thrift.transport.TTransportException

It looks like you are using pyspark. Could you try starting just the Scala Spark interpreter via `%spark`? First, let's figure out whether the problem is related to pyspark.



Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tue, Oct 1, 2019 at 3:29 PM:
Dear Zeppelin community,

[…]

Re: thrift.transport.TTransportException

Posted by Jeff Zhang <zj...@gmail.com>.
It looks like you are using pyspark. Could you try starting just the Scala Spark
interpreter via `%spark`? First, let's figure out whether the problem is related
to pyspark.
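
As a rough illustration of that test (a hypothetical sketch, not code taken from this thread), a minimal `%spark` paragraph that exercises only the Scala interpreter and its YARN application might look like the following; the specific job it runs is arbitrary:

%spark
// Minimal Scala-only sanity check: confirm the interpreter process starts,
// registers its YARN executors, and can run a trivial job end to end.
println(sc.version)
val n = sc.parallelize(1 to 1000).map(_ * 2).count()
println(s"counted $n elements")

If a paragraph like this keeps working while pyspark paragraphs still fail with the TTransportException, that would suggest the problem lies on the PySpark side (for example the python executable configured for the interpreter) rather than in the Spark-on-YARN setup itself.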



Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tue, Oct 1, 2019 at 3:29 PM:

> Dear Zeppelin community,
>
>
>
> I would like to ask for advice in regards an error I am having with thrift.
>
>
>
> I am getting quite a lot of these errors while running my notebooks
>
>
>
> org.apache.thrift.transport.TTransportException at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437) at
> org.apache.zeppelin.scheduler.Job.run(Job.java:188) at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> And this is the Spark driver application logs:
>
> …
>
>
> ===============================================================================
>
> YARN executor launch context:
>
>   env:
>
>     CLASSPATH ->
> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
>
>     SPARK_YARN_STAGING_DIR ->
> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
>
>     SPARK_USER -> mansop
>
>     PYTHONPATH ->
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip
>
>
>
>   command:
>
>
> LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH"
> \
>
>       {{JAVA_HOME}}/bin/java \
>
>       -server \
>
>       -Xmx1024m \
>
>       '-XX:+UseNUMA' \
>
>       -Djava.io.tmpdir={{PWD}}/tmp \
>
>       '-Dspark.history.ui.port=18081' \
>
>       -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
>
>       -XX:OnOutOfMemoryError='kill %p' \
>
>       org.apache.spark.executor.CoarseGrainedExecutorBackend \
>
>       --driver-url \
>
>       spark://CoarseGrainedScheduler@r640-1-12-mlx.mlx:35602 \
>
>       --executor-id \
>
>       <executorId> \
>
>       --hostname \
>
>       <hostname> \
>
>       --cores \
>
>       1 \
>
>       --app-id \
>
>       application_1568954689585_0052 \
>
>       --user-class-path \
>
>       file:$PWD/__app__.jar \
>
>       1><LOG_DIR>/stdout \
>
>       2><LOG_DIR>/stderr
>
>
>
>   resources:
>
>     __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar"
> } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
>
>     __spark_conf__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip"
> } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
>
>     sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" }
> size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
>
>     log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties"
> } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
>
>     pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
> port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" }
> size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
>
>     __spark_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size:
> 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
>
>     py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip"
> } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
>
>     __hive_libs__ -> resource { scheme: "hdfs" host:
> "gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
> "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size:
> 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC
>
>
>
>
> ===============================================================================
>
> INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133)
> - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
>
> INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) -
> Registering the ApplicationMaster
>
> INFO [2019-09-30 10:42:37,454] ({main}
> Configuration.java[getConfResourceAsInputStream]:2756) - found resource
> resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
>
> INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will
> request 2 executor container(s), each with 1 core(s) and 1408 MB memory
> (including 384 MB of overhead)
>
> INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14}
> Logging.scala[logInfo]:54) - ApplicationMaster registered as
> NettyRpcEndpointRef(spark://YarnAM@r640-1-12-mlx.mlx:35602)
>
> INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) -
> Submitted 2 unlocalized container requests.
>
> INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) -
> Started progress reporter thread with (heartbeat : 3000, initial allocation
> : 200) intervals
>
> INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000002 on host
> r640-1-12-mlx.mlx for executor with ID 1
>
> INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) -
> Launching container container_e01_1568954689585_0052_01_000003 on host
> r640-1-13-mlx.mlx for executor with ID 2
>
> INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) -
> Received 2 containers from YARN, launching executors on 2 of them.
>
> INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
>
> INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1,
> r640-1-12-mlx.mlx, 33043, None)
>
> INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9}
> Logging.scala[logInfo]:54) - Registered executor
> NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
>
> INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling
> beginning after reached minRegisteredResourcesRatio: 0.8
>
> INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2}
> Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
>
> INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11}
> Logging.scala[logInfo]:54) - Registering block manager
> r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2,
> r640-1-13-mlx.mlx, 34105, None)
>
> INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2}
> SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
>
> INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2}
> Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at
> 127.0.0.1:36897
>
> INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2}
> PySparkInterpreter.java[createGatewayServerAndStartScript]:265) -
> pythonExec: /home/mansop/anaconda2/bin/python
>
> INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2}
> PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
>
> ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler}
> SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
>
> INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
>
> INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0}
> AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3
> {HTTP/1.1,[http/1.1]}{0.0.0.0:0}
>
> INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopped Spark web UI at
> http://r640-1-12-mlx.mlx:42446
>
> INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52}
> Logging.scala[logInfo]:54) - Driver requested a total number of 0
> executor(s).
>
> INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Shutting down all executors
>
> INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51}
> Logging.scala[logInfo]:54) - Asking each executor to shut down
>
> INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0}
> Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices
>
> (serviceOption=None,
>
> services=List(),
>
> started=false)
>
> ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) -
> Exception from Reporter thread.
>
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
> ApplicationMasterService cache.
>
>                at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>
>                at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>
>                at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>
>                at java.security.AccessController.doPrivileged(Native
> Method)
>
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>
>                at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>
>                at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
>                at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>                at
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>
>                at
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>
>                at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>
>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>                at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>                at java.lang.reflect.Method.invoke(Method.java:498)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>
>                at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>
>                at com.sun.proxy.$Proxy21.allocate(Unknown Source)
>
>                at
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
>
>                at
> org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
>
>                at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
>
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
>                at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>                at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>                at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>                at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>                at java.security.AccessController.doPrivileged(Native Method)
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
>                at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>                at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>                at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>                at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>                at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>                at com.sun.proxy.$Proxy20.allocate(Unknown Source)
>                at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>                ... 13 more
>
> INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
>                at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
>                at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>                at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>                at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>                at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>                at java.security.AccessController.doPrivileged(Native Method)
>                at javax.security.auth.Subject.doAs(Subject.java:422)
>                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> )
>
> INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
> INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
> INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
> INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
> INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
> INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
> INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086
>
>
>
> How can I continue troubleshooting to find out what this error means?
>
>
>
> Thank you very much
>
>


-- 
Best Regards

Jeff Zhang