Posted to issues@spark.apache.org by "Saisai Shao (JIRA)" <ji...@apache.org> on 2016/08/08 01:48:20 UTC
[jira] [Comment Edited] (SPARK-16914) NodeManager crashes when Spark is registering executor information into leveldb
[ https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411159#comment-15411159 ]
Saisai Shao edited comment on SPARK-16914 at 8/8/16 1:48 AM:
-------------------------------------------------------------
So from your description, is this exception mainly due to a problem with disk1, where leveldb fails to write its data?
Maybe SPARK-14963 could address your problem: it uses the NM's recovery dir to store aux-service data. And I guess the NM will handle this disk failure if you configure multiple disks for the NM local dir.
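For reference, the NM recovery mechanism that SPARK-14963 builds on is enabled through yarn-site.xml; the property names below are the standard YARN ones, and the path is only an example:

```xml
<!-- yarn-site.xml: enable NodeManager recovery so aux-service state
     (including the shuffle service's leveldb) lives in the recovery dir -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value> <!-- example path -->
</property>
```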
was (Author: jerryshao):
So from your description, is this exception mainly due to a problem with disk1, where leveldb fails to write its data?
Maybe SPARK-16917 could address your problem: it uses the NM's recovery dir to store aux-service data. And I guess the NM will handle this disk failure if you configure multiple disks for the NM local dir.
> NodeManager crashes when Spark is registering executor information into leveldb
> ------------------------------------------------------------------------------
>
> Key: SPARK-16914
> URL: https://issues.apache.org/jira/browse/SPARK-16914
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.6.2
> Reporter: cen yuhai
>
> {noformat}
> Stack: [0x00007fb5b53de000,0x00007fb5b54df000], sp=0x00007fb5b54dcba8, free space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C [libc.so.6+0x896b1] memcpy+0x11
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
> j org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
> j org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
> j org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
> j org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
> j org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
> j org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
> J 8429 C2 org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V (100 bytes) @ 0x00007fb5f27ff6cc [0x00007fb5f27fdde0+0x18ec]
> J 8371 C2 org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (10 bytes) @ 0x00007fb5f242df20 [0x00007fb5f242de80+0xa0]
> J 6853 C2 io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (74 bytes) @ 0x00007fb5f215587c [0x00007fb5f21557e0+0x9c]
> J 5872 C2 io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (42 bytes) @ 0x00007fb5f2183268 [0x00007fb5f2183100+0x168]
> J 5849 C2 io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (158 bytes) @ 0x00007fb5f2191524 [0x00007fb5f218f5a0+0x1f84]
> J 5941 C2 org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (170 bytes) @ 0x00007fb5f220a230 [0x00007fb5f2209fc0+0x270]
> J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V (363 bytes) @ 0x00007fb5f264465c [0x00007fb5f2644140+0x51c]
> J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 0x00007fb5f26f6764 [0x00007fb5f26f63c0+0x3a4]
> j io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
> j java.lang.Thread.run()V+11
> v ~StubRoutines::call_stub
> {noformat}
> The relevant code in Spark is in ExternalShuffleBlockResolver:
> {code}
> /** Registers a new Executor with all the configuration we need to find its shuffle files. */
> public void registerExecutor(
>     String appId,
>     String execId,
>     ExecutorShuffleInfo executorInfo) {
>   AppExecId fullId = new AppExecId(appId, execId);
>   logger.info("Registered executor {} with {}", fullId, executorInfo);
>   try {
>     if (db != null) {
>       byte[] key = dbAppExecKey(fullId);
>       byte[] value = mapper.writeValueAsString(executorInfo).getBytes(Charsets.UTF_8);
>       db.put(key, value);
>     }
>   } catch (Exception e) {
>     logger.error("Error saving registered executors", e);
>   }
>   executors.put(fullId, executorInfo);
> }
> {code}
> There is a problem with disk1.
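One detail worth noting: the crash happens inside native leveldb code (the memcpy frame in the stack trace above), so the try/catch in registerExecutor never runs and the failing disk takes down the whole NodeManager JVM. Below is a minimal sketch, assuming several NM local dirs are available, of probing each candidate directory with a real write before handing it to leveldb; pickWritableDir is a hypothetical helper, not a Spark or YARN API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DiskProbe {
    /** Hypothetical helper (not Spark/YARN API): return the first candidate
     *  directory that accepts a real write, or null if all disks look bad. */
    static Path pickWritableDir(List<Path> candidates) {
        for (Path dir : candidates) {
            try {
                Files.createDirectories(dir);
                Path probe = Files.createTempFile(dir, "probe", ".tmp");
                Files.delete(probe);
                return dir; // this disk accepted a write
            } catch (IOException e) {
                // disk1-style failure: skip this directory, try the next one
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempDirectory("registeredExecutors");
        // Simulate a failed disk: a path whose parent is a plain file,
        // so any create/write under it fails with an IOException.
        Path blocker = Files.createTempFile("blocker", ".tmp");
        Path bad = blocker.resolve("leveldb");
        System.out.println(good.equals(pickWritableDir(List.of(bad, good))));
    }
}
```

Probing with an actual file write (rather than File.canWrite) matters here, because a dying disk often still reports writable permissions while failing real I/O.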
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org