Posted to issues@spark.apache.org by "cen yuhai (JIRA)" <ji...@apache.org> on 2016/08/05 10:52:20 UTC

[jira] [Updated] (SPARK-16914) NodeManager crashes when Spark is registering executor information into LevelDB

     [ https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cen yuhai updated SPARK-16914:
------------------------------
    Description: 
```
Stack: [0x00007fb5b53de000,0x00007fb5b54df000],  sp=0x00007fb5b54dcba8,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x896b1]  memcpy+0x11

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
j  org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
j  org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
J 8429 C2 org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V (100 bytes) @ 0x00007fb5f27ff6cc [0x00007fb5f27fdde0+0x18ec]
J 8371 C2 org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (10 bytes) @ 0x00007fb5f242df20 [0x00007fb5f242de80+0xa0]
J 6853 C2 io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (74 bytes) @ 0x00007fb5f215587c [0x00007fb5f21557e0+0x9c]
J 5872 C2 io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (42 bytes) @ 0x00007fb5f2183268 [0x00007fb5f2183100+0x168]
J 5849 C2 io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (158 bytes) @ 0x00007fb5f2191524 [0x00007fb5f218f5a0+0x1f84]
J 5941 C2 org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (170 bytes) @ 0x00007fb5f220a230 [0x00007fb5f2209fc0+0x270]
J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V (363 bytes) @ 0x00007fb5f264465c [0x00007fb5f2644140+0x51c]
J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 0x00007fb5f26f6764 [0x00007fb5f26f63c0+0x3a4]
j  io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

```
The target code in Spark is in ExternalShuffleBlockResolver:
```
  /** Registers a new Executor with all the configuration we need to find its shuffle files. */
  public void registerExecutor(
      String appId,
      String execId,
      ExecutorShuffleInfo executorInfo) {
    AppExecId fullId = new AppExecId(appId, execId);
    logger.info("Registered executor {} with {}", fullId, executorInfo);
    try {
      if (db != null) {
        byte[] key = dbAppExecKey(fullId);
        byte[] value = mapper.writeValueAsString(executorInfo).getBytes(Charsets.UTF_8);
        db.put(key, value);
      }
    } catch (Exception e) {
      logger.error("Error saving registered executors", e);
    }
    executors.put(fullId, executorInfo);
  }

```
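For readers who want to exercise the failing write path outside the shuffle service: the registration above boils down to serializing the executor info with Jackson and handing the resulting bytes to leveldbjni, whose JniDB.put descends into NativeDB.put and the native memcpy at the top of the crash stack. Below is a minimal standalone sketch of that path; the database location, key string, and value object are placeholders for illustration, not the actual key/value layout the shuffle service uses.

```
import java.io.File;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class LevelDbPutSketch {
  public static void main(String[] args) throws Exception {
    Options options = new Options();
    options.createIfMissing(true);

    // Placeholder location; the real registry file lives under the NodeManager's local dirs.
    DB db = JniDBFactory.factory.open(new File("/tmp/registeredExecutors.ldb"), options);
    try {
      ObjectMapper mapper = new ObjectMapper();

      // Stand-ins for dbAppExecKey(fullId) and the serialized ExecutorShuffleInfo.
      byte[] key = "AppExecShuffleInfo;app-20160805-0001;exec-1".getBytes(StandardCharsets.UTF_8);
      byte[] value = mapper.writeValueAsString(new String[] {"/data1/yarn/local"})
          .getBytes(StandardCharsets.UTF_8);

      // Same call shape as db.put(key, value) in registerExecutor; this is the path that
      // reaches NativeDB.put and the native memcpy shown in the crash stack.
      db.put(key, value);
    } finally {
      db.close();
    }
  }
}
```

Note that the fault is reported in native code (memcpy under NativeDB.put), not as a Java exception, so the try/catch around db.put in registerExecutor cannot intercept it and the whole NodeManager JVM goes down.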

> NodeManager crashes when Spark is registering executor information into LevelDB
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-16914
>                 URL: https://issues.apache.org/jira/browse/SPARK-16914
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.6.2
>            Reporter: cen yuhai
>


