You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@crail.apache.org by David Crespi <da...@storedgesystems.com> on 2019/06/27 23:09:26 UTC

Setting up storage class 1 and 2

Hi,
I’m trying to integrate the storage classes and I’m hitting another issue when running terasort and just
using the crail-shuffle with HDFS as the tmp storage.  The program just sits, after the following
message:
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: closed
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining connections 0

During this run, I’ve removed the two crail nvmf (class 1 and 2) containers from the server, and I’m only running
the namenode and a rdma storage class 1 datanode.  My spark configuration is also now only looking at
the rdma class.  It looks as though it’s picking up the NVMf IP and port in the INFO messages seen below.
I must be configuring something wrong, but I’ve not been able to track it down.  Any thoughts?


************************************
         TeraSort
************************************
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
19/06/27 15:59:07 INFO SecurityManager: Changing view acls to: hduser
19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to: hduser
19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups to:
19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups to:
19/06/27 15:59:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hduser); groups with view permissions: Set(); users  with modify permissions: Set(hduser); groups with modify permissions: Set()
19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the default logging framework
19/06/27 15:59:08 DEBUG InternalThreadLocalMap: -Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
19/06/27 15:59:08 DEBUG InternalThreadLocalMap: -Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup: -Dio.netty.eventLoopThreads: 112
19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe: false
19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
19/06/27 15:59:08 DEBUG PlatformDependent0: sun.misc.Unsafe.theUnsafe: available
19/06/27 15:59:08 DEBUG PlatformDependent0: sun.misc.Unsafe.copyMemory: available
19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address: available
19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer constructor: available
19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned: available, true
19/06/27 15:59:08 DEBUG PlatformDependent0: jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior to Java9
19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.DirectByteBuffer.<init>(long, int): available
19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe: available
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64 (sun.arch.data.model)
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.noPreferDirect: false
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.maxDirectMemory: 1029177344 bytes
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.uninitializedArrayAllocationThreshold: -1
19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner(): available
19/06/27 15:59:08 DEBUG NioEventLoop: -Dio.netty.noKeySetOptimization: false
19/06/27 15:59:08 DEBUG NioEventLoop: -Dio.netty.selectorAutoRebuildThreshold: 512
19/06/27 15:59:08 DEBUG PlatformDependent: org.jctools-core.MpscChunkedArrayQueue: available
19/06/27 15:59:08 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.level: simple
19/06/27 15:59:08 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.targetRecords: 4
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numHeapArenas: 9
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numDirectArenas: 10
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.pageSize: 8192
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxOrder: 11
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.chunkSize: 16777216
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.tinyCacheSize: 512
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.smallCacheSize: 256
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.normalCacheSize: 64
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxCachedBufferCapacity: 32768
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.cacheTrimInterval: 8192
19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.useCacheForAllThreads: true
19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236 (auto-detected)
19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses: false
19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo, 127.0.0.1)
19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId: 02:42:ac:ff:fe:1b:00:02 (auto-detected)
19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type: pooled
19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.threadLocalDirectBufferSize: 65536
19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.maxThreadLocalCharBufferSize: 16384
19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on port: 36915
19/06/27 15:59:08 INFO Utils: Successfully started service 'sparkDriver' on port 36915.
19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class org.apache.spark.serializer.KryoSerializer
19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
19/06/27 15:59:08 DEBUG OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.161:4040
19/06/27 15:59:08 INFO SparkContext: Added JAR file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar at spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1561676348562
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://master:7077...
19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new connection to master/192.168.3.13:7077
19/06/27 15:59:08 DEBUG AbstractByteBuf: -Dio.netty.buffer.bytebuf.checkAccessible: true
19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to master/192.168.3.13:7077 successful, running bootstraps...
19/06/27 15:59:08 INFO TransportClientFactory: Successfully created connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in bootstraps)
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.maxCapacityPerThread: 32768
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.maxSharedCapacityFactor: 2
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 16
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20190627155908-0005
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/0 on worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2 core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2 core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/1 on worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2 core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2 core(s), 1024.0 MB RAM
19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on port: 39189
19/06/27 15:59:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39189.
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/2 on worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2 core(s)
19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on master:39189
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2 core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/3 on worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2 core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2 core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/4 on worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2 core(s)
19/06/27 15:59:08 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2 core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/0 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/3 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/4 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/1 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/2 is now RUNNING
19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for master
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block manager master:39189 with 366.3 MB RAM, BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
19/06/27 15:59:09 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@3ed03652
19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 288.9 KB, free 366.0 MB)
19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally took  115 ms
19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0 without replication took  117 ms
19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block broadcast_0_piece0
19/06/27 15:59:10 DEBUG BlockManager: Told master about block broadcast_0_piece0
19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0 locally took  6 ms
19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0_piece0 without replication took  6 ms
19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at TeraSort.scala:60
19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
19/06/27 15:59:10 DEBUG Client: Connecting to NameNode-1/192.168.3.7:54310
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: starting, having connections 1
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #0
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #0
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took 31ms
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #1
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #1
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get FileStatuses: 134
19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process : 2
19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 139
19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1; output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String, Boolean) constructor
19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0: masked=rwxr-xr-x
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #2
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #2
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda: $anonfun$write$1
19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure ($anonfun$write$1) is now cleaned +++
19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:78
19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version 400
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose false
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart true
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer org.apache.spark.serializer.CrailSparkSerializer
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity true
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.outstanding 1
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.storageclass 0
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.broadcast.storageclass 0
19/06/27 15:59:10 INFO crail: creating singleton crail file system
19/06/27 15:59:10 INFO crail: crail.version 3101
19/06/27 15:59:10 INFO crail: crail.directorydepth 16
19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
19/06/27 15:59:10 INFO crail: crail.cachelimit 0
19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
19/06/27 15:59:10 INFO crail: crail.user crail
19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
19/06/27 15:59:10 INFO crail: crail.debug true
19/06/27 15:59:10 INFO crail: crail.statistics true
19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
19/06/27 15:59:10 INFO crail: crail.slicesize 65536
19/06/27 15:59:10 INFO crail: crail.singleton true
19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
19/06/27 15:59:10 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
19/06/27 15:59:10 INFO crail: crail.locationmap
19/06/27 15:59:10 INFO crail: crail.namenode.address crail://192.168.1.164:9060
19/06/27 15:59:10 INFO crail: crail.namenode.blockselection roundrobin
19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
19/06/27 15:59:10 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
19/06/27 15:59:10 INFO crail: crail.namenode.log
19/06/27 15:59:10 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
19/06/27 15:59:10 INFO crail: crail.storage.classes 1
19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0, bufferCount 1024
19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit 4294967296
19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize 1073741824
19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma
19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
19/06/27 15:59:10 INFO crail: connected to namenode(s) /192.168.1.164:9060
19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
19/06/27 15:59:10 INFO crail: createNode: name /spark, type DIRECTORY, storageAffinity 0, locationAffinity 0
19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0, streamId 1, isDir true, writeHint 0
19/06/27 15:59:10 INFO crail: passive data client
19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
19/06/27 15:59:10 INFO disni: jverbs jni version 32
19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16
19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32
19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128
19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48, native size 48
19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16, native size 16
19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size 40, native size 40
19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48, native size 48
19/06/27 15:59:10 INFO disni: createEventChannel, objId 139811924587312
19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 64
19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
19/06/27 15:59:10 INFO disni: createId, id 139811924676432
19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
19/06/27 15:59:10 INFO disni: resolveAddr, addres /192.168.3.100:4420
19/06/27 15:59:10 INFO disni: resolveRoute, id 0
19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
19/06/27 15:59:10 INFO disni: setting up protection domain, context 467, pd 1
19/06/27 15:59:10 INFO disni: setting up cq processor
19/06/27 15:59:10 INFO disni: new endpoint CQ processor
19/06/27 15:59:10 INFO disni: createCompChannel, context 139810647883744
19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe 64
19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192, send_wr size 32, recv_wr_size 32
19/06/27 15:59:10 INFO disni: connect, id 0
19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress /192.168.3.13:43273, dstAddress /192.168.3.100:4420
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.11:35854) with ID 0
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.12:44312) with ID 1
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.8:34774) with ID 4
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.9:58808) with ID 2
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.11
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0, 192.168.3.11, 41919, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.12
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1, 192.168.3.12, 46697, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.8
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4, 192.168.3.8, 37281, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.9
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2, 192.168.3.9, 43857, None)
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.10:40100) with ID 3
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.10
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3, 192.168.3.10, 38527, None)
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: closed
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining connections 0


Regards,

           David


RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
This morning I only started the rdma datanode, and only enabled the namenode with the rdma storage Tier.

It looks like it found the right combination now for IP/port, but it still hangs at the exact same place.



Could this be the same problem as the other one I’ve been having (daemonization of a thread)?  I’ve included

a stack trace of this hang as well (below).



19/06/28 07:38:21 INFO disni: got event type + UNKNOWN, srcAddress /192.168.3.13:53911, dstAddress /192.168.3.100:50020

19/06/28 07:38:21 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.9:57008) with ID 2

19/06/28 07:38:21 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.10:60204) with ID 3

19/06/28 07:38:21 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.12:43740) with ID 1

19/06/28 07:38:21 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.11:39758) with ID 0

19/06/28 07:38:21 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.8:52590) with ID 4

19/06/28 07:38:21 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.9

19/06/28 07:38:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.9:33673 with 366.3 MB RAM, BlockManagerId(2, 192.168.3.9, 33673, None)

19/06/28 07:38:21 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.12

19/06/28 07:38:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.12:46629 with 366.3 MB RAM, BlockManagerId(1, 192.168.3.12, 46629, None)

19/06/28 07:38:21 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.10

19/06/28 07:38:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.10:41079 with 366.3 MB RAM, BlockManagerId(3, 192.168.3.10, 41079, None)

19/06/28 07:38:21 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.11

19/06/28 07:38:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.11:44679 with 366.3 MB RAM, BlockManagerId(0, 192.168.3.11, 44679, None)

19/06/28 07:38:21 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.8

19/06/28 07:38:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.8:45991 with 366.3 MB RAM, BlockManagerId(4, 192.168.3.8, 45991, None)

19/06/28 07:38:30 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: closed

19/06/28 07:38:30 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining connections 0





jstack 712

2019-06-28 07:41:54

Full thread dump OpenJDK 64-Bit Server VM (25.212-b03 mixed mode):



"Attach Listener" #132 daemon prio=9 os_prio=0 tid=0x00007fe630001000 nid=0x43f waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"SparkUI-131" #131 daemon prio=5 os_prio=0 tid=0x00007fe460001000 nid=0x3b4 waiting on condition [0x00007fe6911c8000]

   java.lang.Thread.State: TIMED_WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c0487220> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-8" #128 daemon prio=5 os_prio=0 tid=0x00007fe4fc00f800 nid=0x3ae runnable [0x00007fe44d8d2000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c048cf28> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c048e028> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c048df50> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-7" #127 daemon prio=5 os_prio=0 tid=0x00007fe4fc00e000 nid=0x3ad runnable [0x00007fe44d9d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c0490888> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1782ae8> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c1782a10> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-6" #126 daemon prio=5 os_prio=0 tid=0x00007fe4fc00c000 nid=0x3ac runnable [0x00007fe44fad6000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c1785348> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1786448> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c1786370> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-5" #125 daemon prio=5 os_prio=0 tid=0x00007fe4fc00a800 nid=0x3ab runnable [0x00007fe44fbd7000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c1788ca8> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1825b38> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c1825a60> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"Thread-19" #124 daemon prio=5 os_prio=0 tid=0x00007fe4a81b4800 nid=0x3aa runnable [0x00007fe454106000]

   java.lang.Thread.State: RUNNABLE

        at com.ibm.disni.verbs.impl.NativeDispatcher._getCmEvent(Native Method)

        at com.ibm.disni.verbs.impl.RdmaCmNat.getCmEvent(RdmaCmNat.java:193)

        at com.ibm.disni.verbs.RdmaEventChannel.getCmEvent(RdmaEventChannel.java:75)

        at com.ibm.disni.RdmaCmProcessor.run(RdmaCmProcessor.java:69)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-4" #123 daemon prio=5 os_prio=0 tid=0x00007fe4fc008800 nid=0x3a9 runnable [0x00007fe49075d000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c1828398> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1829498> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c18293c0> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"threadDeathWatcher-6-1" #122 daemon prio=1 os_prio=0 tid=0x00007fe464003800 nid=0x3a8 waiting on condition [0x00007fe49185f000]

   java.lang.Thread.State: TIMED_WAITING (sleeping)

        at java.lang.Thread.sleep(Native Method)

        at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:152)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-3" #121 daemon prio=5 os_prio=0 tid=0x00007fe4fc007800 nid=0x3a7 runnable [0x00007fe491960000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c1940730> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1941830> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c1941758> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-2" #120 daemon prio=5 os_prio=0 tid=0x00007fe4fc005800 nid=0x3a6 runnable [0x00007fe491a61000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c1928738> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c1929838> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c1929760> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #117 daemon prio=5 os_prio=0 tid=0x00007fe474163000 nid=0x3a3 in Object.wait() [0x00007fe6901be000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000ff70c888> (a java.lang.ref.ReferenceQueue$Lock)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

        - locked <0x00000000ff70c888> (a java.lang.ref.ReferenceQueue$Lock)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)

        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3063)

        at java.lang.Thread.run(Thread.java:748)



"element-tracking-store-worker" #116 daemon prio=5 os_prio=0 tid=0x00007fe478012800 nid=0x3a2 waiting on condition [0x00007fe6904bf000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c025d728> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"spark-listener-group-executorManagement" #102 daemon prio=5 os_prio=0 tid=0x00007fe6fc6ed800 nid=0x3a1 waiting on condition [0x00007fe6905c0000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c0461920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:97)

        at org.apache.spark.scheduler.AsyncEventQueue$$Lambda$528/1418119693.apply$mcJ$sp(Unknown Source)

        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)

        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)

        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:83)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2$$Lambda$527/1354319042.apply$mcV$sp(Unknown Source)

        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:83)



"spark-listener-group-appStatus" #25 daemon prio=5 os_prio=0 tid=0x00007fe6fc6ec000 nid=0x3a0 waiting on condition [0x00007fe6906c1000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c04624e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:97)

        at org.apache.spark.scheduler.AsyncEventQueue$$Lambda$528/1418119693.apply$mcJ$sp(Unknown Source)

        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)

        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)

        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:83)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2$$Lambda$527/1354319042.apply$mcV$sp(Unknown Source)

        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)

        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:83)



"context-cleaner-periodic-gc" #115 daemon prio=5 os_prio=0 tid=0x00007fe6fcb6a000 nid=0x39f waiting on condition [0x00007fe6907c2000]

   java.lang.Thread.State: TIMED_WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000ff758a28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"Spark Context Cleaner" #114 daemon prio=5 os_prio=0 tid=0x00007fe6fcb68000 nid=0x39e in Object.wait() [0x00007fe6908c3000]

   java.lang.Thread.State: TIMED_WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

        - locked <0x00000000ff6f09c8> (a java.lang.ref.ReferenceQueue$Lock)

        at org.apache.spark.ContextCleaner.$anonfun$keepCleaning$1(ContextCleaner.scala:181)

        at org.apache.spark.ContextCleaner$$Lambda$523/1657875753.apply$mcV$sp(Unknown Source)

        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)

        at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:179)

        at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73)



"shuffle-server-5-1" #113 daemon prio=5 os_prio=0 tid=0x00007fe6ff4d3000 nid=0x39d runnable [0x00007fe690bc4000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c046ecf8> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c046ed10> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c046ecb0> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"rpc-client-1-1" #112 daemon prio=5 os_prio=0 tid=0x00007fe498005800 nid=0x39b runnable [0x00007fe690ec5000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c0464cf0> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c0464d08> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c0464ca8> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"appclient-registration-retry-thread" #110 daemon prio=5 os_prio=0 tid=0x00007fe5b4003800 nid=0x399 waiting on condition [0x00007fe6910c7000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c050cd28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"driver-revive-thread" #108 daemon prio=5 os_prio=0 tid=0x00007fe5c8001800 nid=0x397 waiting on condition [0x00007fe6912c9000]

   java.lang.Thread.State: TIMED_WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c048a1f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dag-scheduler-event-loop" #107 daemon prio=5 os_prio=0 tid=0x00007fe6ff4b6800 nid=0x396 in Object.wait() [0x00007fe6913c9000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000ff7b2150> (a org.apache.crail.storage.rdma.client.RdmaStoragePassiveEndpoint)

        at java.lang.Object.wait(Object.java:502)

        at com.ibm.disni.RdmaEndpoint.connect(RdmaEndpoint.java:128)

        - locked <0x00000000ff7b2150> (a org.apache.crail.storage.rdma.client.RdmaStoragePassiveEndpoint)

        at org.apache.crail.storage.rdma.client.RdmaStoragePassiveGroup.createEndpoint(RdmaStoragePassiveGroup.java:68)

        at org.apache.crail.storage.rdma.client.RdmaStoragePassiveGroup.createEndpoint(RdmaStoragePassiveGroup.java:51)

        at org.apache.crail.storage.rdma.RdmaStorageClient.createEndpoint(RdmaStorageClient.java:83)

        at org.apache.crail.utils.EndpointCache$StorageEndpointCache.getDataEndpoint(EndpointCache.java:130)

        - locked <0x00000000ff7b2258> (a java.lang.Object)

        at org.apache.crail.utils.EndpointCache.getDataEndpoint(EndpointCache.java:69)

        at org.apache.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:230)

        at org.apache.crail.core.CoreStream.dataOperation(CoreStream.java:142)

        at org.apache.crail.core.CoreOutputStream.write(CoreOutputStream.java:63)

        at org.apache.crail.core.DirectoryOutputStream.writeRecord(DirectoryOutputStream.java:50)

        at org.apache.crail.core.CoreDataStore.getSyncOperation(CoreDataStore.java:647)

        at org.apache.crail.core.CoreDataStore._delete(CoreDataStore.java:335)

        at org.apache.crail.core.DeleteNodeFuture.process(CoreMetaDataOperation.java:210)

        at org.apache.crail.core.DeleteNodeFuture.process(CoreMetaDataOperation.java:196)

        at org.apache.crail.core.CoreMetaDataOperation.get(CoreMetaDataOperation.java:84)

        at org.apache.spark.storage.CrailDispatcher.org$apache$spark$storage$CrailDispatcher$$init(CrailDispatcher.scala:136)

        at org.apache.spark.storage.CrailDispatcher$.get(CrailDispatcher.scala:662)

        - locked <0x00000000ff9a53a0> (a java.lang.Object)

        at org.apache.spark.shuffle.crail.CrailShuffleManager.registerShuffle(CrailShuffleManager.scala:52)

        at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)

        at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)

        at org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:240)

        at org.apache.spark.rdd.RDD$$Lambda$805/2019083900.apply(Unknown Source)

        at scala.Option.getOrElse(Option.scala:138)

        at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)

        at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)

        at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)

        at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)

        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)

        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)



"netty-rpc-env-timeout" #106 daemon prio=5 os_prio=0 tid=0x00007fe4b400c000 nid=0x395 waiting on condition [0x00007fe6916cb000]

   java.lang.Thread.State: TIMED_WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c04bedd8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"Timer-1" #105 daemon prio=5 os_prio=0 tid=0x00007fe6ff46c000 nid=0x394 in Object.wait() [0x00007fe6917cc000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000c054c030> (a java.util.TaskQueue)

        at java.lang.Object.wait(Object.java:502)

        at java.util.TimerThread.mainLoop(Timer.java:526)

        - locked <0x00000000c054c030> (a java.util.TaskQueue)

        at java.util.TimerThread.run(Timer.java:505)



"Timer-0" #104 daemon prio=5 os_prio=0 tid=0x00007fe6ff46a000 nid=0x393 in Object.wait() [0x00007fe6918cd000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000c052b400> (a java.util.TaskQueue)

        at java.lang.Object.wait(Object.java:502)

        at java.util.TimerThread.mainLoop(Timer.java:526)

        - locked <0x00000000c052b400> (a java.util.TaskQueue)

        at java.util.TimerThread.run(Timer.java:505)



"heartbeat-receiver-event-loop-thread" #103 daemon prio=5 os_prio=0 tid=0x00007fe5d0003800 nid=0x392 waiting on condition [0x00007fe691acf000]

   java.lang.Thread.State: TIMED_WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c05324c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-99-acceptor-3@162ca734-Spark@7a8fa7ef{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #99 daemon prio=3 os_prio=0 tid=0x00007fe6ff3c5800 nid=0x38f waiting for monitor entry [0x00007fe691bd0000]

   java.lang.Thread.State: BLOCKED (on object monitor)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)

        - waiting to lock <0x00000000c053a468> (a java.lang.Object)

        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)

        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-98-acceptor-2@51df0104-Spark@7a8fa7ef{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #98 daemon prio=3 os_prio=0 tid=0x00007fe6ff3c4000 nid=0x38e waiting for monitor entry [0x00007fe691cd1000]

   java.lang.Thread.State: BLOCKED (on object monitor)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)

        - waiting to lock <0x00000000c053a468> (a java.lang.Object)

        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)

        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-97-acceptor-1@62473a9b-ServerConnector@7a8fa7ef{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #97 daemon prio=3 os_prio=0 tid=0x00007fe6ff3c2000 nid=0x38d waiting for monitor entry [0x00007fe691dd2000]

   java.lang.Thread.State: BLOCKED (on object monitor)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)

        - waiting to lock <0x00000000c053a468> (a java.lang.Object)

        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)

        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-96-acceptor-0@74484716-ServerConnector@7a8fa7ef{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #96 daemon prio=3 os_prio=0 tid=0x00007fe6ff3c0800 nid=0x38c runnable [0x00007fe691ed3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)

        - locked <0x00000000c053a468> (a java.lang.Object)

        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)

        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-95" #95 daemon prio=5 os_prio=0 tid=0x00007fe6ff3be800 nid=0x38b runnable [0x00007fe691fd4000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c054a378> (a sun.nio.ch.Util$3)

        - locked <0x00000000c054a388> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c054a330> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:243)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:191)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.execute(ExecuteProduceConsume.java:100)

        at org.spark_project.jetty.io.ManagedSelector.run(ManagedSelector.java:147)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-94" #94 daemon prio=5 os_prio=0 tid=0x00007fe6ff3bd000 nid=0x38a runnable [0x00007fe6920d5000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c0496930> (a sun.nio.ch.Util$3)

        - locked <0x00000000c0496940> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c04968e8> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:243)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:191)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.execute(ExecuteProduceConsume.java:100)

        at org.spark_project.jetty.io.ManagedSelector.run(ManagedSelector.java:147)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-93" #93 daemon prio=5 os_prio=0 tid=0x00007fe6ff3bb800 nid=0x389 runnable [0x00007fe6921d6000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c0552628> (a sun.nio.ch.Util$3)

        - locked <0x00000000c0552638> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c05525e0> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:243)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:191)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.execute(ExecuteProduceConsume.java:100)

        at org.spark_project.jetty.io.ManagedSelector.run(ManagedSelector.java:147)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"SparkUI-92" #92 daemon prio=5 os_prio=0 tid=0x00007fe6ff3b8000 nid=0x388 runnable [0x00007fe6922d7000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c055a5f8> (a sun.nio.ch.Util$3)

        - locked <0x00000000c055a608> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c055a5b0> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:243)

        at org.spark_project.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:191)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)

        at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.execute(ExecuteProduceConsume.java:100)

        at org.spark_project.jetty.io.ManagedSelector.run(ManagedSelector.java:147)

        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)

        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

        at java.lang.Thread.run(Thread.java:748)



"RemoteBlock-temp-file-clean-thread" #91 daemon prio=5 os_prio=0 tid=0x00007fe6ff2b5000 nid=0x387 in Object.wait() [0x00007fe6925d8000]

   java.lang.Thread.State: TIMED_WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

        - locked <0x00000000c0562538> (a java.lang.ref.ReferenceQueue$Lock)

        at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1724)

        at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$2.run(BlockManager.scala:1692)



"map-output-dispatcher-7" #90 daemon prio=5 os_prio=0 tid=0x00007fe6ff25f800 nid=0x386 waiting on condition [0x00007fe6928d9000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-6" #89 daemon prio=5 os_prio=0 tid=0x00007fe6ff25d800 nid=0x385 waiting on condition [0x00007fe6929da000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-5" #88 daemon prio=5 os_prio=0 tid=0x00007fe6ff25c000 nid=0x384 waiting on condition [0x00007fe692adb000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-4" #87 daemon prio=5 os_prio=0 tid=0x00007fe6ff262000 nid=0x383 waiting on condition [0x00007fe692bdc000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-3" #86 daemon prio=5 os_prio=0 tid=0x00007fe6ff261000 nid=0x382 waiting on condition [0x00007fe692cdd000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-2" #85 daemon prio=5 os_prio=0 tid=0x00007fe6ff252000 nid=0x381 waiting on condition [0x00007fe692dde000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-1" #84 daemon prio=5 os_prio=0 tid=0x00007fe6ff263000 nid=0x380 waiting on condition [0x00007fe692edf000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"map-output-dispatcher-0" #83 daemon prio=5 os_prio=0 tid=0x00007fe6ff256000 nid=0x37f waiting on condition [0x00007fe692fe0000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c054c920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:384)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"rpc-server-3-1" #82 daemon prio=5 os_prio=0 tid=0x00007fe6ff213000 nid=0x37e runnable [0x00007fe6930e1000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000000c048a730> (a io.netty.channel.nio.SelectedSelectionKeySet)

        - locked <0x00000000c048a748> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000000c048a6e8> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)

        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)

        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-55" #81 daemon prio=5 os_prio=0 tid=0x00007fe6fefb6000 nid=0x37d waiting on condition [0x00007fe6935f4000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-54" #80 daemon prio=5 os_prio=0 tid=0x00007fe6fefb4000 nid=0x37c waiting on condition [0x00007fe6936f5000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-53" #79 daemon prio=5 os_prio=0 tid=0x00007fe6fefb2000 nid=0x37b waiting on condition [0x00007fe6937f6000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-52" #78 daemon prio=5 os_prio=0 tid=0x00007fe6fefb0800 nid=0x37a waiting on condition [0x00007fe6938f7000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-51" #77 daemon prio=5 os_prio=0 tid=0x00007fe6fefae800 nid=0x379 waiting on condition [0x00007fe6939f8000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-50" #76 daemon prio=5 os_prio=0 tid=0x00007fe6fefac800 nid=0x378 waiting on condition [0x00007fe693af9000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-49" #75 daemon prio=5 os_prio=0 tid=0x00007fe6fefab000 nid=0x377 waiting on condition [0x00007fe693bfa000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-48" #74 daemon prio=5 os_prio=0 tid=0x00007fe6fefa9800 nid=0x376 waiting on condition [0x00007fe693cfb000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-47" #73 daemon prio=5 os_prio=0 tid=0x00007fe6fefa7800 nid=0x375 waiting on condition [0x00007fe693dfc000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-46" #72 daemon prio=5 os_prio=0 tid=0x00007fe6fefa6000 nid=0x374 waiting on condition [0x00007fe693efd000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-45" #71 daemon prio=5 os_prio=0 tid=0x00007fe6fefa4000 nid=0x373 waiting on condition [0x00007fe693ffe000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-44" #70 daemon prio=5 os_prio=0 tid=0x00007fe6fefa2800 nid=0x372 waiting on condition [0x00007fe6981ef000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-43" #69 daemon prio=5 os_prio=0 tid=0x00007fe6fefa1000 nid=0x371 waiting on condition [0x00007fe6982f0000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-42" #68 daemon prio=5 os_prio=0 tid=0x00007fe6fef9f000 nid=0x370 waiting on condition [0x00007fe6983f1000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-41" #67 daemon prio=5 os_prio=0 tid=0x00007fe6fef9d800 nid=0x36f waiting on condition [0x00007fe6984f2000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-40" #66 daemon prio=5 os_prio=0 tid=0x00007fe6fef9b800 nid=0x36e waiting on condition [0x00007fe6985f3000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-39" #65 daemon prio=5 os_prio=0 tid=0x00007fe6fef9a000 nid=0x36d waiting on condition [0x00007fe6986f4000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-38" #64 daemon prio=5 os_prio=0 tid=0x00007fe6fef98000 nid=0x36c waiting on condition [0x00007fe6987f5000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-37" #63 daemon prio=5 os_prio=0 tid=0x00007fe6fef96800 nid=0x36b waiting on condition [0x00007fe6988f6000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-36" #62 daemon prio=5 os_prio=0 tid=0x00007fe6fef95000 nid=0x36a waiting on condition [0x00007fe6989f7000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-35" #61 daemon prio=5 os_prio=0 tid=0x00007fe6fef93000 nid=0x369 waiting on condition [0x00007fe698af8000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-34" #60 daemon prio=5 os_prio=0 tid=0x00007fe6fef91800 nid=0x368 waiting on condition [0x00007fe698bf9000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-33" #59 daemon prio=5 os_prio=0 tid=0x00007fe6fef8f800 nid=0x367 waiting on condition [0x00007fe698cfa000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-32" #58 daemon prio=5 os_prio=0 tid=0x00007fe6fef8e000 nid=0x366 waiting on condition [0x00007fe698dfb000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-31" #57 daemon prio=5 os_prio=0 tid=0x00007fe6fef8c000 nid=0x365 waiting on condition [0x00007fe698efc000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-30" #56 daemon prio=5 os_prio=0 tid=0x00007fe6fef8a800 nid=0x364 waiting on condition [0x00007fe698ffd000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-29" #55 daemon prio=5 os_prio=0 tid=0x00007fe6fef89000 nid=0x363 waiting on condition [0x00007fe6a01ab000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-28" #54 daemon prio=5 os_prio=0 tid=0x00007fe6fef87000 nid=0x362 waiting on condition [0x00007fe6a02ac000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-27" #53 daemon prio=5 os_prio=0 tid=0x00007fe6fef85800 nid=0x361 waiting on condition [0x00007fe6a03ad000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-26" #52 daemon prio=5 os_prio=0 tid=0x00007fe6fef83800 nid=0x360 waiting on condition [0x00007fe6a04ae000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-25" #51 daemon prio=5 os_prio=0 tid=0x00007fe6fef82000 nid=0x35f waiting on condition [0x00007fe6a05af000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-24" #50 daemon prio=5 os_prio=0 tid=0x00007fe6fef80800 nid=0x35e waiting on condition [0x00007fe6a06b0000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-23" #49 daemon prio=5 os_prio=0 tid=0x00007fe6fef7e800 nid=0x35d waiting on condition [0x00007fe6a07b1000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-22" #48 daemon prio=5 os_prio=0 tid=0x00007fe6fef7d000 nid=0x35c waiting on condition [0x00007fe6a08b2000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-21" #47 daemon prio=5 os_prio=0 tid=0x00007fe6fef7b000 nid=0x35b waiting on condition [0x00007fe6a09b3000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-20" #46 daemon prio=5 os_prio=0 tid=0x00007fe6fef79800 nid=0x35a waiting on condition [0x00007fe6a0ab4000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-19" #45 daemon prio=5 os_prio=0 tid=0x00007fe6fef77800 nid=0x359 waiting on condition [0x00007fe6a0bb5000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-18" #44 daemon prio=5 os_prio=0 tid=0x00007fe6fef76000 nid=0x358 waiting on condition [0x00007fe6a0cb6000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-17" #43 daemon prio=5 os_prio=0 tid=0x00007fe6fef74800 nid=0x357 waiting on condition [0x00007fe6a0db7000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-16" #42 daemon prio=5 os_prio=0 tid=0x00007fe6fef72800 nid=0x356 waiting on condition [0x00007fe6a0eb8000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-15" #41 daemon prio=5 os_prio=0 tid=0x00007fe6fef71000 nid=0x355 waiting on condition [0x00007fe6a0fb9000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-14" #40 daemon prio=5 os_prio=0 tid=0x00007fe6fef6f000 nid=0x354 waiting on condition [0x00007fe6a10ba000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-13" #39 daemon prio=5 os_prio=0 tid=0x00007fe6fef6d800 nid=0x353 waiting on condition [0x00007fe6a11bb000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-12" #38 daemon prio=5 os_prio=0 tid=0x00007fe6fef6b800 nid=0x352 waiting on condition [0x00007fe6a12bc000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-11" #37 daemon prio=5 os_prio=0 tid=0x00007fe6fef6a000 nid=0x351 waiting on condition [0x00007fe6a13bd000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-10" #36 daemon prio=5 os_prio=0 tid=0x00007fe6fef68800 nid=0x350 waiting on condition [0x00007fe6a14be000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-9" #35 daemon prio=5 os_prio=0 tid=0x00007fe6fef66800 nid=0x34f waiting on condition [0x00007fe6a15bf000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-8" #34 daemon prio=5 os_prio=0 tid=0x00007fe6fef65000 nid=0x34e waiting on condition [0x00007fe6a16c0000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-7" #33 daemon prio=5 os_prio=0 tid=0x00007fe6fef63000 nid=0x34d waiting on condition [0x00007fe6a17c1000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-6" #32 daemon prio=5 os_prio=0 tid=0x00007fe6fef61800 nid=0x34c waiting on condition [0x00007fe6a18c2000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-5" #31 daemon prio=5 os_prio=0 tid=0x00007fe6fef5f800 nid=0x34b waiting on condition [0x00007fe6a19c3000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-4" #30 daemon prio=5 os_prio=0 tid=0x00007fe6fef5e000 nid=0x34a waiting on condition [0x00007fe6a1ac4000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-3" #29 daemon prio=5 os_prio=0 tid=0x00007fe6fef5a000 nid=0x349 waiting on condition [0x00007fe6a1bc5000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-2" #28 daemon prio=5 os_prio=0 tid=0x00007fe6fef58800 nid=0x348 waiting on condition [0x00007fe6a1cc6000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-1" #27 daemon prio=5 os_prio=0 tid=0x00007fe6fef57000 nid=0x347 waiting on condition [0x00007fe6a1dc7000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"dispatcher-event-loop-0" #26 daemon prio=5 os_prio=0 tid=0x00007fe6fef56000 nid=0x346 waiting on condition [0x00007fe6a252c000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000c045d790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)



"Service Thread" #20 daemon prio=9 os_prio=0 tid=0x00007fe6fc12e800 nid=0x341 runnable [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C1 CompilerThread14" #19 daemon prio=9 os_prio=0 tid=0x00007fe6fc123800 nid=0x340 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C1 CompilerThread13" #18 daemon prio=9 os_prio=0 tid=0x00007fe6fc121800 nid=0x33f waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C1 CompilerThread12" #17 daemon prio=9 os_prio=0 tid=0x00007fe6fc11f800 nid=0x33e waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00007fe6fc11d800 nid=0x33d waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00007fe6fc11b800 nid=0x33c waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00007fe6fc119800 nid=0x33b waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00007fe6fc117000 nid=0x33a waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00007fe6fc115000 nid=0x339 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00007fe6fc113000 nid=0x338 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00007fe6fc110800 nid=0x337 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00007fe6fc106800 nid=0x336 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007fe6fc104800 nid=0x335 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fe6fc100000 nid=0x334 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fe6fc0fe800 nid=0x333 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fe6fc0fb800 nid=0x332 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fe6fc0f9800 nid=0x331 runnable [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE



"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fe6fc0d2800 nid=0x330 in Object.wait() [0x00007fe6a92d1000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000c001f4a8> (a java.lang.ref.ReferenceQueue$Lock)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

        - locked <0x00000000c001f4a8> (a java.lang.ref.ReferenceQueue$Lock)

        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)

        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)



"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fe6fc0d0000 nid=0x32f in Object.wait() [0x00007fe6a93d2000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        - waiting on <0x00000000c002ead8> (a java.lang.ref.Reference$Lock)

        at java.lang.Object.wait(Object.java:502)

        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)

        - locked <0x00000000c002ead8> (a java.lang.ref.Reference$Lock)

        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)



"main" #1 prio=5 os_prio=0 tid=0x00007fe6fc013000 nid=0x307 waiting on condition [0x00007fe7050e9000]

   java.lang.Thread.State: WAITING (parking)

        at sun.misc.Unsafe.park(Native Method)

        - parking to wait for  <0x00000000ff90fd18> (a scala.concurrent.impl.Promise$CompletionLatch)

        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)

        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242)

        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258)

        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187)

        at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:243)

        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:728)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)

        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)

        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1083)

        at org.apache.spark.rdd.PairRDDFunctions$$Lambda$787/751460639.apply$mcV$sp(Unknown Source)

        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)

        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)

        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)

        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$2(PairRDDFunctions.scala:1000)

        at org.apache.spark.rdd.PairRDDFunctions$$Lambda$786/1653309853.apply$mcV$sp(Unknown Source)

        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)

        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)

        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:991)

        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$1(PairRDDFunctions.scala:979)

        at org.apache.spark.rdd.PairRDDFunctions$$Lambda$785/500096147.apply$mcV$sp(Unknown Source)

        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)

        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)

        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:979)

        at com.github.ehiggs.spark.terasort.TeraSort$.main(TeraSort.scala:63)

        at com.github.ehiggs.spark.terasort.TeraSort.main(TeraSort.scala)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)

        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)

        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)

        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)

        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)

        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)

        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)

        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



"VM Thread" os_prio=0 tid=0x00007fe6fc0c6000 nid=0x32e runnable



"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fe6fc029000 nid=0x308 runnable



"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fe6fc02a800 nid=0x309 runnable



"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fe6fc02c800 nid=0x30a runnable



"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fe6fc02e000 nid=0x30b runnable



"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fe6fc030000 nid=0x30c runnable



"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fe6fc031800 nid=0x30d runnable



"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fe6fc033800 nid=0x30e runnable



"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fe6fc035000 nid=0x30f runnable



"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00007fe6fc037000 nid=0x310 runnable



"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00007fe6fc038800 nid=0x311 runnable



"GC task thread#10 (ParallelGC)" os_prio=0 tid=0x00007fe6fc03a800 nid=0x312 runnable



"GC task thread#11 (ParallelGC)" os_prio=0 tid=0x00007fe6fc03c000 nid=0x313 runnable



"GC task thread#12 (ParallelGC)" os_prio=0 tid=0x00007fe6fc03e000 nid=0x314 runnable



"GC task thread#13 (ParallelGC)" os_prio=0 tid=0x00007fe6fc03f800 nid=0x315 runnable



"GC task thread#14 (ParallelGC)" os_prio=0 tid=0x00007fe6fc041800 nid=0x316 runnable



"GC task thread#15 (ParallelGC)" os_prio=0 tid=0x00007fe6fc043000 nid=0x317 runnable



"GC task thread#16 (ParallelGC)" os_prio=0 tid=0x00007fe6fc045000 nid=0x318 runnable



"GC task thread#17 (ParallelGC)" os_prio=0 tid=0x00007fe6fc046800 nid=0x319 runnable



"GC task thread#18 (ParallelGC)" os_prio=0 tid=0x00007fe6fc048800 nid=0x31a runnable



"GC task thread#19 (ParallelGC)" os_prio=0 tid=0x00007fe6fc04a000 nid=0x31b runnable



"GC task thread#20 (ParallelGC)" os_prio=0 tid=0x00007fe6fc04c000 nid=0x31c runnable



"GC task thread#21 (ParallelGC)" os_prio=0 tid=0x00007fe6fc04d800 nid=0x31d runnable



"GC task thread#22 (ParallelGC)" os_prio=0 tid=0x00007fe6fc04f800 nid=0x31e runnable



"GC task thread#23 (ParallelGC)" os_prio=0 tid=0x00007fe6fc051000 nid=0x31f runnable



"GC task thread#24 (ParallelGC)" os_prio=0 tid=0x00007fe6fc053000 nid=0x320 runnable



"GC task thread#25 (ParallelGC)" os_prio=0 tid=0x00007fe6fc054800 nid=0x321 runnable



"GC task thread#26 (ParallelGC)" os_prio=0 tid=0x00007fe6fc056800 nid=0x322 runnable



"GC task thread#27 (ParallelGC)" os_prio=0 tid=0x00007fe6fc058000 nid=0x323 runnable



"GC task thread#28 (ParallelGC)" os_prio=0 tid=0x00007fe6fc05a000 nid=0x324 runnable



"GC task thread#29 (ParallelGC)" os_prio=0 tid=0x00007fe6fc05b800 nid=0x325 runnable



"GC task thread#30 (ParallelGC)" os_prio=0 tid=0x00007fe6fc05d800 nid=0x326 runnable



"GC task thread#31 (ParallelGC)" os_prio=0 tid=0x00007fe6fc05f000 nid=0x327 runnable



"GC task thread#32 (ParallelGC)" os_prio=0 tid=0x00007fe6fc061000 nid=0x328 runnable



"GC task thread#33 (ParallelGC)" os_prio=0 tid=0x00007fe6fc062800 nid=0x329 runnable



"GC task thread#34 (ParallelGC)" os_prio=0 tid=0x00007fe6fc064800 nid=0x32a runnable



"GC task thread#35 (ParallelGC)" os_prio=0 tid=0x00007fe6fc066000 nid=0x32b runnable



"GC task thread#36 (ParallelGC)" os_prio=0 tid=0x00007fe6fc068000 nid=0x32c runnable



"GC task thread#37 (ParallelGC)" os_prio=0 tid=0x00007fe6fc069800 nid=0x32d runnable



"VM Periodic Task Thread" os_prio=0 tid=0x00007fe6fc131000 nid=0x342 waiting on condition



JNI global references: 1851





Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Friday, June 28, 2019 12:54:27 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


At the moment, it is possible to add a NVMf datanode even if only the RDMA
storage type is specified in the config. As you have seen this will go wrong
as soon as a client tries to connect to the datanode. Make sure to start the
RDMA datanode with the appropriate classname, see:
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.

Regards,
Jonas

  On Thu, 27 Jun 2019 23:09:26 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Hi,
> I’m trying to integrate the storage classes and I’m hitting another
>issue when running terasort and just
> using the crail-shuffle with HDFS as the tmp storage.  The program
>just sits, after the following
> message:
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>containers from the server, and I’m only running
> the namenode and a rdma storage class 1 datanode.  My spark
>configuration is also now only looking at
> the rdma class.  It looks as though it’s picking up the NVMf IP and
>port in the INFO messages seen below.
> I must be configuring something wrong, but I’ve not been able to
>track it down.  Any thoughts?
>
>
> ************************************
>         TeraSort
> ************************************
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>native-hadoop library for your platform... using builtin-java classes
>where applicable
> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>authentication disabled; ui acls disabled; users  with view
>permissions: Set(hduser); groups with view permissions: Set(); users
> with modify permissions: Set(hduser); groups with modify
>permissions: Set()
> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>default logging framework
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>-Dio.netty.eventLoopThreads: 112
> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>false
> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.theUnsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.copyMemory: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>constructor: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>available, true
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>prior to Java9
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>java.nio.DirectByteBuffer.<init>(long, int): available
> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>(java.io.tmpdir)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>(sun.arch.data.model)
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.noPreferDirect: false
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.maxDirectMemory: 1029177344 bytes
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.uninitializedArrayAllocationThreshold: -1
> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>available
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.noKeySetOptimization: false
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.selectorAutoRebuildThreshold: 512
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>org.jctools-core.MpscChunkedArrayQueue: available
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.level: simple
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.targetRecords: 4
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numHeapArenas: 9
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numDirectArenas: 10
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.pageSize: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxOrder: 11
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.chunkSize: 16777216
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.tinyCacheSize: 512
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.smallCacheSize: 256
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.normalCacheSize: 64
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.cacheTrimInterval: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.useCacheForAllThreads: true
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>(auto-detected)
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>false
> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>127.0.0.1)
> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>02:42:ac:ff:fe:1b:00:02 (auto-detected)
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>pooled
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.threadLocalDirectBufferSize: 65536
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.maxThreadLocalCharBufferSize: 16384
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 36915
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'sparkDriver' on port 36915.
> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>org.apache.spark.serializer.KryoSerializer
> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>information
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>BlockManagerMasterEndpoint up
> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>capacity 366.3 MB
> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 19/06/27 15:59:08 DEBUG
>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>SSLOptions{enabled=false, port=None, keyStore=None,
>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>protocol=None, enabledAlgorithms=Set()}
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>on port 4040.
> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>started at http://192.168.1.161:4040
> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>at
>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>with timestamp 1561676348562
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>Connecting to master spark://master:7077...
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>connection to master/192.168.3.13:7077
> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>-Dio.netty.buffer.bytebuf.checkAccessible: true
> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>master/192.168.3.13:7077 successful, running bootstraps...
> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>bootstraps)
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxCapacityPerThread: 32768
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxSharedCapacityFactor: 2
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>16
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>Spark cluster with app ID app-20190627155908-0005
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/0 on
>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/1 on
>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 39189
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>39189.
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/2 on
>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>core(s)
> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>master:39189
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/3 on
>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/4 on
>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>core(s)
> 19/06/27 15:59:08 INFO BlockManager: Using
>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>replication policy
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/0 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/3 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/4 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/1 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/2 is now RUNNING
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>master
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>is ready for scheduling beginning after reached
>minRegisteredResourcesRatio: 0.0
> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.use.legacy.blockreader.local = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.read.shortcircuit = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.domain.socket.data.traffic = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>rpcRequestWrapperClass=class
>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>org.apache.hadoop.ipc.Client@3ed03652
> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>local reads and UNIX domain socket are disabled.
> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>not using SaslPropertiesResolver, no QOP found in configuration for
>dfs.data.transfer.protection
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>values in memory (estimated size 288.9 KB, free 366.0 MB)
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>took  115 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>without replication took  117 ms
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>locally took  6 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>broadcast_0_piece0 without replication took  6 ms
> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>newAPIHadoopFile at TeraSort.scala:60
> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
> 19/06/27 15:59:10 DEBUG Client: Connecting to
>NameNode-1/192.168.3.7:54310
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>connections 1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #0
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #0
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>31ms
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #1
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>FileStatuses: 134
> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>: 2
> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>by getSplits: 2, TimeTaken: 139
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>Boolean) constructor
> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>Algorithm version is 1
> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>masked=rwxr-xr-x
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #2
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #2
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>$anonfun$write$1
> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>($anonfun$write$1) is now cleaned +++
> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>SparkHadoopWriter.scala:78
> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>400
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>org.apache.spark.serializer.CrailSparkSerializer
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>true
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.outstanding 1
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.storageclass 0
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.broadcast.storageclass 0
> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
> 19/06/27 15:59:10 INFO crail: crail.version 3101
> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
> 19/06/27 15:59:10 INFO crail: crail.user crail
> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
> 19/06/27 15:59:10 INFO crail: crail.debug true
> 19/06/27 15:59:10 INFO crail: crail.statistics true
> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
> 19/06/27 15:59:10 INFO crail: crail.singleton true
> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>org.apache.crail.memory.MappedBufferCache
> 19/06/27 15:59:10 INFO crail: crail.locationmap
> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>crail://192.168.1.164:9060
> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>roundrobin
> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 19/06/27 15:59:10 INFO crail: crail.namenode.log
> 19/06/27 15:59:10 INFO crail: crail.storage.types
>org.apache.crail.storage.rdma.RdmaStorageTier
> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>bufferCount 1024
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>4294967296
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>1073741824
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>/dev/hugepages/rdma
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>queueDepth 32, messageSize 512, nodealy true
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>/192.168.1.164:9060
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>DIRECTORY, storageAffinity 0, locationAffinity 0
> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>streamId 1, isDir true, writeHint 0
> 19/06/27 15:59:10 INFO crail: passive data client
> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>size 28, native size 16
> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>native size 32
> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>72, native size 128
> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>native size 16
> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>40, native size 40
> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>139811924587312
> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>maxSge 4, cqSize 64
> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>/192.168.3.100:4420
> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>467, pd 1
> 19/06/27 15:59:10 INFO disni: setting up cq processor
> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>139810647883744
> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>64
> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>send_wr size 32, recv_wr_size 32
> 19/06/27 15:59:10 INFO disni: connect, id 0
> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.11:35854) with ID 0
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.12:44312) with ID 1
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.8:34774) with ID 4
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.9:58808) with ID 2
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.11
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>192.168.3.11, 41919, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.12
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>192.168.3.12, 46697, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.8
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>192.168.3.8, 37281, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.9
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>192.168.3.9, 43857, None)
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.10:40100) with ID 3
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.10
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>192.168.3.10, 38527, None)
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
>
> Regards,
>
>           David
>



RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
I can do a docker shared volume for the config file. I had it originally

set up that way, but changed it and then just added the file to the image.

I’ll play around with that this morning. Thanks for the info!



Regards,



           David



________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Tuesday, July 2, 2019 7:01:46 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

I typically do use the start-crail.sh script. Then you have to put all the
command line arguments in the slaves file.


The configuration files need to be identical. In our configuration we put
the conf file on a NFS share this way we don't have to bother with it being
synchronized between the nodes.

Regards,
Jonas

  On Tue, 2 Jul 2019 13:48:31 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Thanks for the info Jonas.
>
> Quick question… do you typically start the datanodes from the
>namenode using the command line?
>
> I’ve been launching containers independently of the namenode.  The
>containers do have the same
>
> base configuration file, but I pass in behaviors via environment
>variables.
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Tuesday, July 2, 2019 4:27:05 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> We run a great mix of configurations of NVMf and RDMA storage tiers
>with
> different storage classes, e.g. 3 storage classes where a group of
>NVMf
> datanodes is 0, another group of NVMf server is 1 and the RDMA
>datanodes are
> storage class 2. So this should work. I understand that the setup
>might be a
> bit tricky in the beginning.
>
> From your logs I see that you do not use the same configuration file
>for
> all containers. It is crucial that e.g. the order of storage types
>etc is
> the same in all configuration files. They have to be identical. To
>specify a
> storage class for a datanode you need to append "-c 1" (storage
>class 1)
> when starting the datanode. You can find the details of how exactly
>this
> works here:
>https://incubator-crail.readthedocs.io/en/latest/run.html
> The last example in "Starting Crail manually" talks about this.
>
> Regarding the patched version, I have to take another look. Please
>use the
> Apache Crail master for now (It will hang with Spark at the end of
>your job
> but it should run through).
>
> Regards,
> Jonas
>
>  On Tue, 2 Jul 2019 00:27:33 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>
>> Just wanted to be sure I’m doing things correctly.  It runs okay
>>without adding in the NVMf datanode (i.e.
>>
>> completes teragen).  When I add the NVMf node in, even without using
>>it on the run, it hangs during the
>>
>> terasort, with nothing being written to the datanode – only the
>>metadata is created (i.e. /spark).
>>
>>
>> My config is:
>>
>> 1 namenode container
>>
>> 1 rdma datanode storage class 1 container
>>
>> 1 nvmf datanode storage class 1 container.
>>
>>
>> The namenode is showing that both datanode are starting up as
>>
>> Type 0 to storage class 0… is that correct?
>>
>>
>> NameNode log at startup:
>>
>> 19/07/01 17:18:16 INFO crail: initalizing namenode
>>
>> 19/07/01 17:18:16 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:16 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:16 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:16 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:16 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:16 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.address
>>crail://minnie:9060?id=0&size=1
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.types
>>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true, cores 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of
>>type 0 to storage class 0
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020
>>of type 0 to storage class 0
>>
>>
>> The RDMA datanode – it is set to have 4x1GB hugepages:
>>
>> 19/07/01 17:18:17 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:17 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:17 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:17 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:17 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:17 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.address
>>crail://minnie:9060
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
>>
>> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
>>
>> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>>
>> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>>
>> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>>
>> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>>
>> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>>
>> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: createEventChannel, objId
>>140349068383088
>>
>> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 3200
>>
>> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
>>
>> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
>>
>> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
>>
>> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
>>
>> 19/07/01 17:18:17 INFO disni: listen, id 0
>>
>> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
>>
>> 19/07/01 17:18:17 INFO disni: setting up protection domain, context
>>100, pd 1
>>
>> 19/07/01 17:18:17 INFO disni: PD value 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
>>
>> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: rdma storage server started, address
>>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize
>>3200
>>
>> 19/07/01 17:18:17 INFO disni: starting accept
>>
>> 19/07/01 17:18:18 INFO crail: connected to namenode(s)
>>minnie/192.168.1.164:9060
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>>
>> NVMf datanode is showing 1TB.
>>
>> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks
>>1048576
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi <da...@storedgesystems.com>
>> Sent: Monday, July 1, 2019 3:57:42 PM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>> A standard pull from the repo, one that didn’t have the patches from
>>your private repo.
>>
>> I can put patches back in both the client and server containers if
>>you really think it
>>
>> would make a difference.
>>
>>
>> Are you guys running multiple types together?  I’m running a RDMA
>>storage class 1,
>>
>> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get
>>errors when the
>>
>> RDMA is introduced into the mix.  I have a small amount of memory
>>(4GB) assigned
>>
>> with the RDMA tier, and looking for it to fall into the NVMf class 1
>>tier.  It appears to want
>>
>> to do that, but gets screwed up… it looks like it’s trying to create
>>another set of qp’s for
>>
>> an RDMA connection.  It even blew up spdk trying to accomplish that.
>>
>>
>> Do you guys have some documentation that shows what’s been tested
>>(mixes/variations) so far?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: Jonas Pfefferle <pe...@japf.ch>
>> Sent: Monday, July 1, 2019 12:51:09 AM
>> To: dev@crail.apache.org; David Crespi
>> Subject: Re: Setting up storage class 1 and 2
>>
>> Hi David,
>>
>>
>> Can you clarify which unpatched version you are talking about? Are
>>you
>> talking about the NVMf thread fix where I send you a link to a
>>branch in my
>> repository or the fix we provided earlier for the Spark hang in the
>>Crail
>> master?
>>
>> Generally, if you update, update all: clients and datanode/namenode.
>>
>> Regards,
>> Jonas
>>
>>  On Fri, 28 Jun 2019 17:59:32 +0000
>>  David Crespi <da...@storedgesystems.com> wrote:
>>> Jonas,
>>>FYI - I went back to using the unpatched version of crail on the
>>>clients and it appears to work
>>> okay now with the shuffle and RDMA, with only the RDMA containers
>>>running on the server.
>>>
>>> Regards,
>>>
>>>           David
>>>
>>>
>>> ________________________________
>>>From: David Crespi
>>> Sent: Friday, June 28, 2019 7:49:51 AM
>>> To: Jonas Pfefferle; dev@crail.apache.org
>>> Subject: RE: Setting up storage class 1 and 2
>>>
>>>
>>> Oh, and while I’m thinking about it Jonas, when I added the patches
>>>you provided the other day, I only
>>>
>>> added them to the spark containers (clients) not to my crail
>>>containers running on my storage server.
>>>
>>> Should the patches been added to all of the containers?
>>>
>>>
>>> Regards,
>>>
>>>
>>>           David
>>>
>>>
>>> ________________________________
>>>From: Jonas Pfefferle <pe...@japf.ch>
>>> Sent: Friday, June 28, 2019 12:54:27 AM
>>> To: dev@crail.apache.org; David Crespi
>>> Subject: Re: Setting up storage class 1 and 2
>>>
>>> Hi David,
>>>
>>>
>>> At the moment, it is possible to add a NVMf datanode even if only
>>>the RDMA
>>> storage type is specified in the config. As you have seen this will
>>>go wrong
>>> as soon as a client tries to connect to the datanode. Make sure to
>>>start the
>>> RDMA datanode with the appropriate classname, see:
>>> https://incubator-crail.readthedocs.io/en/latest/run.html
>>> The correct classname is
>>>org.apache.crail.storage.rdma.RdmaStorageTier.
>>>
>>> Regards,
>>> Jonas
>>>
>>>  On Thu, 27 Jun 2019 23:09:26 +0000
>>>  David Crespi <da...@storedgesystems.com> wrote:
>>>> Hi,
>>>> I’m trying to integrate the storage classes and I’m hitting another
>>>>issue when running terasort and just
>>>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>>>just sits, after the following
>>>> message:
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>>connections 0
>>>>
>>>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>>>containers from the server, and I’m only running
>>>> the namenode and a rdma storage class 1 datanode.  My spark
>>>>configuration is also now only looking at
>>>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>>>port in the INFO messages seen below.
>>>> I must be configuring something wrong, but I’ve not been able to
>>>>track it down.  Any thoughts?
>>>>
>>>>
>>>> ************************************
>>>>         TeraSort
>>>> ************************************
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>>>native-hadoop library for your platform... using builtin-java classes
>>>>where applicable
>>>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>>>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>>>hduser
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>>>hduser
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>>>to:
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>>>to:
>>>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>>>authentication disabled; ui acls disabled; users  with view
>>>>permissions: Set(hduser); groups with view permissions: Set(); users
>>>> with modify permissions: Set(hduser); groups with modify
>>>>permissions: Set()
>>>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>>>default logging framework
>>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>>>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>>>-Dio.netty.eventLoopThreads: 112
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>>>false
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>sun.misc.Unsafe.theUnsafe: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>sun.misc.Unsafe.copyMemory: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>>>available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>>>constructor: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>>>available, true
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>>>prior to Java9
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>java.nio.DirectByteBuffer.<init>(long, int): available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>>>available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>>>(java.io.tmpdir)
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>>>(sun.arch.data.model)
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.noPreferDirect: false
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>>>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>>>available
>>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>>-Dio.netty.noKeySetOptimization: false
>>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>>-Dio.netty.selectorAutoRebuildThreshold: 512
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>org.jctools-core.MpscChunkedArrayQueue: available
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>>-Dio.netty.leakDetection.level: simple
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>>-Dio.netty.leakDetection.targetRecords: 4
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.numHeapArenas: 9
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.numDirectArenas: 10
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.pageSize: 8192
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.maxOrder: 11
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.chunkSize: 16777216
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.tinyCacheSize: 512
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.smallCacheSize: 256
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.normalCacheSize: 64
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.cacheTrimInterval: 8192
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.useCacheForAllThreads: true
>>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>>>(auto-detected)
>>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>>>false
>>>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>>>127.0.0.1)
>>>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>>>pooled
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>>-Dio.netty.threadLocalDirectBufferSize: 65536
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>>port: 36915
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>>'sparkDriver' on port 36915.
>>>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>>>org.apache.spark.serializer.KryoSerializer
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>>>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>>>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>>>information
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>>>BlockManagerMasterEndpoint up
>>>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>>>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>>>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>>>capacity 366.3 MB
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>>>> 19/06/27 15:59:08 DEBUG
>>>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>>>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>>>SSLOptions{enabled=false, port=None, keyStore=None,
>>>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>>protocol=None, enabledAlgorithms=Set()}
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>>>on port 4040.
>>>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>>>started at http://192.168.1.161:4040
>>>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>>at
>>>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>>with timestamp 1561676348562
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>>>Connecting to master spark://master:7077...
>>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>>>connection to master/192.168.3.13:7077
>>>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>>>master/192.168.3.13:7077 successful, running bootstraps...
>>>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>>>bootstraps)
>>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>>>16
>>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>>>Spark cluster with app ID app-20190627155908-0005
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/0 on
>>>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/1 on
>>>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>>port: 39189
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>>>39189.
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/2 on
>>>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>>>master:39189
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/3 on
>>>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/4 on
>>>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO BlockManager: Using
>>>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>>>replication policy
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/0 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/3 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/4 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/1 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/2 is now RUNNING
>>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>>>master
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>>>master, 39189, None)
>>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>>>is ready for scheduling beginning after reached
>>>>minRegisteredResourcesRatio: 0.0
>>>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.use.legacy.blockreader.local = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.read.shortcircuit = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.domain.socket.data.traffic = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>>>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>>>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>>>rpcRequestWrapperClass=class
>>>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>>>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>>>org.apache.hadoop.ipc.Client@3ed03652
>>>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>>>local reads and UNIX domain socket are disabled.
>>>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>>>not using SaslPropertiesResolver, no QOP found in configuration for
>>>>dfs.data.transfer.protection
>>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>>>took  115 ms
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>>>without replication took  117 ms
>>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>>>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>>>broadcast_0_piece0
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>>>broadcast_0_piece0
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>>>locally took  6 ms
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>>>broadcast_0_piece0 without replication took  6 ms
>>>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>>>newAPIHadoopFile at TeraSort.scala:60
>>>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>>>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>>>NameNode-1/192.168.3.7:54310
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>>>connections 1
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>>>31ms
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>>FileStatuses: 134
>>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>>: 2
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>>by getSplits: 2, TimeTaken: 139
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>>Boolean) constructor
>>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>>Algorithm version is 1
>>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>>masked=rwxr-xr-x
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>>$anonfun$write$1
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>>($anonfun$write$1) is now cleaned +++
>>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>>SparkHadoopWriter.scala:78
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>>400
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>>false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>>org.apache.spark.serializer.CrailSparkSerializer
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.outstanding 1
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.storageclass 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.broadcast.storageclass 0
>>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>>org.apache.crail.memory.MappedBufferCache
>>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>>crail://192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>>roundrobin
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>>bufferCount 1024
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>>4294967296
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>>1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>>/dev/hugepages/rdma
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>>queueDepth 32, messageSize 512, nodealy true
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>>/192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>>streamId 1, isDir true, writeHint 0
>>>> 19/06/27 15:59:10 INFO crail: passive data client
>>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>>size 28, native size 16
>>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>>native size 32
>>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>>72, native size 128
>>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>>native size 16
>>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>>40, native size 40
>>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>>139811924587312
>>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>>maxSge 4, cqSize 64
>>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>>/192.168.3.100:4420
>>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>>467, pd 1
>>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>>139810647883744
>>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>>64
>>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>>send_wr size 32, recv_wr_size 32
>>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.11:35854) with ID 0
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.12:44312) with ID 1
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.8:34774) with ID 4
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.9:58808) with ID 2
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.11
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>>192.168.3.11, 41919, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.12
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>>192.168.3.12, 46697, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.8
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>>192.168.3.8, 37281, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.9
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>>192.168.3.9, 43857, None)
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.10:40100) with ID 3
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.10
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>>192.168.3.10, 38527, None)
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>>connections 0
>>>>
>>>>
>>>> Regards,
>>>>
>>>>           David
>>>>
>>>


Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
I typically do use the start-crail.sh script. Then you have to put all the 
command line arguments in the slaves file.


The configuration files need to be identical. In our configuration we put 
the conf file on a NFS share this way we don't have to bother with it being 
synchronized between the nodes.

Regards,
Jonas

  On Tue, 2 Jul 2019 13:48:31 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Thanks for the info Jonas.
> 
> Quick question… do you typically start the datanodes from the 
>namenode using the command line?
> 
> I’ve been launching containers independently of the namenode.  The 
>containers do have the same
> 
> base configuration file, but I pass in behaviors via environment 
>variables.
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Tuesday, July 2, 2019 4:27:05 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> We run a great mix of configurations of NVMf and RDMA storage tiers 
>with
> different storage classes, e.g. 3 storage classes where a group of 
>NVMf
> datanodes is 0, another group of NVMf server is 1 and the RDMA 
>datanodes are
> storage class 2. So this should work. I understand that the setup 
>might be a
> bit tricky in the beginning.
> 
> From your logs I see that you do not use the same configuration file 
>for
> all containers. It is crucial that e.g. the order of storage types 
>etc is
> the same in all configuration files. They have to be identical. To 
>specify a
> storage class for a datanode you need to append "-c 1" (storage 
>class 1)
> when starting the datanode. You can find the details of how exactly 
>this
> works here: 
>https://incubator-crail.readthedocs.io/en/latest/run.html
> The last example in "Starting Crail manually" talks about this.
> 
> Regarding the patched version, I have to take another look. Please 
>use the
> Apache Crail master for now (It will hang with Spark at the end of 
>your job
> but it should run through).
> 
> Regards,
> Jonas
> 
>  On Tue, 2 Jul 2019 00:27:33 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>
>> Just wanted to be sure I’m doing things correctly.  It runs okay
>>without adding in the NVMf datanode (i.e.
>>
>> completes teragen).  When I add the NVMf node in, even without using
>>it on the run, it hangs during the
>>
>> terasort, with nothing being written to the datanode – only the
>>metadata is created (i.e. /spark).
>>
>>
>> My config is:
>>
>> 1 namenode container
>>
>> 1 rdma datanode storage class 1 container
>>
>> 1 nvmf datanode storage class 1 container.
>>
>>
>> The namenode is showing that both datanode are starting up as
>>
>> Type 0 to storage class 0… is that correct?
>>
>>
>> NameNode log at startup:
>>
>> 19/07/01 17:18:16 INFO crail: initalizing namenode
>>
>> 19/07/01 17:18:16 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:16 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:16 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:16 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:16 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:16 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.address
>>crail://minnie:9060?id=0&size=1
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.types
>>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true, cores 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of
>>type 0 to storage class 0
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020
>>of type 0 to storage class 0
>>
>>
>> The RDMA datanode – it is set to have 4x1GB hugepages:
>>
>> 19/07/01 17:18:17 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:17 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:17 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:17 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:17 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:17 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.address
>>crail://minnie:9060
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
>>
>> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
>>
>> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>>
>> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>>
>> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>>
>> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>>
>> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>>
>> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: createEventChannel, objId
>>140349068383088
>>
>> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 3200
>>
>> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
>>
>> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
>>
>> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
>>
>> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
>>
>> 19/07/01 17:18:17 INFO disni: listen, id 0
>>
>> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
>>
>> 19/07/01 17:18:17 INFO disni: setting up protection domain, context
>>100, pd 1
>>
>> 19/07/01 17:18:17 INFO disni: PD value 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
>>
>> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: rdma storage server started, address
>>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize
>>3200
>>
>> 19/07/01 17:18:17 INFO disni: starting accept
>>
>> 19/07/01 17:18:18 INFO crail: connected to namenode(s)
>>minnie/192.168.1.164:9060
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>>
>> NVMf datanode is showing 1TB.
>>
>> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks
>>1048576
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi <da...@storedgesystems.com>
>> Sent: Monday, July 1, 2019 3:57:42 PM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>> A standard pull from the repo, one that didn’t have the patches from
>>your private repo.
>>
>> I can put patches back in both the client and server containers if
>>you really think it
>>
>> would make a difference.
>>
>>
>> Are you guys running multiple types together?  I’m running a RDMA
>>storage class 1,
>>
>> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get
>>errors when the
>>
>> RDMA is introduced into the mix.  I have a small amount of memory
>>(4GB) assigned
>>
>> with the RDMA tier, and looking for it to fall into the NVMf class 1
>>tier.  It appears to want
>>
>> to do that, but gets screwed up… it looks like it’s trying to create
>>another set of qp’s for
>>
>> an RDMA connection.  It even blew up spdk trying to accomplish that.
>>
>>
>> Do you guys have some documentation that shows what’s been tested
>>(mixes/variations) so far?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: Jonas Pfefferle <pe...@japf.ch>
>> Sent: Monday, July 1, 2019 12:51:09 AM
>> To: dev@crail.apache.org; David Crespi
>> Subject: Re: Setting up storage class 1 and 2
>>
>> Hi David,
>>
>>
>> Can you clarify which unpatched version you are talking about? Are
>>you
>> talking about the NVMf thread fix where I send you a link to a
>>branch in my
>> repository or the fix we provided earlier for the Spark hang in the
>>Crail
>> master?
>>
>> Generally, if you update, update all: clients and datanode/namenode.
>>
>> Regards,
>> Jonas
>>
>>  On Fri, 28 Jun 2019 17:59:32 +0000
>>  David Crespi <da...@storedgesystems.com> wrote:
>>> Jonas,
>>>FYI - I went back to using the unpatched version of crail on the
>>>clients and it appears to work
>>> okay now with the shuffle and RDMA, with only the RDMA containers
>>>running on the server.
>>>
>>> Regards,
>>>
>>>           David
>>>
>>>
>>> ________________________________
>>>From: David Crespi
>>> Sent: Friday, June 28, 2019 7:49:51 AM
>>> To: Jonas Pfefferle; dev@crail.apache.org
>>> Subject: RE: Setting up storage class 1 and 2
>>>
>>>
>>> Oh, and while I’m thinking about it Jonas, when I added the patches
>>>you provided the other day, I only
>>>
>>> added them to the spark containers (clients) not to my crail
>>>containers running on my storage server.
>>>
>>> Should the patches been added to all of the containers?
>>>
>>>
>>> Regards,
>>>
>>>
>>>           David
>>>
>>>
>>> ________________________________
>>>From: Jonas Pfefferle <pe...@japf.ch>
>>> Sent: Friday, June 28, 2019 12:54:27 AM
>>> To: dev@crail.apache.org; David Crespi
>>> Subject: Re: Setting up storage class 1 and 2
>>>
>>> Hi David,
>>>
>>>
>>> At the moment, it is possible to add a NVMf datanode even if only
>>>the RDMA
>>> storage type is specified in the config. As you have seen this will
>>>go wrong
>>> as soon as a client tries to connect to the datanode. Make sure to
>>>start the
>>> RDMA datanode with the appropriate classname, see:
>>> https://incubator-crail.readthedocs.io/en/latest/run.html
>>> The correct classname is
>>>org.apache.crail.storage.rdma.RdmaStorageTier.
>>>
>>> Regards,
>>> Jonas
>>>
>>>  On Thu, 27 Jun 2019 23:09:26 +0000
>>>  David Crespi <da...@storedgesystems.com> wrote:
>>>> Hi,
>>>> I’m trying to integrate the storage classes and I’m hitting another
>>>>issue when running terasort and just
>>>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>>>just sits, after the following
>>>> message:
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>>connections 0
>>>>
>>>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>>>containers from the server, and I’m only running
>>>> the namenode and a rdma storage class 1 datanode.  My spark
>>>>configuration is also now only looking at
>>>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>>>port in the INFO messages seen below.
>>>> I must be configuring something wrong, but I’ve not been able to
>>>>track it down.  Any thoughts?
>>>>
>>>>
>>>> ************************************
>>>>         TeraSort
>>>> ************************************
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>>>native-hadoop library for your platform... using builtin-java classes
>>>>where applicable
>>>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>>>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>>>hduser
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>>>hduser
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>>>to:
>>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>>>to:
>>>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>>>authentication disabled; ui acls disabled; users  with view
>>>>permissions: Set(hduser); groups with view permissions: Set(); users
>>>> with modify permissions: Set(hduser); groups with modify
>>>>permissions: Set()
>>>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>>>default logging framework
>>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>>>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>>>-Dio.netty.eventLoopThreads: 112
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>>>false
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>sun.misc.Unsafe.theUnsafe: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>sun.misc.Unsafe.copyMemory: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>>>available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>>>constructor: available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>>>available, true
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>>>prior to Java9
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>>java.nio.DirectByteBuffer.<init>(long, int): available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>>>available
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>>>(java.io.tmpdir)
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>>>(sun.arch.data.model)
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.noPreferDirect: false
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>>>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>>>available
>>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>>-Dio.netty.noKeySetOptimization: false
>>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>>-Dio.netty.selectorAutoRebuildThreshold: 512
>>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>>org.jctools-core.MpscChunkedArrayQueue: available
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>>-Dio.netty.leakDetection.level: simple
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>>-Dio.netty.leakDetection.targetRecords: 4
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.numHeapArenas: 9
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.numDirectArenas: 10
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.pageSize: 8192
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.maxOrder: 11
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.chunkSize: 16777216
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.tinyCacheSize: 512
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.smallCacheSize: 256
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.normalCacheSize: 64
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.cacheTrimInterval: 8192
>>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>>-Dio.netty.allocator.useCacheForAllThreads: true
>>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>>>(auto-detected)
>>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>>>false
>>>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>>>127.0.0.1)
>>>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>>>pooled
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>>-Dio.netty.threadLocalDirectBufferSize: 65536
>>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>>port: 36915
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>>'sparkDriver' on port 36915.
>>>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>>>org.apache.spark.serializer.KryoSerializer
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>>>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>>>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>>>information
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>>>BlockManagerMasterEndpoint up
>>>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>>>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>>>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>>>capacity 366.3 MB
>>>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>>>> 19/06/27 15:59:08 DEBUG
>>>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>>>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>>>SSLOptions{enabled=false, port=None, keyStore=None,
>>>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>>protocol=None, enabledAlgorithms=Set()}
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>>>on port 4040.
>>>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>>>started at http://192.168.1.161:4040
>>>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>>at
>>>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>>with timestamp 1561676348562
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>>>Connecting to master spark://master:7077...
>>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>>>connection to master/192.168.3.13:7077
>>>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>>>master/192.168.3.13:7077 successful, running bootstraps...
>>>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>>>bootstraps)
>>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>>>16
>>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>>>Spark cluster with app ID app-20190627155908-0005
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/0 on
>>>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/1 on
>>>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>>port: 39189
>>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>>>39189.
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/2 on
>>>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>>>master:39189
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/3 on
>>>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>added: app-20190627155908-0005/4 on
>>>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>>>core(s)
>>>> 19/06/27 15:59:08 INFO BlockManager: Using
>>>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>>>replication policy
>>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>>>core(s), 1024.0 MB RAM
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/0 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/3 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/4 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/1 is now RUNNING
>>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>>updated: app-20190627155908-0005/2 is now RUNNING
>>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>>>master
>>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>>>master, 39189, None)
>>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>>>BlockManagerId(driver, master, 39189, None)
>>>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>>>is ready for scheduling beginning after reached
>>>>minRegisteredResourcesRatio: 0.0
>>>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.use.legacy.blockreader.local = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.read.shortcircuit = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>>dfs.client.domain.socket.data.traffic = false
>>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>>>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>>>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>>>rpcRequestWrapperClass=class
>>>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>>>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>>>org.apache.hadoop.ipc.Client@3ed03652
>>>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>>>local reads and UNIX domain socket are disabled.
>>>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>>>not using SaslPropertiesResolver, no QOP found in configuration for
>>>>dfs.data.transfer.protection
>>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>>>took  115 ms
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>>>without replication took  117 ms
>>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>>>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>>>broadcast_0_piece0
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>>>broadcast_0_piece0
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>>>locally took  6 ms
>>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>>>broadcast_0_piece0 without replication took  6 ms
>>>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>>>newAPIHadoopFile at TeraSort.scala:60
>>>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>>>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>>>NameNode-1/192.168.3.7:54310
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>>>connections 1
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>>>31ms
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>>FileStatuses: 134
>>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>>: 2
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>>by getSplits: 2, TimeTaken: 139
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>>Boolean) constructor
>>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>>Algorithm version is 1
>>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>>masked=rwxr-xr-x
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>>$anonfun$write$1
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>>($anonfun$write$1) is now cleaned +++
>>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>>SparkHadoopWriter.scala:78
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>>400
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>>false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>>org.apache.spark.serializer.CrailSparkSerializer
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.outstanding 1
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.storageclass 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.broadcast.storageclass 0
>>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>>org.apache.crail.memory.MappedBufferCache
>>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>>crail://192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>>roundrobin
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>>bufferCount 1024
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>>4294967296
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>>1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>>/dev/hugepages/rdma
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>>queueDepth 32, messageSize 512, nodealy true
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>>/192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>>streamId 1, isDir true, writeHint 0
>>>> 19/06/27 15:59:10 INFO crail: passive data client
>>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>>size 28, native size 16
>>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>>native size 32
>>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>>72, native size 128
>>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>>native size 16
>>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>>40, native size 40
>>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>>139811924587312
>>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>>maxSge 4, cqSize 64
>>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>>/192.168.3.100:4420
>>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>>467, pd 1
>>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>>139810647883744
>>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>>64
>>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>>send_wr size 32, recv_wr_size 32
>>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.11:35854) with ID 0
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.12:44312) with ID 1
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.8:34774) with ID 4
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.9:58808) with ID 2
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.11
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>>192.168.3.11, 41919, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.12
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>>192.168.3.12, 46697, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.8
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>>192.168.3.8, 37281, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.9
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>>192.168.3.9, 43857, None)
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.10:40100) with ID 3
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.10
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>>192.168.3.10, 38527, None)
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>>connections 0
>>>>
>>>>
>>>> Regards,
>>>>
>>>>           David
>>>>
>>>



RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Thanks for the info Jonas.

Quick question… do you typically start the datanodes from the namenode using the command line?

I’ve been launching containers independently of the namenode.  The containers do have the same

base configuration file, but I pass in behaviors via environment variables.



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Tuesday, July 2, 2019 4:27:05 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


We run a great mix of configurations of NVMf and RDMA storage tiers with
different storage classes, e.g. 3 storage classes where a group of NVMf
datanodes is 0, another group of NVMf server is 1 and the RDMA datanodes are
storage class 2. So this should work. I understand that the setup might be a
bit tricky in the beginning.

 From your logs I see that you do not use the same configuration file for
all containers. It is crucial that e.g. the order of storage types etc is
the same in all configuration files. They have to be identical. To specify a
storage class for a datanode you need to append "-c 1" (storage class 1)
when starting the datanode. You can find the details of how exactly this
works here: https://incubator-crail.readthedocs.io/en/latest/run.html
The last example in "Starting Crail manually" talks about this.

Regarding the patched version, I have to take another look. Please use the
Apache Crail master for now (It will hang with Spark at the end of your job
but it should run through).

Regards,
Jonas

  On Tue, 2 Jul 2019 00:27:33 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>
> Just wanted to be sure I’m doing things correctly.  It runs okay
>without adding in the NVMf datanode (i.e.
>
> completes teragen).  When I add the NVMf node in, even without using
>it on the run, it hangs during the
>
> terasort, with nothing being written to the datanode – only the
>metadata is created (i.e. /spark).
>
>
> My config is:
>
> 1 namenode container
>
> 1 rdma datanode storage class 1 container
>
> 1 nvmf datanode storage class 1 container.
>
>
> The namenode is showing that both datanode are starting up as
>
> Type 0 to storage class 0… is that correct?
>
>
> NameNode log at startup:
>
> 19/07/01 17:18:16 INFO crail: initalizing namenode
>
> 19/07/01 17:18:16 INFO crail: crail.version 3101
>
> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
>
> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
>
> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
>
> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
>
> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
>
> 19/07/01 17:18:16 INFO crail: crail.user crail
>
> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
>
> 19/07/01 17:18:16 INFO crail: crail.debug true
>
> 19/07/01 17:18:16 INFO crail: crail.statistics false
>
> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
>
> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
>
> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
>
> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
>
> 19/07/01 17:18:16 INFO crail: crail.singleton true
>
> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
>
> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
>
> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
>
> 19/07/01 17:18:16 INFO crail: crail.cacheimpl
>org.apache.crail.memory.MappedBufferCache
>
> 19/07/01 17:18:16 INFO crail: crail.locationmap
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.address
>crail://minnie:9060?id=0&size=1
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection
>roundrobin
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.log
>
> 19/07/01 17:18:16 INFO crail: crail.storage.types
>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>
> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
>
> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
>
> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
>
> 19/07/01 17:18:16 INFO crail: round robin block selection
>
> 19/07/01 17:18:16 INFO crail: round robin block selection
>
> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0,
>queueDepth 32, messageSize 512, nodealy true, cores 2
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
>
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
>
> 19/07/01 17:18:17 INFO crail: new connection from
>/192.168.1.164:39260
>
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>/192.168.1.164:39260
>
> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of
>type 0 to storage class 0
>
> 19/07/01 17:18:17 INFO crail: new connection from
>/192.168.1.164:39262
>
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>/192.168.1.164:39262
>
> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020
>of type 0 to storage class 0
>
>
> The RDMA datanode – it is set to have 4x1GB hugepages:
>
> 19/07/01 17:18:17 INFO crail: crail.version 3101
>
> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
>
> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
>
> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
>
> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
>
> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
>
> 19/07/01 17:18:17 INFO crail: crail.user crail
>
> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
>
> 19/07/01 17:18:17 INFO crail: crail.debug true
>
> 19/07/01 17:18:17 INFO crail: crail.statistics false
>
> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
>
> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
>
> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
>
> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
>
> 19/07/01 17:18:17 INFO crail: crail.singleton true
>
> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
>
> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
>
> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
>
> 19/07/01 17:18:17 INFO crail: crail.cacheimpl
>org.apache.crail.memory.MappedBufferCache
>
> 19/07/01 17:18:17 INFO crail: crail.locationmap
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.address
>crail://minnie:9060
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection
>roundrobin
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.log
>
> 19/07/01 17:18:17 INFO crail: crail.storage.types
>org.apache.crail.storage.rdma.RdmaStorageTier
>
> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
>
> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
>
> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
>
> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
>
> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs
>size 28, native size 16
>
> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32,
>native size 32
>
> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size
>72, native size 128
>
> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48,
>native size 48
>
> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16,
>native size 16
>
> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size
>40, native size 40
>
> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48,
>native size 48
>
> 19/07/01 17:18:17 INFO disni: createEventChannel, objId
>140349068383088
>
> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32,
>maxSge 4, cqSize 3200
>
> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
>
> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
>
> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
>
> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
>
> 19/07/01 17:18:17 INFO disni: listen, id 0
>
> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
>
> 19/07/01 17:18:17 INFO disni: setting up protection domain, context
>100, pd 1
>
> 19/07/01 17:18:17 INFO disni: PD value 1
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit
>4294967296
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize
>1073741824
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath
>/dev/hugepages/rdma
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
>
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
>
> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0,
>queueDepth 32, messageSize 512, nodealy true
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
>
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
>
> 19/07/01 17:18:17 INFO crail: rdma storage server started, address
>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize
>3200
>
> 19/07/01 17:18:17 INFO disni: starting accept
>
> 19/07/01 17:18:18 INFO crail: connected to namenode(s)
>minnie/192.168.1.164:9060
>
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
>
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
>
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
>
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>
>
> NVMf datanode is showing 1TB.
>
> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks
>1048576
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: David Crespi <da...@storedgesystems.com>
> Sent: Monday, July 1, 2019 3:57:42 PM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
>
> A standard pull from the repo, one that didn’t have the patches from
>your private repo.
>
> I can put patches back in both the client and server containers if
>you really think it
>
> would make a difference.
>
>
> Are you guys running multiple types together?  I’m running a RDMA
>storage class 1,
>
> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get
>errors when the
>
> RDMA is introduced into the mix.  I have a small amount of memory
>(4GB) assigned
>
> with the RDMA tier, and looking for it to fall into the NVMf class 1
>tier.  It appears to want
>
> to do that, but gets screwed up… it looks like it’s trying to create
>another set of qp’s for
>
> an RDMA connection.  It even blew up spdk trying to accomplish that.
>
>
> Do you guys have some documentation that shows what’s been tested
>(mixes/variations) so far?
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Monday, July 1, 2019 12:51:09 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> Can you clarify which unpatched version you are talking about? Are
>you
> talking about the NVMf thread fix where I send you a link to a
>branch in my
> repository or the fix we provided earlier for the Spark hang in the
>Crail
> master?
>
> Generally, if you update, update all: clients and datanode/namenode.
>
> Regards,
> Jonas
>
>  On Fri, 28 Jun 2019 17:59:32 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>FYI - I went back to using the unpatched version of crail on the
>>clients and it appears to work
>> okay now with the shuffle and RDMA, with only the RDMA containers
>>running on the server.
>>
>> Regards,
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi
>> Sent: Friday, June 28, 2019 7:49:51 AM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>>
>> Oh, and while I’m thinking about it Jonas, when I added the patches
>>you provided the other day, I only
>>
>> added them to the spark containers (clients) not to my crail
>>containers running on my storage server.
>>
>> Should the patches been added to all of the containers?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: Jonas Pfefferle <pe...@japf.ch>
>> Sent: Friday, June 28, 2019 12:54:27 AM
>> To: dev@crail.apache.org; David Crespi
>> Subject: Re: Setting up storage class 1 and 2
>>
>> Hi David,
>>
>>
>> At the moment, it is possible to add a NVMf datanode even if only
>>the RDMA
>> storage type is specified in the config. As you have seen this will
>>go wrong
>> as soon as a client tries to connect to the datanode. Make sure to
>>start the
>> RDMA datanode with the appropriate classname, see:
>> https://incubator-crail.readthedocs.io/en/latest/run.html
>> The correct classname is
>>org.apache.crail.storage.rdma.RdmaStorageTier.
>>
>> Regards,
>> Jonas
>>
>>  On Thu, 27 Jun 2019 23:09:26 +0000
>>  David Crespi <da...@storedgesystems.com> wrote:
>>> Hi,
>>> I’m trying to integrate the storage classes and I’m hitting another
>>>issue when running terasort and just
>>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>>just sits, after the following
>>> message:
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>>containers from the server, and I’m only running
>>> the namenode and a rdma storage class 1 datanode.  My spark
>>>configuration is also now only looking at
>>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>>port in the INFO messages seen below.
>>> I must be configuring something wrong, but I’ve not been able to
>>>track it down.  Any thoughts?
>>>
>>>
>>> ************************************
>>>         TeraSort
>>> ************************************
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>>native-hadoop library for your platform... using builtin-java classes
>>>where applicable
>>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>>authentication disabled; ui acls disabled; users  with view
>>>permissions: Set(hduser); groups with view permissions: Set(); users
>>> with modify permissions: Set(hduser); groups with modify
>>>permissions: Set()
>>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>>default logging framework
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>>-Dio.netty.eventLoopThreads: 112
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>>false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.theUnsafe: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.copyMemory: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>>constructor: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>>available, true
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>>prior to Java9
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>java.nio.DirectByteBuffer.<init>(long, int): available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>>(java.io.tmpdir)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>>(sun.arch.data.model)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.noPreferDirect: false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>>available
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.noKeySetOptimization: false
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.selectorAutoRebuildThreshold: 512
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>org.jctools-core.MpscChunkedArrayQueue: available
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.level: simple
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.targetRecords: 4
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numHeapArenas: 9
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numDirectArenas: 10
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.pageSize: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxOrder: 11
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.chunkSize: 16777216
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.tinyCacheSize: 512
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.smallCacheSize: 256
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.normalCacheSize: 64
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.cacheTrimInterval: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.useCacheForAllThreads: true
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>>(auto-detected)
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>>false
>>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>>127.0.0.1)
>>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>>pooled
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.threadLocalDirectBufferSize: 65536
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 36915
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'sparkDriver' on port 36915.
>>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>>org.apache.spark.serializer.KryoSerializer
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>>information
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>>BlockManagerMasterEndpoint up
>>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>>capacity 366.3 MB
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>>> 19/06/27 15:59:08 DEBUG
>>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>>SSLOptions{enabled=false, port=None, keyStore=None,
>>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>protocol=None, enabledAlgorithms=Set()}
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>>on port 4040.
>>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>>started at http://192.168.1.161:4040
>>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>at
>>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>with timestamp 1561676348562
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>>Connecting to master spark://master:7077...
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>>connection to master/192.168.3.13:7077
>>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>>master/192.168.3.13:7077 successful, running bootstraps...
>>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>>bootstraps)
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>>16
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>>Spark cluster with app ID app-20190627155908-0005
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/0 on
>>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/1 on
>>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 39189
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>>39189.
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/2 on
>>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>>master:39189
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/3 on
>>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/4 on
>>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO BlockManager: Using
>>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>>replication policy
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/0 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/3 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/4 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/1 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/2 is now RUNNING
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>>master
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>>master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>>is ready for scheduling beginning after reached
>>>minRegisteredResourcesRatio: 0.0
>>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.use.legacy.blockreader.local = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.read.shortcircuit = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.domain.socket.data.traffic = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>>rpcRequestWrapperClass=class
>>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>>org.apache.hadoop.ipc.Client@3ed03652
>>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>>local reads and UNIX domain socket are disabled.
>>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>>not using SaslPropertiesResolver, no QOP found in configuration for
>>>dfs.data.transfer.protection
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>>took  115 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>>without replication took  117 ms
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>>locally took  6 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>>broadcast_0_piece0 without replication took  6 ms
>>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>>newAPIHadoopFile at TeraSort.scala:60
>>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>>NameNode-1/192.168.3.7:54310
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>>connections 1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>>31ms
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>FileStatuses: 134
>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>: 2
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>by getSplits: 2, TimeTaken: 139
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>Boolean) constructor
>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>Algorithm version is 1
>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>masked=rwxr-xr-x
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>$anonfun$write$1
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>($anonfun$write$1) is now cleaned +++
>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>SparkHadoopWriter.scala:78
>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>400
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>org.apache.spark.serializer.CrailSparkSerializer
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.outstanding 1
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.storageclass 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.broadcast.storageclass 0
>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>org.apache.crail.memory.MappedBufferCache
>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>crail://192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>roundrobin
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>bufferCount 1024
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>4294967296
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>/dev/hugepages/rdma
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>queueDepth 32, messageSize 512, nodealy true
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>/192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>streamId 1, isDir true, writeHint 0
>>> 19/06/27 15:59:10 INFO crail: passive data client
>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>size 28, native size 16
>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>native size 32
>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>72, native size 128
>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>native size 16
>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>40, native size 40
>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>139811924587312
>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>maxSge 4, cqSize 64
>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>/192.168.3.100:4420
>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>467, pd 1
>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>139810647883744
>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>64
>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>send_wr size 32, recv_wr_size 32
>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.11:35854) with ID 0
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.12:44312) with ID 1
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.8:34774) with ID 4
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.9:58808) with ID 2
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.11
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>192.168.3.11, 41919, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.12
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>192.168.3.12, 46697, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.8
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>192.168.3.8, 37281, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.9
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>192.168.3.9, 43857, None)
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.10:40100) with ID 3
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.10
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>192.168.3.10, 38527, None)
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>>
>>> Regards,
>>>
>>>           David
>>>
>>



Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,


We run a great mix of configurations of NVMf and RDMA storage tiers with 
different storage classes, e.g. 3 storage classes where a group of NVMf 
datanodes is 0, another group of NVMf server is 1 and the RDMA datanodes are 
storage class 2. So this should work. I understand that the setup might be a 
bit tricky in the beginning.

 From your logs I see that you do not use the same configuration file for 
all containers. It is crucial that e.g. the order of storage types etc is 
the same in all configuration files. They have to be identical. To specify a 
storage class for a datanode you need to append "-c 1" (storage class 1) 
when starting the datanode. You can find the details of how exactly this 
works here: https://incubator-crail.readthedocs.io/en/latest/run.html
The last example in "Starting Crail manually" talks about this.

Regarding the patched version, I have to take another look. Please use the 
Apache Crail master for now (It will hang with Spark at the end of your job 
but it should run through).

Regards,
Jonas

  On Tue, 2 Jul 2019 00:27:33 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
> 
> Just wanted to be sure I’m doing things correctly.  It runs okay 
>without adding in the NVMf datanode (i.e.
> 
> completes teragen).  When I add the NVMf node in, even without using 
>it on the run, it hangs during the
> 
> terasort, with nothing being written to the datanode – only the 
>metadata is created (i.e. /spark).
> 
> 
> My config is:
> 
> 1 namenode container
> 
> 1 rdma datanode storage class 1 container
> 
> 1 nvmf datanode storage class 1 container.
> 
> 
> The namenode is showing that both datanode are starting up as
> 
> Type 0 to storage class 0… is that correct?
> 
> 
> NameNode log at startup:
> 
> 19/07/01 17:18:16 INFO crail: initalizing namenode
> 
> 19/07/01 17:18:16 INFO crail: crail.version 3101
> 
> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
> 
> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
> 
> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
> 
> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
> 
> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
> 
> 19/07/01 17:18:16 INFO crail: crail.user crail
> 
> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
> 
> 19/07/01 17:18:16 INFO crail: crail.debug true
> 
> 19/07/01 17:18:16 INFO crail: crail.statistics false
> 
> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
> 
> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
> 
> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
> 
> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
> 
> 19/07/01 17:18:16 INFO crail: crail.singleton true
> 
> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
> 
> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
> 
> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
> 
> 19/07/01 17:18:16 INFO crail: crail.cacheimpl 
>org.apache.crail.memory.MappedBufferCache
> 
> 19/07/01 17:18:16 INFO crail: crail.locationmap
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.address 
>crail://minnie:9060?id=0&size=1
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection 
>roundrobin
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype 
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.log
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.types 
>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
> 
> 19/07/01 17:18:16 INFO crail: round robin block selection
> 
> 19/07/01 17:18:16 INFO crail: round robin block selection
> 
> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, 
>queueDepth 32, messageSize 512, nodealy true, cores 2
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
> 
> 19/07/01 17:18:17 INFO crail: new connection from 
>/192.168.1.164:39260
> 
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from 
>/192.168.1.164:39260
> 
> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of 
>type 0 to storage class 0
> 
> 19/07/01 17:18:17 INFO crail: new connection from 
>/192.168.1.164:39262
> 
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from 
>/192.168.1.164:39262
> 
> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 
>of type 0 to storage class 0
> 
> 
> The RDMA datanode – it is set to have 4x1GB hugepages:
> 
> 19/07/01 17:18:17 INFO crail: crail.version 3101
> 
> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
> 
> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
> 
> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
> 
> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
> 
> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
> 
> 19/07/01 17:18:17 INFO crail: crail.user crail
> 
> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
> 
> 19/07/01 17:18:17 INFO crail: crail.debug true
> 
> 19/07/01 17:18:17 INFO crail: crail.statistics false
> 
> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
> 
> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
> 
> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
> 
> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
> 
> 19/07/01 17:18:17 INFO crail: crail.singleton true
> 
> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
> 
> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
> 
> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
> 
> 19/07/01 17:18:17 INFO crail: crail.cacheimpl 
>org.apache.crail.memory.MappedBufferCache
> 
> 19/07/01 17:18:17 INFO crail: crail.locationmap
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.address 
>crail://minnie:9060
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection 
>roundrobin
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype 
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.log
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.types 
>org.apache.crail.storage.rdma.RdmaStorageTier
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
> 
> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
> 
> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
> 
> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs 
>size 28, native size 16
> 
> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, 
>native size 32
> 
> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 
>72, native size 128
> 
> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, 
>native size 48
> 
> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, 
>native size 16
> 
> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 
>40, native size 40
> 
> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, 
>native size 48
> 
> 19/07/01 17:18:17 INFO disni: createEventChannel, objId 
>140349068383088
> 
> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, 
>maxSge 4, cqSize 3200
> 
> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
> 
> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
> 
> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
> 
> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
> 
> 19/07/01 17:18:17 INFO disni: listen, id 0
> 
> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
> 
> 19/07/01 17:18:17 INFO disni: setting up protection domain, context 
>100, pd 1
> 
> 19/07/01 17:18:17 INFO disni: PD value 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 
>4294967296
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 
>1073741824
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath 
>/dev/hugepages/rdma
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
> 
> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, 
>queueDepth 32, messageSize 512, nodealy true
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
> 
> 19/07/01 17:18:17 INFO crail: rdma storage server started, address 
>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 
>3200
> 
> 19/07/01 17:18:17 INFO disni: starting accept
> 
> 19/07/01 17:18:18 INFO crail: connected to namenode(s) 
>minnie/192.168.1.164:9060
> 
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
> 
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
> 
> 
> NVMf datanode is showing 1TB.
> 
> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 
>1048576
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: David Crespi <da...@storedgesystems.com>
> Sent: Monday, July 1, 2019 3:57:42 PM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
> 
> A standard pull from the repo, one that didn’t have the patches from 
>your private repo.
> 
> I can put patches back in both the client and server containers if 
>you really think it
> 
> would make a difference.
> 
> 
> Are you guys running multiple types together?  I’m running a RDMA 
>storage class 1,
> 
> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get 
>errors when the
> 
> RDMA is introduced into the mix.  I have a small amount of memory 
>(4GB) assigned
> 
> with the RDMA tier, and looking for it to fall into the NVMf class 1 
>tier.  It appears to want
> 
> to do that, but gets screwed up… it looks like it’s trying to create 
>another set of qp’s for
> 
> an RDMA connection.  It even blew up spdk trying to accomplish that.
> 
> 
> Do you guys have some documentation that shows what’s been tested 
>(mixes/variations) so far?
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Monday, July 1, 2019 12:51:09 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> Can you clarify which unpatched version you are talking about? Are 
>you
> talking about the NVMf thread fix where I send you a link to a 
>branch in my
> repository or the fix we provided earlier for the Spark hang in the 
>Crail
> master?
> 
> Generally, if you update, update all: clients and datanode/namenode.
> 
> Regards,
> Jonas
> 
>  On Fri, 28 Jun 2019 17:59:32 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>FYI - I went back to using the unpatched version of crail on the
>>clients and it appears to work
>> okay now with the shuffle and RDMA, with only the RDMA containers
>>running on the server.
>>
>> Regards,
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi
>> Sent: Friday, June 28, 2019 7:49:51 AM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>>
>> Oh, and while I’m thinking about it Jonas, when I added the patches
>>you provided the other day, I only
>>
>> added them to the spark containers (clients) not to my crail
>>containers running on my storage server.
>>
>> Should the patches been added to all of the containers?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: Jonas Pfefferle <pe...@japf.ch>
>> Sent: Friday, June 28, 2019 12:54:27 AM
>> To: dev@crail.apache.org; David Crespi
>> Subject: Re: Setting up storage class 1 and 2
>>
>> Hi David,
>>
>>
>> At the moment, it is possible to add a NVMf datanode even if only
>>the RDMA
>> storage type is specified in the config. As you have seen this will
>>go wrong
>> as soon as a client tries to connect to the datanode. Make sure to
>>start the
>> RDMA datanode with the appropriate classname, see:
>> https://incubator-crail.readthedocs.io/en/latest/run.html
>> The correct classname is
>>org.apache.crail.storage.rdma.RdmaStorageTier.
>>
>> Regards,
>> Jonas
>>
>>  On Thu, 27 Jun 2019 23:09:26 +0000
>>  David Crespi <da...@storedgesystems.com> wrote:
>>> Hi,
>>> I’m trying to integrate the storage classes and I’m hitting another
>>>issue when running terasort and just
>>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>>just sits, after the following
>>> message:
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>>containers from the server, and I’m only running
>>> the namenode and a rdma storage class 1 datanode.  My spark
>>>configuration is also now only looking at
>>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>>port in the INFO messages seen below.
>>> I must be configuring something wrong, but I’ve not been able to
>>>track it down.  Any thoughts?
>>>
>>>
>>> ************************************
>>>         TeraSort
>>> ************************************
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>>native-hadoop library for your platform... using builtin-java classes
>>>where applicable
>>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>>authentication disabled; ui acls disabled; users  with view
>>>permissions: Set(hduser); groups with view permissions: Set(); users
>>> with modify permissions: Set(hduser); groups with modify
>>>permissions: Set()
>>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>>default logging framework
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>>-Dio.netty.eventLoopThreads: 112
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>>false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.theUnsafe: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.copyMemory: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>>constructor: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>>available, true
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>>prior to Java9
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>java.nio.DirectByteBuffer.<init>(long, int): available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>>(java.io.tmpdir)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>>(sun.arch.data.model)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.noPreferDirect: false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>>available
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.noKeySetOptimization: false
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.selectorAutoRebuildThreshold: 512
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>org.jctools-core.MpscChunkedArrayQueue: available
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.level: simple
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.targetRecords: 4
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numHeapArenas: 9
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numDirectArenas: 10
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.pageSize: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxOrder: 11
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.chunkSize: 16777216
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.tinyCacheSize: 512
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.smallCacheSize: 256
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.normalCacheSize: 64
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.cacheTrimInterval: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.useCacheForAllThreads: true
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>>(auto-detected)
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>>false
>>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>>127.0.0.1)
>>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>>pooled
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.threadLocalDirectBufferSize: 65536
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 36915
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'sparkDriver' on port 36915.
>>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>>org.apache.spark.serializer.KryoSerializer
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>>information
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>>BlockManagerMasterEndpoint up
>>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>>capacity 366.3 MB
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>>> 19/06/27 15:59:08 DEBUG
>>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>>SSLOptions{enabled=false, port=None, keyStore=None,
>>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>protocol=None, enabledAlgorithms=Set()}
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>>on port 4040.
>>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>>started at http://192.168.1.161:4040
>>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>at
>>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>with timestamp 1561676348562
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>>Connecting to master spark://master:7077...
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>>connection to master/192.168.3.13:7077
>>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>>master/192.168.3.13:7077 successful, running bootstraps...
>>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>>bootstraps)
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>>16
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>>Spark cluster with app ID app-20190627155908-0005
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/0 on
>>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/1 on
>>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 39189
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>>39189.
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/2 on
>>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>>master:39189
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/3 on
>>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/4 on
>>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO BlockManager: Using
>>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>>replication policy
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/0 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/3 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/4 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/1 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/2 is now RUNNING
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>>master
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>>master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>>is ready for scheduling beginning after reached
>>>minRegisteredResourcesRatio: 0.0
>>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.use.legacy.blockreader.local = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.read.shortcircuit = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.domain.socket.data.traffic = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>>rpcRequestWrapperClass=class
>>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>>org.apache.hadoop.ipc.Client@3ed03652
>>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>>local reads and UNIX domain socket are disabled.
>>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>>not using SaslPropertiesResolver, no QOP found in configuration for
>>>dfs.data.transfer.protection
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>>took  115 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>>without replication took  117 ms
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>>locally took  6 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>>broadcast_0_piece0 without replication took  6 ms
>>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>>newAPIHadoopFile at TeraSort.scala:60
>>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>>NameNode-1/192.168.3.7:54310
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>>connections 1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>>31ms
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>FileStatuses: 134
>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>: 2
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>by getSplits: 2, TimeTaken: 139
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>Boolean) constructor
>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>Algorithm version is 1
>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>masked=rwxr-xr-x
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>$anonfun$write$1
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>($anonfun$write$1) is now cleaned +++
>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>SparkHadoopWriter.scala:78
>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>400
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>org.apache.spark.serializer.CrailSparkSerializer
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.outstanding 1
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.storageclass 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.broadcast.storageclass 0
>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>org.apache.crail.memory.MappedBufferCache
>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>crail://192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>roundrobin
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>bufferCount 1024
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>4294967296
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>/dev/hugepages/rdma
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>queueDepth 32, messageSize 512, nodealy true
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>/192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>streamId 1, isDir true, writeHint 0
>>> 19/06/27 15:59:10 INFO crail: passive data client
>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>size 28, native size 16
>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>native size 32
>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>72, native size 128
>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>native size 16
>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>40, native size 40
>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>139811924587312
>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>maxSge 4, cqSize 64
>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>/192.168.3.100:4420
>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>467, pd 1
>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>139810647883744
>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>64
>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>send_wr size 32, recv_wr_size 32
>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.11:35854) with ID 0
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.12:44312) with ID 1
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.8:34774) with ID 4
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.9:58808) with ID 2
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.11
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>192.168.3.11, 41919, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.12
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>192.168.3.12, 46697, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.8
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>192.168.3.8, 37281, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.9
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>192.168.3.9, 43857, None)
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.10:40100) with ID 3
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.10
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>192.168.3.10, 38527, None)
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>>
>>> Regards,
>>>
>>>           David
>>>
>>




RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Bounced on the first attempt.

Regards,

           David
From: David Crespi<ma...@storedgesystems.com>
Sent: Monday, July 1, 2019 5:27 PM
To: dev@crail.apache.org<ma...@crail.apache.org>; Jonas Pfefferle<ma...@japf.ch>
Subject: RE: Setting up storage class 1 and 2


Jonas,

Just wanted to be sure I’m doing things correctly.  It runs okay without adding in the NVMf datanode (i.e.

completes teragen).  When I add the NVMf node in, even without using it on the run, it hangs during the

terasort, with nothing being written to the datanode – only the metadata is created (i.e. /spark).



My config is:

1 namenode container

1 rdma datanode storage class 1 container

1 nvmf datanode storage class 1 container.



The namenode is showing that both datanode are starting up as

Type 0 to storage class 0… is that correct?



NameNode log at startup:

19/07/01 17:18:16 INFO crail: initalizing namenode

19/07/01 17:18:16 INFO crail: crail.version 3101

19/07/01 17:18:16 INFO crail: crail.directorydepth 16

19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:16 INFO crail: crail.blocksize 1048576

19/07/01 17:18:16 INFO crail: crail.cachelimit 0

19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:16 INFO crail: crail.user crail

19/07/01 17:18:16 INFO crail: crail.shadowreplication 1

19/07/01 17:18:16 INFO crail: crail.debug true

19/07/01 17:18:16 INFO crail: crail.statistics false

19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:16 INFO crail: crail.datatimeout 1000

19/07/01 17:18:16 INFO crail: crail.buffersize 1048576

19/07/01 17:18:16 INFO crail: crail.slicesize 65536

19/07/01 17:18:16 INFO crail: crail.singleton true

19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:16 INFO crail: crail.directoryrecord 512

19/07/01 17:18:16 INFO crail: crail.directoryrandomize true

19/07/01 17:18:16 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:16 INFO crail: crail.locationmap

19/07/01 17:18:16 INFO crail: crail.namenode.address crail://minnie:9060?id=0&size=1

19/07/01 17:18:16 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:16 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:16 INFO crail: crail.namenode.log

19/07/01 17:18:16 INFO crail: crail.storage.types org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:16 INFO crail: crail.storage.classes 2

19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true, cores 2

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39260

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39260

19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of type 0 to storage class 0

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39262

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39262

19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 of type 0 to storage class 0



The RDMA datanode – it is set to have 4x1GB hugepages:

19/07/01 17:18:17 INFO crail: crail.version 3101

19/07/01 17:18:17 INFO crail: crail.directorydepth 16

19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:17 INFO crail: crail.blocksize 1048576

19/07/01 17:18:17 INFO crail: crail.cachelimit 0

19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:17 INFO crail: crail.user crail

19/07/01 17:18:17 INFO crail: crail.shadowreplication 1

19/07/01 17:18:17 INFO crail: crail.debug true

19/07/01 17:18:17 INFO crail: crail.statistics false

19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:17 INFO crail: crail.datatimeout 1000

19/07/01 17:18:17 INFO crail: crail.buffersize 1048576

19/07/01 17:18:17 INFO crail: crail.slicesize 65536

19/07/01 17:18:17 INFO crail: crail.singleton true

19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:17 INFO crail: crail.directoryrecord 512

19/07/01 17:18:17 INFO crail: crail.directoryrandomize true

19/07/01 17:18:17 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:17 INFO crail: crail.locationmap

19/07/01 17:18:17 INFO crail: crail.namenode.address crail://minnie:9060

19/07/01 17:18:17 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:17 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:17 INFO crail: crail.namenode.log

19/07/01 17:18:17 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:17 INFO crail: crail.storage.classes 1

19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'

19/07/01 17:18:17 INFO disni: jverbs jni version 32

19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16

19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32

19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128

19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, native size 16

19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 40, native size 40

19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: createEventChannel, objId 140349068383088

19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: createId, id 140349068429968

19/07/01 17:18:17 INFO disni: new server endpoint, id 0

19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0

19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020

19/07/01 17:18:17 INFO disni: listen, id 0

19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808

19/07/01 17:18:17 INFO disni: setting up protection domain, context 100, pd 1

19/07/01 17:18:17 INFO disni: PD value 1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020

19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 4294967296

19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 1073741824

19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma

19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true

19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32

19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive

19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100

19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000

19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: rdma storage server started, address /192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: starting accept

19/07/01 17:18:18 INFO crail: connected to namenode(s) minnie/192.168.1.164:9060

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096



NVMf datanode is showing 1TB.

19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 1048576





Regards,



           David




RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Jonas,

Just wanted to be sure I’m doing things correctly.  It runs okay without adding in the NVMf datanode (i.e.

completes teragen).  When I add the NVMf node in, even without using it on the run, it hangs during the

terasort, with nothing being written to the datanode – only the metadata is created (i.e. /spark).



My config is:

1 namenode container

1 rdma datanode storage class 1 container

1 nvmf datanode storage class 1 container.



The namenode is showing that both datanode are starting up as

Type 0 to storage class 0… is that correct?



NameNode log at startup:

19/07/01 17:18:16 INFO crail: initalizing namenode

19/07/01 17:18:16 INFO crail: crail.version 3101

19/07/01 17:18:16 INFO crail: crail.directorydepth 16

19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:16 INFO crail: crail.blocksize 1048576

19/07/01 17:18:16 INFO crail: crail.cachelimit 0

19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:16 INFO crail: crail.user crail

19/07/01 17:18:16 INFO crail: crail.shadowreplication 1

19/07/01 17:18:16 INFO crail: crail.debug true

19/07/01 17:18:16 INFO crail: crail.statistics false

19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:16 INFO crail: crail.datatimeout 1000

19/07/01 17:18:16 INFO crail: crail.buffersize 1048576

19/07/01 17:18:16 INFO crail: crail.slicesize 65536

19/07/01 17:18:16 INFO crail: crail.singleton true

19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:16 INFO crail: crail.directoryrecord 512

19/07/01 17:18:16 INFO crail: crail.directoryrandomize true

19/07/01 17:18:16 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:16 INFO crail: crail.locationmap

19/07/01 17:18:16 INFO crail: crail.namenode.address crail://minnie:9060?id=0&size=1

19/07/01 17:18:16 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:16 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:16 INFO crail: crail.namenode.log

19/07/01 17:18:16 INFO crail: crail.storage.types org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:16 INFO crail: crail.storage.classes 2

19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true, cores 2

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39260

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39260

19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of type 0 to storage class 0

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39262

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39262

19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 of type 0 to storage class 0



The RDMA datanode – it is set to have 4x1GB hugepages:

19/07/01 17:18:17 INFO crail: crail.version 3101

19/07/01 17:18:17 INFO crail: crail.directorydepth 16

19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:17 INFO crail: crail.blocksize 1048576

19/07/01 17:18:17 INFO crail: crail.cachelimit 0

19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:17 INFO crail: crail.user crail

19/07/01 17:18:17 INFO crail: crail.shadowreplication 1

19/07/01 17:18:17 INFO crail: crail.debug true

19/07/01 17:18:17 INFO crail: crail.statistics false

19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:17 INFO crail: crail.datatimeout 1000

19/07/01 17:18:17 INFO crail: crail.buffersize 1048576

19/07/01 17:18:17 INFO crail: crail.slicesize 65536

19/07/01 17:18:17 INFO crail: crail.singleton true

19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:17 INFO crail: crail.directoryrecord 512

19/07/01 17:18:17 INFO crail: crail.directoryrandomize true

19/07/01 17:18:17 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:17 INFO crail: crail.locationmap

19/07/01 17:18:17 INFO crail: crail.namenode.address crail://minnie:9060

19/07/01 17:18:17 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:17 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:17 INFO crail: crail.namenode.log

19/07/01 17:18:17 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:17 INFO crail: crail.storage.classes 1

19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'

19/07/01 17:18:17 INFO disni: jverbs jni version 32

19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16

19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32

19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128

19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, native size 16

19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 40, native size 40

19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: createEventChannel, objId 140349068383088

19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: createId, id 140349068429968

19/07/01 17:18:17 INFO disni: new server endpoint, id 0

19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0

19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020

19/07/01 17:18:17 INFO disni: listen, id 0

19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808

19/07/01 17:18:17 INFO disni: setting up protection domain, context 100, pd 1

19/07/01 17:18:17 INFO disni: PD value 1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020

19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 4294967296

19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 1073741824

19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma

19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true

19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32

19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive

19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100

19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000

19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: rdma storage server started, address /192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: starting accept

19/07/01 17:18:18 INFO crail: connected to namenode(s) minnie/192.168.1.164:9060

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096



NVMf datanode is showing 1TB.

19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 1048576





Regards,



           David



________________________________
From: David Crespi <da...@storedgesystems.com>
Sent: Monday, July 1, 2019 3:57:42 PM
To: Jonas Pfefferle; dev@crail.apache.org
Subject: RE: Setting up storage class 1 and 2

A standard pull from the repo, one that didn’t have the patches from your private repo.

I can put patches back in both the client and server containers if you really think it

would make a difference.



Are you guys running multiple types together?  I’m running a RDMA storage class 1,

a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get errors when the

RDMA is introduced into the mix.  I have a small amount of memory (4GB) assigned

with the RDMA tier, and looking for it to fall into the NVMf class 1 tier.  It appears to want

to do that, but gets screwed up… it looks like it’s trying to create another set of qp’s for

an RDMA connection.  It even blew up spdk trying to accomplish that.



Do you guys have some documentation that shows what’s been tested (mixes/variations) so far?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Monday, July 1, 2019 12:51:09 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix where I send you a link to a branch in my
repository or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers
>running on the server.
>
> Regards,
>
>           David
>
>
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
>
>
> Oh, and while I’m thinking about it Jonas, when I added the patches
>you provided the other day, I only
>
> added them to the spark containers (clients) not to my crail
>containers running on my storage server.
>
> Should the patches been added to all of the containers?
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> At the moment, it is possible to add a NVMf datanode even if only
>the RDMA
> storage type is specified in the config. As you have seen this will
>go wrong
> as soon as a client tries to connect to the datanode. Make sure to
>start the
> RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is
>org.apache.crail.storage.rdma.RdmaStorageTier.
>
> Regards,
> Jonas
>
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and a rdma storage class 1 datanode.  My spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
>


RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
A standard pull from the repo, one that didn’t have the patches from your private repo.

I can put patches back in both the client and server containers if you really think it

would make a difference.



Are you guys running multiple types together?  I’m running a RDMA storage class 1,

a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get errors when the

RDMA is introduced into the mix.  I have a small amount of memory (4GB) assigned

with the RDMA tier, and looking for it to fall into the NVMf class 1 tier.  It appears to want

to do that, but gets screwed up… it looks like it’s trying to create another set of qp’s for

an RDMA connection.  It even blew up spdk trying to accomplish that.



Do you guys have some documentation that shows what’s been tested (mixes/variations) so far?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Monday, July 1, 2019 12:51:09 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix where I send you a link to a branch in my
repository or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers
>running on the server.
>
> Regards,
>
>           David
>
>
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
>
>
> Oh, and while I’m thinking about it Jonas, when I added the patches
>you provided the other day, I only
>
> added them to the spark containers (clients) not to my crail
>containers running on my storage server.
>
> Should the patches been added to all of the containers?
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> At the moment, it is possible to add a NVMf datanode even if only
>the RDMA
> storage type is specified in the config. As you have seen this will
>go wrong
> as soon as a client tries to connect to the datanode. Make sure to
>start the
> RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is
>org.apache.crail.storage.rdma.RdmaStorageTier.
>
> Regards,
> Jonas
>
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and a rdma storage class 1 datanode.  My spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
>


Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,


Can you clarify which unpatched version you are talking about? Are you 
talking about the NVMf thread fix where I send you a link to a branch in my 
repository or the fix we provided earlier for the Spark hang in the Crail 
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the 
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers 
>running on the server.
> 
> Regards,
> 
>           David
> 
> 
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
> 
> 
> Oh, and while I’m thinking about it Jonas, when I added the patches 
>you provided the other day, I only
> 
> added them to the spark containers (clients) not to my crail 
>containers running on my storage server.
> 
> Should the patches been added to all of the containers?
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> At the moment, it is possible to add a NVMf datanode even if only 
>the RDMA
> storage type is specified in the config. As you have seen this will 
>go wrong
> as soon as a client tries to connect to the datanode. Make sure to 
>start the
> RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is 
>org.apache.crail.storage.rdma.RdmaStorageTier.
> 
> Regards,
> Jonas
> 
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and a rdma storage class 1 datanode.  My spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
> 


RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Jonas,
FYI - I went back to using the unpatched version of crail on the clients and it appears to work
okay now with the shuffle and RDMA, with only the RDMA containers running on the server.

Regards,

           David


________________________________
From: David Crespi
Sent: Friday, June 28, 2019 7:49:51 AM
To: Jonas Pfefferle; dev@crail.apache.org
Subject: RE: Setting up storage class 1 and 2


Oh, and while I’m thinking about it Jonas, when I added the patches you provided the other day, I only

added them to the spark containers (clients) not to my crail containers running on my storage server.

Should the patches been added to all of the containers?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Friday, June 28, 2019 12:54:27 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


At the moment, it is possible to add a NVMf datanode even if only the RDMA
storage type is specified in the config. As you have seen this will go wrong
as soon as a client tries to connect to the datanode. Make sure to start the
RDMA datanode with the appropriate classname, see:
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.

Regards,
Jonas

  On Thu, 27 Jun 2019 23:09:26 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Hi,
> I’m trying to integrate the storage classes and I’m hitting another
>issue when running terasort and just
> using the crail-shuffle with HDFS as the tmp storage.  The program
>just sits, after the following
> message:
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>containers from the server, and I’m only running
> the namenode and a rdma storage class 1 datanode.  My spark
>configuration is also now only looking at
> the rdma class.  It looks as though it’s picking up the NVMf IP and
>port in the INFO messages seen below.
> I must be configuring something wrong, but I’ve not been able to
>track it down.  Any thoughts?
>
>
> ************************************
>         TeraSort
> ************************************
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>native-hadoop library for your platform... using builtin-java classes
>where applicable
> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>authentication disabled; ui acls disabled; users  with view
>permissions: Set(hduser); groups with view permissions: Set(); users
> with modify permissions: Set(hduser); groups with modify
>permissions: Set()
> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>default logging framework
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>-Dio.netty.eventLoopThreads: 112
> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>false
> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.theUnsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.copyMemory: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>constructor: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>available, true
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>prior to Java9
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>java.nio.DirectByteBuffer.<init>(long, int): available
> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>(java.io.tmpdir)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>(sun.arch.data.model)
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.noPreferDirect: false
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.maxDirectMemory: 1029177344 bytes
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.uninitializedArrayAllocationThreshold: -1
> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>available
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.noKeySetOptimization: false
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.selectorAutoRebuildThreshold: 512
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>org.jctools-core.MpscChunkedArrayQueue: available
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.level: simple
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.targetRecords: 4
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numHeapArenas: 9
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numDirectArenas: 10
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.pageSize: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxOrder: 11
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.chunkSize: 16777216
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.tinyCacheSize: 512
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.smallCacheSize: 256
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.normalCacheSize: 64
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.cacheTrimInterval: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.useCacheForAllThreads: true
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>(auto-detected)
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>false
> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>127.0.0.1)
> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>02:42:ac:ff:fe:1b:00:02 (auto-detected)
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>pooled
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.threadLocalDirectBufferSize: 65536
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.maxThreadLocalCharBufferSize: 16384
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 36915
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'sparkDriver' on port 36915.
> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>org.apache.spark.serializer.KryoSerializer
> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>information
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>BlockManagerMasterEndpoint up
> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>capacity 366.3 MB
> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 19/06/27 15:59:08 DEBUG
>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>SSLOptions{enabled=false, port=None, keyStore=None,
>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>protocol=None, enabledAlgorithms=Set()}
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>on port 4040.
> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>started at http://192.168.1.161:4040
> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>at
>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>with timestamp 1561676348562
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>Connecting to master spark://master:7077...
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>connection to master/192.168.3.13:7077
> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>-Dio.netty.buffer.bytebuf.checkAccessible: true
> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>master/192.168.3.13:7077 successful, running bootstraps...
> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>bootstraps)
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxCapacityPerThread: 32768
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxSharedCapacityFactor: 2
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>16
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>Spark cluster with app ID app-20190627155908-0005
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/0 on
>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/1 on
>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 39189
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>39189.
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/2 on
>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>core(s)
> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>master:39189
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/3 on
>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/4 on
>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>core(s)
> 19/06/27 15:59:08 INFO BlockManager: Using
>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>replication policy
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/0 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/3 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/4 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/1 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/2 is now RUNNING
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>master
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>is ready for scheduling beginning after reached
>minRegisteredResourcesRatio: 0.0
> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.use.legacy.blockreader.local = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.read.shortcircuit = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.domain.socket.data.traffic = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>rpcRequestWrapperClass=class
>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>org.apache.hadoop.ipc.Client@3ed03652
> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>local reads and UNIX domain socket are disabled.
> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>not using SaslPropertiesResolver, no QOP found in configuration for
>dfs.data.transfer.protection
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>values in memory (estimated size 288.9 KB, free 366.0 MB)
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>took  115 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>without replication took  117 ms
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>locally took  6 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>broadcast_0_piece0 without replication took  6 ms
> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>newAPIHadoopFile at TeraSort.scala:60
> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
> 19/06/27 15:59:10 DEBUG Client: Connecting to
>NameNode-1/192.168.3.7:54310
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>connections 1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #0
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #0
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>31ms
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #1
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>FileStatuses: 134
> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>: 2
> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>by getSplits: 2, TimeTaken: 139
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>Boolean) constructor
> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>Algorithm version is 1
> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>masked=rwxr-xr-x
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #2
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #2
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>$anonfun$write$1
> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>($anonfun$write$1) is now cleaned +++
> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>SparkHadoopWriter.scala:78
> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>400
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>org.apache.spark.serializer.CrailSparkSerializer
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>true
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.outstanding 1
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.storageclass 0
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.broadcast.storageclass 0
> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
> 19/06/27 15:59:10 INFO crail: crail.version 3101
> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
> 19/06/27 15:59:10 INFO crail: crail.user crail
> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
> 19/06/27 15:59:10 INFO crail: crail.debug true
> 19/06/27 15:59:10 INFO crail: crail.statistics true
> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
> 19/06/27 15:59:10 INFO crail: crail.singleton true
> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>org.apache.crail.memory.MappedBufferCache
> 19/06/27 15:59:10 INFO crail: crail.locationmap
> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>crail://192.168.1.164:9060
> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>roundrobin
> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 19/06/27 15:59:10 INFO crail: crail.namenode.log
> 19/06/27 15:59:10 INFO crail: crail.storage.types
>org.apache.crail.storage.rdma.RdmaStorageTier
> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>bufferCount 1024
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>4294967296
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>1073741824
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>/dev/hugepages/rdma
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>queueDepth 32, messageSize 512, nodealy true
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>/192.168.1.164:9060
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>DIRECTORY, storageAffinity 0, locationAffinity 0
> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>streamId 1, isDir true, writeHint 0
> 19/06/27 15:59:10 INFO crail: passive data client
> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>size 28, native size 16
> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>native size 32
> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>72, native size 128
> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>native size 16
> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>40, native size 40
> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>139811924587312
> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>maxSge 4, cqSize 64
> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>/192.168.3.100:4420
> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>467, pd 1
> 19/06/27 15:59:10 INFO disni: setting up cq processor
> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>139810647883744
> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>64
> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>send_wr size 32, recv_wr_size 32
> 19/06/27 15:59:10 INFO disni: connect, id 0
> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.11:35854) with ID 0
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.12:44312) with ID 1
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.8:34774) with ID 4
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.9:58808) with ID 2
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.11
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>192.168.3.11, 41919, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.12
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>192.168.3.12, 46697, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.8
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>192.168.3.8, 37281, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.9
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>192.168.3.9, 43857, None)
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.10:40100) with ID 3
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.10
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>192.168.3.10, 38527, None)
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
>
> Regards,
>
>           David
>



RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Oh, and while I’m thinking about it Jonas, when I added the patches you provided the other day, I only

added them to the spark containers (clients) not to my crail containers running on my storage server.

Should the patches been added to all of the containers?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Friday, June 28, 2019 12:54:27 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


At the moment, it is possible to add a NVMf datanode even if only the RDMA
storage type is specified in the config. As you have seen this will go wrong
as soon as a client tries to connect to the datanode. Make sure to start the
RDMA datanode with the appropriate classname, see:
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.

Regards,
Jonas

  On Thu, 27 Jun 2019 23:09:26 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Hi,
> I’m trying to integrate the storage classes and I’m hitting another
>issue when running terasort and just
> using the crail-shuffle with HDFS as the tmp storage.  The program
>just sits, after the following
> message:
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>containers from the server, and I’m only running
> the namenode and a rdma storage class 1 datanode.  My spark
>configuration is also now only looking at
> the rdma class.  It looks as though it’s picking up the NVMf IP and
>port in the INFO messages seen below.
> I must be configuring something wrong, but I’ve not been able to
>track it down.  Any thoughts?
>
>
> ************************************
>         TeraSort
> ************************************
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>native-hadoop library for your platform... using builtin-java classes
>where applicable
> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>to:
> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>authentication disabled; ui acls disabled; users  with view
>permissions: Set(hduser); groups with view permissions: Set(); users
> with modify permissions: Set(hduser); groups with modify
>permissions: Set()
> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>default logging framework
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>-Dio.netty.eventLoopThreads: 112
> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>false
> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.theUnsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>sun.misc.Unsafe.copyMemory: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>constructor: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>available, true
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>prior to Java9
> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>java.nio.DirectByteBuffer.<init>(long, int): available
> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>(java.io.tmpdir)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>(sun.arch.data.model)
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.noPreferDirect: false
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.maxDirectMemory: 1029177344 bytes
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>-Dio.netty.uninitializedArrayAllocationThreshold: -1
> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>available
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.noKeySetOptimization: false
> 19/06/27 15:59:08 DEBUG NioEventLoop:
>-Dio.netty.selectorAutoRebuildThreshold: 512
> 19/06/27 15:59:08 DEBUG PlatformDependent:
>org.jctools-core.MpscChunkedArrayQueue: available
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.level: simple
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>-Dio.netty.leakDetection.targetRecords: 4
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numHeapArenas: 9
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.numDirectArenas: 10
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.pageSize: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxOrder: 11
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.chunkSize: 16777216
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.tinyCacheSize: 512
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.smallCacheSize: 256
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.normalCacheSize: 64
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.cacheTrimInterval: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>-Dio.netty.allocator.useCacheForAllThreads: true
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>(auto-detected)
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>false
> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>127.0.0.1)
> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>02:42:ac:ff:fe:1b:00:02 (auto-detected)
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>pooled
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.threadLocalDirectBufferSize: 65536
> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>-Dio.netty.maxThreadLocalCharBufferSize: 16384
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 36915
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'sparkDriver' on port 36915.
> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>org.apache.spark.serializer.KryoSerializer
> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>information
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>BlockManagerMasterEndpoint up
> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>capacity 366.3 MB
> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 19/06/27 15:59:08 DEBUG
>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>SSLOptions{enabled=false, port=None, keyStore=None,
>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>protocol=None, enabledAlgorithms=Set()}
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>on port 4040.
> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>started at http://192.168.1.161:4040
> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>at
>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>with timestamp 1561676348562
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>Connecting to master spark://master:7077...
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>connection to master/192.168.3.13:7077
> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>-Dio.netty.buffer.bytebuf.checkAccessible: true
> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>master/192.168.3.13:7077 successful, running bootstraps...
> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>bootstraps)
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxCapacityPerThread: 32768
> 19/06/27 15:59:08 DEBUG Recycler:
>-Dio.netty.recycler.maxSharedCapacityFactor: 2
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>16
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>Spark cluster with app ID app-20190627155908-0005
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/0 on
>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/1 on
>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>port: 39189
> 19/06/27 15:59:08 INFO Utils: Successfully started service
>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>39189.
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/2 on
>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>core(s)
> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>master:39189
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/3 on
>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>added: app-20190627155908-0005/4 on
>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>core(s)
> 19/06/27 15:59:08 INFO BlockManager: Using
>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>replication policy
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/0 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/3 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/4 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/1 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>updated: app-20190627155908-0005/2 is now RUNNING
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>master
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>is ready for scheduling beginning after reached
>minRegisteredResourcesRatio: 0.0
> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.use.legacy.blockreader.local = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.read.shortcircuit = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>dfs.client.domain.socket.data.traffic = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>rpcRequestWrapperClass=class
>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>org.apache.hadoop.ipc.Client@3ed03652
> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>local reads and UNIX domain socket are disabled.
> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>not using SaslPropertiesResolver, no QOP found in configuration for
>dfs.data.transfer.protection
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>values in memory (estimated size 288.9 KB, free 366.0 MB)
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>took  115 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>without replication took  117 ms
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>locally took  6 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>broadcast_0_piece0 without replication took  6 ms
> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>newAPIHadoopFile at TeraSort.scala:60
> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
> 19/06/27 15:59:10 DEBUG Client: Connecting to
>NameNode-1/192.168.3.7:54310
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>connections 1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #0
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #0
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>31ms
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #1
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>FileStatuses: 134
> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>: 2
> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>by getSplits: 2, TimeTaken: 139
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>Boolean) constructor
> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>Algorithm version is 1
> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>masked=rwxr-xr-x
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser sending #2
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser got value #2
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>$anonfun$write$1
> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>($anonfun$write$1) is now cleaned +++
> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>SparkHadoopWriter.scala:78
> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>400
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>org.apache.spark.serializer.CrailSparkSerializer
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>true
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.outstanding 1
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.shuffle.storageclass 0
> 19/06/27 15:59:10 INFO CrailDispatcher:
>spark.crail.broadcast.storageclass 0
> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
> 19/06/27 15:59:10 INFO crail: crail.version 3101
> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
> 19/06/27 15:59:10 INFO crail: crail.user crail
> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
> 19/06/27 15:59:10 INFO crail: crail.debug true
> 19/06/27 15:59:10 INFO crail: crail.statistics true
> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
> 19/06/27 15:59:10 INFO crail: crail.singleton true
> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>org.apache.crail.memory.MappedBufferCache
> 19/06/27 15:59:10 INFO crail: crail.locationmap
> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>crail://192.168.1.164:9060
> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>roundrobin
> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 19/06/27 15:59:10 INFO crail: crail.namenode.log
> 19/06/27 15:59:10 INFO crail: crail.storage.types
>org.apache.crail.storage.rdma.RdmaStorageTier
> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>bufferCount 1024
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>4294967296
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>1073741824
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>/dev/hugepages/rdma
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>queueDepth 32, messageSize 512, nodealy true
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>/192.168.1.164:9060
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>DIRECTORY, storageAffinity 0, locationAffinity 0
> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>streamId 1, isDir true, writeHint 0
> 19/06/27 15:59:10 INFO crail: passive data client
> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>size 28, native size 16
> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>native size 32
> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>72, native size 128
> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>native size 16
> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>40, native size 40
> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>native size 48
> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>139811924587312
> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>maxSge 4, cqSize 64
> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>/192.168.3.100:4420
> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>467, pd 1
> 19/06/27 15:59:10 INFO disni: setting up cq processor
> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>139810647883744
> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>64
> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>send_wr size 32, recv_wr_size 32
> 19/06/27 15:59:10 INFO disni: connect, id 0
> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.11:35854) with ID 0
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.12:44312) with ID 1
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.8:34774) with ID 4
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.9:58808) with ID 2
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.11
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>192.168.3.11, 41919, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.12
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>192.168.3.12, 46697, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.8
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>192.168.3.8, 37281, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.9
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>192.168.3.9, 43857, None)
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>(192.168.3.10:40100) with ID 3
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>192.168.3.10
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>192.168.3.10, 38527, None)
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>connections 0
>
>
> Regards,
>
>           David
>



Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,


At the moment, it is possible to add a NVMf datanode even if only the RDMA 
storage type is specified in the config. As you have seen this will go wrong 
as soon as a client tries to connect to the datanode. Make sure to start the 
RDMA datanode with the appropriate classname, see: 
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.

Regards,
Jonas

  On Thu, 27 Jun 2019 23:09:26 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Hi,
> I’m trying to integrate the storage classes and I’m hitting another 
>issue when running terasort and just
> using the crail-shuffle with HDFS as the tmp storage.  The program 
>just sits, after the following
> message:
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining 
>connections 0
> 
> During this run, I’ve removed the two crail nvmf (class 1 and 2) 
>containers from the server, and I’m only running
> the namenode and a rdma storage class 1 datanode.  My spark 
>configuration is also now only looking at
> the rdma class.  It looks as though it’s picking up the NVMf IP and 
>port in the INFO messages seen below.
> I must be configuring something wrong, but I’ve not been able to 
>track it down.  Any thoughts?
> 
> 
> ************************************
>         TeraSort
> ************************************
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
>explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load 
>native-hadoop library for your platform... using builtin-java classes 
>where applicable
> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to: 
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to: 
>hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups 
>to:
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups 
>to:
> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager: 
>authentication disabled; ui acls disabled; users  with view 
>permissions: Set(hduser); groups with view permissions: Set(); users 
> with modify permissions: Set(hduser); groups with modify 
>permissions: Set()
> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the 
>default logging framework
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap: 
>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap: 
>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup: 
>-Dio.netty.eventLoopThreads: 112
> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe: 
>false
> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
> 19/06/27 15:59:08 DEBUG PlatformDependent0: 
>sun.misc.Unsafe.theUnsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: 
>sun.misc.Unsafe.copyMemory: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address: 
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer 
>constructor: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned: 
>available, true
> 19/06/27 15:59:08 DEBUG PlatformDependent0: 
>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable 
>prior to Java9
> 19/06/27 15:59:08 DEBUG PlatformDependent0: 
>java.nio.DirectByteBuffer.<init>(long, int): available
> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe: 
>available
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp 
>(java.io.tmpdir)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64 
>(sun.arch.data.model)
> 19/06/27 15:59:08 DEBUG PlatformDependent: 
>-Dio.netty.noPreferDirect: false
> 19/06/27 15:59:08 DEBUG PlatformDependent: 
>-Dio.netty.maxDirectMemory: 1029177344 bytes
> 19/06/27 15:59:08 DEBUG PlatformDependent: 
>-Dio.netty.uninitializedArrayAllocationThreshold: -1
> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner(): 
>available
> 19/06/27 15:59:08 DEBUG NioEventLoop: 
>-Dio.netty.noKeySetOptimization: false
> 19/06/27 15:59:08 DEBUG NioEventLoop: 
>-Dio.netty.selectorAutoRebuildThreshold: 512
> 19/06/27 15:59:08 DEBUG PlatformDependent: 
>org.jctools-core.MpscChunkedArrayQueue: available
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector: 
>-Dio.netty.leakDetection.level: simple
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector: 
>-Dio.netty.leakDetection.targetRecords: 4
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.numHeapArenas: 9
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.numDirectArenas: 10
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.pageSize: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.maxOrder: 11
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.chunkSize: 16777216
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.tinyCacheSize: 512
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.smallCacheSize: 256
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.normalCacheSize: 64
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.cacheTrimInterval: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: 
>-Dio.netty.allocator.useCacheForAllThreads: true
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236 
>(auto-detected)
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses: 
>false
> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo, 
>127.0.0.1)
> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId: 
>02:42:ac:ff:fe:1b:00:02 (auto-detected)
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type: 
>pooled
> 19/06/27 15:59:08 DEBUG ByteBufUtil: 
>-Dio.netty.threadLocalDirectBufferSize: 65536
> 19/06/27 15:59:08 DEBUG ByteBufUtil: 
>-Dio.netty.maxThreadLocalCharBufferSize: 16384
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on 
>port: 36915
> 19/06/27 15:59:08 INFO Utils: Successfully started service 
>'sparkDriver' on port 36915.
> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class 
>org.apache.spark.serializer.KryoSerializer
> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using 
>org.apache.spark.storage.DefaultTopologyMapper for getting topology 
>information
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: 
>BlockManagerMasterEndpoint up
> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at 
>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with 
>capacity 366.3 MB
> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 19/06/27 15:59:08 DEBUG 
>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui: 
>SSLOptions{enabled=false, port=None, keyStore=None, 
>keyStorePassword=None, trustStore=None, trustStorePassword=None, 
>protocol=None, enabledAlgorithms=Set()}
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI' 
>on port 4040.
> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and 
>started at http://192.168.1.161:4040
> 19/06/27 15:59:08 INFO SparkContext: Added JAR 
>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 
>at 
>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 
>with timestamp 1561676348562
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: 
>Connecting to master spark://master:7077...
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new 
>connection to master/192.168.3.13:7077
> 19/06/27 15:59:08 DEBUG AbstractByteBuf: 
>-Dio.netty.buffer.bytebuf.checkAccessible: true
> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default 
>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to 
>master/192.168.3.13:7077 successful, running bootstraps...
> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created 
>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in 
>bootstraps)
> 19/06/27 15:59:08 DEBUG Recycler: 
>-Dio.netty.recycler.maxCapacityPerThread: 32768
> 19/06/27 15:59:08 DEBUG Recycler: 
>-Dio.netty.recycler.maxSharedCapacityFactor: 2
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 
>16
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to 
>Spark cluster with app ID app-20190627155908-0005
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>added: app-20190627155908-0005/0 on 
>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2 
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor 
>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2 
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>added: app-20190627155908-0005/1 on 
>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2 
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor 
>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2 
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on 
>port: 39189
> 19/06/27 15:59:08 INFO Utils: Successfully started service 
>'org.apache.spark.network.netty.NettyBlockTransferService' on port 
>39189.
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>added: app-20190627155908-0005/2 on 
>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2 
>core(s)
> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on 
>master:39189
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor 
>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2 
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>added: app-20190627155908-0005/3 on 
>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2 
>core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor 
>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2 
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>added: app-20190627155908-0005/4 on 
>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2 
>core(s)
> 19/06/27 15:59:08 INFO BlockManager: Using 
>org.apache.spark.storage.RandomBlockReplicationPolicy for block 
>replication policy
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor 
>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2 
>core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>updated: app-20190627155908-0005/0 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>updated: app-20190627155908-0005/3 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>updated: app-20190627155908-0005/4 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>updated: app-20190627155908-0005/1 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor 
>updated: app-20190627155908-0005/2 is now RUNNING
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager 
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for 
>master
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block 
>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver, 
>master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager 
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager: 
>BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend 
>is ready for scheduling beginning after reached 
>minRegisteredResourcesRatio: 0.0
> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: 
>dfs.client.use.legacy.blockreader.local = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: 
>dfs.client.read.shortcircuit = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: 
>dfs.client.domain.socket.data.traffic = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, 
>rpcRequestWrapperClass=class 
>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, 
>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
> 19/06/27 15:59:09 DEBUG Client: getting client out of cache: 
>org.apache.hadoop.ipc.Client@3ed03652
> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit 
>local reads and UNIX domain socket are disabled.
> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol 
>not using SaslPropertiesResolver, no QOP found in configuration for 
>dfs.data.transfer.protection
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as 
>values in memory (estimated size 288.9 KB, free 366.0 MB)
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally 
>took  115 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0 
>without replication took  117 ms
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored 
>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in 
>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block 
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block 
>broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0 
>locally took  6 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block 
>broadcast_0_piece0 without replication took  6 ms
> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from 
>newAPIHadoopFile at TeraSort.scala:60
> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
> 19/06/27 15:59:10 DEBUG Client: Connecting to 
>NameNode-1/192.168.3.7:54310
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser: starting, having 
>connections 1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser sending #0
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser got value #0
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took 
>31ms
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser sending #1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser got value #1
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get 
>FileStatuses: 134
> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process 
>: 2
> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated 
>by getSplits: 2, TimeTaken: 139
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer 
>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1; 
>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String, 
>Boolean) constructor
> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer 
>Algorithm version is 1
> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0: 
>masked=rwxr-xr-x
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser sending #2
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser got value #2
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda: 
>$anonfun$write$1
> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure 
>($anonfun$write$1) is now cleaned +++
> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at 
>SparkHadoopWriter.scala:78
> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version 
>400
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose 
>false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart 
>true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer 
>org.apache.spark.serializer.CrailSparkSerializer
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity 
>true
> 19/06/27 15:59:10 INFO CrailDispatcher: 
>spark.crail.shuffle.outstanding 1
> 19/06/27 15:59:10 INFO CrailDispatcher: 
>spark.crail.shuffle.storageclass 0
> 19/06/27 15:59:10 INFO CrailDispatcher: 
>spark.crail.broadcast.storageclass 0
> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
> 19/06/27 15:59:10 INFO crail: crail.version 3101
> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
> 19/06/27 15:59:10 INFO crail: crail.user crail
> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
> 19/06/27 15:59:10 INFO crail: crail.debug true
> 19/06/27 15:59:10 INFO crail: crail.statistics true
> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
> 19/06/27 15:59:10 INFO crail: crail.singleton true
> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
> 19/06/27 15:59:10 INFO crail: crail.cacheimpl 
>org.apache.crail.memory.MappedBufferCache
> 19/06/27 15:59:10 INFO crail: crail.locationmap
> 19/06/27 15:59:10 INFO crail: crail.namenode.address 
>crail://192.168.1.164:9060
> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection 
>roundrobin
> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype 
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 19/06/27 15:59:10 INFO crail: crail.namenode.log
> 19/06/27 15:59:10 INFO crail: crail.storage.types 
>org.apache.crail.storage.rdma.RdmaStorageTier
> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0, 
>bufferCount 1024
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit 
>4294967296
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize 
>1073741824
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath 
>/dev/hugepages/rdma
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0, 
>queueDepth 32, messageSize 512, nodealy true
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
> 19/06/27 15:59:10 INFO crail: connected to namenode(s) 
>/192.168.1.164:9060
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type 
>DIRECTORY, storageAffinity 0, locationAffinity 0
> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0, 
>streamId 1, isDir true, writeHint 0
> 19/06/27 15:59:10 INFO crail: passive data client
> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs 
>size 28, native size 16
> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32, 
>native size 32
> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size 
>72, native size 128
> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48, 
>native size 48
> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16, 
>native size 16
> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size 
>40, native size 40
> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48, 
>native size 48
> 19/06/27 15:59:10 INFO disni: createEventChannel, objId 
>139811924587312
> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32, 
>maxSge 4, cqSize 64
> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
> 19/06/27 15:59:10 INFO disni: resolveAddr, addres 
>/192.168.3.100:4420
> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
> 19/06/27 15:59:10 INFO disni: setting up protection domain, context 
>467, pd 1
> 19/06/27 15:59:10 INFO disni: setting up cq processor
> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
> 19/06/27 15:59:10 INFO disni: createCompChannel, context 
>139810647883744
> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe 
>64
> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192, 
>send_wr size 32, recv_wr_size 32
> 19/06/27 15:59:10 INFO disni: connect, id 0
> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress 
>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
>Registered executor NettyRpcEndpointRef(spark-client://Executor) 
>(192.168.3.11:35854) with ID 0
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
>Registered executor NettyRpcEndpointRef(spark-client://Executor) 
>(192.168.3.12:44312) with ID 1
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
>Registered executor NettyRpcEndpointRef(spark-client://Executor) 
>(192.168.3.8:34774) with ID 4
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
>Registered executor NettyRpcEndpointRef(spark-client://Executor) 
>(192.168.3.9:58808) with ID 2
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 
>192.168.3.11
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block 
>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0, 
>192.168.3.11, 41919, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 
>192.168.3.12
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block 
>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1, 
>192.168.3.12, 46697, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 
>192.168.3.8
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block 
>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4, 
>192.168.3.8, 37281, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 
>192.168.3.9
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block 
>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2, 
>192.168.3.9, 43857, None)
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
>Registered executor NettyRpcEndpointRef(spark-client://Executor) 
>(192.168.3.10:40100) with ID 3
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 
>192.168.3.10
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block 
>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3, 
>192.168.3.10, 38527, None)
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection 
>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining 
>connections 0
> 
> 
> Regards,
> 
>           David
>