Posted to dev@crail.apache.org by Jonas Pfefferle <pe...@japf.ch> on 2019/07/01 07:51:09 UTC

Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix where I sent you a link to a branch in my
repository, or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the 
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers 
>running on the server.
> 
> Regards,
> 
>           David
> 
> 
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
> 
> 
> Oh, and while I’m thinking about it Jonas, when I added the patches 
>you provided the other day, I only
> 
> added them to the spark containers (clients) not to my crail 
>containers running on my storage server.
> 
> Should the patches have been added to all of the containers?
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> At the moment, it is possible to add an NVMf datanode even if only
> the RDMA storage type is specified in the config. As you have seen,
> this will go wrong as soon as a client tries to connect to the datanode.
> Make sure to start the RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is 
>org.apache.crail.storage.rdma.RdmaStorageTier.
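> 
> For example, starting it manually would look something like this (a
> sketch following the run guide above; adjust $CRAIL_HOME to your setup):
> 
>    $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier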
> 
> Regards,
> Jonas
> 
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and an RDMA storage class 1 datanode.  My Spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
> 


RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
I can do a docker shared volume for the config file. I had it originally
set up that way, but changed it and then just added the file to the image.
I'll play around with that this morning. Thanks for the info!
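
Something like mounting one shared conf directory into each container,
e.g. (host path and image name here are just illustrative):

   docker run -v /srv/crail/conf:/crail/conf ... crail-datanode-image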



Regards,



           David



________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Tuesday, July 2, 2019 7:01:46 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

I typically do use the start-crail.sh script. Then you have to put all the
command line arguments in the slaves file.


The configuration files need to be identical. In our configuration we put
the conf file on an NFS share; this way we don't have to bother with keeping
it synchronized between the nodes.
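
For example, the slaves file could mix tiers and classes like this
(hostnames are placeholders, and I am assuming each entry takes the same
arguments as a manual datanode start):

   node1 -t org.apache.crail.storage.rdma.RdmaStorageTier -- -c 1
   node2 -t org.apache.crail.storage.nvmf.NvmfStorageTier -- -c 2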

Regards,
Jonas

  On Tue, 2 Jul 2019 13:48:31 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Thanks for the info Jonas.
>
> Quick question… do you typically start the datanodes from the
>namenode using the command line?
>
> I’ve been launching containers independently of the namenode.  The
>containers do have the same
>
> base configuration file, but I pass in behaviors via environment
>variables.
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Tuesday, July 2, 2019 4:27:05 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> We run a great mix of configurations of NVMf and RDMA storage tiers with
> different storage classes, e.g. 3 storage classes where one group of NVMf
> datanodes is class 0, another group of NVMf servers is class 1, and the
> RDMA datanodes are storage class 2. So this should work. I understand that
> the setup might be a bit tricky in the beginning.
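> 
> As a sketch, the crail-site.conf shared by all nodes in such a setup
> would contain something like (values illustrative):
> 
>    crail.storage.types     org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>    crail.storage.classes   3
>    crail.storage.rootclass 0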
>
> From your logs I see that you do not use the same configuration file for
> all containers. It is crucial that, e.g., the order of storage types is
> the same in all configuration files; they have to be identical. To specify
> a storage class for a datanode you need to append "-c 1" (storage class 1)
> when starting the datanode. You can find the details of how exactly this
> works here:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The last example in "Starting Crail manually" talks about this.
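> 
> Concretely, something like this (a sketch; the "-t"/"-- -c" syntax is
> the one described on that page):
> 
>    $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier -- -c 1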
>
> Regarding the patched version, I have to take another look. Please use the
> Apache Crail master for now (it will hang with Spark at the end of your job
> but it should run through).
>
> Regards,
> Jonas
>
>  On Tue, 2 Jul 2019 00:27:33 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>
>> Just wanted to be sure I’m doing things correctly.  It runs okay
>>without adding in the NVMf datanode (i.e.
>>
>> completes teragen).  When I add the NVMf node in, even without using
>>it on the run, it hangs during the
>>
>> terasort, with nothing being written to the datanode – only the
>>metadata is created (i.e. /spark).
>>
>>
>> My config is:
>>
>> 1 namenode container
>>
>> 1 rdma datanode storage class 1 container
>>
>> 1 nvmf datanode storage class 1 container.
>>
>>
>> The namenode is showing that both datanodes are starting up as
>>
>> type 0 in storage class 0… is that correct?
>>
>>
>> NameNode log at startup:
>>
>> 19/07/01 17:18:16 INFO crail: initalizing namenode
>>
>> 19/07/01 17:18:16 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:16 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:16 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:16 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:16 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:16 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.address
>>crail://minnie:9060?id=0&size=1
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.types
>>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true, cores 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of
>>type 0 to storage class 0
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020
>>of type 0 to storage class 0
>>
>>
>> The RDMA datanode – it is set to have 4x1GB hugepages:
>>
>> 19/07/01 17:18:17 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:17 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:17 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:17 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:17 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:17 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.address
>>crail://minnie:9060
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
>>
>> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
>>
>> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>>
>> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>>
>> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>>
>> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>>
>> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>>
>> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: createEventChannel, objId
>>140349068383088
>>
>> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 3200
>>
>> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
>>
>> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
>>
>> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
>>
>> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
>>
>> 19/07/01 17:18:17 INFO disni: listen, id 0
>>
>> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
>>
>> 19/07/01 17:18:17 INFO disni: setting up protection domain, context
>>100, pd 1
>>
>> 19/07/01 17:18:17 INFO disni: PD value 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
>>
>> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: rdma storage server started, address
>>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize
>>3200
>>
>> 19/07/01 17:18:17 INFO disni: starting accept
>>
>> 19/07/01 17:18:18 INFO crail: connected to namenode(s)
>>minnie/192.168.1.164:9060
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>>
>> NVMf datanode is showing 1TB.
>>
>> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks
>>1048576
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi <da...@storedgesystems.com>
>> Sent: Monday, July 1, 2019 3:57:42 PM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>> A standard pull from the repo, one that didn’t have the patches from
>>your private repo.
>>
>> I can put patches back in both the client and server containers if
>>you really think it
>>
>> would make a difference.
>>
>>
>> Are you guys running multiple types together?  I’m running a RDMA
>>storage class 1,
>>
>> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get
>>errors when the
>>
>> RDMA is introduced into the mix.  I have a small amount of memory
>>(4GB) assigned
>>
>> with the RDMA tier, and looking for it to fall into the NVMf class 1
>>tier.  It appears to want
>>
>> to do that, but gets screwed up… it looks like it’s trying to create
>>another set of qp’s for
>>
>> an RDMA connection.  It even blew up spdk trying to accomplish that.
>>
>>
>> Do you guys have some documentation that shows what’s been tested
>>(mixes/variations) so far?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>>>>31ms
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>>FileStatuses: 134
>>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>>: 2
>>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>>by getSplits: 2, TimeTaken: 139
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>>Boolean) constructor
>>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>>Algorithm version is 1
>>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>>masked=rwxr-xr-x
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>>$anonfun$write$1
>>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>>($anonfun$write$1) is now cleaned +++
>>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>>SparkHadoopWriter.scala:78
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>>400
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>>false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>>org.apache.spark.serializer.CrailSparkSerializer
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>>true
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.outstanding 1
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.shuffle.storageclass 0
>>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>>spark.crail.broadcast.storageclass 0
>>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>>org.apache.crail.memory.MappedBufferCache
>>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>>crail://192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>>roundrobin
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>>bufferCount 1024
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>>4294967296
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>>1073741824
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>>/dev/hugepages/rdma
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>>queueDepth 32, messageSize 512, nodealy true
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>>/192.168.1.164:9060
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>>streamId 1, isDir true, writeHint 0
>>>> 19/06/27 15:59:10 INFO crail: passive data client
>>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>>size 28, native size 16
>>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>>native size 32
>>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>>72, native size 128
>>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>>native size 16
>>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>>40, native size 40
>>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>>native size 48
>>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>>139811924587312
>>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>>maxSge 4, cqSize 64
>>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>>/192.168.3.100:4420
>>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>>467, pd 1
>>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>>139810647883744
>>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>>64
>>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>>send_wr size 32, recv_wr_size 32
>>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.11:35854) with ID 0
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.12:44312) with ID 1
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.8:34774) with ID 4
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.9:58808) with ID 2
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.11
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>>192.168.3.11, 41919, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.12
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>>192.168.3.12, 46697, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.8
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>>192.168.3.8, 37281, None)
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.9
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>>192.168.3.9, 43857, None)
>>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>>(192.168.3.10:40100) with ID 3
>>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>>192.168.3.10
>>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>>192.168.3.10, 38527, None)
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>>connections 0
>>>>
>>>>
>>>> Regards,
>>>>
>>>>           David
>>>>
>>>


Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
I typically do use the start-crail.sh script; in that case you put all the
command-line arguments in the slaves file.
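
For illustration, a slaves file for such a setup might look like this
(hostnames are hypothetical, and the -t/-c flags mirror the run.html
examples, so double-check the exact syntax there):

  datanode1 -t org.apache.crail.storage.rdma.RdmaStorageTier
  datanode2 -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 1

start-crail.sh then starts one datanode per line with those arguments.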


The configuration files need to be identical. In our setup we put the conf
file on an NFS share; that way we don't have to bother with keeping it
synchronized between the nodes.
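
(A hypothetical sketch: export a directory such as /srv/crail/conf from
one host and, on every node, mount it over the Crail conf directory,

  mount -t nfs conf-host:/srv/crail/conf $CRAIL_HOME/conf

so that every daemon reads the very same crail-site.conf.)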

Regards,
Jonas

  On Tue, 2 Jul 2019 13:48:31 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Thanks for the info Jonas.
> 
> Quick question… do you typically start the datanodes from the 
>namenode using the command line?
> 
> I’ve been launching containers independently of the namenode.  The 
>containers do have the same
> 
> base configuration file, but I pass in behaviors via environment 
>variables.
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Tuesday, July 2, 2019 4:27:05 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> We run a great mix of configurations of NVMf and RDMA storage tiers
>with different storage classes, e.g. 3 storage classes where a group of
>NVMf datanodes is class 0, another group of NVMf servers is class 1,
>and the RDMA datanodes are storage class 2. So this should work. I
>understand that the setup might be a bit tricky in the beginning.
> 
> From your logs I see that you do not use the same configuration file
>for all containers. It is crucial that e.g. the order of storage types
>etc. is the same in all configuration files. They have to be identical.
>To specify a storage class for a datanode you need to append "-c 1"
>(storage class 1) when starting the datanode. You can find the details
>of how exactly this works here:
>https://incubator-crail.readthedocs.io/en/latest/run.html
> The last example in "Starting Crail manually" talks about this.
> 
> Regarding the patched version, I have to take another look. Please 
>use the
> Apache Crail master for now (It will hang with Spark at the end of 
>your job
> but it should run through).
> 
> Regards,
> Jonas
> 
>  On Tue, 2 Jul 2019 00:27:33 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>
>> Just wanted to be sure I’m doing things correctly.  It runs okay
>>without adding in the NVMf datanode (i.e.
>>
>> completes teragen).  When I add the NVMf node in, even without using
>>it on the run, it hangs during the
>>
>> terasort, with nothing being written to the datanode – only the
>>metadata is created (i.e. /spark).
>>
>>
>> My config is:
>>
>> 1 namenode container
>>
>> 1 rdma datanode storage class 1 container
>>
>> 1 nvmf datanode storage class 1 container.
>>
>>
> The namenode is showing that both datanodes are starting up as
>
> type 0 to storage class 0… is that correct?
>>
>>
>> NameNode log at startup:
>>
>> 19/07/01 17:18:16 INFO crail: initalizing namenode
>>
>> 19/07/01 17:18:16 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:16 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:16 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:16 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:16 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:16 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.address
>>crail://minnie:9060?id=0&size=1
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.types
>>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO crail: round robin block selection
>>
>> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true, cores 2
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39260
>>
>> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of
>>type 0 to storage class 0
>>
>> 19/07/01 17:18:17 INFO crail: new connection from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from
>>/192.168.1.164:39262
>>
>> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020
>>of type 0 to storage class 0
>>
>>
>> The RDMA datanode – it is set to have 4x1GB hugepages:
>>
>> 19/07/01 17:18:17 INFO crail: crail.version 3101
>>
>> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
>>
>> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
>>
>> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
>>
>> 19/07/01 17:18:17 INFO crail: crail.user crail
>>
>> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.debug true
>>
>> 19/07/01 17:18:17 INFO crail: crail.statistics false
>>
>> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
>>
>> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
>>
>> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
>>
>> 19/07/01 17:18:17 INFO crail: crail.singleton true
>>
>> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
>>
>> 19/07/01 17:18:17 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>>
>> 19/07/01 17:18:17 INFO crail: crail.locationmap
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.address
>>crail://minnie:9060
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection
>>roundrobin
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.log
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
>>
>> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
>>
>> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
>>
>> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>>
>> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>>
>> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>>
>> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>>
>> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>>
>> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>>
>> 19/07/01 17:18:17 INFO disni: createEventChannel, objId
>>140349068383088
>>
>> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 3200
>>
>> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
>>
>> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
>>
>> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
>>
>> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
>>
>> 19/07/01 17:18:17 INFO disni: listen, id 0
>>
>> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
>>
>> 19/07/01 17:18:17 INFO disni: setting up protection domain, context
>>100, pd 1
>>
>> 19/07/01 17:18:17 INFO disni: PD value 1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
>>
>> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
>>
>> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
>>
>> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
>>
>> 19/07/01 17:18:17 INFO crail: rdma storage server started, address
>>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize
>>3200
>>
>> 19/07/01 17:18:17 INFO disni: starting accept
>>
>> 19/07/01 17:18:18 INFO crail: connected to namenode(s)
>>minnie/192.168.1.164:9060
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
>>
>> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
>>
>> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
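>>
>> (That is 4096 free blocks x crail.blocksize 1048576 bytes = 4294967296
>>bytes = 4 GiB, matching the 4x1GB hugepages and
>>crail.storage.rdma.storagelimit.)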
>>
>>
>> NVMf datanode is showing 1TB.
>>
>> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks
>>1048576
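>>
>> (And 1048576 free blocks x 1 MiB blocksize = 1 TiB, which lines up.)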
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi <da...@storedgesystems.com>
>> Sent: Monday, July 1, 2019 3:57:42 PM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>> A standard pull from the repo, one that didn’t have the patches from
>>your private repo.
>>
>> I can put patches back in both the client and server containers if
>>you really think it
>>
>> would make a difference.
>>
>>
>>Are you guys running multiple types together?  I'm running an RDMA
>>storage class 1,
>>
>> a NVMf Storage Class 1 and NVMf Storage Class 2 together.  I get
>>errors when the
>>
>> RDMA is introduced into the mix.  I have a small amount of memory
>>(4GB) assigned
>>
>> with the RDMA tier, and looking for it to fall into the NVMf class 1
>>tier.  It appears to want
>>
>> to do that, but gets screwed up… it looks like it’s trying to create
>>another set of qp’s for
>>
>> an RDMA connection.  It even blew up spdk trying to accomplish that.
>>
>>
>> Do you guys have some documentation that shows what’s been tested
>>(mixes/variations) so far?
>>
>>
>> Regards,
>>
>>
>>           David
>>



RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Thanks for the info, Jonas.

Quick question… do you typically start the datanodes from the namenode using the command line?

I’ve been launching containers independently of the namenode.  The containers do have the same base configuration file, but I pass in behaviors via environment variables.
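
For context, the launch looks roughly like this (a simplified sketch; the image name, entrypoint, and variable names are illustrative, not our exact setup):

   # one base crail-site.conf is baked into the image; per-container
   # behavior is selected through environment variables that the
   # entrypoint script maps onto the datanode command line
   docker run --network host \
     -e CRAIL_ROLE=datanode \
     -e CRAIL_STORAGE_TIER=org.apache.crail.storage.rdma.RdmaStorageTier \
     -e CRAIL_STORAGE_CLASS=1 \
     crail-datanode-image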



Regards,



           David








Re: Setting up storage class 1 and 2

Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,


We run a wide mix of NVMf and RDMA storage tier configurations with
different storage classes, e.g. 3 storage classes where one group of NVMf
datanodes is class 0, another group of NVMf servers is class 1, and the RDMA
datanodes are storage class 2. So this should work. I understand that the
setup might be a bit tricky in the beginning.

From your logs I see that you do not use the same configuration file for
all containers. It is crucial that, e.g., the order of storage types is
the same in all configuration files; they have to be identical. To specify a
storage class for a datanode you need to append "-c 1" (storage class 1)
when starting the datanode. You can find the details of how exactly this
works here: https://incubator-crail.readthedocs.io/en/latest/run.html
The last example in "Starting Crail manually" covers this.
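
For illustration, a minimal sketch of what I mean (the exact launcher
syntax is in the run.html page above; treat the flag placement here as
shorthand):

   # crail-site.conf, identical on the namenode and on every datanode;
   # the position in crail.storage.types defines the type index
   # (here NVMf = type 0, RDMA = type 1)
   crail.storage.types    org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
   crail.storage.classes  2

   # start each datanode with its tier classname and its storage class
   crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 0
   crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier -c 1

This would also explain what you see in your namenode log: a datanode whose
local config lists only the RDMA tier reports itself as type 0, which the
namenode then maps to its own type 0 (NVMf in your case).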

Regarding the patched version, I have to take another look. Please use the
Apache Crail master for now (it will hang with Spark at the end of your job,
but it should run through).

Regards,
Jonas

  On Tue, 2 Jul 2019 00:27:33 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
> 
> Just wanted to be sure I’m doing things correctly.  It runs okay
> without adding in the NVMf datanode (i.e. teragen completes).  When I
> add the NVMf node in, even without using it on the run, it hangs during
> the terasort, with nothing being written to the datanode – only the
> metadata is created (i.e. /spark).
> 
> 
> My config is:
> 
> 1 namenode container
> 1 rdma datanode storage class 1 container
> 1 nvmf datanode storage class 1 container
> 
> The namenode is showing that both datanodes are starting up as
> type 0 to storage class 0… is that correct?
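> 
> (If I’m reading the storage classes right, I would have expected lines
> more like
> 
>   adding datanode /192.168.3.100:4420 of type 0 to storage class 1
>   adding datanode /192.168.3.100:50020 of type 1 to storage class 1
> 
> given the namenode’s type order NVMf=0, RDMA=1.)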
> 
> 
> NameNode log at startup:
> 
> 19/07/01 17:18:16 INFO crail: initalizing namenode
> 
> 19/07/01 17:18:16 INFO crail: crail.version 3101
> 
> 19/07/01 17:18:16 INFO crail: crail.directorydepth 16
> 
> 19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
> 
> 19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
> 
> 19/07/01 17:18:16 INFO crail: crail.cachelimit 0
> 
> 19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
> 
> 19/07/01 17:18:16 INFO crail: crail.user crail
> 
> 19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
> 
> 19/07/01 17:18:16 INFO crail: crail.debug true
> 
> 19/07/01 17:18:16 INFO crail: crail.statistics false
> 
> 19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
> 
> 19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
> 
> 19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
> 
> 19/07/01 17:18:16 INFO crail: crail.slicesize 65536
> 
> 19/07/01 17:18:16 INFO crail: crail.singleton true
> 
> 19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
> 
> 19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
> 
> 19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
> 
> 19/07/01 17:18:16 INFO crail: crail.cacheimpl 
>org.apache.crail.memory.MappedBufferCache
> 
> 19/07/01 17:18:16 INFO crail: crail.locationmap
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.address 
>crail://minnie:9060?id=0&size=1
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.blockselection 
>roundrobin
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.rpctype 
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.log
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.types 
>org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.classes 2
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
> 
> 19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
> 
> 19/07/01 17:18:16 INFO crail: round robin block selection
> 
> 19/07/01 17:18:16 INFO crail: round robin block selection
> 
> 19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, 
>queueDepth 32, messageSize 512, nodealy true, cores 2
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
> 
> 19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
> 
> 19/07/01 17:18:17 INFO crail: new connection from 
>/192.168.1.164:39260
> 
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from 
>/192.168.1.164:39260
> 
> 19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of 
>type 0 to storage class 0
> 
> 19/07/01 17:18:17 INFO crail: new connection from 
>/192.168.1.164:39262
> 
> 19/07/01 17:18:17 INFO narpc: adding new channel to selector, from 
>/192.168.1.164:39262
> 
> 19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 
>of type 0 to storage class 0
> 
> 
> The RDMA datanode – it is set to have 4x1GB hugepages:
> 
> 19/07/01 17:18:17 INFO crail: crail.version 3101
> 
> 19/07/01 17:18:17 INFO crail: crail.directorydepth 16
> 
> 19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
> 
> 19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
> 
> 19/07/01 17:18:17 INFO crail: crail.cachelimit 0
> 
> 19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
> 
> 19/07/01 17:18:17 INFO crail: crail.user crail
> 
> 19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
> 
> 19/07/01 17:18:17 INFO crail: crail.debug true
> 
> 19/07/01 17:18:17 INFO crail: crail.statistics false
> 
> 19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
> 
> 19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
> 
> 19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
> 
> 19/07/01 17:18:17 INFO crail: crail.slicesize 65536
> 
> 19/07/01 17:18:17 INFO crail: crail.singleton true
> 
> 19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
> 
> 19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
> 
> 19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
> 
> 19/07/01 17:18:17 INFO crail: crail.cacheimpl 
>org.apache.crail.memory.MappedBufferCache
> 
> 19/07/01 17:18:17 INFO crail: crail.locationmap
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.address 
>crail://minnie:9060
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.blockselection 
>roundrobin
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.rpctype 
>org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.log
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.types 
>org.apache.crail.storage.rdma.RdmaStorageTier
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.classes 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
> 
> 19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
> 
> 19/07/01 17:18:17 INFO disni: jverbs jni version 32
> 
> 19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs 
>size 28, native size 16
> 
> 19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, 
>native size 32
> 
> 19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 
>72, native size 128
> 
> 19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, 
>native size 48
> 
> 19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, 
>native size 16
> 
> 19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 
>40, native size 40
> 
> 19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, 
>native size 48
> 
> 19/07/01 17:18:17 INFO disni: createEventChannel, objId 
>140349068383088
> 
> 19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, 
>maxSge 4, cqSize 3200
> 
> 19/07/01 17:18:17 INFO disni: createId, id 140349068429968
> 
> 19/07/01 17:18:17 INFO disni: new server endpoint, id 0
> 
> 19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
> 
> 19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
> 
> 19/07/01 17:18:17 INFO disni: listen, id 0
> 
> 19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
> 
> 19/07/01 17:18:17 INFO disni: setting up protection domain, context 
>100, pd 1
> 
> 19/07/01 17:18:17 INFO disni: PD value 1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 
>4294967296
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 
>1073741824
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath 
>/dev/hugepages/rdma
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
> 
> 19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
> 
> 19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, 
>queueDepth 32, messageSize 512, nodealy true
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
> 
> 19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
> 
> 19/07/01 17:18:17 INFO crail: rdma storage server started, address 
>/192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 
>3200
> 
> 19/07/01 17:18:17 INFO disni: starting accept
> 
> 19/07/01 17:18:18 INFO crail: connected to namenode(s) 
>minnie/192.168.1.164:9060
> 
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
> 
> 19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
> 
> 19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
> 
> 
> NVMf datanode is showing 1TB.
> 
> 19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 
>1048576
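> 
> (That is consistent with freeBlocks × crail.blocksize: 1048576 × 1 MB =
> 1 TB here, just as 4096 × 1 MB = 4 GB on the RDMA datanode above.)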
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: David Crespi <da...@storedgesystems.com>
> Sent: Monday, July 1, 2019 3:57:42 PM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
> 
> A standard pull from the repo, one that didn’t have the patches from
> your private repo.
> 
> I can put the patches back in both the client and server containers if
> you really think it would make a difference.
> 
> 
> Are you guys running multiple types together?  I’m running an RDMA
> storage class 1, an NVMf storage class 1 and an NVMf storage class 2
> together.  I get errors when the RDMA is introduced into the mix.  I
> have a small amount of memory (4GB) assigned to the RDMA tier, and I’m
> looking for it to fall through into the NVMf class 1 tier.  It appears
> to want to do that, but gets screwed up… it looks like it’s trying to
> create another set of qp’s for an RDMA connection.  It even blew up
> spdk trying to accomplish that.
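> 
> (The 4GB is the crail.storage.rdma.storagelimit 4294967296 setting in
> my RDMA container.)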
> 
> 
> Do you guys have some documentation that shows what’s been tested 
>(mixes/variations) so far?
> 
> 
> Regards,
> 
> 
>           David
> 
> 
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Monday, July 1, 2019 12:51:09 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
> 
> Hi David,
> 
> 
> Can you clarify which unpatched version you are talking about? Are 
>you
> talking about the NVMf thread fix where I send you a link to a 
>branch in my
> repository or the fix we provided earlier for the Spark hang in the 
>Crail
> master?
> 
> Generally, if you update, update all: clients and datanode/namenode.
> 
> Regards,
> Jonas
> 
>  On Fri, 28 Jun 2019 17:59:32 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Jonas,
>>FYI - I went back to using the unpatched version of crail on the
>>clients and it appears to work
>> okay now with the shuffle and RDMA, with only the RDMA containers
>>running on the server.
>>
>> Regards,
>>
>>           David
>>
>>
>> ________________________________
>>From: David Crespi
>> Sent: Friday, June 28, 2019 7:49:51 AM
>> To: Jonas Pfefferle; dev@crail.apache.org
>> Subject: RE: Setting up storage class 1 and 2
>>
>>
>> Oh, and while I’m thinking about it Jonas, when I added the patches
>>you provided the other day, I only
>>
>> added them to the spark containers (clients) not to my crail
>>containers running on my storage server.
>>
>> Should the patches been added to all of the containers?
>>
>>
>> Regards,
>>
>>
>>           David
>>
>>
>> ________________________________
>>From: Jonas Pfefferle <pe...@japf.ch>
>> Sent: Friday, June 28, 2019 12:54:27 AM
>> To: dev@crail.apache.org; David Crespi
>> Subject: Re: Setting up storage class 1 and 2
>>
>> Hi David,
>>
>>
>> At the moment, it is possible to add a NVMf datanode even if only
>>the RDMA
>> storage type is specified in the config. As you have seen this will
>>go wrong
>> as soon as a client tries to connect to the datanode. Make sure to
>>start the
>> RDMA datanode with the appropriate classname, see:
>> https://incubator-crail.readthedocs.io/en/latest/run.html
>> The correct classname is
>>org.apache.crail.storage.rdma.RdmaStorageTier.
>>
>> Regards,
>> Jonas
>>
>>  On Thu, 27 Jun 2019 23:09:26 +0000
>>  David Crespi <da...@storedgesystems.com> wrote:
>>> Hi,
>>> I’m trying to integrate the storage classes and I’m hitting another
>>>issue when running terasort and just
>>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>>just sits, after the following
>>> message:
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>>containers from the server, and I’m only running
>>> the namenode and a rdma storage class 1 datanode.  My spark
>>>configuration is also now only looking at
>>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>>port in the INFO messages seen below.
>>> I must be configuring something wrong, but I’ve not been able to
>>>track it down.  Any thoughts?
>>>
>>>
>>> ************************************
>>>         TeraSort
>>> ************************************
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>>native-hadoop library for your platform... using builtin-java classes
>>>where applicable
>>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>>hduser
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>>to:
>>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>>authentication disabled; ui acls disabled; users  with view
>>>permissions: Set(hduser); groups with view permissions: Set(); users
>>> with modify permissions: Set(hduser); groups with modify
>>>permissions: Set()
>>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>>default logging framework
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>>-Dio.netty.eventLoopThreads: 112
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>>false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.theUnsafe: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>sun.misc.Unsafe.copyMemory: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>>constructor: available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>>available, true
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>>prior to Java9
>>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>>java.nio.DirectByteBuffer.<init>(long, int): available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>>available
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>>(java.io.tmpdir)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>>(sun.arch.data.model)
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.noPreferDirect: false
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>>available
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.noKeySetOptimization: false
>>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>>-Dio.netty.selectorAutoRebuildThreshold: 512
>>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>>org.jctools-core.MpscChunkedArrayQueue: available
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.level: simple
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>>-Dio.netty.leakDetection.targetRecords: 4
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numHeapArenas: 9
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.numDirectArenas: 10
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.pageSize: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxOrder: 11
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.chunkSize: 16777216
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.tinyCacheSize: 512
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.smallCacheSize: 256
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.normalCacheSize: 64
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.cacheTrimInterval: 8192
>>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>>-Dio.netty.allocator.useCacheForAllThreads: true
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>>(auto-detected)
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>>false
>>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>>127.0.0.1)
>>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>>pooled
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.threadLocalDirectBufferSize: 65536
>>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 36915
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'sparkDriver' on port 36915.
>>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>>org.apache.spark.serializer.KryoSerializer
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>>information
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>>BlockManagerMasterEndpoint up
>>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>>capacity 366.3 MB
>>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>>> 19/06/27 15:59:08 DEBUG
>>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>>SSLOptions{enabled=false, port=None, keyStore=None,
>>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>>protocol=None, enabledAlgorithms=Set()}
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>>on port 4040.
>>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>>started at http://192.168.1.161:4040
>>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>at
>>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>>with timestamp 1561676348562
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>>Connecting to master spark://master:7077...
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>>connection to master/192.168.3.13:7077
>>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>>master/192.168.3.13:7077 successful, running bootstraps...
>>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>>bootstraps)
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>>> 19/06/27 15:59:08 DEBUG Recycler:
>>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>>16
>>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>>Spark cluster with app ID app-20190627155908-0005
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/0 on
>>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/1 on
>>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>>port: 39189
>>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>>39189.
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/2 on
>>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>>master:39189
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/3 on
>>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>added: app-20190627155908-0005/4 on
>>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>>core(s)
>>> 19/06/27 15:59:08 INFO BlockManager: Using
>>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>>replication policy
>>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>>core(s), 1024.0 MB RAM
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/0 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/3 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/4 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/1 is now RUNNING
>>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>>updated: app-20190627155908-0005/2 is now RUNNING
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>>master
>>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>>master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>>BlockManagerId(driver, master, 39189, None)
>>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>>is ready for scheduling beginning after reached
>>>minRegisteredResourcesRatio: 0.0
>>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.use.legacy.blockreader.local = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.read.shortcircuit = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>>dfs.client.domain.socket.data.traffic = false
>>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>>rpcRequestWrapperClass=class
>>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>>org.apache.hadoop.ipc.Client@3ed03652
>>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>>local reads and UNIX domain socket are disabled.
>>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>>not using SaslPropertiesResolver, no QOP found in configuration for
>>>dfs.data.transfer.protection
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>>took  115 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>>without replication took  117 ms
>>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>>broadcast_0_piece0
>>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>>locally took  6 ms
>>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>>broadcast_0_piece0 without replication took  6 ms
>>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>>newAPIHadoopFile at TeraSort.scala:60
>>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>>NameNode-1/192.168.3.7:54310
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>>connections 1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>>31ms
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>>FileStatuses: 134
>>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>>: 2
>>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>>by getSplits: 2, TimeTaken: 139
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>>Boolean) constructor
>>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>>Algorithm version is 1
>>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>>masked=rwxr-xr-x
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>>$anonfun$write$1
>>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>>($anonfun$write$1) is now cleaned +++
>>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>>SparkHadoopWriter.scala:78
>>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>>400
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>>false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>>org.apache.spark.serializer.CrailSparkSerializer
>>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>>true
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.outstanding 1
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.shuffle.storageclass 0
>>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>>spark.crail.broadcast.storageclass 0
>>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>>> 19/06/27 15:59:10 INFO crail: crail.user crail
>>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>>> 19/06/27 15:59:10 INFO crail: crail.debug true
>>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>>org.apache.crail.memory.MappedBufferCache
>>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>>crail://192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>>roundrobin
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>>org.apache.crail.storage.rdma.RdmaStorageTier
>>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>>bufferCount 1024
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>>4294967296
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>>1073741824
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>>/dev/hugepages/rdma
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>>queueDepth 32, messageSize 512, nodealy true
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>>/192.168.1.164:9060
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>>DIRECTORY, storageAffinity 0, locationAffinity 0
>>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>>streamId 1, isDir true, writeHint 0
>>> 19/06/27 15:59:10 INFO crail: passive data client
>>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>>size 28, native size 16
>>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>>native size 32
>>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>>72, native size 128
>>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>>native size 16
>>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>>40, native size 40
>>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>>native size 48
>>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>>139811924587312
>>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>>maxSge 4, cqSize 64
>>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>>/192.168.3.100:4420
>>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>>467, pd 1
>>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>>139810647883744
>>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>>64
>>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>>send_wr size 32, recv_wr_size 32
>>> 19/06/27 15:59:10 INFO disni: connect, id 0
>>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.11:35854) with ID 0
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.12:44312) with ID 1
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.8:34774) with ID 4
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.9:58808) with ID 2
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.11
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>>192.168.3.11, 41919, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.12
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>>192.168.3.12, 46697, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.8
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>>192.168.3.8, 37281, None)
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.9
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>>192.168.3.9, 43857, None)
>>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>>(192.168.3.10:40100) with ID 3
>>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>>192.168.3.10
>>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>>192.168.3.10, 38527, None)
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>>connections 0
>>>
>>>
>>> Regards,
>>>
>>>           David
>>>
>>




RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
Bounced on the first attempt.

Regards,

           David
From: David Crespi<ma...@storedgesystems.com>
Sent: Monday, July 1, 2019 5:27 PM
To: dev@crail.apache.org<ma...@crail.apache.org>; Jonas Pfefferle<ma...@japf.ch>
Subject: RE: Setting up storage class 1 and 2


Jonas,

Just wanted to be sure I’m doing things correctly.  It runs okay without adding in the NVMf datanode (i.e.

completes teragen).  When I add the NVMf node in, even without using it on the run, it hangs during the

terasort, with nothing being written to the datanode – only the metadata is created (i.e. /spark).
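
(A quick way to confirm that only metadata exists is the HDFS-style shell from the Crail docs; a sketch, assuming the crail launcher is available in a configured client container:

    $CRAIL_HOME/bin/crail fs -ls /spark

which lists whatever actually landed under /spark.)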



My config is:

1 namenode container

1 rdma datanode storage class 1 container

1 nvmf datanode storage class 1 container.



The namenode is showing that both datanodes are starting up as

type 0 to storage class 0… is that correct?  (See the note after the namenode log below.)



NameNode log at startup:

19/07/01 17:18:16 INFO crail: initalizing namenode

19/07/01 17:18:16 INFO crail: crail.version 3101

19/07/01 17:18:16 INFO crail: crail.directorydepth 16

19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:16 INFO crail: crail.blocksize 1048576

19/07/01 17:18:16 INFO crail: crail.cachelimit 0

19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:16 INFO crail: crail.user crail

19/07/01 17:18:16 INFO crail: crail.shadowreplication 1

19/07/01 17:18:16 INFO crail: crail.debug true

19/07/01 17:18:16 INFO crail: crail.statistics false

19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:16 INFO crail: crail.datatimeout 1000

19/07/01 17:18:16 INFO crail: crail.buffersize 1048576

19/07/01 17:18:16 INFO crail: crail.slicesize 65536

19/07/01 17:18:16 INFO crail: crail.singleton true

19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:16 INFO crail: crail.directoryrecord 512

19/07/01 17:18:16 INFO crail: crail.directoryrandomize true

19/07/01 17:18:16 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:16 INFO crail: crail.locationmap

19/07/01 17:18:16 INFO crail: crail.namenode.address crail://minnie:9060?id=0&size=1

19/07/01 17:18:16 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:16 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:16 INFO crail: crail.namenode.log

19/07/01 17:18:16 INFO crail: crail.storage.types org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:16 INFO crail: crail.storage.classes 2

19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO crail: round robin block selection

19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true, cores 2

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39260

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39260

19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of type 0 to storage class 0

19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39262

19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39262

19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 of type 0 to storage class 0
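
(A note on the numbering above, under the assumption that a datanode's type index follows the order of crail.storage.types: with the namenode's

    crail.storage.types    org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
    crail.storage.classes  2

the NVMf node at :4420 would be expected to show type 0 and the RDMA node at :50020 type 1, both in the intended storage class 1. Note that the RDMA datanode's own config below lists only the RDMA tier; if each node derives its type index from its local list, both reporting type 0 would follow.)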



The RDMA datanode is set up with 4x1GB hugepages:

19/07/01 17:18:17 INFO crail: crail.version 3101

19/07/01 17:18:17 INFO crail: crail.directorydepth 16

19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10

19/07/01 17:18:17 INFO crail: crail.blocksize 1048576

19/07/01 17:18:17 INFO crail: crail.cachelimit 0

19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache

19/07/01 17:18:17 INFO crail: crail.user crail

19/07/01 17:18:17 INFO crail: crail.shadowreplication 1

19/07/01 17:18:17 INFO crail: crail.debug true

19/07/01 17:18:17 INFO crail: crail.statistics false

19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000

19/07/01 17:18:17 INFO crail: crail.datatimeout 1000

19/07/01 17:18:17 INFO crail: crail.buffersize 1048576

19/07/01 17:18:17 INFO crail: crail.slicesize 65536

19/07/01 17:18:17 INFO crail: crail.singleton true

19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824

19/07/01 17:18:17 INFO crail: crail.directoryrecord 512

19/07/01 17:18:17 INFO crail: crail.directoryrandomize true

19/07/01 17:18:17 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache

19/07/01 17:18:17 INFO crail: crail.locationmap

19/07/01 17:18:17 INFO crail: crail.namenode.address crail://minnie:9060

19/07/01 17:18:17 INFO crail: crail.namenode.blockselection roundrobin

19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16

19/07/01 17:18:17 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode

19/07/01 17:18:17 INFO crail: crail.namenode.log

19/07/01 17:18:17 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier

19/07/01 17:18:17 INFO crail: crail.storage.classes 1

19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1

19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2

19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'

19/07/01 17:18:17 INFO disni: jverbs jni version 32

19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16

19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32

19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128

19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, native size 16

19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 40, native size 40

19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, native size 48

19/07/01 17:18:17 INFO disni: createEventChannel, objId 140349068383088

19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: createId, id 140349068429968

19/07/01 17:18:17 INFO disni: new server endpoint, id 0

19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0

19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020

19/07/01 17:18:17 INFO disni: listen, id 0

19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808

19/07/01 17:18:17 INFO disni: setting up protection domain, context 100, pd 1

19/07/01 17:18:17 INFO disni: PD value 1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1

19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020

19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 4294967296

19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 1073741824

19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma

19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true

19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32

19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive

19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100

19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000

19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512

19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2

19/07/01 17:18:17 INFO crail: rdma storage server started, address /192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 3200

19/07/01 17:18:17 INFO disni: starting accept

19/07/01 17:18:18 INFO crail: connected to namenode(s) minnie/192.168.1.164:9060

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024

19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096

19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
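
(Those freeBlocks counters line up with the stated hugepage budget:

    4096 blocks x 1,048,576 B/block (crail.blocksize) = 4,294,967,296 B

i.e. exactly the crail.storage.rdma.storagelimit of 4294967296 printed above, or 4x1GB.)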



NVMf datanode is showing 1TB.

19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 1048576
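
(The same arithmetic for the NVMf node:

    1,048,576 blocks x 1,048,576 B/block = 1,099,511,627,776 B = 1 TiB

which matches the 1TB figure.)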





Regards,



           David







________________________________
From: David Crespi <da...@storedgesystems.com>
Sent: Monday, July 1, 2019 3:57:42 PM
To: Jonas Pfefferle; dev@crail.apache.org
Subject: RE: Setting up storage class 1 and 2

A standard pull from the repo, one that didn't have the patches from your private repo.

I can put the patches back in both the client and server containers if you really think it would make a difference.



Are you guys running multiple types together?  I'm running an RDMA storage class 1,

an NVMf storage class 1, and an NVMf storage class 2 together.  I get errors when the

RDMA is introduced into the mix.  I have a small amount of memory (4GB) assigned

to the RDMA tier, and I'm looking for it to spill over into the NVMf class 1 tier.  It appears to want

to do that, but gets screwed up… it looks like it's trying to create another set of QPs for

an RDMA connection.  It even blew up SPDK trying to accomplish that.
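
(For what it's worth, the client log earlier in the thread shows the per-use class selectors. A sketch of pinning shuffle and broadcast data to a specific class; the values are illustrative, not a verified fix:

    spark.crail.shuffle.storageclass   1
    spark.crail.broadcast.storageclass 1

Whether a full RDMA class then falls through to an NVMf class is exactly the behaviour in question here.)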



Do you guys have some documentation that shows what’s been tested (mixes/variations) so far?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Monday, July 1, 2019 12:51:09 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix where I send you a link to a branch in my
repository or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <da...@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers
>running on the server.
>
> Regards,
>
>           David
>
>
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
>
>
> Oh, and while I’m thinking about it Jonas, when I added the patches
>you provided the other day, I only
>
> added them to the spark containers (clients) not to my crail
>containers running on my storage server.
>
> Should the patches been added to all of the containers?
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pe...@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> At the moment, it is possible to add a NVMf datanode even if only
>the RDMA
> storage type is specified in the config. As you have seen this will
>go wrong
> as soon as a client tries to connect to the datanode. Make sure to
>start the
> RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is
>org.apache.crail.storage.rdma.RdmaStorageTier.
>
> Regards,
> Jonas
>
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <da...@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and a rdma storage class 1 datanode.  My spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
>


RE: Setting up storage class 1 and 2

Posted by David Crespi <da...@storedgesystems.com>.
A standard pull from the repo, one that didn't have the patches from your private repo.
I can put the patches back in both the client and server containers if you really think
it would make a difference.

Are you guys running multiple types together? I'm running an RDMA storage class 1, an
NVMf storage class 1, and an NVMf storage class 2 together. I get errors when the RDMA
tier is introduced into the mix. I have a small amount of memory (4 GB) assigned to the
RDMA tier and expect it to spill over into the NVMf class 1 tier. It appears to want to
do that, but gets screwed up… it looks like it's trying to create another set of QPs for
an RDMA connection. It even blew up SPDK trying to accomplish that. A sketch of what I'm
aiming for is below.
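
For concreteness, here is roughly the setup I'm after. This is only a sketch: the
NvmfStorageTier classname and the datanode launch lines are what I took from the Crail
run docs, and the values are placeholders from my environment, not a verified config.

  # crail-site.conf (sketch)
  crail.storage.types    org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
  # assumption: two storage classes (1 and 2), as described above
  crail.storage.classes  2
  # keep the RDMA tier small (~4 GB) so it fills first and spills into NVMf class 1
  crail.storage.rdma.storagelimit  4294967296

  # each datanode started with its tier classname, per the run docs
  $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier
  $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier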



Do you guys have any documentation showing which mixes/variations have been tested so far?



Regards,



           David





________________________________
From: Jonas Pfefferle <pe...@japf.ch>
Sent: Monday, July 1, 2019 12:51:09 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix where I send you a link to a branch in my
repository or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas
