You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@crail.apache.org by David Crespi <da...@storedgesystems.com> on 2020/02/03 16:51:29 UTC

RE: Issue with RPC: getBlock

In the runs below I usually pass with an input filesize fo 4G or less.  8G and above
always fails.  The shuffle data always seems to be complete in Crail, but the sort data that gets written to HDFS is most of the time 0, but sometimes gets to be up to 100M.  This morning I turned the logging level to DEBUG for all the crail stuff.  Now the 8G filesize runs to completion nearly every time.  Now it’s 16G and above that fails.  This results seems to be related to a “race” condition.  Does anyone know of code areas related to all this that may be suspect?  I haven’t had any comments on any of these from the community.

Here is the general config and debug output… if anyone sees anything I’ve missed, please let me know.

**************************************
         TeraSort

                         number of workers: 5
       number of executors/worker: 1
            number of cores/executor: 8
              total executor cores used: 40

           Master container memory: 4.0G
           Worker container memory: 16.0G
                           executor memory: 12.8G
                                driver memory: 3.2G
                                 java Heap size: 5.6G
                  java Max Memory size: 11.2G
              Max Direct Memory size: 4.8G
**************************************


Since I’m running with a single executor, the error:

(ERROR Executor: Exception in task 46.3 in stage 1.0 (TID 200)

java.lang.IndexOutOfBoundsException: Index: 51, Size: 0)

shouldn’t be occurring under this condition, unless it is a real error correct?
It looks from the error logs that it’s get 10 read hits before it dies.  If this were
a multiple executor issue, wouldn’t this error occur before or at the first read?
I also notice several other of these exact same errors, but these others haven’t
Hit the 4th retry yet.

Here’s a small part of the log… I can supply LOTs of data if needed.  I’d really like
to figure out what’s going on here, so I can get to some really LARGE files!


20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/6-4-35352996

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/6-4-35352996, success, fd 4744

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/6-4-35352996, fd 4744, streamId 4661, isDir false, readHint 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 1048576, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 2097152, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 3145728, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/6-2-35352998

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/6-2-35352998, success, fd 4017

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/6-2-35352998, fd 4017, streamId 4662, isDir false, readHint 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 1048576, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 2097152, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 3145728, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/7-2-35352998

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/7-2-35352998, success, fd 4495

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/7-2-35352998, fd 4495, streamId 4663, isDir false, readHint 3555664

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4495, token 0, position 1048576, capacity 3555664

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG TaskMemoryManager: unreleased 156.8 MB memory from org.apache.spark.util.collection.ExternalSorter@53f34ecf

20/01/31 10:05:39 ERROR Executor: Exception in task 46.3 in stage 1.0 (TID 200)

java.lang.IndexOutOfBoundsException: Index: 51, Size: 0

         at java.util.ArrayList.rangeCheck(ArrayList.java:657)

         at java.util.ArrayList.get(ArrayList.java:433)

         at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)

         at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)

         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:811)

         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:278)

         at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)

         at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)

         at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)

         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)

         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)

         at org.apache.spark.shuffle.CrailSparkShuffleSorter.sort(CrailSparkShuffleSorter.scala:41)

         at org.apache.spark.shuffle.crail.CrailShuffleReader.read(CrailShuffleReader.scala:53)

         at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)

         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

         at org.apache.spark.scheduler.Task.run(Task.scala:121)

         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)

         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)

         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)

         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

         at java.lang.Thread.run(Thread.java:748)

20/01/31 10:05:40 INFO CoarseGrainedExecutorBackend: Got assigned task 201

20/01/31 10:05:40 INFO Executor: Running task 49.2 in stage 1.0 (TID 201)

20/01/31 10:05:40 DEBUG Executor: Task 201's epoch is 1

20/01/31 10:05:40 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_49/

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_49/, success, fd 60

20/01/31 10:05:40 INFO crail: getDirectoryList: /spark/shuffle/shuffle_0/part_49/

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_49/, fd 60, streamId 4664, isDir false, readHint 0

20/01/31 10:05:40 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:40 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_49/6-0-35352995

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_49/6-0-35352995, success, fd 3936

20/01/31 10:05:40 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_49/6-0-35352995, fd 3936, streamId 4665, isDir false, readHint 3555770





Regards,
      David



From: David Crespi<ma...@storedgesystems.com>
Sent: Monday, January 27, 2020 7:13 AM
To: dev@crail.apache.org<ma...@crail.apache.org>; dev@crail.incubator.apache.org<ma...@crail.incubator.apache.org>
Subject: RE: Issue with RPC: getBlock

One additional piece of info… Looks like this issue only comes up when the input file size is larger than 4GB… is there a variable or something that is the wrong size perhaps?  For files over 4GB, the time to hit this issue varies, sometimes getting quite a bit of shuffle data before failing.  Running with 4GB and smaller doesn’t seem to fail.

Any suggestions??

Regards,
      David

________________________________
From: David Crespi <da...@storedgesystems.com>
Sent: Sunday, January 12, 2020 11:36:26 AM
To: dev@crail.incubator.apache.org <de...@crail.incubator.apache.org>
Subject: Issue with RPC: getBlock

Hi,
Trying to run terasort with the latest crail (v1.2-rc2-1-g8a739dd) and I’m getting the error below.

(Job aborted due to stage failure: Task 36 in stage 1.0 failed 4 times, most recent failure: Lost task 36.3 in stage 1.0)

there is never a getBlock call to that fd (19318) for that task, and I also see that the previous fd(19153)
is called 6 times, but with different positions.  Is that wrong, as in perhaps the namenode is
getting a collision or is stuck?  I also only see these tasks (36.x) running on one executor.

BTW, I should note that I’m not running with,

com.ibm.crail.terasort.sorter.CrailShuffleNativeRadixSorter
or
com.ibm.crail.terasort.serializer.F22Serializer

as I couldn’t get them to run without error.  I’m getting a “NYI” assertion error when those are used.
Would this matter?


20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/1-4-35352996

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/1-4-35352996, success, fd 19318

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_36/1-4-35352996, fd 19318, streamId 836, isDir false, readHint 4754948

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 2097152, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 3145728, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 4194304, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_54/1-3-35352997

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_54/1-3-35352997, success, fd 19079

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_54/1-3-35352997, fd 19079, streamId 837, isDir false, readHint 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 1048576, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 5242880, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/3-1-35352995

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/3-1-35352995, success, fd 18715

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_36/3-1-35352995, fd 18715, streamId 838, isDir false, readHint 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 6291456, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 1048576, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 2097152, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 3145728, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 2097152, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 3145728, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_55/1-4-35352996

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_55/1-4-35352996, success, fd 19337

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_55/1-4-35352996, fd 19337, streamId 839, isDir false, readHint 4764488



Regards,

           David





RE: Issue with RPC: getBlock

Posted by David Crespi <da...@storedgesystems.com>.
Oh, and I forgot, I meant to ask what’s the typical slice size used when
dealing with large files (as in terasort)?  I have mine currently set to 131072.
I’m not sure it really have much to do with this, but it’s worth asking.

Also, is there a “typical” configuration posted anywhere… I’ve looked around
a bit to see if there is anything listed, but haven’t found one that even close.

Regards,
      David


From: David Crespi<ma...@storedgesystems.com>
Sent: Monday, February 3, 2020 8:51 AM
To: dev@crail.apache.org<ma...@crail.apache.org>; dev@crail.incubator.apache.org<ma...@crail.incubator.apache.org>
Subject: RE: Issue with RPC: getBlock

In the runs below I usually pass with an input filesize fo 4G or less.  8G and above
always fails.  The shuffle data always seems to be complete in Crail, but the sort data that gets written to HDFS is most of the time 0, but sometimes gets to be up to 100M.  This morning I turned the logging level to DEBUG for all the crail stuff.  Now the 8G filesize runs to completion nearly every time.  Now it’s 16G and above that fails.  This results seems to be related to a “race” condition.  Does anyone know of code areas related to all this that may be suspect?  I haven’t had any comments on any of these from the community.

Here is the general config and debug output… if anyone sees anything I’ve missed, please let me know.

**************************************
         TeraSort

                         number of workers: 5
       number of executors/worker: 1
            number of cores/executor: 8
              total executor cores used: 40

           Master container memory: 4.0G
           Worker container memory: 16.0G
                           executor memory: 12.8G
                                driver memory: 3.2G
                                 java Heap size: 5.6G
                  java Max Memory size: 11.2G
              Max Direct Memory size: 4.8G
**************************************


Since I’m running with a single executor, the error:

(ERROR Executor: Exception in task 46.3 in stage 1.0 (TID 200)

java.lang.IndexOutOfBoundsException: Index: 51, Size: 0)

shouldn’t be occurring under this condition, unless it is a real error correct?
It looks from the error logs that it’s get 10 read hits before it dies.  If this were
a multiple executor issue, wouldn’t this error occur before or at the first read?
I also notice several other of these exact same errors, but these others haven’t
Hit the 4th retry yet.

Here’s a small part of the log… I can supply LOTs of data if needed.  I’d really like
to figure out what’s going on here, so I can get to some really LARGE files!


20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/6-4-35352996

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/6-4-35352996, success, fd 4744

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/6-4-35352996, fd 4744, streamId 4661, isDir false, readHint 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 1048576, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 2097152, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4744, token 0, position 3145728, capacity 3540294

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/6-2-35352998

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/6-2-35352998, success, fd 4017

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/6-2-35352998, fd 4017, streamId 4662, isDir false, readHint 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 1048576, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 2097152, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4017, token 0, position 3145728, capacity 3544534

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_46/7-2-35352998

20/01/31 10:05:39 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:39 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_46/7-2-35352998, success, fd 4495

20/01/31 10:05:39 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_46/7-2-35352998, fd 4495, streamId 4663, isDir false, readHint 3555664

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG crail: RPC: getBlock, fd 4495, token 0, position 1048576, capacity 3555664

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:39 DEBUG TaskMemoryManager: unreleased 156.8 MB memory from org.apache.spark.util.collection.ExternalSorter@53f34ecf

20/01/31 10:05:39 ERROR Executor: Exception in task 46.3 in stage 1.0 (TID 200)

java.lang.IndexOutOfBoundsException: Index: 51, Size: 0

         at java.util.ArrayList.rangeCheck(ArrayList.java:657)

         at java.util.ArrayList.get(ArrayList.java:433)

         at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)

         at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)

         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:811)

         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:278)

         at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)

         at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)

         at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)

         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)

         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)

         at org.apache.spark.shuffle.CrailSparkShuffleSorter.sort(CrailSparkShuffleSorter.scala:41)

         at org.apache.spark.shuffle.crail.CrailShuffleReader.read(CrailShuffleReader.scala:53)

         at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)

         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

         at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

         at org.apache.spark.scheduler.Task.run(Task.scala:121)

         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)

         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)

         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)

         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

         at java.lang.Thread.run(Thread.java:748)

20/01/31 10:05:40 INFO CoarseGrainedExecutorBackend: Got assigned task 201

20/01/31 10:05:40 INFO Executor: Running task 49.2 in stage 1.0 (TID 201)

20/01/31 10:05:40 DEBUG Executor: Task 201's epoch is 1

20/01/31 10:05:40 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_49/

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_49/, success, fd 60

20/01/31 10:05:40 INFO crail: getDirectoryList: /spark/shuffle/shuffle_0/part_49/

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_49/, fd 60, streamId 4664, isDir false, readHint 0

20/01/31 10:05:40 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/31 10:05:40 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_49/6-0-35352995

20/01/31 10:05:40 DEBUG crail: RPC: getFile, writeable false

20/01/31 10:05:40 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_49/6-0-35352995, success, fd 3936

20/01/31 10:05:40 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_49/6-0-35352995, fd 3936, streamId 4665, isDir false, readHint 3555770





Regards,
      David



From: David Crespi<ma...@storedgesystems.com>
Sent: Monday, January 27, 2020 7:13 AM
To: dev@crail.apache.org<ma...@crail.apache.org>; dev@crail.incubator.apache.org<ma...@crail.incubator.apache.org>
Subject: RE: Issue with RPC: getBlock

One additional piece of info… Looks like this issue only comes up when the input file size is larger than 4GB… is there a variable or something that is the wrong size perhaps?  For files over 4GB, the time to hit this issue varies, sometimes getting quite a bit of shuffle data before failing.  Running with 4GB and smaller doesn’t seem to fail.

Any suggestions??

Regards,
      David

________________________________
From: David Crespi <da...@storedgesystems.com>
Sent: Sunday, January 12, 2020 11:36:26 AM
To: dev@crail.incubator.apache.org <de...@crail.incubator.apache.org>
Subject: Issue with RPC: getBlock

Hi,
Trying to run terasort with the latest crail (v1.2-rc2-1-g8a739dd) and I’m getting the error below.

(Job aborted due to stage failure: Task 36 in stage 1.0 failed 4 times, most recent failure: Lost task 36.3 in stage 1.0)

there is never a getBlock call to that fd (19318) for that task, and I also see that the previous fd(19153)
is called 6 times, but with different positions.  Is that wrong, as in perhaps the namenode is
getting a collision or is stuck?  I also only see these tasks (36.x) running on one executor.

BTW, I should note that I’m not running with,

com.ibm.crail.terasort.sorter.CrailShuffleNativeRadixSorter
or
com.ibm.crail.terasort.serializer.F22Serializer

as I couldn’t get them to run without error.  I’m getting a “NYI” assertion error when those are used.
Would this matter?


20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/1-4-35352996

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/1-4-35352996, success, fd 19318

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_36/1-4-35352996, fd 19318, streamId 836, isDir false, readHint 4754948

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 2097152, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 3145728, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 4194304, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_54/1-3-35352997

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_54/1-3-35352997, success, fd 19079

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_54/1-3-35352997, fd 19079, streamId 837, isDir false, readHint 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 1048576, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 5242880, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/3-1-35352995

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/3-1-35352995, success, fd 18715

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_36/3-1-35352995, fd 18715, streamId 838, isDir false, readHint 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 6291456, capacity 7070730

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 1048576, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 2097152, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 3145728, capacity 7086206

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 2097152, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 3145728, capacity 9487318

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_55/1-4-35352996

20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false

20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_55/1-4-35352996, success, fd 19337

20/01/09 10:34:35 INFO crail: CoreInputStream: open, path  /spark/shuffle/shuffle_0/part_55/1-4-35352996, fd 19337, streamId 839, isDir false, readHint 4764488



Regards,

           David