You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/12/05 21:37:00 UTC
[jira] [Commented] (TINKERPOP-2831) Throw NoSuchElementException frequently which slowers the performance

    [ https://issues.apache.org/jira/browse/TINKERPOP-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643561#comment-17643561 ] 

ASF GitHub Bot commented on TINKERPOP-2831:
-------------------------------------------

cole-bq commented on PR #1873:
URL: https://github.com/apache/tinkerpop/pull/1873#issuecomment-1338208731

   I would like to take some time to explore an alternative solution to this issue. The main drawback of switching to FastNoSuchElementException is that it does not include a stack trace. There are some concerns about the impact of removing the stack trace from this exception.
   The current implementation of hasNext() throws and catches NoSuchElementException's which gives it the same performance bottleneck you are trying to avoid. I want to try and speed up hasNext() so that users can check it before calling next() if they want to avoid the performance penalty associated with the NoSuchElementException.
   I believe that a fast implementation of hasNext() would remove the need for this PR and would avoid the need to trade better exceptions for better performance.




> Throw NoSuchElementException frequently which slowers the performance
> ---------------------------------------------------------------------
>
>                 Key: TINKERPOP-2831
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2831
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.5.4
>            Reporter: Redriver
>            Priority: Major
>         Attachments: Screen Shot 2022-11-24 at 11.35.40.png
>
>
> When I run g.V().label().groupCount() on a huge graph: 600m vertices + 6 billion edges, the JVM async profiler exposed that the NoSuchElementException is a hotspot. In fact, that exception is used to inform the caller that the iteration end reached, so the stack trace information is not used. In addition, creating a new exception everytime is also not necessary.
> {code:java}
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783) => holding Monitor(java.util.NoSuchElementException@1860956919})
> java.lang.Throwable.<init>(Throwable.java:250)
> java.lang.Exception.<init>(Exception.java:54)
> java.lang.RuntimeException.<init>(RuntimeException.java:51)
> java.util.NoSuchElementException.<init>(NoSuchElementException.java:46)
> org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraphIterator.next(TinkerGraphIterator.java:63)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.getOrCreateVertex(JanusGraphVertexDeserializer.java:192)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:153)
> org.janusgraph.hadoop.formats.util.HadoopRecordReader.nextKeyValue(HadoopRecordReader.java:69)
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:348)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> org.apache.spark.scheduler.Task.run(Task.scala:121)
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:416)
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:422)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)