Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/08 20:58:00 UTC

[jira] [Commented] (TINKERPOP-1979) Several OLAP issues in MathStep

    [ https://issues.apache.org/jira/browse/TINKERPOP-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506554#comment-16506554 ] 

ASF GitHub Bot commented on TINKERPOP-1979:
-------------------------------------------

GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/874

    TINKERPOP-1979 Fix math() on OLAP/Spark

    https://issues.apache.org/jira/browse/TINKERPOP-1979
    
    This should be working now. There were two fixes: a serialization problem, and a change to how `MathStep` handles its `PathProcessor` requirements. Note that you still can't access path labels with `math()` in OLAP after this fix, or you will run afoul of the standard behavior of `ComputerVerificationStrategy`. I think that's just a limitation of the overall design of OLAP and paths.
    
    All tests pass with `docker/build.sh -t -n -i`
    
    VOTE +1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1979

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #874
    
----
commit 9f8f3b61b8e69624b2ac7aac3c98dcace55a079f
Author: Stephen Mallette <sp...@...>
Date:   2018-06-08T16:50:00Z

    TINKERPOP-1979 Fixed OLAP bug with math() step
    
    math() can now execute in OLAP with by() modulators if the expression given to math() does not access path labels.

commit a6e0a2d55c6b5bb9a710a53b8dfc208696416fe2
Author: Stephen Mallette <sp...@...>
Date:   2018-06-08T20:53:30Z

    TINKERPOP-1979 Get math() working on spark properly
    
    The Expression class was not serializable and Spark was not happy. Wrapped it up in another class that was and now tests work on spark ok.

----
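
For readers unfamiliar with the pattern named in the second commit message, here is a minimal sketch of wrapping a non-serializable object in a serializable class. It is not the code from the pull request. `Formula` is a hypothetical stand-in for a third-party class such as `net.objecthunter.exp4j.Expression`, and `SerializableFormula`, `SerializableWrapperDemo` and `roundTrip` are names invented for illustration. The idea shown is to serialize only the expression's source string and rebuild the delegate lazily on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// All names in this file are hypothetical and exist only for illustration.
final class SerializableWrapperDemo {

    // Stand-in for a third-party class that does NOT implement Serializable.
    static final class Formula {
        private final int offset;

        Formula(final String source) {
            // Toy "parser": accepts only the form "_+<int>", e.g. "_+10".
            this.offset = Integer.parseInt(source.substring(2));
        }

        double evaluate(final double x) {
            return x + offset;
        }
    }

    // Serializable wrapper: only the source string is written to the stream.
    static final class SerializableFormula implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String source;
        // transient: skipped by Java serialization, so it is null after a round trip.
        private transient Formula formula;

        SerializableFormula(final String source) {
            this.source = source;
        }

        // Rebuilds the delegate on first use (not thread-safe in this sketch).
        Formula get() {
            if (formula == null) {
                formula = new Formula(source);
            }
            return formula;
        }
    }

    // Serializes a value to bytes and reads it back, roughly what happens
    // when a task closure is shipped to a worker.
    static <T extends Serializable> T roundTrip(final T value)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            final T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(final String[] args) throws Exception {
        final SerializableFormula original = new SerializableFormula("_+10");
        final SerializableFormula copy = roundTrip(original);
        System.out.println(copy.get().evaluate(32.0)); // prints 42.0
    }
}
```

Holding a `Formula` directly in a serializable class, without `transient`, would fail in `writeObject` with a `java.io.NotSerializableException`, which is the same kind of failure shown in the stack trace quoted below.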


> Several OLAP issues in MathStep
> -------------------------------
>
>                 Key: TINKERPOP-1979
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1979
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>            Reporter: Daniel Kuppitz
>            Priority: Major
>
> {{MathStep}} has two issues when used in an OLAP query, and I think the worst thing is that both of them are silently ignored by our test suite.
> {noformat}
> gremlin> g = TinkerFactory.createModern().traversal().withComputer()
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
> gremlin> g.V().outE().math("0-_").by("weight")
> It is not possible to access more than a path element's id on GraphComputer: MathStep(0-_,[value(weight)]) requires PROPERTIES
> Type ':help' or ':h' for help.
> Display stack trace? [yN]
> {noformat}
> This one is just wrong and I'm not sure why {{ComputerVerificationStrategy}} assumes that the step is trying to leave the star graph. The next one works for {{TinkerGraphComputer}} but fails in Spark with a serialization-related exception:
> {noformat}
> g.V().values("age").math("_")
> // leads to the following (ignored) exception:
> java.lang.IllegalStateException: org.apache.spark.SparkException: Task not serializable
> 	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)
> 	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
> 	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
> 	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)
> 	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
> 	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
> 	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:200)
> 	at org.apache.tinkerpop.gremlin.process.traversal.Traversal.forEachRemaining(Traversal.java:265)
> 	at org.apache.tinkerpop.gremlin.process.traversal.step.map.MathTest.serializationTest(MathTest.java:63)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.apache.tinkerpop.gremlin.process.GremlinProcessRunner.runChild(GremlinProcessRunner.java:53)
> 	at org.apache.tinkerpop.gremlin.process.GremlinProcessRunner.runChild(GremlinProcessRunner.java:37)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runners.Suite.runChild(Suite.java:128)
> 	at org.apache.tinkerpop.gremlin.AbstractGremlinSuite.runChild(AbstractGremlinSuite.java:213)
> 	at org.apache.tinkerpop.gremlin.AbstractGremlinSuite.runChild(AbstractGremlinSuite.java:51)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.apache.tinkerpop.gremlin.AbstractGremlinSuite$1.evaluate(AbstractGremlinSuite.java:222)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> 	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> 	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> 	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Task not serializable
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)
> 	... 43 more
> Caused by: org.apache.spark.SparkException: Task not serializable
> 	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
> 	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> 	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> 	at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:794)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:793)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> 	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> 	at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:793)
> 	at org.apache.spark.api.java.JavaRDDLike$class.mapPartitionsToPair(JavaRDDLike.scala:212)
> 	at org.apache.spark.api.java.AbstractJavaRDDLike.mapPartitionsToPair(JavaRDDLike.scala:45)
> 	at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.executeVertexProgramIteration(SparkExecutor.java:92)
> 	at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:399)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.NotSerializableException: net.objecthunter.exp4j.Expression
> Serialization stack:
> 	- object not serializable (class: net.objecthunter.exp4j.Expression, value: net.objecthunter.exp4j.Expression@4007f65e)
> 	- field (class: org.apache.tinkerpop.gremlin.process.traversal.step.map.MathStep, name: expression, type: class net.objecthunter.exp4j.Expression)
> 	- object (class org.apache.tinkerpop.gremlin.process.traversal.step.map.MathStep, MathStep(_))
> 	- field (class: org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal, name: finalEndStep, type: interface org.apache.tinkerpop.gremlin.process.traversal.Step)
> 	- object (class org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal, [GraphStep(vertex,[]), PropertiesStep([age],value), MathStep(_)])
> 	- field (class: org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal, name: pureTraversal, type: interface org.apache.tinkerpop.gremlin.process.traversal.Traversal$Admin)
> 	- object (class org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal, [GraphStep(vertex,[]), PropertiesStep([age],value), MathStep(_)])
> 	- writeObject data (class: java.util.HashMap)
> 	- object (class java.util.HashMap, {gremlin.vertexProgram=org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram, gremlin.traversalVertexProgram.traversal=[GraphStep(vertex,[]), PropertiesStep([age],value), MathStep(_)]})
> 	- field (class: org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration, name: properties, type: interface java.util.Map)
> 	- object (class org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration, org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration@70544294)
> 	- element of array (index: 1)
> 	- array (class [Ljava.lang.Object;, size 3)
> 	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
> 	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor, functionalInterfaceMethod=org/apache/spark/api/java/function/PairFlatMapFunction.call:(Ljava/lang/Object;)Ljava/util/Iterator;, implementation=invokeStatic org/apache/tinkerpop/gremlin/spark/process/computer/SparkExecutor.lambda$executeVertexProgramIteration$35c6b113$1:(Lorg/apache/commons/configuration/Configuration;Lorg/apache/commons/configuration/Configuration;Lorg/apache/tinkerpop/gremlin/spark/process/computer/SparkMemory;Ljava/util/Iterator;)Ljava/util/Iterator;, instantiatedMethodType=(Ljava/util/Iterator;)Ljava/util/Iterator;, numCaptured=3])
> 	- writeReplace data (class: java.lang.invoke.SerializedLambda)
> 	- object (class org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor$$Lambda$122/1549074911, org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor$$Lambda$122/1549074911@3423a64e)
> 	- field (class: org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$9$1, name: f$11, type: interface org.apache.spark.api.java.function.PairFlatMapFunction)
> 	- object (class org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$9$1, <function1>)
> 	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> 	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> 	... 17 more
> {noformat}
> Only discovered/tested in {{master/}}. I think it's going to be the same in {{tp33}}, but I haven't tested it.


