
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/29 15:21:03 UTC

[GitHub] [spark] tgravescs commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer to speed up Datasets

URL: https://github.com/apache/spark/pull/24515#issuecomment-496983351

Thanks for posting this. I haven't looked at the code in detail yet, but we are also interested in this area. In particular, we are interested in converting lambdas and UDFs into full Catalyst expressions. Once you have the Catalyst expression, you can do more optimizations with it. In our case, we started looking at this for the columnar processing side of things: if you have the Catalyst expression, you can map it to a GPU operation. The longer you can keep the data on the GPU, the more performance you gain, because copying back and forth is inefficient. I think that applies to many types of columnar processing: the longer you can keep the data columnar without switching back and forth to rows, the more benefit you get.
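
To make the expression-versus-lambda point concrete, here is a minimal Java sketch. It assumes a hypothetical mini expression tree (`Expr`, `input`, `literal`, `add`, `mul`; none of these are Catalyst's API) and a hand-written translation of the lambda `x -> x * 2 + 1`; a real analyzer would derive the tree from bytecode. The opaque lambda can only be called one row at a time, whereas the tree can be inspected and evaluated a whole column at a time, which is the property that makes mapping it onto columnar or GPU kernels possible.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

public class ColumnarExprSketch {
    /** Hypothetical stand-in for a Catalyst expression over a single int column. */
    public interface Expr {
        int[] evalColumn(int[] column); // evaluates a whole column batch at once
        String describe();              // the tree is inspectable, unlike a lambda
    }

    /** Refers to the input column itself. */
    public static Expr input() {
        return new Expr() {
            public int[] evalColumn(int[] column) { return column.clone(); }
            public String describe() { return "input"; }
        };
    }

    /** A constant broadcast to the length of the column. */
    public static Expr literal(int value) {
        return new Expr() {
            public int[] evalColumn(int[] column) {
                int[] out = new int[column.length];
                Arrays.fill(out, value);
                return out;
            }
            public String describe() { return Integer.toString(value); }
        };
    }

    /** Element-wise binary operator applied across two evaluated child columns. */
    private static Expr binary(String symbol, IntBinaryOperator op, Expr left, Expr right) {
        return new Expr() {
            public int[] evalColumn(int[] column) {
                int[] l = left.evalColumn(column);
                int[] r = right.evalColumn(column);
                int[] out = new int[column.length];
                for (int i = 0; i < out.length; i++) out[i] = op.applyAsInt(l[i], r[i]);
                return out;
            }
            public String describe() {
                return "(" + left.describe() + " " + symbol + " " + right.describe() + ")";
            }
        };
    }

    public static Expr add(Expr l, Expr r) { return binary("+", Integer::sum, l, r); }
    public static Expr mul(Expr l, Expr r) { return binary("*", (a, b) -> a * b, l, r); }

    /** Row-at-a-time application of an opaque lambda: the engine can only call it. */
    public static int[] applyRowByRow(IntUnaryOperator f, int[] column) {
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) out[i] = f.applyAsInt(column[i]);
        return out;
    }

    public static void main(String[] args) {
        int[] column = {1, 2, 3, 4};
        IntUnaryOperator opaque = x -> x * 2 + 1;                    // what the user wrote
        Expr translated = add(mul(input(), literal(2)), literal(1)); // what an analyzer might emit
        System.out.println(translated.describe());
        System.out.println(Arrays.toString(translated.evalColumn(column)));
        System.out.println(Arrays.equals(translated.evalColumn(column), applyRowByRow(opaque, column)));
    }
}
```

Running `main` prints `((input * 2) + 1)`, then `[3, 5, 7, 9]`, then `true`: both paths compute the same values, but only the tree exposes its structure to an optimizer.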

We had a few people start to look at this. Originally they started with Javassist but then switched to JVMCI (which is only available in JDK versions newer than 8 and in very specific Oracle builds of JDK 8). The main reason they switched from Javassist is that, when there are multiple lambdas within the same class, they couldn't differentiate them, since the lambda classes generated at runtime can't be reflected on or instrumented. If you only have one lambda per class, it was fine. I'm not sure how often this will be an issue, but I thought I would mention it here.
Also, as you mention, Scala 2.12 doesn't always SAM-convert lambda functions; some of that is documented here: https://www.scala-lang.org/news/2.12.0/#java-8-style-bytecode-for-lambdas.
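
For the multiple-lambdas-per-class problem, one possible workaround (a sketch, not what the people mentioned above did) applies when the function is `Serializable`, which Spark already requires of closures it ships to executors. The JDK's lambda metafactory gives a serializable lambda's runtime class a private `writeReplace()` method returning a `java.lang.invoke.SerializedLambda`, whose `getImplClass()` and `getImplMethodName()` name the synthetic method in the enclosing class that holds that lambda's body. Unlike the runtime-generated class, that method lives in a real class file a bytecode library can read. `LambdaIdentity` and `SerFn` are hypothetical names, and calling `writeReplace` reflectively relies on JDK implementation behavior rather than a public API.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaIdentity {
    /** A Function that is also Serializable, so the lambda's runtime class gets a writeReplace() method. */
    public interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Two lambdas declared in the same enclosing class.
    public static final SerFn<Integer, Integer> ADD_ONE = x -> x + 1;
    public static final SerFn<Integer, Integer> TIMES_TWO = x -> x * 2;

    /** Recovers the SerializedLambda descriptor via the generated private writeReplace() method. */
    public static SerializedLambda describe(Object fn) {
        try {
            Method writeReplace = fn.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(fn);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("not a serializable lambda: " + fn.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        for (Object fn : new Object[] {ADD_ONE, TIMES_TWO}) {
            SerializedLambda sl = describe(fn);
            // The runtime class name is opaque; impl class + impl method identify the lambda body.
            System.out.println(fn.getClass().getName() + " -> "
                    + sl.getImplClass() + "." + sl.getImplMethodName() + sl.getImplMethodSignature());
        }
    }
}
```

The printed runtime class names (typically starting with `LambdaIdentity$$Lambda`) vary by JDK and carry no information about the body; the two implementation method names, by contrast, are distinct `lambda$...` methods in `LambdaIdentity`, which is what would let an analyzer tell the two lambdas apart.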

I agree with many of your points that need to be discussed and decided upon. I think if we can keep it pluggable, as you are proposing, people can try different things out. I think one of the main things is knowing when not to try to convert, or when to give up. If you do that quickly enough
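
A minimal sketch of the "pluggable, give up quickly" idea, assuming a hypothetical, heavily simplified stack-machine instruction set (`Op`, `Insn`) in place of real JVM bytecode and a hypothetical plugin interface (`Translator`). The translator executes the instructions symbolically, pushing expression text instead of values, and returns `Optional.empty()` the moment it sees an instruction it does not model (a method call or a branch), so the planner can fall back to running the opaque lambda without spending more time on analysis.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

public class BailOutTranslator {
    /** Hypothetical, heavily simplified instruction set standing in for JVM bytecode. */
    public enum Op { LOAD_ARG, PUSH_CONST, ADD, MUL, RETURN, INVOKE_METHOD, BRANCH }

    public static final class Insn {
        final Op op;
        final int operand; // only meaningful for PUSH_CONST
        private Insn(Op op, int operand) { this.op = op; this.operand = operand; }
        public static Insn of(Op op) { return new Insn(op, 0); }
        public static Insn push(int constant) { return new Insn(Op.PUSH_CONST, constant); }
    }

    /** Hypothetical plugin contract: an empty result means "give up, keep the opaque lambda". */
    public interface Translator {
        Optional<String> translate(List<Insn> body);
    }

    /** Symbolically executes the stack machine, building expression text instead of values. */
    public static final class ArithmeticTranslator implements Translator {
        @Override
        public Optional<String> translate(List<Insn> body) {
            Deque<String> stack = new ArrayDeque<>();
            for (Insn insn : body) {
                switch (insn.op) {
                    case LOAD_ARG: stack.push("input"); break;
                    case PUSH_CONST: stack.push(Integer.toString(insn.operand)); break;
                    case ADD:
                    case MUL: {
                        if (stack.size() < 2) return Optional.empty(); // malformed: give up
                        String right = stack.pop();
                        String left = stack.pop();
                        stack.push("(" + left + (insn.op == Op.ADD ? " + " : " * ") + right + ")");
                        break;
                    }
                    case RETURN:
                        return stack.size() == 1 ? Optional.of(stack.pop()) : Optional.empty();
                    default:
                        // Calls, branches, etc.: bail out immediately rather than analyze further.
                        return Optional.empty();
                }
            }
            return Optional.empty(); // no RETURN seen
        }
    }

    /** Tries each plugin in order; falls back to the opaque lambda when all of them give up. */
    public static String plan(List<Translator> plugins, List<Insn> body) {
        for (Translator t : plugins) {
            Optional<String> expr = t.translate(body);
            if (expr.isPresent()) return "expression: " + expr.get();
        }
        return "fallback: opaque lambda";
    }

    public static void main(String[] args) {
        List<Translator> plugins = Arrays.asList(new ArithmeticTranslator());
        // x -> x * 2 + 1
        List<Insn> arithmetic = Arrays.asList(Insn.of(Op.LOAD_ARG), Insn.push(2), Insn.of(Op.MUL),
                Insn.push(1), Insn.of(Op.ADD), Insn.of(Op.RETURN));
        // x -> someMethod(x): contains a call, so the translator gives up
        List<Insn> withCall = Arrays.asList(Insn.of(Op.LOAD_ARG), Insn.of(Op.INVOKE_METHOD), Insn.of(Op.RETURN));
        System.out.println(plan(plugins, arithmetic));
        System.out.println(plan(plugins, withCall));
    }
}
```

Running `main` prints `expression: ((input * 2) + 1)` for the arithmetic body and `fallback: opaque lambda` for the body containing a call. Because plugins are tried in order, a more ambitious translator could be registered ahead of this one without changing the planner.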

I'm very curious about other people's experience here. @rednaxelafx, have you had time to write up your thoughts from your previous experience?

