Posted to user@spark.apache.org by Justin Pihony <ju...@gmail.com> on 2017/02/23 06:52:31 UTC

Is there a list of missing optimizations for typed functions?

I was curious if there was introspection of certain typed functions and ran
the following two queries:

ds.where($"col" > 1).explain
ds.filter(_.col > 1).explain

And found that the typed function does NOT result in a PushedFilter. I
imagine this is due to a limited view of the function, so I have two
questions really:

1.) Is there a list of the methods that lose some of the optimizations that
you get from non-functional methods? Is it any method that accepts a generic
function?
2.) Is there any work to attempt reflection and gain some of these
optimizations back? I couldn't find anything in JIRA.

Thanks,
Justin Pihony





Re: Is there a list of missing optimizations for typed functions?

Posted by lihu <li...@gmail.com>.
Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for
more detail.

For performance-sensitive queries, it is better to use the DataFrame API than
the Dataset API.
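
A minimal sketch for checking this locally, assuming Spark SQL is on the classpath. The `Rec` case class, the temporary Parquet directory and the `PushdownCheck` object are made up for illustration; the thread only names a column called `col`. It builds the two queries from the original question against a Parquet source and returns their physical plans as text.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical record type: the thread only mentions a column named `col`.
case class Rec(col: Int)

object PushdownCheck {
  // Returns the physical plans (as text) for the Column-based filter
  // and for the lambda-based filter, in that order.
  def plans(spark: SparkSession): (String, String) = {
    import spark.implicits._

    // Write a tiny Parquet dataset to a fresh temporary location (assumed setup).
    val path = Files.createTempDirectory("pushdown").resolve("recs").toString
    Seq(Rec(0), Rec(1), Rec(2)).toDS().write.parquet(path)

    val ds = spark.read.parquet(path).as[Rec]

    // Column expression: Catalyst can see the predicate.
    val untyped = ds.where($"col" > 1).queryExecution.executedPlan.toString
    // Scala lambda: Catalyst only sees a function object.
    val typed = ds.filter(_.col > 1).queryExecution.executedPlan.toString
    (untyped, typed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pushdown-check")
      .getOrCreate()
    val (untyped, typed) = plans(spark)
    println(untyped)
    println(typed)
    spark.stop()
  }
}
```

Comparing the `PushedFilters` entry of the scan node in the two printed plans should show the difference described in the question. The exact plan text varies between Spark versions.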

On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Justin,
>
> I have never seen such a list. I think this area is under heavy
> development, especially the optimizations for typed operations.
>
> There's a JIRA about inspecting what the Scala code does (the
> non-Column-based query from your examples), but I've seen no activity on
> it. That's why, for now, Column-based untyped queries can be faster: more
> optimizations apply to them. The same goes for UDFs.
>
> Jacek

Re: Is there a list of missing optimizations for typed functions?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Justin,

I have never seen such a list. I think this area is under heavy development,
especially the optimizations for typed operations.

There's a JIRA about inspecting what the Scala code does (the
non-Column-based query from your examples), but I've seen no activity on it.
That's why, for now, Column-based untyped queries can be faster: more
optimizations apply to them. The same goes for UDFs.

Jacek
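
The point about Scala code being opaque to the optimizer can be illustrated with a toy model in plain Scala. This is only a sketch: `Expr`, `Col`, `Lit`, `Gt`, `ExprFilter`, `LambdaFilter` and `pushable` are hypothetical names invented here and are not Spark classes. It shows why a predicate given as an expression tree can be translated into a source filter, while a predicate given as a compiled function cannot be inspected.

```scala
// Toy model only: these types are hypothetical and are not part of Spark.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr

// Hypothetical record type with the single column used in the thread.
final case class Rec(col: Int)

object OpaqueLambda {
  // A filter is either a tree that can be pattern-matched, or a compiled closure.
  sealed trait Filter
  final case class ExprFilter(expr: Expr) extends Filter
  final case class LambdaFilter(f: Rec => Boolean) extends Filter

  // Describes what could be handed to a data source, if anything.
  def pushable(filter: Filter): Option[String] = filter match {
    // The structure "column > literal" is visible, so it can be translated.
    case ExprFilter(Gt(Col(name), Lit(v))) => Some(s"GreaterThan($name,$v)")
    // Other expression shapes are not handled by this toy planner.
    case ExprFilter(_) => None
    // A function value exposes only `apply`; its body cannot be matched on.
    case LambdaFilter(_) => None
  }

  def main(args: Array[String]): Unit = {
    println(pushable(ExprFilter(Gt(Col("col"), Lit(1))))) // Some(GreaterThan(col,1))
    println(pushable(LambdaFilter(_.col > 1)))            // None
  }
}
```

Recovering the structure of the lambda would take something like bytecode analysis or macros, which is the kind of work the JIRA mentioned above would have to cover.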
