You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2019/02/20 10:13:00 UTC

[jira] [Resolved] (SPARK-26901) Vectorized gapply should not prune columns

     [ https://issues.apache.org/jira/browse/SPARK-26901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-26901.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23810
[https://github.com/apache/spark/pull/23810]

> Vectorized gapply should not prune columns
> ------------------------------------------
>
>                 Key: SPARK-26901
>                 URL: https://issues.apache.org/jira/browse/SPARK-26901
>             Project: Spark
>          Issue Type: Bug
>          Components: R, SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, if some columns can be pushed, it's being pushed through {{FlatMapGroupsInRWithArrow}}.
> {code}
> explain(count(gapply(df,
>                      "gear",
>                      function(key, group) {
>                        data.frame(gear = key[[1]], disp = mean(group$disp))
>                      },
>                      structType("gear double, disp double"))), TRUE)
> {code}
> {code}
> *(4) HashAggregate(keys=[], functions=[count(1)], output=[count#64L])
> +- Exchange SinglePartition
>    +- *(3) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#67L])
>       +- *(3) Project
>          +- FlatMapGroupsInRWithArrow [...]
>             +- *(2) Sort [gear#9 ASC NULLS FIRST], false, 0
>                +- Exchange hashpartitioning(gear#9, 200)
>                   +- *(1) Project [gear#9]
>                      +- *(1) Scan ExistingRDD arrow[mpg#0,cyl#1,disp#2,hp#3,drat#4,wt#5,qsec#6,vs#7,am#8,gear#9,carb#10]
> {code}
> This causes to send corrupt values R workers when the R native functions are executed.
> {code}
>   c(5, 5, 5, 5, 5)
>   c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 2.47032822920623e-323)
>   c(0, 0, 0, 0, 2.05578399548861e-314)
>   c(3.4483079184909e-313, 3.4483079184909e-313, 3.4483079184909e-313, 5.31146529464635e-315, 0)
>   c(0, 0, 0, 0, -2.63230705887168e+228)
>   c(5, 5, 5, 0, 2.47032822920623e-323)
>   c(7.90505033345994e-323, 7.90505033345994e-323, 0, 0, 4.17777978645388e-314)
>   c(0, 0, 0, 0, -2.18328530492023e+219)
>   c(3.4483079184909e-313, 5.31146529464635e-315, 0, 0, -2.63230127529109e+228)
>   c(0, 0, 0, 0, 2.47032822920623e-323)
>   c(5, 0, 0, 0, 4.17777978645388e-314)
>   c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
>   c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 2.47032822920623e-323)
>   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.05578399548861e-314)
>   c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 5.30757980430645e-315, 0)
>   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.73302088532611e+228)
>   c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 2.47032822920623e-323)
>   c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 0, 0, 4.17777978645388e-314)
>   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.04669129845114e+219)
>   c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 5.30757980430645e-315, 0, 0, -2.73301510174552e+228)
>   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.47032822920623e-323)
>   c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 4.17777978645388e-314)
> {code}
> which should be:
> {code}
>   c(21, 21, 22.8, 24.4, 22.8, 19.2, 17.8, 32.4, 30.4, 33.9, 27.3, 21.4)
>   c(6, 6, 4, 4, 4, 6, 6, 4, 4, 4, 4, 4)
>   c(160, 160, 108, 146.7, 140.8, 167.6, 167.6, 78.7, 75.7, 71.1, 79, 121)
>   c(110, 110, 93, 62, 95, 123, 123, 66, 52, 65, 66, 109)
>   c(3.9, 3.9, 3.85, 3.69, 3.92, 3.92, 3.92, 4.08, 4.93, 4.22, 4.08, 4.11)
>   c(2.62, 2.875, 2.32, 3.19, 3.15, 3.44, 3.44, 2.2, 1.615, 1.835, 1.935, 2.78)
>   c(16.46, 17.02, 18.61, 20, 22.9, 18.3, 18.9, 19.47, 18.52, 19.9, 18.9, 18.6)
>   c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
>   c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1)
>   c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)
>   c(4, 4, 1, 2, 2, 4, 4, 1, 2, 1, 1, 2)
>   c(26, 30.4, 15.8, 19.7, 15)
>   c(4, 4, 8, 6, 8)
>   c(120.3, 95.1, 351, 145, 301)
>   c(91, 113, 264, 175, 335)
>   c(4.43, 3.77, 4.22, 3.62, 3.54)
>   c(2.14, 1.513, 3.17, 2.77, 3.57)
>   c(16.7, 16.9, 14.5, 15.5, 14.6)
>   c(0, 1, 0, 0, 0)
>   c(1, 1, 1, 1, 1)
>   c(5, 5, 5, 5, 5)
>   c(2, 2, 4, 6, 8)
>   c(21.4, 18.7, 18.1, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 21.5, 15.5, 15.2, 13.3, 19.2)
>   c(6, 8, 6, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8)
>   c(258, 360, 225, 360, 275.8, 275.8, 275.8, 472, 460, 440, 120.1, 318, 304, 350, 400)
>   c(110, 175, 105, 245, 180, 180, 180, 205, 215, 230, 97, 150, 150, 245, 175)
>   c(3.08, 3.15, 2.76, 3.21, 3.07, 3.07, 3.07, 2.93, 3, 3.23, 3.7, 2.76, 3.15, 3.73, 3.08)
>   c(3.215, 3.44, 3.46, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.465, 3.52, 3.435, 3.84, 3.845)
>   c(19.44, 17.02, 20.22, 15.84, 17.4, 17.6, 18, 17.98, 17.82, 17.42, 20.01, 16.87, 17.3, 15.41, 17.05)
>   c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
>   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>   c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
>   c(1, 2, 1, 4, 3, 3, 3, 4, 4, 4, 1, 2, 2, 4, 2)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org