Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:18:18 UTC

[jira] [Resolved] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

     [ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-20226.
----------------------------------
    Resolution: Incomplete

> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
>                 Key: SPARK-20226
>                 URL: https://issues.apache.org/jira/browse/SPARK-20226
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: linux or windows
>            Reporter: Barry Becker
>            Priority: Major
>              Labels: bulk-closed, cache
>         Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily long time, depending on how many columns are referenced in a withColumn expression applied to a DataFrame.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the following:
> 1) Add a new column whose expression references 8 to 14 of the columns in the dataset.
>    - If the expression references 8 columns, the call to cacheTable is fast: about *5 seconds*.
>    - If it references 11 columns, it is slow: about *60 seconds*.
>    - If it references 14 columns, it effectively *takes forever*; I gave up after 10 minutes or so.
> The Column expression that is added just concatenates the columns into a single string. If a number is concatenated with a string (or vice versa), the number is first converted to a string.
> The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` + `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` + `Penalty Amount` + `Interest Amount`
> {code}
> 	  which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type), UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation), UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)), UDF('Interest Amount))
> {code}
> 	 where the UDFs are very simple functions that basically call toString and + as needed.
> 2) Apply a pipeline, saved earlier, that includes some transformers. Here are the steps of the pipeline (extracted from parquet):
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License Type_CLEANED__","handleInvalid":"skip","outputCol":"License Type_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_6f65ca9fa813",
> 	"paramMap":{
> 	  "outputCol":"Summons Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons Number_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_f5db4fb8120e",
>     "paramMap":{
> 	   "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
> 	    "handleInvalid":"keep","outputCol":"Issue Date_BINNED__","inputCol":"Issue Date_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_74568a2a5cfd",
> 	"paramMap":{
> 	  "handleInvalid":"keep","outputCol":"Fine Amount_BINNED__","inputCol":"Fine Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
> 	 }
> 	}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_109705dfdbcd",
> 	"paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest Amount_CLEANED__"}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_2b2e3d8a324f",
> 	"paramMap":{
> 	   "handleInvalid":"keep","inputCol":"Reduction Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
> 	   "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
>      "uid":"bucketizer_4d44c2ebf489",
>      "paramMap":{
>        "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
> 	   "outputCol":"Payment Amount_BINNED__","inputCol":"Payment Amount_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_05a75eeef997",
> 	"paramMap":{
> 	   "handleInvalid":"keep",
> 	   "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
> 	   "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_64b3ef2f97cf",
> 	"paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
>     "uid":"vecAssembler_932758a8f18e",
> 	"paramMap":{
> 	  "outputCol":"_features_column__",
> 	  "inputCols":["State_IDX__","License Type_IDX__","Violation_IDX__","County_IDX__","Issuing Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction Amount_BINNED__","Payment Amount_BINNED__","Amount Due_BINNED__","Precinct_BINNED__"]
> 	}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
>     "uid":"nb_e4b24f3c08b0",
> 	"paramMap":{
> 	  "probabilityCol":"_class_probability_column__",
> 	  "labelCol":"Penalty Amount_BINNED__",
> 	  "predictionCol":"_prediction_column_",
> 	  "modelType":"multinomial",
> 	  "featuresCol":"_features_column__",
> 	  "rawPredictionCol":"rawPrediction",
> 	  "smoothing":3.518236190922951E-4
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
>     "uid":"sql_1ea4c1b5c52e",
> 	"paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
>    }{code}
> 3) Call cacheTable on sqlContext. The actual code used is:
> {code}
>     val key = "foo"
>     if (sqlContext.tableNames.contains(key))
>       sqlContext.dropTempTable(key)
>     df.createOrReplaceTempView(key)
>     sqlContext.cacheTable(key)  // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery), I see that the query plan "planToCache" is very large (see below).
> I don't know much about query plans. Is this sort of giant nested query plan expected in this case? Is it in any way typical? Does it explain why caching takes so long? Why would referencing just a few more columns in the added column's expression make caching take exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount (predicted)#2363]
>    +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 33 more fields]
>       +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 33 more fields]
>          +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, `sql_1ea4c1b5c52e_5640c7097aca`
>             +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 32 more fields]
>                +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 31 more fields]
>                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 30 more fields]
>                      +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 29 more fields]
>                         +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 28 more fields]
>                            +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 27 more fields]
>                               +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 26 more fields]
>                                  +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 25 more fields]
>                                     +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields]
>                                        +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
>                                           +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
>                                              +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
>                                                 +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
>                                                    +- Filter UDF(Violation Status_CLEANED__#174)
>                                                       +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
>                                                          +- Filter UDF(Issuing Agency_CLEANED__#173)
>                                                             +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
>                                                                +- Filter UDF(County_CLEANED__#172)
>                                                                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
>                                                                      +- Filter UDF(Violation_CLEANED__#167)
>                                                                         +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 16 more fields]
>                                                                            +- Filter UDF(License Type_CLEANED__#164)
>                                                                               +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 15 more fields]
>                                                                                  +- Filter UDF(State_CLEANED__#163)
>                                                                                     +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 14 more fields]
>                                                                                        +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                           +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126, Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double) AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                              +- Project [Plate#6 AS Plate#123, State#7 AS State#124, License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
>                                                                                                 +- Project [Plate#6, State#7, License Type#8, Summons Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11), Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS columnBasedOnManyCols#43]
>                                                                                                    +- Relation[Plate#6,State#7,License Type#8,Summons Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing Agency#22,Violation Status#23] csv
> {code}	
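
A rough check on the timings in step 1, offered as a sketch rather than a diagnosis. Two data points cannot establish exponential growth, but if one assumes the time is multiplied by a constant factor for each extra referenced column, the reported 5 seconds at 8 columns and 60 seconds at 11 columns imply a factor of about 2.3 per column. The object name and the geometric-growth model are assumptions made for illustration; only the two timings come from the report.

```scala
object GrowthEstimate {
  // Timings taken from the report: 8 referenced columns -> ~5 s, 11 -> ~60 s.
  val (c1, t1) = (8, 5.0)
  val (c2, t2) = (11, 60.0)

  // ASSUMPTION: time grows geometrically, t(c) = t1 * r^(c - c1).
  // Solving for r from the two reported points gives the per-column factor.
  val ratioPerColumn: Double = math.pow(t2 / t1, 1.0 / (c2 - c1))

  // Extrapolated time in seconds for a given number of referenced columns.
  def estimateSeconds(columns: Int): Double =
    t1 * math.pow(ratioPerColumn, columns - c1)
}
```

Under that assumption, GrowthEstimate.estimateSeconds(14) comes out at roughly 720 seconds (12 minutes), which is at least consistent with giving up after about 10 minutes.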

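To make the shape of the expression in step 1 concrete, here is a toy model in plain Scala (no Spark) of how a chain of `+` operands might turn into the nested UDF(...) form quoted in the description. It only builds the printed form of the expression. The names Field, ConcatExpr, operand and nest are hypothetical, and which columns count as numeric is inferred from which ones appear wrapped in a unary UDF in the quoted expression.

```scala
// Toy model only: it reproduces the printed shape of the expression,
// not the reporter's actual UDFs or any Spark API.
final case class Field(name: String, numeric: Boolean)

object ConcatExpr {
  // Numeric operands are wrapped in a unary conversion, mirroring UDF('Summons Number).
  def operand(f: Field): String =
    if (f.numeric) s"UDF('${f.name})" else s"'${f.name}"

  // Left-fold over the operands: each `+` adds one more level of binary-UDF nesting.
  def nest(fields: Seq[Field]): String = {
    require(fields.nonEmpty, "at least one operand is needed")
    fields.tail.foldLeft(operand(fields.head)) { (acc, f) =>
      s"UDF($acc, ${operand(f)})"
    }
  }

  // The 11 operands of the example expression, in order.
  val example: Seq[Field] = Seq(
    Field("Plate", numeric = false),
    Field("State", numeric = false),
    Field("License Type", numeric = false),
    Field("Summons Number", numeric = true),
    Field("Issue Date", numeric = true),
    Field("Violation Time", numeric = false),
    Field("Violation", numeric = false),
    Field("Judgment Entry Date", numeric = true),
    Field("Fine Amount", numeric = true),
    Field("Penalty Amount", numeric = true),
    Field("Interest Amount", numeric = true)
  )
}
```

With n operands the binary nesting is n - 1 levels deep, so ConcatExpr.nest(ConcatExpr.example) has ten levels for the 11-column case, and each additional referenced column adds one more.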

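One way to narrow down where the time in step 3 goes would be to time the planning phases separately before calling cacheTable. The helper below is a minimal sketch in plain Scala; PhaseTimer, timed and profile are hypothetical names. The Spark usage shown in the trailing comment is an assumption about how it could be applied to the `df` from the report. It relies on the `queryExecution` field of a Dataset and its lazily computed `analyzed`, `optimizedPlan` and `executedPlan` values, and it has not been run against this dataset.

```scala
object PhaseTimer {
  // Runs `body` once and returns its result together with the elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Runs each named step in order and returns (name, elapsed millis) pairs.
  def profile(steps: Seq[(String, () => Any)]): Seq[(String, Long)] =
    steps.map { case (name, step) => name -> timed(step())._2 }
}

// Hypothetical usage (requires Spark on the classpath; `df` is the DataFrame from step 3):
//   val qe = df.queryExecution
//   PhaseTimer.profile(Seq(
//     "analyzed"  -> (() => qe.analyzed),
//     "optimized" -> (() => qe.optimizedPlan),
//     "physical"  -> (() => qe.executedPlan)
//   )).foreach { case (phase, ms) => println(s"$phase: $ms ms") }
```

Because those plan values are lazy, each one is computed on first access, so the phases need to be forced in this order for the per-phase numbers to mean anything. If nearly all of the time lands in one phase, that would point to planning rather than to reading or materializing the 7861 rows.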

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
