Posted to issues@spark.apache.org by "Barry Becker (JIRA)" <ji...@apache.org> on 2017/11/07 18:10:00 UTC

[jira] [Comment Edited] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

    [ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064972#comment-16064972 ] 

Barry Becker edited comment on SPARK-20226 at 11/7/17 6:09 PM:
---------------------------------------------------------------

Calling cache() on the dataframe after the addColumn used to make this run fast. But around the time that we upgraded to Spark 2.1.1, it got very slow again. Calling cache() on the dataframe no longer seems to help.

If I hardcode the addColumn column expression to be 
{code}
(((((((((((CAST(Plate AS STRING) + CAST(State AS STRING)) + CAST(License Type AS STRING)) + CAST(Violation Time AS STRING)) + CAST(Violation AS STRING)) + CAST(Judgment Entry Date AS STRING)) + CAST(Issue Date AS STRING)) + CAST(Summons Number AS STRING)) + CAST(Fine Amount AS STRING)) + CAST(Penalty Amount AS STRING)) + CAST(Interest Amount AS STRING)) + CAST(Violation AS STRING))
{code}
instead of 
{code}
CAST(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate, State), License Type), Violation Time), Violation), UDF(Judgment Entry Date)), UDF(Issue Date)), UDF(Summons Number)), UDF(Fine Amount)), UDF(Penalty Amount)), UDF(Interest Amount)), Violation) AS STRING)
{code}
which is what is generated by our expression parser, then the time goes from 70 seconds down to 10 seconds. Still slow, but not nearly as slow.
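
For illustration only (this is not the code from the application above), a flat expression of this kind could be built from Spark's built-in functions, so that the plan carries a single expression instead of a chain of nested UDF calls. The sketch assumes string concatenation is the intent and therefore uses `concat` rather than the `+` shown in the hardcoded expression; `concatAsString` and `addConcatColumn` are hypothetical helper names, and `columnBasedOnManyCols` is the output column name that appears in the plan in the issue description. Note that `concat` returns null when any input is null, which may differ from what the UDF-based version does.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, concat}

// Column names copied from the hardcoded expression above, in the same order.
val sourceCols: Seq[String] = Seq(
  "Plate", "State", "License Type", "Violation Time", "Violation",
  "Judgment Entry Date", "Issue Date", "Summons Number",
  "Fine Amount", "Penalty Amount", "Interest Amount", "Violation")

// Cast every input to string and join them with the built-in concat,
// producing one flat expression rather than nested UDF calls.
def concatAsString(names: Seq[String]): Column =
  concat(names.map(n => col(n).cast("string")): _*)

// Hypothetical helper: add the concatenated column to a dataframe.
def addConcatColumn(df: DataFrame): DataFrame =
  df.withColumn("columnBasedOnManyCols", concatAsString(sourceCols))
```

The definitions can be pasted into spark-shell and applied as `addConcatColumn(df)`. Whether this removes the remaining 10 seconds is not something the sketch can show.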


> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
>                 Key: SPARK-20226
>                 URL: https://issues.apache.org/jira/browse/SPARK-20226
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: linux or windows
>            Reporter: Barry Becker
>              Labels: cache
>         Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily long time depending on the number of columns that are referenced in a withColumn expression applied to a dataframe.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the following:
> 1) Add a new column that references 8 to 14 of the columns in the dataset.
>    - If the expression references 8 columns, then the call to cacheTable is fast - like *5 seconds*
>    - If it references 11 columns, then it is slow - like *60 seconds*
>    - and if it references 14 columns, then it basically *takes forever* - I gave up after 10 minutes or so.
>    The Column expression that is added basically just concatenates the columns together into a single string. If a number is concatenated with a string (or vice versa), the number is first converted to a string.
>    The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` + `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` + `Penalty Amount` + `Interest Amount`
> {code}
> 	  which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type), UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation), UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)), UDF('Interest Amount))
> {code}
> 	 where the UDFs are very simple functions that basically call toString and + as needed.
> 2) Apply a previously saved pipeline that includes some transformers. Here are the steps of the pipeline (extracted from parquet):
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License Type_CLEANED__","handleInvalid":"skip","outputCol":"License Type_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_6f65ca9fa813",
> 	"paramMap":{
> 	  "outputCol":"Summons Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons Number_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_f5db4fb8120e",
>     "paramMap":{
> 	   "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
> 	    "handleInvalid":"keep","outputCol":"Issue Date_BINNED__","inputCol":"Issue Date_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_74568a2a5cfd",
> 	"paramMap":{
> 	  "handleInvalid":"keep","outputCol":"Fine Amount_BINNED__","inputCol":"Fine Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
> 	 }
> 	}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_109705dfdbcd",
> 	"paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest Amount_CLEANED__"}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_2b2e3d8a324f",
> 	"paramMap":{
> 	   "handleInvalid":"keep","inputCol":"Reduction Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
> 	   "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
>      "uid":"bucketizer_4d44c2ebf489",
>      "paramMap":{
>        "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
> 	   "outputCol":"Payment Amount_BINNED__","inputCol":"Payment Amount_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_05a75eeef997",
> 	"paramMap":{
> 	   "handleInvalid":"keep",
> 	   "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
> 	   "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_64b3ef2f97cf",
> 	"paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
>     "uid":"vecAssembler_932758a8f18e",
> 	"paramMap":{
> 	  "outputCol":"_features_column__",
> 	  "inputCols":["State_IDX__","License Type_IDX__","Violation_IDX__","County_IDX__","Issuing Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction Amount_BINNED__","Payment Amount_BINNED__","Amount Due_BINNED__","Precinct_BINNED__"]
> 	}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
>     "uid":"nb_e4b24f3c08b0",
> 	"paramMap":{
> 	  "probabilityCol":"_class_probability_column__",
> 	  "labelCol":"Penalty Amount_BINNED__",
> 	  "predictionCol":"_prediction_column_",
> 	  "modelType":"multinomial",
> 	  "featuresCol":"_features_column__",
> 	  "rawPredictionCol":"rawPrediction",
> 	  "smoothing":3.518236190922951E-4
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
>     "uid":"sql_1ea4c1b5c52e",
> 	"paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
>    }{code}
>    3) Call cacheTable on sqlContext. The actual code used is:
>    {code}
>     val key = "foo"
>     if (sqlContext.tableNames.contains(key))
>       sqlContext.dropTempTable(key)
>     df.createOrReplaceTempView(key)
>     sqlContext.cacheTable(key)        // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery), I see that the query "planToCache" is very large (see below). 
> I don't know much about query plans. Is this sort of giant nested query plan expected in this case? Is it in any way typical? Does it explain why it takes a very long time to cache? Why would adding just a few more columns to the add column expression result in a plan that takes exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount (predicted)#2363]
>    +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 33 more fields]
>       +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 33 more fields]
>          +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, `sql_1ea4c1b5c52e_5640c7097aca`
>             +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 32 more fields]
>                +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 31 more fields]
>                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 30 more fields]
>                      +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 29 more fields]
>                         +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 28 more fields]
>                            +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 27 more fields]
>                               +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 26 more fields]
>                                  +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 25 more fields]
>                                     +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields]
>                                        +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
>                                           +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
>                                              +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
>                                                 +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
>                                                    +- Filter UDF(Violation Status_CLEANED__#174)
>                                                       +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
>                                                          +- Filter UDF(Issuing Agency_CLEANED__#173)
>                                                             +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
>                                                                +- Filter UDF(County_CLEANED__#172)
>                                                                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
>                                                                      +- Filter UDF(Violation_CLEANED__#167)
>                                                                         +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 16 more fields]
>                                                                            +- Filter UDF(License Type_CLEANED__#164)
>                                                                               +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 15 more fields]
>                                                                                  +- Filter UDF(State_CLEANED__#163)
>                                                                                     +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 14 more fields]
>                                                                                        +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                           +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126, Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double) AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                              +- Project [Plate#6 AS Plate#123, State#7 AS State#124, License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
>                                                                                                 +- Project [Plate#6, State#7, License Type#8, Summons Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11), Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS columnBasedOnManyCols#43]
>                                                                                                    +- Relation[Plate#6,State#7,License Type#8,Summons Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing Agency#22,Violation Status#23] csv
> {code}	
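
One way to check whether the time is being spent on the plan itself rather than on reading the data is to force each planning phase of the dataframe separately and time it. The following is a rough diagnostic sketch, not something taken from the report: `timed` and `reportPlanningTimes` are hypothetical helper names, and the sketch only relies on the `analyzed`, `optimizedPlan` and `executedPlan` members of `df.queryExecution`. It does not by itself establish the cause of the slowdown described above.

```scala
import org.apache.spark.sql.DataFrame

// Run a block and return its result together with the elapsed wall-clock milliseconds.
def timed[A](body: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1000000L)
}

// Hypothetical helper: force each lazily computed planning phase in turn
// and print how long each one took.
def reportPlanningTimes(df: DataFrame): Unit = {
  val qe = df.queryExecution
  val (_, analyzeMs)  = timed(qe.analyzed)
  val (_, optimizeMs) = timed(qe.optimizedPlan)
  val (_, planMs)     = timed(qe.executedPlan)
  println(s"analyzed: $analyzeMs ms, optimized: $optimizeMs ms, executedPlan: $planMs ms")
}
```

Calling `reportPlanningTimes(df)` just before `sqlContext.cacheTable(key)`, once with 8 referenced columns and once with 11, would show which phase, if any, grows with the size of the expression. Each phase is cached after its first evaluation, so the numbers are only meaningful on a freshly built dataframe.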


