You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/03/06 23:26:33 UTC

[jira] [Comment Edited] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

    [ https://issues.apache.org/jira/browse/SPARK-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897638#comment-15897638 ] 

Hyukjin Kwon edited comment on SPARK-15522 at 3/6/17 11:26 PM:
---------------------------------------------------------------

We can use backticks for it as below:

{code}
scala> Seq(Some(1), None).toDF("82").createOrReplaceTempView("piv_qhID")

scala> spark.sql("select * from piv_qhID where '82' is NULL limit 20").show()
+---+
| 82|
+---+
+---+


scala> spark.sql("select * from piv_qhID where `82` is NULL limit 20").show()
+----+
|  82|
+----+
|null|
+----+
{code}

It seems {{'82'}} or {{"82"}} was being created as a constant. I am resolving this JIRA. Please reopen this if I misunderstood. 


was (Author: hyukjin.kwon):
We can use backticks for it as below:

{code}
scala> spark.range(10).toDF("82").createOrReplaceTempView("piv_qhID")

scala> spark.sql("select * from piv_qhID where `82` is NULL limit 20").show()
+---+
| 82|
+---+
+---+
{code}

It seems {{'82'}} or {{"82"}} was being created as a constant. I am resolving this JIRA. Please reopen this if I misunderstood. 

> DataFrame Column Names That are Numbers aren't referenced correctly in SQL
> --------------------------------------------------------------------------
>
>                 Key: SPARK-15522
>                 URL: https://issues.apache.org/jira/browse/SPARK-15522
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Jason Pohl
>
> The following code is run:
> val pre_piv_df_a = sqlContext.sql("""
> SELECT
> CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male,
> CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female,
> CAST(StartAge AS Double) AS StartAge_dbl,
> CAST(EndAge AS Double) AS EndAge_dbl,
> *
> FROM alldem_union_curr
> """)
> .withColumn("JavaStartTimestamp", create_ts($"StartTimestamp"))
> .drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", "StartTimestamp")
> .drop("StartAge").drop("EndAge")
> .withColumnRenamed("StartAge_dbl", "StartAge").withColumnRenamed("EndAge_dbl", "EndAge")
> val pre_piv_df_b = pre_piv_df_a
> .withColumn("media_month_cc", media_month_cc($"MediaMonth"))
> .withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp"))
> .withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp"))
> .withColumn("week_day", week_day($"StartTimestamp"))
> .withColumn("week_end", week_end($"StartTimestamp"))
> .join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", "inner")
> .withColumnRenamed("cad_network", "norm_net_code_a")
> .withColumn("norm_net_code", reCodeNets($"norm_net_code_a"))
> pre_piv_df_b.registerTempTable("pre_piv_df")
> val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", "rating_category_cd")
> .pivot("qaID").agg("rating" -> "mean")
> The pivot creates a lot of columns (96) with names that are like ‘01’,’02’,…,’96’ as a result of pivoting a table that has quarter hour IDs.
> In the below SQL the highlighted section causes problems. If I rename the columns to ‘col01’,’col02’,…,’col96’ I can run the SQL correctly and get the expected results.
> select * from piv_qhID where 82 is NULL limit 20
> And I am getting no rows even though there are nulls.
> On the other hand the query:
> select * from piv_qhID where 82 is NOT NULL limit 20
> Returns all rows (even those with nulls)
> Renaming the columns fixes this, but it would be nice if the columns were referenced correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org