Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:28 UTC
[jira] [Updated] (SPARK-11976) Support "." character in DataFrame column name
[ https://issues.apache.org/jira/browse/SPARK-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-11976:
---------------------------------
Labels: bulk-closed (was: )
> Support "." character in DataFrame column name
> ----------------------------------------------
>
> Key: SPARK-11976
> URL: https://issues.apache.org/jira/browse/SPARK-11976
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Affects Versions: 1.5.2
> Reporter: Sun Rui
> Priority: Major
> Labels: bulk-closed
>
> Spark Core now supports the "." character in DataFrame column names. However, when accessing a column whose name contains a "." character, the name must be wrapped in backticks.
> For example:
> {code}
> > df<-createDataFrame(sqlContext, list(list(1,2,3)))
> > names(df)<-c("a.b","c","d.e")
> > df$"`a.b`"
> Column a.b
> > df$"a.b"
> 15/11/25 10:55:06 ERROR RBackendHandler: col on 68 failed
> Error in column(callJMethod(x@sdf, "col", c)) :
> error in evaluating the argument 'x' in selecting a method for function 'column': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (a.b, c, d.e);
> at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
> at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:150)
> at org.apache.spark.sql.DataFrame.col(DataFrame.scala:663)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
> at org.apache.spark.api.r.RBackendHa
> {code}
> This means that when a column name is fetched programmatically rather than known in advance, the only safe way to select the column by name is to wrap the name in backticks.
> Once this is supported, the following code can be removed from createDataFrame():
> {code}
> # SPARK-SQL does not support '.' in column name, so replace it with '_'
> # TODO(davies): remove this once SPARK-2775 is fixed
> names <- lapply(names, function(n) {
>   nn <- gsub("[.]", "_", n)
>   if (nn != n) {
>     warning(paste("Use", nn, "instead of", n, " as column name"))
>   }
>   nn
> })
> {code}
> The PR for SPARK-12034 suppresses warnings when creating a DataFrame from iris in test cases. Remember to remove that warning suppression once this issue is resolved.
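
The backtick workaround described in the report can be sketched as a small helper for column names that are fetched programmatically. This is only an illustration under stated assumptions: `quoteColName` is a hypothetical name, not a SparkR function, and it does not handle names that themselves contain backticks.

```r
# Hypothetical helper (not part of SparkR): wrap a column name in backticks
# so that a "." in the name is treated as part of the name.
# Assumption: the name contains no backticks of its own.
quoteColName <- function(name) {
  paste0("`", name, "`")
}

# paste0 is vectorized, so a whole vector of names can be quoted at once.
cols <- c("a.b", "c", "d.e")
quoted <- quoteColName(cols)
print(quoted)

# With a SparkR DataFrame `df` like the one in the report, the quoted name
# would then be used in place of the raw name when selecting the column,
# as the report does with df$"`a.b`".
```

Quoting every name, including those without a ".", keeps the calling code uniform, since the caller does not have to inspect each name first.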