Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/01/30 01:02:34 UTC

[jira] [Updated] (SPARK-5462) Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()

     [ https://issues.apache.org/jira/browse/SPARK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-5462:
------------------------------
    Summary: Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()  (was: Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in Python DataFrame)

> Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5462
>                 URL: https://issues.apache.org/jira/browse/SPARK-5462
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.3.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> When trying to access fields on a Python DataFrame created via inferSchema, I ran into a confusing Catalyst Py4J error.  Here's a reproduction:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
> sc = SparkContext("local", "test")
> sqlContext = SQLContext(sc)
> # Load a text file and convert each line to a Row.
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
> # Infer the schema, and register the resulting DataFrame as a table.
> schemaPeople = sqlContext.inferSchema(people)
> schemaPeople.registerTempTable("people")
> # SQL can be run over DataFrames that have been registered as a table.
> teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
> print teenagers.name
> {code}
> This fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/spark/sqltest.py", line 19, in <module>
>     print teenagers.name
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/sql.py", line 2154, in __getattr__
>     return Column(self._jdf.apply(name))
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o66.apply.
> : org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to qualifiers on unresolved object, tree: 'name
> 	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:50)
> 	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:46)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:143)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:140)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:140)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:126)
> 	at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:122)
> 	at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:237)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> 	at py4j.Gateway.invoke(Gateway.java:259)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:207)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> This is distinct from the helpful error message that I get when trying to access a non-existent column.  This error didn't occur when I tried the same thing with a DataFrame created via jsonRDD.
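
One reading of the stack trace above: LogicalPlan.resolve iterates over the plan's output attributes and reads each one's qualifiers, and at least one of those attributes was still an UnresolvedAttribute. The pure-Python sketch below is only a toy model of that shape. Every class and function in it is hypothetical and written for illustration; none of it is Spark's implementation, and it does not claim to identify the root cause.

```python
# Toy model only: these classes imitate the shape of the trace, not Spark's code.

class UnresolvedException(Exception):
    """Stand-in for Catalyst's UnresolvedException (hypothetical)."""


class ResolvedAttribute(object):
    """An attribute whose name and qualifiers are known."""

    def __init__(self, name, qualifiers=()):
        self.name = name
        self.qualifiers = list(qualifiers)


class UnresolvedAttribute(object):
    """An attribute that has not been resolved yet."""

    def __init__(self, name):
        self.name = name

    @property
    def qualifiers(self):
        # Mirrors the message seen in the traceback.
        raise UnresolvedException(
            "Invalid call to qualifiers on unresolved object, tree: '%s" % self.name)


def resolve(name, output):
    """Return the first attribute in `output` matching `name`, or None."""
    matches = []
    for attr in output:
        # Reading `qualifiers` is the step that fails for an unresolved attribute.
        candidates = [attr.name] + ["%s.%s" % (q, attr.name) for q in attr.qualifiers]
        if name in candidates:
            matches.append(attr)
    return matches[0] if matches else None


# Output of a plan whose attributes are resolved, and of one where they are not.
resolved_output = [ResolvedAttribute("name", ["people"])]
unresolved_output = [UnresolvedAttribute("name")]
```

In this model, resolve("missing", resolved_output) returns None, which leaves the caller free to report a clear "no such column" error, while resolve("name", unresolved_output) raises before any name comparison happens. That matches the contrast described above between the helpful missing-column message and the UnresolvedException.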


