Posted to issues@spark.apache.org by "Sandeep Pal (JIRA)" <ji...@apache.org> on 2015/07/30 20:11:04 UTC
[jira] [Reopened] (SPARK-9282) Filter on Spark DataFrame with multiple columns
[ https://issues.apache.org/jira/browse/SPARK-9282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandeep Pal reopened SPARK-9282:
--------------------------------
Using '&' instead of 'and' produces the following error:
Py4JError Traceback (most recent call last)
<ipython-input-8-b3101afeeb7a> in <module>()
----> 1 df1.filter(df1.age > 21 & df1.age < 45).show(10)
/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/pyspark/sql/dataframe.py in _(self, other)
999 def _(self, other):
1000 jc = other._jc if isinstance(other, Column) else other
-> 1001 njc = getattr(self._jc, name)(jc)
1002 return Column(njc)
1003 _.__doc__ = doc
/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id, self.name)
539
540 for temp_arg in temp_args:
/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
302 raise Py4JError(
303 'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
--> 304 format(target_id, '.', name, value))
305 else:
306 raise Py4JError(
Py4JError: An error occurred while calling o83.and. Trace:
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
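The traceback above is a Python operator-precedence issue, not a Spark bug: `&` binds more tightly than `>` and `<`, so `df1.age > 21 & df1.age < 45` parses as `df1.age > (21 & df1.age) < 45`, and py4j ends up calling `Column.and` with a bare Java Integer, which does not exist. A minimal, Spark-free sketch of the parse (standard library only):

```python
# Demonstrate that `&` binds more tightly than the comparison operators,
# so `age > 21 & age < 45` parses as `age > (21 & age) < 45`.
import ast

expr = ast.parse("age > 21 & age < 45", mode="eval").body

# The whole expression is one chained comparison...
assert isinstance(expr, ast.Compare)
# ...whose first right-hand operand is the BinOp `21 & age`, i.e. the
# bitwise-and grabbed the bare integer before the comparison was applied.
assert isinstance(expr.comparators[0], ast.BinOp)
assert isinstance(expr.comparators[0].op, ast.BitAnd)
```

This is why the fix is to parenthesize each comparison: `(df1.age > 21) & (df1.age < 45)` hands two Column expressions to `&` instead of an integer.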
> Filter on Spark DataFrame with multiple columns
> -----------------------------------------------
>
> Key: SPARK-9282
> URL: https://issues.apache.org/jira/browse/SPARK-9282
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell, SQL
> Affects Versions: 1.3.0
> Environment: CDH 5.0 on CentOS6
> Reporter: Sandeep Pal
>
> Filter on a DataFrame does not work if the filter combines conditions on more than one column expression. Nonetheless, it works on an RDD.
> Following is the example:
> df1.show()
> age  coolid  depid  empname
> 23   7       1      sandeep
> 21   8       2      john
> 24   9       1      cena
> 45   12      3      bob
> 20   7       4      tanay
> 12   8       5      gaurav
> df1.filter(df1.age > 21 and df1.age < 45).show(10)
> 23   7   1   sandeep
> 21   8   2   john    <-------------
> 24   9   1   cena
> 20   7   4   tanay   <-------------
> 12   8   5   gaurav  <--------------
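The wrong rows above follow from Python's short-circuit semantics: `a and b` evaluates the truthiness of `a`, and for a truthy object simply returns `b`, so only the second predicate (`df1.age < 45`) ever reaches the filter. A minimal stand-in sketch (the `Cond` class below is hypothetical, not PySpark's Column; it only illustrates the `and` vs `&` difference):

```python
class Cond:
    """Toy stand-in for a truthy Column-like condition (illustration only)."""
    def __init__(self, label):
        self.label = label

    def __and__(self, other):
        # `&` can be overloaded to combine two conditions.
        return Cond(f"({self.label} AND {other.label})")

a = Cond("age > 21")
b = Cond("age < 45")

# Python's `and` short-circuits on truthiness: since `a` is truthy,
# `a and b` evaluates to `b` alone -- the first predicate is dropped.
assert (a and b) is b

# `&`, by contrast, calls __and__ and keeps both predicates.
assert (a & b).label == "(age > 21 AND age < 45)"
```

With the real DataFrame the correct form is to parenthesize each comparison and combine them with `&`: `df1.filter((df1.age > 21) & (df1.age < 45)).show(10)`.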
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org