Posted to issues@spark.apache.org by "Sandeep Pal (JIRA)" <ji...@apache.org> on 2015/07/30 20:11:04 UTC

[jira] [Reopened] (SPARK-9282) Filter on Spark DataFrame with multiple columns

     [ https://issues.apache.org/jira/browse/SPARK-9282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Pal reopened SPARK-9282:
--------------------------------

When '&' is used instead of 'and', the following error occurs (a corrected call follows the traceback):

Py4JError                                 Traceback (most recent call last)
<ipython-input-8-b3101afeeb7a> in <module>()
----> 1 df1.filter(df1.age > 21 & df1.age < 45).show(10)

/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/pyspark/sql/dataframe.py in _(self, other)
    999     def _(self, other):
   1000         jc = other._jc if isinstance(other, Column) else other
-> 1001         njc = getattr(self._jc, name)(jc)
   1002         return Column(njc)
   1003     _.__doc__ = doc

/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539 
    540         for temp_arg in temp_args:

/usr/local/bin/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    302                 raise Py4JError(
    303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
--> 304                     format(target_id, '.', name, value))
    305         else:
    306             raise Py4JError(

Py4JError: An error occurred while calling o83.and. Trace:
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
	at py4j.Gateway.invoke(Gateway.java:252)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
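
The error comes from Python operator precedence rather than from Spark: '&' binds more tightly than the comparison operators, so the expression above is parsed as df1.age > (21 & df1.age) < 45, and the bare integer 21 is handed to the JVM-side Column "and" method, which (per the trace) has no Integer overload. A minimal sketch of the corrected call, assuming the same df1 as in the example quoted below:

    # Parenthesize each comparison so '&' combines two boolean Columns
    # instead of binding to the bare integer 21.
    df1.filter((df1.age > 21) & (df1.age < 45)).show(10)

    # Alternatively, pass the condition as a SQL expression string,
    # which sidesteps Python operator precedence entirely.
    df1.filter("age > 21 AND age < 45").show(10)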

> Filter on Spark DataFrame with multiple columns
> -----------------------------------------------
>
>                 Key: SPARK-9282
>                 URL: https://issues.apache.org/jira/browse/SPARK-9282
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell, SQL
>    Affects Versions: 1.3.0
>         Environment: CDH 5.0 on CentOS6
>            Reporter: Sandeep Pal
>
> Filter on a DataFrame does not work if there is more than one condition inside the filter, although the same filter works on an RDD (see the note after the example).
> Following is the example:
> df1.show()
> age coolid depid empname
> 23  7      1     sandeep
> 21  8      2     john   
> 24  9      1     cena   
> 45  12     3     bob    
> 20  7      4     tanay  
> 12  8      5     gaurav 
> df1.filter(df1.age > 21 and df1.age < 45).show(10)
> 23  7      1     sandeep
> 21  8      2     john                                   <-------------
> 24  9      1     cena   
> 20  7      4     tanay                                 <-------------
> 12  8      5     gaurav                               <--------------
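
The output above also shows what goes wrong with 'and': Python's 'and' cannot be overloaded, so 'df1.age > 21 and df1.age < 45' evaluates the truthiness of the first Column and then simply returns the second one. The filter that actually runs is just the second condition, which matches the rows shown: only bob (age 45) is dropped, while age > 21 is never enforced. (Newer PySpark releases raise an error when a Column is used in a boolean context instead of silently returning results.) A rough sketch of the equivalence, assuming the df1 above:

    # What the 'and' expression effectively reduces to: only the
    # second condition is applied.
    df1.filter(df1.age < 45).show(10)

    # The supported way to combine both conditions is '&' with
    # parentheses, as in the corrected call shown after the traceback.
    df1.filter((df1.age > 21) & (df1.age < 45)).show(10)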



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org