Posted to dev@phoenix.apache.org by "Kalyan (JIRA)" <ji...@apache.org> on 2016/08/10 08:01:20 UTC

[jira] [Comment Edited] (PHOENIX-2336) Queries with small case column-names return empty result-set when working with Spark Datasource Plugin

    [ https://issues.apache.org/jira/browse/PHOENIX-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414873#comment-15414873 ] 

Kalyan edited comment on PHOENIX-2336 at 8/10/16 8:00 AM:
----------------------------------------------------------

The problem is with the Spark SQL parser: if an expression contains double quotes, it is passed through to the next level as-is; without double quotes, the expression is first run through the SQL parser (which normalizes the identifiers) before being passed on.

I have provided a solution that handles this at the Phoenix level. The patch also fixes the bug above:

https://issues.apache.org/jira/secure/attachment/12821623/phoenix_spark.patch

Please review it.

 // Not working: quoted lower-case column name
DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
df.filter("\"col1\" = '5.0'").show();

// Due to this limitation, the filter currently has to be written without quotes:
df.filter("col1 = '5.0'").show();
df.filter("col1 > '4.0'").show();
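For context, Phoenix (like standard SQL) upper-cases unquoted identifiers, while double-quoted identifiers keep their exact case. A rough sketch of that normalization rule (a hypothetical helper for illustration only, not the actual Phoenix or patch code):

```java
// Hypothetical sketch of the identifier normalization behind this bug:
// unquoted identifiers are upper-cased, double-quoted ones keep their case.
public class IdentifierNormalization {

    // "col1" -> col1 (case preserved); col1 -> COL1 (upper-cased).
    static String normalize(String identifier) {
        if (identifier == null || identifier.length() < 2) {
            return identifier;
        }
        if (identifier.startsWith("\"") && identifier.endsWith("\"")) {
            // Quoted: strip the quotes, keep the case as-is.
            return identifier.substring(1, identifier.length() - 1);
        }
        // Unquoted: normalized to upper case, as Phoenix does.
        return identifier.toUpperCase();
    }

    public static void main(String[] args) {
        // An unquoted name no longer matches a lower-case Phoenix column...
        System.out.println(normalize("col1"));
        // ...while a quoted name does.
        System.out.println(normalize("\"col1\""));
    }
}
```

This is why df.filter("col1 = ...") ends up looking for an upper-cased column on the Phoenix side, while the quoted form is the one that should match a lower-case column name.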






was (Author: kalyanhadoop):
The problem is with the Spark SQL parser: if an expression contains double quotes, it is passed through to the next level as-is; without double quotes, the expression is first run through the SQL parser (which normalizes the identifiers) before being passed on.

I have provided a solution that handles this at the Phoenix level. The patch also fixes the bug above:

https://issues.apache.org/jira/secure/attachment/12821623/phoenix_spark.patch

Please review it.

> Queries with small case column-names return empty result-set when working with Spark Datasource Plugin 
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2336
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2336
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: Suhas Nalapure
>            Assignee: Josh Mahonin
>              Labels: verify
>             Fix For: 4.9.0
>
>
> Hi,
> The Spark DataFrame filter operation returns empty result-set when column-name is in the smaller case. Example below:
> DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
> df.filter("\"col1\" = '5.0'").show(); 
> Result:
> +---+----+---+---+---+---+
> | ID|col1| c1| d2| d3| d4|
> +---+----+---+---+---+---+
> +---+----+---+---+---+---+
> Whereas the table actually has rows matching the filter condition. And if the double quotes around the column name are removed, i.e. df.filter("col1 = '5.0'").show();, a ColumnNotFoundException is thrown:
> Exception in thread "main" java.lang.RuntimeException: org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=D1
>         at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
>         at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:80)
>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)