Posted to dev@phoenix.apache.org by "Josh Mahonin (JIRA)" <ji...@apache.org> on 2015/09/24 15:52:04 UTC

[jira] [Commented] (PHOENIX-2290) Spark Phoenix cannot recognize Phoenix view fields

    [ https://issues.apache.org/jira/browse/PHOENIX-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906363#comment-14906363 ] 

Josh Mahonin commented on PHOENIX-2290:
---------------------------------------

Thanks [~azuryy] for the detailed bug report.

I'll work on reproducing this locally, but to help diagnose the issue further, can you check whether you see the same result using the 'RDD' integration directly, e.g.

{code}
import org.apache.phoenix.spark._  // brings phoenixTableAsRDD into scope on SparkContext
import org.apache.spark.rdd.RDD

val rdd: RDD[Map[String, AnyRef]] = sc.phoenixTableAsRDD(
  "\"test_table\"", Seq("\"col_1\""), zkUrl = Some("phoenix-server:2181")
)

rdd.first().foreach(println) // should print the (column, value) pairs of the first row
{code}
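
For comparison, it would also be useful to know whether the DataFrame filter API behaves the same way, since (I believe) it goes through the same filter pushdown as the SQL WHERE clause. A small sketch, assuming the {{df}} from your report below:

{code}
// If this fails the same way, the problem is in the filter pushdown
// itself rather than in Spark's SQL parsing of the quoted name.
df.filter(df("col_1") === "200").show()
{code}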

Also, if you're in a position to do so, can you try the patch attached to this issue, and see if it makes a difference for you?
https://issues.apache.org/jira/secure/attachment/12752031/PHOENIX-2196.patch
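
In the meantime, one possible stopgap is to filter on the Spark side instead of letting the predicate get pushed down into the Phoenix query. A rough sketch, assuming the {{df}} from your report below; note this scans the whole table, so it's a workaround rather than a fix:

{code}
// Filter after the scan, in Spark, so the case-sensitive "col_1"
// name never has to pass through the Phoenix pushdown path.
val matches = df.rdd.filter(_.getAs[String]("col_1") == "200")
matches.collect().foreach(println)
{code}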

Thanks!

> Spark Phoenix cannot recognize Phoenix view fields
> --------------------------------------------------
>
>                 Key: PHOENIX-2290
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2290
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.1
>            Reporter: Fengdong Yu
>              Labels: spark
>
> I created a table in the HBase shell:
> {code}
> create 'test_table',  {NAME => 'cf1', VERSIONS => 1}
> put 'test_table', 'row_key_1', 'cf1:col_1', '200'
> {code}
> This is a very simple table. Then I created a Phoenix view over it in the Phoenix shell:
> {code}
> create view "test_table" (pk varchar primary key, "cf1"."col_1" varchar)
> {code}
> Then I did the following in the Spark shell:
> {code}
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "\"test_table\"",  "zkUrl" -> "localhost:2181"))
> df.registerTempTable("temp")
> {code}
> {code}
> scala> df.printSchema
> root
>  |-- PK: string (nullable = true)
>  |-- col_1: string (nullable = true)
> {code}
> sqlContext.sql("select * from temp")  ------> {color:red} This does work{color}
> then:
> {code}
> sqlContext.sql("select * from temp where col_1='200' ")
> {code}
> {code}
> java.lang.RuntimeException: org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=col_1
> 	at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
> 	at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:80)
> 	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> 	at org.apache.phoenix.spark.PhoenixRDD.getPartitions(PhoenixRDD.scala:47)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> 	at scala.Option.getOrElse(Option.scala:120)
> {code}
> {color:red}
> I also tried:
> {code}
> sqlContext.sql("select * from temp where \"col_1\"='200' ")  --> EMPTY result, no exception
> {code}
> {code}
> sqlContext.sql("select * from temp where \"cf1\".\"col_1\"='200' ")  --> exception, cannot recognize SQL
> {code}
> {color}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)