Posted to dev@phoenix.apache.org by "zhongyuhai (JIRA)" <ji...@apache.org> on 2018/11/21 06:26:00 UTC

[jira] [Updated] (PHOENIX-5035) phoenix-spark dataframe filtes date or timestamp type with error

     [ https://issues.apache.org/jira/browse/PHOENIX-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhongyuhai updated PHOENIX-5035:
--------------------------------
    Attachment:     (was: patch)

> phoenix-spark dataframe filtes date or timestamp type with error
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-5035
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5035
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.13.0, 4.14.0, 4.13.1, 5.0.0, 4.14.1
>         Environment: HBase:apache 1.2
> Phoenix:4.13.1-HBase-1.2
> Hadoop:CDH 2.6
> Spark:2.3.1
>            Reporter: zhongyuhai
>            Priority: Critical
>              Labels: patch, pull-request-available
>         Attachments: PHOENIX-5035.patch, table desc.png
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> *Table description:*
> See the attachment "table desc.png".
>  
> *Code:*
> val df = SparkUtil.getActiveSession().read.format( "org.apache.phoenix.spark").options(options).load()
> df.filter("INCREATEDDATE = date'2018-07-14'")
>  
> *Exception:*
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
>  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>  
> *Analysis:*
> The cause is in org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any:
>  
>  
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>  
> It quotes only String values; every other type is passed through and rendered with toString. As a result, the Spark filter condition "INCREATEDDATE = date'2018-07-14'" is translated to the Phoenix filter condition "INCREATEDDATE = 2018-07-14". Phoenix parses the unquoted 2018-07-14 as the integer subtraction 2018 - 7 - 14 = 1997, so it compares a DATE column with a BIGINT and throws ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.
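
The pass-through behaviour described above can be reproduced without Spark or Phoenix. The following is a minimal sketch, not the connector's code: PassThroughDemo and equalsFilter are hypothetical names, and the string escaping is a simplified stand-in for escapeStringConstant.

```scala
import java.sql.Date

object PassThroughDemo {
  // Simplified stand-in for the original compileValue: strings are quoted,
  // every other value is returned unchanged.
  def compileValue(value: Any): Any = value match {
    case s: String => s"'${s.replace("'", "''")}'"
    case other     => other
  }

  // Hypothetical helper: builds the filter text by string interpolation,
  // which calls toString on whatever compileValue returned.
  def equalsFilter(column: String, value: Any): String =
    s"$column = ${compileValue(value)}"

  def main(args: Array[String]): Unit = {
    // The date ends up in the filter text without quotes or a DATE keyword.
    println(equalsFilter("INCREATEDDATE", Date.valueOf("2018-07-14")))
    // The same characters read as integer arithmetic.
    println(2018 - 7 - 14)
  }
}
```

Because java.sql.Date.toString renders the value as yyyy-mm-dd, the generated text contains a bare 2018-07-14, which is consistent with the BIGINT value 1997 reported in the exception.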
> *Solution:*
> Add handling for other types, such as Date and Timestamp:
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Render dates as a Phoenix date literal (java.util.Date; there is no java.lang.Date)
>     case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
>       val df = new SimpleDateFormat(dateFormat)
>       s"date'${df.format(d)}'"
>     }
>     // Render timestamps as a Phoenix timestamp literal
>     case dt if (isClass(dt, "java.sql.Timestamp")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
>       val df = new SimpleDateFormat(dateTimeFormat)
>       s"timestamp'${df.format(dt)}'"
>     }
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
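
The idea behind the proposed change can be tried in isolation. The sketch below is an illustration under stated assumptions rather than the patch itself: LiteralDemo is a hypothetical name, the two format patterns are fixed placeholders (the patch reads them from the Phoenix configuration), and type patterns are used in place of the isClass helper.

```scala
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat

object LiteralDemo {
  // Assumed patterns for illustration only; the patch looks these up via
  // QueryServices.DATE_FORMAT_ATTRIB and QueryServices.TIMESTAMP_FORMAT_ATTRIB.
  val DatePattern = "yyyy-MM-dd"
  val TimestampPattern = "yyyy-MM-dd HH:mm:ss.SSS"

  def compileValue(value: Any): Any = value match {
    case s: String => s"'${s.replace("'", "''")}'"
    // Timestamp is a subclass of java.util.Date, so it has to be matched first.
    case ts: Timestamp =>
      s"timestamp'${new SimpleDateFormat(TimestampPattern).format(ts)}'"
    // Covers java.util.Date and java.sql.Date.
    case d: java.util.Date =>
      s"date'${new SimpleDateFormat(DatePattern).format(d)}'"
    // Pass through anything else.
    case other => other
  }

  def main(args: Array[String]): Unit = {
    println(compileValue(Date.valueOf("2018-07-14")))
    println(compileValue(Timestamp.valueOf("2018-07-14 10:20:30.123")))
  }
}
```

With type patterns the order of the cases matters, which is not the case for the exact class-name comparison used in the patch. A new SimpleDateFormat is created per call because the class is not thread-safe.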
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)