Posted to dev@phoenix.apache.org by "Lokesh Kumar (JIRA)" <ji...@apache.org> on 2017/11/02 10:06:00 UTC
[jira] [Updated] (PHOENIX-4347) Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue
[ https://issues.apache.org/jira/browse/PHOENIX-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lokesh Kumar updated PHOENIX-4347:
----------------------------------
Description:
Created a Phoenix table with the following schema:
{code:java}
CREATE TABLE IF NOT EXISTS sample_table (
id VARCHAR NOT NULL,
metricid VARCHAR NOT NULL,
timestamp TIMESTAMP NOT NULL,
metricvalue DOUBLE,
CONSTRAINT st_pk PRIMARY KEY(id,metricid,timestamp)) SALT_BUCKETS = 20;
{code}
Inserted some data into this table and loaded it as a Spark Dataset using the Phoenix Spark datasource ('org.apache.phoenix.spark').
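The rows were inserted with Phoenix UPSERT statements along these lines (the sample values here are hypothetical, for illustration only):
{code:java}
UPSERT INTO sample_table (id, metricid, timestamp, metricvalue)
VALUES ('host-1', 'cpu.load', TO_TIMESTAMP('2017-10-31 08:15:00'), 0.75);
{code}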
The Spark Dataset's schema is as follows:
{code}
root
 |-- ID: string (nullable = true)
 |-- METRICID: string (nullable = true)
 |-- TIMESTAMP: timestamp (nullable = true)
 |-- METRICVALUE: double (nullable = true)
{code}
I apply the Dataset's filter operation on the TIMESTAMP column as shown below:
{code:java}
Dataset<Row> ds = ...; // Dataset loaded via the Phoenix Spark datasource above
ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)");
{code}
This operation throws the following exception:
{code}
testPhoenixTimestamp(DatasetTest): org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, column 145.
{code}
The generated query looks like this:
{code:java}
2017-11-02 15:29:31,722 INFO [main] org.apache.phoenix.mapreduce.PhoenixInputFormat
Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM METRIC_TBR_DATA WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= 2017-10-31 00:00:00.0)
{code}
The issue is in the Timestamp filter condition: the timestamp value is emitted as a bare literal, neither quoted nor wrapped in the TO_TIMESTAMP() function, so Phoenix cannot parse it.
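With the literal wrapped, the generated statement would presumably look like this instead (hand-edited for illustration, not actual Phoenix output):
{code:java}
SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM METRIC_TBR_DATA WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= TO_TIMESTAMP('2017-10-31 00:00:00.0'))
{code}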
I have fixed this locally in the compileValue() function of the org.apache.phoenix.spark.PhoenixRelation class by checking the value's class: if it is java.sql.Timestamp, I wrap the value in the TO_TIMESTAMP() function.
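The actual PhoenixRelation.compileValue() lives in Phoenix's Scala code; the self-contained Java sketch below only illustrates the idea of the local fix described above (the class name CompileValueSketch and the String branch are my own additions for a runnable example):

```java
import java.sql.Timestamp;

public class CompileValueSketch {
    // Render a pushed-down filter value as a Phoenix SQL literal.
    static String compileValue(Object value) {
        if (value instanceof Timestamp) {
            // Quote the literal and wrap it in TO_TIMESTAMP() so the
            // generated WHERE clause parses on the Phoenix side.
            return "TO_TIMESTAMP('" + value + "')";
        }
        if (value instanceof String) {
            return "'" + value + "'";
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2017-10-31 00:00:00.0");
        System.out.println(compileValue(ts));
        // -> TO_TIMESTAMP('2017-10-31 00:00:00.0')
    }
}
```

With this in place, a java.sql.Timestamp coming from Spark's filter pushdown is rendered as a literal Phoenix can parse, instead of the bare `2017-10-31 00:00:00.0` seen in the generated query above.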
Please let me know if there is another way to correctly query Timestamp values in Phoenix through Spark's Dataset API.
> Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue
> ----------------------------------------------------------------------------
>
> Key: PHOENIX-4347
> URL: https://issues.apache.org/jira/browse/PHOENIX-4347
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Environment: CentOS 6.5, Fedora 25
> Reporter: Lokesh Kumar
> Priority: Major
> Labels: phoenix, spark-sql
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)