Posted to user@spark.apache.org by sujeet jog <su...@gmail.com> on 2017/06/20 10:52:02 UTC
Cassandra querying time stamps
Hello,
I have a table as below
CREATE TABLE analytics_db.ml_forecast_tbl (
    "MetricID" int,
    "TimeStamp" timestamp,
    "ResourceID" timeuuid,
    "Value" double,
    PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
)
select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
'2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;
 MetricID | TimeStamp                       | ResourceID                           | Value
----------+---------------------------------+--------------------------------------+----------
        1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
        1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
        1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
        1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
This query runs perfectly fine from cqlsh, but not with Spark SQL, which
just returns empty results. Is there a catch to querying timestamp ranges
with the Cassandra Spark connector?

Any inputs on this?
Thanks,
Sujeet
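A likely culprit, visible in the physical plan later in the thread, is that Spark ends up comparing the timestamp column as a string against the '+0000'-suffixed literals, and a lexicographic comparison silently fails once the column's rendered form differs from the literal. A minimal sketch of the pitfall in plain Python, no Spark involved; the UTC+05:30 session zone is an assumed example:

```python
from datetime import datetime, timedelta, timezone

# The Cassandra row's timestamp: 2016-01-22 00:30:00 UTC.
instant = datetime(2016, 1, 22, 0, 30, tzinfo=timezone.utc)

# When Spark casts a timestamp to string it renders it in the session
# time zone; with an assumed UTC+05:30 zone the same instant becomes:
local = instant.astimezone(timezone(timedelta(hours=5, minutes=30)))
rendered = local.strftime("%Y-%m-%d %H:%M:%S")  # '2016-01-22 06:00:00'

upper_bound = "2016-01-22 00:30:00+0000"

# Lexicographic comparison, as in a cast-to-string filter: the row is
# wrongly dropped because '06:00:00' sorts after '00:30:00+0000'.
string_match = rendered <= upper_bound   # False

# Comparing actual instants keeps the row, which is what cqlsh does.
bound = datetime(2016, 1, 22, 0, 30, tzinfo=timezone.utc)
instant_match = instant <= bound         # True

print(string_match, instant_match)
```

The same instant passes the range check when compared as a timestamp and fails it when compared as a zone-shifted string, which matches "works in cqlsh, empty in Spark SQL".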
Re: Cassandra querying time stamps
Posted by sujeet jog <su...@gmail.com>.
Correction.
On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog <su...@gmail.com> wrote:
> Below is the query; judging from the physical plan, it is the same as
> the one issued from cqlsh.
>
> val query = s"""(select * from model_data
>     where TimeStamp > \'$timeStamp+0000\' and TimeStamp <= \'$startTS+0000\'
>     and MetricID = $metricID)"""
>
> println("Model query: " + query)
>
> val df = spark.read
>   .format(Config.dbDriver)
>   .options(Map("Keyspace" -> Config.dbName, "table" -> "ml_forecast_tbl"))
>   .load
>
> df.createOrReplaceTempView("model_data")
> val modelDF = spark.sql(query).cache()
> println(spark.sql(query).queryExecution)
>
>
>
> == Physical Plan ==
> InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054]
> : +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054], true, 10000, StorageLevel(disk, memory,
> deserialized, 1 replicas)
> : : +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
> 00:00:00+0000) && (cast(TimeStamp#9046 as string) <= 2016-01-22
> 00:30:00+0000)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
> : : +- *Scan org.apache.spark.sql.cassandra.
> CassandraSourceRelation@40dc2ade [MetricID#9045,TimeStamp#9046,
> ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,
> GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
> PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
> EqualTo(MetricID,1)], ReadSchema: struct<MetricID:int,TimeStamp:
> timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
>
>
> On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <fe...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Personally I would inspect how dates are managed. What does your Spark
>> code look like? What does the explain output say? Does TimeStamp get
>> parsed the same way?
>>
>> Best,
>>
Re: Cassandra querying time stamps
Posted by sujeet jog <su...@gmail.com>.
Below is the query; judging from the physical plan, it is the same as the
one issued from cqlsh.
val query = s"""(select * from model_data
    where TimeStamp > \'$timeStamp+0000\' and TimeStamp <= \'$startTS+0000\'
    and MetricID = $metricID)"""

println("Model query: " + query)

val df = spark.read
  .format(Config.dbDriver)
  .options(Map("Keyspace" -> Config.dbName, "table" -> "ml_forecast_tbl"))
  .load

val d = spark.sparkContext.cassandraTable("analytics_db", "ml_forecast_tbl")
  .where(" \"TimeStamp\" > ? and \"TimeStamp\" <= ? and \"MetricID\" = 1",
         timeStamp + "+0000", startTS + "+0000")

d.createOrReplaceTempView("model_data")
val modelDF = spark.sql(query).cache()
println(spark.sql(query).queryExecution)
== Physical Plan ==
InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054]
: +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054], true, 10000, StorageLevel(disk, memory,
deserialized, 1 replicas)
: : +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
00:00:00+0000) && (cast(TimeStamp#9046 as string) <= 2016-01-22
00:30:00+0000)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
: : +- *Scan
org.apache.spark.sql.cassandra.CassandraSourceRelation@40dc2ade
[MetricID#9045,TimeStamp#9046,ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
EqualTo(MetricID,1)], ReadSchema:
struct<MetricID:int,TimeStamp:timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
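The plan above shows the catch: PushedFilters carries only the MetricID predicates, while both TimeStamp bounds are applied after the scan as string comparisons (cast(TimeStamp as string) > ...). A hedged sketch of one way around it, keeping the bounds typed as timestamps rather than '+0000' strings; variable names follow the snippet above, and whether the predicates are actually pushed down to Cassandra depends on the connector version:

```scala
import java.sql.Timestamp

// Compare timestamp to timestamp in the SQL text, instead of letting
// Spark cast the column to a string for the comparison:
val query = s"""select * from model_data
  where TimeStamp > cast('$timeStamp' as timestamp)
    and TimeStamp <= cast('$startTS' as timestamp)
    and MetricID = $metricID"""

// Or filter the DataFrame with real Timestamp values, which the
// connector can translate into pushed-down range predicates.
// Note: Timestamp.valueOf interprets its argument in the JVM's
// local time zone, so normalize to UTC as needed.
val lower = Timestamp.valueOf("2016-01-22 00:00:00")
val upper = Timestamp.valueOf("2016-01-22 00:30:00")
val modelDF = df
  .filter(df("TimeStamp") > lower && df("TimeStamp") <= upper)
  .filter(df("MetricID") === 1)
```

After either change, the explain output should show the TimeStamp bounds inside PushedFilters (or at least as timestamp-typed comparisons) rather than as string casts.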
On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <fe...@gmail.com>
wrote:
> Hi,
>
> Personally I would inspect how dates are managed. What does your Spark
> code look like? What does the explain output say? Does TimeStamp get
> parsed the same way?
>
> Best,
>
Re: Cassandra querying time stamps
Posted by Riccardo Ferrari <fe...@gmail.com>.
Hi,
Personally I would inspect how dates are managed. What does your Spark code
look like? What does the explain output say? Does TimeStamp get parsed the
same way?
Best,