Posted to user@spark.apache.org by sujeet jog <su...@gmail.com> on 2017/06/20 10:52:02 UTC

Cassandra querying time stamps

Hello,

I have a table as below

CREATE TABLE analytics_db.ml_forecast_tbl (
   "MetricID" int,
   "TimeStamp" timestamp,
   "ResourceID" timeuuid
   "Value"   double,
    PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
)

select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
'2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;

 MetricID | TimeStamp                       | ResourceID                           | Value
----------+---------------------------------+--------------------------------------+----------
        1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177
        1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177
        1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177
        1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177

This query runs perfectly fine from cqlsh, but not with Spark SQL, where it
just returns empty results.
Is there a catch to be aware of when querying timestamp ranges with the
Cassandra Spark connector?

Any inputs on this?
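
For reference, a rough sketch of what is being done on the Spark side
(simplified and illustrative only; the actual code and physical plan are in
the follow-up mail):

    // Sketch: read the table through the Spark Cassandra connector and run
    // the same range query through Spark SQL.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics_db", "table" -> "ml_forecast_tbl"))
      .load()

    df.createOrReplaceTempView("ml_forecast_tbl")

    spark.sql(
      """select * from ml_forecast_tbl
         where MetricID = 1
           and TimeStamp > '2016-01-22 00:00:00+0000'
           and TimeStamp <= '2016-01-22 00:30:00+0000'""").show()   // comes back empty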


Thanks,
Sujeet

Re: Cassandra querying time stamps

Posted by sujeet jog <su...@gmail.com>.
Correction.

On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog <su...@gmail.com> wrote:

> Below is the query; from the physical plan it looks like the query is the
> same as the one issued from cqlsh.
>
>     val query = s"""(select * from model_data
>         where TimeStamp > '$timeStamp+0000' and TimeStamp <= '$startTS+0000'
>         and MetricID = $metricID)"""
>
>     println("Model query: " + query)
>
>     val df = spark.read
>       .format(Config.dbDriver)
>       .options(Map("Keyspace" -> Config.dbName, "table" -> "ml_forecast_tbl"))
>       .load()
>
>     df.createOrReplaceTempView("model_data")
>     val modelDF = spark.sql(query).cache()
>     println(spark.sql(query).queryExecution)
>
>
>
> == Physical Plan ==
> InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054]
> :  +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054], true, 10000, StorageLevel(disk, memory,
> deserialized, 1 replicas)
> :     :  +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
> 00:00:00+0000) && (cast(TimeStamp#9046 as string) <= 2016-01-22
> 00:30:00+0000)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
> :     :     +- *Scan org.apache.spark.sql.cassandra.
> CassandraSourceRelation@40dc2ade [MetricID#9045,TimeStamp#9046,
> ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,
> GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
> PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
> EqualTo(MetricID,1)], ReadSchema: struct<MetricID:int,TimeStamp:
> timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
>
>
> On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <fe...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Personally I would inspect how dates are managed. What does your Spark
>> code look like? What does the explain say? Does TimeStamp get parsed the
>> same way?
>>
>> Best,
>>
>> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog <su...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a table as below
>>>
>>> CREATE TABLE analytics_db.ml_forecast_tbl (
>>>    "MetricID" int,
>>>    "TimeStamp" timestamp,
>>>    "ResourceID" timeuuid
>>>    "Value"   double,
>>>     PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>>> )
>>>
>>> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
>>> '2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;
>>>
>>>  MetricID | TimeStamp                       | ResourceID                           | Value
>>> ----------+---------------------------------+--------------------------------------+----------
>>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>
>>> This query runs perfectly fine from cqlsh, but not with Spark SQL, where
>>> it just returns empty results.
>>> Is there a catch to be aware of when querying timestamp ranges with the
>>> Cassandra Spark connector?
>>>
>>> Any inputs on this?
>>>
>>>
>>> Thanks,
>>> Sujeet
>>>
>>
>>
>

Re: Cassandra querying time stamps

Posted by sujeet jog <su...@gmail.com>.
Below is the query; from the physical plan it looks like the query is the
same as the one issued from cqlsh.

    // Spark SQL query to run against the registered temp view.
    val query = s"""(select * from model_data
        where TimeStamp > '$timeStamp+0000' and TimeStamp <= '$startTS+0000'
        and MetricID = $metricID)"""

    println("Model query: " + query)

    // DataFrame read through the Spark Cassandra connector.
    val df = spark.read
      .format(Config.dbDriver)
      .options(Map("Keyspace" -> Config.dbName, "table" -> "ml_forecast_tbl"))
      .load()

    // RDD read with the same range handed to Cassandra as a CQL WHERE clause.
    import com.datastax.spark.connector._
    val d = spark.sparkContext.cassandraTable("analytics_db", "ml_forecast_tbl")
      .where(" \"TimeStamp\" > ? and \"TimeStamp\" <= ? and \"MetricID\" = 1",
        timeStamp + "+0000", startTS + "+0000")

    d.createOrReplaceTempView("model_data")
    val modelDF = spark.sql(query).cache()
    println(spark.sql(query).queryExecution)



== Physical Plan ==
InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054]
:  +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054], true, 10000, StorageLevel(disk, memory,
deserialized, 1 replicas)
:     :  +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
00:00:00+0000) && (cast(TimeStamp#9046 as string) <= 2016-01-22
00:30:00+0000)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
:     :     +- *Scan
org.apache.spark.sql.cassandra.CassandraSourceRelation@40dc2ade
[MetricID#9045,TimeStamp#9046,ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
EqualTo(MetricID,1)], ReadSchema:
struct<MetricID:int,TimeStamp:timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
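
From the plan above, the timestamp range ends up as a string comparison
(cast(TimeStamp as string) against the literal), and only the MetricID
equality appears under PushedFilters. A minimal, untested sketch of keeping
the comparison on timestamp values instead of strings (the bound variables
here are illustrative, not the exact ones from the code above):

    import java.sql.Timestamp
    import org.apache.spark.sql.functions._

    // Assumed bounds; in the real code these would come from timeStamp / startTS.
    val lowerTs = Timestamp.valueOf("2016-01-22 00:00:00")
    val upperTs = Timestamp.valueOf("2016-01-22 00:30:00")

    // lit() on a java.sql.Timestamp produces a TimestampType literal, so the
    // TimeStamp column is compared as a timestamp rather than cast to string.
    val rangeDF = df
      .filter(col("MetricID") === 1)
      .filter(col("TimeStamp") > lit(lowerTs) && col("TimeStamp") <= lit(upperTs))

Note that Timestamp.valueOf parses in the JVM's default time zone, so the
bounds may need adjusting to line up with the +0000 values used in cqlsh.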


On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <fe...@gmail.com>
wrote:

> Hi,
>
> Personally I would inspect how dates are managed. What does your Spark code
> look like? What does the explain say? Does TimeStamp get parsed the same
> way?
>
> Best,
>
> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog <su...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a table as below
>>
>> CREATE TABLE analytics_db.ml_forecast_tbl (
>>    "MetricID" int,
>>    "TimeStamp" timestamp,
>>    "ResourceID" timeuuid
>>    "Value"   double,
>>     PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>> )
>>
>> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
>> '2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;
>>
>>  MetricID | TimeStamp                       | ResourceID                           | Value
>> ----------+---------------------------------+--------------------------------------+----------
>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177
>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177
>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177
>>         1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177
>>
>> This query runs perfectly fine from cqlsh, but not with Spark SQL, where
>> it just returns empty results.
>> Is there a catch to be aware of when querying timestamp ranges with the
>> Cassandra Spark connector?
>>
>> Any inputs on this?
>>
>>
>> Thanks,
>> Sujeet
>>
>
>

Re: Cassandra querying time stamps

Posted by Riccardo Ferrari <fe...@gmail.com>.
Hi,

Personally I would inspect how dates are managed. What does your Spark code
look like? What does the explain say? Does TimeStamp get parsed the same
way?
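
For example, a quick (untested) way to check both things, assuming the table
is registered as a temp view named ml_forecast_tbl:

    // Does Spark parse the literal with the +0000 suffix as a timestamp at all?
    spark.sql("select cast('2016-01-22 00:00:00+0000' as timestamp)").show()

    // How is the range predicate planned, and what actually gets pushed down
    // to Cassandra?
    spark.sql("""select * from ml_forecast_tbl
                 where TimeStamp > '2016-01-22 00:00:00+0000'
                   and TimeStamp <= '2016-01-22 00:30:00+0000'
                   and MetricID = 1""").explain(true)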

Best,

On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog <su...@gmail.com> wrote:

> Hello,
>
> I have a table as below
>
> CREATE TABLE analytics_db.ml_forecast_tbl (
>    "MetricID" int,
>    "TimeStamp" timestamp,
>    "ResourceID" timeuuid
>    "Value"   double,
>     PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
> )
>
> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
> '2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;
>
>  MetricID | TimeStamp                       | ResourceID                           | Value
> ----------+---------------------------------+--------------------------------------+----------
>         1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177
>         1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177
>         1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177
>         1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177
>
> This query runs perfectly fine from cqlsh, but not with Spark SQL, where it
> just returns empty results.
> Is there a catch to be aware of when querying timestamp ranges with the
> Cassandra Spark connector?
>
> Any inputs on this?
>
>
> Thanks,
> Sujeet
>