Posted to user@phoenix.apache.org by Ryan Templeton <rt...@hortonworks.com> on 2016/08/09 19:58:21 UTC

Problems with Phoenix bulk loader when using row_timestamp feature

I am working on a project that will be consuming sensor data. The “fact” table is defined as:

CREATE TABLE historian.data (
assetid unsigned_int not null,
metricid unsigned_int not null,
ts timestamp not null,
val double,
CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts))
IMMUTABLE_ROWS=true;

I generated a 1 million row CSV sample dataset and used the Phoenix bulk loader to load this data. The tool reports that all 1,000,000 rows were loaded successfully, which I can confirm via sqlline.
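
For reference, the loader (psql.py here) maps the CSV columns to the table's columns in declaration order (assetid, metricid, ts, val). A few hypothetical rows, assuming the default comma delimiter and the default yyyy-MM-dd HH:mm:ss timestamp format (the actual file's values may differ):

1,100,2016-08-08 11:05:15,42.7
1,100,2016-08-08 11:06:15,42.9
2,205,2016-08-08 11:05:15,17.3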

I then dropped and recreated the table to take advantage of the row_timestamp feature:

drop table historian.data;
CREATE TABLE historian.data (
assetid unsigned_int not null,
metricid unsigned_int not null,
ts timestamp not null,
val double,
CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts row_timestamp))
IMMUTABLE_ROWS=true;
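
With ts declared as a ROW_TIMESTAMP column, the value written to ts is also used as the HBase cell timestamp of the row, so the data's own time drives HBase's native time dimension. A minimal illustration with made-up values (assetid 1, metricid 100), assuming TO_TIMESTAMP and the default timestamp format:

UPSERT INTO historian.data (assetid, metricid, ts, val)
VALUES (1, 100, TO_TIMESTAMP('2016-08-08 11:05:15'), 42.7);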

I reran the bulk loader utility, which reported that it completed successfully:

[rtempleton@M1 phoenix-client]$ bin/psql.py localhost:2181 -t HISTORIAN.DATA /tmp/data.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/phoenix/phoenix-4.4.0.2.4.3.0-180-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/08 20:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 20:34:44 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
csv columns from database.
CSV Upsert complete. 1000000 rows upserted
Time: 65.985 sec(s)

But when I run “select count(*) from historian.data” I see that only the first 572 rows appear in the table. These rows correlate to the first 572 rows of the input file.

0: jdbc:phoenix:localhost:2181> select count(*) from historian.data;
+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 572                                      |
+------------------------------------------+
1 row selected (4.541 seconds)

0: jdbc:phoenix:localhost:2181> select min(ts), max(ts) from historian.data;
+------------------------------------------+------------------------------------------+
|                 MIN(TS)                  |                 MAX(TS)                  |
+------------------------------------------+------------------------------------------+
| 2016-08-08 11:05:15.000                  | 2016-08-08 20:36:15.000                  |
+------------------------------------------+------------------------------------------+
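
As a sanity check, a raw count from the HBase shell (which scans independently of Phoenix's time-range handling) would show whether the remaining rows were physically written. This assumes the default mapping of the Phoenix table to an HBase table named HISTORIAN.DATA:

hbase shell
count 'HISTORIAN.DATA', INTERVAL => 100000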



Any ideas?


Thanks,
Ryan

Re: Problems with Phoenix bulk loader when using row_timestamp feature

Posted by Ankit Singhal <an...@gmail.com>.
Samarth, filed PHOENIX-3176 for the same.




Re: Problems with Phoenix bulk loader when using row_timestamp feature

Posted by Ryan Templeton <rt...@hortonworks.com>.
FYI…

The sample data that I loaded into the table was based on the current timestamp, with each additional row increasing that value by 1 minute, so the timestamps run from the current time up to 999,999 minutes into the future. It turns out this hit a bug that prevents the scanner from reading timestamp values greater than the current time. More details here: https://issues.apache.org/jira/browse/PHOENIX-3176
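
Until a fix is available, one possible workaround is to put an explicit upper bound on the row_timestamp column so the scan range is not capped at the current time. A hedged sketch (the cutoff date is arbitrary, chosen to be later than the most future-dated row, and whether this bypasses the cap depends on the Phoenix version):

select count(*) from historian.data
where ts < to_timestamp('2019-01-01 00:00:00');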




Re: Problems with Phoenix bulk loader when using row_timestamp feature

Posted by Ryan Templeton <rt...@hortonworks.com>.
0: jdbc:phoenix:localhost:2181> explain select count(*) from historian.data;
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER HISTORIAN.DATA |
|     ROW TIMESTAMP FILTER [0, 1470852712807) |
|     SERVER FILTER BY FIRST KEY ONLY      |
|     SERVER AGGREGATE INTO SINGLE ROW     |
+------------------------------------------+
4 rows selected (0.071 seconds)
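
The upper bound in the ROW TIMESTAMP FILTER appears to be the query's compile time expressed in epoch milliseconds (an inference from the value, not something the plan states):

1470852712807 ms ≈ 1470852712 seconds since the epoch ≈ 2016-08-10 18:11:52 UTC

which is roughly when this query was run. The scan therefore only covers rows whose ts falls before that moment, which would explain why only the 572 rows with past timestamps are counted.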



Re: Problems with Phoenix bulk loader when using row_timestamp feature

Posted by Samarth Jain <sa...@apache.org>.
Ryan,

Can you tell us what the explain plan says for the select count(*) query?

- Samarth

