You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by "M. Aaron Bossert" <ma...@gmail.com> on 2019/05/22 17:10:22 UTC

inconsistent commit behavior when using JDBC

I am using Phoenix 5 as shipped with Hortonworks HDP 3.1.  I am storing 3+
million file names in a table and then using the table to keep track of
which files I have processed using a Storm topology.  I have been doing
some testing to make sure that everything is working correctly and as part
of that, tried my code on single files to ensure that columns were being
updated as expected and that my logic was working properly.

I have found that when I do an UPSERT to set a column to the current
timestamp (now()) the first time, it works as expected, then I reset that
column to NULL so that I can redo the test, but find that the exact same
UPSERT statement fails to make a change even though I get a response that
confirms one record was affected.

In order to eliminate potential red herrings, here is an example of the
problem that is reproducible using just the phoenix-sqlline interface (I
get same exact behavior when running equivalent Scala code).

Here is the DDL for the table:

CREATE TABLE DEFAULT.FILE_INDEX(
    MTIME TIMESTAMP NOT NULL,
    FILENAME VARCHAR NOT NULL,
    TYPE VARCHAR NOT NULL,
    SUBTYPE VARCHAR NOT NULL,
    SENSOR VARCHAR NOT NULL,
    SIZE BIGINT NOT NULL,
    OWNER VARCHAR NOT NULL,
    GROUP_OWNER VARCHAR NOT NULL,
    PERMISSIONS VARCHAR NOT NULL,
    STARTED TIMESTAMP,
    PROCESSED TIMESTAMP,
    EVENT_COUNT BIGINT
    CONSTRAINT PK PRIMARY KEY(MTIME
ROW_TIMESTAMP,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS))
COMPRESSION='SNAPPY',DEFAULT_COLUMN_FAMILY='F';

and here is an example set of queries to reproduce the issue I am seeing:

0: jdbc:phoenix:master-1.> UPSERT INTO
DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED)
SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,now()
AS STARTED FROM DEFAULT.FILE_INDEX WHERE
FILENAME='hdfs://filename.log';1 row affected (5.041 seconds)0:
jdbc:phoenix:master-1.> select * from default.file_index where started
is not null;+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+|
         MTIME           |
             FILENAME
         | TYPE  | SUBTYPE  |     SENSOR      |  SIZE   |   OWNER    |
GROUP_OWNER  | PERM
|+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+|
2018-11-01 00:00:00.000  | hdfs://filename.log  | BRO   | DNS      |
something  | 224500  | somebody  | hdfs         | rw-r
|+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+1
row selected (4.046 seconds)0: jdbc:phoenix:master-1.> UPSERT INTO
DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED)
SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,NULL
AS STARTED FROM DEFAULT.FILE_INDEX WHERE
FILENAME='hdfs://filename.log';1 row affected (4.541 seconds)0:
jdbc:phoenix:master-1.> select * from default.file_index where started
is not null;+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+|
MTIME  | FILENAME  | TYPE  | SUBTYPE  | SENSOR  | SIZE  | OWNER  |
GROUP_OWNER  | PERMISSIONS  | STARTED  | PROCESSED  | EVENT_COUNT
|+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------++--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+No
rows selected (4.782 seconds)0: jdbc:phoenix:master-1.> UPSERT INTO
DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED)
SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,now()
AS STARTED FROM DEFAULT.FILE_INDEX WHERE
FILENAME='hdfs://filename.log';1 row affected (5.254 seconds)0:
jdbc:phoenix:master-1.> select * from default.file_index where started
is not null;+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+|
MTIME  | FILENAME  | TYPE  | SUBTYPE  | SENSOR  | SIZE  | OWNER  |
GROUP_OWNER  | PERMISSIONS  | STARTED  | PROCESSED  | EVENT_COUNT
|+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------++--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+No
rows selected (4.389 seconds)

I would greatly appreciate any insights into what I might be doing wrong
here.  My assumption is that this behavior, though undesirable, is expected
and has to do with versions or perhaps autocommit settings...but for the
life of me, I have tried every permutation of autocommit settings and have
not been able to get around this behavior...

Re: inconsistent commit behavior when using JDBC

Posted by "M. Aaron Bossert" <ma...@gmail.com>.
Folks, a quick update.  It appears that I am not the only one to bump into
this issue.  There is an old question about this behavior that seems to
suggest that it is related specifically to columns updated with a NULL
value.
https://dba.stackexchange.com/questions/208544/apache-phoenix-upsert-not-working-while-trying-to-update-null-empty-columns

Unfortunately, the stackexchange question never got an answer, so not much
help.

On Wed, May 22, 2019 at 1:10 PM M. Aaron Bossert <ma...@gmail.com>
wrote:

> I am using Phoenix 5 as shipped with Hortonworks HDP 3.1.  I am storing 3+
> million file names in a table and then using the table to keep track of
> which files I have processed using a Storm topology.  I have been doing
> some testing to make sure that everything is working correctly and as part
> of that, tried my code on single files to ensure that columns were being
> updated as expected and that my logic was working properly.
>
> I have found that when I do an UPSERT to set a column to the current
> timestamp (now()) the first time, it works as expected, then I reset that
> column to NULL so that I can redo the test, but find that the exact same
> UPSERT statement fails to make a change even though I get a response that
> confirms one record was affected.
>
> In order to eliminate potential red herrings, here is an example of the
> problem that is reproducible using just the phoenix-sqlline interface (I
> get same exact behavior when running equivalent Scala code).
>
> Here is the DDL for the table:
>
> CREATE TABLE DEFAULT.FILE_INDEX(
>     MTIME TIMESTAMP NOT NULL,
>     FILENAME VARCHAR NOT NULL,
>     TYPE VARCHAR NOT NULL,
>     SUBTYPE VARCHAR NOT NULL,
>     SENSOR VARCHAR NOT NULL,
>     SIZE BIGINT NOT NULL,
>     OWNER VARCHAR NOT NULL,
>     GROUP_OWNER VARCHAR NOT NULL,
>     PERMISSIONS VARCHAR NOT NULL,
>     STARTED TIMESTAMP,
>     PROCESSED TIMESTAMP,
>     EVENT_COUNT BIGINT
>     CONSTRAINT PK PRIMARY KEY(MTIME ROW_TIMESTAMP,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS)) COMPRESSION='SNAPPY',DEFAULT_COLUMN_FAMILY='F';
>
> and here is an example set of queries to reproduce the issue I am seeing:
>
> 0: jdbc:phoenix:master-1.> UPSERT INTO DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED) SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,now() AS STARTED FROM DEFAULT.FILE_INDEX WHERE FILENAME='hdfs://filename.log';1 row affected (5.041 seconds)0: jdbc:phoenix:master-1.> select * from default.file_index where started is not null;+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+|          MTIME           |                                                          FILENAME                                                           | TYPE  | SUBTYPE  |     SENSOR      |  SIZE   |   OWNER    | GROUP_OWNER  | PERM |+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+| 2018-11-01 00:00:00.000  | hdfs://filename.log  | BRO   | DNS      | something  | 224500  | somebody  | hdfs         | rw-r |+--------------------------+-----------------------------------------------------------------------------------------------------------------------------+-------+----------+-----------------+---------+------------+--------------+------+1 row selected (4.046 seconds)0: jdbc:phoenix:master-1.> UPSERT INTO DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED) SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,NULL AS STARTED FROM DEFAULT.FILE_INDEX WHERE FILENAME='hdfs://filename.log';1 row affected (4.541 seconds)0: jdbc:phoenix:master-1.> select * from default.file_index where started is not null;+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+| MTIME  | FILENAME  | TYPE  | SUBTYPE  | SENSOR  | SIZE  | OWNER  | GROUP_OWNER  | PERMISSIONS  | STARTED  | PROCESSED  | EVENT_COUNT  |+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------++--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+No rows selected (4.782 seconds)0: jdbc:phoenix:master-1.> UPSERT INTO DEFAULT.FILE_INDEX(MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,STARTED) SELECT MTIME,FILENAME,TYPE,SUBTYPE,SENSOR,SIZE,OWNER,GROUP_OWNER,PERMISSIONS,now() AS STARTED FROM DEFAULT.FILE_INDEX WHERE FILENAME='hdfs://filename.log';1 row affected (5.254 seconds)0: jdbc:phoenix:master-1.> select * from default.file_index where started is not null;+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+| MTIME  | FILENAME  | TYPE  | SUBTYPE  | SENSOR  | SIZE  | OWNER  | GROUP_OWNER  | PERMISSIONS  | STARTED  | PROCESSED  | EVENT_COUNT  |+--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------++--------+-----------+-------+----------+---------+-------+--------+--------------+--------------+----------+------------+--------------+No rows selected (4.389 seconds)
>
> I would greatly appreciate any insights into what I might be doing wrong
> here.  My assumption is that this behavior, though undesirable, is expected
> and has to do with versions or perhaps autocommit settings...but for the
> life of me, I have tried every permutation of autocommit settings and have
> not been able to get around this behavior...
>
>
>
>
>
>
>