You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by "Gerber, Bryan W" <Br...@pnnl.gov> on 2015/04/01 00:05:06 UTC

Table data exists after DELETE, but not index data

After using DELETE FROM TABLE_NAME; to purge data from a table, queries that "FULL SCAN" TABLE_NAME still return matches against the DELETED data, but queries that SCAN OVER indexes do not return the values. Basically, after a DELETE, data is in the main table, but not in the indexes.

Any ideas how we can ensure that DELETE fully deletes the original data, or detect when this occurs?

Schema is roughly:

CREATE TABLE IF NOT EXISTS table_name
(
  file_time VARCHAR NOT NULL,
  file_name VARCHAR NOT NULL,
  rec_num INTEGER NOT NULL,
  m.f1 VARCHAR,
  m.f2 VARCHAR,
  m.f3 VARCHAR,
  m.f4 VARCHAR,
  m.f5 VARCHAR,
  m.f6 VARCHAR,
  m.f7 VARCHAR,
  m.f8 VARCHAR,
 CONSTRAINT pkey PRIMARY KEY (file_time,file_name,rec_num)
) TTL='7776000',IMMUTABLE_ROWS=true,KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

-- indexes
CREATE INDEX IF NOT EXISTS raw_data_idx  ON table_name(m.f1) TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS table_name_f2f3_idx  ON table_name(m.f2,m.f3) TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS table_name_f4f5_idx  ON table_name(m.f4,m.f5) TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';


Query Examples:
0: jdbc:phoenix:dn01> SELECT FILE_NAME FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND REC_NUM =101;
+------------------------------------------+
|                FILE_NAME                 |
+------------------------------------------+
+------------------------------------------+
No rows selected (1.329 seconds)

Full scan (includes non-index fields)
0: jdbc:phoenix:dn01> SELECT * FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND REC_NUM =101;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
|                FILE_TIME                |                FILE_NAME                 |                 REC_NUM                  |                F1                 |               F3                |                    |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
| 2015-03-23 06:42:37+00                   | abcdefg          | 101                                     | 49                                       | 50                                       | 15               |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+

Full scan (single non-indexed field)
0: jdbc:phoenix:dn01> SELECT F8 FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND REC_NUM =101;
+------------------------------------------+
|                  F8                  |
+------------------------------------------+
| ABC                                      |
+------------------------------------------+


Index scan (single indexed field)
0: jdbc:phoenix:dn01> SELECT F1 FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND REC_NUM =101;

+------------------------------------------+
|                  F1                          |
+------------------------------------------+
+------------------------------------------+

Thanks,
Bryan G.


Re: Table data exists after DELETE, but not index data

Posted by James Taylor <ja...@apache.org>.
Hi Bryan,
Prior to the 4.2 release, if you want to delete rows from a table
declared as immutable, you need to drop the table (in which case the
index would be dropped as well). With 4.2 and above, the index of an
immutable table will be kept in sync when rows are deleted from the
data table with some restrictions (see PHOENIX-619 for details). If
these restrictions don't work for your use case, you'll need to switch
to using mutable secondary indexing, as declaring a table as immutable
is an optimization geared toward append-only data.
Thanks,
James


On Tue, Mar 31, 2015 at 3:05 PM, Gerber, Bryan W <Br...@pnnl.gov> wrote:
> After using DELETE FROM TABLE_NAME; to purge data from a table, queries that
> “FULL SCAN” TABLE_NAME still return matches against the DELETED data, but
> queries that SCAN OVER indexes do not return the values. Basically, after a
> DELETE, data is in the main table, but not in the indexes.
>
>
>
> Any ideas how we can ensure that DELETE fully deletes the original data, or
> detect when this occurs?
>
>
>
> Schema is roughly:
>
>
>
> CREATE TABLE IF NOT EXISTS table_name
>
> (
>
>   file_time VARCHAR NOT NULL,
>
>   file_name VARCHAR NOT NULL,
>
>   rec_num INTEGER NOT NULL,
>
>   m.f1 VARCHAR,
>
>   m.f2 VARCHAR,
>
>   m.f3 VARCHAR,
>
>   m.f4 VARCHAR,
>
>   m.f5 VARCHAR,
>
>   m.f6 VARCHAR,
>
>   m.f7 VARCHAR,
>
>   m.f8 VARCHAR,
>
>  CONSTRAINT pkey PRIMARY KEY (file_time,file_name,rec_num)
>
> )
> TTL='7776000',IMMUTABLE_ROWS=true,KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
>
>
> -- indexes
>
> CREATE INDEX IF NOT EXISTS raw_data_idx  ON table_name(m.f1)
> TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> CREATE INDEX IF NOT EXISTS table_name_f2f3_idx  ON table_name(m.f2,m.f3)
> TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> CREATE INDEX IF NOT EXISTS table_name_f4f5_idx  ON table_name(m.f4,m.f5)
> TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
>
>
>
>
> Query Examples:
>
> 0: jdbc:phoenix:dn01> SELECT FILE_NAME FROM TABLE_NAME WHERE FILE_NAME =
> ‘abcdefg' AND REC_NUM =101;
>
> +------------------------------------------+
>
> |                FILE_NAME                 |
>
> +------------------------------------------+
>
> +------------------------------------------+
>
> No rows selected (1.329 seconds)
>
>
>
> Full scan (includes non-index fields)
>
> 0: jdbc:phoenix:dn01> SELECT * FROM TABLE_NAME WHERE FILE_NAME = ‘abcdefg'
> AND REC_NUM =101;
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
>
> |                FILE_TIME                |                FILE_NAME
> |                 REC_NUM                  |                F1
> |               F3                |                    |
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
>
> | 2015-03-23 06:42:37+00                   | abcdefg          | 101
> | 49                                       | 50
> | 15               |
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
>
>
>
> Full scan (single non-indexed field)
>
> 0: jdbc:phoenix:dn01> SELECT F8 FROM TABLE_NAME WHERE FILE_NAME = ‘abcdefg'
> AND REC_NUM =101;
>
> +------------------------------------------+
>
> |                  F8                  |
>
> +------------------------------------------+
>
> | ABC                                      |
>
> +------------------------------------------+
>
>
>
>
>
> Index scan (single indexed field)
>
> 0: jdbc:phoenix:dn01> SELECT F1 FROM TABLE_NAME WHERE FILE_NAME = ‘abcdefg'
> AND REC_NUM =101;
>
>
>
> +------------------------------------------+
>
> |                  F1                          |
>
> +------------------------------------------+
>
> +------------------------------------------+
>
>
>
> Thanks,
>
> Bryan G.
>
>