You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by "Riesland, Zack" <Za...@sensus.com> on 2015/07/23 11:44:38 UTC

Understanding keys

This is probably a silly question... please humor me: I'm a Java/JS developer learning about databases as I go.

Suppose I have a table with columns A-Z, and declare the primary key to be (A, B, C).

I understand that that forces each row to have a unique A, B, C combination.

But what does it mean for querying? Can I very quickly query again column A and/or B and/or C, or just the combination of the 3?

What is the difference between primary key (A, B, C) and primary key (A, B) with a secondary index on C?

Thanks!

RE: Understanding keys

Posted by "Riesland, Zack" <Za...@sensus.com>.
Thanks a lot for that explanation, Gabriel.

From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
Sent: Thursday, July 23, 2015 10:39 AM
To: user@phoenix.apache.org
Subject: Re: Understanding keys

Filtering a query on the leading columns of the primary key (i.e. [A], [A,B], or [A,B,C]) will give optimal performance. This is because the records are in sorted order based on the combination of [A,B,C], so filtering on a leading subset of the primary key is basically the same as filtering on the full primary key. The set of records that pass the query filter are all within a single defined range of records (not considering salt buckets and multi-tenant tables).

Filtering on [B] or [B,C] may offer relatively decent performance, depending on the cardinality and data type of primary key column A. This is due to the way that Phoenix makes use of skip scans [1].

The farther you get away from using the leading column(s) in a filter, the more rows that Phoenix will need to check in order to return a result, which equals slower query performance.

The difference between having a primary key of (A,B,C) compared to a primary key of (A, B) and a secondary index on C is that the secondary index on C will allow much faster lookups if you're searching for a row with a given value for C. I expect that the performance of a query where you specify a filter on A, B, and C in this case will also be slower than if you have a primary key on (A, B, C), as again Phoenix will have to scan over more rows than if the primary key (A, B, C) were defined.

- Gabriel

1. http://phoenix.apache.org/skip_scan.html


On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack <Za...@sensus.com>> wrote:
This is probably a silly question… please humor me: I’m a Java/JS developer learning about databases as I go.

Suppose I have a table with columns A-Z, and declare the primary key to be (A, B, C).

I understand that that forces each row to have a unique A, B, C combination.

But what does it mean for querying? Can I very quickly query again column A and/or B and/or C, or just the combination of the 3?

What is the difference between primary key (A, B, C) and primary key (A, B) with a secondary index on C?

Thanks!

Re: Understanding keys

Posted by Gabriel Reid <ga...@gmail.com>.
Filtering a query on the leading columns of the primary key (i.e. [A],
[A,B], or [A,B,C]) will give optimal performance. This is because the
records are in sorted order based on the combination of [A,B,C], so
filtering on a leading subset of the primary key is basically the same as
filtering on the full primary key. The set of records that pass the query
filter are all within a single defined range of records (not considering
salt buckets and multi-tenant tables).

Filtering on [B] or [B,C] may offer relatively decent performance,
depending on the cardinality and data type of primary key column A. This is
due to the way that Phoenix makes use of skip scans [1].

The farther you get away from using the leading column(s) in a filter, the
more rows that Phoenix will need to check in order to return a result,
which equals slower query performance.

The difference between having a primary key of (A,B,C) compared to a
primary key of (A, B) and a secondary index on C is that the secondary
index on C will allow much faster lookups if you're searching for a row
with a given value for C. I expect that the performance of a query where
you specify a filter on A, B, and C in this case will also be slower than
if you have a primary key on (A, B, C), as again Phoenix will have to scan
over more rows than if the primary key (A, B, C) were defined.

- Gabriel

1. http://phoenix.apache.org/skip_scan.html


On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack <Za...@sensus.com>
wrote:

>  This is probably a silly question… please humor me: I’m a Java/JS
> developer learning about databases as I go.
>
>
>
> Suppose I have a table with columns A-Z, and declare the primary key to be
> (A, B, C).
>
>
>
> I understand that that forces each row to have a unique A, B, C
> combination.
>
>
>
> But what does it mean for querying? Can I very quickly query again column
> A and/or B and/or C, or just the combination of the 3?
>
>
>
> What is the difference between primary key (A, B, C) and primary key (A,
> B) with a secondary index on C?
>
>
>
> Thanks!
>

TableNotFoundException: SYSTEM.CATALOG

Posted by Matjaž Trtnik <mt...@salviol.com>.
Hi!

I get following error when running sqlline tool:

Error: SYSTEM.CATALOG (state=08000,code=101)
org.apache.phoenix.exception.PhoenixIOException: SYSTEM.CATALOG
	at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1041)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1004)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1249)
	at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:112)
	at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1902)
	at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:744)
	at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:303)
	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
	at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
	at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:293)
	at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1236)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1891)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
	at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
	at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
	at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
	at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
	at sqlline.DatabaseConnection.connect(DatabaseConnection.java:157)
	at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:203)
	at sqlline.Commands.connect(Commands.java:1064)
	at sqlline.Commands.connect(Commands.java:996)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
	at sqlline.SqlLine.dispatch(SqlLine.java:804)
	at sqlline.SqlLine.initArgs(SqlLine.java:588)
	at sqlline.SqlLine.begin(SqlLine.java:656)
	at sqlline.SqlLine.start(SqlLine.java:398)
	at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1257)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1139)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1096)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:931)
	at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
	at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:496)
	at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:736)
	at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:706)
	at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1760)
	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1715)
	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1695)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1024)
	... 32 more

What is the reason for that? As I understand Phoenix should create this table itself automatically?