You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Jeff Jirsa <jj...@gmail.com> on 2017/12/07 19:24:39 UTC

Re: Some problems with Apache Cassandra

Reposting to user@ from dev@


On Thu, Dec 7, 2017 at 4:27 AM, v.eliseev@rubic.pro <v....@rubic.pro>
wrote:

> Hello, my name is Vladimir and I have questions on Cassandra DMBS.
>
>
> *1 PROBLEM. How to increase the speed?*
> I installed this system from the repository:
> "deb http://www.apache.org/dist/cassandra/debian 311x main"
> With standard settings, the system runs slowly.
> _Comparison with "MySQL"._
> select count (*) from login_wifly.radacct;
> + ---------- +
> | | count (*) |
> + ---------- +
> | | 9810806 |
> + ---------- +
>
> real 0m3.709s --------- speed in the MySQL (without caching)
> user 0m0.000s
> sys 0m0.000s
>
> _In Cassandra:_
> SELECT count(*) FROM test.radacct;
>
>  count
> ---------
>  9810806
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
>
>
> real    3m7.661s -------- speed in the Cassandra
> user    0m0.444s
> sys     0m0.056s
>
>
This is an antipattern in cassandra. It works, but it will never be
efficient. We make no effort to try to optimize for full table scans.

Depending on your accuracy requirements, you could use the metrics we
expose to approximate the count.



>
> *2 PROBLEM. How to design a table correctly?*
> My test table configure:
> create table radacct2 (
> radacctid bigint,
> acctsessionid text,
> acctuniqueid text,
> username text,
> groupname text,
> realm text,
> nasid text,
> nasipaddress text,
> nasportid text,
> nasporttype text,
> acctstarttime text,
> acctstoptime text,
> acctsessiontime bigint,
> acctauthentic text,
> connectinfo_start text,
> connectinfo_stop text,
> acctinputoctets bigint,
> "acctoutputoctets" bigint,
> "calledstationid" text,
> callingstationid text,
> acctterminatecause text,
> servicetype text,
> framedprotocol text,
> framedipaddress text,
> acctstartdelay bigint,
> acctstopdelay bigint,
> xascendsessionsvrkey text,
> client bigint,
> method text,
> zone bigint,
> localDateStart text,
> localDateStop text,
> localDateTimeStart text,
> localDateTimeStop text,
> msisdn text,
> PRIMARY KEY (radacctid, username)
> ) WITH CLUSTERING ORDER BY (username DESC);
>
> I need do select to the "username" field.
> In MySQL it looks like this:
> SELECT
> a.username,
> COUNT (DISTINCT a.username)
> FROM
> radacct as a
> WHERE
> (LENGTH (a.username) = 17)
> GROUP BY
> a.username;
>
> When I execute the query "SELECT username, count (*) FROM radacct GROUP BY
> username;
> InvalidRequest: Error from server: code = 2200 [Invalid query] message =
> "PRIMARY KEY, got username"
>
>
Cassandra isn't an RDBMS. You have to model the data differently. You
typically build tables based on the SELECT query/queries you'll be using.
You can't just swap in your MySQL table and queries and expect this to work.

It's typical to have multiple denormalized tables, so if you want to do a
lookup by username, you'll build a table with username as the partition key.
If you need to also do a lookup by radacctid, you'll build another table
with radacctid as the partition key.
Since cassandra doesn't have JOINs, you'll probably need to do one query by
username to get the radacctid, and then a second to actually query the main
radacct2 table.

In short: before trying to just make this work, spend some time reading or
watching videos about how to properly data model in cassandra. It's not
MySQL. It's different.


> *3 PROBLEM. How to improve performance using a competent configuration?*
>
> cat /etc/cassandra/cassandra.yaml
> What parameters can be adjusted to achieve maximum effect?
>

All of them?