You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/03 09:45:36 UTC

[GitHub] [doris] eldenmoon opened a new issue, #15571: [Enhancement] improve performace of point query on primary keys

eldenmoon opened a new issue, #15571:
URL: https://github.com/apache/doris/issues/15571

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   # Background
   Doris is build as a MPP OLAP system，it's very fast on analytic queries，but as for queries in hight concurrency scenario, doris has it's drawback as bellow:
   1. It's build on a column oriented format engine, in hight concurrency serving scenario users always want to get a whole row from system.But column oriented format will massively amplify random read IO when table is wide.
   2. Doris query engine and plan is too heavy for some simple queries like point query.We need a short fast path for such queries
   3. FE is an access layer service for SQL queries and write in java，analyzing and parsing SQLs will lead very high CPU overhead for hight concurrency queries
   
   As for above drawback in the system，we provided 3  methods to optimize each. 
      1. provide a row store format in the system
      2. short path optimize for point queries
      3. prepared statement
    I'll discuss details of each one in the follow.
   
   # Row store
   For row store format, we need to consider bellow performance and storage issue:
   1. Fast access of each column of a row
   2. Disk storage cost, less is better
   3. Simple enough and extensible 
      I investigate and survey some design and find out a better codec choice which Doris already used, `JSONB`.It stores binary format of each column, and we mapped each column id to each column value for fast column access.The binary format is compacted and it's binary is directly from vectorized column binary format.The codec detail could be find in file `be/src/vec/jsonb/serialize.cpp`, the codec logic is very simple and extensible.
      For storing a row, an extra column called `__DORIS_SOURCE_COL__` is added to each segment, so the storage is a simple column, we convert each Block to jsonb column and add this column to the back of Block, the extra column will be flushed with the same Block from memtable.
   
   # Short path optimize
   For simple queries like `select * from tbl where pk1 = 123 and pk2 = 456`  such query will only act on a single tablet.So we could use a light weight RPC interface to fetch data instead of generate a huge Fragment plan and execute on the heavy execution engine.The RPC  interface is as follow
   ```
   // We use thrift definition for some structure, since TExpr,
   // list<Exprs.TExpr>, Descriptors.TDescriptorTable are all thrift format.
   // Modify them to protobuf is a redundant work.
   message PTabletKeyLookupRequest {
       required int64 tablet_id = 1;
       repeated KeyTuple key_tuples = 2;
       // serilized from Descriptors.TDescriptorTable
       optional bytes desc_tbl = 4;
       // serilized from TExprList 
       optional bytes output_expr = 5;
   }
   
   message PTabletKeyLookupResponse {
       required PStatus status = 1;
       optional bytes row_batch = 5;
       optional bool empty_batch = 6;
   }
   rpc tablet_fetch_data(PTabletKeyLookupRequest) returns (PTabletKeyLookupResponse);
   ```
   `tablet_id` is caculated from condition column `pk`, and `key_tuples` is string format of pk, in the above example, `key_tuples` is like `['123', '456']`, the key tuples will be encoded to the primary key format and using `lookup_row_key` for indentify positions of keys in the tablet.`lookup_row_key` will locate key row positions and check if the key is in delete bitmap, if not in return it's `RowLocation` otherwise return NotFound.Then use the location to directly get the row from related segment, we only need the `__DORIS_SOURCE_COL__` column for point queries, so we just located a row in this column and get a jsonb format raw and decode it to Block for later expressions evaluation.
   
   # Prepared Statement
   Mysql provided a useful statement caching mechanism on both server and client side called Prepared Statement.For reducing FE analyze and parse CPU cost of a query, we could use server side Prepared Statement mechanism.We provide `PreparedStatement` feature in FE fully compatible with mysql protocol (currently only support point queries like above mentioned).Enable it will pre caculate PreparedStatement SQL and expresions and caches it in a session level memory HashMap and will be reused later on.We could improve 4x+ performance by using `PreparedStatement` when CPU became hotspot doing such queries.Server side Prepared Statement using mysql binary protocol as it's transfer protocol see [mysql binary row](https://dev.mysql.com/doc/dev/mysqlserver/latest/page_protocol_binary_resultset.html#sect_protocol_binary_resultset_row)  for more details.We also implement such protocol in `mysql_row_buffer.[h|cpp]` according to the standard mysql binary protocol.
       It's not enough just caching statement in FE, as for BE we also need caching some resuseful structures such as preallocted Block pool query descriptors and expressions since serialize and deserialize them are also heavy CPU works.For each Prepared Statement, it attached with an UUID named `cacheID`, each prepared query will execute and resused above mentioned resusul structures in BE according to related `cacheID`.
   
   # Benchmark
   I run a standard [YCSB](https://github.com/eldenmoon/YCSB) benchamrk for comparing the performance promoted.Bellow is the result.
   (The machine env is a single cloud machine with 16 cores and 64G memory plus 4 * 1T HDD.Before the benchmark I did some warmup for caching.I will test performance in a cluster later on, it should be linear scalability) 
   
   cost about 12 cores
   ```
   [OVERALL], Throughput(ops/sec), 31134.704298145927
   [READ], Operations, 9882727
   [READ], AverageLatency(us), 638.3491656705685
   [READ], MinLatency(us), 250
   [READ], MaxLatency(us), 152447
   [READ], 95thPercentileLatency(us), 856
   [READ], 99thPercentileLatency(us), 1277
   [READ], Return=OK, 9882727
   [READ], Return=NOT_FOUND, 117273
   ```
   
   the table is created by following SQL
   ```
   CREATE TABLE `usertable` (
     `YCSB_KEY` varchar(255) NULL,
     `FIELD0` text NULL,
     `FIELD1` text NULL,
     `FIELD2` text NULL,
     `FIELD3` text NULL,
     `FIELD4` text NULL,
     `FIELD5` text NULL,
     `FIELD6` text NULL,
     `FIELD7` text NULL,
     `FIELD8` text NULL,
     `FIELD9` text NULL
   ) ENGINE=OLAP
   UNIQUE KEY(`YCSB_KEY`)
   COMMENT 'OLAP'
   DISTRIBUTED BY HASH(`YCSB_KEY`) BUCKETS 16
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 1",
   "in_memory" = "false",
   "persistent" = "false",
   "storage_format" = "V2",
   "enable_unique_key_merge_on_write" = "true",
   "light_schema_change" = "true",
   "store_row_column" = "true",
   "disable_auto_compaction" = "false"
   );
   ```
   and executed in SQL follow with `useServerPrepStmts=true`
   ```
   SELECT * from usertable WHERE YCSB_KEY = ?
   ```
   
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [doris] dataroaring closed issue #15571: [Enhancement] improve performace of point query on primary keys

Posted by GitBox <gi...@apache.org>.

dataroaring closed issue #15571: [Enhancement] improve performace of point query on primary keys
URL: https://github.com/apache/doris/issues/15571


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org