You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/05 06:53:38 UTC
[GitHub] [incubator-doris] Youngwb edited a comment on issue #3930: [Proposal] Doris support version column for REPLACE aggregate type
Youngwb edited a comment on issue #3930:
URL: https://github.com/apache/incubator-doris/issues/3930#issuecomment-654853902
According to @morningman @yangzhg 's suggest, I made some corrections
## name
use `sequence` column instead of `version` column for user understand easy.
## Create table
Use UNIQUE_KEYS instead of AGG_KEYS. Because sequence column is a hidden column, there is no need to create a `version` column with MAX AGG_TYPE.
```
CREATE TABLE `test_1` (
`pin_id` bigint(20) NOT NULL COMMENT "",
`date` date NOT NULL COMMENT "",
`group_id` bigint(20) NOT NULL COMMENT "",
`keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
"function_column.sequence_type" = "int"
);
```
like such example, user need to add `sequence_type` to Identify the sequence column type. It only support the Integer types (int, bigint, largeint) and time types(date, datetime). User can't query the `sequence_column` hidden in table , but can add one column which value is equal to `sequence_column`. like this
```
CREATE TABLE `test_2` (
`pin_id` bigint(20) NOT NULL COMMENT "",
`date` date NOT NULL COMMENT "",
`group_id` bigint(20) NOT NULL COMMENT "",
`sequence_visiable` int NOT NULL,
`keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
"function_column.sequence_type" = "int"
);
```
Column names are not necessarily "sequence_visiable", this is just an example. The user ensures that the values are same by specifying parameters at LOAD time
## LOAD
### Stream Load
`curl --location-trusted -u root -H "columns: pin_id,date,group_id,source_sequence,keyword" -H "function_column.sequence_col: source_sequence" -T test_load http://127.0.01:8030/api/test/test_1/_stream_load`
### Broker Load
```
LOAD LABEL test.test11
(
DATA INFILE("hdfs://path/to/load_file")
INTO TABLE `test_1`
FORMAT AS "parquet"
(pin_id,date,group_id,source_sequence,keyword)
ORDER BY source_sequence
) with BROKER broker_name (...)
```
### Routine Load
```
CREATE ROUTINE LOAD test_1_job ON test_1
COLUMNS TERMINATED BY ",",
(pin_id,date,group_id,source_sequence,keyword)
ORDER BY source_sequence
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "30",
"max_batch_rows" = "1000000",
"max_batch_size" = "509715200"
) FROM KAFKA
(
"kafka_broker_list" = "...",
"kafka_topic" = "...",
"property.client.id" = "...",
"property.group.id" = "..."
);
```
I added a parameter `sequence_col` to identify the source data for the sequence column at load, because it's hidden column, user need to identify the source column in `columns_mapping`.
For table `test_2` which has column `sequence_visiable`, user can set "sequence_col" = "sequence_visiable" at properties, which means the hidden column "sequence_col" is same with the "sequence_visiable" in table, user can query the the column "sequence_visiable" instead of "sequence_col"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org