Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/06/28 02:35:40 UTC

[GitHub] [incubator-doris] wangbo commented on a change in pull request #3938: (#3061) add user doc for build global dict

wangbo commented on a change in pull request #3938:
URL: https://github.com/apache/incubator-doris/pull/3938#discussion_r446591299



##########
File path: docs/zh-CN/administrator-guide/load-data/spark-load-manual.md
##########
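@@ -45,6 +45,7 @@ Spark load is an asynchronous load method. Users need to use the MySQL protocol to create S
 2. Backend (BE): the compute and storage node of the Doris system. In the load process it is mainly responsible for writing and storing data.
 3. Spark ETL: in the load process it is mainly responsible for the ETL work on the data, including global dictionary construction (BITMAP type), partitioning, sorting, and aggregation.
 4. Broker: an independent, stateless process. It wraps the file system interface and gives Doris the ability to read files from remote storage systems.
+5. Global dictionary: a data structure that stores the mapping from original values to encoded values. The original values can be of any data type, while the encoded values are integers. The global dictionary is mainly used in exact-deduplication precomputation scenarios.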

Review comment:
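       Currently, in both the global dictionary table and the Hive temporary table that gets created, the deduplication columns are all stored uniformly as string, so the original data type makes no difference.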
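
       To make the point concrete, here is a minimal Python sketch of the idea, not the actual Doris or Spark ETL implementation. The names `GlobalDict` and `exact_distinct_count` are hypothetical, and a plain Python integer stands in for a real bitmap. It assumes, as described above, that every deduplication value is stored as a string before it is encoded.

```python
class GlobalDict:
    """Toy global dictionary: maps original values to integer codes.

    Hypothetical illustration only; not the Doris implementation.
    """

    def __init__(self):
        self._codes = {}  # string form of original value -> integer code

    def encode(self, value):
        # Assumption from the review comment: values are stored as strings,
        # so the original column type does not affect the mapping.
        key = str(value)
        if key not in self._codes:
            # Assign the next unused code, starting from 1.
            self._codes[key] = len(self._codes) + 1
        return self._codes[key]


def exact_distinct_count(values, gdict):
    """Count distinct values by setting one bit per encoded value."""
    bitmap = 0  # a Python int used as a stand-in for a bitmap
    for v in values:
        bitmap |= 1 << gdict.encode(v)
    return bin(bitmap).count("1")


gd = GlobalDict()
user_ids = [1001, 1002, 1001, 1003]
distinct_users = exact_distinct_count(user_ids, gd)
```

       Because the key is the string form of the value, a column of integers and a column of strings go through the same code path. Within a single column the type is uniform, so this does not merge values that should stay distinct.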






