Posted to dev@iotdb.apache.org by Wz <zh...@qq.com.INVALID> on 2022/06/16 09:38:47 UTC

Data filtering and aggregation with tags

Hi guys,

I believe filtering and aggregating data by tags is a common need. Putting every attribute into the path is not a good idea, because it makes the path extremely long and slows down MTree searches, so we store some of the attributes as tags instead. That doesn't mean tags are unimportant, though.

Let's take the following ECS management scenario as an example. IoTDB stores the cpu_util of each ECS instance. In addition, each ECS instance has static attributes such as region_id, available_zone, hostname, CPU, memory, storage, and OS. Since CPU, memory, and storage are numbers and OS is a string containing whitespace, they are stored as tags, while the other attributes are stored as levels in the path, e.g. root.${region_id}.${available_zone}.${hostname}.cpu_util.

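To make this layout concrete, here is a minimal Python sketch, assuming the instance's static attributes arrive as a plain dict, that splits them the way the scenario describes: region_id, available_zone, and hostname become path levels, and everything else (CPU, memory, storage, OS) is kept as a tag. `EcsSeries`, `to_series`, `PATH_LEVELS`, and the sample values are hypothetical illustrations, not IoTDB APIs.

```python
from dataclasses import dataclass, field

# Hypothetical: attributes used as path levels, in order, per the scenario above.
PATH_LEVELS = ("region_id", "available_zone", "hostname")


@dataclass
class EcsSeries:
    """Hypothetical model of one series: a full path plus its tag map."""
    path: str
    tags: dict = field(default_factory=dict)


def to_series(attrs: dict, measurement: str = "cpu_util") -> EcsSeries:
    """Split static attributes into root.<region>.<zone>.<host>.<measurement> and tags."""
    missing = [k for k in PATH_LEVELS if k not in attrs]
    if missing:
        raise ValueError(f"missing path attributes: {missing}")
    # Path levels, in the fixed order given by PATH_LEVELS.
    path = ".".join(["root", *(str(attrs[k]) for k in PATH_LEVELS), measurement])
    # Every remaining attribute becomes a tag; values are kept here as strings.
    tags = {k: str(v) for k, v in attrs.items() if k not in PATH_LEVELS}
    return EcsSeries(path=path, tags=tags)


instance = {
    "region_id": "r1", "available_zone": "z1", "hostname": "host01",
    "CPU": 8, "memory": 32, "storage": 500, "OS": "CentOS 7.9",
}
series = to_series(instance)
print(series.path)  # root.r1.z1.host01.cpu_util
print(series.tags)  # {'CPU': '8', 'memory': '32', 'storage': '500', 'OS': 'CentOS 7.9'}
```

The path stays short and free of whitespace, while the OS string "CentOS 7.9" lives only in the tag map, which is exactly the information the query below wants to group on.
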
Say some ECS instances have had abnormally high cpu_util over the last hour, and we want to know whether the problem is caused by a particular OS version. The query could look like this:

    SELECT OS, COUNT(cpu_util) FROM root.** WHERE cpu_util > 95.0 GROUP BY TAG OS ALIGN BY DEVICE

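Since GROUP BY TAG is only a proposal here, the following is a minimal Python sketch of one possible reading of the query above, evaluated over made-up in-memory data: keep the cpu_util points above 95.0, then count them per value of the OS tag. It ignores ALIGN BY DEVICE and the one-hour window, and `SERIES` and `count_by_tag` are hypothetical names, not part of IoTDB.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for stored data: each series has a path,
# a tag map, and a list of (timestamp_ms, cpu_util) points.
SERIES = [
    {"path": "root.r1.z1.host01.cpu_util", "tags": {"OS": "CentOS 7.9"},
     "points": [(1, 97.5), (2, 40.0), (3, 99.1)]},
    {"path": "root.r1.z1.host02.cpu_util", "tags": {"OS": "Ubuntu 20.04"},
     "points": [(1, 12.0), (2, 96.3)]},
    {"path": "root.r1.z2.host03.cpu_util", "tags": {"OS": "CentOS 7.9"},
     "points": [(1, 98.0)]},
    {"path": "root.r1.z2.host04.cpu_util", "tags": {},  # this series has no OS tag
     "points": [(1, 99.9)]},
]


def count_by_tag(series, tag_key, threshold):
    """Emulate: SELECT <tag_key>, COUNT(cpu_util) WHERE cpu_util > threshold GROUP BY TAG <tag_key>."""
    counts = defaultdict(int)
    for s in series:
        tag_value = s["tags"].get(tag_key)  # None when the series lacks the tag
        # Apply the WHERE filter first, like SQL does before grouping.
        matching = sum(1 for _, value in s["points"] if value > threshold)
        if matching:
            counts[tag_value] += matching
    return dict(counts)


print(count_by_tag(SERIES, "OS", 95.0))
# {'CentOS 7.9': 3, 'Ubuntu 20.04': 1, None: 1}
```

As with a SQL WHERE clause, the filter runs before grouping, so an OS value with no matching points produces no row at all. Collecting untagged series under None is one design choice among several; they could just as well be dropped from the result.
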
With the ability to filter and aggregate data by tags, IoTDB could become much more powerful for analytical processing. What do you think?

Any suggestions are welcome :D

Zhong Wang,

Alibaba group

Re: Data filtering and aggregation with tags

Posted by Xiangdong Huang <sa...@gmail.com>.

+1, this feature is useful.

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University


On Thu, Jun 16, 2022 at 17:45, Eric Pai <er...@hotmail.com> wrote:

> Good idea! If we can make use of tags not only in metadata but also in
> data queries, we can greatly enrich the data analysis capabilities and
> help the business layer achieve more than before. However, since the
> query grammar may become more complicated, we should also keep ease of
> use in mind when designing the SQL.

Re: Data filtering and aggregation with tags

Posted by Eric Pai <er...@hotmail.com>.

Good idea! If we can make use of tags not only in metadata but also in data queries, we can greatly enrich the data analysis capabilities and help the business layer achieve more than before. However, since the query grammar may become more complicated, we should also keep ease of use in mind when designing the SQL.
