Posted to dev@iotdb.apache.org by 田原 <ti...@mails.tsinghua.edu.cn> on 2020/04/19 07:10:07 UTC

Design of the Tag and attribute management

Hi,


I am working on JIRA issue IOTDB-588: https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-588?filter=allopenissues


We want to support tag management for time series.


The tag/attribute info of a time series is, to some extent, also a kind of metadata of that time series, so I decided to put this logic in the MManager.


The details are as follows:


First, we support two kinds of properties for a time series: tags and attributes. The only difference between them is that we maintain an inverted index on tags, so a tag property can be used in the where clause.
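
To make the difference concrete, here is a rough Java sketch of what an inverted index on tags could look like. The class and method names (TagInvertedIndex, addSeries, lookup) are made up for illustration and are not taken from the MManager code; the nested-map layout is only one possible choice.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: tag key -> tag value -> full paths of the series carrying that tag.
class TagInvertedIndex {
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Only tags are passed in here; attributes are stored elsewhere and never indexed.
    void addSeries(String fullPath, Map<String, String> tags) {
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            index.computeIfAbsent(tag.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(tag.getValue(), v -> new TreeSet<>())
                 .add(fullPath);
        }
    }

    // Returns the paths whose tag `key` equals `value`, or an empty set if there are none.
    Set<String> lookup(String key, String value) {
        return index.getOrDefault(key, Collections.emptyMap())
                    .getOrDefault(value, Collections.emptySet());
    }
}
```

With such a structure, a tag condition in the where clause becomes two map lookups, whereas an attribute condition would require scanning every series.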


The create timeseries clause will be changed to:


CREATE TIMESERIES fullPath alias? WITH attributeClauses

alias
    : LR_BRACKET ID RR_BRACKET
    ;
attributeClauses
    : DATATYPE OPERATOR_EQ dataType COMMA ENCODING OPERATOR_EQ encoding
    (COMMA (COMPRESSOR | COMPRESSION) OPERATOR_EQ compressor=propertyValue)?
    tagClause
    attributeClause
    ;


attributeClause
    : (ATTRIBUTES LR_BRACKET property (COMMA property)* RR_BRACKET)?
    ;


tagClause
    : (TAGS LR_BRACKET property (COMMA property)* RR_BRACKET)?
    ;




Notice that we also support an alias for the time series when creating it. We will maintain an additional alias map in the deviceNode. In this way, the alias can be used in queries in the same way as the measurement name.
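
As a small illustration of the alias map, the Java sketch below resolves either a measurement name or an alias to the same series. DeviceNodeSketch and its fields are hypothetical names, and it assumes an alias never collides with another measurement name under the same device.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical device node: children keyed by measurement name, plus a separate alias map.
class DeviceNodeSketch {
    private final Map<String, String> children = new HashMap<>();           // measurement -> full path
    private final Map<String, String> aliasToMeasurement = new HashMap<>(); // alias -> measurement

    void addSeries(String measurement, String alias, String fullPath) {
        children.put(measurement, fullPath);
        if (alias != null) {
            aliasToMeasurement.put(alias, measurement);
        }
    }

    // Accepts a measurement name or an alias; returns null if neither matches.
    String resolve(String nameOrAlias) {
        String measurement = aliasToMeasurement.getOrDefault(nameOrAlias, nameOrAlias);
        return children.get(measurement);
    }
}
```

For a series created as s1 with the alias temperature, both resolve("s1") and resolve("temperature") return the same full path.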


The tag and attribute info will be persisted to another metadata file named tlog.txt in the same directory as mlog.txt.


The total length of the tags and attributes for one series is fixed to tag_attribute_total_size, which you can change in iotdb-engine.properties. To update the value of a tag or attribute, we read all of them, change the value we want, and rewrite them at the same position. Naturally, the new value must not make the total length exceed tag_attribute_total_size.


The content of tlog.txt will look like:
tagsSize (tag1=v1, tag2=v2) attributesSize (attr1=v1, attr2=v2)
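
To show why a fixed-size slot makes in-place updates possible, here is a minimal Java sketch of one way to serialize a record. The byte layout (an int count followed by length-prefixed UTF-8 strings, zero-padded to the slot size) and the names TagLogRecordSketch, serialize and readMap are my assumptions for illustration, not the final file format.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class TagLogRecordSketch {
    // Writes tagsSize, the tags, attributesSize and the attributes into a zero-padded slot
    // of exactly slotSize bytes (slotSize plays the role of tag_attribute_total_size).
    static byte[] serialize(Map<String, String> tags, Map<String, String> attrs, int slotSize) {
        ByteBuffer buf = ByteBuffer.allocate(slotSize);
        try {
            writeMap(buf, tags);
            writeMap(buf, attrs);
        } catch (BufferOverflowException e) {
            throw new IllegalArgumentException("tags and attributes exceed " + slotSize + " bytes", e);
        }
        return buf.array();
    }

    // Reads one map (a size followed by key/value pairs) from the buffer's current position.
    static Map<String, String> readMap(ByteBuffer buf) {
        int size = buf.getInt();
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = readString(buf);
            map.put(key, readString(buf));
        }
        return map;
    }

    private static void writeMap(ByteBuffer buf, Map<String, String> map) {
        buf.putInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            writeString(buf, e.getKey());
            writeString(buf, e.getValue());
        }
    }

    private static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because serialize always returns exactly slotSize bytes, an updated record can be written back at the same offset without moving any other record, and an update that no longer fits is rejected instead of overwriting its neighbour.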



The offset of the tag/attribute info for each time series in tlog.txt will be saved in mlog.txt. That way, we only need to load the offsets into memory instead of loading the whole content of tlog.txt.
A create timeseries record in mlog.txt will now look like:


cmd, path, TSDataType, TSEncoding, CompressionType, [properties], [alias], [tag/attribute offset]
0., root.turbine.d1.s1, 3, 2, 1, , , temperature, -1



If the timeseries has no tag/attribute info, the offset will be -1.


There will be one more step when initializing the MManager after a restart: we will also load the tag info from tlog.txt, using the offsets recorded in mlog.txt, and deserialize it into the inverted index map in the MManager.


Second, I will also extend the show timeseries syntax to:


SHOW TIMESERIES prefixPath? showWhereClause?



showWhereClause
    : WHERE (property | containsExpression)
    ;
containsExpression
    : name=ID OPERATOR_CONTAINS value=propertyValue
    ;


The property in the where clause must be a tag; otherwise an exception will be thrown.


If there is no where clause, the show timeseries query process will be the same as before.
However, if there is a where clause, we will use the inverted index map in the MManager to find all matching LeafMNodes and then filter them by the prefixPath.
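
The query path with a where clause could be sketched in Java roughly as follows. ShowTimeseriesSketch and its methods are hypothetical names, paths stand in for LeafMNodes, and the prefix check is a plain string comparison that ignores wildcards; treat it as an outline of the idea rather than the planned implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ShowTimeseriesSketch {
    // Inverted index: tag key -> tag value -> full paths (standing in for LeafMNodes).
    private final Map<String, Map<String, Set<String>>> index = new TreeMap<>();

    void addTag(String fullPath, String key, String value) {
        index.computeIfAbsent(key, k -> new TreeMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(fullPath);
    }

    // SHOW TIMESERIES prefixPath WHERE key = value
    List<String> whereEquals(String prefixPath, String key, String value) {
        Set<String> matched = valuesFor(key).getOrDefault(value, Collections.emptySet());
        return filterByPrefix(matched, prefixPath);
    }

    // SHOW TIMESERIES prefixPath WHERE key CONTAINS fragment
    List<String> whereContains(String prefixPath, String key, String fragment) {
        Set<String> matched = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : valuesFor(key).entrySet()) {
            if (e.getKey().contains(fragment)) {
                matched.addAll(e.getValue());
            }
        }
        return filterByPrefix(matched, prefixPath);
    }

    // A key that is not in the index is not a tag, so the query is rejected.
    private Map<String, Set<String>> valuesFor(String key) {
        Map<String, Set<String>> byValue = index.get(key);
        if (byValue == null) {
            throw new IllegalArgumentException("'" + key + "' is not a tag");
        }
        return byValue;
    }

    // Keeps only the paths that equal prefixPath or lie below it.
    private static List<String> filterByPrefix(Set<String> paths, String prefixPath) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (path.equals(prefixPath) || path.startsWith(prefixPath + ".")) {
                result.add(path);
            }
        }
        return result;
    }
}
```

The index lookup runs first and the prefix filter second, so the cost depends on the number of series carrying the tag rather than on the size of the whole metadata tree.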
 

Re: Design of the Tag and attribute management

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi Yuan,

The design is good; please add it to the design doc.
One reminder: since the mlog format has changed, we need to upgrade this file when starting 0.10 on data from 0.9.
Also, a system version file needs to be created to indicate that the system files are the latest.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

