
Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2020/10/26 01:14:32 UTC

[GitHub] [iotdb] JackieTien97 commented on issue #1833: I want to know the detail design for tag inverted index

JackieTien97 commented on issue #1833:
URL: https://github.com/apache/iotdb/issues/1833#issuecomment-716250568

It is indeed just a hashmap attribute that is persisted in mlog.txt. The detailed design document is as follows:

First, we support two kinds of properties for a time series: tags and attributes. The only difference between them is that we maintain an inverted index on tags, so tag properties can be used in the where clause.

The create timeseries clause will be changed to:

CREATE TIMESERIES fullPath alias? WITH attributeClauses

alias
    : LR_BRACKET ID RR_BRACKET
    ;

attributeClauses
    : DATATYPE OPERATOR_EQ dataType COMMA ENCODING OPERATOR_EQ encoding
      (COMMA (COMPRESSOR | COMPRESSION) OPERATOR_EQ compressor=propertyValue)?
      tagClause
      attributeClause
    ;

attributeClause
    : (ATTRIBUTES LR_BRACKET property (COMMA property)* RR_BRACKET)?
    ;

tagClause
    : (TAGS LR_BRACKET property (COMMA property)* RR_BRACKET)?
    ;

Note that we also support an alias for a time series when creating it. We will maintain a separate alias map in the deviceNode, so a query can refer to the series by its alias just as it would by the measurement name.
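
The alias lookup can be pictured with a small sketch. This is only an illustration in Python, not the IoTDB code: the class name DeviceNode, its fields, and the method names are hypothetical, and the sample path and alias are borrowed from the mlog.txt example further down.

```python
class DeviceNode:
    """Hypothetical stand-in for a device node (not the real IoTDB class)."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # measurement name -> leaf record
        self.alias_map = {}  # alias -> measurement name

    def add_measurement(self, measurement, alias=None):
        if measurement in self.children or measurement in self.alias_map:
            raise ValueError(f"name {measurement!r} already in use")
        if alias is not None:
            if alias in self.children or alias in self.alias_map:
                raise ValueError(f"alias {alias!r} already in use")
            self.alias_map[alias] = measurement
        self.children[measurement] = {"measurement": measurement, "alias": alias}

    def resolve(self, name):
        """Return the leaf for either a measurement name or an alias."""
        measurement = self.alias_map.get(name, name)
        return self.children[measurement]


d1 = DeviceNode("root.turbine.d1")
d1.add_measurement("s1", alias="temperature")
leaf_by_name = d1.resolve("s1")
leaf_by_alias = d1.resolve("temperature")
```

Both lookups return the same leaf, which is what lets a query use the alias in place of the measurement name.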

The tag and attribute info will be persisted to another metadata file named tlog.txt in the same directory as mlog.txt.

The total length of the tags and attributes for one series is fixed at tag_attribute_total_size, which you can change in iotdb-engine.properties. To update the value of a tag or attribute, we read all of them, change the value we want, and write them back to the same position. The new value must therefore not make the total length exceed tag_attribute_total_size.

The content in the tlog.txt will be like:
tagsSize (tag1=v1, tag2=v2) attributesSize (attr1=v1, attr2=v2)
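
To make the fixed-size record idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the actual tlog.txt format: the byte layout (an int32 count followed by length-prefixed UTF-8 keys and values), the zero padding, the function names, and the tiny block size of 64 bytes are all chosen for the example.

```python
import io
import struct

TAG_ATTRIBUTE_TOTAL_SIZE = 64  # assumed small value for illustration only


def _encode_map(m):
    # int32 entry count, then length-prefixed UTF-8 key and value per entry
    out = [struct.pack(">i", len(m))]
    for key, value in m.items():
        for s in (key, value):
            raw = s.encode("utf-8")
            out.append(struct.pack(">i", len(raw)))
            out.append(raw)
    return b"".join(out)


def _decode_map(buf, pos):
    (count,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    result = {}
    for _ in range(count):
        pair = []
        for _ in range(2):
            (length,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            pair.append(buf[pos:pos + length].decode("utf-8"))
            pos += length
        result[pair[0]] = pair[1]
    return result, pos


def encode_record(tags, attrs):
    body = _encode_map(tags) + _encode_map(attrs)
    if len(body) > TAG_ATTRIBUTE_TOTAL_SIZE:
        raise ValueError("tags and attributes exceed the fixed record size")
    return body.ljust(TAG_ATTRIBUTE_TOTAL_SIZE, b"\x00")  # pad to fixed size


def append_record(f, tags, attrs):
    f.seek(0, io.SEEK_END)
    offset = f.tell()  # this offset is what would be stored in mlog.txt
    f.write(encode_record(tags, attrs))
    return offset


def read_record(f, offset):
    f.seek(offset)
    buf = f.read(TAG_ATTRIBUTE_TOTAL_SIZE)
    tags, pos = _decode_map(buf, 0)
    attrs, _ = _decode_map(buf, pos)
    return tags, attrs


def update_record(f, offset, tags, attrs):
    f.seek(offset)  # rewrite in place; neighbouring records do not move
    f.write(encode_record(tags, attrs))


tlog = io.BytesIO()  # in-memory stand-in for the tlog.txt file
off1 = append_record(tlog, {"unit": "c"}, {"desc": "x"})
off2 = append_record(tlog, {"unit": "f"}, {})
tags, attrs = read_record(tlog, off1)
tags["unit"] = "k"
update_record(tlog, off1, tags, attrs)
```

Because every record occupies the same number of bytes, an update never shifts later records, so the offsets already saved elsewhere stay valid.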

The offset of the tag/attribute info for each time series in tlog.txt is saved in mlog.txt. That way, we only need to load the offsets into memory instead of the whole content of tlog.txt.
Currently, a create timeseries record in mlog.txt is one line like:

cmd, path, TSDataType, TSEncoding, CompressionType,[properties], [alias], [tag/attribute offset]
0 , root.turbine.d1.s1, 3, 2, 1, , , temperature, -1

If the timeseries has no tag/attribute info, the offset will be -1.
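
As a rough sketch of how such a line might be read, the Python snippet below pulls out the leading fixed fields and treats the last comma-separated field as the offset. The function name and the returned dictionary are hypothetical, and the optional fields in the middle are kept as raw strings because their exact layout is not assumed here.

```python
NO_TAG_OFFSET = -1  # sentinel from the text: no tag/attribute info


def parse_create_line(line):
    """Parse one create-timeseries line (illustrative, not IoTDB's parser)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 6:
        raise ValueError("too few fields in mlog line")
    cmd, path, data_type, encoding, compression = fields[:5]
    return {
        "cmd": int(cmd),
        "path": path,
        "data_type": int(data_type),
        "encoding": int(encoding),
        "compression": int(compression),
        "optional": fields[5:-1],    # left unparsed on purpose
        "offset": int(fields[-1]),   # position of the record in tlog.txt
    }


rec = parse_create_line("0 , root.turbine.d1.s1, 3, 2, 1, , , temperature, -1")
has_tag_info = rec["offset"] != NO_TAG_OFFSET
```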

There is one more step when initializing the MManager after a restart: we also load and deserialize the tag info, through the offsets in mlog.txt, into the inverted index map in the MManager.
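
The restart step could look roughly like the following Python sketch. The index shape (tag key -> tag value -> set of series paths) is an assumption about how an inverted index map might be organised, and build_tag_index, read_tags, and the sample data are hypothetical names made up for the example.

```python
from collections import defaultdict


def build_tag_index(series_offsets, read_tags):
    """Rebuild tag key -> tag value -> set of paths.

    series_offsets: iterable of (path, offset) pairs recovered from mlog.txt.
    read_tags: callable that returns the tag dict stored at a tlog offset.
    """
    index = defaultdict(lambda: defaultdict(set))
    for path, offset in series_offsets:
        if offset == -1:  # series has no tag/attribute info
            continue
        for key, value in read_tags(offset).items():
            index[key][value].add(path)
    return index


# Hypothetical stand-in for tlog.txt: offset -> tags stored at that offset.
fake_tlog = {0: {"unit": "c"}, 64: {"unit": "f", "site": "north"}}
entries = [
    ("root.turbine.d1.s1", 0),
    ("root.turbine.d1.s2", -1),
    ("root.turbine.d2.s1", 64),
]
index = build_tag_index(entries, fake_tlog.__getitem__)
```

Only tags go into the index; attributes stay on disk and are read through the offset when needed.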

Secondly, I will also extend the show timeseries syntax to:

SHOW TIMESERIES prefixPath? showWhereClause?

showWhereClause
    : WHERE (property | containsExpression)
    ;

containsExpression
    : name=ID OPERATOR_CONTAINS value=propertyValue
    ;

The property in the where clause must be a tag; otherwise an exception is thrown.

If there is no where clause, the show timeseries query is processed the same as before.
However, if there is a where clause, we use the inverted index map in the MManager to find all matching LeafMNodes and then filter them by the prefixPath.
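
A minimal Python sketch of that lookup is below. It assumes the index maps tag key -> tag value -> set of full paths, treats prefixPath as a plain dotted prefix without wildcards, and interprets the contains expression as a substring match on the tag value. The function name, the exception type, and the sample data are hypothetical.

```python
class NotATagError(Exception):
    """Raised when the where clause names a property that is not an indexed tag."""


def show_timeseries(index, prefix_path, key, value, contains=False):
    if key not in index:
        raise NotATagError(f"{key!r} is not a tag")
    by_value = index[key]
    if contains:
        # containsExpression: match any tag value that contains the text
        matched = set()
        for tag_value, paths in by_value.items():
            if value in tag_value:
                matched |= paths
    else:
        # property: exact match on the tag value
        matched = set(by_value.get(value, ()))
    # Filter by prefix on whole path segments, so d1 does not match d10.
    return sorted(
        p for p in matched
        if p == prefix_path or p.startswith(prefix_path + ".")
    )


tag_index = {
    "unit": {
        "celsius": {"root.turbine.d1.s1", "root.turbine.d10.s1"},
        "kelvin": {"root.turbine.d2.s1"},
    }
}
exact = show_timeseries(tag_index, "root.turbine.d1", "unit", "celsius")
fuzzy = show_timeseries(tag_index, "root", "unit", "el", contains=True)
```

The index lookup runs first and the prefix filter second, so the cost depends on how many series carry the tag rather than on the size of the whole metadata tree.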
