Posted to dev@iotdb.apache.org by Julian Feinauer <j....@pragmaticminds.de> on 2020/06/09 07:22:26 UTC

JSON Input for IoTDB

Hi folks,

I already created issue https://issues.apache.org/jira/browse/IOTDB-742 in this direction, but I wanted to discuss the topic here as well.
Since it is now possible to have measurements and devices below a measurement, we could do a pretty much one-to-one mapping between JSON (or other structured data) and the IoTDB representation.

E.g.

{
  "temp": 20.0,
  "speed": 100,
  "design": {
    "color": "blue"
  }
}

This could be inserted into a FIELD mycar and would then simply become the series

- root.sg.dev.mycar.temp -> 20.0
- root.sg.dev.mycar.speed -> 100.0
- root.sg.dev.mycar.design.color -> "blue"
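
The mapping above can be sketched in a few lines of Python. This is only an illustration of the idea, not IoTDB code: the function name flatten and the prefix root.sg.dev.mycar are assumptions taken from the example, and numbers keep whatever type the JSON parser gives them.

```python
import json


def flatten(prefix, obj):
    """Yield (series_path, value) pairs for every leaf of a nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # A nested object becomes a deeper level in the series path.
            yield from flatten(path, value)
        else:
            yield path, value


doc = json.loads('{"temp": 20.0, "speed": 100, "design": {"color": "blue"}}')
series = dict(flatten("root.sg.dev.mycar", doc))

for path, value in series.items():
    print(path, "->", value)
```

Arrays are deliberately not handled here; they are the open question discussed next.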

This works as long as there are no arrays.
For arrays I see two possibilities.
Either store them as "two series":

root.sg.dev.mycar._idx
root.sg.dev.mycar.arrayvalue

With [1, 2, 4] being represented as

root.sg.dev.mycar._idx -> 0, 1, 2
root.sg.dev.mycar.arrayvalue -> 1, 2, 4
(all with equal timestamp)
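
A rough Python sketch of this first option, assuming a hypothetical helper array_to_index_value_series and an arbitrary timestamp. It only shows which (timestamp, value) points would end up in each of the two series:

```python
def array_to_index_value_series(prefix, name, values, timestamp):
    """Split one array into an index series and a value series.

    Every point carries the same timestamp, as described above.
    """
    idx_path = f"{prefix}._idx"
    val_path = f"{prefix}.{name}"
    return {
        idx_path: [(timestamp, i) for i in range(len(values))],
        val_path: [(timestamp, v) for v in values],
    }


# The timestamp 1 is an arbitrary placeholder for this illustration.
points = array_to_index_value_series("root.sg.dev.mycar", "arrayvalue", [1, 2, 4], 1)
for path, rows in points.items():
    print(path, "->", rows)
```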

Or we use a special naming convention, e.g. for an array

{
  "a": [1, 2, 4]
}

We would map it to three series

root.sg.dev.mycar.a_0 <- 1
root.sg.dev.mycar.a_1 <- 2
root.sg.dev.mycar.a_2 <- 4
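
As a minimal sketch of this second option (the helper name array_to_indexed_series is made up for illustration), each element simply gets its own series with the index appended to the name:

```python
def array_to_indexed_series(prefix, name, values):
    """Map each array element to its own series named <name>_<index>."""
    return {f"{prefix}.{name}_{i}": v for i, v in enumerate(values)}


indexed = array_to_indexed_series("root.sg.dev.mycar", "a", [1, 2, 4])
for path, value in indexed.items():
    print(path, "<-", value)
```

One thing to keep in mind with such a convention: a JSON object that already has a key literally named a_0 next to the array a would collide with the generated names.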

What do you think about that?

Julian





Re: JSON Input for IoTDB

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hey Jialin,

thanks for your response.
I indeed took the second path you suggested.

Currently I am trying to implement it like type erasure in Java: at the PlanExecutor level a mapping to "plain IoTDB types" is done, and from there on it is a regular insert. Only the PlanExecutor may know that it was a structure before.

This makes it easy to implement.
So we could add another "type" based structure if we like and parse it at that level.
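
To illustrate the idea (this is not the actual PlanExecutor code), here is a small Python sketch. The function names erase_structure and plain_type are hypothetical, and the choice of which plain type a JSON value maps to is an assumption made only for this example:

```python
def plain_type(value):
    """Pick a plain type name for a JSON leaf value (assumed mapping)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "TEXT"
    raise TypeError(f"unsupported leaf value: {value!r}")


def erase_structure(prefix, obj):
    """Turn a nested structure into flat (path, type, value) rows.

    After this step nothing remembers that the input was a structure;
    every row could be handled like a regular insert.
    """
    rows = []
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            rows.extend(erase_structure(path, value))
        else:
            rows.append((path, plain_type(value), value))
    return rows


rows = erase_structure(
    "root.sg.dev.mycar",
    {"temp": 20.0, "speed": 100, "design": {"color": "blue"}},
)
for row in rows:
    print(row)
```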

See my comments here: https://issues.apache.org/jira/browse/IOTDB-742?focusedCommentId=17134154&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17134154

Julian

On 10.06.20, 10:22, "Jialin Qiao" <qj...@mails.tsinghua.edu.cn> wrote:

    Hi,

    Good idea! This may help IoTDB manage GPS data or other semi-structured data.

    The mapping for JSON data looks good to me.

    For arrays, since every element of the array shares the same timestamp, creating one time series per array element would store the same timestamp many times. Maybe we could consider converting the array into a Binary, or extending TSDataType and TsFile to support arrays natively.

    Besides, it also depends on the query pattern.
    How will users query the array? Will the query look like "select array[1] from root.sg.d" or "select array from root.sg.d"?

    Thanks,
    --
    Jialin Qiao
    School of Software, Tsinghua University

    乔嘉林
    清华大学 软件学院



Re: JSON Input for IoTDB

Posted by Jialin Qiao <qj...@mails.tsinghua.edu.cn>.
Hi,

Good idea! This may help IoTDB manage GPS data or other semi-structured data.

The mapping for JSON data looks good to me.

For arrays, since every element of the array shares the same timestamp, creating one time series per array element would store the same timestamp many times. Maybe we could consider converting the array into a Binary, or extending TSDataType and TsFile to support arrays natively.

Besides, it also depends on the query pattern.
How will users query the array? Will the query look like "select array[1] from root.sg.d" or "select array from root.sg.d"?
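
As a rough sketch of the "convert the array into a Binary" idea, the Python below packs an integer array into a single blob, so only one value (and one timestamp) would be stored per array. The byte layout (a 4-byte length followed by 8-byte signed integers, big-endian) and the function names are assumptions for illustration only, not an IoTDB or TsFile format. The two read functions correspond to the two query patterns mentioned above.

```python
import struct


def encode_array(values):
    """Pack a list of integers into one binary value (assumed layout)."""
    # 4-byte unsigned length, then one 8-byte signed integer per element.
    return struct.pack(f">I{len(values)}q", len(values), *values)


def decode_array(blob):
    """Read the whole array back, as "select array" would need."""
    (n,) = struct.unpack_from(">I", blob, 0)
    return list(struct.unpack_from(f">{n}q", blob, 4))


def element_at(blob, i):
    """Read a single element, as "select array[1]" would need."""
    (n,) = struct.unpack_from(">I", blob, 0)
    if not 0 <= i < n:
        raise IndexError(i)
    # Skip the 4-byte length and i elements of 8 bytes each.
    (value,) = struct.unpack_from(">q", blob, 4 + 8 * i)
    return value


blob = encode_array([1, 2, 4])
print(len(blob), decode_array(blob), element_at(blob, 1))
```

With a fixed-width layout like this, a single element can be read without decoding the whole array. A variable-width encoding would save space but make the array[1] style of query more expensive.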

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院