Posted to dev@iotdb.apache.org by 魏祥威 <52...@qq.com> on 2020/02/09 09:29:36 UTC
Re: [DISCUSS] Table schema of group by device
Hi,
I agree with Xiangdong Huang.
(3) is the friendliest option for users who are used to relational databases. If they want a relational-style query (a group by device query), their applications should guarantee that each column has a consistent data type.
Best,
Xiangwei Wei
------------------ Original Message ------------------
From: "Xiangdong Huang"<sainthxd@gmail.com>;
Sent: Friday, February 7, 2020, 2:58 PM
To: "dev"<dev@iotdb.apache.org>;
Subject: Re: [DISCUSS] Table schema of group by device
One more suggestion: "align by device" is clearer than "group by
device".
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东
清华大学 软件学院
Xiangdong Huang <sainthxd@gmail.com> wrote on Fri, Feb 7, 2020 at 2:56 PM:
> -1 for (2), forever. I don't think I will ever vote +1 for it...
>
> If you do it like that, there is no chance to replace applications
> that use a relational DB to manage time series data.
>
> (3) is the friendliest for developers who use relational DBs,
> because when they write SQL like "select c1, c2, c3 FROM", they naturally
> expect the result set to have 3 columns...
>
> Of course, for users who are using an RDB and want a table like "Time,
> DeviceId, s1, s2", their applications can guarantee that the data type of
> s2 stays constant.
> If s2 holds many data types, RDB users may use the "text" or
> "varchar2" format directly.
>
> Considering that, I think the choice is: if all data in a column has the
> same data type, use that data type. Otherwise use String.
>
> (1) Well, it can be an option. But my suggestion is: if all data in a
> column has the same data type, do not change its column name.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> Jialin Qiao <qiaojialin@apache.org> wrote on Fri, Feb 7, 2020 at 2:29 PM:
>
>> Hi,
>>
>> In IOTDB-243 [1], we want to allow creating measurements with the same
>> name but different types in the same storage group.
>>
>> For example,
>> root.sg1.d1.s1 int32
>> root.sg1.d1.s2 int32
>> root.sg1.d2.s1 boolean
>> root.sg1.d2.s2 int32
>>
>> This may cause trouble in group by device queries. How should we organize
>> the result (table schema)? I thought of three ways:
>>
>> (1) Time, Device, s1_int32, s1_boolean, s2_int32
>>
>> * advantage:
>> - No ambiguity
>> - The number of columns is acceptable.
>>
>> * disadvantage:
>> - In most cases, the datatype indicator is redundant and weird.
>> - Difficult to use parallelization among devices in the query.
>>
>> (2) Time, d1, s1, s2 Time, d2, s1, s2
>>
>> * advantage:
>> - No ambiguity
>> - This could leverage the parallelization among devices in the query.
>>
>> * disadvantage:
>> - The number of columns may be large.
>>
>> (3) Time, DeviceId, s1, s2
>>
>> This may require a lot of work in the QueryDataSet, and users need to read
>> each value carefully according to the measurement type of the device.
>> Otherwise, it may cause a RuntimeException in the JDBC client.
>>
>> * advantage:
>> - The number of columns is minimal.
>>
>> * disadvantage:
>> - May cause ambiguity: a column of one table can have more than one type,
>> which also conflicts with the Spark connector and Hive connector.
>> - Difficult to use parallelization in the query.
>>
>> _______________
>>
>> From my perspective, I prefer (1) ≈ (2) > (3).
>>
>> What's your opinion?
>>
>> [1] https://issues.apache.org/jira/browse/IOTDB-243
>>
>> Thanks,
>> —————————————————
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>
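To make the three candidate layouts from the original proposal concrete, here is a minimal Python sketch that derives each header from the example schema. It is only an illustration: the names SCHEMA, split_path and header_option_* are hypothetical and not part of IoTDB, and option (2) is read literally as one column group per device, since the thread does not say whether those groups form separate tables or one wide table.

```python
# Hypothetical example schema taken from the thread: full path -> data type.
SCHEMA = {
    "root.sg1.d1.s1": "int32",
    "root.sg1.d1.s2": "int32",
    "root.sg1.d2.s1": "boolean",
    "root.sg1.d2.s2": "int32",
}


def split_path(path):
    """Split 'root.sg1.d1.s1' into device 'root.sg1.d1' and measurement 's1'."""
    device, _, measurement = path.rpartition(".")
    return device, measurement


def header_option_1(schema):
    """(1) One column per distinct (measurement, type) pair."""
    pairs = sorted({(split_path(p)[1], t) for p, t in schema.items()})
    return ["Time", "Device"] + [f"{m}_{t}" for m, t in pairs]


def header_option_2(schema):
    """(2) One column group per device, so the width grows with the devices."""
    groups = {}
    for path in sorted(schema):
        device, measurement = split_path(path)
        groups.setdefault(device, ["Time", device]).append(measurement)
    return list(groups.values())


def header_option_3(schema):
    """(3) One column per distinct measurement name, shared by all devices."""
    names = sorted({split_path(p)[1] for p in schema})
    return ["Time", "Device"] + names
```

For the example schema, option (1) gives Time, Device, s1_boolean, s1_int32, s2_int32, while option (3) gives Time, Device, s1, s2, where the s1 column mixes int32 and boolean values.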
Re: [DISCUSS] Table schema of group by device
Posted by Xiangdong Huang <sa...@gmail.com>.
Hi Jialin,
Very glad that you can agree with that. :-D
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东
清华大学 软件学院
Jialin Qiao <qi...@apache.org> wrote on Tue, Feb 11, 2020 at 5:50 PM:
> Hi,
>
> If we use text when a column has multiple types, I'm ok with (3).
>
> Thanks,
> —————————————————
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
Re: [DISCUSS] Table schema of group by device
Posted by Jialin Qiao <qi...@apache.org>.
Hi,
If we use text when a column has multiple types, I'm ok with (3).
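The rule can be sketched in a few lines of Python, assuming the schema is available as a mapping from full path to data type. This is only an illustration of the idea: resolve_column_types and example_schema are hypothetical names, and the lowercase "text" label stands in for whatever string type the server would actually report.

```python
from collections import defaultdict


def resolve_column_types(schema):
    """Pick one column type per measurement name for layout (3).

    If every device declares the same type for a measurement, keep that
    type. Otherwise fall back to "text" (an assumed label for a string
    column).
    """
    seen = defaultdict(set)
    for path, data_type in schema.items():
        measurement = path.rpartition(".")[2]  # last path segment, e.g. "s1"
        seen[measurement].add(data_type)
    return {
        name: next(iter(types)) if len(types) == 1 else "text"
        for name, types in seen.items()
    }


# Hypothetical schema taken from the example earlier in this thread.
example_schema = {
    "root.sg1.d1.s1": "int32",
    "root.sg1.d1.s2": "int32",
    "root.sg1.d2.s1": "boolean",
    "root.sg1.d2.s2": "int32",
}
print(resolve_column_types(example_schema))  # {'s1': 'text', 's2': 'int32'}
```

Here s1 falls back to text because d1 and d2 disagree on its type, while s2 keeps int32 and its column name is unchanged.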
Thanks,
—————————————————
Jialin Qiao
School of Software, Tsinghua University
乔嘉林
清华大学 软件学院
魏祥威 <52...@qq.com> wrote on Sun, Feb 9, 2020 at 5:30 PM:
> Hi,
>
> I agree with Xiangdong Huang.
>
> (3) is the friendliest option for users who are used to relational
> databases. If they want a relational-style query (a group by device
> query), their applications should guarantee that each column has a
> consistent data type.
>
> Best,
> Xiangwei Wei