Posted to notifications@iotdb.apache.org by "Xiangdong Huang (Jira)" <ji...@apache.org> on 2020/08/18 02:59:00 UTC
[jira] [Created] (IOTDB-842) Better Import-CSV Tool
Xiangdong Huang created IOTDB-842:
-------------------------------------
Summary: Better Import-CSV Tool
Key: IOTDB-842
URL: https://issues.apache.org/jira/browse/IOTDB-842
Project: Apache IoTDB
Issue Type: Task
Components: Tools/Others
Reporter: Xiangdong Huang
Hi, our import-csv tool is currently implemented on top of JDBC and accepts only one rigid format:
e.g.,
{code:java}
Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
{code}
Requirement 1:
Since we support three output formats (align all series, which is the default; align by device; and without alignment), the import-csv tool should support the same three formats:
a.
{code:java}
Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
{code}
b.
{code:java}
Time,Device,s1,s2,s3
2020-08-18T10:22:31.603+08:00,root.sg.d1,1,2.0,null
2020-08-18T10:22:35.631+08:00,root.sg.d1,1,2.0,null
2020-08-18T10:22:41.093+08:00,root.sg.d2,1,2.0,null
2020-08-18T10:22:52.603+08:00,root.sg.d2,1,2.0,true
{code}
c.
(The format without alignment is strange; I would prefer not to support it.)
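As a rough illustration of how a reader could accept both format a and format b, the sketch below turns either layout into (time, series path, raw value) records. It assumes that format b is recognized by a second header column named exactly `Device`; the function name `read_records` is hypothetical and is not part of the existing tool.

```python
import csv
import io


def read_records(text):
    """Yield (time, series_path, raw_value) from either CSV layout.

    Assumption: a header whose second column is "Device" means
    format b (align by device); anything else is treated as format a.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    by_device = len(header) > 1 and header[1] == "Device"
    for row in reader:
        if not row:
            continue  # skip blank lines
        if by_device:
            time, device = row[0], row[1]
            # Measurement names come from the header, the device from the row.
            for name, value in zip(header[2:], row[2:]):
                yield time, f"{device}.{name}", value
        else:
            time = row[0]
            # The header already holds the full series paths.
            for path, value in zip(header[1:], row[1:]):
                yield time, path, value
```

Values are returned as raw strings here; null handling and type conversion are left to later steps.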
Requirement 2:
Different users may have different time formats for the first column.
So we should support different time formats, e.g., by letting users define the pattern used to parse their timestamps, such as yyyy-MM-ddHH:mm:ss.SSS.
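One possible shape for user-defined time parsing is sketched below: the user supplies a list of patterns and the first one that matches wins. This is only an illustration in Python, so it uses `strptime` directives rather than the Java-style pattern quoted above; the name `parse_time` and the choice of UTC for timestamps without an offset are assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_time(text, formats, default_tz=timezone.utc):
    """Return epoch milliseconds using the first matching format.

    Assumption: timestamps without an explicit offset are interpreted
    in default_tz (UTC unless the caller says otherwise).
    """
    for fmt in formats:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next user-supplied pattern
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=default_tz)
        # Integer division avoids floating-point rounding of milliseconds.
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"no configured time format matches {text!r}")
```

For example, `parse_time("2020-08-18T10:22:31.603+08:00", ["%Y-%m-%dT%H:%M:%S.%f%z"])` and `parse_time("2020-08-18 02:22:31.603", ["%Y-%m-%d %H:%M:%S.%f"])` refer to the same instant under these assumptions.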
Requirement 3:
Support NULL as well as an empty (or blank) field to represent a null data point. For example, the following 3 lines are equivalent:
2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null
2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,
2020-08-18T10:22:31.603+08:00,root.sg.d1,1, ,
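The equivalence of the three lines above could be implemented with a small normalization step, sketched here. The helper name `normalize` is hypothetical, and the sketch assumes that the literal text "null" is matched case-insensitively and is never a legitimate text value.

```python
import csv


def normalize(field):
    """Map 'null' (any case), empty, and whitespace-only fields to None."""
    stripped = field.strip()
    if stripped == "" or stripped.lower() == "null":
        return None
    return stripped


lines = [
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null",
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,",
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1, ,",
]
# csv.reader accepts any iterable of lines, so a list works directly.
rows = [[normalize(f) for f in row] for row in csv.reader(lines)]
```

After normalization all three rows hold the same values, with `None` in the last two columns.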
Requirement 4:
Support declaring the storage group name once rather than repeating it on every line:
e.g., for format b, we can tell the tool the sg is `root.sg` and then each row looks like:
2020-08-18T10:22:35.631+08:00,d1,1,2.0,null
Another option is to add a new column called storage_group to each row.
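A minimal sketch of the first option follows: the storage group is given once and prepended to each device name. The helper name `qualify_device` is hypothetical, and leaving names that already start with `root.` untouched is an assumption made for this illustration, not something decided in this issue.

```python
def qualify_device(device, storage_group=None):
    """Prefix a short device name with the storage group declared once.

    Assumption: a device that already starts with "root." is treated as
    fully qualified and returned unchanged.
    """
    device = device.strip()
    if storage_group is None or device.startswith("root."):
        return device
    return f"{storage_group}.{device}"
```

With the storage group set to `root.sg`, the short row above would resolve `d1` to `root.sg.d1`.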
For unit tests (UT):
1. all data types should be covered;
2. incorrectly formatted CSV input should be covered.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)