Posted to notifications@iotdb.apache.org by "Xiangdong Huang (Jira)" <ji...@apache.org> on 2020/08/18 02:59:00 UTC

[jira] [Created] (IOTDB-842) Better Import-CSV Tool

Xiangdong Huang created IOTDB-842:
-------------------------------------

             Summary: Better Import-CSV Tool
                 Key: IOTDB-842
                 URL: https://issues.apache.org/jira/browse/IOTDB-842
             Project: Apache IoTDB
          Issue Type: Task
          Components: Tools/Others
            Reporter: Xiangdong Huang


Hi, our import-csv tool is currently implemented on top of JDBC and requires a fixed format:

e.g., 
{code:java}
Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
{code}
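Since the tool goes through JDBC, each row presumably has to be turned into SQL that is sent to the server. Below is a minimal sketch of that conversion for the format above. It is not the tool's actual code, and the class and method names are hypothetical. It assumes the timestamp has already been parsed, emits one IoTDB insert statement per device, and skips null cells:

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the existing import-csv code.
class AlignedRowToSql {

  /**
   * Turns one data row of the "align all series" format into one INSERT
   * statement per device. header[0] is "Time"; every other header cell is a
   * full series path such as root.sg.d1.s1. Null or empty cells are skipped.
   * The already-parsed timestamp is passed in by the caller.
   */
  static List<String> toInsertStatements(String[] header, String[] row, long timestamp) {
    // device path -> (measurement -> raw value), keeping column order
    Map<String, Map<String, String>> byDevice = new LinkedHashMap<>();
    for (int i = 1; i < header.length; i++) {
      String value = i < row.length ? row[i].trim() : "";
      if (value.isEmpty() || value.equalsIgnoreCase("null")) {
        continue; // no data point for this series at this timestamp
      }
      String path = header[i].trim();
      int lastDot = path.lastIndexOf('.');
      if (lastDot <= 0) {
        throw new IllegalArgumentException("not a full series path: " + path);
      }
      byDevice.computeIfAbsent(path.substring(0, lastDot), d -> new LinkedHashMap<>())
          .put(path.substring(lastDot + 1), value);
    }
    List<String> statements = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> device : byDevice.entrySet()) {
      // Note: TEXT values would need quoting; this sketch copies values verbatim.
      statements.add("insert into " + device.getKey()
          + "(timestamp," + String.join(",", device.getValue().keySet()) + ")"
          + " values(" + timestamp + "," + String.join(",", device.getValue().values()) + ")");
    }
    return statements;
  }
}
{code}

Only root.sg.d1 has values in the first sample row, so that row produces a single statement. Grouping by device means each row costs at most one statement per device rather than one per series.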

Requirement 1:

Since we support three output formats (align all series, the default; align by device; and without alignment), it would be better for import-csv to accept the same three formats:

a. Align all series (the same as the current format):
{code:java}
Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
{code}

b. Align by device:
{code:java}
Time,Device,s1,s2,s3
2020-08-18T10:22:31.603+08:00,root.sg.d1,1,2.0,null
2020-08-18T10:22:35.631+08:00,root.sg.d1,1,2.0,null
2020-08-18T10:22:41.093+08:00,root.sg.d2,1,2.0,null
2020-08-18T10:22:52.603+08:00,root.sg.d2,1,2.0,true
{code}

c. Without alignment
(This format is unusual, so I'd prefer not to support it.)
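Formats a and b can be told apart from the header alone, because format b always has Device as its second column. The sketch below relies on that assumption; CsvLayout and its methods are hypothetical names. It detects the layout and expands a format b row into full series paths, so both formats could share the rest of the import path:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: telling format a from format b by the header, and
// expanding a format b row into full series paths.
class CsvLayout {

  enum Layout { ALIGN_ALL_SERIES, ALIGN_BY_DEVICE }

  /** Assumes format b has "Device" as its second header column; anything else is treated as format a. */
  static Layout detect(String[] header) {
    if (header.length >= 2 && header[1].trim().equalsIgnoreCase("Device")) {
      return Layout.ALIGN_BY_DEVICE;
    }
    return Layout.ALIGN_ALL_SERIES;
  }

  /**
   * For a format b row, maps each full series path (device + "." + measurement)
   * to its raw cell value. Null handling is left to the caller.
   */
  static Map<String, String> expandByDevice(String[] header, String[] row) {
    String device = row[1].trim();
    Map<String, String> values = new LinkedHashMap<>();
    for (int i = 2; i < header.length; i++) {
      String value = i < row.length ? row[i] : "";
      values.put(device + "." + header[i].trim(), value);
    }
    return values;
  }
}
{code}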

Requirement 2:
Different users may use different time formats in the first column, so the tool should support multiple time formats, e.g., by letting users define how to parse their timestamps with a pattern such as yyyy-MM-dd HH:mm:ss.SSS.
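One way to let users define their own timestamp format is to accept a java.time pattern and fall back to a configurable zone when the pattern has no offset. The sketch below assumes the pattern is supplied by the user; TimeColumnParser is a hypothetical name, and the zone fallback is an assumption:

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

// Hypothetical class: applying a user-supplied time pattern to the first column.
class TimeColumnParser {
  private final DateTimeFormatter formatter;
  private final ZoneId defaultZone;

  /** pattern uses java.time syntax; defaultZone is used when the pattern has no zone or offset. */
  TimeColumnParser(String pattern, ZoneId defaultZone) {
    this.formatter = DateTimeFormatter.ofPattern(pattern);
    this.defaultZone = defaultZone;
  }

  /** Parses the first column of a row into epoch milliseconds. */
  long toEpochMillis(String text) {
    // Prefer a zoned result if the pattern carries a zone/offset, else fall back to local time.
    TemporalAccessor parsed =
        formatter.parseBest(text.trim(), ZonedDateTime::from, LocalDateTime::from);
    if (parsed instanceof ZonedDateTime) {
      return ((ZonedDateTime) parsed).toInstant().toEpochMilli();
    }
    return ((LocalDateTime) parsed).atZone(defaultZone).toInstant().toEpochMilli();
  }
}
{code}

With this approach the current ISO-8601 column is just one pattern, yyyy-MM-dd'T'HH:mm:ss.SSSXXX. A value such as 2020-08-18 10:22:31.603, parsed with pattern yyyy-MM-dd HH:mm:ss.SSS and zone +08:00, maps to the same instant as 2020-08-18T10:22:31.603+08:00.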

Requirement 3:
Support both the literal null and an empty (or whitespace-only) field to represent a null data point. For example, the following three lines are equivalent:

2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null

2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,

2020-08-18T10:22:31.603+08:00,root.sg.d1,1,    ,
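Here is a minimal sketch of this null rule, assuming the literal null is matched case-insensitively; NullCells is a hypothetical name. The split uses a limit of -1 because String.split otherwise drops trailing empty strings, which would shorten the second example line to three fields. Quoted fields containing commas are not handled:

{code:java}
// Hypothetical helper for the null-cell rule; not the existing import-csv code.
class NullCells {

  /** A cell is null if it is missing, empty, whitespace-only, or the literal "null" (any case). */
  static boolean isNull(String cell) {
    if (cell == null) {
      return true;
    }
    String trimmed = cell.trim();
    return trimmed.isEmpty() || trimmed.equalsIgnoreCase("null");
  }

  /** Splits a line on commas while keeping trailing empty fields (limit -1). */
  static String[] splitKeepingEmpty(String line) {
    return line.split(",", -1);
  }
}
{code}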

Requirement 4:

Allow the storage group name to be declared once instead of repeating it on every line:

For example, for format b, users could tell the tool that the storage group is `root.sg`, and each row would then look like:

2020-08-18T10:22:35.631+08:00,d1,1,2.0,null

Another option is to add a new column called storage_group to each row.
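Both options reduce to prefixing the device cell with a storage group, whether that storage group comes from a single tool option or from a storage_group column. In the sketch below, DevicePathResolver is a hypothetical name. Treating cells that already start with root. as full paths is an assumption that would keep existing format b files importable:

{code:java}
// Hypothetical helper: building the full device path from a declared storage group.
class DevicePathResolver {

  /**
   * Returns the full device path for a format b Device cell, e.g.
   * storage group "root.sg" + cell "d1" -> "root.sg.d1".
   * Assumption: a cell that already starts with "root." is a full path and is kept as-is.
   */
  static String resolve(String storageGroup, String deviceCell) {
    String device = deviceCell.trim();
    if (device.startsWith("root.")) {
      return device;
    }
    if (storageGroup == null || storageGroup.trim().isEmpty()) {
      throw new IllegalArgumentException("no storage group given for device " + device);
    }
    return storageGroup.trim() + "." + device;
  }
}
{code}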



For unit tests (UT):
1. all data types should be covered;
2. malformed CSV input should be covered.
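For the tests, a row validator gives both kinds of case something concrete to assert on. The sketch below is hypothetical. Its DataType enum only mirrors IoTDB's type names (BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT) and is not the real class. The validator reports a wrong field count, or a value that does not parse as its column's type:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical validator for unit tests; not part of the existing tool.
class RowValidator {

  // Local enum mirroring IoTDB's data type names, not the real type class.
  enum DataType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT }

  static boolean parses(DataType type, String value) {
    try {
      switch (type) {
        case BOOLEAN: return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
        case INT32: Integer.parseInt(value); return true;
        case INT64: Long.parseLong(value); return true;
        case FLOAT: Float.parseFloat(value); return true;
        case DOUBLE: Double.parseDouble(value); return true;
        default: return true; // TEXT accepts any value
      }
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /**
   * Returns the problems found in one line; an empty list means the line is valid.
   * types[i] is the type of column i; types[0] (the Time column) is ignored.
   */
  static List<String> validate(String[] header, DataType[] types, String line) {
    List<String> problems = new ArrayList<>();
    String[] cells = line.split(",", -1);
    if (cells.length != header.length) {
      problems.add("expected " + header.length + " fields but found " + cells.length);
      return problems;
    }
    for (int i = 1; i < cells.length; i++) {
      String value = cells[i].trim();
      if (value.isEmpty() || value.equalsIgnoreCase("null")) {
        continue; // null data point, nothing to check
      }
      if (!parses(types[i], value)) {
        problems.add("column " + header[i] + ": '" + value + "' is not " + types[i]);
      }
    }
    return problems;
  }
}
{code}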




--
This message was sent by Atlassian Jira
(v8.3.4#803005)