Posted to notifications@iotdb.apache.org by "Xiangdong Huang (Jira)" <ji...@apache.org> on 2021/08/12 08:44:00 UTC
[jira] [Commented] (IOTDB-842) Better Export/Import-CSV Tool

    [ https://issues.apache.org/jira/browse/IOTDB-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397933#comment-17397933 ] 

Xiangdong Huang commented on IOTDB-842:
---------------------------------------

Known GitHub issues:


  - "After exporting data with export-csv and then importing it with import-csv, the data import fails" https://github.com/apache/iotdb/issues/3587
  - "import csv encounter an error" https://github.com/apache/iotdb/issues/3051
  - "Can there be a feature to import and export metadata? (Could IoTDB export the schema and import into another IoTDB?)" https://github.com/apache/iotdb/issues/1263

> Better Export/Import-CSV Tool
> -----------------------------
>
>                 Key: IOTDB-842
>                 URL: https://issues.apache.org/jira/browse/IOTDB-842
>             Project: Apache IoTDB
>          Issue Type: Task
>          Components: Tools/Others
>            Reporter: Xiangdong Huang
>            Priority: Minor
>
> Hi, our import-csv tool is currently implemented via JDBC and requires a rigid, outdated format:
> e.g., 
> {code:java}
> Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
> 2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
> 2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
> 2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
> 2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
> {code}
> Requirement 1:
> Since we support 3 kinds of output format (align all series, which is the default; align by device; and without alignment), it would be better for import-csv to support the same 3 formats:
> a. 
> {code:java}
> Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
> 2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
> 2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
> 2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
> 2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
> {code}
> b. 
> {code:java}
> Time,Device,s1,s2,s3
> 2020-08-18T10:22:31.603+08:00,root.sg.d1,1,2.0,null
> 2020-08-18T10:22:35.631+08:00,root.sg.d1,1,2.0,null
> 2020-08-18T10:22:41.093+08:00,root.sg.d2,1,2.0,null
> 2020-08-18T10:22:52.603+08:00,root.sg.d2,1,2.0,true
> {code}
> c.
> (The format without alignment is strange; I would prefer not to support it.)
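
A minimal sketch of how a reader could accept both format a and format b, assuming the layout is recognised by a second header column named "Device". The function name read_points is hypothetical and this is not the tool's actual implementation; it only normalises each cell to a (time, full series path, raw value) triple.

```python
import csv
import io


def read_points(csv_text):
    """Yield (time, series_path, raw_value) for either header layout.

    Assumption: format b is detected by a second header column named "Device".
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    by_device = len(header) > 1 and header[1] == "Device"
    for row in reader:
        if not row:
            continue  # skip blank lines
        time = row[0]
        if by_device:
            # Format b: the device is a cell, the header holds measurement names.
            device = row[1]
            for name, value in zip(header[2:], row[2:]):
                yield time, f"{device}.{name}", value
        else:
            # Format a: the header already holds full series paths.
            for path, value in zip(header[1:], row[1:]):
                yield time, path, value
```

With this shape, the same rows written in format a or in format b produce the same stream of points, so the rest of the import path would not need to know which layout was used.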
> Requirement 2:
> Different users may have different time formats for the first column.
> So we should support multiple time formats, e.g., by letting users define how their timestamps are parsed: yyyy-MM-ddHH:mm:ss.SSS, etc.
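
One way Requirement 2 might look, sketched in Python for brevity. The function name parse_time and its parameters are hypothetical; the pattern here uses Python's strptime directives rather than the Java-style letters quoted above, and treating timestamps without an offset as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_time(text, pattern=None, default_tz=timezone.utc):
    """Return epoch milliseconds for one timestamp cell.

    pattern: optional user-supplied strptime pattern; when omitted, the
    ISO 8601 form used in the examples is expected.
    default_tz: assumed zone for timestamps that carry no offset.
    """
    text = text.strip()
    if pattern is None:
        dt = datetime.fromisoformat(text)
    else:
        dt = datetime.strptime(text, pattern)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=default_tz)
    # Integer division keeps the result exact (no float rounding).
    return (dt - _EPOCH) // timedelta(milliseconds=1)
```

For example, parse_time("2020-08-18 02:22:31.603", "%Y-%m-%d %H:%M:%S.%f") and parse_time("2020-08-18T10:22:31.603+08:00") refer to the same instant under the UTC default.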
> Requirement 3:
> Support NULL as well as an empty (or whitespace-only) field to denote a null data point. For example, the following 3 lines are equivalent:
> 2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null
> 2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,
> 2020-08-18T10:22:31.603+08:00,root.sg.d1,1,    ,
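
Requirement 3 could be checked with something as small as the sketch below. The helper names is_null and normalize_row are hypothetical, and matching "null" case-insensitively is an assumption beyond what is stated above.

```python
def is_null(cell):
    """True for an empty cell, a whitespace-only cell, or the word null.

    Assumption: "null" is matched case-insensitively (null, NULL, Null).
    """
    return cell.strip().lower() in ("", "null")


def normalize_row(line):
    """Split one CSV line naively on commas and map null-like cells to None."""
    return [None if is_null(cell) else cell for cell in line.split(",")]
```

Applied to the three example lines above, normalize_row gives the same list each time, with None in the last two positions.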
> Requirement 4:
> Support declaring the storage group name once rather than repeating it on every line:
> e.g., for format b, we can tell the tool the sg is `root.sg` and then each row looks like:
> 2020-08-18T10:22:35.631+08:00,d1,1,2.0,null
> Another option is to add a new column called storage_group to each row.
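
A small sketch of the first option in Requirement 4, assuming the storage group is passed to the tool once and prepended to each short device name. The function name full_device_path is hypothetical, and leaving already-qualified device names untouched is an assumption, not something the issue specifies.

```python
def full_device_path(device, storage_group=None):
    """Return the full device path for one row.

    storage_group: the prefix declared once for the whole file, or None
    when rows already carry full paths.
    Assumption: a device that already starts with the storage group is
    left unchanged, so mixed files still import.
    """
    device = device.strip()
    if storage_group is None:
        return device
    if device == storage_group or device.startswith(storage_group + "."):
        return device
    return f"{storage_group}.{device}"
```

With storage_group set to "root.sg", the row shown above with device d1 resolves to root.sg.d1, matching the fully qualified form in format b.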
> For unit tests (UT):
> 1. all data types should be covered;
> 2. incorrectly formatted CSV input should be covered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)