Posted to commits@iotdb.apache.org by ha...@apache.org on 2022/04/01 01:17:12 UTC
[iotdb] 01/01: Update python client doc and code for NumpyTablet

This is an automated email from the ASF dual-hosted git repository.

haonan pushed a commit to branch pyexm
in repository https://gitbox.apache.org/repos/asf/iotdb.git

commit 669a6b106bff35783da8077743a008b75200e46c
Author: Haonan <hh...@outlook.com>
AuthorDate: Fri Apr 1 08:09:30 2022 +0800

    Update python client doc and code for NumpyTablet
---
 client-py/README.md                                | 277 ++++++++++++++++++---
 client-py/SessionExample.py                        |   2 +-
 client-py/SessionTest.py                           |   2 +-
 .../UserGuide/API/Programming-Python-Native-API.md |  19 +-
 .../UserGuide/API/Programming-Python-Native-API.md |  17 +-
 5 files changed, 266 insertions(+), 51 deletions(-)

diff --git a/client-py/README.md b/client-py/README.md
index d5880af..65492cf 100644
--- a/client-py/README.md
+++ b/client-py/README.md
@@ -39,27 +39,29 @@ architecture, high performance and rich feature set together with its deep integ
 Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, 
 high-speed data ingestion and complex data analysis in the IoT industrial fields.
 
+## Python Native API
 
-# Apache IoTDB Python Client API
+### Requirements
 
-Using the package, you can write data to IoTDB, read data from IoTDB and maintain the schema of IoTDB.
+You have to install thrift (>=0.13) before using the package.
 
-## Requirements
 
-You have to install thrift (>=0.13) before using the package.
 
-## How to use (Example)
+### How to use (Example)
+
+First, install the latest package: `pip3 install apache-iotdb`
 
-First, download the package: `pip3 install apache-iotdb`
+*Notice: if you are installing Python API v0.13.0, DO NOT run `pip install apache-iotdb==0.13.0`; run `pip install apache-iotdb==0.13.0.post1` instead!*
 
-You can get an example of using the package to read and write data at here: [Example](https://github.com/apache/iotdb/blob/rel/0.11/client-py/src/SessionExample.py)
+An example of using the package to read and write data is available here: [Example](https://github.com/apache/iotdb/blob/master/client-py/SessionExample.py)
+
+An example of aligned timeseries: [Aligned Timeseries Session Example](https://github.com/apache/iotdb/blob/master/client-py/SessionAlignedTimeseriesExample.py)
 
 (you need to add `import iotdb` in the head of the file)
 
 Or:
 
 ```python
-
 from iotdb.Session import Session
 
 ip = "127.0.0.1"
@@ -70,29 +72,208 @@ session = Session(ip, port_, username_, password_)
 session.open(False)
 zone = session.get_time_zone()
 session.close()
+```
+
+### Initialization
+
+* Initialize a Session
+
+```python
+session = Session(ip, port_, username_, password_, fetch_size=1024, zone_id="UTC+8")
+```
+
+* Open a session, with a parameter to specify whether to enable RPC compression
 
+```python
+session.open(enable_rpc_compression=False)
 ```
 
-## IoTDB Testcontainer
+Notice: the RPC compression setting of the client must match that of the IoTDB server.
 
-The Test Support is based on the lib `testcontainers` (https://testcontainers-python.readthedocs.io/en/latest/index.html) which you need to install in your project if you want to use the feature.
+* Close a Session
 
-To start (and stop) an IoTDB Database in a Docker container simply do:
+```python
+session.close()
 ```
-class MyTestCase(unittest.TestCase):
 
-    def test_something(self):
-        with IoTDBContainer() as c:
-            session = Session('localhost', c.get_exposed_port(6667), 'root', 'root')
-            session.open(False)
-            result = session.execute_query_statement("SHOW TIMESERIES")
-            print(result)
-            session.close()
+### Data Definition Interface (DDL Interface)
+
+#### Storage Group Management
+
+* Set storage group
+
+```python
+session.set_storage_group(group_name)
 ```
 
-by default it will load the image `apache/iotdb:latest`, if you want a specific version just pass it like e.g. `IoTDBContainer("apache/iotdb:0.12.0")` to get version `0.12.0` running.
+* Delete one or several storage groups
+
+```python
+session.delete_storage_group(group_name)
+session.delete_storage_groups(group_name_lst)
+```
+
+#### Timeseries Management
+
+* Create one or multiple timeseries
+
+```python
+session.create_time_series(ts_path, data_type, encoding, compressor,
+    props=None, tags=None, attributes=None, alias=None)
+      
+session.create_multi_time_series(
+    ts_path_lst, data_type_lst, encoding_lst, compressor_lst,
+    props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None
+)
+```
+
+* Create aligned timeseries
+
+```python
+session.create_aligned_time_series(
+    device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst
+)
+```
+
+Attention: aliases of measurements are currently **not supported**.
+
+* Delete one or several timeseries
+
+```python
+session.delete_time_series(paths_list)
+```
+
+* Check whether the specific timeseries exists
+
+```python
+session.check_time_series_exists(path)
+```
+
+### Data Manipulation Interface (DML Interface)
+
+#### Insert
+
+It is recommended to use `insert_tablet` to improve write efficiency.
+
+* Insert a Tablet, which holds multiple rows of one device where every row has the same measurements
+    * **Better Write Performance**
+    * **Support null values**: fill each null position with any placeholder value, then mark it as null in a BitMap (since v0.13)
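The BitMap-based null handling described above can be sketched in plain Python, assuming one bit per row in which a set bit marks that row's value as null. `NullBitMap` is a hypothetical illustration, not the client's own BitMap class, whose internal layout may differ.

```python
class NullBitMap:
    """Hypothetical one-bit-per-row null marker (not the iotdb client's BitMap)."""

    def __init__(self, size: int) -> None:
        self.size = size
        # One byte holds the null flags for eight consecutive rows.
        self.bits = bytearray((size + 7) // 8)

    def mark(self, row: int) -> None:
        # Set the bit for `row` to flag its value as null.
        self.bits[row // 8] |= 1 << (row % 8)

    def is_marked(self, row: int) -> bool:
        return bool(self.bits[row // 8] & (1 << (row % 8)))


# A column whose value at row 2 is missing: store a placeholder, then mark it.
column = [10, 100, 0, 0]  # row 2 holds the placeholder 0
nulls = NullBitMap(len(column))
nulls.mark(2)
print([None if nulls.is_marked(i) else v for i, v in enumerate(column)])
# [10, 100, None, 0]
```

The placeholder value itself is never interpreted; only the bit decides whether a row counts as null.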
 
-## Pandas Support
+
+The Python API provides two implementations of Tablet.
+
+* Normal Tablet
+
+```python
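+from iotdb.utils.Tablet import Tablet
+
+# One row per timestamp; device_id, measurements_ and data_types_ must describe
+# the six columns (BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT).
+values_ = [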
+    [False, 10, 11, 1.1, 10011.1, "test01"],
+    [True, 100, 11111, 1.25, 101.0, "test02"],
+    [False, 100, 1, 188.1, 688.25, "test03"],
+    [True, 0, 0, 0, 6.25, "test04"],
+]
+timestamps_ = [1, 2, 3, 4]
+tablet_ = Tablet(
+    device_id, measurements_, data_types_, values_, timestamps_
+)
+session.insert_tablet(tablet_)
+```
+* Numpy Tablet
+
+Compared with Tablet, Numpy Tablet uses [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record data.
+Because it has a smaller memory footprint and lower serialization cost, insert performance is better.
+
+**Notice**
+1. The time column and each value column in the Tablet are ndarrays
+2. The ndarrays must use big-endian dtypes (e.g. `np.dtype('>i4')`), as in the example below
+
+```python
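+import numpy as np
+
+from iotdb.utils.IoTDBConstants import TSDataType
+from iotdb.utils.NumpyTablet import NumpyTablet
+
+measurements_ = ["s_01", "s_02", "s_03", "s_04", "s_05", "s_06"]
+data_types_ = [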
+    TSDataType.BOOLEAN,
+    TSDataType.INT32,
+    TSDataType.INT64,
+    TSDataType.FLOAT,
+    TSDataType.DOUBLE,
+    TSDataType.TEXT,
+]
+# One big-endian ndarray per column; the TEXT column is a plain string ndarray.
+np_values_ = [
+    np.array([False, True, False, True], np.dtype('>?')),
+    np.array([10, 100, 100, 0], np.dtype('>i4')),
+    np.array([11, 11111, 1, 0], np.dtype('>i8')),
+    np.array([1.1, 1.25, 188.1, 0], np.dtype('>f4')),
+    np.array([10011.1, 101.0, 688.25, 6.25], np.dtype('>f8')),
+    np.array(["test01", "test02", "test03", "test04"]),
+]
+np_timestamps_ = np.array([1, 2, 3, 4], np.dtype('>i8'))
+np_tablet_ = NumpyTablet(
+    "root.sg_test_01.d_02", measurements_, data_types_, np_values_, np_timestamps_
+)
+session.insert_tablet(np_tablet_)
+```
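The two Tablet flavors differ mainly in orientation and byte order. The sketch below uses only the standard library: it transposes the row-oriented `values_` of the normal Tablet into one sequence per measurement, then packs the INT32 column as big-endian bytes, which is what a `'>i4'` ndarray holds in memory. It only illustrates the data layout; it is not how the client serializes a NumpyTablet internally.

```python
import struct

# Row-oriented values, as passed to the normal Tablet (one list per timestamp).
rows = [
    [False, 10, 11, 1.1, 10011.1, "test01"],
    [True, 100, 11111, 1.25, 101.0, "test02"],
    [False, 100, 1, 188.1, 688.25, "test03"],
    [True, 0, 0, 0, 6.25, "test04"],
]

# A column-oriented tablet needs one sequence per measurement,
# so transpose the rows into columns.
columns = [list(col) for col in zip(*rows)]

# Serialize the INT32 column (index 1) as big-endian 4-byte integers.
# These are the same bytes np.array(columns[1], np.dtype('>i4')).tobytes() returns.
int32_column = columns[1]
int32_bytes = struct.pack(f">{len(int32_column)}i", *int32_column)

print(int32_column)     # [10, 100, 100, 0]
print(int32_bytes[:4])  # b'\x00\x00\x00\n'
```

For an integer array that is already in native byte order, `arr.astype('>i4')` returns a big-endian copy.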
+
+* Insert multiple Tablets
+
+```python
+session.insert_tablets(tablet_lst)
+```
+
+* Insert a Record
+
+```python
+session.insert_record(device_id, timestamp, measurements_, data_types_, values_)
+```
+
+* Insert multiple Records
+
+```python
+session.insert_records(
+    device_ids_, time_list_, measurements_list_, data_type_list_, values_list_
+)
+```
+
+* Insert multiple Records that belong to the same device.
+  Because type information is provided, the server does not need to infer types, which leads to better performance.
+
+
+```python
+session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list)
+```
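`insert_records_of_one_device` takes parallel lists with one entry per row. The hypothetical helper `to_parallel_lists` below shows one way to build the time, measurement and value lists from a `{timestamp: {measurement: value}}` mapping. The matching `data_types_list` (one list of `TSDataType` per row) would be built the same way; it is left out so the sketch runs without the client installed.

```python
from typing import Dict, List, Tuple


def to_parallel_lists(
    rows: Dict[int, Dict[str, object]],
) -> Tuple[List[int], List[List[str]], List[List[object]]]:
    """Hypothetical helper: split {timestamp: {measurement: value}} into the
    per-row time, measurement and value lists used by
    insert_records_of_one_device. The matching data_types_list is omitted."""
    time_list: List[int] = []
    measurements_list: List[List[str]] = []
    values_list: List[List[object]] = []
    for ts in sorted(rows):  # emit rows in ascending timestamp order
        measurements = sorted(rows[ts])  # fixed column order within a row
        time_list.append(ts)
        measurements_list.append(measurements)
        values_list.append([rows[ts][m] for m in measurements])
    return time_list, measurements_list, values_list


time_list, measurements_list, values_list = to_parallel_lists(
    {2: {"s_01": 1.5, "s_02": 20}, 1: {"s_01": 1.0, "s_02": 10}}
)
print(time_list)    # [1, 2]
print(values_list)  # [[1.0, 10], [1.5, 20]]
```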
+
+#### Insert with type inference
+
+When the data is of String type, you can use the following interface to have the type inferred from the value itself. For example, if the value is "true", it can be automatically inferred as a boolean type; if the value is "3.2", it can be automatically inferred as a float type. Without type information, the server has to perform type inference, which may cost some time.
+
+* Insert a Record, which contains multiple measurement values of a device at a timestamp
+
+```python
+session.insert_str_record(device_id, timestamp, measurements, string_values)
+```
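To make the inference rule above concrete, here is a hypothetical client-side sketch that maps a string value to a type name. It only mirrors the two examples in the text ("true" to boolean, "3.2" to float) and falls back to TEXT; the server's real rules, including how integer strings are typed, may differ.

```python
def infer_type(value: str) -> str:
    """Hypothetical sketch of string-based type inference.

    Booleans are recognized case-insensitively, any string float() accepts
    is treated as FLOAT (matching the "3.2" example), and everything else
    falls back to TEXT. The IoTDB server applies its own rules.
    """
    if value.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        return "TEXT"


print([infer_type(v) for v in ["true", "3.2", "test01"]])
# ['BOOLEAN', 'FLOAT', 'TEXT']
```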
+
+#### Insert of Aligned Timeseries
+
+Inserting into aligned timeseries uses the `insert_aligned_XXX` interfaces, which are otherwise similar to the interfaces above:
+
+* insert_aligned_record
+* insert_aligned_records
+* insert_aligned_records_of_one_device
+* insert_aligned_tablet
+* insert_aligned_tablets
+
+
+### IoTDB-SQL Interface
+
+* Execute query statement
+
+```python
+session.execute_query_statement(sql)
+```
+
+* Execute non-query statement
+
+```python
+session.execute_non_query_statement(sql)
+```
+
+
+### Pandas Support
 
 To easily transform a query result to a [Pandas Dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
 the SessionDataSet has a method `.todf()` which consumes the dataset and transforms it to a pandas dataframe.
@@ -100,7 +281,6 @@ the SessionDataSet has a method `.todf()` which consumes the dataset and transfo
 Example:
 
 ```python
-
 from iotdb.Session import Session
 
 ip = "127.0.0.1"
@@ -120,29 +300,52 @@ session.close()
 df = ...
 ```
 
+
+### IoTDB Testcontainer
+
+Test support is based on the `testcontainers` library (https://testcontainers-python.readthedocs.io/en/latest/index.html), which you need to install in your project to use this feature.
+
+To start (and stop) an IoTDB database in a Docker container, simply do:
+```python
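+import unittest
+
+from iotdb.IoTDBContainer import IoTDBContainer
+from iotdb.Session import Session
+
+
+class MyTestCase(unittest.TestCase):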
+
+    def test_something(self):
+        with IoTDBContainer() as c:
+            session = Session('localhost', c.get_exposed_port(6667), 'root', 'root')
+            session.open(False)
+            result = session.execute_query_statement("SHOW TIMESERIES")
+            print(result)
+            session.close()
+```
+
+By default it loads the image `apache/iotdb:latest`. If you want a specific version, pass it explicitly, e.g. `IoTDBContainer("apache/iotdb:0.12.0")` to run version `0.12.0`.
+
+
 ## Developers
 
 ### Introduction
 
-This is an example of how to connect to IoTDB with python, using the thrift rpc interfaces. Things
-are almost the same on Windows or Linux, but pay attention to the difference like path separator.
+This is an example of how to connect to IoTDB with Python using the Thrift RPC interfaces. Things are almost the same on Windows and Linux, but pay attention to differences such as the path separator.
+
+
 
 ### Prerequisites
 
-python3.7 or later is preferred.
+Python 3.7 or later is preferred.
 
-You have to install Thrift (0.11.0 or later) to compile our thrift file into python code. Below is the official
-tutorial of installation, eventually, you should have a thrift executable.
+You have to install Thrift (0.11.0 or later) to compile our thrift file into Python code. Below is the official installation tutorial; in the end, you should have a `thrift` executable.
 
 ```
 http://thrift.apache.org/docs/install/
 ```
 
 Before starting you need to install `requirements_dev.txt` in your python environment, e.g. by calling
-```
+```shell
 pip install -r requirements_dev.txt
 ```
 
+
+
 ### Compile the thrift library and Debug
 
 In the root of IoTDB's source code folder,  run `mvn clean generate-sources -pl client-py -am`.
@@ -153,10 +356,11 @@ This folder is ignored from git and should **never be pushed to git!**
 **Notice** Do not upload `iotdb/thrift` to the git repo.
 
 
+
+
 ### Session Client & Example
 
-We packed up the Thrift interface in `client-py/src/iotdb/Session.py` (similar with its Java counterpart), also provided
-an example file `client-py/src/SessionExample.py` of how to use the session module. please read it carefully.
+We wrapped the Thrift interface in `client-py/src/iotdb/Session.py` (similar to its Java counterpart) and also provide an example file, `client-py/src/SessionExample.py`, showing how to use the session module. Please read it carefully.
 
 
 Or, another simple example:
@@ -174,18 +378,25 @@ zone = session.get_time_zone()
 session.close()
 ```
 
+
+
 ### Tests
 
 Please add your custom tests in `tests` folder.
+
 To run all defined tests just type `pytest .` in the root folder.
 
 **Notice** Some tests need docker to be started on your system as a test instance is started in a docker container using [testcontainers](https://testcontainers-python.readthedocs.io/en/latest/index.html).
 
+
+
 ### Further Tools
 
 [black](https://pypi.org/project/black/) and [flake8](https://pypi.org/project/flake8/) are installed for autoformatting and linting.
 Both can be run by `black .` or `flake8 .` respectively.
 
+
+
 ## Releasing
 
 To do a release just ensure that you have the right set of generated thrift files.
@@ -193,10 +404,14 @@ Then run linting and auto-formatting.
 Then, ensure that all tests work (via `pytest .`).
 Then you are good to go to do a release!
 
+
+
 ### Preparing your environment
 
 First, install all necessary dev dependencies via `pip install -r requirements_dev.txt`.
 
+
+
 ### Doing the Release
 
 There is a convenient script `release.sh` to do all steps for a release.
@@ -208,3 +423,5 @@ Namely, these are
 * Run Tests via pytest
 * Build
 * Release to pypi
+
+
diff --git a/client-py/SessionExample.py b/client-py/SessionExample.py
index 21a1702..e73abba 100644
--- a/client-py/SessionExample.py
+++ b/client-py/SessionExample.py
@@ -183,7 +183,7 @@ np_values_ = [
     np.array([11, 11111, 1, 0], np.dtype('>i8')),
     np.array([1.1, 1.25, 188.1, 0], np.dtype('>f4')),
     np.array([10011.1, 101.0, 688.25, 6.25], np.dtype('>f8')),
-    ["test01", "test02", "test03", "test04"],
+    np.array(["test01", "test02", "test03", "test04"]),
 ]
 np_timestamps_ = np.array([1, 2, 3, 4], np.dtype('>i8'))
 np_tablet_ = NumpyTablet(
diff --git a/client-py/SessionTest.py b/client-py/SessionTest.py
index 5435df3..e913c5f 100644
--- a/client-py/SessionTest.py
+++ b/client-py/SessionTest.py
@@ -231,7 +231,7 @@ np_values_ = [
     np.array([11, 11111, 1, 0], np.dtype('>i8')),
     np.array([1.1, 1.25, 188.1, 0], np.dtype('>f4')),
     np.array([10011.1, 101.0, 688.25, 6.25], np.dtype('>f8')),
-    ["test01", "test02", "test03", "test04"],
+    np.array(["test01", "test02", "test03", "test04"]),
 ]
 np_timestamps_ = np.array([1, 2, 3, 4], np.dtype('>i8'))
 np_tablet_ = NumpyTablet(
diff --git a/docs/UserGuide/API/Programming-Python-Native-API.md b/docs/UserGuide/API/Programming-Python-Native-API.md
index 19299da..e5a8efb 100644
--- a/docs/UserGuide/API/Programming-Python-Native-API.md
+++ b/docs/UserGuide/API/Programming-Python-Native-API.md
@@ -96,20 +96,20 @@ session.delete_storage_groups(group_name_lst)
 
 ```python
 session.create_time_series(ts_path, data_type, encoding, compressor,
-        props=None, tags=None, attributes=None, alias=None)
+    props=None, tags=None, attributes=None, alias=None)
       
 session.create_multi_time_series(
-            ts_path_lst, data_type_lst, encoding_lst, compressor_lst,
-            props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None
-    )
+    ts_path_lst, data_type_lst, encoding_lst, compressor_lst,
+    props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None
+)
 ```
 
 * Create aligned timeseries
 
 ```python
 session.create_aligned_time_series(
-            device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst
-    )
+    device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst
+)
 ```
 
 Attention: Alias of measurements are **not supported** currently.
@@ -128,7 +128,7 @@ session.check_time_series_exists(path)
 
 ### Data Manipulation Interface (DML Interface)
 
-##### Insert
+#### Insert
 
 It is recommended to use insertTablet to help improve write efficiency.
 
@@ -156,13 +156,12 @@ session.insert_tablet(tablet_)
 ```
 * Numpy Tablet
 
-Comparing with Tablet, Numpy Tablet is using [numpy ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record data.
+Compared with Tablet, Numpy Tablet uses [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record data.
 With less memory footprint and time cost of serialization, the insert performance will be better.
 
 **Notice**
 1. time and numerical value columns in Tablet is ndarray
 2. ndarray should be big-endian, see the example below
-3. TEXT type cannot be ndarray
 
 ```python
 data_types_ = [
@@ -179,7 +178,7 @@ np_values_ = [
     np.array([11, 11111, 1, 0], np.dtype('>i8')),
     np.array([1.1, 1.25, 188.1, 0], np.dtype('>f4')),
     np.array([10011.1, 101.0, 688.25, 6.25], np.dtype('>f8')),
-    ["test01", "test02", "test03", "test04"],
+    np.array(["test01", "test02", "test03", "test04"]),
 ]
 np_timestamps_ = np.array([1, 2, 3, 4], np.dtype('>i8'))
 np_tablet_ = NumpyTablet(
diff --git a/docs/zh/UserGuide/API/Programming-Python-Native-API.md b/docs/zh/UserGuide/API/Programming-Python-Native-API.md
index d62d6d0..df782ba 100644
--- a/docs/zh/UserGuide/API/Programming-Python-Native-API.md
+++ b/docs/zh/UserGuide/API/Programming-Python-Native-API.md
@@ -97,20 +97,20 @@ session.delete_storage_groups(group_name_lst)
 
 ```python
 session.create_time_series(ts_path, data_type, encoding, compressor,
-        props=None, tags=None, attributes=None, alias=None)
+    props=None, tags=None, attributes=None, alias=None)
       
 session.create_multi_time_series(
-            ts_path_lst, data_type_lst, encoding_lst, compressor_lst,
-            props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None
-    )
+    ts_path_lst, data_type_lst, encoding_lst, compressor_lst,
+    props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None
+)
 ```
 
 * Create aligned timeseries
 
 ```python
 session.create_aligned_time_series(
-            device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst
-    )
+    device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst
+)
 ```
 
 Attention: measurement aliases are currently **not supported**.
@@ -156,13 +156,12 @@ session.insert_tablet(tablet_)
 ```
 * Numpy Tablet
 
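-Compared with a normal Tablet, Numpy Tablet uses [numpy ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record numerical data.
+Compared with a normal Tablet, Numpy Tablet uses [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record numerical data.
 Memory footprint and serialization time drop considerably, and write efficiency improves greatly.
 
 **Notice**
 1. Each column of values in the Tablet is recorded as an ndarray
 2. The ndarray must use a big-endian data type; see the example below for details
-3. TEXT data does not support ndarray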
 
 ```python
 data_types_ = [
@@ -179,7 +178,7 @@ np_values_ = [
     np.array([11, 11111, 1, 0], np.dtype('>i8')),
     np.array([1.1, 1.25, 188.1, 0], np.dtype('>f4')),
     np.array([10011.1, 101.0, 688.25, 6.25], np.dtype('>f8')),
-    ["test01", "test02", "test03", "test04"],
+    np.array(["test01", "test02", "test03", "test04"]),
 ]
 np_timestamps_ = np.array([1, 2, 3, 4], np.dtype('>i8'))
 np_tablet_ = NumpyTablet(