Posted to commits@bahir.apache.org by es...@apache.org on 2021/12/07 10:12:06 UTC

[bahir-flink] branch master updated: [BAHIR-293] Fix documentation tables

This is an automated email from the ASF dual-hosted git repository.

eskabetxe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bahir-flink.git


The following commit(s) were added to refs/heads/master by this push:
     new b55e265  [BAHIR-293] Fix documentation tables
b55e265 is described below

commit b55e265d1d932c59fa68dfff09e1d43fa9eb8776
Author: Joao Boto <jb...@idealista.com>
AuthorDate: Tue Dec 7 11:09:26 2021 +0100

    [BAHIR-293] Fix documentation tables
---
 flink-connector-kudu/README.md  | 25 +++++++++++++------------
 flink-connector-pinot/README.md |  9 ++++++++-
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/flink-connector-kudu/README.md b/flink-connector-kudu/README.md
index 6370aa6..a0e0234 100644
--- a/flink-connector-kudu/README.md
+++ b/flink-connector-kudu/README.md
@@ -154,18 +154,19 @@ The example uses lambda expressions to implement the functional interfaces.
 Read more about Kudu schema design in the [Kudu docs](https://kudu.apache.org/docs/schema_design.html).
 
 ### Supported data types
-| Flink/SQL     | Kudu           | 
-| ------------- |:-------------:| 
-|    STRING     | STRING        | 
-| BOOLEAN       |    BOOL       | 
-| TINYINT       |   INT8        | 
-| SMALLINT      |  INT16        | 
-| INT           |  INT32        | 
-| BIGINT        |   INT64     |
-| FLOAT         |  FLOAT      |
-| DOUBLE        |    DOUBLE    |
-| BYTES        |    BINARY    |
-| TIMESTAMP(3)     |    UNIXTIME_MICROS |
+
+| Flink/SQL            | Kudu                    |
+|----------------------|:-----------------------:|
+| `STRING`             | STRING                  |
+| `BOOLEAN`            | BOOL                    |
+| `TINYINT`            | INT8                    |
+| `SMALLINT`           | INT16                   |
+| `INT`                | INT32                   |
+| `BIGINT`             | INT64                   |
+| `FLOAT`              | FLOAT                   |
+| `DOUBLE`             | DOUBLE                  |
+| `BYTES`              | BINARY                  |
+| `TIMESTAMP(3)`       | UNIXTIME_MICROS         |
 
 Note:
 * `TIMESTAMP`s are fixed to a precision of 3, and the corresponding Java conversion class is `java.sql.Timestamp` 
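
For example, a row with Flink types `STRING`, `INT`, and `DOUBLE` maps to Kudu columns of type STRING, INT32, and DOUBLE. A minimal sketch of streaming such rows into an existing Kudu table via `KuduSink` (the master address, table name, and column names are placeholders, and exact constructor signatures should be treated as assumptions):

```java
// Minimal sketch: write Rows of (STRING, INT, DOUBLE) into an existing Kudu table.
// Placeholders: master address, table name, column names.
KuduWriterConfig writerConfig = KuduWriterConfig.Builder
        .setMasters("localhost:7051")
        .build();

DataStream<Row> books = env.fromElements(Row.of("Moby Dick", 1851, 12.99));

books.addSink(new KuduSink<>(
        writerConfig,
        KuduTableInfo.forTable("books"),
        new RowOperationMapper(
                new String[]{"title", "year", "price"},
                AbstractSingleOperationMapper.KuduOperation.INSERT)));
```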
diff --git a/flink-connector-pinot/README.md b/flink-connector-pinot/README.md
index 2044e00..dbd0688 100644
--- a/flink-connector-pinot/README.md
+++ b/flink-connector-pinot/README.md
@@ -17,12 +17,14 @@ See how to link with them for cluster execution [here](https://ci.apache.org/pro
 The sink class is called `PinotSink`.
 
 ## Architecture
+
 The Pinot sink stores elements from upstream Flink tasks in an Apache Pinot table.
 We support two execution modes (see the sketch after this list):
 * `RuntimeExecutionMode.BATCH`
 * `RuntimeExecutionMode.STREAMING`, which requires checkpointing to be enabled.
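
Both modes are selected through Flink's standard execution settings, as in this minimal sketch (standard Flink APIs only; the checkpoint interval is an example value):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
// STREAMING mode requires checkpointing so the sink can commit segments on checkpoints.
env.enableCheckpointing(10_000L); // example: one checkpoint every 10 seconds
```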
 
 ### PinotSinkWriter
+
 Elements received from upstream tasks are handled by an instance of the `PinotSinkWriter`.
 The `PinotSinkWriter` holds a list of `PinotWriterSegment`s, where each `PinotWriterSegment` can store up to `maxRowsPerSegment` elements.
 As long as that maximum is not yet reached, the `PinotWriterSegment` is considered active.
@@ -33,7 +35,8 @@ Once the maximum number of elements to hold was reached, an active `PinotWriterS
 Thus, there is always exactly one active `PinotWriterSegment` to which new incoming elements go.
 Over time, the list of `PinotWriterSegment`s per `PinotSinkWriter` grows until a checkpoint is created.
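
Schematically, the write path maintains this invariant as sketched below (hypothetical; the field and method names here are illustrative, not the connector's verified internals):

```java
// Hypothetical sketch of the "one active segment" bookkeeping inside PinotSinkWriter.
private final List<PinotWriterSegment<IN>> segments = new ArrayList<>();

void write(IN element) {
    // Only the most recently added segment can still be active.
    if (segments.isEmpty() || !segments.get(segments.size() - 1).isActive()) {
        segments.add(new PinotWriterSegment<>(maxRowsPerSegment));
    }
    // The segment deactivates itself once it holds maxRowsPerSegment elements.
    segments.get(segments.size() - 1).write(element);
}
```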
 
-**Checkpointing**  
+**Checkpointing**
+
 On checkpoint creation, `PinotSinkWriter.prepareCommit` is called by the Flink environment.
 This triggers the creation of `PinotSinkCommittable`s, where each inactive `PinotWriterSegment` produces exactly one `PinotSinkCommittable`.
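
Schematically (a hypothetical sketch; `prepareCommit(boolean)` mirrors the signature of Flink's unified `SinkWriter` interface, while the body is illustrative):

```java
// Hypothetical sketch: every inactive segment yields exactly one committable.
@Override
public List<PinotSinkCommittable> prepareCommit(boolean flush) throws IOException {
    List<PinotSinkCommittable> committables = new ArrayList<>();
    for (PinotWriterSegment<IN> segment : segments) {
        if (flush || !segment.isActive()) {
            // Each committable references the segment's data file on shared storage.
            committables.add(segment.prepareCommit());
        }
    }
    // Committed segments are dropped from the writer's bookkeeping.
    segments.removeIf(segment -> flush || !segment.isActive());
    return committables;
}
```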
 
@@ -45,6 +48,7 @@ A `PinotSinkCommittables` then holds the path to the data file on the shared fil
 
 
 ### PinotSinkGlobalCommitter
+
 In order to follow the guidelines for Pinot segment naming, we need to include the minimum and maximum timestamp of a segment's elements in its metadata and in its name.
 The minimum and maximum timestamps of all elements between two checkpoints are determined at a parallelism of 1 in the `PinotSinkGlobalCommitter`.
 This procedure allows recovering from failures by deleting previously uploaded segments, which prevents duplicate segments in the Pinot table.
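
For example, given the list `committables` collected for a checkpoint, the global bounds can be computed by a simple fold (a hypothetical sketch; the getter names are illustrative):

```java
// Hypothetical sketch: determine the global min/max timestamps at parallelism 1.
long minTimestamp = Long.MAX_VALUE;
long maxTimestamp = Long.MIN_VALUE;
for (PinotSinkCommittable committable : committables) {
    minTimestamp = Math.min(minTimestamp, committable.getMinTimestamp());
    maxTimestamp = Math.max(maxTimestamp, committable.getMaxTimestamp());
}
// Both values feed into the segment metadata and the generated segment name.
```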
@@ -63,10 +67,12 @@ When finally committing a `PinotSinkGlobalCommittable` the following procedure i
 
 
 ## Delivery Guarantees
+
 As a result of the architecture described above, the `PinotSink` provides an at-least-once delivery guarantee.
 While the failure recovery mechanism ensures that duplicate segments are prevented, there might be temporary inconsistencies in the Pinot table, which can result in downstream tasks receiving an element multiple times.
 
 ## Options
+
 | Option                 | Description                                                                      |
 | ---------------------- | -------------------------------------------------------------------------------- | 
 | `pinotControllerHost`  | Host of the Pinot controller                                                     |
@@ -81,6 +87,7 @@ While the failure recovery mechanism ensures that duplicate segments are prevent
 | `numCommitThreads`     | Number of threads used in the `PinotSinkGlobalCommitter` for committing segments |
 
 ## Usage
+
 ```java
 StreamExecutionEnvironment env = ...
 // Checkpointing needs to be enabled when executing in STREAMING mode