Posted to commits@pulsar.apache.org by si...@apache.org on 2019/03/07 13:47:50 UTC

[pulsar] branch master updated: Add several instructions for IO Connectors. (#3739)

This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 3d9b47e  Add several instructions for IO Connectors. (#3739)
3d9b47e is described below

commit 3d9b47e3b7e1dc741183c6588ef7b886950b9523
Author: Fangbin Sun <su...@gmail.com>
AuthorDate: Thu Mar 7 21:47:45 2019 +0800

    Add several instructions for IO Connectors. (#3739)
---
 site2/docs/io-connectors.md    |  4 ++++
 site2/docs/io-elasticsearch.md | 21 +++++++++++++++++++++
 site2/docs/io-file.md          | 27 +++++++++++++++++++++++++++
 site2/docs/io-hdfs.md          | 26 ++++++++++++++++++++++++++
 site2/docs/io-mongo.md         | 20 ++++++++++++++++++++
 5 files changed, 98 insertions(+)

diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 909e554..2f166ac 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -19,3 +19,7 @@ Pulsar Functions cluster.
 - [CDC Source Connector based on Debezium](io-cdc.md)
 - [Netty Source Connector](io-netty.md#source)
 - [Hbase Sink Connector](io-hbase.md#sink)
+- [ElasticSearch Sink Connector](io-elasticsearch.md#sink)
+- [File Source Connector](io-file.md#source)
+- [Hdfs Sink Connector](io-hdfs.md#sink)
+- [MongoDB Sink Connector](io-mongo.md#sink)
diff --git a/site2/docs/io-elasticsearch.md b/site2/docs/io-elasticsearch.md
new file mode 100644
index 0000000..18aacdf
--- /dev/null
+++ b/site2/docs/io-elasticsearch.md
@@ -0,0 +1,21 @@
+---
+id: io-elasticsearch
+title: ElasticSearch Connector
+sidebar_label: ElasticSearch Connector
+---
+
+## Sink
+
+The ElasticSearch Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to an index.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `elasticSearchUrl` | `null` | `true` | The URL of the ElasticSearch cluster that the connector connects to. |
+| `indexName` | `null` | `true` | The index name that the connector writes messages to. |
+| `indexNumberOfShards` | `1` | `false` | The number of shards of the index. |
+| `indexNumberOfReplicas` | `1` | `false` | The number of replicas of the index. |
+| `username` | `null` | `false` | The username used by the connector to connect to the ElasticSearch cluster. If a username is set, a password should also be provided. |
+| `password` | `null` | `false` | The password used by the connector to connect to the ElasticSearch cluster. If a password is set, a username should also be provided. |
\ No newline at end of file
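
The ElasticSearch sink options in the table above can be checked before a connector is submitted. The following is a minimal sketch, not part of Pulsar: the helper name `validate_es_sink_config`, the URL, and the index name are all hypothetical, and the check only encodes what the table states (two required options, two defaults, and the username/password pairing).

```python
# Hypothetical helper, not a Pulsar API: validates a sink config dict
# against the option table for the ElasticSearch sink.
REQUIRED = ("elasticSearchUrl", "indexName")
DEFAULTS = {"indexNumberOfShards": 1, "indexNumberOfReplicas": 1}


def validate_es_sink_config(config):
    """Return a copy of config with table defaults applied, or raise ValueError."""
    missing = [key for key in REQUIRED if not config.get(key)]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    # The table says username and password should be provided together.
    if bool(config.get("username")) != bool(config.get("password")):
        raise ValueError("username and password must be set together")
    merged = dict(DEFAULTS)
    merged.update(config)
    return merged


example = validate_es_sink_config({
    "elasticSearchUrl": "http://localhost:9200",  # assumed local cluster URL
    "indexName": "my_index",                      # assumed index name
})
```

Applying the defaults in the helper keeps the resulting dictionary explicit about shard and replica counts, which may make a misconfiguration easier to spot.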
diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
new file mode 100644
index 0000000..7d65cc1
--- /dev/null
+++ b/site2/docs/io-file.md
@@ -0,0 +1,27 @@
+---
+id: io-file
+title: File Connector
+sidebar_label: File Connector
+---
+
+## Source
+
+The File Source Connector is used to pull messages from files in a directory and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| inputDirectory | `true` | `null` | The input directory from which to pull files. |
+| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
+| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed, which causes the file to be picked up repeatedly. |
+| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
+| pathFilter | `false` | `null` | When the `recurse` property is true, only sub-directories whose path matches the given regular expression are scanned. |
+| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
+| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file must be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
+| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
+| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
+| ignoreHiddenFiles | `false` | `true` | Indicates whether hidden files should be ignored. |
+| pollingInterval | `false` | `10000` | Indicates how long to wait before performing a directory listing. |
+| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
\ No newline at end of file
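
The name, size, and age options in the table above combine into a single per-file decision. The sketch below illustrates one plausible reading and is not the connector's actual code: the function `is_candidate` is hypothetical, the regular expression is assumed to be matched against the whole file name, and the bounds are assumed to be inclusive. The table does not state the unit of the age options, so `age` here is simply a number in the same unit as the configured limits. The table's default `[^\\.].*` is written with Java string escaping; the equivalent raw Python pattern is used below.

```python
import re


def is_candidate(name, size_bytes, age, *,
                 file_filter=r"[^\.].*",
                 minimum_size=1, maximum_size=float("inf"),
                 minimum_file_age=0, maximum_file_age=float("inf")):
    """Return True if a file passes the name, size, and age filters.

    Hypothetical illustration of the File source options; inclusive
    bounds and whole-name matching are assumptions.
    """
    # fileFilter: only files whose names match the expression are picked up.
    if re.fullmatch(file_filter, name) is None:
        return False
    # minimumSize / maximumSize: size limits in bytes.
    if not (minimum_size <= size_bytes <= maximum_size):
        return False
    # minimumFileAge / maximumFileAge: files younger or older are ignored.
    return minimum_file_age <= age <= maximum_file_age


picked = is_candidate("data.csv", 10, 0)
hidden = is_candidate(".hidden", 10, 0)
empty = is_candidate("empty.txt", 0, 0)
```

With the defaults, a dot-prefixed name is rejected by the name filter and a zero-byte file is rejected by the minimum size of 1.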
diff --git a/site2/docs/io-hdfs.md b/site2/docs/io-hdfs.md
new file mode 100644
index 0000000..9c38923
--- /dev/null
+++ b/site2/docs/io-hdfs.md
@@ -0,0 +1,26 @@
+---
+id: io-hdfs
+title: Hdfs Connector
+sidebar_label: Hdfs Connector
+---
+
+## Sink
+
+The Hdfs Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to an HDFS file.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `hdfsConfigResources` | `null` | `true` | A file or comma-separated list of files that contain the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. |
+| `directory` | `null` | `true` | The HDFS directory that files are read from or written to. |
+| `encoding` | `null` | `false` | The character encoding for the files, e.g. UTF-8, ASCII, etc. |
+| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. |
+| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. |
+| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. |
+| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory, e.g. a value of "topicA" results in files named topicA-, topicA-, etc. |
+| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. |
+| `separator` | `null` | `false` | The character to use to separate records in a text file. If no value is provided then the content from all of the records will be concatenated together in one continuous byte array. |
+| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. |
+| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records held in memory before acking. Default is `Integer.MAX_VALUE`. Setting this value to one results in every record being sent to disk before the record is acked, while setting it to a higher value allows records to be buffered before they are all flushed to disk. |
\ No newline at end of file
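
The `maxPendingRecords` description above amounts to a buffer that flushes and then acknowledges once a record count is reached. The following is a minimal sketch of that idea only, not the HDFS sink's implementation: the class `PendingRecordBuffer` and its `flush` and `ack` callbacks are hypothetical, and the time-based flush controlled by `syncInterval` is deliberately left out.

```python
class PendingRecordBuffer:
    """Hold records in memory and ack them only after they are flushed.

    Hypothetical illustration of maxPendingRecords; flush and ack are
    caller-supplied callbacks standing in for the disk write and the ack.
    """

    def __init__(self, max_pending_records, flush, ack):
        if max_pending_records < 1:
            raise ValueError("max_pending_records must be at least 1")
        self._max = max_pending_records
        self._flush = flush
        self._ack = ack
        self._pending = []

    def write(self, record):
        self._pending.append(record)
        # Once the limit is reached, flush everything and then ack.
        if len(self._pending) >= self._max:
            self.flush_and_ack()

    def flush_and_ack(self):
        if not self._pending:
            return
        self._flush(list(self._pending))  # write the whole batch first
        for record in self._pending:      # ack only after the flush
            self._ack(record)
        self._pending.clear()


flushed, acked = [], []
buffer = PendingRecordBuffer(2, flushed.append, acked.append)
for record in ["a", "b", "c"]:
    buffer.write(record)
```

With a limit of 2, the first two records are flushed and acked together while the third stays pending; a limit of 1 would flush and ack every record individually, matching the behavior the table describes.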
diff --git a/site2/docs/io-mongo.md b/site2/docs/io-mongo.md
new file mode 100644
index 0000000..cc8ea98
--- /dev/null
+++ b/site2/docs/io-mongo.md
@@ -0,0 +1,20 @@
+---
+id: io-mongo
+title: MongoDB Connector
+sidebar_label: MongoDB Connector
+---
+
+## Sink
+
+The MongoDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a collection.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `mongoUri` | `null` | `true` | The URI of the MongoDB instance that the connector connects to (see: https://docs.mongodb.com/manual/reference/connection-string/). |
+| `database` | `null` | `true` | The name of the database to which the collection belongs. |
+| `collection` | `null` | `true` | The collection name that the connector writes messages to. |
+| `batchSize` | `100` | `false` | The batch size of writes to the collection. |
+| `batchTimeMs` | `1000` | `false` | The batch operation interval in milliseconds. |
\ No newline at end of file