Posted to commits@pulsar.apache.org by hj...@apache.org on 2019/09/27 05:40:04 UTC
[pulsar] branch master updated: [Doc] Add *File source connector guide* (#5240)
This is an automated email from the ASF dual-hosted git repository.
hjf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new dcfe04c [Doc] Add *File source connector guide* (#5240)
dcfe04c is described below
commit dcfe04c17886fc54f071f2f573ab03db60df24fd
Author: Anonymitaet <50...@users.noreply.github.com>
AuthorDate: Fri Sep 27 13:39:59 2019 +0800
[Doc] Add *File source connector guide* (#5240)
* Add *File source connector guide*
* Update
* add example
---
site2/docs/io-connectors.md | 2 +-
site2/docs/io-file-source.md | 137 +++++++++++++++++++++++++++++++++++++++++++
site2/docs/io-file.md | 27 ---------
3 files changed, 138 insertions(+), 28 deletions(-)
diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 72aa744..149e75a 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -20,7 +20,7 @@ Pulsar has various source connectors, which are sorted alphabetically as below.
- [Debezium PostgreSQL source Connector](io-postgresql-debezium.md)
-- [File source connector](io-file.md)
+- [File source connector](io-file-source.md)
- [Flume source connector](io-flume-source.md)
diff --git a/site2/docs/io-file-source.md b/site2/docs/io-file-source.md
new file mode 100644
index 0000000..16a6c6a
--- /dev/null
+++ b/site2/docs/io-file-source.md
@@ -0,0 +1,137 @@
+---
+id: io-file
+title: File source connector
+sidebar_label: File source connector
+---
+
+The File source connector pulls messages from files in directories and persists the messages to Pulsar topics.
+
+## Configuration
+
+The configuration of the File source connector has the following properties.
+
+### Property
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| `inputDirectory` | String | true | No default value | The input directory from which to pull files. |
+| `recurse` | Boolean | false | true | Whether to pull files from subdirectories. |
+| `keepFile` | Boolean | false | false | If set to true, the file is not deleted after it is processed, so it can be picked up continually. |
+| `fileFilter` | String | false | `[^\\.].*` | Only files whose names match the given regular expression are picked up. |
+| `pathFilter` | String | false | NULL | If `recurse` is set to true, only subdirectories whose paths match the given regular expression are scanned. |
+| `minimumFileAge` | Integer | false | 0 | The minimum age that a file must reach before it can be processed. <br><br>Any file younger than `minimumFileAge` (according to the last modification date) is ignored. |
+| `maximumFileAge` | Long | false | Long.MAX_VALUE | The maximum age that a file can reach and still be processed. <br><br>Any file older than `maximumFileAge` (according to the last modification date) is ignored. |
+| `minimumSize` | Integer | false | 1 | The minimum size (in bytes) that a file must have to be processed. |
+| `maximumSize` | Double | false | Double.MAX_VALUE | The maximum size (in bytes) that a file can have and still be processed. |
+| `ignoreHiddenFiles` | Boolean | false | true | Whether hidden files are ignored. |
+| `pollingInterval` | Long | false | 10000L | How long to wait before performing a directory listing. |
+| `numWorkers` | Integer | false | 1 | The number of worker threads that process files.<br><br>A larger value lets you process more files concurrently. <br><br>However, setting this to a value greater than 1 causes the data from multiple files to be intermingled in the target topic. |
+
+### Example
+
+Before using the File source connector, you need to create a configuration file in one of the following formats.
+
+* JSON
+
+ ```json
+ {
+     "inputDirectory": "/Users/david",
+     "recurse": true,
+     "keepFile": true,
+     "fileFilter": "[^\\.].*",
+     "pathFilter": ".*",
+     "minimumFileAge": 0,
+     "maximumFileAge": 9999999999,
+     "minimumSize": 1,
+     "maximumSize": 5000000,
+     "ignoreHiddenFiles": true,
+     "pollingInterval": 5000,
+     "numWorkers": 1
+ }
+ ```
+
+* YAML
+
+ ```yaml
+ configs:
+     inputDirectory: "/Users/david"
+     recurse: true
+     keepFile: true
+     fileFilter: "[^\\.].*"
+     pathFilter: ".*"
+     minimumFileAge: 0
+     maximumFileAge: 9999999999
+     minimumSize: 1
+     maximumSize: 5000000
+     ignoreHiddenFiles: true
+     pollingInterval: 5000
+     numWorkers: 1
+ ```
+
+## Usage
+
+Here is an example of using the File source connector.
+
+1. Pull a Pulsar image.
+
+ ```bash
+ $ docker pull apachepulsar/pulsar:{version}
+ ```
+
+2. Start Pulsar standalone.
+
+ ```bash
+ $ docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --name pulsar-standalone apachepulsar/pulsar:{version} bin/pulsar standalone
+ ```
+
+3. Create a configuration file _file-connector.yaml_.
+
+ ```yaml
+ configs:
+     inputDirectory: "/opt"
+ ```
+
+4. Copy the configuration file _file-connector.yaml_ to the container.
+
+ ```bash
+ $ docker cp connectors/file-connector.yaml pulsar-standalone:/pulsar/
+ ```
+
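+5. Download the File source connector, and then copy the downloaded NAR file to the _/pulsar/_ directory of the container, where the `--archive` option in step 6 expects it.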
+
+ ```bash
+ $ curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/pulsar/pulsar-{version}/connectors/pulsar-io-file-{version}.nar
+ ```
+
+6. Start the File source connector.
+
+ ```bash
+ $ docker exec -it pulsar-standalone /bin/bash
+
+ $ ./bin/pulsar-admin sources localrun \
+ --archive /pulsar/pulsar-io-file-{version}.nar \
+ --name file-test \
+ --destination-topic-name pulsar-file-test \
+ --source-config-file /pulsar/file-connector.yaml
+ ```
+
+7. Start a consumer.
+
+ ```bash
+ $ ./bin/pulsar-client consume -s file-test -n 0 pulsar-file-test
+ ```
+
+8. Write a message to the file _test.txt_.
+
+ ```bash
+ $ echo "hello world!" > /opt/test.txt
+ ```
+
+ The following information appears in the consumer terminal window.
+
+ ```bash
+ ----- got message -----
+ hello world!
+ ```
+
+
\ No newline at end of file
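Reviewer note on the property table in io-file-source.md: the file-selection properties (`fileFilter`, `ignoreHiddenFiles`, `minimumSize`, `maximumSize`, `minimumFileAge`, `maximumFileAge`) can be read as one combined predicate. The Python sketch below is only an illustration of that reading, not the connector's implementation. The class `FileSourceConfig` and the function `is_selected` are hypothetical names, and three points are assumptions rather than facts from the guide: a hidden file is one whose name starts with a dot, the regular expression must match the whole file name, and the age is passed in whatever unit the age properties use (the guide does not state one).

```python
import re
from dataclasses import dataclass


@dataclass
class FileSourceConfig:
    """Hypothetical mirror of the selection properties and their documented defaults."""
    file_filter: str = r"[^\.].*"
    ignore_hidden_files: bool = True
    minimum_size: int = 1
    maximum_size: float = float("inf")      # stands in for Double.MAX_VALUE
    minimum_file_age: int = 0
    maximum_file_age: float = float("inf")  # stands in for Long.MAX_VALUE


def is_selected(name: str, size_bytes: int, age: int, cfg: FileSourceConfig) -> bool:
    """Return True if a file passes every filter in cfg.

    `age` is the time since the last modification, in the same
    (unspecified) unit as the age properties.
    """
    # Assumption: "hidden" means the name starts with a dot.
    if cfg.ignore_hidden_files and name.startswith("."):
        return False
    # Assumption: the pattern must match the entire file name.
    if re.fullmatch(cfg.file_filter, name) is None:
        return False
    if not (cfg.minimum_size <= size_bytes <= cfg.maximum_size):
        return False
    return cfg.minimum_file_age <= age <= cfg.maximum_file_age


cfg = FileSourceConfig()
print(is_selected("test.txt", 13, 5, cfg))   # True
print(is_selected(".hidden", 13, 5, cfg))    # False: hidden file
print(is_selected("empty.txt", 0, 5, cfg))   # False: below minimumSize
```

With the defaults, the 13-byte _test.txt_ written in step 8 would pass every check, while an empty file would be skipped because `minimumSize` defaults to 1.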
diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
deleted file mode 100644
index 7d65cc1..0000000
--- a/site2/docs/io-file.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-id: io-file
-title: File Connector
-sidebar_label: File Connector
----
-
-## Source
-
-The File Source Connector is used to pull messages from files in a directory and persist the messages
-to a Pulsar topic.
-
-### Source Configuration Options
-
-| Name | Required | Default | Description |
-|------|----------|---------|-------------|
-| inputDirectory | `true` | `null` | The input directory from which to pull files. |
-| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
-| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed and causes the file to be picked up continually. |
-| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
-| pathFilter | `false` | `null` | When 'recurse' property is true, then only sub-directories whose path matches the given regular expression will be scanned. |
-| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
-| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file must be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
-| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
-| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
-| ignoreHiddenFiles | `false` | `true` | Indicates whether or not hidden files should be ignored or not. |
-| pollingInterval | `false` | `10000` | Indicates how long to wait before performing a directory listing. |
-| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
\ No newline at end of file
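Reviewer note on both versions of the property table: each marks `inputDirectory` as the only required property and describes `fileFilter` and `pathFilter` as regular expressions. A JSON configuration could be checked against those two statements before the connector is started. The Python sketch below is a rough illustration under stated assumptions: `validate_config` is a hypothetical helper, not part of Pulsar, and it uses Python's `re` module, whose syntax is close to but not identical to the Java regular expressions the connector presumably uses.

```python
import json
import re
from typing import List


def validate_config(text: str) -> List[str]:
    """Return a list of problems found in a JSON configuration string.

    Hypothetical helper: it checks only what the property table states.
    """
    cfg = json.loads(text)
    problems = []
    # The table marks inputDirectory as the only required property.
    if not cfg.get("inputDirectory"):
        problems.append("inputDirectory is required")
    # The table describes both filters as regular expressions.
    for key in ("fileFilter", "pathFilter"):
        pattern = cfg.get(key)
        if pattern is None:
            continue
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"{key} is not a valid regular expression: {exc}")
    return problems


print(validate_config('{"inputDirectory": "/opt", "fileFilter": "[^\\\\.].*"}'))
print(validate_config('{"pathFilter": "*"}'))
```

The first call returns an empty list. The second reports two problems: the missing `inputDirectory`, and a `pathFilter` of `*`, which is a glob rather than a regular expression and fails to compile.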