Posted to commits@pulsar.apache.org by hj...@apache.org on 2019/09/27 05:40:04 UTC

[pulsar] branch master updated: [Doc] Add *File source connector guide* (#5240)

This is an automated email from the ASF dual-hosted git repository.

hjf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new dcfe04c  [Doc] Add *File source connector guide* (#5240)
dcfe04c is described below

commit dcfe04c17886fc54f071f2f573ab03db60df24fd
Author: Anonymitaet <50...@users.noreply.github.com>
AuthorDate: Fri Sep 27 13:39:59 2019 +0800

    [Doc] Add *File source connector guide* (#5240)
    
    * Add *File source connector guide*
    
    * Update
    
    * add example
---
 site2/docs/io-connectors.md  |   2 +-
 site2/docs/io-file-source.md | 137 +++++++++++++++++++++++++++++++++++++++++++
 site2/docs/io-file.md        |  27 ---------
 3 files changed, 138 insertions(+), 28 deletions(-)

diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 72aa744..149e75a 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -20,7 +20,7 @@ Pulsar has various source connectors, which are sorted alphabetically as below.
   
 - [Debezium PostgreSQL source Connector](io-postgresql-debezium.md)
   
-- [File source connector](io-file.md)
+- [File source connector](io-file-source.md)
   
 - [Flume source connector](io-flume-source.md)
 
diff --git a/site2/docs/io-file-source.md b/site2/docs/io-file-source.md
new file mode 100644
index 0000000..16a6c6a
--- /dev/null
+++ b/site2/docs/io-file-source.md
@@ -0,0 +1,137 @@
+---
+id: io-file
+title: File source connector
+sidebar_label: File source connector
+---
+
+The File source connector pulls messages from files in directories and persists the messages to Pulsar topics.
+
+## Configuration
+
+The configuration of the File source connector has the following properties.
+
+### Property
+
+| Name | Type | Required | Default | Description |
+|------|----------|----------|---------|-------------|
+| `inputDirectory` | String | true | No default value | The input directory from which to pull files. |
+| `recurse` | Boolean | false | true | Whether to pull files from subdirectories. |
+| `keepFile` | Boolean | false | false | If set to true, the file is not deleted after it is processed, which means the file can be picked up repeatedly. |
+| `fileFilter` | String | false | `[^\\.].*` | Only files whose names match the given regular expression are picked up. |
+| `pathFilter` | String | false | NULL | If `recurse` is set to true, only subdirectories whose paths match the given regular expression are scanned. |
+| `minimumFileAge` | Integer | false | 0 | The minimum age a file must reach before it can be processed. <br><br>Any file younger than `minimumFileAge` (according to the last modification date) is ignored. |
+| `maximumFileAge` | Long | false | Long.MAX_VALUE | The maximum age a file can reach and still be processed. <br><br>Any file older than `maximumFileAge` (according to the last modification date) is ignored. |
+| `minimumSize` | Integer | false | 1 | The minimum size (in bytes) a file must have to be processed. |
+| `maximumSize` | Double | false | Double.MAX_VALUE | The maximum size (in bytes) a file can have to be processed. |
+| `ignoreHiddenFiles` | Boolean | false | true | Whether hidden files are ignored. |
+| `pollingInterval` | Long | false | 10000L | Indicates how long to wait before performing a directory listing. |
+| `numWorkers` | Integer | false | 1 | The number of worker threads that process files.<br><br> This allows you to process a larger number of files concurrently. <br><br>However, setting this to a value greater than 1 causes data from multiple files to be intermingled in the target topic. |
+
+### Example
+
+Before using the File source connector, you need to create a configuration file in one of the following formats.
+
+* JSON 
+
+    ```json
+    {
+        "inputDirectory": "/Users/david",
+        "recurse": true,
+        "keepFile": true,
+        "fileFilter": "[^\\.].*",
+        "pathFilter": ".*",
+        "minimumFileAge": 0,
+        "maximumFileAge": 9999999999,
+        "minimumSize": 1,
+        "maximumSize": 5000000,
+        "ignoreHiddenFiles": true,
+        "pollingInterval": 5000,
+        "numWorkers": 1
+    }
+    ```
+
+* YAML
+
+    ```yaml
+    configs:
+        inputDirectory: "/Users/david"
+        recurse: true
+        keepFile: true
+        fileFilter: "[^\\.].*"
+        pathFilter: ".*"
+        minimumFileAge: 0
+        maximumFileAge: 9999999999
+        minimumSize: 1
+        maximumSize: 5000000
+        ignoreHiddenFiles: true
+        pollingInterval: 5000
+        numWorkers: 1
+    ```
+
+## Usage
+
+Here is an example of using the File source connector.
+
+1. Pull a Pulsar image.
+
+    ```bash
+    $ docker pull apachepulsar/pulsar:{version}
+    ```
+
+2. Start Pulsar standalone.
+   
+    ```bash
+    $ docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --name pulsar-standalone apachepulsar/pulsar:{version} bin/pulsar standalone
+    ```
+
+3. Create a configuration file _file-connector.yaml_.
+
+    ```yaml
+    configs:
+        inputDirectory: "/opt"
+    ```
+
+4. Copy the configuration file _file-connector.yaml_ to the container.
+
+    ```bash
+    $ docker cp connectors/file-connector.yaml pulsar-standalone:/pulsar/
+    ```
+
+5. Download the File source connector. The command in step 6 expects the NAR file under _/pulsar/_ in the container.
+
+    ```bash
+    $ curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/pulsar/pulsar-{version}/connectors/pulsar-io-file-{version}.nar
+    ```
+
+6. Start the File source connector.
+
+    ```bash
+    $ docker exec -it pulsar-standalone /bin/bash
+
+    $ ./bin/pulsar-admin sources localrun \
+    --archive /pulsar/pulsar-io-file-{version}.nar \
+    --name file-test \
+    --destination-topic-name  pulsar-file-test \
+    --source-config-file /pulsar/file-connector.yaml
+    ```
+
+7. Start a consumer.
+
+    ```bash
+    $ ./bin/pulsar-client consume -s file-test -n 0 pulsar-file-test
+    ```
+
+8. Write the message to the file _test.txt_.
+   
+    ```bash
+    $ echo "hello world!" > /opt/test.txt
+    ```
+
+    The following information appears on the consumer terminal window.
+
+    ```bash
+    ----- got message -----
+    hello world!
+    ```
+
+    
\ No newline at end of file
diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
deleted file mode 100644
index 7d65cc1..0000000
--- a/site2/docs/io-file.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-id: io-file
-title: File Connector
-sidebar_label: File Connector
----
-
-## Source
-
-The File Source Connector is used to pull messages from files in a directory and persist the messages
-to a Pulsar topic.
-
-### Source Configuration Options
-
-| Name | Required | Default | Description |
-|------|----------|---------|-------------|
-| inputDirectory | `true` | `null` | The input directory from which to pull files. |
-| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
-| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed and causes the file to be picked up continually. |
-| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
-| pathFilter | `false` | `null` | When 'recurse' property is true, then only sub-directories whose path matches the given regular expression will be scanned. |
-| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
-| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file must be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
-| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
-| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
-| ignoreHiddenFiles | `false` | `true` | Indicates whether or not hidden files should be ignored or not. |
-| pollingInterval | `false` | `10000` | Indicates how long to wait before performing a directory listing. |
-| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
\ No newline at end of file
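
Note on the filter properties documented in io-file-source.md above: the table lists `fileFilter`, `minimumFileAge`, `maximumFileAge`, `minimumSize`, `maximumSize`, and `ignoreHiddenFiles` separately, and it may help to see how they could combine into a single accept/reject decision. The following is a minimal Python sketch, not the connector's implementation. The names `FileFilterConfig` and `is_eligible` are hypothetical, and the sketch assumes that a hidden file is one whose name starts with a dot, that the regular expression must match the whole file name, and that age is given in the same unit as the configured limits.

```python
import re
from dataclasses import dataclass


@dataclass
class FileFilterConfig:
    # Defaults mirror the documented property table (hypothetical class).
    file_filter: str = r"[^\.].*"
    minimum_file_age: int = 0
    maximum_file_age: int = 2**63 - 1    # stand-in for Long.MAX_VALUE
    minimum_size: int = 1
    maximum_size: float = float("inf")   # stand-in for Double.MAX_VALUE
    ignore_hidden_files: bool = True


def is_eligible(name: str, size_bytes: int, age: int, cfg: FileFilterConfig) -> bool:
    """Return True if a file passes every documented filter (illustrative only)."""
    # Assumption: "hidden" means the file name starts with a dot.
    if cfg.ignore_hidden_files and name.startswith("."):
        return False
    # Assumption: the regular expression must match the whole file name.
    if re.fullmatch(cfg.file_filter, name) is None:
        return False
    # Files younger than the minimum or older than the maximum age are ignored.
    if not (cfg.minimum_file_age <= age <= cfg.maximum_file_age):
        return False
    # Files outside the configured size range (in bytes) are ignored.
    return cfg.minimum_size <= size_bytes <= cfg.maximum_size
```

Under these assumptions and the default values, the 13-byte `test.txt` written in the usage walkthrough passes every check, while an empty file is rejected because the default `minimumSize` is 1.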