Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/16 20:14:31 UTC

[GitHub] [pulsar] EliHarper opened a new issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

EliHarper opened a new issue #8582:
URL: https://github.com/apache/pulsar/issues/8582


   **Describe the bug**
   I'm not sure which spelling of the flag is correct: --topics-pattern and --topics-Pattern are listed in the documentation, --topicsPattern is mentioned in man-style error messages, and I _think_ I saw --topic-pattern elsewhere. I created a sink and tried both micro_.* and micro_* as the pattern; neither matched any of the expected Pulsar topics.
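
If the pattern is treated as a regular expression that must match the whole topic name (an assumption here; the thread does not state the exact matching rule), then `micro_.*` and `micro_*` mean different things. The first matches any name starting with `micro_`; the second matches `micro` followed by zero or more underscores. A rough sketch using `grep -E -x` as a stand-in for the real matcher. The helper `matches` and the sample name `micro__` are hypothetical:

```shell
# Hypothetical helper: prints "yes" if the whole NAME matches PATTERN.
# Uses POSIX extended regex via grep as a stand-in, not Pulsar's own matcher.
matches() {
  pattern=$1
  name=$2
  if printf '%s\n' "$name" | grep -E -x -q -e "$pattern"; then
    echo yes
  else
    echo no
  fi
}

# Compare both patterns against a few sample topic names.
for topic in micro_aggregator micro_0 micro__; do
  echo "$topic: micro_.* -> $(matches 'micro_.*' "$topic"), micro_* -> $(matches 'micro_*' "$topic")"
done
```

Under that assumption, only `micro_.*` would be expected to match a topic such as `micro_aggregator`.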
   
   **To Reproduce**
   1. Run pulsar-all docker image on CentOS 7
   2. Create a Cassandra container as specified [here](https://pulsar.apache.org/docs/en/io-quickstart/#setup-a-cassandra-cluster)
   3. Configure the Pulsar sink as specified [here](https://pulsar.apache.org/docs/en/io-quickstart/#configure-a-cassandra-sink)
   4. Create a Cassandra sink using the --topics-pattern argument matching multiple topics with similar prefixes, ending with a wildcard.
   5. Check whether any messages are read from the topics. None showed up for me, either in the database or in the output of pulsar-admin sinks status.
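
One thing worth ruling out when passing a wildcard on the command line: an unquoted `micro_*` is also a shell glob, so if the working directory contains files with that prefix, the shell substitutes their names before the command receives the argument. Whether that played any role in this report is unknown. The sketch below only demonstrates the shell behaviour, using a temporary directory and hypothetical file names:

```shell
# Demonstrates shell glob expansion of an unquoted pattern.
# File names are hypothetical; nothing here talks to Pulsar.
workdir=$(mktemp -d)
touch "$workdir/micro_notes.txt" "$workdir/micro_old.log"

# Unquoted: the shell expands the glob to the matching file names.
unquoted=$(cd "$workdir" && echo micro_*)
# Single-quoted: the pattern is passed through unchanged.
quoted=$(cd "$workdir" && echo 'micro_.*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -rf "$workdir"
```

Single-quoting the pattern, as in `--topics-pattern 'micro_.*'`, avoids the expansion.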
   
   **Expected behavior**
   Messages from all matching topics are read from Pulsar and recorded in Cassandra by the sink.
   
   **Desktop (please complete the following information):**
    - OS: CentOS 7, Docker version 19.03.13, build 4484c46d9d 
   
   **Additional context**
   I'm fairly new to Pulsar, so please correct me if I'm doing anything wrong. I believe this is a bug because the same sink creation command worked when I used the --inputs argument with an explicit topic name instead (in this example, micro_aggregator).
   
   Thank you! 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy edited a comment on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy edited a comment on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-745023239


   Thanks for the feedback, @EliHarper. Maybe we can close this issue and then open a new one about whether topics that carry a schema work correctly with Pulsar sinks and Pulsar Functions? It seems that `--topics-pattern` itself works correctly in both.





[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-744159728


   @EliHarper Any update for this issue?





[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-743194973


   @EliHarper If `--topics-pattern` still does not work in your environment, please let me know.





[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-745023239


   Thanks for the feedback, @EliHarper. Maybe we can close this issue and then open a new one about whether topics that carry a schema work correctly with Pulsar sinks and Pulsar Functions?





[GitHub] [pulsar] EliHarper commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
EliHarper commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-744702575


   > @EliHarper Any update for this issue?
   
   With your exact example, it does work in my environment. I believe the problem I'm having comes from my topics carrying more than one key/value pair to store: they are more complex topics with a JSON schema (I also tested with Avro).
   
   I think that sort of invalidates my issue's main point. It might make sense to have an example of more complex storage with the Cassandra sink rather than a single key and column, but that can be moved to a different issue (or, I suppose, a feature request) in the future.
   
   Thanks for your help!





[GitHub] [pulsar] EliHarper commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
EliHarper commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-729825374


   Also, I have a somewhat unrelated question: is it possible to store multiple fields in their respective columns with the Cassandra sink? The example uses a single keyname and a single columnName. This works for me, but when I add the rest of the fields I defined in the schema, the sink only saves the keyname and one columnName, both of which hold the entire message, and the rest of the values are null. This occurs with both Avro and JsonSchema.





[GitHub] [pulsar] wolfstudy closed issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy closed issue #8582:
URL: https://github.com/apache/pulsar/issues/8582


   





[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-743194113


   @EliHarper Sorry for the late reply. 
   
   I tried to reproduce this with the latest 2.7.0 release in standalone mode. Unfortunately, I could not reproduce it; it works normally in my local environment.
   
   ### Reproduction steps
   
   1. Set up a cassandra cluster.
   
   ```
   $ docker run -d --rm  --name=cassandra -p 9042:9042 cassandra
   ```
   
   1.1 Check the cluster status
   
   ```
   $ docker exec cassandra nodetool status
   ```
   
   Output:
   ```
   Datacenter: datacenter1
   =======================
   Status=Up/Down
   |/ State=Normal/Leaving/Joining/Moving
   --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
   UN  172.17.0.2  103.67 KiB  256          100.0%            af0e4b2f-84e0-4f0b-bb14-bd5f9070ff26  rack1
   ```
   
   2. Create keyspace and table.
   
   Run cqlsh:
   
   ```
   $ docker exec -ti cassandra cqlsh localhost
   Connected to Test Cluster at localhost:9042.
   [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
   Use HELP for help.
   cqlsh>
   ```
   
   In cqlsh, create the `pulsar_test_keyspace` keyspace and the `pulsar_test_table` table.
   
   ```
   cqlsh> CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
   cqlsh> USE pulsar_test_keyspace;
   cqlsh:pulsar_test_keyspace> CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
   ```
   
   3. Prepare a Cassandra sink YAML file and save it in the examples directory as `cassandra-sink.yml`
   
   ```
   $ cat examples/cassandra-sink.yml
   configs:
       roots: "localhost:9042"
       keyspace: "pulsar_test_keyspace"
       columnFamily: "pulsar_test_table"
       keyname: "key"
       columnName: "col"
   ```
   
   4. Create a cassandra sink.
   
   ```
   $ bin/pulsar-admin sink create --tenant public --namespace default --name cassandra-test-sink --sink-type cassandra --sink-config-file examples/cassandra-sink.yml --topics-pattern 'micro_.*'
   ```
   
   Output:
   
   ```
   "Created successfully"
   ```
   
   4.1 Get sink info
   
   ```
   bin/pulsar-admin sink get --tenant public --namespace default --name cassandra-test-sink
   ```
   
   Output:
   
   ```
   {
     "tenant": "public",
     "namespace": "default",
     "name": "cassandra-test-sink",
     "className": "org.apache.pulsar.io.cassandra.CassandraStringSink",
     "inputSpecs": {
       "micro_.*": {
         "isRegexPattern": true,
         "schemaProperties": {},
         "consumerProperties": {}
       }
     },
     "configs": {
       "keyspace": "pulsar_test_keyspace",
       "columnFamily": "pulsar_test_table",
       "keyname": "key",
       "roots": "localhost:9042",
       "columnName": "col"
     },
     "parallelism": 1,
     "processingGuarantees": "ATLEAST_ONCE",
     "retainOrdering": false,
     "autoAck": true,
     "archive": "builtin://cassandra"
   }
   ```
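
The `inputSpecs` entry above is the place to confirm that the pattern was registered as a regular expression rather than as a literal topic name. A small sketch of that check, assuming the pretty-printed JSON layout shown above. The helper name `is_regex_input` is hypothetical, and the sample file stands in for the saved output of the `sink get` command:

```shell
# Hypothetical helper: succeeds if the sink info in FILE marks PATTERN
# as a regex input. Relies on the pretty-printed layout shown above,
# where "isRegexPattern" is on the line right after the pattern key.
is_regex_input() {
  pattern=$1
  file=$2
  grep -F -A1 -e "\"$pattern\": {" "$file" | grep -q '"isRegexPattern": true'
}

# Sample trimmed from the sink info above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "inputSpecs": {
    "micro_.*": {
      "isRegexPattern": true,
      "schemaProperties": {},
      "consumerProperties": {}
    }
  }
}
EOF

if is_regex_input 'micro_.*' "$sample"; then
  echo "micro_.* is registered as a regex pattern"
else
  echo "micro_.* is NOT registered as a regex pattern"
fi

rm -f "$sample"
```

A line-based check like this is fragile by design; a JSON-aware tool is the better choice if one is available.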
   
   4.2 Get status of sink
   
   ```
   bin/pulsar-admin sink status --tenant public --namespace default --name cassandra-test-sink
   ```
   
   Output:
   
   ```
   {
     "numInstances" : 1,
     "numRunning" : 1,
     "instances" : [ {
       "instanceId" : 0,
       "status" : {
         "running" : true,
         "error" : "",
         "numRestarts" : 0,
         "numReadFromPulsar" : 0,
         "numSystemExceptions" : 0,
         "latestSystemExceptions" : [ ],
         "numSinkExceptions" : 0,
         "latestSinkExceptions" : [ ],
         "numWrittenToSink" : 0,
         "lastReceivedTime" : 0,
         "workerId" : "c-standalone-fw-localhost-8080"
       }
     } ]
   }
   ```
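
The two counters to watch in this status output are `numReadFromPulsar` and `numWrittenToSink`. Both are 0 here because nothing has been produced yet. A sketch that pulls a counter out of the status JSON with `sed`, assuming the one-field-per-line layout above. `status_counter` is a hypothetical helper and the sample is trimmed from the output above:

```shell
# Hypothetical helper: prints the integer value of FIELD from status JSON
# read on stdin. Assumes each field sits on its own line, as printed above.
status_counter() {
  field=$1
  sed -n "s/.*\"$field\" *: *\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Sample trimmed from the status output above.
status='{
  "numInstances" : 1,
  "instances" : [ {
    "status" : {
      "running" : true,
      "numReadFromPulsar" : 0,
      "numWrittenToSink" : 0
    }
  } ]
}'

read_count=$(printf '%s\n' "$status" | status_counter numReadFromPulsar)
written_count=$(printf '%s\n' "$status" | status_counter numWrittenToSink)
echo "read=$read_count written=$written_count"
```

Against a live sink, the same helper could read the output of the `sink status` command instead of the sample. If `numReadFromPulsar` stays at 0 after messages have been produced, that suggests the sink is not subscribed to the topic in question.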
   
   5. Produce messages to a topic that matches the pattern.
   
   ```
   for i in {10000..10007}; do bin/pulsar-client produce -m "key-$i" -n 1 micro_0; done
   ```
   
   6. Check results in Cassandra.
   
   ```
   $ docker exec -ti cassandra cqlsh localhost
   Connected to Test Cluster at localhost:9042.
   [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
   Use HELP for help.
   cqlsh> use pulsar_test_keyspace;
   cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;
   ```
   
   Output:
   
   ```
    key       | col
   -----------+-----------
      key-125 |   key-125
       key-32 |    key-32
      key-154 |   key-154
       key-28 |    key-28
      key-121 |   key-121
       key-47 |    key-47
      key-126 |   key-126
        key-5 |     key-5
       key-42 |    key-42
       key-25 |    key-25
      key-146 |   key-146
      key-109 |   key-109
      key-138 |   key-138
      key-157 |   key-157
      key-122 |   key-122
      key-141 |   key-141
      key-159 |   key-159
      key-117 |   key-117
       key-30 |    key-30
       key-29 |    key-29
      key-151 |   key-151
      key-108 |   key-108
        key-0 |     key-0
    key-10004 | key-10004
        key-9 |     key-9
    key-10002 | key-10002
       key-23 |    key-23
      key-140 |   key-140
      key-106 |   key-106
       key-36 |    key-36
      key-156 |   key-156
      key-153 |   key-153
        key-2 |     key-2
      key-115 |   key-115
       key-24 |    key-24
      key-161 |   key-161
      key-132 |   key-132
      key-111 |   key-111
       key-27 |    key-27
      key-142 |   key-142
        key-1 |     key-1
       key-46 |    key-46
      key-143 |   key-143
      key-131 |   key-131
       key-33 |    key-33
      key-123 |   key-123
       key-39 |    key-39
    key-10003 | key-10003
      key-136 |   key-136
      key-137 |   key-137
      key-152 |   key-152
      key-158 |   key-158
      key-148 |   key-148
        key-3 |     key-3
      key-133 |   key-133
      key-116 |   key-116
      key-114 |   key-114
      key-139 |   key-139
       key-45 |    key-45
      key-112 |   key-112
       key-41 |    key-41
      key-119 |   key-119
      key-155 |   key-155
      key-124 |   key-124
       key-49 |    key-49
      key-149 |   key-149
        key-6 |     key-6
    key-10007 | key-10007
        key-7 |     key-7
      key-113 |   key-113
        key-4 |     key-4
       key-34 |    key-34
    key-10001 | key-10001
       key-37 |    key-37
    key-10005 | key-10005
       key-26 |    key-26
       key-35 |    key-35
      key-144 |   key-144
      key-110 |   key-110
       key-31 |    key-31
      key-120 |   key-120
      key-134 |   key-134
        key-8 |     key-8
    key-10006 | key-10006
      key-147 |   key-147
       key-10 |    key-10
      key-118 |   key-118
       key-48 |    key-48
       key-40 |    key-40
      key-145 |   key-145
      key-150 |   key-150
      key-135 |   key-135
       key-43 |    key-43
       key-50 |    key-50
       key-44 |    key-44
      key-160 |   key-160
   
   (96 rows)
   ```
   
   > The output will differ from run to run because it also contains data from previous tests, but rows such as `key-10002`, `key-10004` and `key-10006` were generated by this test.
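
Because the table mixes rows from earlier runs, it can help to count only the keys produced in step 5 (`key-10000` to `key-10007`). In the listing above, `key-10001` through `key-10007` are present, while `key-10000` does not appear. A sketch of such a count over cqlsh-style output. `count_run_rows` is a hypothetical helper and the sample rows are copied from the listing:

```shell
# Hypothetical helper: counts rows on stdin whose key is key-10000..key-10007,
# the range produced in step 5. Expects cqlsh-style "key | col" rows.
count_run_rows() {
  grep -E -c '^ *key-1000[0-7] *\|'
}

# A few rows copied from the output above.
rows='   key-125 |   key-125
 key-10004 | key-10004
     key-9 |     key-9
 key-10002 | key-10002'

printf '%s\n' "$rows" | count_run_rows
```

For the full listing, the saved cqlsh output would be piped into the helper in place of the sample rows.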
   
   





[GitHub] [pulsar] EliHarper removed a comment on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink

Posted by GitBox <gi...@apache.org>.
EliHarper removed a comment on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-729825374



