Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/16 20:14:31 UTC
[GitHub] [pulsar] EliHarper opened a new issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
EliHarper opened a new issue #8582:
URL: https://github.com/apache/pulsar/issues/8582
**Describe the bug**
I'm not sure which spelling of the flag is right: `--topics-pattern` and `--topics-Pattern` are both listed in the documentation, `--topicsPattern` is mentioned in the man-style error messages, and I _think_ I saw `--topic-pattern` elsewhere. I created a sink and tried both `micro_.*` and `micro_*` as the pattern; neither matched any of the expected Pulsar topics.
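The two patterns behave very differently if the value is interpreted as a regular expression rather than a shell-style glob. The following is a minimal sketch for checking that outside Pulsar, assuming the pattern must match the whole topic name; the helper name `matches_topic` and the short topic names are illustrative only, and Pulsar itself may match against the fully qualified topic name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the entire topic name matches the pattern,
# treating the pattern as an extended regular expression (an assumption about
# how the sink interprets the value).
matches_topic() {
  # -E: extended regex, -x: the whole line must match, -q: no output
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

for pattern in 'micro_.*' 'micro_*'; do
  for topic in micro_aggregator micro_0 micro__; do
    if matches_topic "$pattern" "$topic"; then
      echo "$pattern matches $topic"
    else
      echo "$pattern does not match $topic"
    fi
  done
done
```

Read as a regex, `micro_*` means "micro" followed by zero or more underscores, so of the three names it only matches `micro__`, while `micro_.*` matches all of them.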
**To Reproduce**
1. Run pulsar-all docker image on CentOS 7
2. Create a Cassandra container as specified [here](https://pulsar.apache.org/docs/en/io-quickstart/#setup-a-cassandra-cluster)
3. Configure Pulsar sink as specified [here](https://pulsar.apache.org/docs/en/io-quickstart/#configure-a-cassandra-sink)
4. Create a Cassandra sink using the --topics-pattern argument matching multiple topics with similar prefixes, ending with a wildcard.
5. See if any messages are read from the topic - none showed for me in the database or using pulsar-admin sinks status.
**Expected behavior**
Pulsar messages from all matching topics read from pulsar and recorded in the Cassandra sink.
**Desktop (please complete the following information):**
- OS: CentOS 7, Docker version 19.03.13, build 4484c46d9d
**Additional context**
I'm honestly fairly new to Pulsar, so if I'm doing anything wrong, please correct me. I believe this is a bug because the same sink creation command worked when I used the --inputs argument with an explicit topic name instead (in this example, `micro_aggregator`).
Thank you!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] wolfstudy edited a comment on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy edited a comment on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-745023239
Thanks for the feedback, @EliHarper. Maybe we can close this issue and then open a new one on whether topics that carry a schema work correctly with Pulsar sinks and Pulsar functions? It seems that `--topics-pattern` itself works correctly in Pulsar sinks and Pulsar functions.
[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-744159728
@EliHarper Any update for this issue?
[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-743194973
@EliHarper If `--topics-pattern` still does not work in your environment, please let me know.
[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-745023239
Thanks for the feedback, @EliHarper. Maybe we can close this issue and then open a new one on whether topics that carry a schema work correctly with Pulsar sinks and Pulsar functions?
[GitHub] [pulsar] EliHarper commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
EliHarper commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-744702575
> @EliHarper Any update for this issue?
With your exact example, it does work in my environment. I believe the issue I'm having is due to the fact that my topics have more than one key-value pair to store: they are more complex topics with a JSON schema (I also tested with Avro).
I think that sort of invalidates my issue's main point. It might make sense to have an example of more complex storage with the Cassandra sink rather than a single key and column, but that can be moved to a different issue (or, I suppose, a feature request) in the future.
Thanks for your help!
[GitHub] [pulsar] EliHarper commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
EliHarper commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-729825374
Also, I've reached a somewhat unrelated question: is it possible to store multiple fields in their respective columns with the Cassandra sink? The example uses a single keyname and a single columnName. This works for me, but when I add the rest of the fields I defined in the schema, the sink only saves the keyname and one columnName, both of which hold the entire message, and the rest of the values are null. This occurs with both Avro and JsonSchema.
[GitHub] [pulsar] wolfstudy closed issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy closed issue #8582:
URL: https://github.com/apache/pulsar/issues/8582
[GitHub] [pulsar] wolfstudy commented on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-743194113
@EliHarper Sorry for the late reply.
I tried to reproduce this with the latest release, 2.7.0, in standalone mode. Unfortunately, I could not reproduce it; `--topics-pattern` works normally in my local environment.
### Steps to reproduce
1. Set up a cassandra cluster.
```
$ docker run -d --rm --name=cassandra -p 9042:9042 cassandra
```
1.1 Check the cluster status.
```
$ docker exec cassandra nodetool status
```
Output:
```
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.2  103.67 KiB  256     100.0%            af0e4b2f-84e0-4f0b-bb14-bd5f9070ff26  rack1
```
2. Create keyspace and table.
Run cqlsh:
```
$ docker exec -ti cassandra cqlsh localhost
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
```
In the cqlsh, create the pulsar_test_keyspace keyspace and the pulsar_test_table table.
```
cqlsh> CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
cqlsh> USE pulsar_test_keyspace;
cqlsh:pulsar_test_keyspace> CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
```
3. Prepare a Cassandra sink YAML file and save it in the `examples` directory as `cassandra-sink.yml`.
```
$ cat examples/cassandra-sink.yml
configs:
    roots: "localhost:9042"
    keyspace: "pulsar_test_keyspace"
    columnFamily: "pulsar_test_table"
    keyname: "key"
    columnName: "col"
```
4. Create a cassandra sink.
```
$ bin/pulsar-admin sink create --tenant public --namespace default --name cassandra-test-sink --sink-type cassandra --sink-config-file examples/cassandra-sink.yml --topics-pattern 'micro_.*'
```
Output:
```
"Created successfully"
```
4.1 Get sink info
```
bin/pulsar-admin sink get --tenant public --namespace default --name cassandra-test-sink
```
Output:
```
{
  "tenant": "public",
  "namespace": "default",
  "name": "cassandra-test-sink",
  "className": "org.apache.pulsar.io.cassandra.CassandraStringSink",
  "inputSpecs": {
    "micro_.*": {
      "isRegexPattern": true,
      "schemaProperties": {},
      "consumerProperties": {}
    }
  },
  "configs": {
    "keyspace": "pulsar_test_keyspace",
    "columnFamily": "pulsar_test_table",
    "keyname": "key",
    "roots": "localhost:9042",
    "columnName": "col"
  },
  "parallelism": 1,
  "processingGuarantees": "ATLEAST_ONCE",
  "retainOrdering": false,
  "autoAck": true,
  "archive": "builtin://cassandra"
}
```
4.2 Get status of sink
```
bin/pulsar-admin sink status --tenant public --namespace default --name cassandra-test-sink
```
Output:
```
{
  "numInstances" : 1,
  "numRunning" : 1,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReadFromPulsar" : 0,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSinkExceptions" : 0,
      "latestSinkExceptions" : [ ],
      "numWrittenToSink" : 0,
      "lastReceivedTime" : 0,
      "workerId" : "c-standalone-fw-localhost-8080"
    }
  } ]
}
```
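The counters `numReadFromPulsar` and `numWrittenToSink` in this status output are the quickest way to see whether the pattern picked up any messages. A small sketch for pulling a single counter out of the JSON follows; `status_field` is a hypothetical helper, and it assumes the output keeps one `"name" : number` pair per line as shown above. It uses `sed` so it needs no extra tools.

```shell
#!/bin/sh
# Hypothetical helper: read status JSON on stdin and print the first numeric
# value found for the given field name. Assumes one "name" : number pair per
# line, which is how the status output above is laid out.
status_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Demonstration on two lines copied from the status output above.
status_field numReadFromPulsar <<'EOF'
"numReadFromPulsar" : 0,
"numWrittenToSink" : 0,
EOF
```

Against a running sink this could be used as `bin/pulsar-admin sink status --tenant public --namespace default --name cassandra-test-sink | status_field numReadFromPulsar`, once before and once after producing messages.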
5. Produce messages to the source topic.
```
for i in {10000..10007}; do bin/pulsar-client produce -m "key-$i" -n 1 micro_0; done
```
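The loop above writes to a single topic, `micro_0`. To exercise the pattern against several topics, the same command can be wrapped in a function. This is a sketch only: `produce_to_topics` and the `PULSAR_CLIENT` override are hypothetical names introduced here, and the extra topic names in the usage note are examples, not topics from this test.

```shell
#!/bin/sh
# PULSAR_CLIENT is a hypothetical override so the loop can be dry-run;
# by default it points at the same bin/pulsar-client used above.
PULSAR_CLIENT="${PULSAR_CLIENT:-bin/pulsar-client}"

# Send one message to each topic given as an argument. The message body
# embeds the topic name so each row can be traced back to its source topic.
produce_to_topics() {
  for topic in "$@"; do
    $PULSAR_CLIENT produce -m "key-${topic}" -n 1 "$topic"
  done
}
```

For example, `produce_to_topics micro_0 micro_1 micro_aggregator` sends one message per topic, and `PULSAR_CLIENT=echo produce_to_topics micro_0` prints the command arguments instead of sending anything.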
6. Check results in Cassandra.
```
$ docker exec -ti cassandra cqlsh localhost
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use pulsar_test_keyspace;
cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;
```
Output:
```
key | col
-----------+-----------
key-125 | key-125
key-32 | key-32
key-154 | key-154
key-28 | key-28
key-121 | key-121
key-47 | key-47
key-126 | key-126
key-5 | key-5
key-42 | key-42
key-25 | key-25
key-146 | key-146
key-109 | key-109
key-138 | key-138
key-157 | key-157
key-122 | key-122
key-141 | key-141
key-159 | key-159
key-117 | key-117
key-30 | key-30
key-29 | key-29
key-151 | key-151
key-108 | key-108
key-0 | key-0
key-10004 | key-10004
key-9 | key-9
key-10002 | key-10002
key-23 | key-23
key-140 | key-140
key-106 | key-106
key-36 | key-36
key-156 | key-156
key-153 | key-153
key-2 | key-2
key-115 | key-115
key-24 | key-24
key-161 | key-161
key-132 | key-132
key-111 | key-111
key-27 | key-27
key-142 | key-142
key-1 | key-1
key-46 | key-46
key-143 | key-143
key-131 | key-131
key-33 | key-33
key-123 | key-123
key-39 | key-39
key-10003 | key-10003
key-136 | key-136
key-137 | key-137
key-152 | key-152
key-158 | key-158
key-148 | key-148
key-3 | key-3
key-133 | key-133
key-116 | key-116
key-114 | key-114
key-139 | key-139
key-45 | key-45
key-112 | key-112
key-41 | key-41
key-119 | key-119
key-155 | key-155
key-124 | key-124
key-49 | key-49
key-149 | key-149
key-6 | key-6
key-10007 | key-10007
key-7 | key-7
key-113 | key-113
key-4 | key-4
key-34 | key-34
key-10001 | key-10001
key-37 | key-37
key-10005 | key-10005
key-26 | key-26
key-35 | key-35
key-144 | key-144
key-110 | key-110
key-31 | key-31
key-120 | key-120
key-134 | key-134
key-8 | key-8
key-10006 | key-10006
key-147 | key-147
key-10 | key-10
key-118 | key-118
key-48 | key-48
key-40 | key-40
key-145 | key-145
key-150 | key-150
key-135 | key-135
key-43 | key-43
key-50 | key-50
key-44 | key-44
key-160 | key-160
(96 rows)
```
> The output differs from a fresh run because the table still holds data from earlier tests, but rows such as `key-10002`, `key-10004` and `key-10006` were generated by this test.
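When the table already holds rows from earlier runs, it can be easier to query only the keys produced by the current test. The sketch below builds such a query for the keyspace and table created in step 2; `build_key_query` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build a CQL statement that selects only the given keys
# from the table created in step 2.
build_key_query() {
  list=""
  for key in "$@"; do
    # Append each key as a quoted literal, comma-separated after the first.
    list="${list:+$list, }'$key'"
  done
  printf "SELECT key, col FROM pulsar_test_keyspace.pulsar_test_table WHERE key IN (%s);\n" "$list"
}
```

The result can be passed to cqlsh non-interactively, for example `docker exec cassandra cqlsh localhost -e "$(build_key_query key-10001 key-10002)"`.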
[GitHub] [pulsar] EliHarper removed a comment on issue #8582: Pulsar sink creation --topics-pattern argument not working as expected in Docker with pulsar-all image & cassandra sink
Posted by GitBox <gi...@apache.org>.
EliHarper removed a comment on issue #8582:
URL: https://github.com/apache/pulsar/issues/8582#issuecomment-729825374