You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Peter Hsu <ph...@hotmail.com> on 2023/07/24 10:22:41 UTC

Unable to get Kafka to work

Hi Members,
I am trying to setup connection to Kafka but have no success.
The sample code I followed is from:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
The error I am getting is
[cid:5f45c61a-6ac8-4480-8d91-849054a0732f]

I found similar issue in stack overflow but there was no solution:
https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
[https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded]<https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
ReadFromKafka in Apache Beam python SDK doesn't work : java.io.IOException: error=2, No such file or directory<https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
I am trying to run a simple beam program in python which reads messages from Kafka Topic and print it to the console but I am getting this error and don't know what is the issue. WARNING:root:Wait...
stackoverflow.com

Does anyone have a working example?

Thanks for assistance,

Peter Hsu

Re: Unable to get Kafka to work

Posted by Bruno Volpato via user <us...@beam.apache.org>.
Hi Peter,

If I understand it correctly, you are using the DirectRunner -- KafkaIO is
implemented in Java, so you need to start an expansion service to access it
from your Python code (more information here:
https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/
).
Another alternative would be starting the pipeline with
--runner=DataflowRunner, as Dataflow should handle that for you.



On Mon, Jul 24, 2023 at 6:23 AM Peter Hsu <ph...@hotmail.com> wrote:

> Hi Members,
> I am trying to setup connection to Kafka but have no success.
> The sample code I followed is from:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
> The error I am getting is
>
>
> I found similar issue in stack overflow but there was no solution:
>
> https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
>
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> ReadFromKafka in Apache Beam python SDK doesn't work :
> java.io.IOException: error=2, No such file or directory
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> I am trying to run a simple beam program in python which reads messages
> from Kafka Topic and print it to the console but I am getting this error
> and don't know what is the issue. WARNING:root:Wait...
> stackoverflow.com
>
> Does anyone have a working example?
>
> Thanks for assistance,
>
> Peter Hsu
>

Re: Unable to get Kafka to work

Posted by Chamikara Jayalath <ch...@gmail.com>.
On Sat, Jul 29, 2023 at 2:43 AM Peter Hsu <ph...@hotmail.com> wrote:

> Hi,
>
> An update on the  unsupported signal issue.
>
> I tried exactly same code on Ubuntu and didn't encounter the same issue.
> Therefore, I believe the issue is only a problem in windows.
>

Will you be able to create a Beam Github issue for this ? -
https://beam.apache.org/community/contact-us/


>
> After switching to Ubuntu, template was created into GCP Bucket and I was
> able to create job from the template.
>
> Unfortunately, hit another issue: Timeout expired while fetching topic
> metadata.
>
> To prove there is no issue to the Confluent Cloud kafka server, a python
> consumer was created using confluent_kafka and was able to connect to the
> server with below config:
> bootstrap.servers=<servername>:9092
> security.protocol=SASL_SSL
> sasl.mechanisms=PLAIN
> sasl.username=<API KEY>
> sasl.password=<API Secret>
>
> When I tried same config in beam, I got jaas config error.
>
>
> After inserting jaas config, I got serviceName error.
>
> After inserting serviceName, I got timeout error
>
> the code failed in beam is
> ReadFromKafka(consumer_config={'<servername>:9092', 'security.protocol':
> 'SASL_SSL', 'sasl.mechanisms': 'PLAIN', 'sasl.jaas.config':
> 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API
> KEY>" password="<API Secret>" serviceName="kafka";', 'group.id':
> 'python_example_group_1', 'auto.offset.reset': 'earliest'}
>
>
> any suggestions?
>

I'm not exactly sure what's going on but it seems like workers are not able
to connect to the Kafka broker when reading somewhere in the following
method.

https://github.com/apache/beam/blob/b6687993b3b78fd4f9774062f3d94ee3e142a0f8/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1610

Can you try increasing the timeout or observe whether the connections are
being established from the Kafka side ?

Thanks,
Cham


>
> Regards,
>
> Peter Hsu
>
>
> ------------------------------
> *From:* Chamikara Jayalath via user <us...@beam.apache.org>
> *Sent:* Tuesday, 25 July 2023 4:59 AM
> *To:* user@beam.apache.org <us...@beam.apache.org>
> *Subject:* Re: Unable to get Kafka to work
>
> Are you running from a specialized/restricted environment (where starting
> a local process is disallowed, for example) ?
>
> Python Kafka transforms needs to start a Java expansion service as a
> process in the local machine where the job is submitted from
> (done automatically) and needs to connect to that using gRPC during the
> transform expansion (before submitting to a runner). Seems like this
> process failed and the transform is unable to connect to the expansion
> service for some reason.
>
> Thanks,
> Cham
>
> On Mon, Jul 24, 2023 at 3:23 AM Peter Hsu <ph...@hotmail.com> wrote:
>
> Hi Members,
> I am trying to setup connection to Kafka but have no success.
> The sample code I followed is from:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
> The error I am getting is
>
>
> I found similar issue in stack overflow but there was no solution:
>
> https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
>
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> ReadFromKafka in Apache Beam python SDK doesn't work :
> java.io.IOException: error=2, No such file or directory
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> I am trying to run a simple beam program in python which reads messages
> from Kafka Topic and print it to the console but I am getting this error
> and don't know what is the issue. WARNING:root:Wait...
> stackoverflow.com
>
> Does anyone have a working example?
>
> Thanks for assistance,
>
> Peter Hsu
>
>

Re: Unable to get Kafka to work

Posted by Peter Hsu <ph...@hotmail.com>.
Hi,

An update on the  unsupported signal issue.

I tried exactly same code on Ubuntu and didn't encounter the same issue. Therefore, I believe the issue is only a problem in windows.

After switching to Ubuntu, template was created into GCP Bucket and I was able to create job from the template.

Unfortunately, hit another issue: Timeout expired while fetching topic metadata.

To prove there is no issue to the Confluent Cloud kafka server, a python consumer was created using confluent_kafka and was able to connect to the server with below config:
bootstrap.servers=<servername>:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<API KEY>
sasl.password=<API Secret>

When I tried same config in beam, I got jaas config error.
[cid:d96e0b70-ebdc-41a6-9f39-4822207e2392]
[cid:5d54b9e6-43d0-47e3-b00d-9bedc7bef33c]
After inserting jaas config, I got serviceName error.
[cid:92d6f8bb-d39c-4bac-9c6b-ef6baac76a2f]
After inserting serviceName, I got timeout error
[cid:d7b009d3-5ab7-43f5-87fc-7723f4cf4492]
the code failed in beam is
ReadFromKafka(consumer_config={'<servername>:9092', 'security.protocol': 'SASL_SSL', 'sasl.mechanisms': 'PLAIN', 'sasl.jaas.config': 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API KEY>" password="<API Secret>" serviceName="kafka";', 'group.id': 'python_example_group_1', 'auto.offset.reset': 'earliest'}


any suggestions?

Regards,

Peter Hsu


________________________________
From: Chamikara Jayalath via user <us...@beam.apache.org>
Sent: Tuesday, 25 July 2023 4:59 AM
To: user@beam.apache.org <us...@beam.apache.org>
Subject: Re: Unable to get Kafka to work

Are you running from a specialized/restricted environment (where starting a local process is disallowed, for example) ?

Python Kafka transforms needs to start a Java expansion service as a process in the local machine where the job is submitted from (done automatically) and needs to connect to that using gRPC during the transform expansion (before submitting to a runner). Seems like this process failed and the transform is unable to connect to the expansion service for some reason.

Thanks,
Cham

On Mon, Jul 24, 2023 at 3:23 AM Peter Hsu <ph...@hotmail.com>> wrote:
Hi Members,
I am trying to setup connection to Kafka but have no success.
The sample code I followed is from:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
The error I am getting is
[cid:1898941d74ecb971f161]

I found similar issue in stack overflow but there was no solution:
https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
[https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded]<https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
ReadFromKafka in Apache Beam python SDK doesn't work : java.io.IOException: error=2, No such file or directory<https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
I am trying to run a simple beam program in python which reads messages from Kafka Topic and print it to the console but I am getting this error and don't know what is the issue. WARNING:root:Wait...
stackoverflow.com<http://stackoverflow.com/>

Does anyone have a working example?

Thanks for assistance,

Peter Hsu

Re: Unable to get Kafka to work

Posted by Chamikara Jayalath via user <us...@beam.apache.org>.
(resending to test https://issues.apache.org/jira/browse/INFRA-24850,
please ignore)

On Mon, Jul 24, 2023 at 11:59 AM Chamikara Jayalath <ch...@google.com>
wrote:

> Are you running from a specialized/restricted environment (where starting
> a local process is disallowed, for example) ?
>
> Python Kafka transforms needs to start a Java expansion service as a
> process in the local machine where the job is submitted from
> (done automatically) and needs to connect to that using gRPC during the
> transform expansion (before submitting to a runner). Seems like this
> process failed and the transform is unable to connect to the expansion
> service for some reason.
>
> Thanks,
> Cham
>
> On Mon, Jul 24, 2023 at 3:23 AM Peter Hsu <ph...@hotmail.com> wrote:
>
>> Hi Members,
>> I am trying to setup connection to Kafka but have no success.
>> The sample code I followed is from:
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
>> The error I am getting is
>>
>>
>> I found similar issue in stack overflow but there was no solution:
>>
>> https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
>>
>> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
>> ReadFromKafka in Apache Beam python SDK doesn't work :
>> java.io.IOException: error=2, No such file or directory
>> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
>> I am trying to run a simple beam program in python which reads messages
>> from Kafka Topic and print it to the console but I am getting this error
>> and don't know what is the issue. WARNING:root:Wait...
>> stackoverflow.com
>>
>> Does anyone have a working example?
>>
>> Thanks for assistance,
>>
>> Peter Hsu
>>
>

Re: Unable to get Kafka to work

Posted by Chamikara Jayalath via user <us...@beam.apache.org>.
Are you running from a specialized/restricted environment (where starting a
local process is disallowed, for example) ?

Python Kafka transforms needs to start a Java expansion service as a
process in the local machine where the job is submitted from
(done automatically) and needs to connect to that using gRPC during the
transform expansion (before submitting to a runner). Seems like this
process failed and the transform is unable to connect to the expansion
service for some reason.

Thanks,
Cham

On Mon, Jul 24, 2023 at 3:23 AM Peter Hsu <ph...@hotmail.com> wrote:

> Hi Members,
> I am trying to setup connection to Kafka but have no success.
> The sample code I followed is from:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
> The error I am getting is
>
>
> I found similar issue in stack overflow but there was no solution:
>
> https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
>
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> ReadFromKafka in Apache Beam python SDK doesn't work :
> java.io.IOException: error=2, No such file or directory
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> I am trying to run a simple beam program in python which reads messages
> from Kafka Topic and print it to the console but I am getting this error
> and don't know what is the issue. WARNING:root:Wait...
> stackoverflow.com
>
> Does anyone have a working example?
>
> Thanks for assistance,
>
> Peter Hsu
>

Re: Unable to get Kafka to work

Posted by XQ Hu via user <us...@beam.apache.org>.
Have you followed
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/README.md
to make sure Java is installed?

On Mon, Jul 24, 2023 at 6:22 AM Peter Hsu <ph...@hotmail.com> wrote:

> Hi Members,
> I am trying to setup connection to Kafka but have no success.
> The sample code I followed is from:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
> The error I am getting is
>
>
> I found similar issue in stack overflow but there was no solution:
>
> https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro
>
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> ReadFromKafka in Apache Beam python SDK doesn't work :
> java.io.IOException: error=2, No such file or directory
> <https://stackoverflow.com/questions/75944387/readfromkafka-in-apache-beam-python-sdk-doesnt-work-java-io-ioexception-erro>
> I am trying to run a simple beam program in python which reads messages
> from Kafka Topic and print it to the console but I am getting this error
> and don't know what is the issue. WARNING:root:Wait...
> stackoverflow.com
>
> Does anyone have a working example?
>
> Thanks for assistance,
>
> Peter Hsu
>