Posted to user@spark.apache.org by karan alang <ka...@gmail.com> on 2022/02/18 07:45:57 UTC
GCP Dataproc - error in importing KafkaProducer
Hello All,
I have a GCP Dataproc cluster on which I'm running a Spark
Structured Streaming job.
I'm trying to use KafkaProducer to push aggregated data into a Kafka
topic, but when I import KafkaProducer
(from kafka import KafkaProducer),
it raises the error below:
```
Traceback (most recent call last):
  File "/tmp/7e27e272e64b461dbdc2e5083dc23202/StructuredStreaming_GCP_Versa_Sase_gcloud.py", line 14, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/__init__.py", line 23, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/__init__.py", line 4, in <module>
    from .simple import SimpleProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/simple.py", line 54
    return '<SimpleProducer batch=%s>' % self.async
```
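From what I can tell, the failing line uses `async` as an attribute name, and `async` has been a reserved keyword since Python 3.7, so this looks like a syntax error in the installed kafka-python rather than a missing package. A minimal sketch reproducing the same failure, without kafka-python installed at all:

```python
# Sketch: `async` is a reserved keyword from Python 3.7 onward, so a
# line like `self.async` (as in kafka-python's simple.py above) cannot
# even be compiled. compile() shows the same failure standalone.
snippet = (
    "class SimpleProducer:\n"
    "    def __repr__(self):\n"
    "        return '<SimpleProducer batch=%s>' % self.async\n"
)

try:
    compile(snippet, "simple.py", "exec")
    print("compiled fine (pre-3.7 interpreter)")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

On the cluster's Python 3.8 this compile fails with a SyntaxError, which matches the traceback.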
As part of the initialization actions, I'm installing the following:
```
pip install pypi
pip install kafka-python
pip install google-cloud-storage
pip install pandas
```
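In case it helps, this is how I'm checking which kafka-python version the init action actually installed (assumption: the distribution name is `kafka-python`), without triggering the failing import:

```python
# Sketch: read the installed kafka-python version from package metadata
# rather than importing the package (the import itself is what fails).
from importlib import metadata  # stdlib in Python 3.8+

try:
    print(metadata.version("kafka-python"))
except metadata.PackageNotFoundError:
    print("kafka-python is not installed")
```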
Additional details on Stack Overflow:
https://stackoverflow.com/questions/71169869/gcp-dataproc-getting-error-in-importing-kafkaproducer
Any ideas on what needs to be done to fix this?
tia!