Posted to user@spark.apache.org by karan alang <ka...@gmail.com> on 2022/02/18 07:45:57 UTC

GCP Dataproc - error in importing KafkaProducer

Hello All,

I've a GCP Dataproc cluster, and I'm running a Spark StructuredStreaming
job on this.
I'm trying to use KafkaProducer to push aggregated data into a Kafka
topic; however, when I import KafkaProducer
(from kafka import KafkaProducer),
it fails with this error:

```
Traceback (most recent call last):
  File "/tmp/7e27e272e64b461dbdc2e5083dc23202/StructuredStreaming_GCP_Versa_Sase_gcloud.py", line 14, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/__init__.py", line 23, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/__init__.py", line 4, in <module>
    from .simple import SimpleProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/simple.py", line 54
    return '<SimpleProducer batch=%s>' % self.async
```
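For what it's worth, the failing line suggests a parse error rather than a missing package: `async` became a fully reserved keyword in Python 3.7, so an attribute access like `self.async` can no longer be compiled. A quick sketch (the `obj.async` snippet is just an illustration) confirming this on a 3.7+ interpreter such as the cluster's Python 3.8:

```python
# "async" is a reserved keyword from Python 3.7 onward, so source code
# containing an attribute access like `self.async` fails to parse.
# This mirrors line 54 of kafka/producer/simple.py from the traceback.
snippet = "result = '<SimpleProducer batch=%s>' % obj.async"

try:
    compile(snippet, "<demo>", "exec")
    print("parsed OK (pre-3.7 behaviour)")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

On Python 3.8 this prints a SyntaxError, which matches the import failure above.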

As part of the initialization actions, I'm installing the following:
---

pip install pypi
pip install kafka-python
pip install google-cloud-storage
pip install pandas

---

Additional details on Stack Overflow:
https://stackoverflow.com/questions/71169869/gcp-dataproc-getting-error-in-importing-kafkaproducer

Any ideas on what needs to be done to fix this?
tia!