Posted to users@kafka.apache.org by karan alang <ka...@gmail.com> on 2022/02/18 07:44:29 UTC

GCP Dataproc - getting error in importing KafkaProducer

Hello All,

I have a GCP Dataproc cluster, and I'm running a Spark Structured Streaming
job on it.
I'm trying to use KafkaProducer to push aggregated data into a Kafka
topic; however, when I import KafkaProducer (from kafka import
KafkaProducer), it gives this error:

```
Traceback (most recent call last):
  File "/tmp/7e27e272e64b461dbdc2e5083dc23202/StructuredStreaming_GCP_Versa_Sase_gcloud.py", line 14, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/__init__.py", line 23, in <module>
    from kafka.producer import KafkaProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/__init__.py", line 4, in <module>
    from .simple import SimpleProducer
  File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/simple.py", line 54
    return '<SimpleProducer batch=%s>' % self.async
```

As part of the initialization actions, I'm installing the following:
---

pip install pypi
pip install kafka-python
pip install google-cloud-storage
pip install pandas

---

Additional details on Stack Overflow:
https://stackoverflow.com/questions/71169869/gcp-dataproc-getting-error-in-importing-kafkaproducer

Any ideas on what needs to be done to fix this?
tia!
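
A note on the traceback above: it is cut off before the exception line, but the last frame ends in an attribute named async, and the paths show Python 3.8. Since async has been a reserved keyword from Python 3.7 onward, a line like that cannot be parsed there. The sketch below only checks that point. It is not taken from the thread; the helper parses() and the replacement attribute name async_send are illustrative.

```python
import keyword
import sys

# The line reported at simple.py line 54 in the traceback.
OFFENDING_LINE = "return '<SimpleProducer batch=%s>' % self.async"


def parses(source: str) -> bool:
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Wrap the line in a method body so that 'return' is legal on its own.
wrapped = "def __repr__(self):\n    " + OFFENDING_LINE + "\n"

# Same line with a hypothetical, non-reserved attribute name.
renamed = wrapped.replace("self.async", "self.async_send")

print("Python version:", sys.version_info[:2])
print("async is a keyword:", keyword.iskeyword("async"))
print("original line parses:", parses(wrapped))
print("renamed line parses:", parses(renamed))
```

If the original line fails to parse on the cluster's interpreter, the kafka module being imported was probably written for an older Python, which would point at the installed package version rather than at the job code.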

Re: GCP Dataproc - getting error in importing KafkaProducer

Posted by Mich Talebzadeh <mi...@gmail.com>.
On Dataproc, the kafka-python package is not installed as standard.

Switch to root with sudo su - and install it with pip. As root:

 pip list|grep kafka

root@ctpcluster-m:~# pip install kafka-python
Collecting kafka-python
  Downloading kafka_python-2.0.2-py2.py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 22.0 MB/s
Installing collected packages: kafka-python
Successfully installed kafka-python-2.0.2
hduser@ctpcluster-m: /home/hduser> pip list|grep kafka
kafka-python                      2.0.2
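
The pip list check above can also be run from inside Python, which helps when the pip on the shell PATH and the interpreter running the Spark job are different installations. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the function name kafka_distributions is made up for illustration.

```python
import sys
from importlib import metadata  # available in the standard library from Python 3.8


def kafka_distributions() -> dict:
    """Return {distribution name: version} for installed packages with 'kafka' in the name."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip entries with missing metadata; keep anything that mentions kafka.
        if name and "kafka" in name.lower():
            found[name] = dist.version
    return found


# Show which interpreter is answering, then what it can see.
print(sys.executable)
print(kafka_distributions())
```

More than one entry in the result, or a version that differs from what pip list reports, would suggest the job is picking up a different package than the one just installed.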

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: GCP Dataproc - getting error in importing KafkaProducer

Posted by Mich Talebzadeh <mi...@gmail.com>.
Have you installed the correct package, kafka-python?

pip install kafka-python
Collecting kafka-python
  Downloading kafka_python-2.0.2-py2.py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 1.9 MB/s
Installing collected packages: kafka-python
Successfully installed kafka-python-2.0.2

pip list|grep kafka
kafka-python                  2.0.2

python3
Python 3.7.3 (default, Apr  3 2021, 20:42:31)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from kafka import KafkaProducer
>>>
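
The same import test can be scripted so that it reports where the kafka module was loaded from, or why it failed, instead of stopping at a traceback. A rough sketch; probe_kafka_import is a hypothetical helper name, and the version attribute is read defensively in case the installed module does not define it.

```python
import importlib
import sys


def probe_kafka_import() -> str:
    """Try to import the kafka module and describe the outcome as a string."""
    try:
        module = importlib.import_module("kafka")
    except SyntaxError as exc:
        # The installed module contains code this interpreter cannot parse.
        return f"SyntaxError in {exc.filename}, line {exc.lineno}"
    except ImportError as exc:
        # No module named kafka, or one of its own imports failed.
        return f"ImportError: {exc}"
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return f"kafka {version} imported from {location}"


print(sys.executable)
print(probe_kafka_import())
```

For the result to be meaningful it needs to run under the interpreter the job uses; the traceback in the original question suggests that one lives under /opt/conda/default, though that is an inference from the paths shown.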





