Posted to user@spark.apache.org by Atheeth SH <at...@gmail.com> on 2023/03/03 10:55:12 UTC

Re: Unable to handle bignumeric datatype in spark/pyspark

Hi Rajnil,

Just curious, what version of spark-bigquery-connector are you using?

Thanks,
Atheeth

On Sat, 25 Feb 2023 at 23:48, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Sounds like it is cosmetic. The important point is whether the data
> stored in GBQ is valid.
>
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 25 Feb 2023 at 18:12, Rajnil Guha <ra...@gmail.com> wrote:
>
>> Hi All,
>>
>> I created a question on Stack Overflow (linked below) a few months back
>> about issues handling BigQuery's bignumeric type values in Spark.
>>
>> link
>> <https://stackoverflow.com/questions/74719503/getting-error-while-reading-bignumeric-data-type-from-a-bigquery-table-using-apa>
>>
>> On Fri, Feb 24, 2023 at 3:54 PM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Hi Nidhi,
>>>
>>> can you create a BigQuery table with bignumeric and numeric column
>>> types, add a few rows, and try to read it into Spark through a DataFrame?
>>>
>>> Then do
>>>
>>>
>>> df.printSchema()
>>>
>>> df.show(5,False)
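>>>
>>> For reference, the read step might look like this (a rough sketch; the
>>> table name is hypothetical, and it assumes the spark-bigquery-connector
>>> is available to the session):
>>>
>>> df = spark.read.format("bigquery") \
>>>     .option("table", "my_project.my_dataset.my_table") \
>>>     .load()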
>>>
>>>
>>> HTH
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Fri, 24 Feb 2023 at 02:47, nidhi kher <kh...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> I am facing the below issue in my PySpark code:
>>>>
>>>> We are running Spark code using Dataproc Serverless batch on Google
>>>> Cloud Platform. The Spark code causes an issue while writing data to a
>>>> BigQuery table. In the BigQuery table, a few of the columns have the
>>>> bignumeric datatype, and Spark changes the datatype from bignumeric to
>>>> numeric while writing the data. We need the datatype to remain
>>>> bignumeric, as we need data with (38,20) precision.
>>>>
>>>>
>>>> Can we cast a column to bignumeric in a Spark SQL DataFrame, like the
>>>> below code does for decimal:
>>>>
>>>>
>>>> df = spark.sql("""SELECT cast(col1 as decimal(38,20)) as col1 from
>>>> table1""")
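>>>>
>>>> For reference, the equivalent cast through the DataFrame API would be
>>>> as follows (Spark SQL itself has no bignumeric type; 38 is the maximum
>>>> precision its decimal type supports):
>>>>
>>>> from pyspark.sql.functions import col
>>>> from pyspark.sql.types import DecimalType
>>>>
>>>> # Cast to the widest decimal Spark supports at scale 20.
>>>> df = df.withColumn("col1", col("col1").cast(DecimalType(38, 20)))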
>>>>
>>>> Spark version: 3.3
>>>>
>>>> PySpark version: 1.1
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Nidhi
>>>>
>>>

Re: Unable to handle bignumeric datatype in spark/pyspark

Posted by Atheeth SH <at...@gmail.com>.
Hi Rajnil,

Sorry for the multiple emails. Since it seems you are getting the
ModuleNotFoundError, I was curious: have you tried the solution
mentioned in the README file?

Below is the link:
https://github.com/GoogleCloudDataproc/spark-bigquery-connector#bignumeric-support

Also, please find the code-block solution below.

If the code throws ModuleNotFoundError, please add the following code
before reading the BigNumeric data.

try:
    # Declare this module as a setuptools-style namespace package so the
    # connector's BigNumeric support module can be located.
    import pkg_resources

    pkg_resources.declare_namespace(__name__)
except ImportError:
    # pkg_resources is not available; fall back to pkgutil-style
    # namespace packages.
    import pkgutil

    __path__ = pkgutil.extend_path(__path__, __name__)
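
For what it's worth, a rough sketch of how this might sit at the top of
a PySpark job (the table name is made up, and this assumes the connector
is already configured for the session):

from pyspark.sql import SparkSession

# Namespace workaround from above, placed before any read that touches
# BigNumeric data.
try:
    import pkg_resources

    pkg_resources.declare_namespace(__name__)
except ImportError:
    import pkgutil

    __path__ = pkgutil.extend_path(__path__, __name__)

spark = SparkSession.builder.appName("bignumeric-read").getOrCreate()

# Hypothetical table with a BIGNUMERIC column; see the README link above
# for how the connector represents the type.
df = (spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.my_table")
      .load())
df.printSchema()
df.show(5, False)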

Thanks,

Atheeth


On Fri, 3 Mar 2023 at 16:25, Atheeth SH <at...@gmail.com> wrote:

> Hi Rajnil,
>
> Just curious, what version of spark-bigquery-connector are you using?
>
> Thanks,
> Atheeth