Posted to user@arrow.apache.org by François Pacull <fr...@architecture-performance.fr> on 2022/07/06 12:44:46 UTC

[Python] Cast decimal to string

Dear Arrow team and users, I have a simple question regarding the decimal data type with pyarrow. I am trying to cast a table with decimal columns to string, or to write it to a csv file. In both cases I get the error message:

    pyarrow.lib.ArrowNotImplementedError: Unsupported cast from decimal128(18, 9) to utf8 using function cast_string

I understand that this is not implemented yet, but is there by chance a way to work around it?
Thanks, François.

PS: I am using Python 3.9.13 and pyarrow 8.0.0.
Here is a code snippet:

import decimal

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.csv

PREC, SCAL = 18, 9  # decimal precision & scale

context = decimal.getcontext()
context.prec = PREC
ref_decimal = decimal.Decimal('0.123456789')

float_numbers = [0.1, 654.5, 4.65742]
decimal_numbers = [
    decimal.Decimal(str(f)).quantize(ref_decimal) for f in float_numbers
]

pa_arr_dec = pa.array(
    decimal_numbers, type=pa.decimal128(precision=PREC, scale=SCAL)
)
pa_arr_str = pc.cast(pa_arr_dec, pa.string())


  Traceback (most recent call last):
    File "/home/francois/Workspace/.../scripts/pyarrow_decimal.py", line 21, in <module>
      pa_arr_str = pc.cast(pa_arr_dec, pa.string())
    File "/home/francois/miniconda3/envs/tableau2/lib/python3.9/site-packages/pyarrow/compute.py", line 376, in cast
      return call_function("cast", [arr], options)
    File "pyarrow/_compute.pyx", line 542, in pyarrow._compute.call_function
    File "pyarrow/_compute.pyx", line 341, in pyarrow._compute.Function.call
    File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
    File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
  pyarrow.lib.ArrowNotImplementedError: Unsupported cast from decimal128(18, 9) to utf8 using function cast_string

RE: [Python] Cast decimal to string

Posted by François Pacull <fr...@architecture-performance.fr>.
Thanks for your answers! I ended up using Python to convert the decimal columns to strings:

    schema = batch_table.schema
    for i, field in enumerate(schema):
        if pa.types.is_decimal(field.type):
            column_in = batch_table.column(field.name)
            # Build the strings in Python; null entries stay null.
            column_out = pa.array(
                [
                    str(v) if v is not None else None
                    for v in column_in.to_pylist()
                ],
                type=pa.string(),  # keep a string type even if every value is null
            )
            batch_table = batch_table.set_column(i, field.name, column_out)

François

Re: [Python] Cast decimal to string

Posted by Weston Pace <we...@gmail.com>.
I've added [1].  I agree, it should be a fairly easy fix, but requires
understanding where all the casting code lives.
arrow/compute/kernels/scalar_cast_string.cc would be a good place to
start if anyone is interested.  We have decimal->string methods in
arrow/util/decimal.h which can be used.
[1] https://issues.apache.org/jira/browse/ARROW-17042

On Mon, Jul 11, 2022 at 6:58 AM Wes McKinney <we...@gmail.com> wrote:
>
> Would someone like to open a Jira issue about this? This seems like an
> easy rough edge to fix

Re: [Python] Cast decimal to string

Posted by Wes McKinney <we...@gmail.com>.
Would someone like to open a Jira issue about this? This seems like an
easy rough edge to fix


Re: [Python] Cast decimal to string

Posted by Weston Pace <we...@gmail.com>.
If precision is not important you can cast the column to float64 first.

    >>> x = pa.array([1, 2, 3], type=pa.decimal128(6, 1))
    >>> x.cast(pa.float64()).cast(pa.string())
    <pyarrow.lib.StringArray object at 0x7fd23a52cd00>
    [
      "1",
      "2",
      "3"
    ]

If precision is important, you could use Python or pandas to do the
conversion to string.

    >>> pa.array([str(v) for v in x.to_pylist()])
    <pyarrow.lib.StringArray object at 0x7fd23a52cd00>
    [
      "1.0",
      "2.0",
      "3.0"
    ]
