Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2022/09/05 12:44:04 UTC

[GitHub] [arrow-julia] svilupp opened a new issue, #335: Inconsistent handling of eltype Decimals.Decimal (with silent errors?)

svilupp opened a new issue, #335:
URL: https://github.com/apache/arrow-julia/issues/335

   First of all, thank you for the amazing package! I have noticed unexpected behaviour that I wanted to point out.
   
   **Expected behaviour:** rational numbers like 1.0 and 0.1 are represented as Float and can be saved and loaded again unchanged.
   
   **Actual behaviour:** 
   When writing a column with eltype `Decimals.Decimal`, `Arrow.write(filename, df)` throws a MethodError (see below), while `Arrow.write(filename, df; compress=:lz4)` completes without an error but produces a table with wrong values when re-read (see MWE below).
   
   I've had a quick look at the code base and cannot see any type checks. Are those left to the user, or are unsupported types expected to surface as MethodErrors?
   
   MWE:
   ```
   using Decimals
   using DataFrames, Arrow
   
   df=DataFrame(:a=>[Decimal(2.0)])
   
   # this will fail with error that Decimal cannot be saved
   Arrow.write("test.feather", df)
   #     nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
   
   # this will succeed
   Arrow.write("test.feather", df;compress=:lz4)
   
   # but the loaded dataframe will be rubbish
   df2=Arrow.Table("test.feather")|>DataFrame
   # 1×1 DataFrame
   #  Row │ a
   #      │ Float64
   # ─────┼─────────────
   #    1 │ 2.1509e-314
   
   ```
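
   As a stopgap, a user-side check before writing could look roughly like the sketch below. It is only an illustration: `nonbits_columns` is a hypothetical helper and not part of Arrow.jl, `FakeDecimal` is a stand-in whose field layout is an assumption (so the snippet runs without Decimals.jl), and the idea that the problem is tied to element types that are not plain-bits types is a guess. The check is also stricter than Arrow itself, since it would flag `String` columns, which Arrow writes fine.

   ```julia
   # Hypothetical stand-in for a numeric type that is not a plain-bits type
   # (the BigInt field makes it non-isbits); the layout is an assumption.
   struct FakeDecimal
       s::Bool
       c::BigInt
       q::Int
   end

   # Hypothetical helper: return the names of columns whose non-missing
   # element type is not a plain-bits type.
   function nonbits_columns(cols::NamedTuple)
       return [name for (name, col) in pairs(cols)
               if !isbitstype(nonmissingtype(eltype(col)))]
   end

   cols = (
       a = [FakeDecimal(false, big(2), 0)],
       b = [1.0, 2.0],
       c = Union{Missing, Int}[1, missing],
   )

   # Only column :a is reported here.
   flagged = nonbits_columns(cols)
   ```

   Columns reported this way could then be converted to a supported type, or the write aborted with an explicit error, instead of relying on a MethodError or getting silently wrong data.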
   
   Error stack trace from Arrow.write() without a keyword argument:
   > ERROR: TaskFailedException
   Stacktrace:
    [1] wait
      @ ./task.jl:345 [inlined]
    [2] close(writer::Arrow.Writer{IOStream})
      @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:230
    [3] open(::Arrow.var"#120#121"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
      @ Base ./io.jl:386
    [4] #write#119
      @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:57 [inlined]
    [5] write(file_path::String, tbl::DataFrame)
      @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:56
    [6] top-level scope
      @ REPL[14]:1
   > nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
       Closest candidates are:
         write(::IO, ::Any) at io.jl:672
         write(::IO, ::Any, ::Any...) at io.jl:673
         write(::Base.GenericIOBuffer, ::UInt8) at iobuffer.jl:442
         ...
       Stacktrace:
        [1] write(io::IOBuffer, x::Decimals.Decimal)
          @ Base ./io.jl:672
        [2] writearray(io::IOStream, #unused#::Type{Decimals.Decimal}, col::Vector{Union{Missing, Decimals.Decimal}})
          @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/utils.jl:50
        [3] writebuffer(io::IOStream, col::Arrow.Primitive{Union{Missing, Decimals.Decimal}, Vector{Union{Missing, Decimals.Decimal}}}, alignment::Int64)
          @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/arraytypes/primitive.jl:102
        [4] write(io::IOStream, msg::Arrow.Message, blocks::Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, sch::Base.RefValue{Tables.Schema}, alignment::Int64)
          @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:365
        [5] macro expansion
          @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:149 [inlined]
        [6] (::Arrow.var"#122#124"{IOStream, Int64, Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, Base.RefValue{Tables.Schema}, Arrow.OrderedChannel{Arrow.Message}})()
          @ Arrow ./threadingconstructs.jl:258
   
   
   **Package versions**
     [69666777] Arrow v2.3.0
     [a93c6f00] DataFrames v1.3.4
     [194296ae] LibPQ v1.14.0
   
   **versioninfo()** (the behaviour was the same on Julia 1.7)
   Julia Version 1.8.0
   Commit 5544a0fab76 (2022-08-17 13:38 UTC)
   Platform Info:
   OS: macOS (arm64-apple-darwin21.3.0)
   CPU: 8 × Apple M1 Pro
   WORD_SIZE: 64
   LIBM: libopenlibm
   LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
   Threads: 6 on 6 virtual cores

