Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2022/09/05 12:44:04 UTC
[GitHub] [arrow-julia] svilupp opened a new issue, #335: Inconsistent handling of eltype Decimals.Decimal (with silent errors?)
svilupp opened a new issue, #335:
URL: https://github.com/apache/arrow-julia/issues/335
First of all, thank you for the amazing package! I have noticed unexpected behaviour that I wanted to point out.
**Expected behaviour:** rational numbers such as 1.0 and 0.1 are represented as `Float64` and can be saved and loaded again.
**Actual behaviour:**
When writing a column with eltype `Decimals.Decimal`, `Arrow.write(filename, df)` throws a `MethodError` (see below), while `Arrow.write(filename, df; compress=:lz4)` completes without an error but the table contains wrong values when re-read (see MWE below).
I had a quick look at the code base and could not see any type checks. Are those left to the user, or is a `MethodError` the intended signal?
MWE:
```
using Decimals
using DataFrames, Arrow
df = DataFrame(:a => [Decimal(2.0)])
# this will fail with an error that Decimal cannot be saved
Arrow.write("test.feather", df)
# nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
# this will succeed
Arrow.write("test.feather", df; compress=:lz4)
# but the loaded dataframe will be rubbish
df2 = Arrow.Table("test.feather") |> DataFrame
# 1×1 DataFrame
#  Row │ a
#      │ Float64
# ─────┼─────────────
#    1 │ 2.1509e-314
```
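Until there is a type check in the writer, a pre-write guard along the following lines could at least flag such columns. This is a minimal sketch under stated assumptions: `suspicious_columns` is a hypothetical helper and not part of Arrow.jl, and `FakeDecimal` is a stand-in that assumes `Decimals.Decimal` carries a heap-allocated `BigInt` field, so the example runs without any packages. It only uses `isbitstype` and `nonmissingtype` from Base.

```julia
# Stand-in for Decimals.Decimal (assumption: one field is a BigInt,
# which makes the struct a non-bits type).
struct FakeDecimal
    s::Bool
    c::BigInt
    q::Int
end

# Hypothetical helper, NOT part of Arrow.jl: return the names of columns whose
# non-missing element type is neither a bits type nor a string.
function suspicious_columns(tbl::NamedTuple)
    bad = Symbol[]
    for (name, col) in pairs(tbl)
        T = nonmissingtype(eltype(col))
        if !(isbitstype(T) || T <: AbstractString)
            push!(bad, name)
        end
    end
    return bad
end

tbl = (a = [FakeDecimal(false, big(2), 0)], b = [1.0, 2.0], c = ["x", missing])
println(suspicious_columns(tbl))
```

Arrow.jl supports more element types than this check accepts, so it is a debugging aid for spotting columns like `a` above rather than a statement of what the package can serialize.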
Error stack trace from `Arrow.write()` without the `compress` keyword argument:
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:345 [inlined]
 [2] close(writer::Arrow.Writer{IOStream})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:230
 [3] open(::Arrow.var"#120#121"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
   @ Base ./io.jl:386
 [4] #write#119
   @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:57 [inlined]
 [5] write(file_path::String, tbl::DataFrame)
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:56
 [6] top-level scope
   @ REPL[14]:1

    nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
    Closest candidates are:
      write(::IO, ::Any) at io.jl:672
      write(::IO, ::Any, ::Any...) at io.jl:673
      write(::Base.GenericIOBuffer, ::UInt8) at iobuffer.jl:442
      ...
    Stacktrace:
     [1] write(io::IOBuffer, x::Decimals.Decimal)
       @ Base ./io.jl:672
     [2] writearray(io::IOStream, #unused#::Type{Decimals.Decimal}, col::Vector{Union{Missing, Decimals.Decimal}})
       @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/utils.jl:50
     [3] writebuffer(io::IOStream, col::Arrow.Primitive{Union{Missing, Decimals.Decimal}, Vector{Union{Missing, Decimals.Decimal}}}, alignment::Int64)
       @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/arraytypes/primitive.jl:102
     [4] write(io::IOStream, msg::Arrow.Message, blocks::Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, sch::Base.RefValue{Tables.Schema}, alignment::Int64)
       @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:365
     [5] macro expansion
       @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:149 [inlined]
     [6] (::Arrow.var"#122#124"{IOStream, Int64, Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, Base.RefValue{Tables.Schema}, Arrow.OrderedChannel{Arrow.Message}})()
       @ Arrow ./threadingconstructs.jl:258
**Package version**
[69666777] Arrow v2.3.0
[a93c6f00] DataFrames v1.3.4
[194296ae] LibPQ v1.14.0
**versioninfo()** (the behaviour was the same on Julia 1.7)
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 6 virtual cores