Posted to github@arrow.apache.org by "Moelf (via GitHub)" <gi...@apache.org> on 2023/04/07 21:45:16 UTC

[GitHub] [arrow-julia] Moelf commented on issue #411: `Vector{UInt8}` mis-represented in metadata when writing to disk

Moelf commented on issue #411:
URL: https://github.com/apache/arrow-julia/issues/411#issuecomment-1500670163

   I did some digging by adding two `@show` calls to `toarrowvector`:
   ```diff
   diff --git a/src/arraytypes/arraytypes.jl b/src/arraytypes/arraytypes.jl
   index f3cee5d..a338004 100644
   --- a/src/arraytypes/arraytypes.jl
   +++ b/src/arraytypes/arraytypes.jl
   @@ -34,7 +34,9 @@ Base.deleteat!(x::T, inds) where {T <: ArrowVector} = throw(ArgumentError("`$T`
    function toarrowvector(x, i=1, de=Dict{Int64, Any}(), ded=DictEncoding[], meta=getmetadata(x); compression::Union{Nothing, Vector{LZ4FrameCompressor}, LZ4FrameCompressor, Vector{ZstdCompressor}, ZstdCompressor}=nothing, kw...)
        @debugv 2 "converting top-level column to arrow format: col = $(typeof(x)), compression = $compression, kw = $(values(kw))"
        @debugv 3 x
   +    @show typeof(x)
        A = arrowvector(x, i, 0, 0, de, ded, meta; compression=compression, kw...)
   +    @show typeof(A)
        if compression isa LZ4FrameCompressor
            A = compress(Meta.CompressionTypes.LZ4_FRAME, compression, A)
        elseif compression isa Vector{LZ4FrameCompressor}
   ```
   ```julia
   julia> data = (; x = [[0x01, 0x02], UInt8[], [0x03]], y = [[0, 1], Int[], [2,3]])
   (x = Vector{UInt8}[[0x01, 0x02], [], [0x03]], y = [[0, 1], Int64[], [2, 3]])
   
   julia> Arrow.write("/tmp/bug411.feather", data)
   typeof(x) = Vector{Vector{UInt8}}
   typeof(A) = Arrow.List{Vector{UInt8}, Int32, Arrow.ToList{UInt8, false, Vector{UInt8}, Int32}}
   typeof(x) = Vector{Vector{Int64}}
   typeof(A) = Arrow.List{Vector{Int64}, Int32, Arrow.Primitive{Int64, Arrow.ToList{Int64, false, Vector{Int64}, Int32}}}
   "/tmp/bug411.feather"
   ```
   
   The question is why the child of the `UInt8` column is a bare `ToList`, while the child of the `Int64` column is wrapped in `Arrow.Primitive`, even though both element types appear to be valid Arrow primitives: https://arrow.apache.org/docs/python/generated/pyarrow.uint8.html#pyarrow.uint8
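
   The same comparison can probably be repeated without patching the package by calling `Arrow.toarrowvector` directly. This is only a minimal sketch: it assumes the internal (non-exported) function keeps the default positional arguments shown in the diff above, and `child_type` is a hypothetical helper, not part of Arrow.jl, that reads the third type parameter of the resulting `Arrow.List`.

   ```julia
   using Arrow

   # Same two columns as in the REPL session above.
   x = [[0x01, 0x02], UInt8[], [0x03]]
   y = [[0, 1], Int[], [2, 3]]

   # Hypothetical helper (not an Arrow.jl API): the third type parameter of
   # Arrow.List{T, O, A} is the type of the child data array.
   child_type(A) = typeof(A).parameters[3]

   # Assumption: toarrowvector is internal and its signature may change.
   Ax = Arrow.toarrowvector(x)
   Ay = Arrow.toarrowvector(y)

   @show child_type(Ax)
   @show child_type(Ay)
   ```

   If the behavior matches the output above, the first line should show a bare `Arrow.ToList` and the second an `Arrow.Primitive` wrapping one.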

