Posted to jira@arrow.apache.org by "Matthew Topol (Jira)" <ji...@apache.org> on 2022/03/16 16:22:00 UTC

[jira] [Commented] (ARROW-15733) array.String offsets int32 overflow

    [ https://issues.apache.org/jira/browse/ARROW-15733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507704#comment-17507704 ] 

Matthew Topol commented on ARROW-15733:
---------------------------------------

[~astrelsky] This is the use case for the Large String Arrow type, which is not yet implemented in the Go library (the same is true of the Large List and Large Binary types). Those types explicitly use 64-bit integers for their offsets, while the non-Large types use 32-bit integers. This issue should be reclassified as an enhancement to implement the Large String type rather than treated as a bug.

> array.String offsets int32 overflow
> -----------------------------------
>
>                 Key: ARROW-15733
>                 URL: https://issues.apache.org/jira/browse/ARROW-15733
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>    Affects Versions: 7.0.0
>            Reporter: Andrew Strelsky
>            Priority: Minor
>
> {panel}
> panic: runtime error: slice bounds out of range [:-1352393031]
> goroutine 1 [running]:
> github.com/apache/arrow/go/v7/arrow/array.(*String).ValueBytes(...)
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/array/string.go:74
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).visit(0xc193b85c80, 0xc193b9e060, {0x10b5490, 0xc000050820})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:435 +0x2194
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).visit(0xc193b85c80, 0xc193b9e060, {0x10b5288, 0xc000050730})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:533 +0x1431
> github.com/apache/arrow/go/v7/arrow/ipc.(*recordEncoder).Encode(0xc193b85c80, 0xc193b9e060, {0x10b5838, 0xc193b8bc80})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/writer.go:267 +0x98
> github.com/apache/arrow/go/v7/arrow/ipc.(*FileWriter).Write(0xc00004e480, {0x10b5838, 0xc193b8bc80})
>         C:/Users/astre/Documents/go/pkg/mod/github.com/apache/arrow/go/v7@v7.0.0/arrow/ipc/file_writer.go:342 +0x20d
> main.main()
> {panel}
> I have *a lot* of strings. The offsets should not only be unsigned but should also be larger than 4 bytes. Changing the offsets to a slice of uint32 was sufficient in my case but may not be for others.


