Posted to jira@arrow.apache.org by "Robert Purdom (Jira)" <ji...@apache.org> on 2022/07/22 02:00:00 UTC

[jira] [Updated] (ARROW-17169) [Go] goPanicIndex in firstTimeBitmapWriter.Finish()

     [ https://issues.apache.org/jira/browse/ARROW-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Purdom updated ARROW-17169:
----------------------------------
    Description: 
I'm working with complex Parquet files that have 500+ "root" columns, where some fields are lists of structs, internally referred to as 'topics'.  Some of these structs have hundreds of columns.  When reading a particular topic, I get an index panic (goPanicIndex) at the line indicated below. The error occurs when the value for the topic is null, that is, when this particular root record has no data for the topic.  The root is household data and the topic is auto, so the error occurs when the household has no autos.  The auto field is a nullable list of struct.

 
{code:go}
/* Finish() was called from defLevelsToBitmapInternal.

Data values when the panic occurs:
bw.length == 17531
bw.bitMask == 1
bw.pos == 3424
len(bw.buf) == 428
cap(bw.buf) == 448
bw.byteOffset == 428
bw.curByte == 0
*/

// bitmap_writer.go
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		bw.buf[int(bw.byteOffset)] = bw.curByte // <---- Panic index
	}
}
{code}
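As a quick sanity check on the values listed above, here is a minimal sketch that evaluates the same condition outside of Arrow. The helper names (shouldStore, bytesForBits) are made up for illustration, and the int64/uint8 types are an assumption on my part. With the reported values, the `bw.pos < bw.length` branch alone makes the condition true, and pos/8 equals the reported buffer length, so the write targets one byte past the end. The sketch also shows that 17531 bits would need 2192 bytes, which may hint that the buffer is undersized relative to bw.length, though that is only a guess.

```go
package main

import "fmt"

// shouldStore mirrors the condition in Finish(). In Go, && binds tighter
// than ||, so it parses as (length > 0 && bitMask != 0x01) || pos < length.
// Hypothetical helper; the parameter types are assumed.
func shouldStore(length int64, bitMask uint8, pos int64) bool {
	return length > 0 && bitMask != 0x01 || pos < length
}

// bytesForBits returns how many bytes are needed to hold n bits.
// Hypothetical helper, not part of Arrow.
func bytesForBits(n int64) int64 {
	return (n + 7) / 8
}

func main() {
	// Values copied from the report.
	const (
		length  = int64(17531)
		pos     = int64(3424)
		bitMask = uint8(1)
		bufLen  = int64(428)
	)

	fmt.Println(shouldStore(length, bitMask, pos)) // true: pos < length
	fmt.Println(pos/8 == bufLen)                   // true: index 428 is one past the last valid index, 427
	fmt.Println(bytesForBits(length))              // 2192: bytes needed to hold all 17531 bits
}
```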
In every case, when the panic occurs, bw.byteOffset == len(bw.buf). I tested the modification below and it does stop the panic. However, it is probably only masking the actual bug.
{code:go}
// Test version: No Panic
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		if int(bw.byteOffset) == len(bw.buf) {
			// byteOffset is one past the end: grow the buffer instead of indexing
			bw.buf = append(bw.buf, bw.curByte)
		} else {
			bw.buf[int(bw.byteOffset)] = bw.curByte
		}
	}
}
{code}
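To show the failure and the workaround in isolation, the following sketch recreates the reported state on a stand-in struct. The type bitmapWriter and the helpers newReportedState and panics are hypothetical, and the field types are assumed. Only the field names and the two Finish bodies follow the snippets above. It is not the Arrow implementation, and it does not address whatever leaves the buffer too short in the first place.

```go
package main

import "fmt"

// bitmapWriter is a hypothetical stand-in for firstTimeBitmapWriter.
// Field names follow the report; the types are assumptions.
type bitmapWriter struct {
	buf        []byte
	pos        int64
	length     int64
	curByte    uint8
	bitMask    uint8
	byteOffset int64
}

// finishUnguarded copies the original Finish() body from the report.
func (bw *bitmapWriter) finishUnguarded() {
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		bw.buf[int(bw.byteOffset)] = bw.curByte
	}
}

// finishGuarded copies the tested workaround from the report.
func (bw *bitmapWriter) finishGuarded() {
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		if int(bw.byteOffset) == len(bw.buf) {
			bw.buf = append(bw.buf, bw.curByte)
		} else {
			bw.buf[int(bw.byteOffset)] = bw.curByte
		}
	}
}

// newReportedState builds a writer with the values observed at the panic.
func newReportedState() *bitmapWriter {
	return &bitmapWriter{
		buf:        make([]byte, 428, 448),
		pos:        3424,
		length:     17531,
		curByte:    0,
		bitMask:    1,
		byteOffset: 428,
	}
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(panics(newReportedState().finishUnguarded)) // true: index 428 with length 428

	bw := newReportedState()
	guardedPanicked := panics(bw.finishGuarded)
	fmt.Println(guardedPanicked) // false
	fmt.Println(len(bw.buf))     // 429: one byte appended
}
```

Because cap(bw.buf) is 448 in the reported state, the append here reuses the existing backing array rather than reallocating.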

  was:
I'm working with complex Parquet files that have 500+ "root" columns, where some fields are lists of structs, internally referred to as 'topics'.  Some of these structs have hundreds of columns.  When reading a particular topic, I get an index panic (goPanicIndex) at the line indicated below. The error occurs when the value for the topic is null, that is, when this particular root record has no data for the topic.  The root is household data and the topic is auto, so the error occurs when the household has no autos.  The auto field is a nullable list of struct.

 
{code:go}
/* Finish() was called from defLevelsToBitmapInternal.

Data values when the panic occurs:
bw.length == 17531
bw.bitMask == 1
bw.pos == 3424
len(bw.buf) == 428
cap(bw.buf) == 448
bw.byteOffset == 428
bw.curByte == 0
*/

// bitmap_writer.go
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		bw.buf[int(bw.byteOffset)] = bw.curByte // <---- Panic index
	}
}
{code}
In every case, when the panic occurs, bw.byteOffset == len(bw.buf). I tested the modification below and it does stop the panic. However, it is probably only masking the actual bug.
{code:go}
// Test version: No Panic
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		if bw.byteOffset == len(bw.Buf) {
			bw.buf = append(bw.buf, bw.curByte)
		} else {
			bw.buf[int(bw.byteOffset)] = bw.curByte
		}
	}
}
{code}


> [Go] goPanicIndex in firstTimeBitmapWriter.Finish()
> ---------------------------------------------------
>
>                 Key: ARROW-17169
>                 URL: https://issues.apache.org/jira/browse/ARROW-17169
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>    Affects Versions: 9.0.0, 8.0.1
>         Environment: go (1.18.3), Linux, AMD64
>            Reporter: Robert Purdom
>            Priority: Critical


