Posted to github@arrow.apache.org by "candiduslynx (via GitHub)" <gi...@apache.org> on 2023/05/31 17:08:57 UTC

[GitHub] [arrow] candiduslynx opened a new pull request, #35851: [Go] Repro for nested null struct panic

candiduslynx opened a new pull request, #35851:
URL: https://github.com/apache/arrow/pull/35851

   @zeroshade I've found a disturbing behavior with structs (namely, nested ones) in pqarrow.
   
   This blocks https://github.com/cloudquery/filetypes/pull/172
   
   Could you please take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on pull request #35851: [Go] Repro for nested null struct panic

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35851:
URL: https://github.com/apache/arrow/pull/35851#issuecomment-1570662275

   @zeroshade I found an issue with the test, and it should now pass, but I'm still confused about how `array.NewStructData` behaves in conjunction with pqarrow.
   Could you please take a look at the code in https://github.com/cloudquery/filetypes/pull/172?
   Specifically, the issue I see is that the offsets/null bitmaps for the nested field are propagated strangely.
   It [panics](https://github.com/cloudquery/filetypes/actions/runs/5134969218/jobs/9239730687?pr=172#step:6:8) while trying to create a Struct array from the data, so I don't know where to look.


[GitHub] [arrow] candiduslynx commented on pull request #35851: MINOR [Go] Add nested struct test for pqarrow

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35851:
URL: https://github.com/apache/arrow/pull/35851#issuecomment-1571776931

   I'm closing this for now. However, it would've been nice to be able to use `array.NewStructData` on the values that were read back.


[GitHub] [arrow] candiduslynx commented on pull request #35851: MINOR [Go] Add nested struct test for pqarrow

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35851:
URL: https://github.com/apache/arrow/pull/35851#issuecomment-1570736903

   I suspect the issue is connected to https://github.com/apache/arrow/blob/main/go/parquet/pqarrow/encode_arrow_test.go#L1040-L1042
   ```
   // current impl of ArrayEquals for structs doesn't correctly handle nulls in the parent
   // with a non-nullable child when comparing. Since after the round trip, the data in the
   // child will have the nulls, not the original data.
   ```
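
To make the quoted comment concrete, here is a minimal sketch of why a strict comparison fails after a round trip when the parent struct has a null slot. It does not use the Arrow API: `Int32Column`, `MaskByParent`, `StrictEqual`, and `ParentAwareEqual` are hypothetical names made up for illustration, and the sketch assumes that the parent validity slice and the child column have the same length.

```go
package main

import "fmt"

// Int32Column is a hypothetical, simplified model of a child column:
// Valid[i] reports whether slot i holds a value.
type Int32Column struct {
	Values []int32
	Valid  []bool
}

// MaskByParent models what the quoted comment describes: after the round
// trip, a child slot under a null parent comes back as null, whatever it
// held originally. Assumes len(parentValid) == len(child.Values).
func MaskByParent(parentValid []bool, child Int32Column) Int32Column {
	out := Int32Column{
		Values: make([]int32, len(child.Values)),
		Valid:  make([]bool, len(child.Values)),
	}
	for i := range child.Values {
		if parentValid[i] && child.Valid[i] {
			out.Values[i] = child.Values[i]
			out.Valid[i] = true
		}
	}
	return out
}

// StrictEqual compares validity and values slot by slot, ignoring the parent.
func StrictEqual(a, b Int32Column) bool {
	if len(a.Values) != len(b.Values) {
		return false
	}
	for i := range a.Values {
		if a.Valid[i] != b.Valid[i] {
			return false
		}
		if a.Valid[i] && a.Values[i] != b.Values[i] {
			return false
		}
	}
	return true
}

// ParentAwareEqual skips slots whose parent is null, since the child
// content there is not observable through the parent.
func ParentAwareEqual(parentValid []bool, a, b Int32Column) bool {
	if len(a.Values) != len(b.Values) {
		return false
	}
	for i := range a.Values {
		if !parentValid[i] {
			continue
		}
		if a.Valid[i] != b.Valid[i] {
			return false
		}
		if a.Valid[i] && a.Values[i] != b.Values[i] {
			return false
		}
	}
	return true
}

func main() {
	// The parent struct has one valid slot and one null slot.
	parentValid := []bool{true, false}
	// A non-nullable child still carries data in every slot before writing.
	original := Int32Column{Values: []int32{-1, 7}, Valid: []bool{true, true}}
	roundTripped := MaskByParent(parentValid, original)

	fmt.Println("strict equal:", StrictEqual(original, roundTripped))
	fmt.Println("parent-aware equal:", ParentAwareEqual(parentValid, original, roundTripped))
}
```

In this toy model the two columns differ only in the slot hidden by the null parent, so a comparison that ignores the parent reports a mismatch while a parent-aware one does not.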


[GitHub] [arrow] candiduslynx closed pull request #35851: MINOR [Go] Add nested struct test for pqarrow

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx closed pull request #35851: MINOR [Go] Add nested struct test for pqarrow
URL: https://github.com/apache/arrow/pull/35851


[GitHub] [arrow] github-actions[bot] commented on pull request #35851: [Go] Repro for nested null struct panic

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35851:
URL: https://github.com/apache/arrow/pull/35851#issuecomment-1570604508

   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


[GitHub] [arrow] zeroshade commented on a diff in pull request #35851: [Go] Repro for nested null struct panic

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #35851:
URL: https://github.com/apache/arrow/pull/35851#discussion_r1212069531


##########
go/parquet/pqarrow/encode_arrow_test.go:
##########
@@ -931,6 +931,64 @@ func (ps *ParquetIOTestSuite) TestReadDecimals() {
 	ps.True(array.Equal(expected, chunked.Chunk(0)))
 }
 
+func (ps *ParquetIOTestSuite) TestReadNestedStruct() {
+	mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
+	defer mem.AssertSize(ps.T(), 0)
+
+	dt := arrow.StructOf(arrow.Field{
+		Name: "nested",
+		Type: arrow.StructOf(
+			arrow.Field{Name: "bool", Type: arrow.FixedWidthTypes.Boolean},
+			arrow.Field{Name: "int32", Type: arrow.PrimitiveTypes.Int32},
+			arrow.Field{Name: "int64", Type: arrow.PrimitiveTypes.Int64},
+		),
+	})
+
+	builder := array.NewStructBuilder(mem, dt)
+	defer builder.Release()
+	nested := builder.FieldBuilder(0).(*array.StructBuilder)
+
+	builder.Append(true)
+	nested.Append(true)
+	nested.FieldBuilder(0).(*array.BooleanBuilder).Append(true)
+	nested.FieldBuilder(1).(*array.Int32Builder).Append(int32(-1))
+	nested.FieldBuilder(2).(*array.Int64Builder).Append(int64(-2))
+	builder.AppendNull()
+
+	expected := builder.NewStructArray()
+	defer expected.Release()
+
+	sc := schema.MustGroup(schema.NewGroupNode("schema", parquet.Repetitions.Required, schema.FieldList{
+		schema.Must(schema.NewPrimitiveNodeLogical("decimals", parquet.Repetitions.Required, schema.NewDecimalLogicalType(6, 3), parquet.Types.ByteArray, -1, -1)),
+	}, -1))
+
+	sink := encoding.NewBufferWriter(0, mem)
+	defer sink.Release()
+	writer := file.NewParquetWriter(sink, sc)
+
+	rgw := writer.AppendRowGroup()
+	cw, err := rgw.NextColumn()
+	ps.NoError(err)
+
+	props := pqarrow.NewArrowWriterProperties(pqarrow.WithAllocator(mem))
+	ctx := pqarrow.NewArrowWriteContext(context.TODO(), &props)
+	ps.NoError(pqarrow.WriteArrowToColumn(ctx, cw, expected, nil, nil, false))
+	ps.NoError(cw.Close())
+	ps.NoError(rgw.Close())
+	ps.NoError(writer.Close())

Review Comment:
   I'm extremely confused here: you're writing a nested struct Arrow array to a column that expects a single leaf byte array of decimal data? It feels like this should error at a minimum. There's no way this should work, unless I'm missing something.
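
The mismatch can be illustrated by counting leaves. Parquet stores one physical column per primitive leaf, so the nested struct in the test needs three leaf columns, while the declared schema has one. The sketch below is a simplified model, not the Arrow or Parquet API: `Field` and `LeafPaths` are hypothetical names, and a field with no children is assumed to be a primitive leaf.

```go
package main

import "fmt"

// Field is a hypothetical, simplified stand-in for a schema field:
// a name plus child fields. A field with no children is treated as a
// primitive leaf (an assumption of this sketch).
type Field struct {
	Name     string
	Children []Field
}

// LeafPaths returns the dotted path of every primitive leaf under f.
// Each leaf corresponds to one physical Parquet column.
func LeafPaths(f Field) []string {
	if len(f.Children) == 0 {
		return []string{f.Name}
	}
	var paths []string
	for _, c := range f.Children {
		for _, p := range LeafPaths(c) {
			paths = append(paths, f.Name+"."+p)
		}
	}
	return paths
}

func main() {
	// Shape of the "nested" field in the Arrow struct built by the test.
	value := Field{Name: "nested", Children: []Field{
		{Name: "bool"}, {Name: "int32"}, {Name: "int64"},
	}}
	// Shape of the Parquet schema declared in the test: one decimal leaf.
	declared := Field{Name: "decimals"}

	fmt.Println("value leaves:   ", LeafPaths(value))
	fmt.Println("declared leaves:", LeafPaths(declared))
	fmt.Println("leaf counts match:", len(LeafPaths(value)) == len(LeafPaths(declared)))
}
```

A check of this kind before writing would surface the problem as an explicit error instead of a confusing result later in the read path.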


