Posted to issues@beam.apache.org by "Chloe Thonin (JIRA)" <ji...@apache.org> on 2019/01/25 14:34:00 UTC
[jira] [Created] (BEAM-6507) Go SDK : Cannot use BigQuery compatible Structs in PCollections
Chloe Thonin created BEAM-6507:
----------------------------------
Summary: Go SDK : Cannot use BigQuery compatible Structs in PCollections
Key: BEAM-6507
URL: https://issues.apache.org/jira/browse/BEAM-6507
Project: Beam
Issue Type: Bug
Components: sdk-go
Affects Versions: 2.8.0
Environment: Ubuntu 18.04 (bionic), golang version 1.10.4, latest beam sdk for Golang, Beam worker : Dataflow (GCP)
Reporter: Chloe Thonin
Assignee: Robert Burke
I want to create a PCollection of objects of this type:
{code:java}
type Ticket struct {
    Uid           string               `bigquery:"uid"`
    ShopUid       string               `bigquery:"shop_uid"`
    Zone          TicketZone           `bigquery:"zone"`
    TicketType    string               `bigquery:"type_ticket"`
    OperationType string               `bigquery:"type_operation"`
    DateTime      time.Time            `bigquery:"datetime"`
    ProcessedAt   time.Time            `bigquery:"processed_at"`
    Clients       int                  `bigquery:"clients"`
    Table         *TicketTable         `bigquery:"table,nullable"`
    Date          TicketDate           `bigquery:"date"`
    Time          TicketTime           `bigquery:"time"`
    Total         TicketTotal          `bigquery:"total"`
    Article       []TicketArticle      `bigquery:"article"`
    Encaissement  []TicketEncaissement `bigquery:"encaissement"`
}

type TicketTable struct {
    Numero     int                `bigquery:"numero"`
    SousNumero bigquery.NullInt64 `bigquery:"sous_numero"`
    Couverts   int                `bigquery:"couverts"`
}
{code}
The pipeline reads raw XML data from GCP Pub/Sub and processes it to build a PCollection of Tickets, so that they can be sent in bulk to BigQuery using bigqueryio.Write():
{code:java}
ticketsCol, tableCol := beam.ParDo2(s, processXml, windowedCol)
{code}
Our processXml function is defined as:
{code:java}
func processXml(
    input *pb.PubsubMessage,
    gcs func(types.Document),
    table func(types.Ticket)) error {
    // ...
    // Parse the raw XML payload of the Pub/Sub message.
    doc, err := xmlquery.Parse(strings.NewReader(string(input.Data)))
    if err != nil {
        // Return the error instead of panicking; the DoFn already returns error.
        return err
    }
    ticketTest := new(types.Ticket)
    ticketTest.GetTicket(doc)
    // Emit the ticket through the table emitter.
    table(*ticketTest)
    // ...
    return nil
}
{code}
The code builds successfully but panics at runtime with this error:
{code:java}
panic: invalid DoFn: bad parameter type for main.processXml: func(types.Ticket)
{code}
We narrowed it down to IsConcrete(t reflect.Type) in core/typex/class.go (line 116): the type is a Struct, but it contains a non-concrete field (*TicketTable).
Do you know if there is a workaround so that we can build a PCollection of objects with fields that are nullable from BigQuery's point of view?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)