Posted to issues@beam.apache.org by "Chloe Thonin (JIRA)" <ji...@apache.org> on 2019/01/25 14:34:00 UTC

[jira] [Created] (BEAM-6507) Go SDK : Cannot use BigQuery compatible Structs in PCollections

Chloe Thonin created BEAM-6507:
----------------------------------

             Summary: Go SDK : Cannot use BigQuery compatible Structs in PCollections
                 Key: BEAM-6507
                 URL: https://issues.apache.org/jira/browse/BEAM-6507
             Project: Beam
          Issue Type: Bug
          Components: sdk-go
    Affects Versions: 2.8.0
         Environment: Ubuntu 18.04 (bionic), Go 1.10.4, latest Beam SDK for Go, Beam runner: Dataflow (GCP)
            Reporter: Chloe Thonin
            Assignee: Robert Burke


I want to create a PCollection of values of the following type:

{code:java}
type Ticket struct {
	Uid     string     `bigquery:"uid"`
	ShopUid string     `bigquery:"shop_uid"`
	Zone    TicketZone `bigquery:"zone"`

	TicketType    string `bigquery:"type_ticket"`
	OperationType string `bigquery:"type_operation"`

	DateTime    time.Time `bigquery:"datetime"`
	ProcessedAt time.Time `bigquery:"processed_at"`

	Clients int          `bigquery:"clients"`
	Table   *TicketTable `bigquery:"table,nullable"`

	Date TicketDate `bigquery:"date"`
	Time TicketTime `bigquery:"time"`

	Total        TicketTotal          `bigquery:"total"`
	Article      []TicketArticle      `bigquery:"article"`
	Encaissement []TicketEncaissement `bigquery:"encaissement"`
}

type TicketTable struct {
	Numero     int                `bigquery:"numero"`
	SousNumero bigquery.NullInt64 `bigquery:"sous_numero"`
	Couverts   int                `bigquery:"couverts"`
}
{code}

The pipeline reads raw XML messages from GCP Pub/Sub, parses them, and builds a PCollection of Ticket values so they can be written to BigQuery in bulk with bigqueryio.Write().

{code:java}
ticketsCol, tableCol := beam.ParDo2(s, processXml, windowedCol)
{code}
Our processXml function is defined as follows:

{code:java}
func processXml(
	input *pb.PubsubMessage,
	gcs func(types.Document),
	table func(types.Ticket)) error {

	// ...
	doc, err := xmlquery.Parse(strings.NewReader(string(input.Data)))
	if err != nil {
		// Propagate the parse error to the SDK instead of panicking.
		return err
	}
	ticketTest := new(types.Ticket)
	ticketTest.GetTicket(doc)

	// Emit the parsed ticket through the table emitter.
	table(*ticketTest)
	// ...

	return nil
}
{code}
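For illustration only, here is a minimal stdlib-only sketch of the same DoFn shape. In the Go SDK each func(T) parameter of a DoFn is an output emitter, and beam.ParDo2 returns one PCollection per emitter; outside a pipeline those emitters are just callbacks, so the function can be exercised directly. Document, Ticket, processXML and the <ticket><uid>...</uid></ticket> layout are hypothetical, and encoding/xml stands in for xmlquery.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Document and Ticket are hypothetical, trimmed-down stand-ins for
// types.Document and types.Ticket.
type Document struct {
	Raw string
}

type Ticket struct {
	Uid     string `xml:"uid"`
	ShopUid string `xml:"shop_uid"`
}

// processXML mirrors the shape of processXml: one input and two emitter
// callbacks. The XML layout it expects is an assumption for this sketch.
func processXML(data []byte, gcs func(Document), table func(Ticket)) error {
	var t Ticket
	if err := xml.Unmarshal(data, &t); err != nil {
		return err // report the failure instead of panicking
	}
	gcs(Document{Raw: string(data)})
	table(t)
	return nil
}

func main() {
	// Collect each emitter's output into a slice, as a test harness might.
	var docs []Document
	var tickets []Ticket
	err := processXML(
		[]byte("<ticket><uid>t-1</uid><shop_uid>s-9</shop_uid></ticket>"),
		func(d Document) { docs = append(docs, d) },
		func(t Ticket) { tickets = append(tickets, t) },
	)
	fmt.Println(err, len(docs), tickets[0].Uid) // <nil> 1 t-1
}
```

Returning the error rather than calling panic also removes the unreachable statement that followed panic(err) in the original handler.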

The code builds successfully but panics at runtime with this error:

{code:java}
panic: invalid DoFn: bad parameter type for main.processXml: func(types.Ticket){code}

We narrowed it down to IsConcrete(t reflect.Type) in core/typex/class.go (line 116): the type is a struct, but it contains a non-concrete field (*TicketTable).

Is there a workaround that would let us build a PCollection of structs with fields that are nullable from BigQuery's point of view?


