You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Robert Burke (Jira)" <ji...@apache.org> on 2022/01/13 19:30:00 UTC

[jira] [Updated] (BEAM-6507) Go SDK : Cannot use BigQuery compatible Structs in PCollections

     [ https://issues.apache.org/jira/browse/BEAM-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-6507:
-------------------------------
    Resolution: Abandoned
        Status: Resolved  (was: Open)

User never replied about the posted comment working or not. Marking as abandoned.

> Go SDK : Cannot use BigQuery compatible Structs in PCollections
> ---------------------------------------------------------------
>
>                 Key: BEAM-6507
>                 URL: https://issues.apache.org/jira/browse/BEAM-6507
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>    Affects Versions: Not applicable
>         Environment: Ubuntu 18.04 (bionic), golang version 1.10.4, latest beam sdk for Golang, Beam worker : Dataflow (GCP)
>            Reporter: Chloe Thonin
>            Priority: P3
>
> I want to create a PCollection of objects of this type :
>  
> {noformat}
> type Ticket struct {
>   Uid string `bigquery:"uid"`
>   ShopUid string `bigquery:"shop_uid"`
>   Zone TicketZone `bigquery:"zone"`
>   TicketType string `bigquery:"type_ticket"`
>   OperationType string `bigquery:"type_operation"`
>   DateTime time.Time `bigquery:"datetime"`
>   ProcessedAt time.Time `bigquery:"processed_at"`
>   Clients int `bigquery:"clients"`
>   Table *TicketTable `bigquery:"table,nullable"`
>   Date TicketDate `bigquery:"date"`
>   Time TicketTime `bigquery:"time"`
>   Total TicketTotal `bigquery:"total"`
>   Article []TicketArticle `bigquery:"article"`
>   Encaissement []TicketEncaissement `bigquery:"encaissement"`
> }
> {noformat}
>  
>  
> {noformat}
> type TicketTable struct {
>   Numero int `bigquery:"numero"`
>   SousNumero bigquery.NullInt64 `bigquery:"sous_numero"`
>   Couverts int `bigquery:"couverts"`
> }{noformat}
>  
>  The process is to read raw XML data from GCP PubSub and process it to build a PCollection of "Tickets" so they can be sent in bulk to BigQuery using bigqueryio.Write()
>  
> {code:java}
> ticketsCol, tableCol := beam.ParDo2(s, processXml, windowedCol)
> {code}
> Our processXml function definition is :
>  
>  
> {code:java}
> func processXml(
>   input *pb.PubsubMessage, 
>   gcs func(types.Document), 
>   table func(types.Ticket)) (error) {
>   // ...
>   str := fmt.Sprintf("%s", input.Data)
>   doc, err := xmlquery.Parse(strings.NewReader(str))
>   if err != nil {
>      panic(err)
>      fmt.Println(doc)
>   }
>   ticketTest := new(types.Ticket)
>   ticketTest.GetTicket(doc)
>   table(*ticketTest)
>   // ...
>  return nil
> }{code}
>  
> The code successfully build but panic at runtime with this error :
>  
> {code:java}
> panic: invalid DoFn: bad parameter type for main.processXml: func(types.Ticket){code}
>  
> We narrowed it down to InConcrete(t reflect.Type) in core/typex/class.go (line 116) : the type is Struct but it contains a non concrete field (*TicketTable)
> Do you know if there is a workaround so we can build a PCollection of objects with nullable fields from bigquery POV ?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)