Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/02/28 19:58:07 UTC

[GitHub] amalakar opened a new pull request #7865: [FLINK-9650] [formats] add support for protobuf objects

URL: https://github.com/apache/flink/pull/7865
 
 
   flink-protobuf
   ==========
   
   This library adds support to Flink for running SQL against protobuf objects. As of now, Flink
   supports only Avro and JSON files backed by JsonSchema. To run SQL over a type, Flink needs to know
   its `TypeInformation`; this library provides `TypeInformation` for protobuf objects.
   
   It uses the protobuf APIs to retrieve the fields and types of a protobuf object and then provides
   each field name and type to Flink as a [PojoField](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/PojoField.java).
   
   Current limitations:
   
   - In protobuf objects, field names have a trailing underscore, like `loggedAt_`, so in SQL a field
   must be referred to as `loggedAt_` instead of `logged_at`. This should be fixable in the Flink APIs, but
   would need some digging around in the code. Whitelisting `Message` classes in `PojoField` should help.
   
   - Some field types, such as `Enum`, are not supported yet, but support should be trivial to add.
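   
   To illustrate the naming limitation, here is a minimal sketch, not code from this pull request.
   It assumes the field names come from the private members of the generated message class, which
   end in an underscore. `Person` is a hypothetical stand-in for a generated class, and `toSqlName`
   and `sqlNames` are illustrative helpers, not Flink or protobuf APIs.
   
   ```java
   import java.lang.reflect.Field;
   import java.lang.reflect.Modifier;
   import java.util.Map;
   import java.util.TreeMap;
   
   public class ProtoFieldNames {
   
       // Hypothetical stand-in for a protoc-generated message class; generated
       // classes keep each value in a private member with a trailing underscore.
       static class Person {
           private long loggedAt_;
           private int currentAge_;
           private String region_ = "";
       }
   
       // Converts a member name such as "loggedAt_" to a snake_case name such as "logged_at".
       static String toSqlName(String memberName) {
           String base = memberName.endsWith("_")
                   ? memberName.substring(0, memberName.length() - 1)
                   : memberName;
           StringBuilder out = new StringBuilder();
           for (int i = 0; i < base.length(); i++) {
               char c = base.charAt(i);
               if (Character.isUpperCase(c)) {
                   if (i > 0) {
                       out.append('_');
                   }
                   out.append(Character.toLowerCase(c));
               } else {
                   out.append(c);
               }
           }
           return out.toString();
       }
   
       // Maps each instance member of the class to its SQL-friendly name, sorted by member name.
       static Map<String, String> sqlNames(Class<?> type) {
           Map<String, String> names = new TreeMap<>();
           for (Field f : type.getDeclaredFields()) {
               if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                   continue; // skip constants and compiler-generated members
               }
               names.put(f.getName(), toSqlName(f.getName()));
           }
           return names;
       }
   
       public static void main(String[] args) {
           sqlNames(Person.class)
                   .forEach((member, sql) -> System.out.println(member + " -> " + sql));
       }
   }
   ```
   
   The reverse mapping is only a heuristic: it will not reproduce every original `.proto` field
   name (for example, names containing digits or consecutive capitals), so a real fix would more
   likely read the names from the protobuf descriptor.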
   
   With this, it is possible to run a query like the following on a stream of, say, `ride_requested`:
   
   ```sql
   SELECT region_,
          count(*)
   FROM people
   WHERE currentAge_ > 40
     AND region_ IN ('SFO',
                    'BKN')
   GROUP BY region_
   ```
   
   Note: I have been a bit hasty in getting this out. The code was sitting in our internal repo for a while, and I haven't had the time to clean it up and make it Flink-ready. I still wanted to publish it so that anyone who wants to work on this can start from this code rather than from scratch. We have been using it in production for close to a year. Due to other commitments, I may not be able to address coding style or review comments immediately, so I wouldn't mind if someone wants to improve this before merge. For example, there are pending TODO items such as enum support and a change in `PojoField` to make the SQL nicer (no underscore).
   
   (Apologies for not conforming to the coding style and the rest of the guidelines yet; I hope this is still useful as a beta-version patch.)
   
   
