Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/02/28 19:58:07 UTC
[GitHub] amalakar opened a new pull request #7865: [FLINK-9650] [formats] add support for protobuf objects
URL: https://github.com/apache/flink/pull/7865
flink-protobuf
==========
This library adds support to Flink for running SQL against protobuf objects. As of now, Flink
supports only Avro and JSON files backed by JsonSchema. To run SQL, Flink needs to know
the TypeInformation of the data; this library provides TypeInformation for protobuf objects.
It uses the protobuf APIs to retrieve the fields and types of a protobuf object, and then provides
each field name and type to Flink as a [PojoField](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/PojoField.java).
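As a rough illustration of the field discovery step, here is a minimal sketch that lists the instance fields of a message-like class together with their Java types. It is not the code from this pull request: it uses plain Java reflection instead of the protobuf APIs so that it runs without dependencies, and `FieldDiscoverySketch`, `Person`, and `describe` are hypothetical names. `Person` is a hand-written stand-in that imitates the trailing-underscore member fields of a generated message class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class FieldDiscoverySketch {
    private FieldDiscoverySketch() {}

    /** Hypothetical stand-in for a generated protobuf message class. */
    static final class Person {
        private static final long serialVersionUID = 0L;
        private String region_ = "";
        private int currentAge_;
        private long loggedAt_;
    }

    /**
     * Maps each instance field name to its declared type.
     * Static and synthetic fields are skipped because they are not message data.
     * A TreeMap is used so that iteration order is deterministic.
     */
    static Map<String, Class<?>> describe(Class<?> messageClass) {
        Map<String, Class<?>> fields = new TreeMap<>();
        for (Field f : messageClass.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            fields.put(f.getName(), f.getType());
        }
        return fields;
    }
}
```

Calling `FieldDiscoverySketch.describe(FieldDiscoverySketch.Person.class)` yields entries for `currentAge_`, `loggedAt_`, and `region_`. Each such name and type pair is the kind of information that would be handed to Flink as a `PojoField`.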
Current limitations:
- In a protobuf object, field names end with an underscore, like `loggedAt_`, so in SQL a field must
be referred to as `loggedAt_` instead of `logged_at`. This should be fixable in the Flink APIs, but
would need some digging around in the code. Whitelisting `Message` classes in `PojoField` should help.
- Some field types, such as `Enum`, are not supported yet, but adding support should be trivial.
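To make the naming limitation concrete, the sketch below converts a Java member name such as `loggedAt_` into the snake_case form `logged_at` that one would prefer to write in SQL. This helper is hypothetical and not part of the pull request; `ProtoFieldNames` and `toSqlName` are names made up for illustration. It is a heuristic that assumes the member name is camelCase plus one trailing underscore, so it cannot always recover the exact name used in the `.proto` file.

```java
/** Hypothetical helper; not part of the pull request. */
final class ProtoFieldNames {
    private ProtoFieldNames() {}

    /** Converts a member name such as "loggedAt_" to "logged_at". */
    static String toSqlName(String javaFieldName) {
        // Drop a single trailing underscore, if present.
        String base = javaFieldName.endsWith("_")
                ? javaFieldName.substring(0, javaFieldName.length() - 1)
                : javaFieldName;
        StringBuilder out = new StringBuilder(base.length() + 4);
        for (int i = 0; i < base.length(); i++) {
            char c = base.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new snake_case word at each upper-case letter.
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With such a mapping, `currentAge_` would become `current_age` and `region_` would become `region`. Where the field would need to be renamed inside Flink is left open here, as it is in the limitation above.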
With this, it is possible to run a query like the following on a stream of, say, `ride_requested`:
```sql
SELECT region_,
       count(*)
FROM people
WHERE currentAge_ > 40
  AND region_ IN ('SFO', 'BKN')
GROUP BY region_
```
Note: I have been a bit hasty in getting this out. It was sitting in our internal repo for a while and I haven't had the time to clean it up to make it Flink-ready, but I wanted to publish the code so that anyone who wants to work on this can start from it rather than from scratch. We have been using this in production for close to a year. Due to other commitments I may not get a chance to address coding style or review comments immediately, so I wouldn't mind if someone wants to improve this before merge. For example, there are pending TODO items such as enum support and a change in `PojoField` to make the SQL nicer (no underscore).
(Apologies for not conforming to the coding style and the rest of the guidelines yet; I hope this is still useful to someone as a beta-version patch.)