You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/09 18:45:00 UTC
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552379#comment-17552379 ]
ASF GitHub Bot commented on PARQUET-1020:
-----------------------------------------
dossett commented on code in PR #963:
URL: https://github.com/apache/parquet-mr/pull/963#discussion_r893846424
##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java:
##########
@@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer recordConsumer) {
public WriteContext init(Configuration configuration) {
// if no protobuf descriptor was given in constructor, load descriptor from configuration (set with setProtobufClass)
- if (protoMessage == null) {
- Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
- if (pbClass != null) {
- protoMessage = pbClass;
- } else {
- String msg = "Protocol buffer class not specified.";
- String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
- throw new BadConfigurationException(msg + hint);
+ if (descriptor == null) {
+ if (protoMessage == null) {
+ Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
+ if (pbClass != null) {
+ protoMessage = pbClass;
+ } else {
+ String msg = "Protocol buffer class or descriptor not specified.";
+ String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
+ throw new BadConfigurationException(msg + hint);
+ }
}
+ descriptor = Protobufs.getMessageDescriptor(protoMessage);
+ } else {
+ //Assume no specific Message extending class, so use DynamicMessage
+ protoMessage = DynamicMessage.class;
Review Comment:
Should this just be left null? Having it set implies it usable, but it's only used to determine the descriptor and get a protoschemaconverter and then only in cases when we don't have a descriptor.
I just worry that setting it could lead to assumptions later on that it's as usable as if it were a generated message and not a dynamic message.
##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java:
##########
@@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer recordConsumer) {
public WriteContext init(Configuration configuration) {
// if no protobuf descriptor was given in constructor, load descriptor from configuration (set with setProtobufClass)
- if (protoMessage == null) {
- Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
- if (pbClass != null) {
- protoMessage = pbClass;
- } else {
- String msg = "Protocol buffer class not specified.";
- String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
- throw new BadConfigurationException(msg + hint);
+ if (descriptor == null) {
+ if (protoMessage == null) {
+ Class<? extends Message> pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
+ if (pbClass != null) {
+ protoMessage = pbClass;
+ } else {
+ String msg = "Protocol buffer class or descriptor not specified.";
+ String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
+ throw new BadConfigurationException(msg + hint);
+ }
}
+ descriptor = Protobufs.getMessageDescriptor(protoMessage);
+ } else {
+ //Assume no specific Message extending class, so use DynamicMessage
+ protoMessage = DynamicMessage.class;
Review Comment:
Should this just be left null? Having it set implies it usable, but it's only used to determine the descriptor and get a protoschemaconverter and then only in cases when we don't have a descriptor.
I just worry that setting it could lead to assumptions later on that it's as usable as if it were a generated message and not a dynamic message.
> Add support for Dynamic Messages in parquet-protobuf
> ----------------------------------------------------
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
> Issue Type: New Feature
> Reporter: Alex Buck
> Assignee: Alex Buck
> Priority: Major
>
> Hello. We would like to pass in a DynamicMessage rather than using the generated protobuf classes to allow us to make our job very generic.
> I think this could be achieved by setting the descriptor upfront, similarly to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet schema created by ProtoSchemaConverter using the passed in descriptor, rather than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414
--
This message was sent by Atlassian Jira
(v8.20.7#820007)