You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/08/03 13:55:04 UTC

[GitHub] [beam] egalpin commented on a change in pull request #15257: [BEAM-12601] Add append-only option

egalpin commented on a change in pull request #15257:
URL: https://github.com/apache/beam/pull/15257#discussion_r681775815



##########
File path: sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1512,10 +1528,14 @@ static String createBulkApiEntity(DocToBulk spec, String document, int backendVe
           isDelete = spec.getIsDeleteFn().apply(parsedDocument);
         }
       }
+      final boolean isAppendOnly = Boolean.TRUE.equals(spec.getAppendOnly());

Review comment:
       I also think that we should be consistent in how we make use of the various settings: either we should use the approach of `Boolean.TRUE.equals(...)` or the existing non-nullable approach setting a default via builder.  I personally prefer the method you've used for its flexibility and ease of maintenance over default values set via builder.
   
   Could you please update `getUsePartialUpdate` to be used in the same manner?
   

##########
File path: sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1333,6 +1337,18 @@ public DocToBulk withUsePartialUpdate(boolean usePartialUpdate) {
       return builder().setUsePartialUpdate(usePartialUpdate).build();
     }
 
+    /**
+     * Provide an instruction to control whether the target index should be considered append-only.
+     * For append-only indexes and/or data streams, only {@code create} operations will be issued.
+     * Updates and deletions are not allowed, so related options will be ignored.
+     *
+     * @param appendOnly set to true to allow one document appending

Review comment:
       nit:  typo I think.  `to allow only` rather than `to allow one`

##########
File path: sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1333,6 +1337,18 @@ public DocToBulk withUsePartialUpdate(boolean usePartialUpdate) {
       return builder().setUsePartialUpdate(usePartialUpdate).build();
     }
 
+    /**
+     * Provide an instruction to control whether the target index should be considered append-only.
+     * For append-only indexes and/or data streams, only {@code create} operations will be issued.

Review comment:
       One interesting point about create Vs. index:  unless specifying one's own document IDs, the two methods result in  the same behaviour[1].  I think having a note here in the docstring explaining that point would be a nice-to-have addition. Really, we should rely on users to consult the Elasticsearch (or any Sink's) docs to determine their needs, but illustrating that the 2 options are effectively equivalent depending on other settings might prompt users to double check the ES docs for their use case.
   
   [1] https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk-api-desc

##########
File path: sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1512,10 +1528,14 @@ static String createBulkApiEntity(DocToBulk spec, String document, int backendVe
           isDelete = spec.getIsDeleteFn().apply(parsedDocument);
         }
       }
+      final boolean isAppendOnly = Boolean.TRUE.equals(spec.getAppendOnly());
 
       if (isDelete) {
+        checkState(!isAppendOnly, "No deletions allowed for append-only indices");
         // delete request used for deleting a document
         return String.format("{ \"delete\" : %s }%n", documentMetadata);
+      } else if (isAppendOnly) {

Review comment:
       I wonder about this control structure a bit. The `isDelete` case is special because its truthiness will change depending on the document and a function applied to that document. The other settings all always have constant behaviour regardless of input document. 
   
   Given the number of branches here, I think it would be best to restructure this section to return early whenever possible and reduce the nesting of the logic
   
   So with both those points said, I think we should unnest the settings-branching making each setting its own `if-return` rather than having any `else/else if` such that all settings are flat but listed in order of precedence.
   
   Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org