Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/07/17 05:47:52 UTC

[GitHub] [pulsar] srkukarni opened a new pull request #7573: Allow null consume in BatchPushSource

srkukarni opened a new pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573


   
   *(If this PR fixes a github issue, please add `Fixes #<xyz>`.)*
   
   Fixes #<xyz>
   
   *(or if this PR is one task of a github issue, please add `Master Issue: #<xyz>` to link to the master issue.)*
   
   Master Issue: #<xyz>
   
   ### Motivation
   BatchSource allows a source to return a null record to indicate that the batch is done.
   BatchPushSource hands records over through a LinkedBlockingQueue, which does not accept null elements, so users cannot simply pass a null value. We therefore need a special mechanism to indicate the end of a batch.
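
The constraint above comes from `java.util.concurrent.LinkedBlockingQueue`, whose `put` throws a `NullPointerException` for a null element. A minimal sketch of the sentinel workaround might look like the following. `SentinelQueueSketch` and `END_OF_BATCH` are hypothetical names chosen for illustration; this is not the code in this pull request.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of the sentinel idea; not the BatchPushSource code itself.
final class SentinelQueueSketch<T> {
    // Private marker object that stands in for "null" while inside the queue.
    private static final Object END_OF_BATCH = new Object();

    private final LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    // Producer side: a null argument means "the batch is done".
    void consume(T item) {
        try {
            queue.put(item != null ? item : END_OF_BATCH);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    // Consumer side: translate the marker back into null for the caller.
    @SuppressWarnings("unchecked")
    T readNext() {
        try {
            Object next = queue.take();
            return next == END_OF_BATCH ? null : (T) next;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dequeueing", e);
        }
    }

    // Demonstrates that the raw queue refuses a null element.
    static boolean rawQueueRejectsNull() {
        try {
            new LinkedBlockingQueue<String>().put(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The marker never leaves the class, so callers only ever see their own items or null.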
   ### Modifications
   
   *Describe the modifications you've done.*
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] david-streamlio commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
david-streamlio commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456636580



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,26 +30,43 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
        The Null object is currently only inserted if the record passed in is null, so there aren't any fields. 







[GitHub] [pulsar] srkukarni commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
srkukarni commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456584747



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -41,17 +46,26 @@ public BatchPushSource() {
 
     @Override
     public Record<T> readNext() throws Exception {
-        return queue.take();
+        Record<T> record = queue.take();
+        if (record instanceof NullRecord) {
+            return null;
+        } else {
+            return record;
+        }
     }
 
     /**
      * Send this message to be written to Pulsar.
-     *
+     * Pass null if you are done with this task.
      * @param record next message from source which should be sent to a Pulsar topic
      */
     public void consume(Record<T> record) {
         try {
-            queue.put(record);
+            if (record != null) {
+                queue.put(record);
+            } else {
+                queue.put(new NullRecord());

Review comment:
       done







[GitHub] [pulsar] srkukarni merged pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
srkukarni merged pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573


   





[GitHub] [pulsar] jerrypeng commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456563309



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,6 +30,13 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       nvm







[GitHub] [pulsar] david-streamlio commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
david-streamlio commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456616415



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,26 +30,43 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       Wouldn't it be easier to allow users to place a null value inside their existing Record class and then test for that condition rather than creating a NullRecord class?  e.g.
   
   `private static final boolean isNull(Record rec) {
        return (rec == null) || (rec.getValue() == null);
    }`
   
   Then you could just call this method instead of using the `instanceof NullRecord` check







[GitHub] [pulsar] jerrypeng commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456564324



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -41,17 +46,26 @@ public BatchPushSource() {
 
     @Override
     public Record<T> readNext() throws Exception {
-        return queue.take();
+        Record<T> record = queue.take();
+        if (record instanceof NullRecord) {
+            return null;
+        } else {
+            return record;
+        }
     }
 
     /**
      * Send this message to be written to Pulsar.
-     *
+     * Pass null if you are done with this task.
      * @param record next message from source which should be sent to a Pulsar topic
      */
     public void consume(Record<T> record) {
         try {
-            queue.put(record);
+            if (record != null) {
+                queue.put(record);
+            } else {
+                queue.put(new NullRecord());

Review comment:
       A small optimization: just declare a final variable and reuse it.
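
One way the reuse suggestion could look is sketched below, under stated assumptions: `SimpleRecord` is a simplified stand-in for Pulsar's `Record` interface, and `ReusedSentinelSource` is a hypothetical class, not the merged `BatchPushSource`. The marker is allocated once per source and recognized by identity, so no new object is created for each `consume(null)`.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Simplified stand-in for Pulsar's Record interface (assumption for illustration).
interface SimpleRecord<T> {
    T getValue();
}

// Hypothetical sketch of a push source that reuses a single end-of-batch marker.
final class ReusedSentinelSource<T> {
    // Private marker type; it carries no data.
    private static final class NullRecord<V> implements SimpleRecord<V> {
        @Override
        public V getValue() {
            return null;
        }
    }

    // Allocated once and reused for every consume(null) call.
    private final SimpleRecord<T> nullRecord = new NullRecord<>();
    private final LinkedBlockingQueue<SimpleRecord<T>> queue = new LinkedBlockingQueue<>();

    public void consume(SimpleRecord<T> record) {
        try {
            queue.put(record != null ? record : nullRecord);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    public SimpleRecord<T> readNext() {
        try {
            SimpleRecord<T> next = queue.take();
            // Identity comparison replaces the instanceof check.
            return next == nullRecord ? null : next;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dequeueing", e);
        }
    }
}
```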







[GitHub] [pulsar] srkukarni commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
srkukarni commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456676390



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,26 +30,43 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       That is the approach that's taken here. Users will be using their own record types and, when they are done with the task, will call consume(null) to signify the end of the task. The NullRecord is a strictly private class, an internal implementation detail of BatchPushSource.







[GitHub] [pulsar] jerrypeng commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456563081



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,6 +30,13 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       Can we add some comments about how to use this?













[GitHub] [pulsar] david-streamlio commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
david-streamlio commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456637775



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,26 +30,43 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       Letting users continue to use their own Record types and using a null value inside the record seems like a more intuitive approach to me.







[GitHub] [pulsar] jerrypeng commented on a change in pull request #7573: Allow null consume in BatchPushSource

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #7573:
URL: https://github.com/apache/pulsar/pull/7573#discussion_r456630329



##########
File path: pulsar-io/core/src/main/java/org/apache/pulsar/io/core/BatchPushSource.java
##########
@@ -32,26 +30,43 @@
  */
 public abstract class BatchPushSource<T> implements BatchSource<T> {
 
+    private static class NullRecord implements Record {

Review comment:
       The thing is, is a record with a null value the same as returning null? Can a record have a null value but have other fields, e.g. a key, with valid values?
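
The ambiguity raised here can be made concrete with a small sketch. `KeyedRecord` is a simplified stand-in for a record type with a key and a value, and `NullValueAmbiguity` is a hypothetical helper; whether such key-only records are meaningful for a given source is exactly the open question, so this only shows that a value-based check cannot tell the two cases apart.

```java
import java.util.Optional;

// Simplified stand-in for a record with a key and a value (assumption for illustration).
interface KeyedRecord<T> {
    Optional<String> getKey();
    T getValue();
}

final class NullValueAmbiguity {
    // The value-based check proposed in this thread, adapted to the stand-in type.
    static boolean isNull(KeyedRecord<?> rec) {
        return rec == null || rec.getValue() == null;
    }

    // Hypothetical factory for building test records.
    static <T> KeyedRecord<T> record(String key, T value) {
        return new KeyedRecord<T>() {
            @Override
            public Optional<String> getKey() {
                return Optional.ofNullable(key);
            }

            @Override
            public T getValue() {
                return value;
            }
        };
    }

    public static void main(String[] args) {
        // A record that has a valid key but no value.
        KeyedRecord<String> keyedNoValue = record("user-42", null);
        // Both calls return true, so the check treats the keyed record
        // and a plain null reference identically.
        System.out.println(isNull(keyedNoValue));
        System.out.println(isNull(null));
    }
}
```

A private marker type avoids this overlap because it can never collide with a record supplied by the user.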



