Posted to commits@inlong.apache.org by "Yizhou-Yang (via GitHub)" <gi...@apache.org> on 2023/01/30 11:16:54 UTC

[GitHub] [inlong] Yizhou-Yang opened a new pull request, #7290: time stash

Yizhou-Yang opened a new pull request, #7290:
URL: https://github.com/apache/inlong/pull/7290

   ### Prepare a Pull Request
   *(Change the title refer to the following example)*
   
   - Title Example: [INLONG-XYZ][Component] Title of the pull request
   
   *(The following *XYZ* should be replaced by the actual [GitHub Issue](https://github.com/apache/inlong/issues) number)*
   
   - Fixes #XYZ
   
   ### Motivation
   
   *Explain here the context, and why you're making that change. What is the problem you're trying to solve?*
   
   ### Modifications
   
   *Describe the modifications you've done.*
   
   ### Verifying this change
   
   *(Please pick either of the following options)*
   
   - [ ] This change is a trivial rework/code cleanup without any test coverage.
   
   - [ ] This change is already covered by existing tests, such as:
     *(please describe tests)*
   
   - [ ] This change added tests and can be verified as follows:
   
     *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a follow-up issue for adding the documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [inlong] Yizhou-Yang commented on pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "Yizhou-Yang (via GitHub)" <gi...@apache.org>.
Yizhou-Yang commented on PR #7290:
URL: https://github.com/apache/inlong/pull/7290#issuecomment-1409883987

   The batch interval only guarantees that a flush happens every 60000 ms.
   It does not stop the connector from calling `invoke()` and flushing before 60000 ms have passed.
   This change stops `S3DirtySink.invoke()` from flushing before 60000 ms, so the two are different.
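
A minimal sketch of that distinction, assuming the size and byte thresholds from `valid()` are combined with a minimum stash time. The class `StashGate` and all of its members are hypothetical names for illustration and are not taken from `S3DirtySink`; the time is passed in as a parameter only to keep the sketch easy to test.

```java
// Hypothetical sketch: none of these names come from S3DirtySink.
final class StashGate {

    private final long minStashMillis; // assumed minimum time between two flushes
    private final int batchSize;       // record-count threshold; 0 disables it
    private final long maxBatchBytes;  // byte threshold
    private long lastFlushMillis;      // time of the previous flush
    private int size;                  // records stashed since the previous flush
    private long batchBytes;           // bytes stashed since the previous flush

    StashGate(long minStashMillis, int batchSize, long maxBatchBytes, long nowMillis) {
        this.minStashMillis = minStashMillis;
        this.batchSize = batchSize;
        this.maxBatchBytes = maxBatchBytes;
        this.lastFlushMillis = nowMillis;
    }

    // Stash one record of the given size without flushing.
    void add(long bytes) {
        size++;
        batchBytes += bytes;
    }

    // A flush is allowed only when a threshold is reached AND the data
    // has been stashed for at least minStashMillis.
    boolean shouldFlush(long nowMillis) {
        boolean stashedLongEnough = nowMillis - lastFlushMillis >= minStashMillis;
        boolean thresholdReached = (batchSize > 0 && size >= batchSize)
                || batchBytes >= maxBatchBytes;
        return stashedLongEnough && thresholdReached;
    }

    // Reset the counters and remember when the flush happened.
    void markFlushed(long nowMillis) {
        lastFlushMillis = nowMillis;
        size = 0;
        batchBytes = 0;
    }
}
```

With a gate like this, a burst of `invoke()` calls that reaches the batch size after a few milliseconds would keep stashing instead of flushing, while a separate periodic flush can still run on its own schedule.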




[GitHub] [inlong] EMsnap commented on a diff in pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on code in PR #7290:
URL: https://github.com/apache/inlong/pull/7290#discussion_r1091361721


##########
inlong-sort/sort-connectors/base/src/main/java/org/apache/inlong/sort/base/dirty/sink/s3/S3DirtySink.java:
##########
@@ -130,8 +131,16 @@ public synchronized void invoke(DirtyData<T> dirtyData) throws Exception {
     }
 
     private boolean valid() {
-        return (s3Options.getBatchSize() > 0 && size >= s3Options.getBatchSize())
-                || batchBytes >= s3Options.getMaxBatchBytes();
+        // stash dirty data for at least a minute to avoid flushing too fast

Review Comment:
   The time calculation can be a separate method.





[GitHub] [inlong] healchow commented on pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "healchow (via GitHub)" <gi...@apache.org>.
healchow commented on PR #7290:
URL: https://github.com/apache/inlong/pull/7290#issuecomment-1409705993

   1. The issue number `INLONG-7292` in the PR title was incorrect.
   2. Please add more information to the PR description.




[GitHub] [inlong] gong commented on pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "gong (via GitHub)" <gi...@apache.org>.
gong commented on PR #7290:
URL: https://github.com/apache/inlong/pull/7290#issuecomment-1411409677

   The PR description should follow the format: Motivation, Modifications.




[GitHub] [inlong] EMsnap commented on a diff in pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on code in PR #7290:
URL: https://github.com/apache/inlong/pull/7290#discussion_r1091363311


##########
inlong-sort/sort-connectors/base/src/main/java/org/apache/inlong/sort/base/dirty/sink/s3/S3DirtySink.java:
##########
@@ -130,8 +131,16 @@ public synchronized void invoke(DirtyData<T> dirtyData) throws Exception {
     }
 
     private boolean valid() {
-        return (s3Options.getBatchSize() > 0 && size >= s3Options.getBatchSize())
-                || batchBytes >= s3Options.getMaxBatchBytes();
+        // stash dirty data for at least a minute to avoid flushing too fast

Review Comment:
   Also, "current time" should be the value returned by `System.currentTimeMillis()`; the variable you defined here represents a time before the current one.
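
One way to read the two review comments on this hunk, as a rough sketch: the elapsed-time check lives in its own method, and the variable named for the current time holds the clock reading itself, while the earlier timestamp gets a name that says what it is. `FlushTimer`, `intervalElapsed`, `recordFlush`, and the injected `LongSupplier` clock are hypothetical and not part of the PR; in production code the clock would presumably be `System::currentTimeMillis`.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: not part of S3DirtySink.
final class FlushTimer {

    private final LongSupplier clock;     // e.g. System::currentTimeMillis
    private final long minIntervalMillis; // assumed minimum gap between flushes
    private long lastFlushTime;           // the earlier timestamp, named as such

    FlushTimer(LongSupplier clock, long minIntervalMillis) {
        this.clock = clock;
        this.minIntervalMillis = minIntervalMillis;
        this.lastFlushTime = clock.getAsLong();
    }

    // The time calculation, extracted into its own method.
    boolean intervalElapsed() {
        long currentTime = clock.getAsLong(); // "current" means the clock reading now
        return currentTime - lastFlushTime >= minIntervalMillis;
    }

    // Call after a successful flush to restart the interval.
    void recordFlush() {
        lastFlushTime = clock.getAsLong();
    }
}
```

Injecting the clock is a design choice made here only so the check can be exercised without waiting a real minute.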





[GitHub] [inlong] dockerzhang merged pull request #7290: [INLONG-7292][Sort] S3DirtySink flushes too quickly

Posted by "dockerzhang (via GitHub)" <gi...@apache.org>.
dockerzhang merged PR #7290:
URL: https://github.com/apache/inlong/pull/7290

