You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/11 22:29:55 UTC

[GitHub] [beam] y1chi opened a new pull request #13533: Set ptransform id for log entries

y1chi opened a new pull request #13533:
URL: https://github.com/apache/beam/pull/13533


   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam
 .apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.a
 pache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam
 .apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745536267


   > Currently you have recorded the threadId-transformId mapping on
   > 
   > * startBundle
   > * finishBundle
   > * processElementForWindowObservingParDo
   > * processElementForParDo
   > 
   > There are some other places we also invoke user code:
   > 
   > * processElementForWindowObservingPairWithRestriction
   > * processElementForPairWithRestriction
   > * processElementForWindowObservingSplitRestriction
   > * processElementForSplitRestriction
   > * processElementForWindowObservingTruncateRestriction
   > * processElementForTruncateRestriction
   > * processElementForWindowObservingSizedElementAndRestriction
   > * tearDown
   > * invoke timer callback
   
   Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543741519



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       I agree. I don't think the concurrency risk is ever gonna be an issue, we can tell from enabling the test and track the history(I also manually tested another 50 times and tests all passed with matching step). I believe `we provide some information and they can be wrong in very rare cases` would still be more valuable than don't provide the information and probably won't cause too much trouble for users, the element count in streaming pipeline falls into this best effort category as well. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745618127


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz edited a comment on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz edited a comment on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745540192


   > > Currently you have recorded the threadId-transformId mapping on
   > > 
   > > * startBundle
   > > * finishBundle
   > > * processElementForWindowObservingParDo
   > > * processElementForParDo
   > > 
   > > There are some other places we also invoke user code:
   > > 
   > > * processElementForWindowObservingPairWithRestriction
   > > * processElementForPairWithRestriction
   > > * processElementForWindowObservingSplitRestriction
   > > * processElementForSplitRestriction
   > > * processElementForWindowObservingTruncateRestriction
   > > * processElementForTruncateRestriction
   > > * processElementForWindowObservingSizedElementAndRestriction
   > > * tearDown
   > > * invoke timer callback
   > 
   > Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.
   
   `SDF` indeed is a `DoFn`, not a callback. The SDF author could add additional logging just like in a normal DoFn. If the purpose of this PR is to add step info for the log that users add in their code, then I think we should consider all these places where we will invoke user code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on pull request #13533: Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-744894528


   R: @boyuanzz 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-746884452


   Task :sdks:java:harness:checkstyleMain is failing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r544524375



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import java.time.Duration;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {
+  private static final TransformProcessingThreadTracker INSTANCE =
+      new TransformProcessingThreadTracker();
+  private final LoadingCache<Long, String> threadIdToTransformIdMappings;
+
+  private TransformProcessingThreadTracker() {
+    this.threadIdToTransformIdMappings =
+        CacheBuilder.newBuilder()
+            .maximumSize(10000)
+            .expireAfterAccess(Duration.ofHours(1))
+            .build(
+                new CacheLoader<Long, String>() {
+                  @Override
+                  public String load(Long threadId) throws Exception {
+                    return "";

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543647354



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       I think there might be a very slight chance the processing thread moved onto another transform when the LogHandler haven't done transforming the log entries in previous one. But I think the it should be very rare(log transform should be almost instant) and I would argue that it's probably better to keep the logging just best effort instead of introducing locks to guarantee 100% metadata correctness? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745723143


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543693603



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       I agree that test signals can give us more confidence. 
   
   > Current empty step in sdk logs have no values to users and can be considered almost 100% mismatch, so I think this PR should be at least an improvement to that.
   
   I would say, `We provide some information but they can be wrong` is not better than `We don't provide more information`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543669434



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       The argument for `it's probably better to keep the logging just best effort` is that it's ok to have step name mismatched with log message itself. Do you think it's acceptable when it happens?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543647354



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       I think there might be a very slight chance the processing thread moved onto another transform when the LogHandler haven't done transforming the log entries in previous one. But I think the it should be very rare(log transform should be almost instant) and I would argue that it's probably better to keep the logging just best practice instead of introducing locks to guarantee 100% metadata correctness? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543623610



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       Sorry that I'm not familiar with how logging service works, I'm wondering whether this will have multi-threading concurrency issue. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543839048



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import java.time.Duration;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {
+  private static final TransformProcessingThreadTracker INSTANCE =
+      new TransformProcessingThreadTracker();
+  private final LoadingCache<Long, String> threadIdToTransformIdMappings;
+
+  private TransformProcessingThreadTracker() {
+    this.threadIdToTransformIdMappings =
+        CacheBuilder.newBuilder()
+            .maximumSize(10000)
+            .expireAfterAccess(Duration.ofHours(1))
+            .build(
+                new CacheLoader<Long, String>() {
+                  @Override
+                  public String load(Long threadId) throws Exception {
+                    return "";

Review comment:
       Thanks! If you are not going to use the `load` method, you can just use the `Cache` and the `CacheBuilder`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543781511



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {
+  private static final TransformProcessingThreadTracker INSTANCE =
+      new TransformProcessingThreadTracker();
+  private final ConcurrentHashMap<Long, String> threadIdToTransformIdMappings;

Review comment:
       Another question is that will this map grow unlimitedly? I'm kind of concerning that it consumes too much memory with a long run instance(and the thread is not reused).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745540192


   > > Currently you have recorded the threadId-transformId mapping on
   > > 
   > > * startBundle
   > > * finishBundle
   > > * processElementForWindowObservingParDo
   > > * processElementForParDo
   > > 
   > > There are some other places we also invoke user code:
   > > 
   > > * processElementForWindowObservingPairWithRestriction
   > > * processElementForPairWithRestriction
   > > * processElementForWindowObservingSplitRestriction
   > > * processElementForSplitRestriction
   > > * processElementForWindowObservingTruncateRestriction
   > > * processElementForTruncateRestriction
   > > * processElementForWindowObservingSizedElementAndRestriction
   > > * tearDown
   > > * invoke timer callback
   > 
   > Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.
   
   `SDF` indeed is a `DoFn`, not a callback. The SDF author could add additional logging just like in a normal DoFn. If the purpose of this PR is to add step info for the log that users add in their code, then I think we should all these places where we will invoke user code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543685064



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {

Review comment:
       I'm not sure if it is ever gonna happen, we get to see if the integration test is flaky(in my 30+ IT test runs mismatch never happens). If the occurrence is less than 0.01% I don't think it'll have actual impact on usability. Current empty step in sdk logs have no values to users and can be considered almost 100% mismatch, so I think this PR should be at least an improvement to that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] boyuanzz merged pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
boyuanzz merged pull request #13533:
URL: https://github.com/apache/beam/pull/13533


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on pull request #13533:
URL: https://github.com/apache/beam/pull/13533#issuecomment-745566587


   > > > Currently you have recorded the threadId-transformId mapping on
   > > > 
   > > > * startBundle
   > > > * finishBundle
   > > > * processElementForWindowObservingParDo
   > > > * processElementForParDo
   > > > 
   > > > There are some other places we also invoke user code:
   > > > 
   > > > * processElementForWindowObservingPairWithRestriction
   > > > * processElementForPairWithRestriction
   > > > * processElementForWindowObservingSplitRestriction
   > > > * processElementForSplitRestriction
   > > > * processElementForWindowObservingTruncateRestriction
   > > > * processElementForTruncateRestriction
   > > > * processElementForWindowObservingSizedElementAndRestriction
   > > > * tearDown
   > > > * invoke timer callback
   > > 
   > > 
   > > Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.
   > 
   > `SDF` indeed is a `DoFn`, not a callback. The SDF author could add additional logging just like in a normal DoFn. If the purpose of this PR is to add step info for the log that users add in their code, then I think we should consider all these places where we will invoke user code.
   
   got it, I'll add it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #13533: [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #13533:
URL: https://github.com/apache/beam/pull/13533#discussion_r543818457



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {
+  private static final TransformProcessingThreadTracker INSTANCE =
+      new TransformProcessingThreadTracker();
+  private final ConcurrentHashMap<Long, String> threadIdToTransformIdMappings;

Review comment:
       hmm yeah I think you are right, it's potentially an issue and I've changed to use a LoadingCache with expiration.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org