You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/18 01:13:59 UTC

[GitHub] [beam] ihji opened a new pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

ihji opened a new pull request #12609:
URL: https://github.com/apache/beam/pull/12609


   Work in progress.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   ![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-678550143


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r473769047



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675792507






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r473766148



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;

Review comment:
       Done.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.
+   * @param rangeTo The maximum value that this histogram can record. Cannot be smaller than or
+   *     equal to rangeFrom.
+   * @param numOfBuckets The number of buckets. Larger number of buckets implies a better resolution
+   *     for percentile estimation.
+   * @param ignoreOutOfRangeRecord Whether the out-of-range records are discarded. It will throw
+   *     RuntimeException for the out-of-range records if this is set to false.
+   * @return a new Histogram instance.
+   */
+  public static Histogram of(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    return new Histogram(rangeFrom, rangeTo, numOfBuckets, ignoreOutOfRangeRecord);
+  }
+
+  public static Histogram of(double rangeFrom, double rangeTo) {
+    return new Histogram(rangeFrom, rangeTo, DEFAULT_NUM_OF_BUCKETS, false);
+  }
+
+  public void record(double... values) {
+    for (double value : values) {
+      record(value);
+    }
+  }
+
+  public synchronized void clear() {
+    this.buckets = new long[numOfBuckets];
+    this.totalNumOfRecords = 0;
+  }
+
+  public synchronized void record(double value) {
+    if (value >= rangeTo || value < rangeFrom) {

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r474311248



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -125,6 +148,20 @@ public void finishBundle(FinishBundleContext context) throws Exception {
     for (ValueInSingleWindow<ErrorT> row : failedInserts) {
       context.output(failedOutputTag, row.getValue(), row.getTimestamp(), row.getWindow());
     }
+
+    if (histogram.getTotalCount() > options.getLatencyLoggingFrequency()) {

Review comment:
       Changed to use seconds instead of the number of requests. I think setting frequency in time unit is easier and more intuitive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675272693


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj merged pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
chamikaramj merged pull request #12609:
URL: https://github.com/apache/beam/pull/12609


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r473768924



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r474958315



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -131,7 +133,12 @@ public JobService getJobService(BigQueryOptions options) {
 
   @Override
   public DatasetService getDatasetService(BigQueryOptions options) {
-    return new DatasetServiceImpl(options);
+    return new DatasetServiceImpl(options, null);
+  }
+
+  @Override
+  public DatasetService getDatasetService(BigQueryOptions options, Histogram histogram) {

Review comment:
       done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675240302


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675792466






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r476978750



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {

Review comment:
       Please add backwards compatibility warnings to this so that we can remove this without breaking users.
   Also, could you clarify why we could not use an existing implementation ?

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -1014,34 +1021,33 @@ private static boolean nextBackOff(Sleeper sleeper, BackOff backoff) throws Inte
   }
 
   /** Returns a BigQuery client builder using the specified {@link BigQueryOptions}. */
-  private static Bigquery.Builder newBigQueryClient(BigQueryOptions options) {
+  private static Bigquery.Builder newBigQueryClient(
+      BigQueryOptions options, @Nullable Histogram requestLatencies) {
     RetryHttpRequestInitializer httpRequestInitializer =
         new RetryHttpRequestInitializer(ImmutableList.of(404));
     httpRequestInitializer.setCustomErrors(createBigQueryClientCustomErrors());
     httpRequestInitializer.setWriteTimeout(options.getHTTPWriteTimeout());
+    ImmutableList.Builder<HttpRequestInitializer> initBuilder = ImmutableList.builder();
+    Credentials credential = options.getGcpCredential();
+    initBuilder.add(
+        credential == null
+            ? new NullCredentialInitializer()
+            : new HttpCredentialsAdapter(credential));
+    // Do not log 404. It clutters the output and is possibly even required by the
+    // caller.
+    initBuilder.add(httpRequestInitializer);
+    if (requestLatencies != null) {
+      initBuilder.add(new LatencyRecordingHttpRequestInitializer(requestLatencies));
+    }
+    HttpRequestInitializer chainInitializer =

Review comment:
       Did you confirm that there's no performance impact for the default case due to new HTTP request initializer ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r474961668



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import com.google.auto.value.AutoValue;
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A histogram that supports estimated percentile with linear interpolation.
+ *
+ * <p>We may consider using Apache Commons or HdrHistogram library in the future for advanced
+ * features such as sparsely populated histograms.
+ */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private final BucketType bucketType;
+
+  private long[] buckets;
+  private long numOfRecords;
+  private long numTopRecords;
+  private long numBottomRecords;
+
+  private Histogram(BucketType bucketType) {
+    this.bucketType = bucketType;
+    this.buckets = new long[bucketType.getNumBuckets()];
+    this.numOfRecords = 0;
+    this.numTopRecords = 0;
+    this.numBottomRecords = 0;
+  }
+
+  /**
+   * Create a histogram with linear buckets.
+   *
+   * @param start Lower bound of a starting bucket.
+   * @param width Bucket width. Smaller width implies a better resolution for percentile estimation.
+   * @param numBuckets The number of buckets. Upper bound of an ending bucket is defined by start +
+   *     width * numBuckets.
+   * @return a new Histogram instance.
+   */
+  public static Histogram linear(double start, double width, int numBuckets) {
+    return new Histogram(LinearBuckets.of(start, width, numBuckets));
+  }
+
+  public void record(double... values) {
+    for (double value : values) {
+      record(value);
+    }
+  }
+
+  public synchronized void clear() {
+    this.buckets = new long[bucketType.getNumBuckets()];
+    this.numOfRecords = 0;
+    this.numTopRecords = 0;
+    this.numBottomRecords = 0;
+  }
+
+  public synchronized void record(double value) {
+    double rangeTo = bucketType.getRangeTo();
+    double rangeFrom = bucketType.getRangeFrom();
+    if (value >= rangeTo) {
+      LOG.warn("record is out of upper bound {}: {}", rangeTo, value);
+      numTopRecords++;
+    } else if (value < rangeFrom) {
+      LOG.warn("record is out of lower bound {}: {}", rangeFrom, value);
+      numBottomRecords++;
+    } else {
+      buckets[bucketType.getBucketIndex(value)]++;
+      numOfRecords++;
+    }
+  }
+
+  public synchronized long getTotalCount() {
+    return numOfRecords + numTopRecords + numBottomRecords;
+  }
+
+  public synchronized long getCount(int bucketIndex) {
+    return buckets[bucketIndex];

Review comment:
       It's just an accessor for inspecting the value of the internal array.
   
   And yes, I see your concern. The atomicity is not guaranteed if someone tries to access multiple values such as adding all buckets for calculating the total number of elements. I attached the comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675272693


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r477711001



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {

Review comment:
       I couldn't find any 3rd party library that satisfies our interface requirements such as exponential or explicit bucket definitions. Also, the current usecase is simple enough that we could avoid introducing additional dependencies.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-678550143


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-677944273


   @ajamato comments addressed. PTAL, Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675788489


   R: @chamikaramj 
   CC: @ajamato 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675785543


   Run Java_Examples_Dataflow PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r473765665



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -80,6 +89,20 @@
     this.toTableRow = toTableRow;
   }
 
+  @Setup
+  public void setup() {
+    // record latency upto 30 seconds in the resolution of 20ms
+    histogram = Histogram.of(0, 30000, 1500, true);

Review comment:
       Done.

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LoggingHttpRequestInitializer.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.gcp.util;
+
+import com.google.api.client.http.HttpExecuteInterceptor;
+import com.google.api.client.http.HttpRequest;
+import com.google.api.client.http.HttpRequestInitializer;
+import com.google.api.client.http.HttpResponse;
+import com.google.api.client.http.HttpResponseInterceptor;
+import java.io.IOException;
+import org.apache.beam.sdk.util.Histogram;
+
+public class LoggingHttpRequestInitializer implements HttpRequestInitializer {

Review comment:
       Done.

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LoggingHttpRequestInitializer.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.gcp.util;
+
+import com.google.api.client.http.HttpExecuteInterceptor;
+import com.google.api.client.http.HttpRequest;
+import com.google.api.client.http.HttpRequestInitializer;
+import com.google.api.client.http.HttpResponse;
+import com.google.api.client.http.HttpResponseInterceptor;
+import java.io.IOException;
+import org.apache.beam.sdk.util.Histogram;
+

Review comment:
       Done.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r473768767



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.
+   * @param rangeTo The maximum value that this histogram can record. Cannot be smaller than or
+   *     equal to rangeFrom.
+   * @param numOfBuckets The number of buckets. Larger number of buckets implies a better resolution
+   *     for percentile estimation.
+   * @param ignoreOutOfRangeRecord Whether the out-of-range records are discarded. It will throw
+   *     RuntimeException for the out-of-range records if this is set to false.
+   * @return a new Histogram instance.
+   */
+  public static Histogram of(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {

Review comment:
       Done. Refactored `Histogram` so that we could easily add more bucket types.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ajamato commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ajamato commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r472624659



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;

Review comment:
       nit: rename to numBuckets

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.

Review comment:
       Why restrict this to not allow negatives? Any range of numbers for bucket boundaries is valid

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {

Review comment:
       I think that we can remove the ignoreOutOfRangeRecord parameter. 
   
   Let the first and last bucket include up to +ve and -ve infinity
   i.e. for n bucket boundaries, the buckets are:
   
   (-INF, b0), [b0, b1), ... [bn, +INF)

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.
+   * @param rangeTo The maximum value that this histogram can record. Cannot be smaller than or
+   *     equal to rangeFrom.
+   * @param numOfBuckets The number of buckets. Larger number of buckets implies a better resolution
+   *     for percentile estimation.
+   * @param ignoreOutOfRangeRecord Whether the out-of-range records are discarded. It will throw
+   *     RuntimeException for the out-of-range records if this is set to false.
+   * @return a new Histogram instance.
+   */
+  public static Histogram of(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {

Review comment:
       Would you mind please re-doing these parameters so that they represent a linear function. i.e.
   bucket boundaries are define by: 
   
   bi = width * x + start
   for n bucket boundaries, the buckets are:
   (-INF, b0), [b0, b1), ... [bn, +INF)
   
   public static Histogram of(
         double startBounds, double bucketWidth, int numOfBuckets, boolean ignoreOutOfRangeRecord)
   
   This was one of the histogram bucket definition styles we considered here:
   https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit#heading=h.279ygg7gw109
   
   I'll enhance this later to add constructors for using exponential and explicit buckets.

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LoggingHttpRequestInitializer.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.gcp.util;
+
+import com.google.api.client.http.HttpExecuteInterceptor;
+import com.google.api.client.http.HttpRequest;
+import com.google.api.client.http.HttpRequestInitializer;
+import com.google.api.client.http.HttpResponse;
+import com.google.api.client.http.HttpResponseInterceptor;
+import java.io.IOException;
+import org.apache.beam.sdk.util.Histogram;
+
+public class LoggingHttpRequestInitializer implements HttpRequestInitializer {

Review comment:
       Add a comment describing the purpose of this callls, to record the latency of Http based API calls. I.e. for BigQuery IO

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -125,6 +148,20 @@ public void finishBundle(FinishBundleContext context) throws Exception {
     for (ValueInSingleWindow<ErrorT> row : failedInserts) {
       context.output(failedOutputTag, row.getValue(), row.getTimestamp(), row.getWindow());
     }
+
+    if (histogram.getTotalCount() > options.getLatencyLoggingFrequency()) {

Review comment:
       Any reason to suspect that this will not log, if we fail a request snd don't attempt further requests? A logger on a periodic interval might make more sense?

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LoggingHttpRequestInitializer.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.gcp.util;
+
+import com.google.api.client.http.HttpExecuteInterceptor;
+import com.google.api.client.http.HttpRequest;
+import com.google.api.client.http.HttpRequestInitializer;
+import com.google.api.client.http.HttpResponse;
+import com.google.api.client.http.HttpResponseInterceptor;
+import java.io.IOException;
+import org.apache.beam.sdk.util.Histogram;
+

Review comment:
       Please rename to LatencyRecordingHttpRequestInitializer (So that we can possibly update it to use metrics instead of logging)

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {

Review comment:
       Personally, I think this Histogram class is sufficient for now, and we should proceed with this for now.
   There may be some arguments to use a histogram library in the future. I.e. to support sparsely populated histograms.
   Please add a TODO to consider implementing with other potential options, such as.
   
   Apache Commons
   https://www.baeldung.com/apache-commons-frequency
   
   GitHub Hdr Histogram
   https://github.com/HdrHistogram/HdrHistogram

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/FakeBigQueryServices.java
##########
@@ -68,6 +69,11 @@ public DatasetService getDatasetService(BigQueryOptions bqOptions) {
     return datasetService;
   }
 
+  @Override
+  public DatasetService getDatasetService(BigQueryOptions bqOptions, Histogram histogram) {

Review comment:
       I don't think we should be passing in the histogram to the factory for building the service.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A histogram that supports estimated percentile with linear interpolation. */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private static final int DEFAULT_NUM_OF_BUCKETS = 50;
+
+  private final double rangeFrom;
+  private final double rangeTo;
+  private final int numOfBuckets;
+
+  private long[] buckets;
+  private final double bucketSize;
+  private long totalNumOfRecords;
+
+  private final boolean ignoreOutOfRangeRecord;
+
+  private Histogram(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    if (rangeFrom < 0) {
+      throw new RuntimeException(String.format("only positive range allowed: %f", rangeFrom));
+    }
+    if (rangeFrom >= rangeTo) {
+      throw new RuntimeException(
+          String.format("rangeTo should be larger than rangeFrom: [%f, %f)", rangeFrom, rangeTo));
+    }
+    if (numOfBuckets <= 0) {
+      throw new RuntimeException(
+          String.format("numOfBuckets should be greater than zero: %d", numOfBuckets));
+    }
+    this.rangeFrom = rangeFrom;
+    this.rangeTo = rangeTo;
+    this.numOfBuckets = numOfBuckets;
+    this.ignoreOutOfRangeRecord = ignoreOutOfRangeRecord;
+    this.buckets = new long[numOfBuckets];
+    this.bucketSize = (rangeTo - rangeFrom) / numOfBuckets;
+    this.totalNumOfRecords = 0;
+  }
+
+  /**
+   * Create a histogram.
+   *
+   * @param rangeFrom The minimum value that this histogram can record. Cannot be negative.
+   * @param rangeTo The maximum value that this histogram can record. Cannot be smaller than or
+   *     equal to rangeFrom.
+   * @param numOfBuckets The number of buckets. Larger number of buckets implies a better resolution
+   *     for percentile estimation.
+   * @param ignoreOutOfRangeRecord Whether the out-of-range records are discarded. It will throw
+   *     RuntimeException for the out-of-range records if this is set to false.
+   * @return a new Histogram instance.
+   */
+  public static Histogram of(
+      double rangeFrom, double rangeTo, int numOfBuckets, boolean ignoreOutOfRangeRecord) {
+    return new Histogram(rangeFrom, rangeTo, numOfBuckets, ignoreOutOfRangeRecord);
+  }
+
+  public static Histogram of(double rangeFrom, double rangeTo) {
+    return new Histogram(rangeFrom, rangeTo, DEFAULT_NUM_OF_BUCKETS, false);
+  }
+
+  public void record(double... values) {
+    for (double value : values) {
+      record(value);
+    }
+  }
+
+  public synchronized void clear() {
+    this.buckets = new long[numOfBuckets];
+    this.totalNumOfRecords = 0;
+  }
+
+  public synchronized void record(double value) {
+    if (value >= rangeTo || value < rangeFrom) {

Review comment:
       Please place values in the first or last bucket, inlcuding values up to  -INF and +INF

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -80,6 +89,20 @@
     this.toTableRow = toTableRow;
   }
 
+  @Setup
+  public void setup() {
+    // record latency upto 30 seconds in the resolution of 20ms
+    histogram = Histogram.of(0, 30000, 1500, true);

Review comment:
       30s seems a bit small. What is the timeout set to? I recommend making the maximum at least another 30s or so larger than the timeout.

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -131,7 +133,12 @@ public JobService getJobService(BigQueryOptions options) {
 
   @Override
   public DatasetService getDatasetService(BigQueryOptions options) {
-    return new DatasetServiceImpl(options);
+    return new DatasetServiceImpl(options, null);
+  }
+
+  @Override
+  public DatasetService getDatasetService(BigQueryOptions options, Histogram histogram) {

Review comment:
       I don't think we should be passing in the histogram to the factory for building the service. Couldn't we just instantiate it inside the call to newBigQueryClient()




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-681313747


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-681313747


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675785543


   Run Java_Examples_Dataflow PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r474312991



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -131,7 +133,12 @@ public JobService getJobService(BigQueryOptions options) {
 
   @Override
   public DatasetService getDatasetService(BigQueryOptions options) {
-    return new DatasetServiceImpl(options);
+    return new DatasetServiceImpl(options, null);
+  }
+
+  @Override
+  public DatasetService getDatasetService(BigQueryOptions options, Histogram histogram) {

Review comment:
       We create a new service instance per bundle, so it would be hard to aggregate the result across multiple bundles if we don't pass a histogram instance from outside of the service constructor.
   
   We could also instantiate a histogram object inside the service constructor and merge them later but it's more time consuming (need an iteration over the bucket array).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ajamato commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ajamato commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r474338081



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -131,7 +133,12 @@ public JobService getJobService(BigQueryOptions options) {
 
   @Override
   public DatasetService getDatasetService(BigQueryOptions options) {
-    return new DatasetServiceImpl(options);
+    return new DatasetServiceImpl(options, null);
+  }
+
+  @Override
+  public DatasetService getDatasetService(BigQueryOptions options, Histogram histogram) {

Review comment:
       I see, could you rename the variable from "histogram" to something describing what the histogram is for, like "requestLatencies", "requestLatenciesHistogram"

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/Histogram.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import com.google.auto.value.AutoValue;
+import java.math.RoundingMode;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.DoubleMath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A histogram that supports estimated percentile with linear interpolation.
+ *
+ * <p>We may consider using Apache Commons or HdrHistogram library in the future for advanced
+ * features such as sparsely populated histograms.
+ */
+public class Histogram {
+  private static final Logger LOG = LoggerFactory.getLogger(Histogram.class);
+
+  private final BucketType bucketType;
+
+  private long[] buckets;
+  private long numOfRecords;
+  private long numTopRecords;
+  private long numBottomRecords;
+
+  private Histogram(BucketType bucketType) {
+    this.bucketType = bucketType;
+    this.buckets = new long[bucketType.getNumBuckets()];
+    this.numOfRecords = 0;
+    this.numTopRecords = 0;
+    this.numBottomRecords = 0;
+  }
+
+  /**
+   * Create a histogram with linear buckets.
+   *
+   * @param start Lower bound of a starting bucket.
+   * @param width Bucket width. Smaller width implies a better resolution for percentile estimation.
+   * @param numBuckets The number of buckets. Upper bound of an ending bucket is defined by start +
+   *     width * numBuckets.
+   * @return a new Histogram instance.
+   */
+  public static Histogram linear(double start, double width, int numBuckets) {
+    return new Histogram(LinearBuckets.of(start, width, numBuckets));
+  }
+
+  public void record(double... values) {
+    for (double value : values) {
+      record(value);
+    }
+  }
+
+  public synchronized void clear() {
+    this.buckets = new long[bucketType.getNumBuckets()];
+    this.numOfRecords = 0;
+    this.numTopRecords = 0;
+    this.numBottomRecords = 0;
+  }
+
+  public synchronized void record(double value) {
+    double rangeTo = bucketType.getRangeTo();
+    double rangeFrom = bucketType.getRangeFrom();
+    if (value >= rangeTo) {
+      LOG.warn("record is out of upper bound {}: {}", rangeTo, value);
+      numTopRecords++;
+    } else if (value < rangeFrom) {
+      LOG.warn("record is out of lower bound {}: {}", rangeFrom, value);
+      numBottomRecords++;
+    } else {
+      buckets[bucketType.getBucketIndex(value)]++;
+      numOfRecords++;
+    }
+  }
+
+  public synchronized long getTotalCount() {
+    return numOfRecords + numTopRecords + numBottomRecords;
+  }
+
+  public synchronized long getCount(int bucketIndex) {
+    return buckets[bucketIndex];

Review comment:
       What's the usage pattern here? If you call this one after another for each bucket index, then the histogram can be modified between calls, by another thread.
   Make this private if its only called within the class (which would mean its a non issue)
   
   It would be better to copy and return the buckets in a single get call instead
   
   Though, if you think that's bad for performance, then this might be fine. Though please add a comment warning about this potential threading issue




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on a change in pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on a change in pull request #12609:
URL: https://github.com/apache/beam/pull/12609#discussion_r477695112



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -1014,34 +1021,33 @@ private static boolean nextBackOff(Sleeper sleeper, BackOff backoff) throws Inte
   }
 
   /** Returns a BigQuery client builder using the specified {@link BigQueryOptions}. */
-  private static Bigquery.Builder newBigQueryClient(BigQueryOptions options) {
+  private static Bigquery.Builder newBigQueryClient(
+      BigQueryOptions options, @Nullable Histogram requestLatencies) {
     RetryHttpRequestInitializer httpRequestInitializer =
         new RetryHttpRequestInitializer(ImmutableList.of(404));
     httpRequestInitializer.setCustomErrors(createBigQueryClientCustomErrors());
     httpRequestInitializer.setWriteTimeout(options.getHTTPWriteTimeout());
+    ImmutableList.Builder<HttpRequestInitializer> initBuilder = ImmutableList.builder();
+    Credentials credential = options.getGcpCredential();
+    initBuilder.add(
+        credential == null
+            ? new NullCredentialInitializer()
+            : new HttpCredentialsAdapter(credential));
+    // Do not log 404. It clutters the output and is possibly even required by the
+    // caller.
+    initBuilder.add(httpRequestInitializer);
+    if (requestLatencies != null) {
+      initBuilder.add(new LatencyRecordingHttpRequestInitializer(requestLatencies));
+    }
+    HttpRequestInitializer chainInitializer =

Review comment:
       I didn't observe any performance difference before and after applying this change. Recording is done in a constant time and percentile estimation only requires a linear time depending on the number of buckets (which is also small like few thousands).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-675240302


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-681288423


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ihji removed a comment on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
ihji removed a comment on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-681288423


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12609: [BEAM-10699] Logging BigQuery streaming insert tail latencies

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12609:
URL: https://github.com/apache/beam/pull/12609#issuecomment-681185199


   LGTM. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org