Posted to dev@zeppelin.apache.org by "Marcus Truscello (Jira)" <ji...@apache.org> on 2022/07/01 01:06:00 UTC

[jira] [Created] (ZEPPELIN-5758) BigQuery hits socket timeout before reaching "wait_time" setting

Marcus Truscello created ZEPPELIN-5758:
------------------------------------------

             Summary: BigQuery hits socket timeout before reaching "wait_time" setting
                 Key: ZEPPELIN-5758
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5758
             Project: Zeppelin
          Issue Type: Bug
          Components: interpreter-setting, Interpreters, zeppelin-interpreter
    Affects Versions: 0.10.1
            Reporter: Marcus Truscello
         Attachments: bigquery-timeout.patch, stacktrace.log

The {{zeppelin.bigquery.wait_time}} BigQuery interpreter parameter is only useful up to a value of 30 seconds. Anything beyond that exceeds the underlying HTTP client's default read timeout and results in a {{java.net.SocketTimeoutException: Read timed out}}. (A full stack trace is attached.)

Google's Java API guide suggests overriding the {{HttpRequestInitializer}} to set the desired connect and read timeouts: [https://developers.google.com/api-client-library/java/google-api-java-client/errors#timeouts]

This exact approach isn't feasible because the BigQuery interpreter's {{createAuthorizedClient}} method is static. Instead, we can modify the solution to use an approach similar to this StackOverflow answer which uses the builder's {{setHttpRequestInitializer}}: [https://stackoverflow.com/a/32894630]
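
To make the idea concrete, here is a minimal sketch of wrapping an existing initializer so that timeouts are applied after it runs. It is not the attached patch. {{Request}} and {{RequestInitializer}} are hypothetical stand-ins that keep the example self-contained; in the real client the corresponding types would be {{HttpRequest}} (with {{setConnectTimeout}} and {{setReadTimeout}}) and {{HttpRequestInitializer}}. The timeout values are illustrative only.

```java
public class TimeoutInitializerSketch {

    // Hypothetical stand-in for the HTTP client's request type.
    // The initial values are placeholders, not the library's real defaults.
    static final class Request {
        int connectTimeoutMs = 0;
        int readTimeoutMs = 0;
    }

    // Hypothetical stand-in for the client's initializer callback.
    interface RequestInitializer {
        void initialize(Request request);
    }

    // Static helper, so it can be called from a static factory method.
    // It runs the wrapped initializer first (for example the credential),
    // then overrides the connect and read timeouts on the same request.
    static RequestInitializer withTimeouts(RequestInitializer delegate,
                                           int connectTimeoutMs,
                                           int readTimeoutMs) {
        return request -> {
            delegate.initialize(request);
            request.connectTimeoutMs = connectTimeoutMs;
            request.readTimeoutMs = readTimeoutMs;
        };
    }

    public static void main(String[] args) {
        // In the interpreter this would be the credential object.
        RequestInitializer credential = request -> { /* attach auth here */ };

        RequestInitializer initializer = withTimeouts(credential, 10_000, 210_000);

        Request request = new Request();
        initializer.initialize(request);
        System.out.println("read timeout (ms): " + request.readTimeoutMs);
    }
}
```

The wrapped initializer is what would be handed to the builder in place of the bare credential.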

Note that setting the read timeout too high likely won't provide any value. Regardless of the {{timeoutMs}} value, BigQuery always returns a response within ~200 seconds, whether or not the job has actually completed:
[https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults#query-parameters]
Given that the BigQuery interpreter doesn't handle {{jobComplete}} being false, there's no reason to set the read timeout much larger than 200 seconds.
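
One way to pick the read timeout, sketched under assumptions: take {{zeppelin.bigquery.wait_time}} (assumed here to be in milliseconds), cap it at the ~200 second server-side limit, and add some headroom. The class name, the constants, and the 10 second margin are all hypothetical, not values from the attached patch.

```java
public class ReadTimeoutSketch {

    // Approximate upper bound on how long BigQuery holds the request open
    // (~200 s according to the getQueryResults documentation linked above).
    static final long SERVER_MAX_WAIT_MS = 200_000L;

    // Hypothetical headroom for network latency; tune as needed.
    static final long MARGIN_MS = 10_000L;

    // Derives a socket read timeout from the configured wait time:
    // negative values are treated as 0, values above the server limit
    // are capped, and the margin is added on top.
    static int readTimeoutMs(long waitTimeMs) {
        long effectiveWait = Math.min(Math.max(waitTimeMs, 0L), SERVER_MAX_WAIT_MS);
        return (int) (effectiveWait + MARGIN_MS);
    }

    public static void main(String[] args) {
        System.out.println(readTimeoutMs(5_000L));   // 15000
        System.out.println(readTimeoutMs(600_000L)); // 210000
    }
}
```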
 
I've attached a diff of the changes I applied to fix this issue.  It should be noted that I am not a Java developer, so I apologize if the solution is a bit crude. :D
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)