You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "bvolpato (via GitHub)" <gi...@apache.org> on 2023/05/16 14:02:42 UTC

[GitHub] [beam] bvolpato opened a new pull request, #26714: Log the exception's cause when failed to parse TableRow

bvolpato opened a new pull request, #26714:
URL: https://github.com/apache/beam/pull/26714

   This will be useful to get more context for failures that happen when parsing the TableRow.
   
   Right now, user has no idea of knowing the reason, although it might be something simple (e.g., bad timestamp format)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #26714:
URL: https://github.com/apache/beam/pull/26714#issuecomment-1549973284

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195418886


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   If throw exception and not caught in the DoFn then it will retry indefinitely and the pipeline essentially stuck. If choose to re-throw, it should still be handled by the DoFn that calls it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bvolpato commented on pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on PR #26714:
URL: https://github.com/apache/beam/pull/26714#issuecomment-1550258422

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195383215


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   Do we need to warn the potential data loss? Can we add some actionable steps for users such as validate the data schema matches the given one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26714:
URL: https://github.com/apache/beam/pull/26714#issuecomment-1549760170

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey merged pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey merged PR #26714:
URL: https://github.com/apache/beam/pull/26714


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195292633


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   agreed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195239450


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   I think both of these need to rethrow the exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195626037


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   this is the dead-letter codepath. These rows have already been successfully converted to protobuf upstream (by Beam), so failing conversion here is quite unexpected. The only scenario I can think of if there is data corruption on the wire, in which case there's not much we can do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bvolpato commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195272376


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   +1 in improving this, but I think that will require a larger effort to avoid pipeline stalls.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195398735


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   @reuvenlax Is it fine to just log this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bvolpato commented on pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on PR #26714:
URL: https://github.com/apache/beam/pull/26714#issuecomment-1549758160

   R: @johnjcasey 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bvolpato commented on a diff in pull request #26714: Log the exception's cause when failed to parse TableRow

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on code in PR #26714:
URL: https://github.com/apache/beam/pull/26714#discussion_r1195906446


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -571,7 +571,7 @@ long flush(
                             failedRow, error.getRowIndexToErrorMessage().get(failedIndex)),
                         timestamp);
                   } catch (InvalidProtocolBufferException e) {
-                    LOG.error("Failed to insert row and could not parse the result!");
+                    LOG.error("Failed to insert row and could not parse the result!", e);

Review Comment:
   Thanks for the context @reuvenlax. Probably having the exception spit out would help in understanding the context, so this change should be still worth pushing through, what do you think?
   
   All we know today is that it's a `InvalidProtocolBufferException`, so I'm hoping to get more information (we could likely ask the user who got in this situation to use a snapshot, as they have been responsive).
   
   I will try to reproduce this locally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org