Posted to user@beam.apache.org by Carlos Alonso <ca...@mrcalonso.com> on 2018/04/02 14:16:30 UTC

BigQuery streaming insert errors

Hi everyone!!

I was wondering if there's any way to find out why a streaming insert
failed. Looking at the code, I think there's currently no way to do that, as
BigQueryServicesImpl.insertAll() seems to discard the errors and just adds
the failed TableRow instances to the failedInserts list.

It would be very nice to get back an "enriched" TableRow instead, one that
contains the error information for further processing (in our use case,
we're saving the failed rows into a different table for further analysis).
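
As a rough illustration of the "enriched" idea, here is a minimal sketch in
plain Java. It assumes per-row errors come back keyed by the row's index in
the request. FailedInsertSketch, FailedInsert, collectFailures and
errorsByIndex are hypothetical names used only for illustration; none of them
is part of Beam or the BigQuery client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Beam's API: pairs each failed row with its errors. */
final class FailedInsertSketch {

  /** A failed row together with the error messages reported for it. */
  static final class FailedInsert {
    final Map<String, Object> row;
    final List<String> errors;

    FailedInsert(Map<String, Object> row, List<String> errors) {
      this.row = row;
      this.errors = errors;
    }
  }

  /**
   * Stand-in for the tail end of an insertAll-style call.
   * errorsByIndex maps a row's position in the request to its errors
   * (an assumption about how the service reports them).
   */
  static List<FailedInsert> collectFailures(
      List<Map<String, Object>> rows, Map<Integer, List<String>> errorsByIndex) {
    List<FailedInsert> failed = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      List<String> errors = errorsByIndex.get(i);
      if (errors != null && !errors.isEmpty()) {
        // Keep the reason next to the row instead of dropping it.
        failed.add(new FailedInsert(rows.get(i), errors));
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    Map<String, Object> good = new LinkedHashMap<>();
    good.put("id", 1);
    Map<String, Object> bad = new LinkedHashMap<>();
    bad.put("id", 2);
    bad.put("unknown_field", "x");

    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(good);
    rows.add(bad);

    // Made-up error for the second row (index 1).
    Map<Integer, List<String>> errorsByIndex = new LinkedHashMap<>();
    errorsByIndex.put(1, Collections.singletonList("no such field: unknown_field"));

    for (FailedInsert f : collectFailures(rows, errorsByIndex)) {
      System.out.println(f.row + " failed: " + f.errors);
    }
  }
}
```

With something of this shape, the failed rows could be written to the
analysis table together with their reasons.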

Could this be added as an enhancement or similar Issue in GH/Jira? Any
other ideas?

Thanks!

Re: BigQuery streaming insert errors

Posted by Carlos Alonso <ca...@mrcalonso.com>.
Filed https://issues.apache.org/jira/browse/BEAM-4257 and I'm currently
working on it.

On Sat, Apr 7, 2018 at 1:57 AM Gaurav Thakur <ga...@gmail.com> wrote:

> Carlos,
>
> I see your point.
> I was expecting the InsertRetryPolicy.Context to hold and give a handle
> to that information. Spoke too soon.
>
> Thanks, Gaurav

Re: BigQuery streaming insert errors

Posted by Gaurav Thakur <ga...@gmail.com>.
Carlos,

I see your point.
I was expecting the InsertRetryPolicy.Context to hold and give a handle to
that information. Spoke too soon.

Thanks, Gaurav

On Fri, Apr 6, 2018 at 8:01 PM, Chamikara Jayalath <ch...@google.com>
wrote:

> Hi Carlos,
>
> I don't think there's currently a way to collect the errors from BigQuery
> for failed inserts. I agree that this would be a useful addition. Feel free
> to create a JIRA. Also, any contributions related to this are welcome.
>
> Thanks,
> Cham

Re: BigQuery streaming insert errors

Posted by Chamikara Jayalath <ch...@google.com>.
Hi Carlos,

I don't think there's currently a way to collect the errors from BigQuery
for failed inserts. I agree that this would be a useful addition. Feel free
to create a JIRA. Also, any contributions related to this are welcome.

Thanks,
Cham

On Fri, Apr 6, 2018 at 12:29 AM Carlos Alonso <ca...@mrcalonso.com> wrote:

> Hi Gaurav, many thanks for your response. I'm currently using retry
> policies, but imagine the following scenario:
>
> I'm trying to insert an existing field. Even if we retry, it will still
> fail, but I'll never be able to detect that within the pipeline, as
> getFailedInserts()
> https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInserts--
> only contains the TableRows that failed, not the reason.
>
> Adding the error as well won't be very hard, as I understand it, because
> BigQueryServicesImpl.insertAll() actually knows about it:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L750
>
> I think I would even volunteer to work on it if the community feels it
> makes sense as well.
>
> Regards

Re: BigQuery streaming insert errors

Posted by Carlos Alonso <ca...@mrcalonso.com>.
Hi Gaurav, many thanks for your response. I'm currently using retry
policies, but imagine the following scenario:

I'm trying to insert an existing field. Even if we retry, it will still
fail, but I'll never be able to detect that within the pipeline, as
getFailedInserts()
https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInserts--
only contains the TableRows that failed, not the reason.
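
To make the scenario concrete, here is a minimal sketch of a retry decision
in plain Java. It assumes that a small set of error reasons is treated as
persistent, meaning a retry cannot succeed. RetryDecisionSketch, shouldRetry
and the reason strings are illustrative assumptions, not Beam's
InsertRetryPolicy.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not Beam's InsertRetryPolicy: decides whether a retry can help. */
final class RetryDecisionSketch {

  // Assumed set of error reasons that will fail again no matter how often we retry.
  static final Set<String> PERSISTENT_REASONS =
      Collections.unmodifiableSet(
          new HashSet<>(Arrays.asList("invalid", "invalidQuery", "notImplemented")));

  /** Returns true only if none of the reported reasons is persistent. */
  static boolean shouldRetry(List<String> reasons) {
    for (String reason : reasons) {
      if (PERSISTENT_REASONS.contains(reason)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // A transient-looking reason: worth retrying.
    System.out.println(shouldRetry(Arrays.asList("backendError")));
    // A persistent reason: retrying will not change the outcome.
    System.out.println(shouldRetry(Arrays.asList("invalid")));
  }
}
```

Stopping the retries is the most a decision like this can do. It does not by
itself pass the reason on to the rest of the pipeline.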

Adding the error as well won't be very hard, as I understand it, because
BigQueryServicesImpl.insertAll() actually knows about it:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L750

I think I would even volunteer to work on it if the community feels it
makes sense as well.

Regards

On Fri, Apr 6, 2018 at 1:28 AM Gaurav Thakur <ga...@gmail.com> wrote:

> Hi Carlos,
>
> Would an insert retry policy help you?
> Please see this,
> https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.Context.html
>
> Thanks, Gaurav

Re: BigQuery streaming insert errors

Posted by Gaurav Thakur <ga...@gmail.com>.
Hi Carlos,

Would an insert retry policy help you?
Please see this,
https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.Context.html

Thanks, Gaurav

On Fri, Apr 6, 2018 at 8:13 AM, Pablo Estrada <pa...@google.com> wrote:

> I'm adding Cham as he might be knowledgeable about BQ IO, or he might be
> able to redirect to someone else.
> Cham, do you have guidance for Carlos here?
> Thanks
> -P.

Re: BigQuery streaming insert errors

Posted by Pablo Estrada <pa...@google.com>.
I'm adding Cham as he might be knowledgeable about BQ IO, or he might be
able to redirect to someone else.
Cham, do you have guidance for Carlos here?
Thanks
-P.

On Mon, Apr 2, 2018 at 11:08 AM Carlos Alonso <ca...@mrcalonso.com> wrote:

> And... where could I catch that exception?
>
> Thanks!

Re: BigQuery streaming insert errors

Posted by Carlos Alonso <ca...@mrcalonso.com>.
And... where could I catch that exception?

Thanks!
On Mon, 2 Apr 2018 at 16:58, Ted Yu <yu...@gmail.com> wrote:

> Wouldn't the following code give you information about failed insertions
> (around line 790 in BigQueryServicesImpl)?
>
>       if (!allErrors.isEmpty()) {
>         throw new IOException("Insert failed: " + allErrors);
>       }
>
> Cheers

Re: BigQuery streaming insert errors

Posted by Ted Yu <yu...@gmail.com>.
Wouldn't the following code give you information about failed insertions
(around line 790 in BigQueryServicesImpl)?

      if (!allErrors.isEmpty()) {
        throw new IOException("Insert failed: " + allErrors);
      }

Cheers

On Mon, Apr 2, 2018 at 7:16 AM, Carlos Alonso <ca...@mrcalonso.com> wrote:

> Hi everyone!!
>
> I was wondering if there's any way to find out why a streaming insert
> failed. Looking at the code, I think there's currently no way to do that, as
> BigQueryServicesImpl.insertAll() seems to discard the errors and just adds
> the failed TableRow instances to the failedInserts list.
>
> It would be very nice to get back an "enriched" TableRow instead, one that
> contains the error information for further processing (in our use case,
> we're saving the failed rows into a different table for further analysis).
>
> Could this be added as an enhancement or similar Issue in GH/Jira? Any
> other ideas?
>
> Thanks!
>