Posted to user@beam.apache.org by Carlos Alonso <ca...@mrcalonso.com> on 2018/05/09 17:13:48 UTC

Re: BigQuery streaming insert errors

Filed https://issues.apache.org/jira/browse/BEAM-4257 and I'm currently
working on it.

On Sat, Apr 7, 2018 at 1:57 AM Gaurav Thakur <ga...@gmail.com> wrote:

> Carlos,
>
> I see your point.
> I was expecting InsertRetryPolicy.Context to hold that information and
> give a handle to it. Spoke too soon.
>
> Thanks, Gaurav
>
> On Fri, Apr 6, 2018 at 8:01 PM, Chamikara Jayalath <ch...@google.com>
> wrote:
>
>> Hi Carlos,
>>
>> I don't think there's currently a way to collect the errors from BigQuery
>> for failed inserts. I agree that this would be a useful addition. Feel free
>> to create a JIRA. Also, any contributions related to this are welcome.
>>
>> Thanks,
>> Cham
>>
>>
>> On Fri, Apr 6, 2018 at 12:29 AM Carlos Alonso <ca...@mrcalonso.com>
>> wrote:
>>
>>> Hi Gaurav, many thanks for your response. I'm currently using retry
>>> policies, but imagine the following scenario:
>>>
>>> I'm trying to insert an existing field. Even if we retry, it will still
>>> fail, but I'll never be able to detect that within the pipeline, because
>>> getFailedInserts()
>>> https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInserts--
>>> only contains the TableRows that failed, not the reason.
>>>
>>> As I understand it, adding the error as well won't be very hard, because
>>> BigQueryServicesImpl.insertAll() already knows about it:
>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L750
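A minimal, dependency-free sketch of the change proposed here: instead of keeping only the failed rows, keep each row together with the error messages reported for its position in the batch. This is not Beam's actual code. FailedRow and collectFailures are hypothetical names, Map<String, Object> stands in for TableRow, and a Map<Integer, List<String>> stands in for the per-row errors of an insertAll response (records need Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical "enriched" failed insert: the row plus the errors reported for it.
// Map<String, Object> is a simplified stand-in for TableRow.
record FailedRow(Map<String, Object> row, List<String> errors) {}

class InsertErrorCollector {
  /**
   * Pairs every failed row with its errors instead of keeping only the row.
   * errorsByIndex maps a row's position in the batch to the error messages
   * reported for it (a simplified stand-in for the per-row errors in an
   * insertAll response). Rows with no entry are treated as successful.
   */
  static List<FailedRow> collectFailures(
      List<Map<String, Object>> batch, Map<Integer, List<String>> errorsByIndex) {
    List<FailedRow> failed = new ArrayList<>();
    for (int i = 0; i < batch.size(); i++) {
      List<String> errors = errorsByIndex.get(i);
      if (errors != null && !errors.isEmpty()) {
        // Keep the reason alongside the row so downstream code can inspect it.
        failed.add(new FailedRow(batch.get(i), List.copyOf(errors)));
      }
    }
    return failed;
  }
}
```

Iterating over batch positions rather than over the error map keeps the failed rows in their original batch order.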
>>>
>>> I think I would even volunteer to work on it if the community feels it
>>> makes sense as well.
>>>
>>> Regards
>>>
>>> On Fri, Apr 6, 2018 at 1:28 AM Gaurav Thakur <ga...@gmail.com>
>>> wrote:
>>>
>>>> Hi Carlos,
>>>>
>>>> Would an insert retry policy help you?
>>>> Please see this,
>>>> https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.Context.html
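The retry-policy idea can be sketched in plain Java as a decision over the error reasons reported for a row: retry unless one of them is known to be persistent. This is similar in spirit to Beam's InsertRetryPolicy.retryTransientErrors(), but the RetryDecision class and the specific reason strings below are assumptions for illustration, not Beam's implementation.

```java
import java.util.List;
import java.util.Set;

class RetryDecision {
  // Assumed examples of error reasons that will fail the same way on every retry.
  private static final Set<String> PERSISTENT_REASONS =
      Set.of("invalid", "invalidQuery", "notImplemented");

  /**
   * Returns false if any reported reason is persistent (retrying cannot help),
   * true otherwise (the failure is treated as transient).
   */
  static boolean shouldRetry(List<String> reasons) {
    for (String reason : reasons) {
      if (PERSISTENT_REASONS.contains(reason)) {
        return false;
      }
    }
    return true;
  }
}
```

As Carlos points out in his reply, a policy like this only controls whether a row is retried; once it gives up, getFailedInserts() still returns the row without the reason.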
>>>>
>>>> Thanks, Gaurav
>>>>
>>>> On Fri, Apr 6, 2018 at 8:13 AM, Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> I'm adding Cham, as he might be knowledgeable about BQ IO or might be
>>>>> able to redirect you to someone else.
>>>>> Cham, do you have guidance for Carlos here?
>>>>> Thanks
>>>>> -P.
>>>>>
>>>>>
>>>>> On Mon, Apr 2, 2018 at 11:08 AM Carlos Alonso <ca...@mrcalonso.com>
>>>>> wrote:
>>>>>
>>>>>> And... where could I catch that exception?
>>>>>>
>>>>>> Thanks!
>>>>>> On Mon, 2 Apr 2018 at 16:58, Ted Yu <yu...@gmail.com> wrote:
>>>>>>
>>>>>>> Wouldn't the following code give you information about failed
>>>>>>> insertions (around line 790 in BigQueryServicesImpl)?
>>>>>>>
>>>>>>>       if (!allErrors.isEmpty()) {
>>>>>>>         throw new IOException("Insert failed: " + allErrors);
>>>>>>>       }
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Mon, Apr 2, 2018 at 7:16 AM, Carlos Alonso <ca...@mrcalonso.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi everyone!!
>>>>>>>>
>>>>>>>> I was wondering if there's any way to get the reason why a (streaming)
>>>>>>>> insert failed. Looking at the code, I think there's currently no way to
>>>>>>>> do that, as BigQueryServicesImpl.insertAll() seems to discard the errors
>>>>>>>> and just add the failed TableRow instances to the failedInserts list.
>>>>>>>>
>>>>>>>> It would be very nice to have an "enriched" TableRow returned
>>>>>>>> instead, containing the error information for further processing (in our
>>>>>>>> use case we're saving the failed ones to a different table for further
>>>>>>>> analysis).
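For the use case described here (saving failed rows to a separate table), a sketch of the "enriched" row might keep the original row as a string payload next to the error and the intended target table, so that one dead-letter table schema fits rows from any source table. toDeadLetterRow and the column names target_table, original_row and error_message are hypothetical, and Map<String, Object> again stands in for TableRow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DeadLetter {
  /**
   * Builds a row for a hypothetical dead-letter table. The failed row is
   * flattened to a string so rows from any source schema fit the same
   * dead-letter schema.
   */
  static Map<String, Object> toDeadLetterRow(
      String targetTable, Map<String, Object> failedRow, String errorMessage) {
    Map<String, Object> out = new LinkedHashMap<>();
    out.put("target_table", targetTable);
    // Map.toString() form, e.g. {id=2}; kept simple to avoid a JSON dependency.
    out.put("original_row", String.valueOf(failedRow));
    out.put("error_message", errorMessage);
    return out;
  }
}
```

In practice a JSON serialization of the row would be a more useful payload than Map.toString(); it is used here only to keep the sketch dependency-free.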
>>>>>>>>
>>>>>>>> Could this be added as an enhancement or a similar issue in GH/Jira?
>>>>>>>> Any other ideas?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>