Posted to issues@beam.apache.org by "Matteo Martignon (Jira)" <ji...@apache.org> on 2021/05/19 17:51:00 UTC

[jira] [Commented] (BEAM-7826) Problem loading ISO-8859-1 into BigQuery using DataFlow

    [ https://issues.apache.org/jira/browse/BEAM-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347803#comment-17347803 ] 

Matteo Martignon commented on BEAM-7826:
----------------------------------------

The same happens with Spanish characters such as 'ñ' and accented vowels such as 'á' and 'é'. The default charset on Dataflow workers is US-ASCII.
The method {{getBytes(StandardCharsets.UTF_8)}} does not actually return a UTF-8 encoded byte array.
 
Example: [https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L333]
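
A minimal sketch of where the characters can get lost, assuming the damage happens when raw bytes are decoded with the worker's default charset rather than in the UTF-8 encoding step itself. The class and method names ({{CharsetSketch}}, {{decodeLatin1}}, {{toUtf8}}) are hypothetical and are not part of Beam or the Dataflow templates; the US-ASCII decode only simulates a worker with that default.

```java
import java.nio.charset.StandardCharsets;

final class CharsetSketch {
    // Decode raw ISO-8859-1 bytes with an explicit charset, never the JVM default.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Encode a Java string as UTF-8.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xF1}; // n with tilde (U+00F1) encoded as ISO-8859-1

        // Simulates a worker whose default charset is US-ASCII: 0xF1 is not valid
        // ASCII, so the decoder substitutes U+FFFD before any UTF-8 encoding happens.
        String lossy = new String(latin1, StandardCharsets.US_ASCII);
        String correct = decodeLatin1(latin1);

        System.out.println(lossy.equals("\uFFFD"));   // replacement character
        System.out.println(correct.equals("\u00F1")); // original character preserved
        System.out.println(toUtf8(correct).length);   // U+00F1 takes two bytes in UTF-8
    }
}
```

If this assumption holds, passing the charset explicitly at every point where bytes become strings (or setting {{file.encoding}} on the workers) would be the thing to try; whether that applies to the linked template code has not been verified here.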

> Problem loading ISO-8859-1 into BigQuery using DataFlow
> -------------------------------------------------------
>
>                 Key: BEAM-7826
>                 URL: https://issues.apache.org/jira/browse/BEAM-7826
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, io-java-text
>    Affects Versions: 2.8.0
>            Reporter: Israel Gómez
>            Priority: P3
>
> Hi all,
> I'm trying to load an ISO-8859-1 file into BigQuery using DataFlow. I've built a template with Apache Beam Java. Everything works well, but when I check the content of the BigQuery table I see that some characters such as 'ñ' or accented vowels such as 'á', 'é', etc. haven't been stored properly; they have been stored as �.
> I've tried several charset conversions before writing to BigQuery. I've also created a special ISOCoder and passed it to the pipeline using the setCoder() method, but nothing works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)