Posted to issues@beam.apache.org by "Matteo Martignon (Jira)" <ji...@apache.org> on 2021/05/19 17:51:00 UTC
[jira] [Commented] (BEAM-7826) Problem loading ISO-8859-1 into BigQuery using DataFlow
[ https://issues.apache.org/jira/browse/BEAM-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347803#comment-17347803 ]
Matteo Martignon commented on BEAM-7826:
----------------------------------------
The same happens with Spanish characters such as 'ñ' and accented vowels like 'á' and 'é'. The default charset on Dataflow workers is US-ASCII.
The method {{getBytes(StandardCharsets.UTF_8)}} does not actually return a UTF-8 encoded byte array.
Example: [https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L33|https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L333]
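For anyone who wants to check the charset handling outside of Dataflow, here is a minimal standalone sketch. It is not taken from the template code linked above; the class name CharsetCheck and the helpers decode and toUtf8 are hypothetical. It assumes the input bytes really are ISO-8859-1. It shows that decoding such bytes as UTF-8 yields the replacement character U+FFFD (displayed as �), while naming the source charset explicitly keeps 'ñ' intact regardless of the JVM default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Hypothetical helper: decode raw bytes with an explicit charset,
    // so the result does not depend on Charset.defaultCharset().
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Hypothetical helper: encode a String as UTF-8 bytes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word "a\u00F1o" (a, n-with-tilde, o) encoded as ISO-8859-1: 0x61 0xF1 0x6F.
        byte[] latin1 = {(byte) 0x61, (byte) 0xF1, (byte) 0x6F};

        // Useful to log on a worker when diagnosing this kind of problem.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoding ISO-8859-1 bytes as UTF-8: 0xF1 is not a valid sequence here,
        // so the decoder substitutes U+FFFD.
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        System.out.println("decoded as UTF-8 contains U+FFFD: "
                + (wrong.indexOf('\uFFFD') >= 0));

        // Decoding with the charset the bytes were actually written in.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);
        System.out.println("decoded as ISO-8859-1 keeps U+00F1: "
                + (right.indexOf('\u00F1') >= 0));

        // Re-encode as UTF-8 and print the bytes in hex to avoid console encoding issues.
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8(right)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("UTF-8 bytes: " + hex.toString().trim());
    }
}
```

Printing the bytes in hex rather than the string itself keeps the console's own encoding from distorting the result, which matters when the default charset is US-ASCII.
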
> Problem loading ISO-8859-1 into BigQuery using DataFlow
> -------------------------------------------------------
>
> Key: BEAM-7826
> URL: https://issues.apache.org/jira/browse/BEAM-7826
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, io-java-text
> Affects Versions: 2.8.0
> Reporter: Israel Gómez
> Priority: P3
>
> Hi all,
> I'm trying to load an ISO-8859-1 file into BigQuery using DataFlow. I've built a template with Apache Beam Java. Everything works well, but when I check the content of the BigQuery table I see that some characters like 'ñ' or accents 'á','é', etc. haven't been stored properly; they have been stored as �.
> I've tried several charset conversions before writing to BigQuery. I've also created a special ISOCoder and passed it to the pipeline using the method setCoder(), but nothing works.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)