Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/09/02 17:08:38 UTC
[jira] [Updated] (BEAM-1800) Can't save datastore objects
[ https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-1800:
--------------------------------
Labels: (was: stale-P2)
> Can't save datastore objects
> ----------------------------
>
> Key: BEAM-1800
> URL: https://issues.apache.org/jira/browse/BEAM-1800
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Mike Lambert
> Priority: P3
>
> I can't seem to save my database objects using {{WriteToDatastore}}: it fails with a strange Unicode error when trying to write a batch. Stack trace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
> self.process(windowed_value)
> File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
> self.reraise_augmented(exn)
> File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
> raise type(exn), args, sys.exc_info()[2]
> File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
> self._dofn_simple_invoker(element)
> File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5262)
> self._process_outputs(element, self.dofn_process(element.value))
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 354, in process
> self._flush_batch()
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 363, in _flush_batch
> helper.write_mutations(self._datastore, self._project, self._mutations)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 187, in write_mutations
> commit(commit_request)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 174, in wrapper
> return fun(*args, **kwargs)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 185, in commit
> datastore.commit(req)
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 140, in commit
> datastore_pb2.CommitResponse)
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 199, in _call_method
> method='POST', body=payload, headers=headers)
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 631, in new_request
> redirections, connection_type)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
> (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
> (response, content) = self._conn_request(conn, request_uri, method, body, headers)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1273, in _conn_request
> conn.request(method, request_uri, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
> self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
> self.endheaders(body)
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
> self._send_output(message_body)
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
> msg += message_body
> TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
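The failing frame is the last one: `httplib` appends the request body to the already-assembled header text with `msg += message_body`, and the error says one of the two values was `unicode` where a byte `str` was required. The sketch below reproduces that failure mode in Python 3, where `bytes` and `str` play the roles of Python 2's `str` and `unicode`. `build_http_message` and `to_wire_bytes` are hypothetical helpers, not `httplib`, `httplib2` or Beam APIs, and the header values are placeholders; the report itself does not show which value in the request was the `unicode` one.

```python
def build_http_message(header_lines, body):
    """Hypothetical stand-in for the header-plus-body step in httplib's _send_output."""
    msg = b"\r\n".join(header_lines) + b"\r\n\r\n"
    msg += body  # raises TypeError if body is text (str) rather than bytes
    return msg


def to_wire_bytes(value, encoding="utf-8"):
    """Encode text to bytes before it reaches the transport layer; pass bytes through."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# Placeholder request line and header, not the real Datastore request.
headers = [b"POST /example HTTP/1.1", b"Host: example.com"]

try:
    build_http_message(headers, "caf\u00e9")  # text body: same class of error as the report
except TypeError as exc:
    print("TypeError:", exc)

# Encoding once at the boundary makes the concatenation well-typed.
print(build_http_message(headers, to_wire_bytes("caf\u00e9")))
```

Encoding at the single point where text becomes an HTTP body lets the rest of the code keep working with text strings.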
> My code is basically:
> {noformat}
> | 'convert from entity' >> beam.Map(ConvertFromEntity)
> | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Here, {{ConvertFromEntity}} converts a {{google.cloud.datastore}} entity (which has a nice API) into the underlying protobuf, which is what Beam's gcp/datastore library expects:
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
>     return helpers.entity_to_protobuf(entity)
> {noformat}
> I assume {{entity_to_protobuf}} works correctly, since {{google/cloud/datastore/batch.py}} also uses it to write {{entity_pb2.Entity}} objects into {{datastore_pb2.CommitRequest.mutations[n].upsert}}:
> In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.
> In datastoreio.py: {{WriteToDatastore -> DatastoreWriteFn.to_upsert_mutation -> _Mutate.DatastoreWriteFn -> helper.write_mutations}}
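The call chain above, together with the `process` -> `_flush_batch` -> `write_mutations` frames in the stack trace, indicates that `WriteToDatastore` buffers mutations and commits them in batches, whereas the fallback the reporter mentions writes entities one at a time. A minimal buffer-and-flush sketch of that pattern follows; `BatchingWriter`, `commit_fn` and the default batch size of 500 are assumptions for illustration, not Beam's actual `DatastoreWriteFn` code.

```python
class BatchingWriter(object):
    """Hypothetical buffer-and-flush writer; not Beam's DatastoreWriteFn.

    Buffers mutations and hands them to commit_fn in batches of batch_size,
    loosely mirroring the process -> _flush_batch -> write_mutations frames.
    """

    def __init__(self, commit_fn, batch_size=500):  # 500 is an illustrative default
        self._commit_fn = commit_fn
        self._batch_size = batch_size
        self._mutations = []

    def process(self, mutation):
        # Buffer the mutation and flush as soon as the batch is full.
        self._mutations.append(mutation)
        if len(self._mutations) >= self._batch_size:
            self._flush_batch()

    def finish(self):
        # Flush whatever is left over at the end of the bundle.
        if self._mutations:
            self._flush_batch()

    def _flush_batch(self):
        # One commit call per batch: if commit_fn raises, the whole batch fails.
        self._commit_fn(list(self._mutations))
        self._mutations = []


commits = []
writer = BatchingWriter(commits.append, batch_size=2)
for m in ["a", "b", "c"]:
    writer.process(m)
writer.finish()
print(commits)  # → [['a', 'b'], ['c']]
```

In this sketch a single bad element surfaces as a failure of its whole batch, which is one reason writing entities unbatched can help pinpoint the offending one.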
> Any idea what's going on here and why this doesn't work? Yes, I may have some Unicode in my objects, but the same data works with my App Engine DB/NDB code. I'll try skipping {{WriteToDatastore}} and putting unbatched entities directly with the datastore library to see whether that works any better.
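To check the "some Unicode in my objects" hypothesis, and to narrow down which entities to retry unbatched, it can help to list the properties that actually hold non-ASCII text. A minimal Python 3 sketch, assuming the entity's properties are available as a plain mapping (`google.cloud.datastore` entities are dict subclasses); `non_ascii_properties` is a hypothetical helper, and finding such values would be consistent with, but would not prove, the cause of the `TypeError`.

```python
def non_ascii_properties(properties):
    """Return sorted names of properties whose text values contain non-ASCII characters.

    Works on any mapping; values nested in lists, tuples and dicts are searched too.
    """

    def contains_non_ascii(value):
        if isinstance(value, str):
            return any(ord(ch) > 127 for ch in value)
        if isinstance(value, (list, tuple)):
            return any(contains_non_ascii(v) for v in value)
        if isinstance(value, dict):
            return any(contains_non_ascii(v) for v in value.values())
        return False  # numbers, bytes, None, etc. are not text

    return sorted(name for name, value in properties.items() if contains_non_ascii(value))


example = {"title": "Caf\u00e9 night", "city": "Paris", "tags": ["salsa", "bachata"], "count": 3}
print(non_ascii_properties(example))  # → ['title']
```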
--
This message was sent by Atlassian Jira
(v8.3.4#803005)