You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/09/02 17:08:38 UTC

[jira] [Commented] (BEAM-1800) Can't save datastore objects

    [ https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189506#comment-17189506 ] 

Beam JIRA Bot commented on BEAM-1800:
-------------------------------------

This issue was marked "stale-P2" and has not received a public comment in 14 days. It is now automatically moved to P3. If you are still affected by it, you can comment and move it back to P2.

> Can't save datastore objects
> ----------------------------
>
>                 Key: BEAM-1800
>                 URL: https://issues.apache.org/jira/browse/BEAM-1800
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Mike Lambert
>            Priority: P3
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it errors out on a strange unicode issue when trying to write a batch. Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
>   self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2] 
> File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element) 
> File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value)) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 354, in process
>   self._flush_batch() 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 187, in write_mutations
>   commit(commit_request) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 174, in wrapper
>   return fun(*args, **kwargs) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 185, in commit
>   datastore.commit(req) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 140, in commit
>   datastore_pb2.CommitResponse) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 199, in _call_method
>   method='POST', body=payload, headers=headers) 
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 631, in new_request
>   redirections, connection_type) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request (response, content)
>   = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request (response, content)
>   = self._conn_request(conn, request_uri, method, body, headers) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1273, in _conn_request
>   conn.request(method, request_uri, body, headers) 
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>    self.endheaders(body) 
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>   self._send_output(message_body) 
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>   msg += message_body TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
>         | 'convert from entity' >> beam.Map(ConvertFromEntity)
>         | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object (which has a nice API/interface) into the underlying protobuf (which is what the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
>     return helpers.entity_to_protobuf(entity)
> {noformat}
> I assume entity_to_protobuf works fine/normally, since it's also what is used by {{google/cloud/datastore/batch.py}} to write a bunch of {{entity_pb2.Entity}} objects into the {{datastore_pb2.CommitRequest.mutations[n].upsert}}:
> In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.
> In datastoreio.py: {{WriteToDatastore->DatastoreWriteFn.to_upsert_mutation->_Mutate.DatastoreWriteFn->helper.write_mutations}}
> Any idea what's going on here and why this doesn't work? Yes, I may have some unicode in my objects...but it works in my appengine DB/NDB usage. I will attempt to skip WriteToDatastore and just put unbatched entities using the datastore library and see if that goes any better for me...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)