Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2019/12/04 00:18:00 UTC
[jira] [Commented] (BEAM-8884) Python MongoDBIO TypeError when splitting
[ https://issues.apache.org/jira/browse/BEAM-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987381#comment-16987381 ]
Brian Hulette commented on BEAM-8884:
-------------------------------------
[~yichi] it looks like you wrote mongodbio; do you have any thoughts on this?
> Python MongoDBIO TypeError when splitting
> -----------------------------------------
>
> Key: BEAM-8884
> URL: https://issues.apache.org/jira/browse/BEAM-8884
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Brian Hulette
> Priority: Major
>
> From [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p1575350991134000]:
> I am trying to run a pipeline (defined with the Python SDK) on Dataflow that uses beam.io.ReadFromMongoDB. With very small datasets (<10 MB) it runs fine, but with slightly larger datasets (70 MB) I always get this error:
> {code:}
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
> The stack trace is below. Running it on a local machine works just fine. I would highly appreciate any pointers as to what this could be.
> I hope this is the right channel to address this.
> {code:}
> Traceback (most recent call last):
> File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
> work_executor.execute()
> File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
> self._split_task)
> File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
> desired_bundle_size)
> File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
> for split in source.split(desired_bundle_size):
> File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
> bundle_end = min(stop_position, split_key_id)
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
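A minimal sketch of how this error can arise, offered as an assumption rather than Beam's actual code: MongoDB's splitVector command returns its splitKeys as documents such as {'_id': ObjectId(...)}, not bare ObjectIds. If the split loop passes each document straight into min(stop_position, split_key_id), Python 3 ends up comparing a dict with an ObjectId, which matches the message in the trace. This would also fit the size dependence in the report: for a collection smaller than the requested chunk size, splitVector presumably returns no split keys, so the comparison never runs. ObjectId below is a hypothetical stand-in for bson.objectid.ObjectId (with shortened ids, so pymongo is not needed), and split_ranges is a hypothetical simplification of the loop around mongodbio.py line 174.

```python
import functools
from typing import Any, Dict, Iterator, List, Tuple


@functools.total_ordering
class ObjectId:
    """Hypothetical stand-in for bson.objectid.ObjectId (no pymongo needed).

    Ordering is defined only against other ObjectIds; any other operand gets
    NotImplemented, which makes Python 3 raise TypeError.
    """

    def __init__(self, oid: str) -> None:
        self._oid = oid

    def __eq__(self, other: object) -> Any:
        if isinstance(other, ObjectId):
            return self._oid == other._oid
        return NotImplemented

    def __lt__(self, other: object) -> Any:
        if isinstance(other, ObjectId):
            return self._oid < other._oid
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self._oid)

    def __repr__(self) -> str:
        return f"ObjectId({self._oid!r})"


def split_ranges(start: ObjectId, stop: ObjectId,
                 split_keys: List[Dict[str, ObjectId]],
                 unwrap: bool = True) -> Iterator[Tuple[ObjectId, ObjectId]]:
    """Yield (bundle_start, bundle_end) ranges covering [start, stop).

    Hypothetical simplification of a splitVector-based split loop;
    split_keys mimics splitVector's splitKeys: documents like {'_id': oid}.
    """
    bundle_start = start
    for split_key in split_keys:
        if bundle_start >= stop:
            break
        # unwrap=False reproduces the bug: min() compares a dict with an ObjectId.
        key = split_key["_id"] if unwrap else split_key
        bundle_end = min(stop, key)
        yield bundle_start, bundle_end
        bundle_start = bundle_end
    if bundle_start < stop:
        # Tail range; with no split keys this is the whole [start, stop) range.
        yield bundle_start, stop


start, stop = ObjectId("5de6a0"), ObjectId("5de6f0")
keys = [{"_id": ObjectId("5de6c0")}]

# Small collection: no split keys, so min() is never reached and nothing fails.
print(list(split_ranges(start, stop, [], unwrap=False)))

# Larger collection without unwrapping: the TypeError from the report.
try:
    list(split_ranges(start, stop, keys, unwrap=False))
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'dict' and 'ObjectId'

# Unwrapping the _id value makes both operands ObjectIds, so min() works.
print(list(split_ranges(start, stop, keys)))
```

Unwrapping the _id value right before the comparison is one possible fix; converting the split keys once, when they are fetched, would work equally well.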
--
This message was sent by Atlassian Jira
(v8.3.4#803005)