Posted to issues@beam.apache.org by "Eugene Nikolaiev (Jira)" <ji...@apache.org> on 2020/11/22 13:50:00 UTC

[jira] [Commented] (BEAM-3165) Mongo document read with non hex objectid

    [ https://issues.apache.org/jira/browse/BEAM-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236943#comment-17236943 ] 

Eugene Nikolaiev commented on BEAM-3165:
----------------------------------------

The MongoDB readers currently rely on a standard hex _id field: bundles are split using an ObjectId range tracker. So no quick fix is possible. Either custom range trackers would need to be implemented, or (maybe) an option added to disable splitting into bundles.
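
To illustrate the second idea, here is a rough sketch of a pre-check that could decide whether ObjectId-range splitting is applicable at all. It is not Beam or MongoDB driver code: the class SplitPrecheck and both method names are hypothetical, and it assumes a handful of sampled _id values are already available as strings. The only fact it relies on is that the hex form of an ObjectId is exactly 24 hexadecimal characters.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Beam or the MongoDB driver. */
final class SplitPrecheck {

  /** Returns true if the string looks like the hex form of an ObjectId (24 hex characters). */
  static boolean isHexObjectId(String id) {
    if (id == null || id.length() != 24) {
      return false;
    }
    for (int i = 0; i < id.length(); i++) {
      char c = id.charAt(i);
      boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
      if (!hex) {
        return false;
      }
    }
    return true;
  }

  /**
   * Returns true only if every sampled _id is a hex ObjectId string.
   * An empty sample gives no evidence, so it is treated as "do not split".
   */
  static boolean canSplitByObjectIdRange(List<String> sampledIds) {
    if (sampledIds == null || sampledIds.isEmpty()) {
      return false;
    }
    for (String id : sampledIds) {
      if (!isHexObjectId(id)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> hexOnly = Arrays.asList("507f1f77bcf86cd799439011");
    List<String> mixed = Arrays.asList("507f1f77bcf86cd799439011", "somestring");
    System.out.println("hex-only sample splittable: " + canSplitByObjectIdRange(hexOnly));
    System.out.println("mixed sample splittable: " + canSplitByObjectIdRange(mixed));
  }
}
```

A reader could fall back to a single unsplit bundle whenever such a check returns false. That would trade parallelism for correctness, and a sample can miss non-hex ids, so a real option would probably need to be explicit rather than inferred.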

> Mongo document read with non hex objectid
> -----------------------------------------
>
>                 Key: BEAM-3165
>                 URL: https://issues.apache.org/jira/browse/BEAM-3165
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-mongodb
>    Affects Versions: 2.1.0
>            Reporter: Utkarsh Sopan
>            Priority: P3
>
> I have a Mongo collection whose '_id' values are non-hex strings.
> I can't read them into a PCollection; I get the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [somestring]
> 	at org.bson.types.ObjectId.parseHexString(ObjectId.java:523)
> 	at org.bson.types.ObjectId.<init>(ObjectId.java:237)
> 	at org.bson.json.JsonReader.visitObjectIdConstructor(JsonReader.java:674)
> 	at org.bson.json.JsonReader.readBsonType(JsonReader.java:197)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:139)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> 	at org.bson.codecs.configuration.LazyCodec.decode(LazyCodec.java:47)
> 	at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> 	at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
> 	at org.bson.codecs.DocumentCodec.readList(DocumentCodec.java:222)
> 	at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:208)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
> 	at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> 	at org.bson.Document.parse(Document.java:105)
> 	at org.bson.Document.parse(Document.java:90)
> 	at org.apache.beam.sdk.io.mongodb.MongoDbIO$BoundedMongoDbReader.start(MongoDbIO.java:472)
> 	at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
> 	at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
> 	at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)