Posted to issues@beam.apache.org by "Ismaël Mejía (JIRA)" <ji...@apache.org> on 2019/04/19 08:55:00 UTC
[jira] [Commented] (BEAM-7081) MongoDbIO.splitKeysToFilters returns
incorrect filters with only one splitkey
[ https://issues.apache.org/jira/browse/BEAM-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821786#comment-16821786 ]
Ismaël Mejía commented on BEAM-7081:
------------------------------------
Since you already found the issue and the fix, would you mind submitting a PR so we can merge it? Thanks!
> MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
> -----------------------------------------------------------------------------
>
> Key: BEAM-7081
> URL: https://issues.apache.org/jira/browse/BEAM-7081
> Project: Beam
> Issue Type: Bug
> Components: io-java-mongodb
> Reporter: Roman van der Krogt
> Priority: Critical
>
> When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id less than or equal to 56" is produced; it should also produce the filter "_id greater than 56". As a result, the resulting PCollection contains only the documents up to the split key, and the remainder is not included.
>
> This can be remedied with a few extra lines in the first-split (i == 0) branch:
>
> if (i == 0) {
>   // this is the first split in the list, the filter defines
>   // the range from the beginning up to this split
>   rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
>       getFilterString(idType, splitKey));
>   filters.add(formatFilter(rangeFilter, additionalFilter));
>   // If there is only one split, also generate a range from the split to the end
>   if (splitKeys.size() == 1) {
>     rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
>         getFilterString(idType, splitKey));
>     filters.add(formatFilter(rangeFilter, additionalFilter));
>   }
> }
>
> The corresponding test case in MongoDbIOTest should be updated to the following:
>
> @Test
> public void testSplitIntoFilters() throws Exception {
>   // A single split will result in two filters
>   ArrayList<Document> documents = new ArrayList<>();
>   documents.add(new Document("_id", 56));
>   List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(2, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));
>   // Add two more splits; now we should have 4 filters
>   documents.add(new Document("_id", 109));
>   documents.add(new Document("_id", 256));
>   filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(4, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals(
>       "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
>       filters.get(1));
>   assertEquals(
>       "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
>       filters.get(2));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
> }
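A note on the root cause: with a single split key, index 0 is both the first and the last split. The self-contained sketch below models this under two assumptions that this issue does not confirm: that the original loop emits the open-ended tail range only in a separate last-split (else if) branch, and that formatFilter closes the $and array with " ]}" when there is no additional filter (which matches the expected strings in the test case above). formatFilter, idLiteral, buggyFilters and fixedFilters are hypothetical stand-ins, not Beam's code. fixedFilters also shows an alternative to the patch proposed above: it emits the tail range once after the loop, so n split keys always yield n + 1 filters without special-casing a single key.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeysToFiltersSketch {

  // Hypothetical stand-in for formatFilter (assumed behavior): closes the $and
  // array opened by the range filter, appending the additional filter if given.
  static String formatFilter(String rangeFilter, String additionalFilter) {
    if (additionalFilter != null && !additionalFilter.isEmpty()) {
      return rangeFilter + ", " + additionalFilter + " ]}";
    }
    return rangeFilter + " ]}";
  }

  // Hypothetical stand-in for getFilterString: renders a key as an ObjectId literal.
  static String idLiteral(Object splitKey) {
    return "ObjectId(\"" + splitKey + "\")";
  }

  static String lte(String key) {
    return String.format("{ $and: [ {\"_id\":{$lte:%s}}", key);
  }

  static String between(String lower, String upper) {
    return String.format("{ $and: [ {\"_id\":{$gt:%s,$lte:%s}}", lower, upper);
  }

  static String gt(String key) {
    return String.format("{ $and: [ {\"_id\":{$gt:%s}}", key);
  }

  // Models the assumed original structure: the tail range is only emitted in the
  // last-split branch, which index 0 never reaches because the first branch wins.
  static List<String> buggyFilters(List<?> splitKeys, String additionalFilter) {
    List<String> filters = new ArrayList<>();
    String previous = null;
    for (int i = 0; i < splitKeys.size(); i++) {
      String key = idLiteral(splitKeys.get(i));
      if (i == 0) {
        filters.add(formatFilter(lte(key), additionalFilter));
      } else if (i == splitKeys.size() - 1) {
        filters.add(formatFilter(between(previous, key), additionalFilter));
        filters.add(formatFilter(gt(key), additionalFilter));
      } else {
        filters.add(formatFilter(between(previous, key), additionalFilter));
      }
      previous = key;
    }
    return filters;
  }

  // Alternative fix: one bounded range per split key, then a single open-ended
  // tail range after the loop, which also covers the single-key case.
  static List<String> fixedFilters(List<?> splitKeys, String additionalFilter) {
    List<String> filters = new ArrayList<>();
    String previous = null;
    for (Object splitKey : splitKeys) {
      String key = idLiteral(splitKey);
      String range = (previous == null) ? lte(key) : between(previous, key);
      filters.add(formatFilter(range, additionalFilter));
      previous = key;
    }
    if (previous != null) {
      filters.add(formatFilter(gt(previous), additionalFilter));
    }
    return filters;
  }

  public static void main(String[] args) {
    List<Integer> single = List.of(56);
    System.out.println(buggyFilters(single, null)); // only the $lte filter
    System.out.println(fixedFilters(single, null)); // the $lte and $gt filters
    System.out.println(fixedFilters(List.of(56, 109, 256), null)); // four filters
  }
}
```

For the keys 56, 109 and 256, fixedFilters yields the same four strings that the updated test case expects, while buggyFilters returns a single filter for the key 56 alone.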
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)