Posted to issues@beam.apache.org by "Ismaël Mejía (JIRA)" <ji...@apache.org> on 2019/04/19 08:55:00 UTC

[jira] [Commented] (BEAM-7081) MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey

    [ https://issues.apache.org/jira/browse/BEAM-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821786#comment-16821786 ] 

Ismaël Mejía commented on BEAM-7081:
------------------------------------

Since you already found the issue and the fix, would you mind submitting a PR so we can merge it? Thanks!

> MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7081
>                 URL: https://issues.apache.org/jira/browse/BEAM-7081
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-mongodb
>            Reporter: Roman van der Krogt
>            Priority: Critical
>
> When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id lower than or equal to 56" is produced; a second filter, "_id greater than 56", should also be included. As a result, the resulting PCollection contains only the data up to the first split, and the remainder is missing.
>  
> This can be remedied with the following few lines:
>  
> if (i == 0) {
>   // this is the first split in the list, the filter defines
>   // the range from the beginning up to this split
>   rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
>       getFilterString(idType, splitKey));
>   filters.add(formatFilter(rangeFilter, additionalFilter));
>   // If there is only one split, also generate a range from the split to the end
>   if (splitKeys.size() == 1) {
>     rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
>         getFilterString(idType, splitKey));
>     filters.add(formatFilter(rangeFilter, additionalFilter));
>   }
> }
>  
> The corresponding test case in MongoDbIOTest should be updated to the following:
>  
> @Test
> public void testSplitIntoFilters() throws Exception {
>   // A single split will result in two filters
>   ArrayList<Document> documents = new ArrayList<>();
>   documents.add(new Document("_id", 56));
>   List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(2, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));
>   // Add two more splits; now we should have 4 filters
>   documents.add(new Document("_id", 109));
>   documents.add(new Document("_id", 256));
>   filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(4, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
>       filters.get(1));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
>       filters.get(2));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
> }
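
For reference, the invariant the report relies on (N split keys should yield N+1 range filters that together cover the whole key space) can be sketched in isolation. This is only an illustration, not the MongoDbIO code: the class SplitFilterSketch and the method toRangeFilters are hypothetical names, keys are assumed to be sorted integers, the filter strings are simplified (no "$and" wrapper, no ObjectId), and returning an empty list for no split keys is an arbitrary choice for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Beam's MongoDbIO.
final class SplitFilterSketch {

  // Turns N sorted split keys into N+1 range filters:
  // (-inf, k0], (k0, k1], ..., (k(N-2), k(N-1)], (k(N-1), +inf).
  static List<String> toRangeFilters(List<Integer> splitKeys) {
    List<String> filters = new ArrayList<>();
    if (splitKeys.isEmpty()) {
      // Assumption for this sketch: no split keys means no range filters.
      return filters;
    }
    for (int i = 0; i < splitKeys.size(); i++) {
      int key = splitKeys.get(i);
      if (i == 0) {
        // First split: everything from the beginning up to this key.
        filters.add(String.format("{\"_id\":{$lte:%d}}", key));
      } else {
        // Middle ranges: after the previous key, up to and including this key.
        filters.add(
            String.format("{\"_id\":{$gt:%d,$lte:%d}}", splitKeys.get(i - 1), key));
      }
    }
    // Open-ended tail after the last key. Emitting it unconditionally means
    // the single-key case also gets its "greater than" filter.
    int last = splitKeys.get(splitKeys.size() - 1);
    filters.add(String.format("{\"_id\":{$gt:%d}}", last));
    return filters;
  }

  public static void main(String[] args) {
    for (String filter : toRangeFilters(Arrays.asList(56))) {
      System.out.println(filter);
    }
  }
}
```

Emitting the tail range after the loop, rather than as a special case inside it, is one way to avoid the single-key gap described in the report; the fix quoted above instead adds the extra range inside the i == 0 branch.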


