Posted to issues@beam.apache.org by "Roman van der Krogt (JIRA)" <ji...@apache.org> on 2019/04/15 14:48:00 UTC

[jira] [Created] (BEAM-7081) MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey

Roman van der Krogt created BEAM-7081:
-----------------------------------------

             Summary: MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
                 Key: BEAM-7081
                 URL: https://issues.apache.org/jira/browse/BEAM-7081
             Project: Beam
          Issue Type: Bug
          Components: io-java-mongodb
            Reporter: Roman van der Krogt


When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id less than or equal to 56" is produced; the filter "_id greater than 56" is missing. As a result, the PCollection contains only the data up to the first split, and the remainder is never read.

 

This can be remedied with the following few lines:

 

{{if (i == 0) {}}
{{  // this is the first split in the list, the filter defines}}
{{  // the range from the beginning up to this split}}
{{  rangeFilter = String.format("\{ $and: [ {\"_id\":{$lte:%s}}",}}
{{      getFilterString(idType, splitKey));}}
{{  filters.add(formatFilter(rangeFilter, additionalFilter));}}
{{{color:#14892c}  // If there is only one split, also generate a range from the split to the end{color}}}
{{{color:#14892c}  if (splitKeys.size() == 1) {{color}}}
{{{color:#14892c}    rangeFilter = String.format("\{ $and: [ {\"_id\":{$gt:%s}}", getFilterString(idType, splitKey));{color}}}
{{{color:#14892c}    filters.add(formatFilter(rangeFilter, additionalFilter));{color}}}
{{{color:#14892c}  }{color}}}
{{}}}
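
To make the intended invariant concrete (N split keys should always yield N + 1 range filters, including when N is 1), here is a minimal, self-contained sketch. It is not the Beam implementation: the class name SplitFilterSketch is hypothetical, the keys are assumed to be already formatted as strings, and the $and wrapper, the id-type handling and the additional filter are all left out. It also emits the trailing range unconditionally instead of special-casing a single split as in the patch above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the MongoDbIO code.
public class SplitFilterSketch {

  /** Builds one "_id" range filter per partition: N split keys yield N + 1 filters. */
  public static List<String> splitKeysToFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    if (splitKeys.isEmpty()) {
      // Assumption of this sketch: no split keys means no range filters.
      return filters;
    }
    // Range from the beginning up to and including the first split key.
    filters.add(String.format("{\"_id\":{$lte:%s}}", splitKeys.get(0)));
    // Ranges between consecutive split keys.
    for (int i = 1; i < splitKeys.size(); i++) {
      filters.add(
          String.format(
              "{\"_id\":{$gt:%s,$lte:%s}}", splitKeys.get(i - 1), splitKeys.get(i)));
    }
    // Range from the last split key to the end. With a single split key this
    // is the "_id greater than 56" filter that the report says is missing.
    filters.add(
        String.format("{\"_id\":{$gt:%s}}", splitKeys.get(splitKeys.size() - 1)));
    return filters;
  }

  public static void main(String[] args) {
    // One split key: two filters.
    System.out.println(splitKeysToFilters(Arrays.asList("56")));
    // Three split keys: four filters.
    System.out.println(splitKeysToFilters(Arrays.asList("56", "109", "256")));
  }
}
```

Emitting the last range after the loop means the single-key case needs no separate branch, at the cost of diverging from the structure of the existing method.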

 

The corresponding test case in MongoDbIOTest should be updated to the following:

 

{{@Test}}
{{public void testSplitIntoFilters() throws Exception {}}
{{  // A single split will result in two filters}}
{{  ArrayList<Document> documents = new ArrayList<>();}}
{{  documents.add(new Document("_id", 56));}}
{{  List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);}}
{{  assertEquals(2, filters.size());}}
{{  assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));}}
{{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));}}

{{  // Add two more splits; now we should have 4 filters}}
{{  documents.add(new Document("_id", 109));}}
{{  documents.add(new Document("_id", 256));}}
{{  filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);}}
{{  assertEquals(4, filters.size());}}
{{  assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));}}
{{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",}}
{{      filters.get(1));}}
{{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",}}
{{      filters.get(2));}}
{{  assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));}}
{{}}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)