Posted to issues@beam.apache.org by "Roman van der Krogt (JIRA)" <ji...@apache.org> on 2019/04/15 14:48:00 UTC
[jira] [Created] (BEAM-7081) MongoDbIO.splitKeysToFilters returns
incorrect filters with only one splitkey
Roman van der Krogt created BEAM-7081:
-----------------------------------------
Summary: MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
Key: BEAM-7081
URL: https://issues.apache.org/jira/browse/BEAM-7081
Project: Beam
Issue Type: Bug
Components: io-java-mongodb
Reporter: Roman van der Krogt
When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id less than or equal to 56" is produced; the filter "_id greater than 56" is missing. As a result, the resulting PCollection includes only the data up to the first split; the remainder is not included.
This can be remedied by adding a few lines to the branch that handles the first split key (the nested splitKeys.size() == 1 check is the addition):
{{if (i == 0) {}}
{{ // this is the first split in the list, the filter defines}}
{{ // the range from the beginning up to this split}}
{{ rangeFilter = String.format("\{ $and: [ {\"_id\":{$lte:%s}}",}}
{{ getFilterString(idType, splitKey));}}
{{ filters.add(formatFilter(rangeFilter, additionalFilter));}}
{{ // If there is only one split, also generate a range from the split to the end}}
{{ if (splitKeys.size() == 1) {}}
{{ rangeFilter = String.format("\{ $and: [ {\"_id\":{$gt:%s}}", getFilterString(idType, splitKey));}}
{{ filters.add(formatFilter(rangeFilter, additionalFilter));}}
{{ }}}
{{}}}
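To make the effect of the change easier to see in isolation, here is a minimal, self-contained sketch of the range-building logic. It is not the Beam implementation: the class SplitFilterSketch and the helper objectId are hypothetical, split keys are plain strings that are always rendered as ObjectId("..."), and the additional filter handled by formatFilter is left out. It assumes the remaining branches behave as the expected filters in the test case suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the filter-building logic described
// in this issue. Not the actual MongoDbIO code.
class SplitFilterSketch {

  // Assumption: every key is rendered as an ObjectId literal.
  static String objectId(String key) {
    return "ObjectId(\"" + key + "\")";
  }

  static List<String> splitKeysToFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    String lowerBound = null; // previous split key, lower bound of the next range
    for (int i = 0; i < splitKeys.size(); i++) {
      String splitKey = splitKeys.get(i);
      if (i == 0) {
        // First split: range from the beginning up to this split.
        filters.add(String.format("{ $and: [ {\"_id\":{$lte:%s}} ]}", objectId(splitKey)));
        // Proposed addition: with a single split, also cover split..end.
        if (splitKeys.size() == 1) {
          filters.add(String.format("{ $and: [ {\"_id\":{$gt:%s}} ]}", objectId(splitKey)));
        }
      } else {
        // Middle or last split: bounded range (previous split, this split].
        filters.add(
            String.format(
                "{ $and: [ {\"_id\":{$gt:%s,$lte:%s}} ]}",
                objectId(lowerBound), objectId(splitKey)));
        if (i == splitKeys.size() - 1) {
          // Last split: also add the open range from this split to the end.
          filters.add(String.format("{ $and: [ {\"_id\":{$gt:%s}} ]}", objectId(splitKey)));
        }
      }
      lowerBound = splitKey;
    }
    return filters;
  }

  public static void main(String[] args) {
    // One split key should yield two filters that together cover all ids.
    for (String filter : splitKeysToFilters(Arrays.asList("56"))) {
      System.out.println(filter);
    }
  }
}
```

Without the nested size check, the single-key call would return only the $lte filter, which matches the behaviour reported above.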
The corresponding test case in MongoDbIOTest should be updated to the following:
{{@Test}}
{{public void testSplitIntoFilters() throws Exception {}}
{{ // A single split will result in two filters}}
{{ ArrayList<Document> documents = new ArrayList<>();}}
{{ documents.add(new Document("_id", 56));}}
{{ List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);}}
{{ assertEquals(2, filters.size());}}
{{ assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));}}
{{ assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));}}
{{ // Add two more splits; now we should have 4 filters}}
{{ documents.add(new Document("_id", 109));}}
{{ documents.add(new Document("_id", 256));}}
{{ filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);}}
{{ assertEquals(4, filters.size());}}
{{ assertEquals("\{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));}}
{{ assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",}}
{{ filters.get(1));}}
{{ assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",}}
{{ filters.get(2));}}
{{ assertEquals("\{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));}}
{{}}}