Posted to issues@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/05/08 05:54:00 UTC

[jira] [Work logged] (BEAM-7081) MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey

     [ https://issues.apache.org/jira/browse/BEAM-7081?focusedWorklogId=239015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239015 ]

ASF GitHub Bot logged work on BEAM-7081:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/May/19 05:53
            Start Date: 08/May/19 05:53
    Worklog Time Spent: 10m 
      Work Description: jbonofre commented on pull request #8359: [BEAM-7081] MongoDbIO: produce correct ranges for splitkeys
URL: https://github.com/apache/beam/pull/8359
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 239015)
    Time Spent: 1h 20m  (was: 1h 10m)

> MongoDbIO.splitKeysToFilters returns incorrect filters with only one splitkey
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7081
>                 URL: https://issues.apache.org/jira/browse/BEAM-7081
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-mongodb
>            Reporter: Roman van der Krogt
>            Assignee: Roman van der Krogt
>            Priority: Critical
>             Fix For: 2.13.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id less than or equal to 56" is produced; the filter "_id greater than 56" is missing. As a result, the PCollection read from MongoDB contains only the documents up to and including the first split key, and every document after it is silently dropped.
>  
> This can be remedied with the following few lines:
>  
> {code:java}
> if (i == 0) {
>   // this is the first split in the list, the filter defines
>   // the range from the beginning up to this split
>   rangeFilter = String.format("{ $and: [ {\"_id\":{$lte:%s}}",
>       getFilterString(idType, splitKey));
>   filters.add(formatFilter(rangeFilter, additionalFilter));
>   // Added: if there is only one split, also generate a range from the split to the end
>   if (splitKeys.size() == 1) {
>     rangeFilter = String.format("{ $and: [ {\"_id\":{$gt:%s}}",
>         getFilterString(idType, splitKey));
>     filters.add(formatFilter(rangeFilter, additionalFilter));
>   }
> }
> {code}
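The patch handles a single split key as a special case inside the i == 0 branch. The sketch below produces the same N + 1 filters without that special case. It always emits the open-ended "greater than the last key" range after the loop.

This is a minimal illustration, not the MongoDbIO source. The following are hypothetical names used only here:
- the class SplitKeyFilters;
- its splitKeysToFilters(List<String>) signature;
- the id() helper.

It makes these assumptions:
- the split keys are already sorted;
- ids are rendered as ObjectId("...") to match the expected strings in the test case;
- the additional filter is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the Beam MongoDbIO implementation.
final class SplitKeyFilters {

  // Assumption: ids are rendered as ObjectId("...") like in the test's expected strings.
  private static String id(String key) {
    return "ObjectId(\"" + key + "\")";
  }

  // Turns N sorted split keys into N + 1 contiguous _id ranges:
  // (-inf, k0], (k0, k1], ..., (k[N-2], k[N-1]], (k[N-1], +inf).
  static List<String> splitKeysToFilters(List<String> splitKeys) {
    List<String> filters = new ArrayList<>();
    String lowerBound = null; // previous split key, used as the exclusive lower bound
    for (String key : splitKeys) {
      String range =
          lowerBound == null
              ? String.format("{$lte:%s}", id(key))
              : String.format("{$gt:%s,$lte:%s}", id(lowerBound), id(key));
      filters.add(String.format("{ $and: [ {\"_id\":%s} ]}", range));
      lowerBound = key;
    }
    // Always close the last range, whether there was one split key or many.
    if (lowerBound != null) {
      filters.add(String.format("{ $and: [ {\"_id\":{$gt:%s}} ]}", id(lowerBound)));
    }
    return filters;
  }
}
```

Expected results:
- For the single key "56", it returns the $lte and $gt filters for 56.
- For "56", "109", "256", it returns four filters, matching the strings the test case asserts.
- An empty key list yields no filters, so the caller is assumed to handle the no-split case separately.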
>  
> The corresponding test case in MongoDbIOTest should be updated to the following:
>  
> {code:java}
> @Test
> public void testSplitIntoFilters() throws Exception {
>   // A single split will result in two filters
>   ArrayList<Document> documents = new ArrayList<>();
>   documents.add(new Document("_id", 56));
>   List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(2, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));
>   // Add two more splits; now we should have 4 filters
>   documents.add(new Document("_id", 109));
>   documents.add(new Document("_id", 256));
>   filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
>   assertEquals(4, filters.size());
>   assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
>   assertEquals(
>       "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}", filters.get(1));
>   assertEquals(
>       "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}", filters.get(2));
>   assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
> }
> {code}
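The test checks that N split keys yield N + 1 filters. The underlying invariant is that the ranges (-inf, k0], (k0, k1], ..., (k[N-1], +inf) partition the _id space, so every document matches exactly one filter.

The sketch below models this with long ids instead of ObjectIds. RangeCoverage, Range, ranges() and matches() are hypothetical names used only for illustration.

With only the (-inf, 56] range, which is the behavior described in the report, an _id of 100 matches no range and would be dropped from the PCollection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration with long ids; not part of MongoDbIO.
final class RangeCoverage {

  // Lower bound is exclusive ($gt), upper bound is inclusive ($lte); null means unbounded.
  static final class Range {
    final Long lowerExclusive;
    final Long upperInclusive;

    Range(Long lowerExclusive, Long upperInclusive) {
      this.lowerExclusive = lowerExclusive;
      this.upperInclusive = upperInclusive;
    }

    boolean contains(long id) {
      return (lowerExclusive == null || id > lowerExclusive)
          && (upperInclusive == null || id <= upperInclusive);
    }
  }

  // N sorted split keys -> N + 1 ranges, mirroring the $lte, $gt/$lte and $gt filters.
  static List<Range> ranges(List<Long> splitKeys) {
    List<Range> out = new ArrayList<>();
    Long lower = null;
    for (Long key : splitKeys) {
      out.add(new Range(lower, key));
      lower = key;
    }
    out.add(new Range(lower, null)); // the open-ended range missing for a single key
    return out;
  }

  // Number of ranges containing the id; a correct split gives exactly 1 for every id.
  static int matches(List<Range> ranges, long id) {
    int n = 0;
    for (Range r : ranges) {
      if (r.contains(id)) {
        n++;
      }
    }
    return n;
  }
}
```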



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)