Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/01/15 06:14:53 UTC

[GitHub] [druid] jacobtolar opened a new pull request #9192: Add ReplaceMissingValueExtractionFn

URL: https://github.com/apache/druid/pull/9192
 
 
   ### Description
   
   This adds a new built-in extraction function that coalesces missing values to a provided value. I'd be happy to learn that there's already a good way to do this; the closest I found was a query like this:
   ```
   ...
     "dimensions": [
       {
         "type": "extraction",
         "dimension": "field_name",
         "outputName": "output_field_name",
         "extractionFn": {
           "type": "regex",
           "expr": "(.+)",
           "replaceMissingValue": true,
           "replaceMissingValueWith": "0"
         }
       },
   ...
   ```
   
   This is obviously a roundabout way of achieving what I want, but I couldn't find any other way to do it with the existing built-in extraction functions. (It also can't quite be done with the map-based `lookup` extractor.) I'd be happy to learn if there's a better approach.
   
   This PR provides an extraction function that performs this 'coalesce null values' operation directly. You'd configure it like this:
   
   ```
   ...
     "dimensions": [
       {
         "type": "extraction",
         "dimension": "field_name",
         "outputName": "output_field_name",
         "extractionFn": {
           "type": "replaceMissing",
           "replaceMissingValueWith": "0"
         }
       },
   ...
   ```
   
   It does this by extending `FunctionalExtraction` (passing in the identity function) and configuring the null replacement options in that class. Nothing too complicated.
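   
   A minimal standalone sketch of that idea, for illustration only. `CoalesceSketch` and its `apply` method are hypothetical names and do not use Druid's actual `FunctionalExtraction` class; treating both null and the empty string as "missing" is an assumption modeled on the default (non-SQL-compatible) null handling. It wraps a function (here, identity) and substitutes the configured replacement when the result is missing:
   
   ```java
   import java.util.function.Function;

   // Hypothetical stand-in for the idea behind this PR; not Druid's FunctionalExtraction.
   public class CoalesceSketch {
       private final Function<String, String> extractionFunction;
       private final String replaceMissingValueWith;

       public CoalesceSketch(Function<String, String> extractionFunction, String replaceMissingValueWith) {
           this.extractionFunction = extractionFunction;
           this.replaceMissingValueWith = replaceMissingValueWith;
       }

       // Apply the wrapped function, then coalesce a missing result to the replacement.
       public String apply(String value) {
           String result = value == null ? null : extractionFunction.apply(value);
           // Assumption: null and "" both count as missing.
           return (result == null || result.isEmpty()) ? replaceMissingValueWith : result;
       }

       public static void main(String[] args) {
           // Identity function plus a replacement of "0", as in the config above.
           CoalesceSketch fn = new CoalesceSketch(Function.identity(), "0");
           System.out.println(fn.apply(null));
           System.out.println(fn.apply("abc"));
       }
   }
   ```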
   
   If this seems useful, I'm happy to update documentation, tests, etc. as needed in this PR, but I didn't want to go through that effort if there's already a better way to do this.
   
   Thanks!
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed, *although I'm only passingly familiar with the Druid codebase*.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ReplaceMissingValueExtractionFn`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jacobtolar commented on issue #9192: Add ReplaceMissingValueExtractionFn

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/druid/pull/9192#issuecomment-575216089
 
 
   Ah. Looks like the tests run in two modes on Travis and my test fails in the SQL compatibility mode. Will look into it when I get some free time.



[GitHub] [druid] jacobtolar closed pull request #9192: Add ReplaceMissingValueExtractionFn

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/druid/pull/9192
 
 
   



[GitHub] [druid] jacobtolar commented on issue #9192: Add ReplaceMissingValueExtractionFn

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/druid/pull/9192#issuecomment-577343548
 
 
   Not sure what I was doing with `lookup` when I played around with this earlier (I think I may have used `{null: "0"}`, which is invalid JSON), but `lookup` does do what I want:
   
   ```
   {
     "type":"lookup",
     "dimension":"some_dimension",
     "outputName":"some_dimension",
     "retainMissingValue":true,
     "lookup":{"type": "map", "map":{"":"0"}, "isOneToOne":false}
   }
   ```
   
   So, I don't think this change is really necessary; closing.
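   
   To illustrate why an empty-string key can stand in for a missing value, here is a rough standalone sketch. `MapLookupSketch` is a hypothetical class, not Druid's lookup implementation, and it assumes that a null input is looked up under the `""` key, which may not hold in SQL-compatible null handling mode:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical model of a map lookup with retainMissingValue; not Druid's LookupExtractionFn.
   public class MapLookupSketch {
       private final Map<String, String> map;
       private final boolean retainMissingValue;

       public MapLookupSketch(Map<String, String> map, boolean retainMissingValue) {
           this.map = map;
           this.retainMissingValue = retainMissingValue;
       }

       public String apply(String value) {
           // Assumption: a null input is looked up under the empty-string key.
           String key = value == null ? "" : value;
           String mapped = map.get(key);
           if (mapped != null) {
               return mapped;
           }
           // With retainMissingValue, inputs that have no mapping pass through unchanged.
           return retainMissingValue ? value : null;
       }

       public static void main(String[] args) {
           Map<String, String> map = new HashMap<>();
           map.put("", "0");
           MapLookupSketch lookup = new MapLookupSketch(map, true);
           System.out.println(lookup.apply(null));
           System.out.println(lookup.apply("abc"));
       }
   }
   ```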
