You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/22 16:19:34 UTC

[GitHub] [druid] LakshSingla opened a new pull request, #13138: Fix the Injector creation in HadoopTask

LakshSingla opened a new pull request, #13138:
URL: https://github.com/apache/druid/pull/13138

   ### Description
   
   As a part of the Guice refactoring, the injectors are built differently, having the ability to have or not have modules required for a class. HadoopTask uses the injector to create an instance of `ExtensionsLoader` which contains the `ExtensionsConfig` class. However, the modules in the injector are insufficient to supply the properties passed to the druid server to the `ExtensionsConfig` (POJO), due to which irrespective of the Druid's config, it will get instantiated with the default properties.
   This PR fixes the injector so that the `ExtensionsConfig` is instantiated properly.
   
   #### Testing
   To be added
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `HadoopTask`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255341703

   @gianm, thanks for that info. As it turns out, the code that primes the injector, and preps the extensions is static: it will get loaded in any process that links to this code. Should this stuff be done in `run()` instead of static? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] LakshSingla commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255327717

   Went ahead and added a log line containing the `ExtensionsConfig` in the constructor of the task post the approvals.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
gianm commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255352346

   > Cool. So we can leave well-enough alone for now. We can fix the static injectors later, but only after we ensure we have tests in this area so we don't break things a second time.
   
   That sounds like a good approach to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1256862460

   I think It's ok. I am going to merge this PR since you have verified the change manually. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255295404

   @gianm, thanks for your analysis. You raise an important issue around testing. It appears we have no tests for the case of the Hadoop task with extensions and that is concerning. It means I was able to break this area with no feedback that something was amiss.
   
   The workaround testing done here is to borrow some code from `StatusResource` and log the set of loaded extensions. Since we have no IT that runs this task, testing was done ad-hoc by hand on a local machine. Absolutely not ideal. But, perhaps good enough considering that extension loading is binary: they are either loaded or not.
   
   Moving forward, we do have an IT for this task which you identified: `ITHdfsToHdfsParallelIndexTest`. Due to memory limitations on Travis, that is not run as part of the Travis build. It is run elsewhere. But, that test does not also ensure that extensions are loaded.
   
   There are other HDFS-related tests in the `HDFS_DEEP_STORAGE` IT test group. None of them are run.
   
   We need to find a way to run these Hadoop tasks so that we get code coverage. We're discussing moving the build to a new build system. Perhaps that system can provide sufficient resources that we can run the HDFS (and Hadoop) containers in our ITs and so we can test this area. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255262891

   @LakshSingla, thanks for finding and fixing this regression. LGTM (non-binding). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] LakshSingla commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255491813

   Build failing on the code coverage. IMO it shouldn't be required because the added method only adds a log line. Any workarounds here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
gianm commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255344194

   > @gianm, thanks for that info. As it turns out, the code that primes the injector, and preps the extensions is static: it will get loaded in any process that links to this code. Should this stuff be done in `run()` instead of static?
   
   Ideally, yes. It would require some refactoring. That would be nice, since static injectors are a bit goofy, although I also think it's fine to keep it the way it is. (It's been like this for a long time.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 merged pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 merged PR #13138:
URL: https://github.com/apache/druid/pull/13138


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] LakshSingla commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255315310

   Regarding:
   > About testing: how did test your fix here
   
   Manual testing has been done to identify the cause of the original problem: submitted ingestion jobs weren't picking up the location of the custom location where druid deps were present. Testing on a local setup revealed that the `ExtensionsConfig` wasn't getting populated correctly irrespective of whatever was supplied in the config file. After discussion with @paul-rogers, it seemed like that the injector was at fault, and didn't contain the properties passed in the config to properly instantiate the `ExtensionsConfig`.
   Important note, the fix has been verified through manual logs/debug points, and the fix hasn't been verified on the original failing ingestion job yet. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
gianm commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255285420

   Thanks for the fix!
   
   The line you're adjusting is a regression in 24.0.0. From looking at the code, it seems that the impact here is the `index_hadoop` task won't properly realize which extensions are configured (since it isn't reading the server properties files) and therefore won't ship those jars to the Hadoop cluster. I think it would cause problems for jobs that use extensions such as sketches. That's a big enough issue that we should do a 24.0.1 release. So, I created a milestone and added this to it.
   
   About testing: how did test your fix here, and do you know why this wasn't caught as part of the 24.0.0 release testing? We'll want to make sure to test this case for 24.0.1 and future releases, ideally with an automated test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255349301

   Cool. So we can leave well-enough alone for now. We can fix the static injectors later, but only after we ensure we have tests in this area so we don't break things a second time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13138: Fix the Injector creation in HadoopTask

Posted by GitBox <gi...@apache.org>.
gianm commented on PR #13138:
URL: https://github.com/apache/druid/pull/13138#issuecomment-1255339885

   > Went ahead and added a log line containing the `ExtensionsConfig` in the constructor of the task post the approvals.
   
   Please don't add it there: tasks are serialized and deserialized at various times where they aren't actually run, so we don't want the constructor to log anything. Log messages should go to the `run` method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org