Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/02/01 04:01:33 UTC

[GitHub] [drill] GavinRay97 opened a new issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

GavinRay97 opened a new issue #2446:
URL: https://github.com/apache/drill/issues/2446


   Hello, I would like to embed Drill in a JVM application, running as a single in-memory node.
   My application generates Calcite `RelNode` relational expressions, and I will feed these to Drill to execute.
   
   Browsing the code to try to find out how best to go about this, I found in `ClusterFixtureBuilder.java`:
   
   https://github.com/apache/drill/blob/2decae18b85eeda51816e92d5a9e9e6e2f9ce8d5/exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixtureBuilder.java#L29-L43
   
   https://github.com/apache/drill/blob/2decae18b85eeda51816e92d5a9e9e6e2f9ce8d5/exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixtureBuilder.java#L279-L301
   
   But it looks like there is no Maven artifact or `.jar` to download to include this functionality as an end user =/
   
   I tried to copy-paste the primary classes, but there is a spiderweb of dependencies throughout the `org.apache.drill.test` and `org.apache.drill.exec.testing` packages.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2446:
URL: https://github.com/apache/drill/issues/2446#issuecomment-1026489976


   There was some discussion on the mailing list in November that might help you. Maybe you can collaborate with @rymarm (who sent the emails) on this...
   
   EDIT: Oh dear, it looks like Pony Mail isn't very good at URLs.  But try searching for "Start embedded Drill on JDBC connection" with a date range of "the last year" here:
   
   https://lists.apache.org/list?dev@drill.apache.org





[GitHub] [drill] paul-rogers commented on issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2446:
URL: https://github.com/apache/drill/issues/2446#issuecomment-1026540149


   Great idea. I wrote a lot of that code originally, so let me know if you have questions. The dependencies might be related to things like the test-only row set classes, integrations with JUnit for temporary directories, and so on. It may be possible to split the class so that one class has only those bits and pieces needed for client apps, and a subclass adds the additional parts used by tests.
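
The split suggested above could look roughly like the following. This is only a sketch of the shape, not Drill code: `EmbeddedFixtureBuilder`, `TestFixtureBuilder`, `tempDir`, and the `test.tmp.dir` key are hypothetical names invented for illustration, and a plain map stands in for whatever the real builder would produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class: holds only what an embedding application needs.
// None of these names are Drill's actual API.
class EmbeddedFixtureBuilder<T extends EmbeddedFixtureBuilder<T>> {
  protected final Map<String, Object> config = new LinkedHashMap<>();

  @SuppressWarnings("unchecked")
  protected T self() {
    return (T) this;
  }

  // Records a configuration property and returns the concrete builder type,
  // so calls can be chained from subclasses as well.
  T configProperty(String key, Object value) {
    config.put(key, value);
    return self();
  }

  // A copy of the collected settings stands in for the built cluster here.
  Map<String, Object> build() {
    return new LinkedHashMap<>(config);
  }
}

// Hypothetical test-only subclass: adds conveniences that would pull in
// test dependencies (for example, a JUnit-managed temporary directory).
class TestFixtureBuilder extends EmbeddedFixtureBuilder<TestFixtureBuilder> {
  TestFixtureBuilder tempDir(String path) {
    return configProperty("test.tmp.dir", path); // made-up key
  }
}
```

With this shape, an application would depend only on the base class, while test code keeps the fluent style through the subclass, for example `new TestFixtureBuilder().configProperty("a", 1).tempDir("/tmp/x").build()`.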





[GitHub] [drill] rymarm commented on issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Posted by GitBox <gi...@apache.org>.
rymarm commented on issue #2446:
URL: https://github.com/apache/drill/issues/2446#issuecomment-1026745794


   @GavinRay97 several months ago I found that Drill can be run in "embedded mode" with a pretty simple configuration. To achieve this, add the following dependencies to your project's `pom.xml`:
   ```xml
   <dependencies>
           <dependency>
               <groupId>org.apache.drill.exec</groupId>
               <artifactId>drill-java-exec</artifactId>
               <version>1.19.0</version>
               <exclusions>
                   <exclusion>
                       <groupId>org.slf4j</groupId>
                       <artifactId>log4j-over-slf4j</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.slf4j</groupId>
                       <artifactId>slf4j-log4j12</artifactId>
                   </exclusion>
               </exclusions>
           </dependency>
           <dependency>
               <groupId>org.apache.drill.exec</groupId>
               <artifactId>drill-jdbc</artifactId>
               <version>1.19.0</version>
               <exclusions>
                   <exclusion>
                       <groupId>org.slf4j</groupId>
                       <artifactId>log4j-over-slf4j</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.slf4j</groupId>
                       <artifactId>slf4j-log4j12</artifactId>
                   </exclusion>
               </exclusions>
           </dependency>
           <dependency>
               <groupId>com.google.guava</groupId>
               <artifactId>guava</artifactId>
               <version>21.0</version>
           </dependency>
       </dependencies>
   ```
   And after that, you will be able to run embedded Drill with the following example code:
   ```java
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;
   
   // The class name is illustrative.
   public class EmbeddedDrillExample {
     public static void main(String[] args) throws Exception {
       // "jdbc:drill:zk=local" is the connection string that starts an embedded Drill
       // and establishes a connection to it. try-with-resources closes everything.
       try (Connection connection = DriverManager.getConnection("jdbc:drill:zk=local");
            Statement st = connection.createStatement();
            // `/home/maksym/Desktop/sample.csv` is the path to a CSV file created for the example
            ResultSet rs = st.executeQuery("select * from dfs.`/home/maksym/Desktop/sample.csv`")) {
         while (rs.next()) {
           System.out.println(rs.getString(1));
         }
       }
     }
   }
   ```
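
If `DriverManager.getConnection` fails with a "no suitable driver" error, one quick check is whether any registered JDBC driver accepts the connection string at all. The sketch below uses only standard `java.sql` calls. `JdbcDriverCheck` is a hypothetical helper, not part of Drill, and it assumes the Drill driver registers itself with `DriverManager`, as the example's use of `DriverManager.getConnection` implies.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

// Hypothetical helper, not part of Drill: reports whether any JDBC driver
// registered with DriverManager claims the given URL.
final class JdbcDriverCheck {
  static boolean hasDriverFor(String url) {
    Enumeration<Driver> drivers = DriverManager.getDrivers();
    while (drivers.hasMoreElements()) {
      try {
        if (drivers.nextElement().acceptsURL(url)) {
          return true;
        }
      } catch (SQLException e) {
        // A driver that cannot handle the URL does not count as a match.
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String url = "jdbc:drill:zk=local"; // connection string from the example
    System.out.println(url + " has a driver: " + hasDriverFor(url));
  }
}
```

A `false` result most likely means the `drill-jdbc` dependency is missing from the runtime classpath, rather than a problem with the query itself.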
   
   I didn't find exhaustive information on how exactly the application should be configured: which dependencies are required, which properties are available, and so on. But you can dive into the code and look at how embedded mode is implemented. Here is a starting point:
   https://github.com/apache/drill/blob/15b2f52260e4f0026f2dfafa23c5d32e0fb66502/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java#L104
   
   Besides this, you can also find many Jira tickets related to issues with embedded Drill. Here are several of them:
   [DRILL-2126](https://issues.apache.org/jira/browse/DRILL-2126),
   [DRILL-1654](https://issues.apache.org/jira/browse/DRILL-1654),
   [DRILL-1409](https://issues.apache.org/jira/browse/DRILL-1409)
   
   According to my investigation of the code and manual tests, embedded Drill seems to work pretty well, and the only issue is dependency conflicts. That is why, in my example above, I added Guava and excluded the logging jars.
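
Dependency conflicts of this kind can often be narrowed down by checking which jar a class is actually loaded from at runtime. The following is a minimal sketch using only JDK calls. `ClassOrigin` is a hypothetical helper name, and the placeholder string returned for classes without a code source is an arbitrary choice.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: shows where a class was loaded from,
// which helps spot two versions of a library competing on the classpath.
final class ClassOrigin {
  static String locationOf(Class<?> type) {
    CodeSource source = type.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      // JDK classes loaded by the bootstrap loader have no code source.
      return "<bootstrap or unknown>";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    String name = args.length > 0 ? args[0] : "java.lang.String";
    System.out.println(name + " -> " + locationOf(Class.forName(name)));
  }
}
```

Running it from inside the application's classpath with a Guava class name such as `com.google.common.base.Preconditions` prints the jar that supplied the class, which can then be compared with the version declared in `pom.xml`.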
   
   I would like to gather as much information as possible about embedded Drill and add it to the Drill documentation, or make some code improvements so that users can freely use this mode in their applications. Of course, Drill was created as a distributed system, but it is such a powerful tool that it is also very useful in single-node, embedded mode.





[GitHub] [drill] paul-rogers edited a comment on issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on issue #2446:
URL: https://github.com/apache/drill/issues/2446#issuecomment-1027605931


   Thanks @rymarm for the info! This is one of those cases where a bug becomes a feature. The reason embedded Drill works via JDBC is that most of Drill ends up getting sucked into the JDBC driver for no good reason other than that the RPC code depends on everything else. That's lucky for you, but not so great for folks who just want a simple JDBC driver.
   
   As it turns out, the reason that SqlLine can run an embedded Drill is that the JDBC driver contains all the code. But do we want a JDBC driver to include a Splunk connector, a PDF reader, support for Hadoop, and all the rest? That creates a rather fat client, and all those libraries conflict with what the surrounding app wants to do. This is why the JDBC driver build chucks a bunch of dependencies overboard.
   
   At some point (maybe Drill 2.0?) we need to create a simpler JDBC driver. At that point, the mechanism that @GavinRay97 originally requested will be needed to start the server that the JDBC driver then connects to. We're not there now (far from it), but that's kind of where we should head. (There is a whole vector discussion that includes this topic.)











[GitHub] [drill] jnturton commented on issue #2446: Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2446:
URL: https://github.com/apache/drill/issues/2446#issuecomment-1026489976


   There was some discussion on the mailing list in November that might help you. Maybe you can collaborate with @rymarm (who sent the emails) on this...
   
   https://lists.apache.org/list?dev@drill.apache.org:lte=1y:Start%20embedded%20Drill%20on%20JDBC%20connection

