You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Igor Guzenko <ih...@gmail.com> on 2020/03/02 16:20:16 UTC

[DISCUSS] ExecConstants class refactoring

Hello Drillers,

I would like to refactor the ExecConstants class in order to group our
property keys by related sections. Neglecting some naming conventions for
the case, we can utilize Java interfaces with fields (which are constants
by default), and create something like:

public interface ExecOption {
  interface http {
    interface session {
      interface memory {
        String maximum = "drill.exec.http.session.memory.maximum";
        String reservation = "drill.exec.http.session.memory.reservation";
      }
    }

    interface jetty {
      interface server {
        String acceptors = "drill.exec.http.jetty.server.acceptors";
        String selectors = "drill.exec.http.jetty.server.selectors";
      }
    }
  }
}

So then with the help of Java imports, we can write client code like:

config.getLong(http.session.memory.reservation),
config.getLong(http.session.memory.maximum)

As a bonus, IDE's autocomplete and class structure features will allow an
exploration of available choices fairly easily, since all related options
will be related to specific nested interfaces instead of one top-level
class.

Any thoughts and suggestions are highly appreciated :)

Thanks,
Igor

Re: [DISCUSS] ExecConstants class refactoring

Posted by Arina Yelchiyeva <ar...@gmail.com>.
+1 for ExecConstants refactoring.
Igor, could you please create Jira to capture discussion in Jira and decision on the fix? Thanks.

Kind regards,
Arina

> On Mar 2, 2020, at 9:30 PM, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Igor,
> 
> You know the old joke about how you eat an elephant? One bite at a time.
> 
> Yes, trying to refactor java-exec in one go would be difficult, as hard as replacing value vectors in one go. Instead, we might do it in baby steps.
> 
> For example, as we get the new "SPI" mechanism to work for storage plugins, we have the option to pull the existing plugins into separate modules. Interestingly, these modules must be below java-exec (which becomes the integration point for the runtime modules. This means that the module to define the SPI must be even lower. This then requires that it be defined in terms of interfaces, some of which are implemented in java-exec.
> 
> The same process can be applied to operators. If we want an operator module, they must implement interfaces and accept interfaces, with the implementations residing elsewhere.
> 
> We've been gradually creating such interfaces, there are more to go. Each time we create one, we break apart a bit more of the tight coupling you mentioned.
> 
> 
> Why would we gradually do such work? Primarily so Drill can be extensible. We want storage plugins to be easily built outside of Drill and be somewhat cross-version compatible. The same argument could be made for specialized operators (specified in SQL as, say, some kind of table function.)
> 
> Further, I've seen over and over that tight coupling makes it ever harder to safely change anything, so changes get more expensive and the project stagnates. Drill might be different, but I'd not bet on it. So, we need modularity to allow the project to continue to flourish.
> 
> 
> I agree that is nothing we'd do any time soon. But, often if we know where we're going, we can slowly get our ducks in a line so we can eventually do it.
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Monday, March 2, 2020, 11:11:21 AM PST, Igor Guzenko <ih...@gmail.com> wrote:  
> 
> Hello Paul,
> 
> Thanks for quick response, I've been thinking a lot of times about the
> necessity to split java-exec into modules. Although it would be really cool
> to do it, at the moment it is almost impossible to do it correctly. Too
> many things are tightly coupled together and without good preparation,
> splitting will be a huge pain. By preparation I mean the long iterative
> process to collect related components into packages (by feature) and
> clearly define interaction APIs between the packages. Another long-standing
> issue which most probably will never be fixed... Huh, but at least we're
> moving towards the goal :)
> 
> Kind regards,
> Igor
> 
> On Mon, Mar 2, 2020 at 8:01 PM Paul Rogers <pa...@yahoo.com.invalid>
> wrote:
> 
>> Hi Igor,
>> 
>> Great idea; I've been noticing that file has gotten excessively large.
>> 
>> I wonder if we can split the file by topic instead of by the (often odd)
>> naming hierarchy which has evolved over the years. For example, one file
>> for internal server config options (thread counts, RPC stuff.) Another for
>> things related to local config (local file systems, etc.) Another related
>> to core operators (sorts, hash joins, etc.) Picking the right split will
>> require a bit of thought and sone experimentation.
>> 
>> 
>> There are three kinds of constants:
>> 
>> * Config variables (from drill-override.conf)
>> * System/session options
>> * Other random constants
>> 
>> One could argue that we should keep the three kinds together for each
>> topic. (That is, all the sort-related stuff in one place.) Whether that
>> means three well-known files in one place, three nested interfaces within a
>> single class could be debated.
>> 
>> One thing we probably should do is to separate out the string name of a
>> system/session property from the implementation of its validator. It used
>> to be that people would use the validator to access the option value. Most
>> newer code in the last several years uses the typed access methods with a
>> string key. So, we can move the validators into a OptionDefinitions
>> class/interface separate from the key definitions.
>> 
>> Most names are for the benefit of us: the poor humans who have to
>> understand them. The compiler would be happy with inline constant values.
>> Most names tend to be short to be easier to remember. For example, it is
>> easier to understand CLIENT_RPC_THREADS than
>> "drill.exec.rpc.user.client.threads".
>> 
>> 
>> Most costants are in ExecConstants. Some (but not all) constants for the
>> planner live in PlannerSettings. Oddly, some planner settings are in
>> ExecConstants. We might want to consolidate planner-related constants into
>> a single location.
>> 
>> One final thing to keep in mind is that the "java-exec" project has become
>> overly large. At some point, it might make sense to split it into
>> components, such as planner, server, exec engine, etc. So, if our constants
>> are grouped by functional units (rather than by name), it might make it
>> easier to reshuffle modules if we ever choose to do so.
>> 
>> Thanks,
>> - Paul
>> 
>> 
>> 
>>     On Monday, March 2, 2020, 8:20:32 AM PST, Igor Guzenko <
>> ihor.huzenko.igs@gmail.com> wrote:
>> 
>>   Hello Drillers,
>> 
>> I would like to refactor the ExecConstants class in order to group our
>> property keys by related sections. Neglecting some naming conventions for
>> the case, we can utilize Java interfaces with fields (which are constants
>> by default), and create something like:
>> 
>> public interface ExecOption {
>>   interface http {
>>     interface session {
>>       interface memory {
>>         String maximum = "drill.exec.http.session.memory.maximum";
>>         String reservation = "drill.exec.http.session.memory.reservation";
>>       }
>>     }
>> 
>>     interface jetty {
>>       interface server {
>>         String acceptors = "drill.exec.http.jetty.server.acceptors";
>>         String selectors = "drill.exec.http.jetty.server.selectors";
>>       }
>>     }
>>   }
>> }
>> 
>> So then with the help of Java imports, we can write client code like:
>> 
>> config.getLong(http.session.memory.reservation),
>> config.getLong(http.session.memory.maximum)
>> 
>> As a bonus, IDE's autocomplete and class structure features will allow an
>> exploration of available choices fairly easily, since all related options
>> will be related to specific nested interfaces instead of one top-level
>> class.
>> 
>> Any thoughts and suggestions are highly appreciated :)
>> 
>> Thanks,
>> Igor
>> 


Re: [DISCUSS] ExecConstants class refactoring

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Igor,

You know the old joke about how you eat an elephant? One bite at a time.

Yes, trying to refactor java-exec in one go would be difficult, as hard as replacing value vectors in one go. Instead, we might do it in baby steps.

For example, as we get the new "SPI" mechanism to work for storage plugins, we have the option to pull the existing plugins into separate modules. Interestingly, these modules must be below java-exec (which becomes the integration point for the runtime modules. This means that the module to define the SPI must be even lower. This then requires that it be defined in terms of interfaces, some of which are implemented in java-exec.

The same process can be applied to operators. If we want an operator module, they must implement interfaces and accept interfaces, with the implementations residing elsewhere.

We've been gradually creating such interfaces, there are more to go. Each time we create one, we break apart a bit more of the tight coupling you mentioned.


Why would we gradually do such work? Primarily so Drill can be extensible. We want storage plugins to be easily built outside of Drill and be somewhat cross-version compatible. The same argument could be made for specialized operators (specified in SQL as, say, some kind of table function.)

Further, I've seen over and over that tight coupling makes it ever harder to safely change anything, so changes get more expensive and the project stagnates. Drill might be different, but I'd not bet on it. So, we need modularity to allow the project to continue to flourish.


I agree that is nothing we'd do any time soon. But, often if we know where we're going, we can slowly get our ducks in a line so we can eventually do it.

Thanks,
- Paul

 

    On Monday, March 2, 2020, 11:11:21 AM PST, Igor Guzenko <ih...@gmail.com> wrote:  
 
 Hello Paul,

Thanks for quick response, I've been thinking a lot of times about the
necessity to split java-exec into modules. Although it would be really cool
to do it, at the moment it is almost impossible to do it correctly. Too
many things are tightly coupled together and without good preparation,
splitting will be a huge pain. By preparation I mean the long iterative
process to collect related components into packages (by feature) and
clearly define interaction APIs between the packages. Another long-standing
issue which most probably will never be fixed... Huh, but at least we're
moving towards the goal :)

Kind regards,
Igor

On Mon, Mar 2, 2020 at 8:01 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Igor,
>
> Great idea; I've been noticing that file has gotten excessively large.
>
> I wonder if we can split the file by topic instead of by the (often odd)
> naming hierarchy which has evolved over the years. For example, one file
> for internal server config options (thread counts, RPC stuff.) Another for
> things related to local config (local file systems, etc.) Another related
> to core operators (sorts, hash joins, etc.) Picking the right split will
> require a bit of thought and sone experimentation.
>
>
> There are three kinds of constants:
>
> * Config variables (from drill-override.conf)
> * System/session options
> * Other random constants
>
> One could argue that we should keep the three kinds together for each
> topic. (That is, all the sort-related stuff in one place.) Whether that
> means three well-known files in one place, three nested interfaces within a
> single class could be debated.
>
> One thing we probably should do is to separate out the string name of a
> system/session property from the implementation of its validator. It used
> to be that people would use the validator to access the option value. Most
> newer code in the last several years uses the typed access methods with a
> string key. So, we can move the validators into a OptionDefinitions
> class/interface separate from the key definitions.
>
> Most names are for the benefit of us: the poor humans who have to
> understand them. The compiler would be happy with inline constant values.
> Most names tend to be short to be easier to remember. For example, it is
> easier to understand CLIENT_RPC_THREADS than
> "drill.exec.rpc.user.client.threads".
>
>
> Most costants are in ExecConstants. Some (but not all) constants for the
> planner live in PlannerSettings. Oddly, some planner settings are in
> ExecConstants. We might want to consolidate planner-related constants into
> a single location.
>
> One final thing to keep in mind is that the "java-exec" project has become
> overly large. At some point, it might make sense to split it into
> components, such as planner, server, exec engine, etc. So, if our constants
> are grouped by functional units (rather than by name), it might make it
> easier to reshuffle modules if we ever choose to do so.
>
> Thanks,
> - Paul
>
>
>
>    On Monday, March 2, 2020, 8:20:32 AM PST, Igor Guzenko <
> ihor.huzenko.igs@gmail.com> wrote:
>
>  Hello Drillers,
>
> I would like to refactor the ExecConstants class in order to group our
> property keys by related sections. Neglecting some naming conventions for
> the case, we can utilize Java interfaces with fields (which are constants
> by default), and create something like:
>
> public interface ExecOption {
>  interface http {
>    interface session {
>      interface memory {
>        String maximum = "drill.exec.http.session.memory.maximum";
>        String reservation = "drill.exec.http.session.memory.reservation";
>      }
>    }
>
>    interface jetty {
>      interface server {
>        String acceptors = "drill.exec.http.jetty.server.acceptors";
>        String selectors = "drill.exec.http.jetty.server.selectors";
>      }
>    }
>  }
> }
>
> So then with the help of Java imports, we can write client code like:
>
> config.getLong(http.session.memory.reservation),
> config.getLong(http.session.memory.maximum)
>
> As a bonus, IDE's autocomplete and class structure features will allow an
> exploration of available choices fairly easily, since all related options
> will be related to specific nested interfaces instead of one top-level
> class.
>
> Any thoughts and suggestions are highly appreciated :)
>
> Thanks,
> Igor
>
  

Re: [DISCUSS] ExecConstants class refactoring

Posted by Igor Guzenko <ih...@gmail.com>.
Hello Paul,

Thanks for quick response, I've been thinking a lot of times about the
necessity to split java-exec into modules. Although it would be really cool
to do it, at the moment it is almost impossible to do it correctly. Too
many things are tightly coupled together and without good preparation,
splitting will be a huge pain. By preparation I mean the long iterative
process to collect related components into packages (by feature) and
clearly define interaction APIs between the packages. Another long-standing
issue which most probably will never be fixed... Huh, but at least we're
moving towards the goal :)

Kind regards,
Igor

On Mon, Mar 2, 2020 at 8:01 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Igor,
>
> Great idea; I've been noticing that file has gotten excessively large.
>
> I wonder if we can split the file by topic instead of by the (often odd)
> naming hierarchy which has evolved over the years. For example, one file
> for internal server config options (thread counts, RPC stuff.) Another for
> things related to local config (local file systems, etc.) Another related
> to core operators (sorts, hash joins, etc.) Picking the right split will
> require a bit of thought and sone experimentation.
>
>
> There are three kinds of constants:
>
> * Config variables (from drill-override.conf)
> * System/session options
> * Other random constants
>
> One could argue that we should keep the three kinds together for each
> topic. (That is, all the sort-related stuff in one place.) Whether that
> means three well-known files in one place, three nested interfaces within a
> single class could be debated.
>
> One thing we probably should do is to separate out the string name of a
> system/session property from the implementation of its validator. It used
> to be that people would use the validator to access the option value. Most
> newer code in the last several years uses the typed access methods with a
> string key. So, we can move the validators into a OptionDefinitions
> class/interface separate from the key definitions.
>
> Most names are for the benefit of us: the poor humans who have to
> understand them. The compiler would be happy with inline constant values.
> Most names tend to be short to be easier to remember. For example, it is
> easier to understand CLIENT_RPC_THREADS than
> "drill.exec.rpc.user.client.threads".
>
>
> Most costants are in ExecConstants. Some (but not all) constants for the
> planner live in PlannerSettings. Oddly, some planner settings are in
> ExecConstants. We might want to consolidate planner-related constants into
> a single location.
>
> One final thing to keep in mind is that the "java-exec" project has become
> overly large. At some point, it might make sense to split it into
> components, such as planner, server, exec engine, etc. So, if our constants
> are grouped by functional units (rather than by name), it might make it
> easier to reshuffle modules if we ever choose to do so.
>
> Thanks,
> - Paul
>
>
>
>     On Monday, March 2, 2020, 8:20:32 AM PST, Igor Guzenko <
> ihor.huzenko.igs@gmail.com> wrote:
>
>  Hello Drillers,
>
> I would like to refactor the ExecConstants class in order to group our
> property keys by related sections. Neglecting some naming conventions for
> the case, we can utilize Java interfaces with fields (which are constants
> by default), and create something like:
>
> public interface ExecOption {
>   interface http {
>     interface session {
>       interface memory {
>         String maximum = "drill.exec.http.session.memory.maximum";
>         String reservation = "drill.exec.http.session.memory.reservation";
>       }
>     }
>
>     interface jetty {
>       interface server {
>         String acceptors = "drill.exec.http.jetty.server.acceptors";
>         String selectors = "drill.exec.http.jetty.server.selectors";
>       }
>     }
>   }
> }
>
> So then with the help of Java imports, we can write client code like:
>
> config.getLong(http.session.memory.reservation),
> config.getLong(http.session.memory.maximum)
>
> As a bonus, IDE's autocomplete and class structure features will allow an
> exploration of available choices fairly easily, since all related options
> will be related to specific nested interfaces instead of one top-level
> class.
>
> Any thoughts and suggestions are highly appreciated :)
>
> Thanks,
> Igor
>

Re: [DISCUSS] ExecConstants class refactoring

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Igor,

Great idea; I've been noticing that file has gotten excessively large.

I wonder if we can split the file by topic instead of by the (often odd) naming hierarchy which has evolved over the years. For example, one file for internal server config options (thread counts, RPC stuff.) Another for things related to local config (local file systems, etc.) Another related to core operators (sorts, hash joins, etc.) Picking the right split will require a bit of thought and sone experimentation.


There are three kinds of constants:

* Config variables (from drill-override.conf)
* System/session options
* Other random constants

One could argue that we should keep the three kinds together for each topic. (That is, all the sort-related stuff in one place.) Whether that means three well-known files in one place, three nested interfaces within a single class could be debated.

One thing we probably should do is to separate out the string name of a system/session property from the implementation of its validator. It used to be that people would use the validator to access the option value. Most newer code in the last several years uses the typed access methods with a string key. So, we can move the validators into a OptionDefinitions class/interface separate from the key definitions.

Most names are for the benefit of us: the poor humans who have to understand them. The compiler would be happy with inline constant values. Most names tend to be short to be easier to remember. For example, it is easier to understand CLIENT_RPC_THREADS than "drill.exec.rpc.user.client.threads".


Most costants are in ExecConstants. Some (but not all) constants for the planner live in PlannerSettings. Oddly, some planner settings are in ExecConstants. We might want to consolidate planner-related constants into a single location.

One final thing to keep in mind is that the "java-exec" project has become overly large. At some point, it might make sense to split it into components, such as planner, server, exec engine, etc. So, if our constants are grouped by functional units (rather than by name), it might make it easier to reshuffle modules if we ever choose to do so.

Thanks,
- Paul

 

    On Monday, March 2, 2020, 8:20:32 AM PST, Igor Guzenko <ih...@gmail.com> wrote:  
 
 Hello Drillers,

I would like to refactor the ExecConstants class in order to group our
property keys by related sections. Neglecting some naming conventions for
the case, we can utilize Java interfaces with fields (which are constants
by default), and create something like:

public interface ExecOption {
  interface http {
    interface session {
      interface memory {
        String maximum = "drill.exec.http.session.memory.maximum";
        String reservation = "drill.exec.http.session.memory.reservation";
      }
    }

    interface jetty {
      interface server {
        String acceptors = "drill.exec.http.jetty.server.acceptors";
        String selectors = "drill.exec.http.jetty.server.selectors";
      }
    }
  }
}

So then with the help of Java imports, we can write client code like:

config.getLong(http.session.memory.reservation),
config.getLong(http.session.memory.maximum)

As a bonus, IDE's autocomplete and class structure features will allow an
exploration of available choices fairly easily, since all related options
will be related to specific nested interfaces instead of one top-level
class.

Any thoughts and suggestions are highly appreciated :)

Thanks,
Igor