You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/04/17 01:07:59 UTC

[jira] [Commented] (DRILL-2807) Revisit SQL array indexing _before GA/1.0_ (non-std. zero-based vs. std. one-based)

    [ https://issues.apache.org/jira/browse/DRILL-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498940#comment-14498940 ] 

Jacques Nadeau commented on DRILL-2807:
---------------------------------------

[~dsbos], thanks for sharing your thoughts.  It is always interesting to hear new perspectives on what we're doing on Drill.  I think your proposal has a narrower definition of consistency than what I would suggest.

A few things that I consider:

One of Drill's goals has always been to be very much in alignment with Javascript and JSON.  That is why our configuration files are JSON, our plans are JSON, our profiles are JSON and we're the first big data engine that is designed to work with all the things that come from that.  Consistency of data model paradigm aligns directly with JSON and Javascript, a far richer and covering set over SQL.

Complex tree traversal is a beast unto itself.  In this space, the SQL community's existing responses have seen poor adoption to newer models like those provided by tools like MongoDB and Couchbase.   Thus consistency with successful companion technologies supports a JSON like approach.  (Remember, we even have a native MongoDB connector.)

There are really two common complex object traversal models that are commonly used today.  The first is xpath and the second is javascript/json.  A SQL identifier's schema.table.column actually matches directly with the map reference model of JSON and thus continuing to match that pattern makes sense.  Consistency with a popular object traversal patterns is very important.

We made the design decision early on that we would not even attempt to follow or support the SQL specification with regards to complex data. JSON and Javascript are much more in alignment with our goals.  Consistency with what we communicated with the world is important to building new user trust.

We haven't had a single piece of feedback from users that this was confusing or an issue.  Despite us still being 1.0, we have a large number of users who have already been trained on the current approach.  Consistency with what our users already leverage is probably the most important thing we must maintain.

So in my mind, the current approach strongly supports consistency.  

I'm very comfortable with the current strategy and am -1 on a change.






> Revisit SQL array indexing _before GA/1.0_ (non-std. zero-based vs. std. one-based)
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-2807
>                 URL: https://issues.apache.org/jira/browse/DRILL-2807
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Daniel Barclay (Drill)
>
> Currently, Drill's SQL uses zero-based indexing for arrays.  ({{A\[0]}} gets the first element of the array value of {{A}}.)
> This is not compatible with standard SQL, which uses one-based indexing for arrays.  ({{A\[1]}} gets the first element of the array value of {{A}}.)
> &nbsp;
> Most probably, Drill should use standard one-base indexing.
> More definitely, this should be revisited before 1.0/GA, because it will be much harder to change the index origin later.
> (Even though it could be made changeable later, that would still be confusing--the meaning of an indexed array expression wouldn't be clear without knowledge of the current setting.)
> &nbsp;
> Some arguments directly for using standard SQL one-based array indexing are:
> 1. The SQL standard specifies that array indexing is one-based, so Drill should use one-based indexing.  \\  \\ This is true especially because Drill is intended to be compatible with standard SQL rather than having its own dialect of SQL (other than where necessary because of the essentially different nature of Drill).
> 2. Since standard SQL's array indexing is one-based, users familiar with arrays will expect one-based indexing. Drill shouldn't surprise users, or make them have to remember another difference between Drill and standard SQL.
> 3. Even users unfamiliar with SQL arrays will expect their indexing to be  one-based since the rest of standard SQL also uses one-based indexing:
>   - Column numbers are one-based:
>     -- in "ORDER BY 1"
>     -- in INFORMATION_SCHEMA.COLUMNS.ORDINAL_POSITION
>   - Character and binary string positions are one-based
>     -- in definition of SUBSTRING
>     -- in "If <start position> is not specified, then 1 (one) is implicit."
> &nbsp;
> The reason that Drill's SQL array indexing currently is zero-based is that previously it was thought that it would be good to be  consistent with indexing in JavaScript, which is expected to be used by a significant number of Drill users to process JSON output from or input to Drill.
> However, besides the direct arguments above, there are additional arguments for not making Drill's SQL non-standard in order to try to be consistent with JavaScript:
> 1. SQL simply isn't JavaScript, just as it isn't Java, C, C++, or any other zero-based language that can be used to process data read into or written from Drill. \\ \\ (One could argue that SQL _should_ have been designed to be zero-based in the first place, but it wasn't, and so now it simply isn't.)
> 2. Drill's SQL doesn't forgo compatibility with the SQL standard to make other things zero-based in order to try to be consistent with JavaScript:
>   - It doesn't make string index zero- based (in {{SUBSTRING}}, etc.).
>   - It doesn't make column indexes zero-based (e.g., changing standard "{{ORDER BY 1}}" to non-standard "{{ORDER BY 0}}").
> 3. If Drill were defining a JavaScript API for accessing SQL-related data from JavaScript (as JDBC provides access from Java), then the index origin used in method parameters in that API could be zero-based:
>   - It could be chosen to be zero-based to be consistent with JavaScript rather than being consistent with SQL.
>   - It could be chosen to be one-based to be consistent with SQL rather than being consistent with JavaScript.  However, note that it could still take and return JavaScript objects, which would still use JavaScript's zero-based indexing--just as JDBC uses SQL-consistent one-based indexing in its  methods, but also returns Java objects which of course are indexed in Java's zero-based way.
>   - (More JDBC/Java details:
>     -- is one-based in SQL part (JDBC Java objects representing SQL concepts):
>       --- column numbers (e.g., {{resultSet.getDate(1)}} for first column)
>       --- row numbers (e.g., {{resultSet.absolute(1)}} for first now) 
>       --- element numbers (e.g., {{array.getArray(1, 1)}} for first element, {{Array.getResultSet(...)}}'s element numbers)
>       --- character/byte numbers in Clob/Blob (e.g., {{getSubString(1, ...)}} for first, {{position(...)}}, {{getBytes(1...)}} starts at first)
>     -- is zero-based once back at pure-Java level (non-JDBC objects representing values):
>       --- {{resultSet.getString(...)}}, then {{.charAt(0)}} on string
>       --- {{resultSet.getBytes(...)}}, then {{\[0]}} on byte array
>       --- {{resultSet.getArray(...)}}, then {{.getArray(...)}}, then {{\[0]}} on array
>     -- So, if you want to use Java's zero-based indexing, you retrieve data into generic Java types and then do so.  Java's SQL API (JDBC) doesn't do everything zero-based even though the host language is zero-based.
> )
> However, making the JavaScript _API_'s indexing consistent with JavaScript still does not imply having Drill use non-standard zero-based indexing in Drill's _SQL_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)