You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/01/09 22:20:01 UTC

[Impala-ASF-CR] Major update to Impala + Kudu page

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5649

Change subject: Major update to Impala + Kudu page
......................................................................

Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review start-to-finish.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_kudu.xml
2 files changed, 1,200 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 15:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

Line 269:       literal values that exactly match the types in the Kudu table, and do not
> Binary predicates and IN list predicates can be pushed to Kudu.
Done


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS14, Line 147: Tablets are
              :               stored by <term>tablet servers</
> 'work' and 'computing' is kind of misleading given kudu is storage. How abo
Done


PS14, Line 150: Where practical, colocate the tablet servers on the same hosts as the DataNodes, although that is not 
> Though our recommendation will be to colocate Impalads with tservers.
Done


PS14, Line 220: duplicate
> duplicating
Leaving this as-is. You are avoid the condition of having duplicate data, not the action of duplicating the data.


PS14, Line 220:               On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
> I like this positive spin on our consistency limitations
Done. Hey, there's a ton of stuff that I wrote in the Oracle docs about the positives of being able to tighten or loosen the constraints on duplicates depending on the circumstances.


PS14, Line 546:                   <codeph>BIT_SHUFFLE</codeph>: rearrange the bits of the values to efficiently
              :                   compress sequences of values that are identical or vary only slightly based
              :                   on primary key order.
> according to the compression doc below, this is also compressed with lz4 af
Done


PS14, Line 562: No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
> can you file a JIRA please
Done. https://issues.cloudera.org/browse/IMPALA-4922


PS14, Line 1100: 
               :           See <xref keyref="kudu_tables"/>
> this doesnt render in the pdf
The relevant link destination is filled in via a different gerrit, not to worry.


PS14, Line 1153: In particular, do not rely on an <codeph>INSERT ... SELECT</codeph> statement
               :         that selects from the same table into which it is inserting, unless you include extra
               :         conditions in the <codeph>WHERE</codeph> clause to avoid reading the newly inserted rows
               :         within the same statement
> this gets repeated very similarly in the next section. not sure which one i
Done. It's intentional so I'll leave it as-is for the moment. One is "here's how to take this quirk into account during ETL" and the other is "here's the details of the quirk".


PS14, Line 1237: data that is read while a write
               :         operation is in progress
> kudu does have atomic per row operations, so this needs to be clear it refe
Done


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS14, Line 401: kudu_bl
This should actually be kudu_blurb to produce the Kudu-specific subheading.


PS14, Line 409: are part of the primary key.
> this is not true, default is nullable except for PK cols
Done


PS14, Line 415: 
              :       </p>
This constraint applies to every column, not to the composite value of all the PK columns.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: Major update to Impala + Kudu page
......................................................................


Patch Set 4:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

PS4, Line 887: Any dropped range must not contain
             :       any data values in that range.
I don't believe this is true -- dropping a range is an efficient way to bulk-delete a bunch of data for workloads like rolling-window time series


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_compute_stats.xml
File docs/topics/impala_compute_stats.xml:

Line 55: <!-- Is kudu_partition_spec applicable here? -->
nope, because afaik we don't have partition-level stats for kudu (they aren't partitions as far as HMS is concerned)


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS4, Line 107: expression
'expression' here makes it sound like we support computed defaults. Perhaps it's better to say 'constant' or 'value'?


Line 122:   RANGE
this should be RANGE (<varname>pk_col</varname> [, ...]) right?


PS4, Line 125: constant
perhaps say 'tuple' or something instead? for composite PKs you need something like: PARTITION ('foo', 1) < VALUES


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_grant.xml
File docs/topics/impala_grant.xml:

Line 134:       Access to Kudu tables must be granted to roles as usual.
is it worth noting here that grant/revoke from Impala has no bearing on access via the direct Kudu APIs?


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS4, Line 39: use the Apache Kudu component
stored by Apache Kudu


PS4, Line 98: ultiple Kudu hosts.
should say "separated by commas"


PS4, Line 105: TBLPROPERTIES
should say which table property to set


PS4, Line 147: logically
I'd say these are physical


PS4, Line 155: Kudu tablet servers
no, just the number of replicas of a given table must be odd. You could have 4 tablet servers, but replication count 3


PS4, Line 156:  When you set up a new cluster, the number of tablet servers might not
             :           exactly match the number of DataNodes. If you add or remove one host from an existing
             :           cluster, be careful not to change the number of tablet servers from 3 to 2, or from 99
             :           to 100, and so on.
I'd scrap this whole <p>


PS4, Line 200: cannot contain any duplicate values
worth noting that it's the _tuple_ that has to be unique. EG you can have PRIMARY KEY (a, b) and have values (1,1), (1,2), (2, 1), (2,2). I don't think that's clear here.


PS4, Line 460:           <p>
             :             Because primary key columns cannot contain any <codeph>NULL</codeph> values, you can
             :             omit the <codeph>NOT NULL</codeph> clause for the primary key columns.
that's true, but I think it's good practice to specify NOT NULL regardless, in case we ever add the ability for a nullable PK.


Line 553:             them with the default <codeph>PLAIN</codeph> encoding.
I'd disagree with this -- in many cases encodings have a net speedup - there's some small CPU cost but the compression ratio can be very good. Especially if you have something like a timestamp value or transaction ID in your PK, bitshuffle will compress it quite well and have a negligible effect on performance of a key lookup, since the cost of decoding one page is nothing compared to the cost of the random disk IO, etc.


PS4, Line 592: user_id</codeph> values come from a specific set of strings
that's slightly unexpected. I think a better example for dictionary would be something like 'ship_status STRING" from tpch, or 'state' or 'country'


PS4, Line 597: subjects might start with the same first words
seems somewhat contrived. Prefix encoding isn't super useful for most cases so maybe not worth documenting. (we use it internally for storing the reified composite key index where it is quite effective)


PS4, Line 601: The original text
             :             from the <codeph>body</codeph> column is the most frequently accessed, therefore it
             :             uses the compression with the least CPU overhead to expand. The
             :             <codeph>spanish_translation</codeph> is accessed less frequently, therefore it uses
             :             a compression format that produces a smaller result but takes more CPU cycles to
hrm, I'm not sure I'd draw such a distinction between LZ4 and SNAPPY. In fact LZ4 has some extensions which we aren't yet using (but should be at some point) which will probably be able to get it to be more dense than snappy as well as being faster, at which point we'd stop encouraging snappy


PS4, Line 636: into units known as
             :             <term>cfiles</term>. The cfiles for each column can have a specified block size
I think I'd not mention cfiles here, but say that there is an underlying unit of IO which is the "block size".


PS4, Line 642:   Certain internal characteristics of the data storage use the block as a unit of
             :             measurement. For example, when a column uses dictionary encoding, each block
             :             contains its own dictionary representing only the column values in that block.
             :           <
nope, not this block.


PS4, Line 648:   By default, the block size is 256 KiB. The minimum block size is... The maximum
             :             block size is...
             :           </p>
             : 
             :           <p>
             :             A small block size is good for responsiveness during frequent DML operations. Less data is rearranged
             :             as rows are inserted, deleted, or updated.
             :           </p>
             : 
             :           <p>
             :             A large block size is good for fast query performance for large scan operations. When large volumes
             :             of data are retrieved by a query, it is more efficient to have a small number of decode and uncompress
             :             operations that each apply to a large number of rows.
             :           </p>
             : 
             : <codeblock>
I think it's probably better to just defer the user to the Kudu docs for block size instead of duplicating advice here, given it's a relatively advanced feature, and for most Impala cases the default will be fine.


PS4, Line 751: 60
this is cluster-dependent, and based on a Kudu configuration. Would not document a max, but just recommend something like 10 partitions per server in your cluster for large tables.


PS4, Line 788: -- 50 buckets for IDs beginning with a lowercase letter
             : -- plus 50 buckets for IDs beginning with an uppercase letter.
             : -
might be worth a note in the text above that this is multiplicative with any hash partition clauses


PS4, Line 874:           <p>
             :             When a range is removed, no data can exist in the table within that range. If some
             :             rows do have column values within the removed range, the operation fails.
             :           </p>
I don't believe this is true (or at least it's not our intention that it is)


PS4, Line 884:         <p>
             :           Kudu tables can also use a combination of hash and range partitioning.
             :         </p>
fill this out?


PS4, Line 908:           <p>
             :             To see the distribution of data in a Kudu table across the underlying buckets and
             :             partitions, use the <codeph>SHOW TABLE STATS</codeph> statement.
             :           </p>
I don't think this will show per-tablet sizes currently


PS4, Line 977:         <p>
             :           The database name on the Impala side is or isn't encoded into the underlying Kudu
             :           table name.
             :         </p>
not true anymore. now it's my_db::my_table


PS4, Line 1007:         Because Kudu manages its own storage layer is optimized for smaller block sizes than
              :         HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
              :  
some missing words in this sentence


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_refresh.xml
File docs/topics/impala_refresh.xml:

Line 337:     <!-- Anything to say for REFRESH with Kudu? -->
AFAIK no


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_show.xml
File docs/topics/impala_show.xml:

Line 1240: <codeblock rev="1.4.0">[localhost:21000] &gt; show partitions census;
is this example supposed to be here?


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS4, Line 308:         <codeph># Rows</codeph>, <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
             :   
worth noting that #rows is inaccurate


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5649

to look at the new patch set (#14).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,589 insertions(+), 352 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/14
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5649

to look at the new patch set (#15).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,590 insertions(+), 352 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/15
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 14:

OK, maybe "any second" =~ "about a week". But I think all Dimitris's final comments are now addressed.

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 4:

(32 comments)

Did all of the recent comments. I see some earlier ones still to do.

http://gerrit.cloudera.org:8080/#/c/5649/6/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS6, Line 3741: 
              : 
              : 
              : 
              : 
> We do need it when the changes happen externally.
Done. Per Dimitris's comment, I'm leaving this text unchanged.


http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS6, Line 39: he Apache Kudu component.
> That still sounds weird. I'd switch to what Todd suggested.
Done. It is the nature though of language involving trademarks to always sound weird.


PS6, Line 45: The default Impala tables use data files stored on HDFS, which are ideal for bulk loads
            :       and queries using full-table scans. In contrast, Kudu can do efficient queries for data
            :       organized either in data warehouse style (with full table scans) or for OLTP-style
            :       workloads (with key-based and range-based lookups for single rows or groups of rows). Kudu
            :       tables are suitable for frequent small additions or changes.
> By default, Impala tables are stored in HDFS using various file formats. HD
Done


PS6, Line 55: work 
> work only
Done


PS6, Line 73: In these scenarios (such as for streaming data), it
            :         might be impractical to use Parquet tables because Parquet works best with
            :         multi-megabyte data files, requiring substantial overhead to replace or reorganize data
            :         files to accomodate frequent additions or changes to data. 
> I don't think we should emphasize Parquet here. It is a limitation of the s
Done. I'm going to give a nod to "simplifying the ETL pipeline" in the reworded 2nd paragraph on this page.


PS6, Line 78: without replacing the entire table contents
> remove. Just say "efficiently".
Done


PS6, Line 79: API
> Maybe mention supported languages (Python, Java, etc).
Done


PS6, Line 138: Data is physically divided automatically by Kudu. You do not deal with explicit
             :               partitions, as in typical large Impala tables. New data that arrives is organized
             :               based on the data values of each row, not kept together in partitions that must be
             :               created and managed individually.
> I don't agree with this description. You have to decide for each table the 
Let me reword to say you get a combination of control and flexibility, since you can still make as many narrow range partitions as you like, or specify wide range partitions or hashing on top of range partitions.


PS6, Line 147: Data is logically divided, and work is parallelized, based on units called
             :               <term>tablets</term> and <term>tablet servers</term>.
> This is pretty vague. You need to make the distinction between tablets and 
I'll elaborate a little more, than link to the Kudu docs for full definitions.


PS6, Line 169: 
> How about DROP TABLE?
I'm primarily covering new syntax here. Why don't I say Impala DDL "Enhancements" in the title since DROP TABLE is the same syntax as always.


PS6, Line 181: TABLE</codep
> incomplete sentence
Done


PS6, Line 184: familiarize yourself with Kudu-related concepts and syntax first.
             :       </p>
> incomplete sentence
Done. A sentence got inserted in the middle of another sentence, so the 2 incomplete ones are part of the same original sentence.


PS6, Line 214: y ones 
> What does "arrange" mean? If you refer to mapping of rows to tablets say so
Done. I'll say it maps the rows to tablets.


PS6, Line 215: clauses and are highly selective.
             :             </p>
> That is not necessarily true.
Done. I'll take out the mention of WHERE clauses. I think by definition the combination of primary key columns is highly selective because of the uniqueness aspect. So having a repetitive column as part of the primary key would be wasteful and therefore rare.


PS6, Line 234: 
> You mean the uniqueness and nullability constraints? These are indeed enfor
Done. OK, reworded as "these constraints".


PS6, Line 266: 
> constant expression
Done


PS6, Line 490: 
> colloquial phrasing, how about among rel. db mgmt systems
Done. I'll leave out the word "relational" because I always worry about people getting the wrong idea w.r.t. transactions, foreign keys, etc.


PS6, Line 561: 
             :         <title>COMPRESSION Attribute</title>
             : 
             :         <conbody>
> ?
That's a note to myself in an XML comment, it doesn't appear in the output. If I specify a bogus keyword with the ENCODING clause, the Impala error message tells me that it's expecting keywords including UNKNOWN and GROUP_VARINT, but in experiments I couldn't get Impala to accept a CREATE TABLE statement with ENCODING UNKNOWN or ENCODING GROUP_VARINT. Maybe there are some obsolete keywords left in our parser from earlier revisions of Kudu?


PS6, Line 677: 
> internally
Done.


PS6, Line 714: s.
> remove
Done


Line 740:   PARTITION BY HASH(id) PARTITIONS 50
> missing space
Done


PS6, Line 755: 
> there is no default
Done


PS6, Line 760: 
> multiple
Done


PS6, Line 778: 
             : <codeblock><![CDATA[
> I don't think this is a limitation
Ah right, I think it depends on the number of tablet servers and the cluster where I tried it happened to state an upper limit of 60 in the error message. But a bigger cluster would have stated a higher limit.  I'll reword the upper limit without trying to be too specific.


PS6, Line 856: p>
             : 
> I see what this is saying but I think this sentence will be  confusing. It 
Done. I'll clarify this particular sentence on this pass. Perhaps go into more detail about the gap aspect in a subsequent iteration.


PS6, Line 903: <codeph>CREATE TABLE</codeph> syntax displayed by this statement includes all the
             :             hash, range, or both clauses that reflect the original table structure pl
> this makes it sound like you can't drop a range unless it's empty which is 
Done


PS6, Line 993:   <concept id="kudu_etl">
             : 
> one of these says:
Already discussed in impala_common.xml. I'll leave as-is.


PS6, Line 1099: e. For example, you cannot do a sequence of
              :         <codeph>UPDATE</codeph> statements and only make the change visible after all the
              :         statements are finished. Also, if a DML statement fails partway through, any rows that
              :         were already inserted, deleted, or changed remain in the table; there is no rollback
              :         mechanism to undo the changes.
              :    
> looks like something is missing
I'm going to use some discretion about which assertions to back up and illustrate with examples. I'll remove most of the empty <codeblock> elements on this page for now, and fill in additional examples in a subsequent iteration.


PS6, Line 1207: 
              :           <p>
              :             Sentry authorization.
              :           </p>
              :         </li>
              : 
> empty code block?
This one in particular I'll delete for now rather than fill in. Because the success messages for DELETE / UPDATE / UPSERT don't report number of rows affected.


PS6, Line 1250: y using a clause such
> list the limitations?
OK, I'll reuse the same verbiage as under the GRANT statement. (There's a #include-like mechanism in this XML dialect so it'll show up as a 1-liner here, referencing an ID with the main text in impala_common.xml.)


PS6, Line 1308: 
              : 
              : 
> I don't see any other content
I've been toning down the compare-and-contrast with Parquet in other areas. I don't have enough concrete info right now to make a compelling section. I'm going to suppress this topic in the output by applying audience="hidden" in the <concept> tag.


PS6, Line 1323: 
              : 
              : 
> not sure if this section is useful as-is
Exactly, I'll hide but leave the skeleton here in case it makes sense to revisit later.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#8).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_truncate_table.xml
18 files changed, 1,889 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 15: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: Major update to Impala + Kudu page
......................................................................


Patch Set 4:

(7 comments)

Addressed comments through the 'EXPLAIN' topic.

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

PS4, Line 887: Any dropped range must not contain
             :       any data values in that range.
> I don't believe this is true -- dropping a range is an efficient way to bul
Done


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_compute_stats.xml
File docs/topics/impala_compute_stats.xml:

Line 55: <!-- Is kudu_partition_spec applicable here? -->
> nope, because afaik we don't have partition-level stats for kudu (they aren
Done


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS4, Line 107: expression
> 'expression' here makes it sound like we support computed defaults. Perhaps
Done. Changed to 'constant'. I was hoping that the default could be any expression evaluated row-by-row at runtime, like now() or upper(different_column).


Line 122:   RANGE
> this should be RANGE (<varname>pk_col</varname> [, ...]) right?
This is how I create a table with range partitioning but no hash partitioning:

create table range_only (x int primary key)
partition by range (partition 0 <= values <= 50, partition 51 <= values <= 100) stored as kudu

The name of the primary key column never gets mentioned in the RANGE clause. Is there an alternative notation that would mention X in this case?


PS4, Line 125: constant
> perhaps say 'tuple' or something instead? for composite PKs you need someth
I'll say constant_or_tuple for the italicized name, and add some examples showing the parens of the tuple notation. For DDL, I'm going to start by showing examples of the variations on the main Kudu page, then migrate them to or reuse them on the individual statement pages.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

PS4, Line 239: he <codeph>EXPLAIN</codeph> statement displays equivalent plan
             :       information for queries against Kudu tables as for queries
             :       against HDFS-based tables.
> Don't we need to talk about the predicates that get pushed to Kudu?
Done


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_grant.xml
File docs/topics/impala_grant.xml:

Line 134:       Access to Kudu tables must be granted to roles as usual.
> is it worth noting here that grant/revoke from Impala has no bearing on acc
Done. (From this point on, any changes I mark as done will come in a patch set on Thursday.)


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5649/6/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS6, Line 3741:         Run <codeph>REFRESH <varname>table_name</varname></codeph> or
              :         <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
              :         for a Kudu table only after making a change to the Kudu table schema,
              :         such as adding or dropping a column, by a mechanism other than
              :         Impala.
> I think this is wrong- they're not needed. we always reload Kudu metadata.
We do need it when the changes happen externally.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

John, do we have a pdf from these sources? It helps immensely the review process of docs

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: Major update to Impala + Kudu page
......................................................................

Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_kudu.xml
4 files changed, 1,393 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 7:

(18 comments)

Addressed all comments, esp. splitting CREATE TABLE syntax for HDFS vs. Kudu.

http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

PS7, Line 250:  
> "in a SCAN KUDU node"
Done


PS7, Line 251: , and might involve transmitting
             :       non-matching rows that are filtered out on the Impala side.
> remove
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS7, Line 76: scan performance close to that of Parquet
> Make sure you check with Mostafa about this claim.
Done


PS7, Line 147: The work is parallelized
             :               across units of computing called <term>tablet servers</term>.
> I believe the unit of computing is the tablet not the tablet server, unless
The tablet server is the compute resource, the tablet holds the actual data.


PS7, Line 149: between the tablets and tablet servers.
> I think it would be nice to mention that the user is not responsible for ma
Done


PS7, Line 168: (CREATE TABLE and ALTER TABLE)
> It is weird to mention CREATE without DROP. Please remove this and mention 
Since this section is only talking about the delta in syntax, I'll leave as-is.


PS7, Line 397: <p>
             :             For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
             :             values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
             :             data files that could be prepared using external tools and ETL processes.
             :           </p>
             : 
             :           <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
             : 
             :           <p>
             :             For example, a table containing geographic information might require the latitude
             :             and longitude coordinates to always be specified. Other attributes might be allowed
             :             to be <codeph>NULL</codeph>. For example, a location might not have a designated
             :             place name, its altitude might be unimportant, and its population might be initially
             :             unknown, to be filled in later.
             :           </p>
> You may want to swap these two paragraphs.
Done


PS7, Line 717: PARTITION BY
> Impala still uses "PARTITIONED BY" for HDFS tables.
Done


PS7, Line 727: . By setting
             :           up an effective partitioning scheme for a Kudu table, you can ensure that the work for
             :           a query can be parallelized evenly across the hosts in a cluster.
> Remove. Sometimes the goal is to scan as little as possible. You can say th
Whole paragraph was hidden per comment on a previous patch set. (Instead, we link people to the Kudu white paper for those kinds of details.)


PS7, Line 936: To see the distribution of data in a Kudu table across the underlying buckets and
             :             partitions, use the <codeph>SHOW TABLE STATS</codeph> statement.
> This is unfortunately not accurate. SHOW TABLE STATS will only show the dis
Done. Reworded rather than deleting entirely.


PS7, Line 1122: change
> "changes"
Done


PS7, Line 1159: strong consistency for order of operations
> I am not sure I know what that means.
Done


PS7, Line 1159: total
              :         success or total failure of a multi-row statement
> This is "atomicity". Maybe just mention atomic multi-row statements.
Done


PS7, Line 1160: or data that is read while a write
              :         operation is in progress
> Isolation.
I added the technical terminology into my wording.


PS7, Line 1288: <title>Memory Usage for Operations on Kudu Tables</title>
              : 
              :       <conbody>
              : 
              :         <p>
              :           The Apache Kudu architecture, topology, and data storage techniques result in
              :           different patterns of memory usage for Impala statements than with HDFS-backed tables.
              :         </p>
> I don't find this particularly informative and suggest we remove it unless 
Yes, audience="hidden" earlier in the <concept> tag will make it invisible.


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS7, Line 408: most Impala tables
> Impala tables backed by HDFS or S3? "most" is kind of vague
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_revoke.xml
File docs/topics/impala_revoke.xml:

PS7, Line 115: access to a Kudu table is <q>all or nothing</q>.
> "only table-level permissions are enforced in Kudu tables. Column-level per
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS7, Line 293: using the Apache Kudu storage system
> "stored in Apache Kudu"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: Major update to Impala + Kudu page
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5649/4//COMMIT_MSG
Commit Message:

PS4, Line 7: Major update to Impala + Kudu page
FYI: I posted this PDF:

http://impala-pdf.s3.amazonaws.com/impala-kudu.pdf

with all the info from this code review + the separate code review for the DDL statements.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 8:

(15 comments)

Addressed some early comments from Dimitris and Todd that I skipped over before. (Some had already been fixed based on comments on subsequent patch sets.)

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS4, Line 697: 
             : 
             :         </conbody>
             : 
             :       </concept>
             : 
             :     </concept>
> Remove this paragraph. You can use Kudu's white paper (section 3.2) for a b
I'll hide the paragraph and link to the white paper. (The white paper uses the old DISTRIBUTE BY syntax so it's not perfect as background reading for users of Impala 2.8.)


PS4, Line 706: 
             :       <title>Partitioning for Kudu Tables</title>
             : 
             :       <conbody>
             : 
             :         <p>
             :           Kudu tables use special mechanisms to distribute data among the underlying
             :           tablet servers. Although we refer to such tables as partitioned tables, they are
             :           distinguished from traditional Impala partitioned tables by use of different clauses
             :           on the <codeph>CREATE TABLE</codeph> statement. Kudu tables use
             :           <code
> I don't understand the point of this paragraph before even presenting the f
Since now I'm referring people to the Kudu white paper which mentions the old syntax, let's leave this here for the moment.


PS4, Line 727: 
> remove
Done


PS4, Line 731: rent ways to divide the data for each
             :           column, or even for different value ranges within a column. This flexibility le
> You need to mention the drawback of using hash partitioning as well. i.e. q
Done


PS4, Line 743:             which used an experimental fork of the Impala code. For example, the
             :             <codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PA
> That claim in not universally true. I would just remove it or make it case 
Done


PS4, Line 751: 
> this is cluster-dependent, and based on a Kudu configuration. Would not doc
Done


PS4, Line 765: ibuted, instead of
             :             clumping together all in the same bucket. Spreading new rows across the buckets this
             :             way lets insertion 
> That's not necessarily the primary reason for using range partitioning. You
Done


PS4, Line 767: in parallel across multiple tablet servers.
             :             Separating the hashed values can impose additional overhead on queries, where
> Can we add a formal syntax here?
For the moment I'm going to link over to the CREATE TABLE page. Later I'll see if I can extract just the pieces for the relevant clauses and reuse them here.


PS4, Line 774: y 20,000 rows per partition.
> You need to talk when VALUE and VALUES is used (single vs multi-column rang
Isn't VALUE / VALUES independent of single or multiple columns in the range? I thought it was dependent on whether there was only 1 comparison operator or 2:

Single column case:
VALUE <= constant
constant <= VALUES < constant

Multiple column case:
VALUE <= (constant,constant)
(constant,constant) <= VALUES < (constant,constant)


PS4, Line 788:               The largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
             :               clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
             :  
> might be worth a note in the text above that this is multiplicative with an
Done


PS4, Line 874:           </p>
             : 
             :           <p>
             :             Ra
> I don't believe this is true (or at least it's not our intention that it is
Done


PS4, Line 884:   partition value = 'C', partition value = 'D', partition value = 'F')
             : ]]>
             : </codeblock>
> fill this out?
For this high-level overview stuff, I will use simple toy tables just to illustrate the syntax without the trappings of a real enterprise-class table.


PS4, Line 908: alter table year_ranges add range partition 1890 <= values < 1893;
             : ]]>
             : </codeblock>
             : 
> I don't think this will show per-tablet sizes currently
Done


PS4, Line 977: 
             : <!-- To do: fill in example. -->
             : <codeblock><![CDATA[
             : 
> not true anymore. now it's my_db::my_table
Done


PS4, Line 1007: 
              :         <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
              :  
> some missing words in this sentence
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#6).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_truncate_table.xml
18 files changed, 1,904 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#5).

Change subject: Major update to Impala + Kudu page
......................................................................

Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_truncate_table.xml
18 files changed, 1,777 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 13:

(8 comments)

New patch set coming any second.

http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

PS13, Line 64: ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION
Add another line here for ADD RANGE PARTITION.

(The ADD/DROP lines are already getting split apart as part of a different code review. That will happen here as part of resolving a merge conflict.)


PS13, Line 86: <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
Take this out of there because it doesn't apply to the basic ADD PARTITION and DROP PARTITION clauses.


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS13, Line 85: [PRIMARY KEY (<varname>col_name</varname>[, ...]]
> This doesn't belong here.
Done


PS13, Line 234: codeblock
Sprinkle some rev="kudu" eyecatchers on these Kudu-only codeblocks.


PS13, Line 244: <codeblock>CREATE TABLE [IF NOT EXISTS] <varname>db_name</varname>.]<varname>table_name</varname>
              :   [COMMENT '<varname>table_comment</varname>']
              :   STORED AS KUDU
              :   [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
              : AS
              :   <varname>select_statement</varname></codeblock>
> This is not correct. You're missing the PRIMARY KEY and PARTITION BY clause
Done


PS13, Line 411: impala::username.kudu_t1
> nit: I don't think this is a good example. I would be really surprised if s
Done


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS13, Line 93: kudu.master_addresses
> That's the name of the table property. Can you verify that name of the impa
Done. Right, the config option is -kudu_master_hosts


PS13, Line 101: kudu.master_addresses
Same change to -kudu_master_hosts as in previous paragraph. Clean up the TBLPROPERTIES wording in this paragraph to reflect better the kudu.master_addresses param for that clause.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5649/11/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS11, Line 81: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
             :   (<varname>col_name</varname> <varname>data_type</varname>
             :     <ph rev="kudu IMPALA-3719">[<varname>kudu_column_attribute</varname> ...]</ph>
             :     [COMMENT '<varname>col_comment</varname>']
             :     [, ...]
             :     [PRIMARY KEY (<varname>col_name</varname>[, ...]]
             :   )
             :   [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
             :   [COMMENT '<varname>table_comment</varname>']
             :   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
             :   <ph rev="kudu">[PARTITION BY <varname>kudu_partition_clause</varname></ph>
             :   [
             :    [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
             :   ]
             :   [LOCATION '<varname>hdfs_path</varname>']
> I think we can improve the syntax definition to avoid common mistakes and c
I'm fine with not trying to put HDFS and Kudu tables together, or Dimitris' suggestion above.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

Line 269:       require any casting, can be pushed to Kudu.
Binary predicates and IN list predicates can be pushed to Kudu.


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS14, Line 147: The work is parallelized
              :               across units of computing called
'work' and 'computing' is kind of misleading given kudu is storage. How about:

Tablets are stored by tablet servers.


PS14, Line 150: You can colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
Though our recommendation will be to colocate Impalads with tservers.


PS14, Line 220:               On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
I like this positive spin on our consistency limitations


PS14, Line 220: duplicate
duplicating


PS14, Line 546:                   <codeph>BIT_SHUFFLE</codeph>: rearrange the bits of the values to efficiently
              :                   compress sequences of values that are identical or vary only slightly based
              :                   on primary key order.
according to the compression doc below, this is also compressed with lz4 after the shuffle algorithm is applied


PS14, Line 562: No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
can you file a JIRA please


PS14, Line 1100: 
               :           See <xref keyref="kudu_tables"/>
this doesnt render in the pdf


PS14, Line 1153: In particular, do not rely on an <codeph>INSERT ... SELECT</codeph> statement
               :         that selects from the same table into which it is inserting, unless you include extra
               :         conditions in the <codeph>WHERE</codeph> clause to avoid reading the newly inserted rows
               :         within the same statement
this gets repeated very similarly in the next section. not sure which one it's better suited for, but looks like a duplication right now


PS14, Line 1237: data that is read while a write
               :         operation is in progress
kudu does have atomic per row operations, so this needs to be clear it refers to impala statements that can read or write multiple rows in the same query.

how about: or data that is read across multiple rows in a SELECT statement while a concurrent DML statement is modifying rows.


http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS14, Line 409: Kudu tables default to the <codeph>NOT NULL</codeph> setting for each column.
this is not true, default is nullable except for PK cols


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 15: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: Major update to Impala + Kudu page
......................................................................


Patch Set 4:

(31 comments)

Initial round of comments.

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

PS4, Line 239: he <codeph>EXPLAIN</codeph> statement displays equivalent plan
             :       information for queries against Kudu tables as for queries
             :       against HDFS-based tables.
Don't we need to talk about the predicates that get pushed to Kudu?


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_invalidate_metadata.xml
File docs/topics/impala_invalidate_metadata.xml:

PS4, Line 245: <!-- Anything to say for INVALIDATE METADATA with Kudu? -->
             :     <!-- <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/> -->
Where do we talk about Kudu table metadata? We need to mention that source of truth for table metadata (e.g. schema, partitions) is Kudu and not HMS. Invalidate metadata and refresh cause the table metadata to be reloaded from Kudu and any external schema changes to be reflected to the catalog.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS4, Line 149: tablet servers is independent of the number of DataNodes in the cluster.
One question that someone might have is whether tablet servers and data nodes need to be collocated or not for functional or performance reasons. It would be nice to comment on that here.


PS4, Line 201: For a
             :           partitioned Kudu table,
Remove. There are only partitioned Kudu tables.


PS4, Line 221: For example, if an <codeph>INSERT</codeph> operation fails partway through, only some of the
             :               new rows might be present in the table. You can re-run the same <codeph>INSERT</codeph>, and
             :               only the missing rows will be added. Or if data in the table is stale, you can run an
             :               <codeph>UPSERT</codeph> statement that brings the data up to date, without the possibility
             :               of creating duplicate copies of existing rows.
Why blending the semantics of upsert and insert with the description of primary keys? I don't think this section belongs here.


PS4, Line 233: These restrictions on the primary key columns
How about the nullability restrictions on non-primary key columns. Where are they enforced?


PS4, Line 247: <p>
             :           Kudu can do extra optimizations for queries that refer to the primary key columns in
             :           the <codeph>WHERE</codeph> clause. It is not required though to include the primary
             :           key columns in the <codeph>WHERE</codeph> clause of every query.
             :         </p>
I am not sure I understand the purpose of this paragraph here. This either belongs to the place where you discuss schema design or even better performance tuning.


PS4, Line 269: PRIMARY KEY
             : | [NOT] NULL
             : | ENCODING <varname>codec</varname>
             : | COMPRESSION <varname>algorithm</varname>
             : | DEFAULT <varname>expression</varname>
             : | BLOCK_SIZE <varname>number</varname>
Wouldn't it make more sense to first describe the Kudu-specific CREATE TABLE syntax and then analyze the individual components of it?


PS4, Line 299: You can specify the <codeph>PRIMARY KEY</codeph> attribute either inline in a single
             :             column definition, or as a separate clause at the end of the column list:
Even though this is the place to formally define the syntax for PRIMARY KEYs, you seem to have more detailed information about PKs in the paragraph in L238. I would remove that paragraph entirely and move that content here.


PS4, Line 360: <p>
             :             The notion of primary key only applies to Kudu tables. Every Kudu table requires a
             :             primary key. The primary key consists of one or more columns. You must specify any
             :             primary key columns first in the column list.
             :           </p>
Move to the beginning or in the general description of primary keys for Kudu tables.


PS4, Line 368: Therefore, only
             :             include a column in the primary key specification if you are certain of its value at
             :             the time each row is inserted.
Remove.


PS4, Line 383: Because the primary key columns are typically carefully planned and have known
             :             characteristics in terms of range and distribution, those columns are usually good
             :             candidates for using with the <codeph>PARTITION BY</codeph> clause of the
             :             <codeph>CREATE TABLE</codeph> statement.
Only primary key columns can be used in the PARTITION BY clause. This is a requirement and not some nice-to-have property.


PS4, Line 399: <p>
             :             For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
             :             values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
             :             data files that could be prepared using external tools and ETL processes.
             :           </p>
Move below L411.


PS4, Line 407: If a
             :             missing data field is considered to be a serious problem
I would rephrase it to something like "If an application requires a field to be always specified...".


Line 438:             operations.
Maybe we should extend this paragraph and say that for those reasons, it is recommended that users specify nullability constraints when they are known.


PS4, Line 449: -- This query can return a count of 0 very quickly because
             : -- column C3 by definition cannot contain any null values.
             : SELECT COUNT(*) FROM c3_has_no_nulls WHERE c3 IS NULL;
Do you suggest that this is not executed because we know it doesn't return any results? Have you verified this? The profile of a query like that doesn't seem to suggest any optimization like that, but I could be wrong.


PS4, Line 531: By default, each
             :             column uses the <q>plain</q> encoding where the data is stored unchanged
I would mention the encodings and maybe some guidelines on which one to use for different data types. Maybe check with Todd on recommendations.


PS4, Line 567: You can specify a compression algorithm to use for each column in a Kudu table. This
             :             attribute imposes more CPU overhead when retrieving the values than the
             :             <codeph>ENCODING</codeph> attribute does. Therefore, use it primarily for columns
             :             that are rarely included in result sets, or rarely used in <codeph>WHERE</codeph>
             :             clauses, or for very long strings that do not benefit much from the less-expensive
             :             <codeph>ENCODING</codeph> attribute.
Every compression alg has a tradeoff of compression ratio and cpu overhead and choosing the right comp. algorithm is always tricky and data dependent. Have you discussed these recommendations with Mostafa?


PS4, Line 690: PARTITIONED
PARTITION


PS4, Line 691: range specification clauses rather than a simple <codeph>PARTITIONED BY</codeph>
             :           clause that specifies only a column name and creates a new partition for each
             :           different value.
Why not putting the formal syntax instead of trying to describe the differences between Kudu and HDFS partitioning?


PS4, Line 697: With Kudu tables, all of the columns involved in these clauses must be primary key
             :           columns. These clauses let you specify different ways to divide the data for each
             :           column, or even for different value ranges within a column. This flexibility lets you
             :           avoid problems with uneven distribution of data, where the partitioning scheme for
             :           HDFS tables might result in some partitions being much larger than others. By setting
             :           up an effective partitioning scheme for a Kudu table, you can ensure that the work for
             :           a query can be parallelized evenly across the hosts in a cluster.
Remove this paragraph. You can use Kudu's white paper (section 3.2) for a better description of table partitioning.


PS4, Line 706: <note>
             :           <p>
             :             The Impala DDL syntax for Kudu tables is different than in early Kudu versions,
             :             which used an experimental fork of the Impala code. For example, the
             :             <codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PARTITION BY</codeph>, the
             :             <codeph>INTO <varname>n</varname> BUCKETS</codeph> clause is now
             :             <codeph>PARTITIONS <varname>n</varname></codeph>and the range partitioning syntax
             :             is reworked to replace the <codeph>SPLIT ROWS</codeph> clause with more expressive
             :             syntax involving comparison operators.
             :           </p>
             :         </note>
I don't understand the point of this paragraph before even presenting the formal syntax for partitioning Kudu tables. If you want to mention that the old syntax is not supported add a note at the end.


PS4, Line 727: simplest
remove


PS4, Line 727: default type
Why "default"? Both hash and range need to be explicitly specified. None of them is implied.


PS4, Line 728: inserted rows are divided up between a fixed number
             :             of <q>buckets</q> in an essentially random way. Applying the hash function to the
             :             column values ensures that rows with similar values are evenly distributed, instead of
             :             clumping together all in the same bucket.
Why not just saying that rows are assigned to buckets by applying a hash function on the values of the columns used to define the hash partitioning?


PS4, Line 731: Spreading new rows across the buckets this
             :             way lets insertion operations work in parallel across all the tablet servers.
You need to mention the drawback of using hash partitioning as well. i.e. queries with range predicates may end up scanning multiple tables.


PS4, Line 743: -- No matter how the values are distributed in the source table, the data will be
             : -- evenly distributed between the buckets in the destination table.
That claim in not universally true. I would just remove it or make it case specific. e.g. since ids are unique you expect the rows to be evenly distributed across partitions.


PS4, Line 765: for when you know the
             :             expected distribution of values in a column and can make informed decisions about
             :             how to divide them.
That's not necessarily the primary reason for using range partitioning. You may want to stick to the formal description and syntax here and add a section about use cases (schema design) later.


PS4, Line 767: more <codeph>RANGE</codeph> clauses to the
             :             <codeph>CREATE TABLE</codeph> statement, following the <codeph>PARTITION BY</codeph
Can we add a formal syntax here?


PS4, Line 774: <codeph>VALUE</codeph> or <codeph>VALUES</codeph>
You need to talk when VALUE and VALUES is used (single vs multi-column range) and give proper examples for both cases.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS4, Line 293: Tables using the Apache Kudu storage system are treated specially, because Kudu manages its data independently of HDFS files.
             :         Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
             :         managed internally by Kudu.
Where do we talk about external vs managed Kudu tables?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#11).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Created some examples of PARTITION BY HASH and PARTITION BY RANGE.

Reassured w.r.t. -1 in SHOW TABLE STATS output.

Filled in better details and real examples for time series use case.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,209 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

(15 comments)

Another round of comments. I've seen that not all previous comments have been addressed, so I'll wait for a new patch before continuing this review.

http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS6, Line 39: the Apache Kudu component
That still sounds weird. I'd switch to what Todd suggested.


PS6, Line 45: The default Impala tables use data files stored on HDFS, which are ideal for bulk loads
            :       and queries using full-table scans. In contrast, Kudu can do efficient queries for data
            :       organized either in data warehouse style (with full table scans) or for OLTP-style
            :       workloads (with key-based and range-based lookups for single rows or groups of rows). Kudu
            :       tables are suitable for frequent small additions or changes.
By default, Impala tables are stored in HDFS using various file formats. HDFS files allow for fast bulk loads (appends) and full-table scans but cannot support in-place updates (updates, deletes). Kudu is an alternative storage engine that can be used in Impala and supports both in-place updates (for OLTP-style operations) and fast scans (for data-warehouse/analytic operations).


PS6, Line 55: work 
work only


PS6, Line 73: In these scenarios (such as for streaming data), it
            :         might be impractical to use Parquet tables because Parquet works best with
            :         multi-megabyte data files, requiring substantial overhead to replace or reorganize data
            :         files to accomodate frequent additions or changes to data. 
I don't think we should emphasize Parquet here. It is a limitation of the storage engine not the file format. You can mention parquet as an example of a commonly used file format.


PS6, Line 78: without replacing the entire table contents
remove. Just say "efficiently".


PS6, Line 79: API
Maybe mention supported languages (Python, Java, etc).


PS6, Line 138: Data is physically divided automatically by Kudu. You do not deal with explicit
             :               partitions, as in typical large Impala tables. New data that arrives is organized
             :               based on the data values of each row, not kept together in partitions that must be
             :               created and managed individually.
I don't agree with this description. You have to decide for each table the partitioning scheme and all its details (number of partitions, actual range partitions, etc). What you don't control is the mapping of rows to physical nodes.


PS6, Line 147: Data is physically divided, and work is parallelized, based on units called
             :               <term>tablets</term> and <term>tablet servers</term>.
This is pretty vague. You need to make the distinction between tablets and tablet servers more clear.


PS6, Line 169: CREATE TABLE and ALTER TABLE
How about DROP TABLE?


PS6, Line 181: Because Kudu
incomplete sentence


PS6, Line 184: tables have features and properties that do not apply to other kinds of Impala tables,
             :         familiarize yourself with Kudu-related concepts and syntax first.
incomplete sentence


PS6, Line 214: arrange
What does "arrange" mean? If you refer to mapping of rows to tablets say so, otherwise remove.


PS6, Line 215: The primary key columns are typically ones that are frequently used in <codeph>WHERE</codeph>
             :               clauses and are highly selective.
That is not necessarily true.


PS6, Line 234: These restrictions
You mean the uniqueness and nullability constraints? These are indeed enforced in Kudu but I wouldn't call them restrictions. Allowing PRIMARY KEY and NOT NULL on only on Kudu tables is a restriction enforced by Impala during the analysis.


PS6, Line 714: evenly
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#12).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,209 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 14:

(2 comments)

Just closing off a couple of comments added by me in the last patch set.

http://gerrit.cloudera.org:8080/#/c/5649/14/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS14, Line 401: example
> This should actually be kudu_blurb to produce the Kudu-specific subheading.
Done


PS14, Line 415: no row in the table can have nulls in all the
              :         primary key columns
> This constraint applies to every column, not to the composite value of all 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 13:

John, when do you plan to post a new patch that addresses the last comments? I think after that, we should be able to +2 it.

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,590 insertions(+), 352 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 13: Code-Review+1

(5 comments)

I'll let MJ and/or Todd make a final pass.

http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS13, Line 85: [PRIMARY KEY (<varname>col_name</varname>[, ...]]
This doesn't belong here.


PS13, Line 244: <codeblock>CREATE TABLE [IF NOT EXISTS] <varname>db_name</varname>.]<varname>table_name</varname>
              :   [COMMENT '<varname>table_comment</varname>']
              :   STORED AS KUDU
              :   [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
              : AS
              :   <varname>select_statement</varname></codeblock>
This is not correct. You're missing the PRIMARY KEY and PARTITION BY clauses.


PS13, Line 411: impala::username.kudu_t1
nit: I don't think this is a good example. I would be really surprised if someone actually created a Kudu table outside of Impala named as "impala::username....". Pls change the table name.


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS13, Line 93: kudu.master_addresses
That's the name of the table property. Can you verify that name of the impalad configuration param?


PS13, Line 148: There is a many-to-many relationship
              :               between the tablets and tablet servers, managed automatically by Kudu.
Isn't more clear to say that each tablet server can store multiple tablets and that tablets are replicated across different tablet servers?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/5649/6/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS6, Line 3741:         Run <codeph>REFRESH <varname>table_name</varname></codeph> or
              :         <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
              :         for a Kudu table only after making a change to the Kudu table schema,
              :         such as adding or dropping a column, by a mechanism other than
              :         Impala.
I think this is wrong- they're not needed. we always reload Kudu metadata.


http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_grant.xml
File docs/topics/impala_grant.xml:

Line 147:       authorization, currently the Sentry support is considered preliminary.
and subject to change


http://gerrit.cloudera.org:8080/#/c/5649/6/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS6, Line 266: expression
constant expression


PS6, Line 490: within the database world
colloquial phrasing, how about among rel. db mgmt systems


PS6, Line 561: 
             : UNKNOWN, AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, GROUP_VARINT, RLE, DICT_ENCODING, BIT_SHUFFLE
             : 
             : No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
?


PS6, Line 677: behind the scenes
internally


Line 740:             <codeph>PARTITIONS <varname>n</varname></codeph>and the range partitioning syntax
missing space


PS6, Line 755: , default
there is no default


PS6, Line 760: all
multiple

we can't promise it's all (there may also be skew in how partitions get mapped to tservers)


PS6, Line 778: he largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
             :               clause is 60
I don't think this is a limitation


PS6, Line 856: For range-partitioned Kudu tables, the range clauses must cover all the possible data
             :             values for the applicable columns.
I see what this is saying but I think this sentence will be  confusing. It makes it sound like you can't have gaps in the space.


PS6, Line 903: When a range is removed, no data can exist in the table within that range. If some
             :             rows do have column values within the removed range, the operation fails.
this makes it sound like you can't drop a range unless it's empty which is not true.


PS6, Line 993:         <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
             :         <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
one of these says:
" Run REFRESH table_name
or INVALIDATE METADATA table_name for a Kudu table only after making a change to the Kudu table
schema, such as adding or dropping a column, by a mechanism other than Impala."

But Impala will always load the latest metadata from Kudu, so REFRESH / INV MD are not required.


PS6, Line 1099: :
              :       </p>
              : 
              : <codeblock><![CDATA[
              : 
              : ]]>
looks like something is missing


PS6, Line 1207: :
              :       </p>
              : 
              : <codeblock><![CDATA[
              : 
              : ]]>
empty code block?


PS6, Line 1250: Sentry authorization.
list the limitations?


PS6, Line 1308:  This section describes how Kudu stores and
              :           retrieves columnar data, to help you understand performance and storage considerations
              :           of Kudu tables as compared with Parquet tables.
I don't see any other content


PS6, Line 1323: 
              :           The Apache Kudu architecture, topology, and data storage techniques result in
              :           different patterns of memory usage for Impala statements than with HDFS-backed tables.
not sure if this section is useful as-is


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#13).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,310 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#3).

Change subject: Major update to Impala + Kudu page
......................................................................

Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_partitioning.xml
5 files changed, 1,400 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 15:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/41/

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#9).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
28 files changed, 1,919 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 7:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/5649/7/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS7, Line 3736: The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
              :         statements are needed less frequently for Kudu tables than for
              :         HDFS-backed tables. Neither statement is needed when data is
              :         added to, removed, or updated in a Kudu table, even if the changes
              :         are made directly to Kudu through a client program using the Kudu API.
              :         Run <codeph>REFRESH <varname>table_name</varname></codeph> or
              :         <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
              :         for a Kudu table only after making a change to the Kudu table schema,
              :         such as adding or dropping a column, by a mechanism other than
              :         Impala.
I would start by saying when to use REFRESH and INVALIDATE METADATA for Kudu table and then talk about the cases where it is not required.


PS7, Line 3751: internal
Don't we also call them "managed"? Not sure.


PS7, Line 3749: The distinction between internal and external tables has some special
              :         details for Kudu tables. Tables created entirely through Impala are
              :         internal tables.
It also has "special details" for non-Kudu tables. I would simply say "Impala supports both internal (managed) and external Kudu tables. Internal tables.... "


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

PS7, Line 833: can specify
"can only specify"


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS7, Line 338: clauses
I think you forgot caching which is also not supported for Kudu.


PS7, Line 352: <codeph>CREATE TABLE AS SELECT</codeph>
This is wrong. We do support CTAS for internal Kudu tables. See AnalyzeDDLTest.java for some examples.


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_drop_table.xml
File docs/topics/impala_drop_table.xml:

PS7, Line 160: managed
It's good to be consistent. In some places we use "managed" and in others "internal".


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

PS7, Line 250:  
"in a SCAN KUDU node"


PS7, Line 251: , and might involve transmitting
             :       non-matching rows that are filtered out on the Impala side.
remove


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS7, Line 76: scan performance close to that of Parquet
Make sure you check with Mostafa about this claim.


PS7, Line 147: The work is parallelized
             :               across units of computing called <term>tablet servers</term>.
I believe the unit of computing is the tablet not the tablet server, unless I am misinterpreting the term. Please check with Todd.


PS7, Line 149: between the tablets and tablet servers.
I think it would be nice to mention that the user is not responsible for managing this mapping of tablet to tablet server.


PS7, Line 168: (CREATE TABLE and ALTER TABLE)
It is weird to mention CREATE without DROP. Please remove this and mention in the paragraph below that you can also drop Kudu tables through Impala.


PS7, Line 397: <p>
             :             For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
             :             values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
             :             data files that could be prepared using external tools and ETL processes.
             :           </p>
             : 
             :           <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
             : 
             :           <p>
             :             For example, a table containing geographic information might require the latitude
             :             and longitude coordinates to always be specified. Other attributes might be allowed
             :             to be <codeph>NULL</codeph>. For example, a location might not have a designated
             :             place name, its altitude might be unimportant, and its population might be initially
             :             unknown, to be filled in later.
             :           </p>
You may want to swap these two paragraphs.


PS7, Line 717: PARTITION BY
Impala still uses "PARTITIONED BY" for HDFS tables.


PS7, Line 727: . By setting
             :           up an effective partitioning scheme for a Kudu table, you can ensure that the work for
             :           a query can be parallelized evenly across the hosts in a cluster.
Remove. Sometimes the goal is to scan as little as possible. You can say that more flexible schema designs allows you to perform more elaborate tuning based on the characteristics of the workload or something along these lines.


PS7, Line 936: To see the distribution of data in a Kudu table across the underlying buckets and
             :             partitions, use the <codeph>SHOW TABLE STATS</codeph> statement.
This is unfortunately not accurate. SHOW TABLE STATS will only show the distribution of replicas to tablet servers. Also, it will always show -1 in the #rows column and the key ranges in hex, so the output is confusing. I would just remove this paragraph.


PS7, Line 1122: change
"changes"


PS7, Line 1159: strong consistency for order of operations
I am not sure I know what that means.


PS7, Line 1159: total
              :         success or total failure of a multi-row statement
This is "atomicity". Maybe just mention atomic multi-row statements.


PS7, Line 1160: or data that is read while a write
              :         operation is in progress
Isolation.


PS7, Line 1288: <title>Memory Usage for Operations on Kudu Tables</title>
              : 
              :       <conbody>
              : 
              :         <p>
              :           The Apache Kudu architecture, topology, and data storage techniques result in
              :           different patterns of memory usage for Impala statements than with HDFS-backed tables.
              :         </p>
I don't find this particularly informative and suggest we remove it unless someone objects.


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS7, Line 408: most Impala tables
Impala tables backed by HDFS or S3? "most" is kind of vague


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_revoke.xml
File docs/topics/impala_revoke.xml:

PS7, Line 115: access to a Kudu table is <q>all or nothing</q>.
"only table-level permissions are enforced in Kudu tables. Column-level permissions as well as permissions on certain operations, such as INSERT, are not supported. "


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS7, Line 293: using the Apache Kudu storage system
"stored in Apache Kudu"


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 11:

(1 comment)

John, sending a new patch without indicating which comments have been addressed doesn't really help the review process. Can you please respond to the comments?

http://gerrit.cloudera.org:8080/#/c/5649/11/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS11, Line 81: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
             :   (<varname>col_name</varname> <varname>data_type</varname>
             :     <ph rev="kudu IMPALA-3719">[<varname>kudu_column_attribute</varname> ...]</ph>
             :     [COMMENT '<varname>col_comment</varname>']
             :     [, ...]
             :     [PRIMARY KEY (<varname>col_name</varname>[, ...]]
             :   )
             :   [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
             :   [COMMENT '<varname>table_comment</varname>']
             :   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
             :   <ph rev="kudu">[PARTITION BY <varname>kudu_partition_clause</varname></ph>
             :   [
             :    [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
             :   ]
             :   [LOCATION '<varname>hdfs_path</varname>']
I think we can improve the syntax definition to avoid common mistakes and confusion from a user's point of view. For instance, instead of having [PARTITIONED BY...] and after 3 lines [PARTITION BY kudu...] we can replace them with something like:
[partition_clause]

partition_clause::=
 |hdfs_partition_clause
 |kudu_partition_clause

hdfs_partition_clause ::= PARTITIONED BY ....
kudu_partition_clause ::= PARTITION BY

Also, now that I think of it, it doesn't make sense to merge the syntax for Kudu and non-kudu tables as they differ significantly. Todd, MJ what do you guys think?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 6:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_invalidate_metadata.xml
File docs/topics/impala_invalidate_metadata.xml:

PS4, Line 245: <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
             :     <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
> Where do we talk about Kudu table metadata? We need to mention that source 
OK, I'm going to implement that with a #include-style mechanism to use the same wording under both REFRESH and INVALIDATE METADATA. The actual text will go in docs/shared/impala_common.xml.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS4, Line 39: d by the Apache Kudu componen
> stored by Apache Kudu
OK. I still phrase it as "the Apache Kudu component" just because trademark rules say to always use the trademark as an adjective. E.g. "the ___ database system", "the ___ product", "the ___ storage engine", etc.


PS4, Line 98: ultiple Kudu hosts 
> should say "separated by commas"
Done


PS4, Line 105: TBLPROPERTIES
> should say which table property to set
Done


PS4, Line 147: physicall
> I'd say these are physical
Done


PS4, Line 149: tablet servers is independent of the number of DataNodes in the cluster.
> One question that someone might have is whether tablet servers and data nod
I will mention that it's possible but not required. That's something I got from Juan's training session. Is there some nuance where, e.g. a join of a Kudu table with a Parquet table would benefit from tablet server and impalad running on the same host?


PS4, Line 156: 
             :           One consideration for the cluster topology is that the number of replicas for a Kudu table
             :           must be odd.
             :         </p>
> I'd scrap this whole <p>
Leave the first sentence or not? There was a fair bit of discussion of tablets and tablet servers in the JD and Juan training sessions. I feel like I'm undercovering it from the Impala side. Although I'm happy to leave that for a future update, or link over to the Kudu docs for such detail.


PS4, Line 200: umns, whose values are combined and
> worth noting that it's the _tuple_ that has to be unique. EG you can have P
Done


PS4, Line 201:  be unique and cannot contain any
             :           <codeph>NULL</codeph> v
> Remove. There are only partitioned Kudu tables.
Done


PS4, Line 221: On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
             :               For example, if an <codeph>INSERT</codeph> operation fails partway through, only some of the
             :               new rows might be present in the table. You can re-run the same <codeph>INSERT</codeph>, and
             :               only the missing rows will be added. Or if data in the table is stale, you can run an
             :               <codeph>UPSERT</codeph> statement that brings 
> Why blending the semantics of upsert and insert with the description of pri
This is a subject I see come up again and again in Kudu discussions, so I might need to include it in multiple places -- when introducing the notion of the PK uniqueness constraint, in the reference pages for INSERT / UPSERT, and in discussions of ETL procedures. Why don't I leave it here for the time being and see what user feedback is like.


PS4, Line 233: codeph> clauses and <codeph>NOT NULL</codeph>
> How about the nullability restrictions on non-primary key columns. Where ar
I'll take out the "on the primary key columns" wording.


PS4, Line 247: onbody>
             : 
             :         <p>
             :           For the general syntax of the <codeph>CREATE TABLE</codeph>
             :           st
> I am not sure I understand the purpose of this paragraph here. This either 
I'll remove it from here.


PS4, Line 269: 
             :         <p outputclass="toc inpage">
             :           See the following sections for details about each column attribute.
             :         </p>
             : 
             :       </conbody>
> Wouldn't it make more sense to first describe the Kudu-specific CREATE TABL
I've rearranged this area to be under a "DDL" subheading, and added a couple of links back to the CREATE TABLE page. It'll be a work item in the future to fine-tune how much syntax gets repeated here, or how much detail and examples goes into the SQL Ref pages for DDL and DML statements.


PS4, Line 299: 
             :           <p>
> Even though this is the place to formally define the syntax for PRIMARY KEY
Done


PS4, Line 368: 
             :             The contents of the primary key columns cannot be changed by an
             :             <codeph>UPDATE</codeph> or <co
> Remove.
Why don't I rephrase it a little. Juan's presentation mentioned how performance could suffer if there were more then 5-6 columns in the primary key.


PS4, Line 383: ept>
             : 
             :       <concept id="kudu_not_null_attribute">
             : 
> Only primary key columns can be used in the PARTITION BY clause. This is a 
Done


PS4, Line 399: <p>
             :             For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
             :             values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
             :             data files that could be prepared using external tools and ETL processes.
             :           </p>
> Move below L411.
Done


PS4, Line 407: 
             :             For example, a table containing geographic information m
> I would rephrase it to something like "If an application requires a field t
Done


Line 438:           </p>
> Maybe we should extend this paragraph and say that for those reasons, it is
Done


PS4, Line 449: 
             :       <concept id="kudu_default_attribute">
             : 
> Do you suggest that this is not executed because we know it doesn't return 
This deduction was disproved by the EXPLAIN plan, so removing this example.


PS4, Line 460:             function calls.
             :           </p>
             : 
> that's true, but I think it's good practice to specify NOT NULL regardless,
I won't hint at future features, but why don't I say the code is more self-describing with NOT NULL included.


PS4, Line 531: 
             :               <!-- GROUP_VARINT is internal use only, not documenting that although 
> I would mention the encodings and maybe some guidelines on which one to use
Got some more details on these from Dan Burkert. For the real heavy lifting of definitions and usage, I'll link to the Kudu docs.

I see the Impala parser accepts GROUP_VARINT but that gave me an error applying it to TINYINT or BIGINT columns. Does it serve any purpose right now?


Line 553:               <li>
> I'd disagree with this -- in many cases encodings have a net speedup - ther
Done


PS4, Line 567: >
             :             The following example shows the Impala keywords representing the encoding types.
             :             (The Impala keywords match the symbolic names used within Kudu.)
             :             For usage guidelines on the different kinds of encoding, see
             :             <xref href="https://kudu.apache.org/docs/schema_design.html" scope="external" format="html">the Kudu documentation</xref>.
             :             The <codeph>DESCRIBE</codeph> output
> Every compression alg has a tradeoff of compression ratio and cpu overhead 
I'll ping him offline rather than add him as a reviewer and dump the whole thing on him.


PS4, Line 592: ------+-----------------+
> that's slightly unexpected. I think a better example for dictionary would b
Done


PS4, Line 597:    | AUTO_ENCODING   |
> seems somewhat contrived. Prefix encoding isn't super useful for most cases
OK, I'll take out this sentence entirely.


PS4, Line 601: 
             : | c7   | string  | false       | true     | PREFIX_ENCODING |
             : +------+---------+-------------+----------+-----------------+
             : </codeblock>
             : 
> hrm, I'm not sure I'd draw such a distinction between LZ4 and SNAPPY. In fa
Done


PS4, Line 636: 
             :           <p>
> I think I'd not mention cfiles here, but say that there is an underlying un
Done


PS4, Line 642:   therefore this column is a good candidate for dictionary encoding. The
             :             <codeph>post_id</codeph> column contains an ascending sequence of integers, where
             :             several leading bits are likely to be all zeroes, therefore this column is a good
             :            
> nope, not this block.
Done


PS4, Line 648:   they employ the <codeph>COMPRESSION</codeph> attribute instead. The ideal compression
             :             codec in each case would require some experimentation to determine how much space
             :             savings it provided and how much CPU overhead it added, based on real-world data.
             :           </p>
             : 
             : <codeblock>
             : CREATE TABLE blog_posts
             : (
             :   user_id STRING ENCODING DICT_ENCODING,
             :   post_id BIGINT ENCODING BIT_SHUFFLE,
             :   subject STRING ENCODING PLAIN_ENCODING,
             :   body STRING COMPRESSION LZ4,
             :   spanish_translation STRING COMPRESSION SNAPPY,
             :   esperanto_translation STRING COMPRESSION ZLIB,
             :   PRIMARY KEY (user_id, post_id)
             : ) PARTITION
> I think it's probably better to just defer the user to the Kudu docs for bl
Where's the right page to link to? I find most references to block size on client API reference pages like

https://kudu.apache.org/cpp-client-api/classkudu_1_1client_1_1KuduColumnStorageAttributes.html

which don't have much usage advice.


PS4, Line 690: 
> PARTITION
Done


PS4, Line 691: LE performance_for_benchmark_xyz
             : (
             :   id BIGINT PRIMARY KEY,
> Why not putting the formal syntax instead of trying to describe the differe
I'm trying to outline the concepts behind Kudu partitioning before sending readers off to the CREATE TABLE or ALTER TABLE pages for the full syntax.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_refresh.xml
File docs/topics/impala_refresh.xml:

Line 337:     <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
> AFAIK no
I'm going to reuse the same discussion of when to run REFRESH or INVALIDATE METADATA, and why they're needed less frequently, as I put under INVALIDATE METADATA.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_show.xml
File docs/topics/impala_show.xml:

Line 1240:         partition covers the same range as originally specified.
> is this example supposed to be here?
This is the spot for examples of this statement for tables of all types. I'll clarify with some intro text, and move the earlier Kudu example down here too.


http://gerrit.cloudera.org:8080/#/c/5649/4/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS4, Line 293: Tables using the Apache Kudu storage system are treated specially, because Kudu manages its data independently of HDFS files.
             :         Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
             :         managed internally by Kudu.
> Where do we talk about external vs managed Kudu tables?
Details to be added. I have been working purely with Impala-created tables so my knowledge about external tables is mainly from the Kudu training sessions.


PS4, Line 308:         <codeph># Rows</codeph> (although this number is not currently computed), <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
             :   
> worth noting that #rows is inaccurate
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#7).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_truncate_table.xml
18 files changed, 1,876 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#4).

Change subject: Major update to Impala + Kudu page
......................................................................

Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_truncate_table.xml
18 files changed, 1,750 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#10).

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,024 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5649/10/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS10, Line 562: No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
If this isn't behaving properly can you file a JIRA?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Ambreen Kazi <am...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes