You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/06/13 20:40:16 UTC

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7175

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................

IMPALA-5333: [DOCS] Document Impala ADLS support

Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_adls.xml
M docs/topics/impala_parquet_file_size.xml
4 files changed, 721 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7175/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................

IMPALA-5333: [DOCS] Document Impala ADLS support

Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_adls.xml
M docs/topics/impala_insert.xml
M docs/topics/impala_load_data.xml
M docs/topics/impala_parquet_file_size.xml
6 files changed, 723 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7175/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................

IMPALA-5333: [DOCS] Document Impala ADLS support

Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_adls.xml
M docs/topics/impala_parquet_file_size.xml
4 files changed, 721 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7175/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 1:

(12 comments)

Mostly finished. Need to consult with Sailesh on the environment variable business, and test all this out with a real cluster & credentials. (Might make some modifications to the example output after such hands-on testing.)

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 1072: <p rev="2.9.0 IMPALA-5333" id="adls_dml_performance">
              :         Because of differences between ADLS and traditional filesystems, DML operations
              :         for ADLS tables can take longer than for tables on HDFS.
              :         <draft-comment>
              :           Is there anything to say on this subject, if ADLS doesn't have
              :           the same file-moving-to-the-trashcan performance overhead as S3?
              :         </draft-comment>
              :       </p>
> This isn't necessarily true for ADLS the way it's true for S3. In S3 we don
Done


PS2, Line 1098: Because data files written to ADLS do not have a default block size
> This should be more like:
Done


PS2, Line 1153:       <p rev="2.9.0 IMPALA-5333" id="adls_dml">
              :         In <keyword keyref="impala29_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>,
              :         and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the
              :         Azure Data Lake Store (ADLS).
              :         The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
              :         partitions is specified by an <codeph>adl://</codeph> prefix in the
              :         <codeph>LOCATION</codeph> attribute of
              :         <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements.
              :         If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
              :         issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the ADLS data.
              :       </p>
> This was added for S3 since INSERT and LOAD DATA were added in a later rele
Why don't I leave this in because I can reuse the same text underneath the INSERT and LOAD DATA statements, where it will be a useful reminder / reassurance.


http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 59: </conbody>
Add some prereq info here about basic ADLS setup, both on the Azure side and from Hadoop.


PS1, Line 209: or on earlier Impala releases without DML support for ADLS
> We don't have earlier releases with any support for ADLS, so I don't think 
Done


Line 219:           <li>
> Need to consult with Sailesh. I haven't had hands-on experience with ADLS y
Done


PS1, Line 261:         You point
> Almost. I'll word it as "To X, do Y". (Although now the wording on the S3 p
Done


PS1, Line 271: bucket
> We call them stores in ADLS. So probably just do a find replace of "bucket"
Done


PS1, Line 291: impala-demo
> The stores have the format "adl://<store>.azuredatalakestore.net/path/to/fi
Done


PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> I've not used the ADLS command line tool. It seemed a little hard to setup.
Done. I'll use the same hadoop fs syntax. I agree someone would need a lot of ADLS / Azure experience to be proficient with the ADLS command-line tools.


PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/7175/2/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS2, Line 617: IMPALA-5383
Take out this stray JIRA number. It's addressed by the 'adls_block_splitting' item above which mentions the PARQUET_FILE_SIZE query option.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7175/3/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS3, Line 134:  or the <
> Since the link text is "INSERT Statement" it is not necessary to include it
Done


PS3, Line 158: <codeblock><![CDATA[
             : <property>
             :    <name>dfs.adls.oauth2.access.token.provider.type</name>
             :    <value>ClientCredential</value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.client.id</name>
             :    <value><varname>your_client_id</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.credential</name>
             :    <value><varname>your_client_secret</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.refresh.url</name>
             :    <value><varname>refresh_URL</varname></value>
             : </property>
             : ]]>
> Do you tell users where they should get "your_client_id," "your_client_secr
I'm relying to some extent on users to work out the precise details from the Microsoft and Hadoop instructions I linked towards the top of this page.  In the downstream docs I may elaborate in some areas and simplify in others by linking to or reusing info from the overall CDH + ADLS pages, but that's not appropriate here.

BTW, I haven't actually succeeded in getting an ADLS connection working yet despite some back-and-forth with the local experts.


PS3, Line 179: <p>
             :           Check
> We have 4 environment variables for the Impala minicluster:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Laurel Hale (Code Review)" <ge...@cloudera.org>.
Laurel Hale has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7175/3/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS3, Line 158: <codeblock><![CDATA[
             : <property>
             :    <name>dfs.adls.oauth2.access.token.provider.type</name>
             :    <value>ClientCredential</value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.client.id</name>
             :    <value><varname>your_client_id</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.credential</name>
             :    <value><varname>your_client_secret</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.refresh.url</name>
             :    <value><varname>refresh_URL</varname></value>
             : </property>
             : ]]>
> I'm relying to some extent on users to work out the precise details from th
I was able to get Hive on MR2 to connect with John Zhuge's help, but the Microsoft documentation is complete, so this is fine.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4: Code-Review+2

Bumping to +2 based on Sailesh's and Laurel's comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 261:         You point
Maybe just an imperative "Point" ?


PS1, Line 279:         For example,
Because the paragraph starting at L269 mentions a database with a LOCATION, I expected this example also to have a database with a LOCATION. I was confused that it didn't.

One solution would be to swap the ordering of this example so that it comes before the paragraph that starts on L269. Swapping the two would also make the example starting at L285 fit in nicely, since it *does* use a database with a LOCATION.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4: Code-Review+1

Thanks for working on this John! LGTM
Feel free to upgrade to a +2 if Laurel has no more comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/137/

-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#4).

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................

IMPALA-5333: [DOCS] Document Impala ADLS support

Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_adls.xml
M docs/topics/impala_insert.xml
M docs/topics/impala_load_data.xml
M docs/topics/impala_parquet_file_size.xml
6 files changed, 718 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7175/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Laurel Hale (Code Review)" <ge...@cloudera.org>.
Laurel Hale has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 3:

(2 comments)

Couple of issues found

http://gerrit.cloudera.org:8080/#/c/7175/3/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS3, Line 134: statement
Since the link text is "INSERT Statement" it is not necessary to include it again--creates a duplication of the word. Remove this "statement."


PS3, Line 158: <codeblock><![CDATA[
             : <property>
             :    <name>dfs.adls.oauth2.access.token.provider.type</name>
             :    <value>ClientCredential</value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.client.id</name>
             :    <value><varname>your_client_id</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.credential</name>
             :    <value><varname>your_client_secret</varname></value>
             : </property>
             : <property>
             :    <name>dfs.adls.oauth2.refresh.url</name>
             :    <value><varname>refresh_URL</varname></value>
             : </property>
             : ]]>
Do you tell users where they should get "your_client_id," "your_client_secret," and the "refresh_URL"? These values come from values you set when you create your ADLS service principal and the refresh URL comes from the "Endpoints" setting in the Azure portal (Azure Active Directory > App registrations > Endpoints).


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 1:

(3 comments)

I've read through roughly half of impala_adls.xml and noticed a few things.

http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

Line 154:       
L154 has trailing white space. It's best to remove trailing white space from text that is under revision control, whether it be prose, code, or markup.


PS1, Line 156:         As an alternative, specify the credentials in environment variables before starting the <cmdname>impalad</cmdname>
             :         daemon.
Is it possible to list a mapping between core-site.xml settings above and environment variable names?


Line 219:           <li>
This is still empty. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 3:

(1 comment)

Will +2 after this last point has been cleared.

http://gerrit.cloudera.org:8080/#/c/7175/3/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS3, Line 179: As an alternative, specify the credentials in environment variables before starting the <cmdname>impalad</cmdname>
             :         daemon.
We have 4 environment variables for the Impala minicluster:
https://github.com/apache/incubator-impala/blob/master/bin/impala-config.sh#L243-L246

But that's just for the minicluster and not for remote clusters. If the docs are not aimed at the minicluster, then we don't need to mention this so as to avoid confusion.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


IMPALA-5333: [DOCS] Document Impala ADLS support

Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Reviewed-on: http://gerrit.cloudera.org:8080/7175
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Reviewed-by: Laurel Hale <la...@cloudera.com>
Reviewed-by: John Russell <jr...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_adls.xml
M docs/topics/impala_insert.xml
M docs/topics/impala_load_data.xml
M docs/topics/impala_parquet_file_size.xml
6 files changed, 718 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Laurel Hale: Looks good to me, but someone else must approve
  Sailesh Mukil: Looks good to me, but someone else must approve
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

Line 154:       
> L154 has trailing white space. It's best to remove trailing white space fro
Absolutely. I've gotten spoiled by a pre-commit hook that auto-removes any trailing spaces. Wow, looking at the hook I see that check & fixup has been commented out for some time now! I see that a few more have gotten into other files but I'll fix those separately.


PS1, Line 156:         As an alternative, specify the credentials in environment variables before starting the <cmdname>impalad</cmdname>
             :         daemon.
> Is it possible to list a mapping between core-site.xml settings above and e
Yes. Need to consult with Sailesh or I think there is some CDH doc for ADLS I can adapt.


Line 219:           <li>
> This is still empty. :)
Need to consult with Sailesh. I haven't had hands-on experience with ADLS yet the way I have with S3.


PS1, Line 261:         You point
> Maybe just an imperative "Point" ?
Almost. I'll word it as "To X, do Y". (Although now the wording on the S3 page is slightly different. Too much trouble to track down every instance in upstream and downstream docs to reconcile them all.)


PS1, Line 279:         For example,
> Because the paragraph starting at L269 mentions a database with a LOCATION,
Done


PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
What's the ADLS command line syntax to do this? (I'm basing this on an S3 example with 'aws s3' commands.


PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
Same question as on line 313.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 1072: <p rev="2.9.0 IMPALA-5333" id="adls_dml_performance">
              :         Because of differences between ADLS and traditional filesystems, DML operations
              :         for ADLS tables can take longer than for tables on HDFS.
              :         <draft-comment>
              :           Is there anything to say on this subject, if ADLS doesn't have
              :           the same file-moving-to-the-trashcan performance overhead as S3?
              :         </draft-comment>
              :       </p>
This isn't necessarily true for ADLS the way it's true for S3. In S3 we don't have renames which is what this point tries to highlight, which isn't a problem for ADLS. So for ADLS, we needn't mention this.


PS2, Line 1098: Because data files written to ADLS do not have a default block size
This should be more like:

"Because ADLS doesn't expose the block sizes of its files..."


PS2, Line 1153:       <p rev="2.9.0 IMPALA-5333" id="adls_dml">
              :         In <keyword keyref="impala29_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>,
              :         and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the
              :         Azure Data Lake Store (ADLS).
              :         The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
              :         partitions is specified by an <codeph>adl://</codeph> prefix in the
              :         <codeph>LOCATION</codeph> attribute of
              :         <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements.
              :         If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
              :         issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the ADLS data.
              :       </p>
This was added for S3 since INSERT and LOAD DATA were added in a later release than SELECT etc. Do we need to mention this for ADLS? We have support for both reads and writes in the same release.


http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 209: or on earlier Impala releases without DML support for ADLS
We don't have earlier releases with any support for ADLS, so I don't think that this point is worth mentioning.


PS1, Line 271: ph>adl
We call them stores in ADLS. So probably just do a find replace of "bucket" with "store"


PS1, Line 291: ibute.
The stores have the format "adl://<store>.azuredatalakestore.net/path/to/file"


PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> What's the ADLS command line syntax to do this? (I'm basing this on an S3 e
I've not used the ADLS command line tool. It seemed a little hard to setup. So I've been getting by with hadoop fs -ls "adl://blah"

Do we have to mention the ADLS command line tool?


PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> Same question as on line 313.
ditto


http://gerrit.cloudera.org:8080/#/c/7175/2/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS2, Line 405: impala-demo
Did this work?


PS2, Line 589: ADLS does not have the
             :           same block notion as HDFS
ADLS does not expose block sizes like HDFS does,


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes