You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sahil Takiar (Code Review)" <ge...@cloudera.org> on 2019/11/02 18:19:45 UTC

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14621


Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
2 files changed, 5 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/14621/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 05:49:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

Several tables created during dataload, such as alltypes, already use
the '.txt' extension for their files. These tables are not created via
Impala's INSERT code path, they are copied into the table. However,
there are several tables created during dataload, such as
alltypesinsert, that are created via Impala. This patch will change
the files in these tables so that they end in '.txt'.

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Reviewed-on: http://gerrit.cloudera.org:8080/14621
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M tests/query_test/test_insert.py
3 files changed, 35 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14621/2/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/14621/2/tests/query_test/test_insert.py@313
PS2, Line 313: class TestInsertFileExtension(ImpalaTestSuite):
> flake8: E302 expected 2 blank lines, found 1
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Nov 2019 22:03:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14621

to look at the new patch set (#4).

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

Several tables created during dataload, such as alltypes, already use
the '.txt' extension for their files. These tables are not created via
Impala's INSERT code path, they are copied into the table. However,
there are several tables created during dataload, such as
alltypesinsert, that are created via Impala. This patch will change
the files in these tables so that they end in '.txt'.

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M tests/query_test/test_insert.py
3 files changed, 35 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/14621/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 01:25:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14621/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14621/3//COMMIT_MSG@21
PS3, Line 21: Impala writes all text files with a trailing dot due to some odd
            : behavior in hdfs-table-sink.cc
> Can you mention that text tables created during dataload already used .txt 
Done. There are actually some tables (like alltypesinsert) that contain text data and are created via Impala during dataload. I guess we happen to not run any 'show files' tests against these tables.



-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Nov 2019 16:25:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4936/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Nov 2019 22:47:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4934/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Nov 2019 19:03:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 4: Code-Review+2

This looks good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 00:43:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14621

to look at the new patch set (#3).

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M tests/query_test/test_insert.py
3 files changed, 35 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/14621/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4935/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Nov 2019 22:44:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14621/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14621/3//COMMIT_MSG@21
PS3, Line 21: Impala writes all text files with a trailing dot due to some odd
            : behavior in hdfs-table-sink.cc
Can you mention that text tables created during dataload already used .txt extension? I was surprised at first that no existing test had to be changed (e.g. ones that use SHOW FILES), but then I realized that functional.alltypes and friends already used .txt extension as they are copied instead of created by INSERT.



-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Nov 2019 15:26:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14621/2/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/14621/2/tests/query_test/test_insert.py@313
PS2, Line 313: class TestInsertFileExtension(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1



-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Nov 2019 22:01:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14621

to look at the new patch set (#2).

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M tests/query_test/test_insert.py
3 files changed, 34 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/14621/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14621 )

Change subject: IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5175/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Gerrit-Change-Number: 14621
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Nov 2019 01:25:54 +0000
Gerrit-HasComments: No