Posted to codereview@trafodion.apache.org by liuyu000 <gi...@git.apache.org> on 2017/12/19 07:45:48 UTC
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
GitHub user liuyu000 opened a pull request:
https://github.com/apache/incubator-trafodion/pull/1356
[TRAFODION-2855] Correct the syntax descriptions of LOAD Statement for *Trafodion SQL Reference Manual* 2
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liuyu000/incubator-trafodion LoadStatement2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/1356.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1356
----
commit 30f3ea9a2b3ed6bd52e83227911b46b17dd9c99b
Author: liu.yu <yu...@esgyn.cn>
Date: 2017-12-19T07:42:53Z
Correct the syntax descriptions of LOAD Statement for *Trafodion Reference Manual* 2
----
---
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1356#discussion_r158144904
--- Diff: docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc ---
@@ -443,24 +443,39 @@ specify one or more of these options:
** `CONTINUE ON ERROR`
+
-LOAD statement will continue after errors encountered while scanning rows from source table.
+LOAD statement will continue after ignorable errors while scanning rows from source table or loading into the target table. The ignorable errors are usually data conversion errors.
+
Errors during the load or sort phase will cause the LOAD statement to abort.
+
-Error rows will be logged by default in HDFS files in the directory `/user/trafodion/bulkload/logs`. The default name of the error files will be of the form `ERR_<three-part-target-table-name>_<date>_<id>`, where `<id>` is a numeric identifier unique to the process where the error was seen.
-+
-This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified and it is not enabled by default.
+This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified.
** `LOG ERROR ROWS [TO _error-location-name_]`
+*** Error rows
+
If error rows must be written to a specified location, then specify TO _error-location-name_, otherwise they will be written to the default location.
+`_error-location-name_` must be a HDFS directory name to which trafodion has write access.
+
-Error logs are written in separate files by the processes involved in the load command under sub-directory representing the load command in the given location.
-The actual log file location is displayed in the load command output.
+Error rows will be logged in HDFS files in the *directory* `/user/trafodion/bulkload/logs` if the error log location is not specified.
++
+The default name of the *subdirectory* is `_ERR_catalog.schema.target_table_date_id_`, where `_id_` is a numeric identifier timestamp (YYYYMMDD_HHMMSS) unique to the process where the error was seen.
++
+The default name of the *error file* is `_loggingFileNamePrefix_catalog.schema.target_table_instanceID_`, where `_loggingFileNamePrefix_` is hive_scan_err or traf_upsert_err depending on the data source table, and `_instanceID_` is the ID of instance starting from 0, generally there is only one instance.
--- End diff --
Suggest "...is the instance ID starting from 0, ..."
---
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1356#discussion_r158144917
--- Diff: docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc ---
@@ -443,24 +443,39 @@ specify one or more of these options:
** `CONTINUE ON ERROR`
+
-LOAD statement will continue after errors encountered while scanning rows from source table.
+LOAD statement will continue after ignorable errors while scanning rows from source table or loading into the target table. The ignorable errors are usually data conversion errors.
+
Errors during the load or sort phase will cause the LOAD statement to abort.
+
-Error rows will be logged by default in HDFS files in the directory `/user/trafodion/bulkload/logs`. The default name of the error files will be of the form `ERR_<three-part-target-table-name>_<date>_<id>`, where `<id>` is a numeric identifier unique to the process where the error was seen.
-+
-This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified and it is not enabled by default.
+This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified.
** `LOG ERROR ROWS [TO _error-location-name_]`
+*** Error rows
+
If error rows must be written to a specified location, then specify TO _error-location-name_, otherwise they will be written to the default location.
+`_error-location-name_` must be a HDFS directory name to which trafodion has write access.
+
-Error logs are written in separate files by the processes involved in the load command under sub-directory representing the load command in the given location.
-The actual log file location is displayed in the load command output.
+Error rows will be logged in HDFS files in the *directory* `/user/trafodion/bulkload/logs` if the error log location is not specified.
++
+The default name of the *subdirectory* is `_ERR_catalog.schema.target_table_date_id_`, where `_id_` is a numeric identifier timestamp (YYYYMMDD_HHMMSS) unique to the process where the error was seen.
++
+The default name of the *error file* is `_loggingFileNamePrefix_catalog.schema.target_table_instanceID_`, where `_loggingFileNamePrefix_` is hive_scan_err or traf_upsert_err depending on the data source table, and `_instanceID_` is the ID of instance starting from 0, generally there is only one instance.
++
+For example, the full path of the table test_load_log is `/user/trafodion/bulkload/logs/test/ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918/traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0`,
++
+where:
++
+1. `/user/trafodion/bulkload/logs/test` is the default name of *directory*.
++
+2. `ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918` is the default name of *subdirectory*.
++
+3. `traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0` is the default name of *error file*.
-*** `_error-location-name_`
+*** Error logs
++
+Error logs are written in separate files by the processes involved in the load command under sub-directory representing the load command in the given location.
+
-must be a HDFS directory name to which trafodion has write access.
+The actual log file location is displayed in the load command output. It is recommended that use the same location for load as it’s easier to find the error logs.
--- End diff --
Suggest "It is recommended that you use..." (add the word "you")
---
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
Posted by liuyu000 <gi...@git.apache.org>.
Github user liuyu000 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1356#discussion_r158206030
OK, thanks Dave :)
---
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-trafodion/pull/1356
---
[GitHub] incubator-trafodion pull request #1356: [TRAFODION-2855] Correct the syntax ...
Posted by liuyu000 <gi...@git.apache.org>.
Github user liuyu000 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1356#discussion_r158206657
OK, thanks for your eagle eye, Dave :)
---
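Note on the naming scheme discussed above: the diff describes the error log path as directory / subdirectory / error file, where the subdirectory is `ERR_<catalog.schema.table>_<YYYYMMDD_HHMMSS>` and the file is `<prefix>_<catalog.schema.table>_<instanceID>`. The following is only a minimal sketch of that composition, written to reproduce the example path quoted in the diff. The names `error_log_path`, `DEFAULT_LOG_DIR` and `KNOWN_PREFIXES` are hypothetical helpers for illustration and are not part of Trafodion; the real location is whatever the LOAD command output reports.

```python
from datetime import datetime

# Default directory stated in the diff; used when no error location is given.
DEFAULT_LOG_DIR = "/user/trafodion/bulkload/logs"

# The two file name prefixes mentioned in the diff.
KNOWN_PREFIXES = ("hive_scan_err", "traf_upsert_err")


def error_log_path(table, started, prefix="traf_upsert_err",
                   instance_id=0, log_dir=DEFAULT_LOG_DIR):
    """Compose directory/subdirectory/file as described in the diff.

    table   -- three-part name, e.g. "TRAFODION.SEABASE.TEST_LOAD_LOG"
    started -- datetime used for the YYYYMMDD_HHMMSS part (assumption:
               the timestamp is formatted exactly this way)
    """
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unexpected prefix: {prefix}")
    stamp = started.strftime("%Y%m%d_%H%M%S")
    subdirectory = f"ERR_{table}_{stamp}"
    error_file = f"{prefix}_{table}_{instance_id}"
    return f"{log_dir.rstrip('/')}/{subdirectory}/{error_file}"


if __name__ == "__main__":
    # Reproduces the example from the diff, which uses the directory
    # /user/trafodion/bulkload/logs/test rather than the stated default.
    print(error_log_path(
        "TRAFODION.SEABASE.TEST_LOAD_LOG",
        datetime(2017, 12, 18, 3, 59, 18),
        log_dir="/user/trafodion/bulkload/logs/test",
    ))
```

With the inputs shown, the sketch yields the same path as the example in the diff. Note that the example's directory ends in `/test`, so it corresponds to a user-specified location rather than the default directory.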