Posted to commits@doris.apache.org by "morningman (via GitHub)" <gi...@apache.org> on 2023/01/30 15:39:33 UTC

[GitHub] [doris] morningman commented on a diff in pull request #16055: [feature](Load)Support skip specific lines number for csv stream load

morningman commented on code in PR #16055:
URL: https://github.com/apache/doris/pull/16055#discussion_r1090766395


##########
docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md:
##########
@@ -180,6 +180,8 @@ ERRORS:
 
 25. trim_double_quotes: Boolean type, default value false. When true, the outermost double quotes of each field in the CSV file are trimmed.
 
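+26. skip_lines: <version since="1.2" type="inline"> Integer type, default value 0. Skips the specified number of lines at the beginning of the CSV file. This parameter has no effect when format is set to csv_with_names or csv_with_names_and_types. </version>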

Review Comment:
   ```suggestion
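   26. skip_lines: <version since="dev" type="inline"> Integer type, default value 0. Skips the specified number of lines at the beginning of the CSV file. This parameter has no effect when format is set to `csv_with_names` or `csv_with_names_and_types`. </version>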
   ```
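The behaviour documented for `skip_lines` amounts to discarding the first N lines before parsing starts. A minimal C++17 sketch, for illustration only: `skip_leading_lines` is a hypothetical helper (not a Doris function), and it assumes an in-memory buffer with `\n` line delimiters, whereas the actual reader consumes a stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns what remains of `data` after dropping the
// first `skip_lines` lines. Lines are assumed to be terminated by '\n'.
// If the input has fewer lines than requested, the result is empty.
constexpr std::string_view skip_leading_lines(std::string_view data, int64_t skip_lines) {
    for (int64_t i = 0; i < skip_lines && !data.empty(); ++i) {
        std::size_t pos = data.find('\n');
        if (pos == std::string_view::npos) {
            // Final line without a terminator is skipped too; nothing remains.
            return {};
        }
        data.remove_prefix(pos + 1);
    }
    return data;
}
```

For example, `skip_leading_lines("a,b\n1,2\n3,4\n", 1)` returns `"1,2\n3,4\n"`, and asking for more lines than the input holds yields an empty view.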



##########
be/src/vec/exec/format/csv/csv_reader.cpp:
##########
@@ -88,14 +88,18 @@ CsvReader::~CsvReader() = default;
 Status CsvReader::init_reader(bool is_load) {
     // set the skip lines and start offset
     int64_t start_offset = _range.start_offset;
-    if (start_offset == 0 && _params.__isset.file_attributes &&
-        _params.file_attributes.__isset.header_type &&
-        _params.file_attributes.header_type.size() > 0) {
-        std::string header_type = to_lower(_params.file_attributes.header_type);
-        if (header_type == BeConsts::CSV_WITH_NAMES) {
-            _skip_lines = 1;
-        } else if (header_type == BeConsts::CSV_WITH_NAMES_AND_TYPES) {
-            _skip_lines = 2;
+    if (start_offset == 0) {
+        // check header type first
+        if (_params.__isset.file_attributes && _params.file_attributes.__isset.header_type &&
+            _params.file_attributes.header_type.size() > 0) {
+            std::string header_type = to_lower(_params.file_attributes.header_type);
+            if (header_type == BeConsts::CSV_WITH_NAMES) {
+                _skip_lines = 1;
+            } else if (header_type == BeConsts::CSV_WITH_NAMES_AND_TYPES) {
+                _skip_lines = 2;
+            }
+        } else if (_params.file_attributes.__isset.skip_lines) {

Review Comment:
   Do we also need to check `_params.__isset.file_attributes` here, before reading `_params.file_attributes.__isset.skip_lines`?
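To illustrate the guard the comment asks for, the decision can be sketched as a standalone function. This is a sketch under stated assumptions, not the PR's code: `ScanParams`, `FileAttributes` and `compute_skip_lines` are hypothetical stand-ins, the `isset` members mimic the `__isset` presence flags Thrift generates for optional fields, and the string literals assume `BeConsts::CSV_WITH_NAMES` and `BeConsts::CSV_WITH_NAMES_AND_TYPES` hold the lowercase format names. Every `file_attributes` access is guarded, a `csv_with_names*` header type takes precedence, and otherwise `skip_lines` applies, as the documentation change describes.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for Thrift-generated structs. Generated code names
// the presence-flag member `__isset`; it is called `isset` here because
// identifiers containing "__" are reserved in C++.
struct FileAttributes {
    std::string header_type;
    int64_t skip_lines = 0;
    struct {
        bool header_type = false;
        bool skip_lines = false;
    } isset;
};

struct ScanParams {
    FileAttributes file_attributes;
    struct {
        bool file_attributes = false;
    } isset;
};

// Returns how many leading lines to skip for the split that starts at
// `start_offset`. Only the split at offset 0 can contain header lines.
int64_t compute_skip_lines(const ScanParams& params, int64_t start_offset) {
    // Check the outer presence flag before touching file_attributes at all.
    if (start_offset != 0 || !params.isset.file_attributes) {
        return 0;
    }
    const FileAttributes& attrs = params.file_attributes;
    if (attrs.isset.header_type && !attrs.header_type.empty()) {
        std::string type = attrs.header_type;
        std::transform(type.begin(), type.end(), type.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        // Header formats take precedence; skip_lines is ignored for them.
        if (type == "csv_with_names") return 1;
        if (type == "csv_with_names_and_types") return 2;
    }
    // Any other format: use the user-supplied skip_lines, if present.
    return attrs.isset.skip_lines ? attrs.skip_lines : 0;
}
```

Unlike the `else if` in the diff, this sketch falls through to `skip_lines` for header types other than the two `csv_with_names*` formats; whether that matches the intended behaviour is a judgment call for the PR author.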



##########
docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md:
##########
@@ -183,6 +183,8 @@ ERRORS:
 
 25. trim_double_quotes: Boolean type, The default value is false. True means that the outermost double quotes of each field in the csv file are trimmed.
 
+26. skip_lines: <version since="1.2" type="inline"> Integer type, the default value is 0. It will skip some lines in the head of csv file. It will be disable when format is csv_with_names or csv_with_names_and_types. </version>

Review Comment:
   ```suggestion
   26. skip_lines: <version since="dev" type="inline"> Integer type, the default value is 0. It skips the specified number of lines at the beginning of the CSV file. It has no effect when format is `csv_with_names` or `csv_with_names_and_types`. </version>
   ```



##########
fe/fe-core/src/main/cup/sql_parser.cup:
##########
@@ -621,7 +621,8 @@ terminal String
     KW_AUTO,
     KW_PREPARE,
     KW_EXECUTE,
-    KW_LINES;
+    KW_LINES,
+    KW_IGNORE;

Review Comment:
   Need to add `KW_IGNORE` to the `keywords ::=` entry



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@doris.apache.org