You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/12/23 02:35:15 UTC

[GitHub] [orc] coderex2522 opened a new pull request #975: ORC-1055: [C++] Add timezone options for csv-import tool

coderex2522 opened a new pull request #975:
URL: https://github.com/apache/orc/pull/975


   ### What changes were proposed in this pull request?
   The pull request provides the csv-import tool with support for timezone settings
   
   
   ### Why are the changes needed?
   The patch is used to solve the ORC-1055 bug.
   
   ### How was this patch tested?
   Test with some csv files containing timestamp columns.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #975: ORC-1055: [C++] Set timezone writer options for csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1001833835


   Thank you for updating, @coderex2522 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on pull request #975: ORC-1055: [C++] Set timezone writer options for csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1000629140


   @wgtmac Hi, please take a look!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on pull request #975: ORC-1055: [C++] Set timezone writer options for csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1000679222


   These timezones alias information is a reference to https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/util/calendar/ZoneInfoFile.java line:220-244


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang edited a comment on pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
guiyanakuang edited a comment on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1004664919


   @coderex2522 Don't worry, it has nothing to do with your pr
   Appveyor's webhook is being removed.
   https://issues.apache.org/jira/browse/INFRA-22692
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776603215



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       If we don't set timezone for the WriterOptions, the writer will write the timestamp with the GMT timezone. And the orc-contents tool will scan the content in the GMT timezone.This case should not be successful?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1004664919


   @coderex2522 It looks like ORC-1073 forgot to remove appveyor.yml in labeler.yml. Don't worry, it has nothing to do with your pr
   https://github.com/apache/orc/blob/d84df2129adea01469871fc1ba6be5b758cbd7ac/.github/labeler.yml#L22


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1004658427


   @williamhyun Hi, can you help investigate this problem in continuous-integration/appveyor/pr? 
   The error message shows "The build phase is set to "MSBuild" mode (default), but no Visual Studio project or solution files were found in the root directory. If you are not building Visual Studio project switch build mode to "Script" and provide your custom build command."


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1001843144


   The PR title and descrition has been modified, @dongjoon-hyun please take a look again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776601386



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {

Review comment:
       I test the case again when I delete the line"options.setTimezoneName(timezoneName);" in CSVFileImport.cc. The case is failed. The error message show
   "Expected: expected
         Which is: "{\"_a\": \"2021-12-26 16:00:00.0\"}\n"
   To be equal to: output
         Which is: "{\"_a\": \"2021-12-27 00:00:00.0\"}\n"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #975: ORC-1055: [C++] Set timezone writer options for csv-import tool

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1001326292


   > * ed before your PR. Do
   
   +1 for adding a test case. We can directly use the csv file from the JIRA.
   
   After some investigation, I think the Java odlMappings may introduce new issues. For example, **CST** can either be Central Standard Time (America/Chicago) or China Standard Time (Asia/Shanghai).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun edited a comment on pull request #975: ORC-1055: [C++] Set timezone writer options for csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1001833835


   Thank you for updating, @coderex2522 . Could you revise the PR title and description according to the new code?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776207567



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {

Review comment:
       This test case succeeds without your patch, `CSVFileImport.cc`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776875356



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       @coderex2522 . Yes, the test case should fail without your patch and should pass with your patch. That's the purpose of verification of your code's contribution.
   
   And, +1 for @wgtmac 's suggestion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776875356



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       @coderex2522 . Yes, the test case should fail without your patch and should pass with your patch. That's the purpose of verification of your code's contribution.
   
   And, +1 for @wgtmac .




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #975:
URL: https://github.com/apache/orc/pull/975


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776782643



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       We can add another test to use timezone as Europe/Paris or Asia/Shanghai.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] coderex2522 commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r777799211



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       > We can add another test to use timezone as Europe/Paris or Asia/Shanghai.
   
   Add the timezone 'Europe/Paris' test case done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #975:
URL: https://github.com/apache/orc/pull/975#issuecomment-1005127845


   Merged to main/branch-1.7 for Apache ORC 1.7.3.
   
   cc @williamhyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r775935479



##########
File path: tools/src/CSVFileImport.cc
##########
@@ -280,15 +280,19 @@ void usage() {
             << "                  [-s <size>] [--stripe=<size>]\n"
             << "                  [-c <size>] [--block=<size>]\n"
             << "                  [-b <size>] [--batch=<size>]\n"
+            << "                  [-t <string>] [--timezone=<string>]\n"
             << "                  <schema> <input> <output>\n"
             << "Import CSV file into an Orc file using the specified schema.\n"
+            << "The timezone can be viewed in the directory /usr/share/zoneinfo\n"

Review comment:
       - It should be stated clearly that the timezone is writer timezone of timestamp types.
   - You have specified it as a **required_argument** which conflicts with the comment here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #975: ORC-1055: [C++] Add the timezone option for the csv-import tool

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #975:
URL: https://github.com/apache/orc/pull/975#discussion_r776207883



##########
File path: tools/test/TestCSVFileImport.cc
##########
@@ -53,3 +53,27 @@ TEST (TestCSVFileImport, test10rows) {
   EXPECT_EQ(expected, output);
   EXPECT_EQ("", error);
 }
+
+TEST (TestCSVFileImport, testTimezoneOption) {
+  // create an ORC file from importing the CSV file
+  const std::string pgm1 = findProgram("tools/src/csv-import");
+  const std::string csvFile =
+    findExample("TestCSVFileImport.testTimezoneOption.csv");
+  const std::string orcFile = "/tmp/test_csv_import_test_timezone_option.orc";
+  const std::string schema = "'struct<_a:timestamp>'";
+  std::string output;
+  std::string error;
+
+  std::string option = "--timezone=America/Los_Angeles";

Review comment:
       Maybe, this test case always succeeds at `America/Los_Angeles` timezone? If then, can we choose less popular timezone?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org