You are viewing a plain text version of this content. The canonical link for it is here.
Posted to codereview@trafodion.apache.org by traflm <gi...@git.apache.org> on 2016/04/05 04:14:50 UTC

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

GitHub user traflm opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/414

    [TRAFODION-1912]support automatic convert DOS format during bulkload

    Introduce a new CQD : HIVE_SCAN_SPECIAL_MODE
    which is a bitmap, to allow different 'tolerance' suggested by JIRA 1911.
    This patch is for DOS format handling. When HIVE_SCAN_SPECIAL_MODE set to 1, HIVE_SCAN will automatically replace \r\n to space\n
    
    Please help to review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/traflm/incubator-trafodion TRAFODION-1912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #414
    
----
commit f48cc2b69043b33c1abd2b9cec9e3bf1e2299d11
Author: Liu Ming <ov...@sina.com>
Date:   2016-04-05T02:05:48Z

    [TRAFODION-1912]support automatic convert DOS format during bulkload

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by traflm <gi...@git.apache.org>.
Github user traflm commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58715937
  
    --- Diff: core/sql/executor/ExHdfsScan.h ---
    @@ -429,41 +432,92 @@ class ExOrcFastAggrTcb  : public ExOrcScanTcb
     
     #define RANGE_DELIMITER '\002'
     
    -inline char *hdfs_strchr(const char *s, int c, const char *end, NABoolean checkRangeDelimiter)
    +inline char *hdfs_strchr(char *s, int c, const char *end, NABoolean checkRangeDelimiter, int mode = 0)
     {
       char *curr = (char *)s;
    -
    -  while (curr < end) {
    +  int count=0;
    +  if( (mode & HIVE_MODE_DOSFORMAT ) == 0)
    +  {
    +   while (curr < end) {
         if (*curr == c)
    +    {
            return curr;
    +    }
         if (checkRangeDelimiter &&*curr == RANGE_DELIMITER)
            return NULL;
         curr++;
    +   }
    +  }
    +  else
    +  {
    +   while (curr < end) {
    +     if (*curr == c)
    +     {
    +         if(count>0 && c == '\n')
    +         {
    +           if(s[count-1] == '\r') s[count-1] = ' '; 
    --- End diff --
    
    \0 will not work, I will keep it as space.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58537932
  
    --- Diff: core/sql/executor/ExHdfsScan.h ---
    @@ -429,41 +432,92 @@ class ExOrcFastAggrTcb  : public ExOrcScanTcb
     
     #define RANGE_DELIMITER '\002'
     
    -inline char *hdfs_strchr(const char *s, int c, const char *end, NABoolean checkRangeDelimiter)
    +inline char *hdfs_strchr(char *s, int c, const char *end, NABoolean checkRangeDelimiter, int mode = 0)
     {
       char *curr = (char *)s;
    -
    -  while (curr < end) {
    +  int count=0;
    +  if( (mode & HIVE_MODE_DOSFORMAT ) == 0)
    +  {
    +   while (curr < end) {
         if (*curr == c)
    +    {
            return curr;
    +    }
         if (checkRangeDelimiter &&*curr == RANGE_DELIMITER)
            return NULL;
         curr++;
    +   }
    +  }
    +  else
    +  {
    +   while (curr < end) {
    +     if (*curr == c)
    +     {
    +         if(count>0 && c == '\n')
    +         {
    +           if(s[count-1] == '\r') s[count-1] = ' '; 
    +         }
    +         return curr;
    +      }
    +      if (checkRangeDelimiter &&*curr == RANGE_DELIMITER)
    +         return NULL;
    +    curr++;
    +    count++;
    +   }
       }
       return NULL;
     }
     
     
    -inline char *hdfs_strchr(const char *s, int rd, int cd, const char *end, NABoolean checkRangeDelimiter, NABoolean *rdSeen)
    +inline char *hdfs_strchr(char *s, int rd, int cd, const char *end, NABoolean checkRangeDelimiter, NABoolean *rdSeen, int mode = 0)
     {
       char *curr = (char *)s;
    -
    -  while (curr < end) {
    -    if (*curr == rd) {
    -       *rdSeen = TRUE;
    -       return curr;
    -    }
    -    else
    -    if (*curr == cd) {
    -       *rdSeen = FALSE;
    -       return curr;
    +  int count = 0;
    +  if( (mode & HIVE_MODE_DOSFORMAT)>0 )  //check outside the while loop to make it faster
    +  {
    +    while (curr < end) {
    +      if (*curr == rd) {
    +         if(count>0 && rd == '\n')
    +         {
    +             if(s[count-1] == '\r') s[count-1] = ' ';
    --- End diff --
    
    Same comment as above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58536750
  
    --- Diff: core/sql/executor/ExHdfsScan.cpp ---
    @@ -367,6 +367,9 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
       char cursorId[8];
       HdfsFileInfo *hdfo = NULL;
       Lng32 openType = 0;
    +  int hiveScanMode = 0;
    +
    +  hiveScanMode = CmpCommon::getDefaultLong(HIVE_SCAN_SPECIAL_MODE);
    --- End diff --
    
    Usually, CQDs are not accessed directly in executor operators directly. It should be passed as a flag in the TDB. It is possible that the query is compiled in a different process, then this CQD setting won't be available in the executor layer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58537871
  
    --- Diff: core/sql/executor/ExHdfsScan.h ---
    @@ -429,41 +432,92 @@ class ExOrcFastAggrTcb  : public ExOrcScanTcb
     
     #define RANGE_DELIMITER '\002'
     
    -inline char *hdfs_strchr(const char *s, int c, const char *end, NABoolean checkRangeDelimiter)
    +inline char *hdfs_strchr(char *s, int c, const char *end, NABoolean checkRangeDelimiter, int mode = 0)
     {
       char *curr = (char *)s;
    -
    -  while (curr < end) {
    +  int count=0;
    +  if( (mode & HIVE_MODE_DOSFORMAT ) == 0)
    +  {
    +   while (curr < end) {
         if (*curr == c)
    +    {
            return curr;
    +    }
         if (checkRangeDelimiter &&*curr == RANGE_DELIMITER)
            return NULL;
         curr++;
    +   }
    +  }
    +  else
    +  {
    +   while (curr < end) {
    +     if (*curr == c)
    +     {
    +         if(count>0 && c == '\n')
    +         {
    +           if(s[count-1] == '\r') s[count-1] = ' '; 
    --- End diff --
    
    What happens if the column happens to be numeric column.  Is the space character tolerated in the conversion


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by traflm <gi...@git.apache.org>.
Github user traflm commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58537747
  
    --- Diff: core/sql/executor/ExHdfsScan.cpp ---
    @@ -367,6 +367,9 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
       char cursorId[8];
       HdfsFileInfo *hdfo = NULL;
       Lng32 openType = 0;
    +  int hiveScanMode = 0;
    +
    +  hiveScanMode = CmpCommon::getDefaultLong(HIVE_SCAN_SPECIAL_MODE);
    --- End diff --
    
    thanks Selva, this explains my puzzle. In the hive regress test, I have to 
    cqd CALL_EMBEDDED_ARKCMP 'ON';
    It must be due to the reason you mentioned, here, I will add this as a TDB flag.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by traflm <gi...@git.apache.org>.
Github user traflm commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58714988
  
    --- Diff: core/sql/executor/ExHdfsScan.h ---
    @@ -429,41 +432,92 @@ class ExOrcFastAggrTcb  : public ExOrcScanTcb
     
     #define RANGE_DELIMITER '\002'
     
    -inline char *hdfs_strchr(const char *s, int c, const char *end, NABoolean checkRangeDelimiter)
    +inline char *hdfs_strchr(char *s, int c, const char *end, NABoolean checkRangeDelimiter, int mode = 0)
     {
       char *curr = (char *)s;
    -
    -  while (curr < end) {
    +  int count=0;
    +  if( (mode & HIVE_MODE_DOSFORMAT ) == 0)
    +  {
    +   while (curr < end) {
         if (*curr == c)
    +    {
            return curr;
    +    }
         if (checkRangeDelimiter &&*curr == RANGE_DELIMITER)
            return NULL;
         curr++;
    +   }
    +  }
    +  else
    +  {
    +   while (curr < end) {
    +     if (*curr == c)
    +     {
    +         if(count>0 && c == '\n')
    +         {
    +           if(s[count-1] == '\r') s[count-1] = ' '; 
    --- End diff --
    
    I tested with DATE and INT, both works well with extra space. But maybe \0 is better. Let me try it out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/414#discussion_r58536813
  
    --- Diff: core/sql/executor/ExHdfsScan.cpp ---
    @@ -367,6 +367,9 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
       char cursorId[8];
       HdfsFileInfo *hdfo = NULL;
       Lng32 openType = 0;
    +  int hiveScanMode = 0;
    +
    +  hiveScanMode = CmpCommon::getDefaultLong(HIVE_SCAN_SPECIAL_MODE);
    --- End diff --
    
    Please get this CQD passed as an attribute in TDB


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1912]support automati...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-trafodion/pull/414


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---