Posted to dev@sqoop.apache.org by "Samet Karadag (Jira)" <ji...@apache.org> on 2020/08/26 11:54:00 UTC

[jira] [Updated] (SQOOP-3480) if enclosed-by and escaped-by characters are both double quote (\"). This causes duplicate escapes and thus duplicate characters in double quotes

     [ https://issues.apache.org/jira/browse/SQOOP-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samet Karadag updated SQOOP-3480:
---------------------------------
    Description: 
When the enclosed-by and escaped-by characters are both the double quote (\"), the quote is escaped twice, so every double quote inside a field is duplicated in the output.

Example:

gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4 --class=org.apache.sqoop.Sqoop --jars=$libs -- import -Dmapreduce.job.user.classpath.first=true --connect=jdbc:**** --target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile

 

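This causes the field <test field "  >

to be enclosed and escaped as <"test field """"">

which contains two escaped double quotes instead of one, because the quote is escaped once as the escape character and again as the enclosing character.

BigQuery requires the double quote as the escape character, and fields must also be enclosed in double quotes so that embedded newlines are handled.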
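
The doubling can be reproduced outside Sqoop with a small standalone sketch. It only mirrors the two replace() calls quoted from FieldFormatter.java in this report; the class name EscapeDoublingDemo and the method formatAsReported are hypothetical, and it assumes that escaping is legal, that escape and enclose are plain char values, and that the field is always enclosed.

```java
// Hypothetical reproduction of the behavior described above; not Sqoop code.
final class EscapeDoublingDemo {

  // Applies the two escaping steps in the order shown in the report,
  // then wraps the result in the enclosing character.
  static String formatAsReported(String str, char escape, char enclose) {
    // Step 1: escape every instance of the escape character itself.
    String withEscapes = str.replace("" + escape, "" + escape + escape);
    // Step 2: escape every instance of the enclosing character.
    // When enclose == escape, this re-escapes the output of step 1.
    withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
    return "" + enclose + withEscapes + enclose;
  }

  public static void main(String[] args) {
    // One quote in the input becomes four quotes inside the enclosure.
    System.out.println(formatAsReported("test field \"", '"', '"'));
    // prints "test field """""
  }
}
```

With a distinct escape character such as a backslash, the same two steps touch different characters, so no doubling occurs.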

 

The following code in FieldFormatter.java should be changed from this:

if (escapingLegal) {
  // escaping is legal. Escape any instances of the escape char itself.
  withEscapes = str.replace("" + escape, "" + escape + escape);
} else {
  // no need to double-escape
  withEscapes = str;
}

// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped.
if (escapingLegal) {
  withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}

 

to this:

boolean alreadyEscaped = false;

if (escapingLegal && !alreadyEscaped) {
  // escaping is legal. Escape any instances of the escape char itself.
  withEscapes = str.replace("" + escape, "" + escape + escape);
  // if the encloser is the same character as the escape char, it has
  // just been escaped and must not be escaped a second time.
  alreadyEscaped = (escape == enclose);
} else {
  // no need to double-escape
  withEscapes = str;
}

// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped.
if (escapingLegal && !alreadyEscaped) {
  withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}
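
The intended effect of the change can be checked with a standalone sketch that skips the second replacement only when the enclosing and escape characters are the same. This is an illustration of the idea, not a patch against Sqoop: the class name EscapeGuardSketch and the method formatWithGuard are hypothetical, and it assumes that escaping is legal, that escape and enclose are char values, and that the field is always enclosed.

```java
// Hypothetical sketch of the proposed guard; not Sqoop code.
final class EscapeGuardSketch {

  static String formatWithGuard(String str, char escape, char enclose) {
    // Escape every instance of the escape character itself.
    String withEscapes = str.replace("" + escape, "" + escape + escape);
    // Escape the encloser only when it differs from the escape character;
    // otherwise the step above has already escaped it.
    if (enclose != escape) {
      withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
    }
    return "" + enclose + withEscapes + enclose;
  }

  public static void main(String[] args) {
    // Same character for both roles: the quote is escaped exactly once.
    System.out.println(formatWithGuard("test field \"", '"', '"'));
    // prints "test field """

    // Different characters: both the backslash and the quote are escaped.
    System.out.println(formatWithGuard("a\\b\"c", '\\', '"'));
    // prints "a\\b\"c"
  }
}
```

Comparing the two characters directly, rather than tracking a flag that is set whenever escaping is legal, keeps the encloser escaped in the common case where the escape character is a backslash.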

 

  was:
When the enclosed-by and escaped-by characters are both the double quote (\"), the quote is escaped twice, so every double quote inside a field is duplicated in the output.

Example:

gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4 --class=org.apache.sqoop.Sqoop --jars=$libs -- import -Dmapreduce.job.user.classpath.first=true --connect=jdbc:**** --target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile

 

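This causes the field <test field "  >

to be enclosed and escaped as <"test field """"">

which contains two escaped double quotes instead of one, because the quote is escaped once as the escape character and again as the enclosing character.

BigQuery requires the double quote as the escape character, and fields must also be enclosed in double quotes so that embedded newlines are handled.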

 


> if enclosed-by and escaped-by characters are both double quote (\"). This causes duplicate escapes and thus duplicate characters in double quotes
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3480
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3480
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Samet Karadag
>            Priority: Blocker
>


