Posted to dev@sqoop.apache.org by "Samet Karadag (Jira)" <ji...@apache.org> on 2020/08/26 11:54:00 UTC
[jira] [Updated] (SQOOP-3480) If the enclosed-by and escaped-by
characters are both double quote (\"), values are escaped twice and
double quotes inside fields are duplicated
[ https://issues.apache.org/jira/browse/SQOOP-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Samet Karadag updated SQOOP-3480:
---------------------------------
Description:
If the enclosed-by and escaped-by characters are both the double quote (\"), values are escaped twice, so each double quote inside a field comes out as four double quotes instead of two.
Example:
gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4 --class=org.apache.sqoop.Sqoop --jars=$libs -- import -Dmapreduce.job.user.classpath.first=true --connect=jdbc:**** --target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile
This turns the field <test field " >
into this enclosed and escaped value: <"test field """"">
in which the embedded double quote has been doubled twice.
BigQuery requires the double quote as the escape character, and fields must also be enclosed in " so that they can contain newlines.
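The doubling can be reproduced outside Sqoop with a small standalone sketch. It is not the Sqoop class itself: the class name EscapeRepro and the method escapeAndEnclose are hypothetical, and the method only mirrors the two replace calls quoted from FieldFormatter.java in this report, assuming escaping is legal and the field is always enclosed.

```java
// Standalone reproduction; EscapeRepro and escapeAndEnclose are hypothetical
// names, not part of Sqoop.
class EscapeRepro {
    // Mirrors the two replace calls quoted from FieldFormatter.java,
    // assuming escaping is legal and the field is always enclosed.
    static String escapeAndEnclose(String str, char escape, char enclose) {
        // Step 1: double every escape character in the value.
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        // Step 2: prefix every enclosing character with the escape character.
        // When escape == enclose this doubles the characters from step 1 again.
        withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        return "" + enclose + withEscapes + enclose;
    }

    public static void main(String[] args) {
        // Field value from the report: test field "  (with a trailing space)
        System.out.println(escapeAndEnclose("test field \" ", '"', '"'));
    }
}
```

With both characters set to the double quote, the single embedded quote passes through both steps and ends up as four quotes inside the enclosing pair.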
The code in FieldFormatter.java should be changed from:
  if (escapingLegal) {
    // escaping is legal. Escape any instances of the escape char itself.
    withEscapes = str.replace("" + escape, "" + escape + escape);
  } else {
    // no need to double-escape
    withEscapes = str;
  }
  // if we have an enclosing character, and escaping is legal, then the
  // encloser must always be escaped.
  if (escapingLegal) {
    withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
  }
to:
  boolean alreadyEscaped = false;
  if (escapingLegal) {
    // escaping is legal. Escape any instances of the escape char itself.
    withEscapes = str.replace("" + escape, "" + escape + escape);
    // if the encloser is the same character, it has just been escaped too.
    alreadyEscaped = (escape == enclose);
  } else {
    // no need to double-escape
    withEscapes = str;
  }
  // if we have an enclosing character, and escaping is legal, then the
  // encloser must always be escaped, unless that has already happened above.
  if (escapingLegal && !alreadyEscaped) {
    withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
  }
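The intended behaviour can be checked with a standalone sketch. This is one possible shape of the fix, not a tested Sqoop patch: the names EscapeFixSketch and escapeAndEnclose are hypothetical, and the sketch assumes escaping is legal and the field is always enclosed. It skips the second replace only when the escape and enclosing characters are the same, so the encloser is still escaped when they differ.

```java
// Standalone sketch of the proposed behaviour; EscapeFixSketch and
// escapeAndEnclose are hypothetical names, not part of Sqoop.
class EscapeFixSketch {
    // Assumes escaping is legal and the field is always enclosed.
    static String escapeAndEnclose(String str, char escape, char enclose) {
        // Double every escape character already present in the value.
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        // Escape the encloser only when it is a different character; when the
        // two are the same, the step above has already escaped it.
        if (enclose != escape) {
            withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        }
        return "" + enclose + withEscapes + enclose;
    }

    public static void main(String[] args) {
        // Same character for both: the embedded quote is doubled exactly once.
        System.out.println(escapeAndEnclose("test field \" ", '"', '"'));
        // Different characters: backslash is doubled, the quote gets a backslash.
        System.out.println(escapeAndEnclose("a\\b\"c", '\\', '"'));
    }
}
```

The second call covers the case where the two characters differ, which a fix for this issue should leave unchanged.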
was:
If the enclosed-by and escaped-by characters are both the double quote (\"), values are escaped twice, so each double quote inside a field comes out as four double quotes instead of two.
Example:
gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4 --class=org.apache.sqoop.Sqoop --jars=$libs -- import -Dmapreduce.job.user.classpath.first=true --connect=jdbc:**** --target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile
This turns the field <test field " >
into this enclosed and escaped value: <"test field """"">
in which the embedded double quote has been doubled twice.
BigQuery requires the double quote as the escape character, and fields must also be enclosed in " so that they can contain newlines.
> If the enclosed-by and escaped-by characters are both double quote (\"), values are escaped twice and double quotes inside fields are duplicated
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SQOOP-3480
> URL: https://issues.apache.org/jira/browse/SQOOP-3480
> Project: Sqoop
> Issue Type: Bug
> Reporter: Samet Karadag
> Priority: Blocker