Posted to mapreduce-user@hadoop.apache.org by KayVajj <va...@gmail.com> on 2015/07/29 19:37:02 UTC

Fwd: Sqoop Codegen Null String

Sorry, the earlier email I sent didn't show up when I searched for it, so
I'm resending.

Hi,

I have a question about Sqoop codegen. I'm trying to load data from a
DB, and I used the codegen tool to generate the Java code. I wanted
null strings and null non-strings to be treated as

--null-string '\\N'
--null-non-string '\\N'


The generated code looks like this (an excerpt from the
__loadFromFields method of the generated class):

    __cur_str = __it.next();
    if (__cur_str.equals("null")) { this.org_id = null; } else {
      this.org_id = __cur_str;
    }

I was wondering why, even with these options explicitly provided, the
generated code still treats the string "null" as the null value, as if I had
not provided them. After browsing the code, I found the following in
org.apache.sqoop.orm.ClassWriter:

  private void parseNullVal(String javaType, String colName,
      StringBuilder sb) {
    if (javaType.equals("String")) {
      sb.append("    if (__cur_str.equals(\""
          + this.options.getInNullStringValue() + "\")) { this.");
      sb.append(colName);
      sb.append(" = null; } else {\n");
    } else {
      sb.append("    if (__cur_str.equals(\""
          + this.options.getInNullNonStringValue());
      sb.append("\") || __cur_str.length() == 0) { this.");
      sb.append(colName);
      sb.append(" = null; } else {\n");
    }
  }
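To make the effect of that method concrete, here is a minimal standalone sketch that mirrors its two branches. It is not Sqoop code: the class name NullCheckSketch and the method emitNullCheck are hypothetical, and the two option values are passed in as plain parameters instead of being read from this.options. The value "null" in the first call is taken from the generated code shown earlier, where no input option was set.

```java
// Sketch only: mirrors the String and non-String branches of the
// parseNullVal snippet quoted above. Names here are hypothetical.
final class NullCheckSketch {

    // Returns the line of Java source the generator would emit for one column.
    static String emitNullCheck(String javaType, String colName,
                                String inNullString, String inNullNonString) {
        StringBuilder sb = new StringBuilder();
        if (javaType.equals("String")) {
            sb.append("    if (__cur_str.equals(\"" + inNullString + "\")) { this.");
        } else {
            // Non-string columns also treat an empty field as null.
            sb.append("    if (__cur_str.equals(\"" + inNullNonString
                + "\") || __cur_str.length() == 0) { this.");
        }
        sb.append(colName);
        sb.append(" = null; } else {\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // No input option set: the emitted check compares against "null".
        System.out.print(emitNullCheck("String", "org_id", "null", "null"));
        // Input options set so that the emitted source contains "\\N",
        // which compares against a literal \N at runtime.
        System.out.print(emitNullCheck("String", "org_id", "\\\\N", "\\\\N"));
        // org_count is a hypothetical non-string column.
        System.out.print(emitNullCheck("Integer", "org_count", "\\\\N", "\\\\N"));
    }
}
```

Only the input-side values appear anywhere in this method, which is why changing --null-string alone leaves the emitted check untouched.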

This tells me that __loadFromFields will only be generated correctly if I
set the options below:

--input-null-string '\\N'
--input-null-non-string '\\N'

My understanding is that these values are meant to be set only when writing
to the DB, not when reading from it. I'm not writing to the DB, yet I ended
up setting both sets of options, which resulted in the following code in the
__loadFromFields method of the newly generated class:

    __cur_str = __it.next();
    if (__cur_str.equals("\\N")) { this.org_id = null; } else {
      this.org_id = __cur_str;
    }

Is this a bug?

Thanks

Kay

Re: Sqoop Codegen Null String

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey man,

--null-string and --null-non-string are used when serializing for writing
to Hadoop. Check out these two places in ClassWriter:
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/orm/ClassWriter.java#L360
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/orm/ClassWriter.java#L1326
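As a rough illustration of the two directions discussed in this thread, the sketch below separates the write side (the token substituted for a null field when a record is serialized) from the read side (the token that __loadFromFields compares against). This is a hand-written stand-in, not generated Sqoop code: the class NullTokenRoundTrip and its methods serialize and parse are hypothetical names.

```java
// Sketch only: a hand-written stand-in for the null handling in a
// generated record class. All names here are hypothetical.
final class NullTokenRoundTrip {

    // Write side: a null field is replaced by the output token
    // (the role played by --null-string in this thread).
    static String serialize(String orgId, String outNull) {
        return orgId == null ? outNull : orgId;
    }

    // Read side: a field equal to the input token becomes null again
    // (the role played by --input-null-string in this thread).
    static String parse(String field, String inNull) {
        return field.equals(inNull) ? null : field;
    }

    public static void main(String[] args) {
        // The Java literal "\\N" is the two characters \N.
        String written = serialize(null, "\\N");

        // Read side left at "null": the \N token is kept as ordinary text.
        System.out.println(parse(written, "null") == null); // prints false

        // Read side set to the same token: the null survives the round trip.
        System.out.println(parse(written, "\\N") == null);  // prints true
    }
}
```

The point of the sketch is only that the two tokens are independent, so a record written with one token is read back as null only when the read side is configured with the same token.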

-Abe
