Posted to dev@sqoop.apache.org by shakun grover <s2...@gmail.com> on 2014/09/24 08:00:18 UTC

Sqoop2 - Sequence Output File has (null) appended with the original values

Hi All,

Whenever I run a Sqoop2 import with the following job configuration:

Name: test

Database configuration

Schema name: test
Table name: emp
Table SQL statement:
Table column names: name,id
Partition column name: id
Nulls in partition column: true
Boundary query:

Output configuration

Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 1
Output directory: /tmp/Seq1/1

Throttling resources

Extractors:
Loaders:
Job was successfully updated with status FINE

It gives me the following output file:
'Tom',1 (null)
''Blue',2 (null)
'James',3 (null)
'Tom',4 (null)
'Erik',5 (null)


I would like to know why (null) is appended to each record in the output sequence file.

Any help will be highly appreciated.

Thanks in advance!!

-- 
Thanks & Regards,
Shakun Grover

Re: Sqoop2 - Sequence Output File has (null) appended with the original values

Posted by Gwen Shapira <gs...@cloudera.com>.
I don't have a good work-around at the moment, but can you open a Jira? I
believe we can and should fix it (by grabbing the PK from the DB and using
it as a key in a sequence file).
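
As a rough illustration of the idea above (not an existing Sqoop2 feature), the sketch below pairs each row with its primary-key value, so that the key slot holds the PK and the value slot holds the whole row. The names `format_field` and `to_key_value`, the column list, and the single-quoting of strings are assumptions made for this example only.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def format_field(value: Any) -> str:
    """Render one column: strings in single quotes, everything else via str()."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def to_key_value(
    rows: Iterable[Sequence[Any]],
    columns: Sequence[str],
    pk_column: str,
) -> Iterator[Tuple[Any, str]]:
    """Yield (primary key, serialized row) pairs for a key-value file format."""
    pk_index = columns.index(pk_column)  # raises ValueError if the PK column is missing
    for row in rows:
        yield row[pk_index], ",".join(format_field(v) for v in row)


# Hypothetical rows shaped like the 'emp' table from this thread (name, id).
pairs = list(to_key_value([("Tom", 1), ("Erik", 5)], ["name", "id"], "id"))
```

With records laid out this way, a generic viewer would print a meaningful key and value instead of a row followed by a null.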

On Wed, Sep 24, 2014 at 10:12 PM, shakun grover <s2...@gmail.com> wrote:

> Thanks for your reply, Jarcec.
> Yes, I am using a generic tool (hadoop dfs -text) to view the output.
> Could you please tell me how I can avoid the 'value' field in the
> sequence file? I am using the GenericJdbcConnector to import data from
> an RDBMS to HDFS through Sqoop2.

Re: Sqoop2 - Sequence Output File has (null) appended with the original values

Posted by shakun grover <s2...@gmail.com>.
Thanks for your reply, Jarcec.
Yes, I am using a generic tool (hadoop dfs -text) to view the output.
Could you please tell me how I can avoid the 'value' field in the
sequence file? I am using the GenericJdbcConnector to import data from
an RDBMS to HDFS through Sqoop2.




-- 
Thanks & Regards,
Shakun Grover

Re: Sqoop2 - Sequence Output File has (null) appended with the original values

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Shakun,
SequenceFile always contains key-value pairs; that is how the format is defined. However, this doesn’t suit Sqoop, as we treat the entire row as the “key” and hence don’t use the “value” field, and that is the null you’re observing. If you use a generic tool such as hadoop dfs -text, you will get generic output that includes the value field and hence shows a null string. Simply don’t use the “value” field in your application and you will be good to go!

Jarcec
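
One way to act on the advice above when working with text dumped by a generic tool is to keep only the key part of each line. The sketch below is a minimal example under two assumptions: the dump prints each record as key, separator, value on one line, and the separator is a tab (pass a different `separator` if your output differs). The function name `keys_only` is made up for this example and is not part of Sqoop or Hadoop.

```python
from typing import Iterable, Iterator


def keys_only(lines: Iterable[str], separator: str = "\t") -> Iterator[str]:
    """Yield the key part of each 'key<separator>value' line, dropping the value."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the LAST separator so separators inside the row itself survive.
        key, found, _value = line.rpartition(separator)
        # If the separator is absent, keep the line unchanged.
        yield key if found else line


# Sample lines shaped like the output reported in this thread (tab assumed).
sample = ["'Tom',1\t(null)\n", "'James',3\t(null)\n", "'Erik',5\t(null)\n"]
rows = list(keys_only(sample))
# rows == ["'Tom',1", "'James',3", "'Erik',5"]
```

The same function can be fed `sys.stdin` to filter piped output line by line.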
