Posted to user@sqoop.apache.org by Amir Mohammad Saied <am...@gmail.com> on 2013/12/16 14:42:55 UTC

Specifying row key in sqoop-import

Hi,

I'm using Sqoop to import (only one column of) a table from MySQL to HDFS.
I'd like records to be stored as SequenceFiles so I can run Mahout's
"seq2sparse" to generate Vectors from them later.

I have two questions about the import process:

1) When dumping the SequenceFiles generated by sqoop-import, I noticed that the row
"Key" is generated automatically by Sqoop and is not the "id" column of
the MySQL table row. Can I ask sqoop-import to use the row's "id" field as
the Key?

2) If it's possible to set the row "Key" (see the question above), can I cast it to a
specific class using sqoop-import?

Thanks,

amir

Re: Specifying row key in sqoop-import

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Amir,
Sqoop generates a special class when importing a table (even one with only one column) and uses this class as the key for the SequenceFile. I'm not familiar with Mahout, so I'm not sure whether it can consume this format.

Jarcec
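If sqoop-import cannot be told to use the "id" column as the key, one possible workaround is a post-processing step that re-keys each imported record by its id. The sketch below models only that re-keying step in plain Java, without Hadoop. `ImportedRow` is a hypothetical stand-in for the Sqoop-generated record class, the field names `id` and `body` are assumptions, and the String keys reflect the assumption that seq2sparse wants text keys. A real job would read and write SequenceFiles through the Hadoop API rather than an in-memory list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RekeyById {
    // Hypothetical stand-in for the class Sqoop generates for the table.
    // The field names "id" and "body" are assumptions for illustration.
    static final class ImportedRow {
        final long id;
        final String body;

        ImportedRow(long id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    // Re-key rows by their "id" column, rendered as a String key.
    // Insertion order is preserved; a later row with a duplicate id
    // overwrites the earlier one.
    static Map<String, String> rekeyById(List<ImportedRow> rows) {
        Map<String, String> out = new LinkedHashMap<>();
        for (ImportedRow row : rows) {
            out.put(Long.toString(row.id), row.body);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ImportedRow> rows = List.of(
                new ImportedRow(42L, "first document"),
                new ImportedRow(7L, "second document"));
        // Prints each id-keyed pair as "key<TAB>value".
        rekeyById(rows).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Running `main` prints `42	first document` followed by `7	second document`. Because duplicate ids collapse to the last row seen, it is worth checking that "id" is unique in the source table before relying on this approach.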
