Posted to user@sqoop.apache.org by Marc Sturm <ma...@nyp.org> on 2012/04/11 18:28:42 UTC

loading as sequencefile and running an hadoop mapreduce job

Hi,

I am new to Hadoop and Sqoop. So far I have been able to run a single-node Hadoop cluster on my Mac, and I am trying to load data from SQL Server using Sqoop 1.3 and Microsoft's Sqoop connector.
The data is stored in a varbinary column (though it is really a text blob), and I am loading it into Hadoop with Sqoop using the --as-sequencefile option. I believe the resulting HDFS file is a serialization of Java objects of a class that Sqoop generates automatically; the class is named after the table and extends SqoopRecord. Let's call it table_name.java. The import completed successfully.

Now I am trying to run a MapReduce job against this file, but it is failing. I added the table_name class to my jar, but when I run the MapReduce job I get "ClassNotFoundException: com.cloudera.sqoop.lib.SqoopRecord",
even with the option -libjars sqoop-1.3.0.jar.

I hope all this makes sense to you. If you can help me understand what the problem is, or point me to the right documentation, that would be great.

Thanks,
Marc


________________________________
This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.



RE: loading as sequencefile and running an hadoop mapreduce job

Posted by Marc Sturm <ma...@nyp.org>.
Thanks for your help, Cheolsoo. I added the Sqoop jar to Hadoop's lib directory, and now my MapReduce job runs fine.
Marc
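
For anyone hitting the same ClassNotFoundException, it can help to confirm whether com.cloudera.sqoop.lib.SqoopRecord is visible on a given classpath and, if so, which jar it comes from. The following is a minimal sketch using only the JDK; the class name ClasspathCheck and its locate method are made up for illustration and are not part of Sqoop or Hadoop. Note also that -libjars is a generic Hadoop option that is only honored when the job driver parses its arguments through GenericOptionsParser (for example via ToolRunner); the thread does not say whether that applied here.

```java
import java.security.CodeSource;

// Hypothetical helper: reports whether a class can be loaded by the
// current JVM and where it was loaded from.
public class ClasspathCheck {

    // Returns a description of where the class was loaded from,
    // or null if the class cannot be loaded on this classpath.
    public static String locate(String className) {
        try {
            // 'false' avoids running static initializers; we only
            // want to know whether the class is visible.
            Class<?> cls = Class.forName(
                className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null) {
                // Typical for classes that ship with the JDK itself.
                return "(no code source)";
            }
            return String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message from this thread.
        String name = args.length > 0
            ? args[0]
            : "com.cloudera.sqoop.lib.SqoopRecord";
        String where = locate(name);
        if (where == null) {
            System.out.println(name + " is NOT loadable on this classpath");
        } else {
            System.out.println(name + " loaded from " + where);
        }
    }
}
```

Run it with the same classpath as the failing job, optionally passing a class name as the only argument. It inspects only the JVM it runs in, so it says nothing about the classpath of task JVMs on other nodes of a multi-node cluster.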




Re: loading as sequencefile and running an hadoop mapreduce job

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Marc,

I was totally wrong about sequence files in my previous email. In fact, I
realized that SqoopRecord is needed by MR jobs to deserialize sequence
files. Again, I am sorry for the confusion.

Thanks,
Cheolsoo
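
The reason the generated class and SqoopRecord must be on the job's classpath is that a sequence file's header stores the names of its key and value classes, and the reader loads those classes by name before it can deserialize any record. The toy program below imitates that mechanism with plain JDK streams. It is an illustration under simplifying assumptions, not Hadoop's actual SequenceFile format or API; HeaderClassDemo, ToyRecord, and the method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of a container whose header names the record class,
// loosely analogous to the value class name in a sequence file header.
public class HeaderClassDemo {

    // Hypothetical record type standing in for a Sqoop-generated class.
    public static class ToyRecord {
        public String text = "";

        public void readFields(DataInputStream in) throws IOException {
            text = in.readUTF();
        }
    }

    // Writes a header (the record class name) followed by one payload.
    public static byte[] write(String className, String payload)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(className);
        out.writeUTF(payload);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the header, loads the named class reflectively, and lets the
    // new instance deserialize itself. Class.forName throws
    // ClassNotFoundException if the named class is not on the classpath.
    public static Object read(byte[] data) throws Exception {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(data));
        String className = in.readUTF();
        Class<?> cls = Class.forName(className);
        Object record = cls.getDeclaredConstructor().newInstance();
        if (record instanceof ToyRecord) {
            ((ToyRecord) record).readFields(in);
        }
        return record;
    }

    public static void main(String[] args) throws Exception {
        // Class is available: the record deserializes normally.
        byte[] good = write(ToyRecord.class.getName(), "hello");
        System.out.println(((ToyRecord) read(good)).text);

        // Class is missing: reading fails before any data is decoded.
        byte[] bad = write("no.such.GeneratedRecord", "hello");
        try {
            read(bad);
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

The second case mirrors the situation in this thread: the data itself is intact, but the reader cannot proceed without the class named in the header and everything that class extends.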


Re: loading as sequencefile and running an hadoop mapreduce job

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Marc,

It would be helpful if you could provide the complete log with the --verbose
option on.

> I believe the result in the hdfs file is a serialization of the java object
> of a class generated automatically by sqoop, the class name is the table
> name and extends SqoopRecord: let’s call it table_name.java .


A serialization of 'table_name' is not the result.  The auto-generated Java
class is only for Sqoop to interface with the DB. The result is sequence
files that contain data.

> Now, I am trying to run a MapReduce job against this file but it is
> failing, I added the class table_name.java in my jar. But when I run the
> mapreduce job, I get “ClassNotFoundException:
> com.cloudera.sqoop.lib.SqoopRecord”. Even with the option -libjars
> sqoop-1.3.0.jar.

I am not clear what MR jobs you're running here.

1) If you're importing data, I am wondering why you have to do this
manually, since Sqoop should do it automatically: compile table_name
into a jar, load the jar into HDFS, pass the path to the jar to the import
mapper jobs, etc.

2) If you're running your own MR jobs on imported data, they don't need to
know about 'table_name' or 'SqoopRecord' since the data are already in
sequence file format, so your MR jobs should be able to understand them.

Hope this is helpful.

Thanks,
Cheolsoo
