Posted to user@crunch.apache.org by Nathan Schile <na...@gmail.com> on 2014/05/21 00:21:53 UTC

DataBaseSource configuration issue

I am having trouble using the DataBaseSource class from crunch-contrib. I
am using version 0.8.2+32-cdh4.4.0 of crunch-contrib and 2.0.0-mr1-cdh4.4.0
of hadoop-core. DataBaseSource sets the property
"mapreduce.jdbc.driver.class" on the Hadoop configuration [1] to specify
the JDBC driver to use, but DBConfiguration#getConnection [2] reads the
property "mapred.jdbc.driver.class" to retrieve the driver class when it
opens a connection to the database. Because of this mismatch, the
connection is never established. I would have expected DataBaseSource to
use the "mapred.jdbc.driver.class" property, since MR1 is being used. I
decompiled the crunch-contrib:2.0.0-mr1-cdh4.4.0 jar using [3], looked at
the DataBaseSource class, and confirmed that it uses
"mapreduce.jdbc.driver.class". This makes me think that
crunch-contrib:2.0.0-mr1-cdh4.4.0 was compiled against a hadoop-core
version other than 2.0.0-mr1-cdh4.4.0. Has anyone run into this issue
before? Thanks.


[1]
https://github.com/apache/crunch/blob/master/crunch-contrib/src/main/java/org/apache/crunch/contrib/io/jdbc/DataBaseSource.java#L55

[2]
https://repository.cloudera.com/cloudera/public/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.4.0/

[3] http://jd.benow.ca/

Re: DataBaseSource configuration issue

Posted by Micah Whitacre <mk...@gmail.com>.
As a workaround, you should be able to use the Source.inputConf(...) [1]
method to manually set the additional property under the name that
DBConfiguration actually reads.

[1]
http://crunch.apache.org/apidocs/0.8.2/org/apache/crunch/Source.html#inputConf(java.lang.String, java.lang.String)
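
Here is a minimal sketch of the mismatch and of the workaround. It uses
java.util.Properties as a stand-in for the Hadoop Configuration, which is an
assumption made so that the example runs without Hadoop or Crunch on the
classpath. The class name, the method names, and the "org.h2.Driver" driver
name are hypothetical placeholders. Only the two property names come from
this thread.

```java
import java.util.Properties;

public class DriverKeyWorkaround {
    // Property name written by DataBaseSource, as reported in this thread.
    public static final String NEW_KEY = "mapreduce.jdbc.driver.class";
    // Property name read by the MR1 DBConfiguration, as reported in this thread.
    public static final String OLD_KEY = "mapred.jdbc.driver.class";

    // Stand-in for what DataBaseSource does: it stores the driver under NEW_KEY only.
    public static Properties configureLikeSource(String driverClass) {
        Properties conf = new Properties();
        conf.setProperty(NEW_KEY, driverClass);
        return conf;
    }

    // Stand-in for the MR1 lookup: it reads OLD_KEY and gets null if that key is unset.
    public static String readLikeMr1(Properties conf) {
        return conf.getProperty(OLD_KEY);
    }

    // The workaround: also store the driver under the key that MR1 reads.
    // With Crunch, the analogous call would be
    // source.inputConf("mapred.jdbc.driver.class", driverClass).
    public static void applyWorkaround(Properties conf, String driverClass) {
        conf.setProperty(OLD_KEY, driverClass);
    }

    public static void main(String[] args) {
        String driver = "org.h2.Driver"; // hypothetical driver class name
        Properties conf = configureLikeSource(driver);
        System.out.println("before workaround: " + readLikeMr1(conf));
        applyWorkaround(conf, driver);
        System.out.println("after workaround: " + readLikeMr1(conf));
    }
}
```

Before the workaround the lookup finds nothing, because only the
"mapreduce." key is populated. Setting the "mapred." key as well leaves both
names in the configuration, so either version of the reader can find the
driver.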


On Tue, May 20, 2014 at 6:06 PM, Josh Wills <jw...@cloudera.com> wrote:

> crunch-contrib was likely compiled against the Hadoop 2.0.0 APIs, not the
> MR1 APIs. I didn't realize there was a difference between the two in the
> value of that property, which is certainly my bad. I rarely (ever?) read
> anything from databases as part of MR jobs, and hadn't run into that one
> before.

Re: DataBaseSource configuration issue

Posted by Josh Wills <jw...@cloudera.com>.
crunch-contrib was likely compiled against the Hadoop 2.0.0 APIs, not the
MR1 APIs. I didn't realize there was a difference between the two in the
value of that property, which is certainly my bad. I rarely (ever?) read
anything from databases as part of MR jobs, and hadn't run into that one
before.





-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>