Posted to user@spark.apache.org by Corey Nolet <cj...@gmail.com> on 2014/11/05 23:49:54 UTC

Configuring custom input format

I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD.
Creating the new RDD works fine, but setting up the configuration via the
static methods on input formats that require a Hadoop Job object is proving
to be difficult.

Trying to new up my own Job object with SparkContext.hadoopConfiguration
throws the exception at line 283 of this grepcode:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job

Looking in the SparkContext code, I see that it's newing up Job objects
just fine using nothing but the configuration, and SparkContext.textFile()
works for me. Any ideas? Has anyone else run into this? Would it be
possible to have a method like SparkContext.getJob() or something similar?

Thanks.
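
For reference, here is a minimal sketch of the pattern described above,
assuming sc is a live SparkContext (as in the shell), with Hadoop's
TextInputFormat standing in for the custom format and an illustrative input
path. As an application this should compile and run; the shell-specific
trouble is described in the replies below:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

    // Build a Job purely to carry configuration; it is never submitted.
    val job = Job.getInstance(sc.hadoopConfiguration)
    FileInputFormat.setInputPaths(job, "/path/to/input")  // a static method that needs a Job
    val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text])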

Re: Configuring custom input format

Posted by Matei Zaharia <ma...@gmail.com>.
Yeah, unfortunately that will be up to them to fix, though it wouldn't hurt to send them a JIRA mentioning this.

Matei

> On Nov 25, 2014, at 2:58 PM, Corey Nolet <cj...@gmail.com> wrote:
> 
> I was wiring up my job in the shell while I was learning Spark/Scala. I'm getting more comfortable with them both now, so I've mostly been testing through IntelliJ with mock data as inputs.
> 
> I think the problem lies more with Hadoop than with Spark, as the Job object checks its state and throws an exception when toString() is called before the job has actually been submitted.

Re: Configuring custom input format

Posted by Corey Nolet <cj...@gmail.com>.
I was wiring up my job in the shell while I was learning Spark/Scala. I'm
getting more comfortable with them both now, so I've mostly been testing
through IntelliJ with mock data as inputs.

I think the problem lies more with Hadoop than with Spark, as the Job object
checks its state and throws an exception when toString() is called before
the job has actually been submitted.
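
To illustrate the failure mode (a sketch, assuming sc is in scope as in the
spark-shell):

    import org.apache.hadoop.mapreduce.Job

    // Fine in compiled code, but the shell immediately prints the bound
    // value via job.toString(), and Hadoop's Job.toString() checks that the
    // job is in the RUNNING state -- so it throws IllegalStateException for
    // a Job that was never submitted.
    val job = Job.getInstance(sc.hadoopConfiguration)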

On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> How are you creating the object in your Scala shell? Maybe you can write a
> function that directly returns the RDD, without assigning the object to a
> temporary variable.
>
> Matei

Re: Configuring custom input format

Posted by Matei Zaharia <ma...@gmail.com>.
How are you creating the object in your Scala shell? Maybe you can write a function that directly returns the RDD, without assigning the object to a temporary variable.

Matei
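
A sketch of that approach, with TextInputFormat standing in for the custom
format; the Job only ever exists inside the function body, so the shell
never tries to print it:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Only the returned RDD reaches the REPL's printer.
    def customRdd(sc: SparkContext, path: String): RDD[(LongWritable, Text)] = {
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.setInputPaths(job, path)
      sc.newAPIHadoopRDD(job.getConfiguration,
        classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }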

> On Nov 5, 2014, at 2:54 PM, Corey Nolet <cj...@gmail.com> wrote:
> 
> Looking more closely at the stack trace in the Scala shell, it appears to be the call to toString() that is causing the construction of the Job object to fail. Is there a way to suppress this output, since it appears to be hindering my ability to new up this object?


Re: Configuring custom input format

Posted by Corey Nolet <cj...@gmail.com>.
Looking more closely at the stack trace in the Scala shell, it appears to be
the call to toString() that is causing the construction of the Job object to
fail. Is there a way to suppress this output, since it appears to be
hindering my ability to new up this object?
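
One way around the printing, sketched below: keep the Job inside a block so
only the resulting RDD is ever a top-level value (the Scala REPL's :silent
command, which toggles automatic result printing, is another option):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // job never escapes the block, so the REPL never calls toString() on it;
    // TextInputFormat stands in for the custom format here.
    val rdd = {
      val job = Job.getInstance(sc.hadoopConfiguration)
      sc.newAPIHadoopRDD(job.getConfiguration,
        classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }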


Re: Configuring custom input format

Posted by Harihar Nahak <hn...@wynyardgroup.com>.
Hi,

I'm trying to make a custom input format for CSV files. If you can share a
little more about what you read as input and what you have implemented, I'll
try to replicate the same thing. If I find anything interesting on my end,
I'll let you know.

Thanks,
Harihar
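
For reference, a minimal skeleton of a custom input format (the
CsvInputFormat name is hypothetical; it reuses Hadoop's LineRecordReader, so
real CSV concerns like quoted fields and embedded newlines would need a
custom RecordReader instead):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
    import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, LineRecordReader}

    // One record per line: key = byte offset in the file, value = the raw line.
    class CsvInputFormat extends FileInputFormat[LongWritable, Text] {
      override def createRecordReader(split: InputSplit,
          context: TaskAttemptContext): RecordReader[LongWritable, Text] =
        new LineRecordReader()
    }

It can then be read from Spark with sc.newAPIHadoopFile(path,
classOf[CsvInputFormat], classOf[LongWritable], classOf[Text]).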




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org