Posted to common-user@hadoop.apache.org by Shambhavi Punja <sp...@usc.edu> on 2015/04/30 19:01:08 UTC

Json Parsing in map reduce.

Hi,

I am working on an assignment using Hadoop MapReduce, and I am very new to MapReduce.

The assignment has many sections, but for now I am trying to parse JSON data.

The input (i.e. the value) to the map function is a single record of the form    xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
I am interested only in getting the frequency of value1.

Following is the map-reduce job.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.json.JSONException;
import org.json.JSONObject;

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // Split only after a closing brace followed by ", ", so the comma
        // inside 'pq1, pq2' does not break the record apart.
        String[] tuple = line.split("(?<=\\}),\\s");
        if (tuple.length < 2) {
            return; // malformed line: no record boundary found
        }
        try {
            JSONObject obj = new JSONObject(tuple[1]);
            String id = obj.getString("key");
            word.set(id);
            output.collect(word, one);
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
}
        
    
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // Sum up the 1s emitted by the mapper for this key.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
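
As a sanity check, here is a minimal standalone sketch of what the split and parse produce on the sample record (assuming the same org.json library the mapper uses; the SplitCheck class name is just for illustration):

import org.json.JSONObject;

public class SplitCheck {
    public static void main(String[] args) {
        String line = "xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}";
        // The lookbehind splits only after a "}" followed by ", ":
        String[] tuple = line.split("(?<=\\}),\\s");
        System.out.println(tuple[0]); // xyz, {'abc':'pqr1','abc2':'pq1, pq2'}
        System.out.println(tuple[1]); // {'key':'value1'}
        // org.json's tokener is lenient and accepts single-quoted strings:
        JSONObject obj = new JSONObject(tuple[1]);
        System.out.println(obj.getString("key")); // value1
    }
}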

I successfully compiled the Java code against the json and hadoop jars and created a job jar. But when I run the Hadoop command, I get the following exceptions.


15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/30 00:36:49 INFO mapred.JobClient: Running job: job_local1121514690_0001
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1121514690_0001_m_000000_0
15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
15/04/30 00:36:49 INFO mapred.MapTask: Processing split: file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 10 more
Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
	at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
	... 15 more
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 22 more
15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
15/04/30 00:36:50 INFO mapred.JobClient: Job complete: job_local1121514690_0001
15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.myorg.Wordcount.main(Wordcount.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


PS: When I modify the same code to exclude the JSON parsing, i.e. to find the frequency of the {'key':'value1'} section of the example input, everything works well.


Re: Json Parsing in map reduce.

Posted by Sandeep Khurana <sk...@gmail.com>.
I see you mentioned it is a single record in a line, so it would be fine, and it has other text data in that record too.
On May 3, 2015 1:45 AM, "Sandeep Khurana" <sk...@gmail.com> wrote:

> This code won't work if the JSON spans more than one line in the input
> files.
> On May 3, 2015 1:41 AM, "Shambhavi Punja" <sp...@usc.edu> wrote:
>
>> Hi Shahab,
>>
>> Thanks. That helped.
>>
>> Regards,
>> Shambhavi
>>
>> On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>>
>>> The reason is that the JSON parsing code is in a 3rd-party library which
>>> is not included in the default Hadoop/MapReduce distribution. You have to
>>> add it to your classpath at *runtime*. There are multiple ways to do
>>> this, depending on how you plan to run and package/deploy your code.
>>>
>>> Check out these:
>>>
>>> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>>>
>>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>>
>>> Regards,
>>> Shahab
>>>
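
For example, a minimal sketch of two of those approaches (jar names and paths below are placeholders; note that -libjars only takes effect when the driver parses its arguments with GenericOptionsParser or implements Tool, which the JobClient warning in the log above also hints at):

    # Option 1: ship the library with the job via -libjars
    # (also put it on the classpath of the client JVM that submits the job)
    export HADOOP_CLASSPATH=/path/to/json.jar
    hadoop jar wordcount.jar org.myorg.Wordcount \
        -libjars /path/to/json.jar input output

    # Option 2: bundle the dependency inside the job jar under lib/
    jar uf wordcount.jar lib/json.jar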

Re: Json Parsing in map reduce.

Posted by Sandeep Khurana <sk...@gmail.com>.
This code won't work if the JSON spans more than one line in the input
files.
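
To illustrate: with the default TextInputFormat each map() call receives one physical line, so a record wrapped across two lines arrives as fragments. A minimal standalone sketch reusing the mapper's split-and-parse logic (the wrap point is hypothetical):

import org.json.JSONException;
import org.json.JSONObject;

public class MultilineDemo {
    public static void main(String[] args) {
        // The same record as before, but broken across two input lines:
        String[] lines = {
            "xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':",
            "'value1'}"
        };
        for (String line : lines) {
            String[] tuple = line.split("(?<=\\}),\\s");
            try {
                JSONObject obj = new JSONObject(tuple[1]);
                System.out.println(obj.getString("key"));
            } catch (JSONException e) {
                System.out.println("incomplete JSON fragment: " + line);
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("no record boundary found: " + line);
            }
        }
    }
}

Neither line yields a value, so such records never reach the reducer.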
On May 3, 2015 1:41 AM, "Shambhavi Punja" <sp...@usc.edu> wrote:

> Hi Shahab,
>
> Thanks. That helped.
>
> Regards,
> Shambhavi
>
> On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> The reason is that the Json parsing code is in a 3rd party library which
>> is not included in the default  map reduce/hadoop distribution. You have to
>> add them in your classpath at *runtime*. There are multiple ways to do
>> it (which also depends upon how you plan to run and package/deploy your
>> code.)
>>
>> Check out this:
>>
>> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>>
>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>
>> Regards,
>> Shahab
>>
>> On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:
>>
>>> Hi,
>>>
>>> I am working on an assignment on Hadoop Map reduce. I am very new to Map
>>> Reduce.
>>>
>>> The assignment has many sections but for now I am trying to parse JSON
>>> data.
>>>
>>> The input(i.e. value) to the map function is a single record of the form
>>>    xyz, {'abc’:’pqr1’,'abc2’:'pq1, pq2’}, {‘key’:'value1’}
>>> I am interested only in the getting the frequency of value1.
>>>
>>> Following is the map- reduce job.
>>>
>>> public static class Map extends MapReduceBase implements
>>> Mapper<LongWritable, Text, Text, IntWritable> {
>>>               private final static IntWritable one = new IntWritable(1);
>>>               private Text word = new Text();
>>>
>>>
>>>               public void map(LongWritable key, Text value,
>>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>> IOException {
>>>                       String line = value.toString();
>>>                       String[] tuple = line.split("(?<=\\}),\\s");
>>>                       try{
>>>                       JSONObject obj = new JSONObject(tuple[1]);
>>>                       String id = obj.getString(“key");
>>>                           word.set(id);
>>>                           output.collect(word, one);
>>>                       }
>>>                       catch(JSONException e){
>>>                           e.printStackTrace();
>>>                       }
>>>                   }
>>>             }
>>>
>>>
>>>
>>>
>>>         public static class Reduce extends MapReduceBase implements
>>> Reducer<Text, IntWritable, Text, IntWritable> {
>>>               public void reduce(Text key, Iterator<IntWritable>
>>> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
>>> throws IOException {
>>>                     int sum = 0;
>>>                     while (values.hasNext()) {
>>>                           sum += values.next().get();
>>>                         }
>>>                     output.collect(key, new IntWritable(sum));
>>>                   }
>>>             }
>>>
>>> I successfully compiled the java code using the json and hadoop jars.
>>> Created a jar. But wen I run the Hadoop command I am getting the following
>>> exceptions.
>>>
>>>
>>> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not
>>> loaded
>>> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
>>> process : 1
>>> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
>>> attempt_local1121514690_0001_m_000000_0
>>> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin :
>>> null
>>> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
>>> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
>>> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
>>> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
>>> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
>>> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
>>> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
>>> java.lang.Exception: java.lang.RuntimeException: Error in configuring
>>> object
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>> ... 10 more
>>> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:344)
>>> at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
>>> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
>>> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>> ... 15 more
>>> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> ... 22 more
>>> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
>>> Exception in thread "main" java.io.IOException: Job failed!
>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>> at org.myorg.Wordcount.main(Wordcount.java:64)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>
>>>
>>> PS: When I modify the same code and exclude the JSON parsing i.e. find
>>> frequency of {‘key’:’value1’} section of the example input, all works well.
>>>
>>>
>>
>

Re: Json Parsing in map reduce.

Posted by Sandeep Khurana <sk...@gmail.com>.
This code won't work if the json spans more than one line in the input
files.
On May 3, 2015 1:41 AM, "Shambhavi Punja" <sp...@usc.edu> wrote:

> Hi Shahab,
>
> Thanks. That helped.
>
> Regards,
> Shambhavi
>
> On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> The reason is that the Json parsing code is in a 3rd party library which
>> is not included in the default  map reduce/hadoop distribution. You have to
>> add them in your classpath at *runtime*. There are multiple ways to do
>> it (which also depends upon how you plan to run and package/deploy your
>> code.)
>>
>> Check out this:
>>
>> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>>
>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>
>> Regards,
>> Shahab
>>
>> On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:
>>
>>> Hi,
>>>
>>> I am working on an assignment on Hadoop Map reduce. I am very new to Map
>>> Reduce.
>>>
>>> The assignment has many sections but for now I am trying to parse JSON
>>> data.
>>>
>>> The input(i.e. value) to the map function is a single record of the form
>>>    xyz, {'abc’:’pqr1’,'abc2’:'pq1, pq2’}, {‘key’:'value1’}
>>> I am interested only in the getting the frequency of value1.
>>>
>>> Following is the map- reduce job.
>>>
>>> public static class Map extends MapReduceBase implements
>>> Mapper<LongWritable, Text, Text, IntWritable> {
>>>               private final static IntWritable one = new IntWritable(1);
>>>               private Text word = new Text();
>>>
>>>
>>>               public void map(LongWritable key, Text value,
>>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>> IOException {
>>>                       String line = value.toString();
>>>                       String[] tuple = line.split("(?<=\\}),\\s");
>>>                       try{
>>>                       JSONObject obj = new JSONObject(tuple[1]);
>>>                       String id = obj.getString(“key");
>>>                           word.set(id);
>>>                           output.collect(word, one);
>>>                       }
>>>                       catch(JSONException e){
>>>                           e.printStackTrace();
>>>                       }
>>>                   }
>>>             }
>>>
>>>
>>>
>>>
>>>         public static class Reduce extends MapReduceBase implements
>>> Reducer<Text, IntWritable, Text, IntWritable> {
>>>               public void reduce(Text key, Iterator<IntWritable>
>>> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
>>> throws IOException {
>>>                     int sum = 0;
>>>                     while (values.hasNext()) {
>>>                           sum += values.next().get();
>>>                         }
>>>                     output.collect(key, new IntWritable(sum));
>>>                   }
>>>             }
>>>
>>> I successfully compiled the java code using the json and hadoop jars.
>>> Created a jar. But wen I run the Hadoop command I am getting the following
>>> exceptions.
>>>
>>>
>>> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not
>>> loaded
>>> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
>>> process : 1
>>> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
>>> attempt_local1121514690_0001_m_000000_0
>>> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin :
>>> null
>>> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
>>> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
>>> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
>>> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
>>> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
>>> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
>>> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
>>> java.lang.Exception: java.lang.RuntimeException: Error in configuring
>>> object
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>> ... 10 more
>>> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:344)
>>> at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
>>> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
>>> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>> ... 15 more
>>> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> ... 22 more
>>> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
>>> Exception in thread "main" java.io.IOException: Job failed!
>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>> at org.myorg.Wordcount.main(Wordcount.java:64)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>
>>>
>>> PS: When I modify the same code and exclude the JSON parsing, i.e. find the
>>> frequency of the {'key':'value1'} section of the example input, all works well.
>>>
>>>
>>
>

Re: Json Parsing in map reduce.

Posted by Sandeep Khurana <sk...@gmail.com>.
This code won't work if the JSON spans more than one line in the input
files.
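
With the default TextInputFormat, each call to map() receives exactly one
physical line of the file, so a record whose JSON wraps onto a second line
arrives as fragments that neither the split nor the parser can handle. A
minimal defensive sketch (the JsonLineMapper class name and the counter names
are illustrative; it assumes the org.json library and the record layout from
the original post) that skips such fragments instead of crashing on tuple[1]:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.json.JSONException;
import org.json.JSONObject;

// Hypothetical defensive variant of the mapper from the original post.
public class JsonLineMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // TextInputFormat hands map() one physical line per call, so a JSON
        // object that spans lines arrives here as unparseable fragments.
        String[] tuple = value.toString().split("(?<=\\}),\\s");
        if (tuple.length < 2) {
            // Fragment of a multi-line record: count it and move on instead
            // of throwing ArrayIndexOutOfBoundsException on tuple[1].
            reporter.incrCounter("JsonLineMapper", "MALFORMED_LINES", 1);
            return;
        }
        try {
            JSONObject obj = new JSONObject(tuple[1]);
            word.set(obj.getString("key"));
            output.collect(word, one);
        } catch (JSONException e) {
            reporter.incrCounter("JsonLineMapper", "PARSE_ERRORS", 1);
        }
    }
}

If the records genuinely span lines, the cleaner fixes are to pre-flatten the
input to one record per line or to plug in an input format that understands
the record boundaries; the sketch above only keeps the job from dying on the
fragments.
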
On May 3, 2015 1:41 AM, "Shambhavi Punja" <sp...@usc.edu> wrote:

> Hi Shahab,
>
> Thanks. That helped.
>
> Regards,
> Shambhavi
>
> On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> The reason is that the JSON parsing code is in a third-party library which
>> is not included in the default MapReduce/Hadoop distribution. You have to
>> add it to your classpath at *runtime*. There are multiple ways to do
>> it (which also depend on how you plan to run and package/deploy your
>> code.)
>>
>> Check these out:
>>
>> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>>
>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>
>> Regards,
>> Shahab
>>
>> On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:
>>
>>> Hi,
>>>
>>> I am working on an assignment on Hadoop Map reduce. I am very new to Map
>>> Reduce.
>>>
>>> The assignment has many sections but for now I am trying to parse JSON
>>> data.
>>>
>>> The input (i.e. the value) to the map function is a single record of the form
>>>    xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
>>> I am interested only in getting the frequency of value1.
>>>
>>> Following is the map-reduce job.
>>>
>>> public static class Map extends MapReduceBase implements
>>> Mapper<LongWritable, Text, Text, IntWritable> {
>>>               private final static IntWritable one = new IntWritable(1);
>>>               private Text word = new Text();
>>>
>>>
>>>               public void map(LongWritable key, Text value,
>>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>> IOException {
>>>                       String line = value.toString();
>>>                       String[] tuple = line.split("(?<=\\}),\\s");
>>>                       try{
>>>                       JSONObject obj = new JSONObject(tuple[1]);
>>>                       String id = obj.getString("key");
>>>                           word.set(id);
>>>                           output.collect(word, one);
>>>                       }
>>>                       catch(JSONException e){
>>>                           e.printStackTrace();
>>>                       }
>>>                   }
>>>             }
>>>
>>>
>>>
>>>
>>>         public static class Reduce extends MapReduceBase implements
>>> Reducer<Text, IntWritable, Text, IntWritable> {
>>>               public void reduce(Text key, Iterator<IntWritable>
>>> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
>>> throws IOException {
>>>                     int sum = 0;
>>>                     while (values.hasNext()) {
>>>                           sum += values.next().get();
>>>                         }
>>>                     output.collect(key, new IntWritable(sum));
>>>                   }
>>>             }
>>>
>>> I successfully compiled the Java code using the json and hadoop jars and
>>> created a jar. But when I run the Hadoop command I get the following
>>> exceptions.
>>>
>>>
>>> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not
>>> loaded
>>> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
>>> process : 1
>>> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
>>> attempt_local1121514690_0001_m_000000_0
>>> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin :
>>> null
>>> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
>>> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
>>> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
>>> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
>>> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
>>> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
>>> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
>>> java.lang.Exception: java.lang.RuntimeException: Error in configuring
>>> object
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>> at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>> ... 10 more
>>> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:344)
>>> at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
>>> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
>>> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>> ... 15 more
>>> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> ... 22 more
>>> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
>>> job_local1121514690_0001
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
>>> Exception in thread "main" java.io.IOException: Job failed!
>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>> at org.myorg.Wordcount.main(Wordcount.java:64)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:483)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>
>>>
>>> PS: When I modify the same code and exclude the JSON parsing, i.e. find the
>>> frequency of the {'key':'value1'} section of the example input, all works well.
>>>
>>>
>>
>

Re: Json Parsing in map reduce.

Posted by Shambhavi Punja <sp...@usc.edu>.
Hi Shahab,

Thanks. That helped.

Regards,
Shambhavi

On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> The reason is that the JSON parsing code is in a third-party library which
> is not included in the default MapReduce/Hadoop distribution. You have to
> add it to your classpath at *runtime*. There are multiple ways to do it
> (which also depend on how you plan to run and package/deploy your code.)
>
> Check these out:
>
> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>
> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> Regards,
> Shahab
>
> On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:
>
>> Hi,
>>
>> I am working on an assignment on Hadoop Map reduce. I am very new to Map
>> Reduce.
>>
>> The assignment has many sections but for now I am trying to parse JSON
>> data.
>>
>> The input (i.e. the value) to the map function is a single record of the form
>>    xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
>> I am interested only in getting the frequency of value1.
>>
>> Following is the map-reduce job.
>>
>> public static class Map extends MapReduceBase implements
>> Mapper<LongWritable, Text, Text, IntWritable> {
>>               private final static IntWritable one = new IntWritable(1);
>>               private Text word = new Text();
>>
>>
>>               public void map(LongWritable key, Text value,
>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>> IOException {
>>                       String line = value.toString();
>>                       String[] tuple = line.split("(?<=\\}),\\s");
>>                       try{
>>                       JSONObject obj = new JSONObject(tuple[1]);
>>                       String id = obj.getString("key");
>>                           word.set(id);
>>                           output.collect(word, one);
>>                       }
>>                       catch(JSONException e){
>>                           e.printStackTrace();
>>                       }
>>                   }
>>             }
>>
>>
>>
>>
>>         public static class Reduce extends MapReduceBase implements
>> Reducer<Text, IntWritable, Text, IntWritable> {
>>               public void reduce(Text key, Iterator<IntWritable> values,
>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>> IOException {
>>                     int sum = 0;
>>                     while (values.hasNext()) {
>>                           sum += values.next().get();
>>                         }
>>                     output.collect(key, new IntWritable(sum));
>>                   }
>>             }
>>
>> I successfully compiled the Java code using the json and hadoop jars and
>> created a jar. But when I run the Hadoop command I get the following
>> exceptions.
>>
>>
>> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
>> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
>> job_local1121514690_0001
>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
>> attempt_local1121514690_0001_m_000000_0
>> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
>> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
>> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
>> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
>> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
>> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
>> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
>> java.lang.Exception: java.lang.RuntimeException: Error in configuring
>> object
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>> Caused by: java.lang.RuntimeException: Error in configuring object
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:483)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> ... 10 more
>> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:344)
>> at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
>> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
>> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>> ... 15 more
>> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 22 more
>> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
>> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
>> job_local1121514690_0001
>> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
>> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
>> Exception in thread "main" java.io.IOException: Job failed!
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>> at org.myorg.Wordcount.main(Wordcount.java:64)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:483)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>
>>
>> PS: When I modify the same code and exclude the JSON parsing, i.e. find the
>> frequency of the {'key':'value1'} section of the example input, all works well.
>>
>>
>

Re: Json Parsing in map reduce.

Posted by Shahab Yunus <sh...@gmail.com>.
The reason is that the JSON parsing code is in a third-party library which is
not included in the default MapReduce/Hadoop distribution. You have to
add it to your classpath at *runtime*. There are multiple ways to do it
(which also depend on how you plan to run and package/deploy your code.)

Check these out:
https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
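
One common route is the -libjars generic option, which only takes effect when
the driver implements Tool; this is also what the GenericOptionsParser warning
in the job log is hinting at. A minimal sketch of such a driver (assuming the
static nested Map and Reduce classes from the original post; the jar and path
names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Wordcount extends Configured implements Tool {

    // ... the static nested Map and Reduce classes from the original
    // post go here ...

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects whatever GenericOptionsParser consumed
        // from the command line, including any -libjars entries.
        JobConf conf = new JobConf(getConf(), Wordcount.class);
        conf.setJobName("json-value-count");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Launched as, e.g. (jar path illustrative):
        //   hadoop jar wordcount.jar org.myorg.Wordcount \
        //       -libjars /path/to/json.jar input output
        System.exit(ToolRunner.run(new Configuration(), new Wordcount(), args));
    }
}

Another route described in those links is packaging: placing json.jar inside a
lib/ directory of the job jar puts it on the task classpath without any extra
command-line flags.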

Regards,
Shahab

On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:

> Hi,
>
> I am working on an assignment on Hadoop Map reduce. I am very new to Map
> Reduce.
>
> The assignment has many sections but for now I am trying to parse JSON
> data.
>
> The input (i.e. the value) to the map function is a single record of the form
>  xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
> I am interested only in getting the frequency of value1.
>
> Following is the map-reduce job.
>
> public static class Map extends MapReduceBase implements
> Mapper<LongWritable, Text, Text, IntWritable> {
>               private final static IntWritable one = new IntWritable(1);
>               private Text word = new Text();
>
>
>               public void map(LongWritable key, Text value,
> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
> IOException {
>                       String line = value.toString();
>                       String[] tuple = line.split("(?<=\\}),\\s");
>                       try{
>                       JSONObject obj = new JSONObject(tuple[1]);
>                       String id = obj.getString("key");
>                           word.set(id);
>                           output.collect(word, one);
>                       }
>                       catch(JSONException e){
>                           e.printStackTrace();
>                       }
>                   }
>             }
>
>
>
>
>         public static class Reduce extends MapReduceBase implements
> Reducer<Text, IntWritable, Text, IntWritable> {
>               public void reduce(Text key, Iterator<IntWritable> values,
> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
> IOException {
>                     int sum = 0;
>                     while (values.hasNext()) {
>                           sum += values.next().get();
>                         }
>                     output.collect(key, new IntWritable(sum));
>                   }
>             }
>
> I successfully compiled the Java code using the json and hadoop jars and
> created a jar. But when I run the Hadoop command I get the following
> exceptions.
>
>
> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
> job_local1121514690_0001
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
> attempt_local1121514690_0001_m_000000_0
> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
> java.lang.Exception: java.lang.RuntimeException: Error in configuring
> object
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 10 more
> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:344)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> ... 15 more
> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 22 more
> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
> job_local1121514690_0001
> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> at org.myorg.Wordcount.main(Wordcount.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
> PS: When I modify the same code and exclude the JSON parsing, i.e. find the
> frequency of the {'key':'value1'} section of the example input, all works well.
>
>

Re: Json Parsing in map reduce.

Posted by Shahab Yunus <sh...@gmail.com>.
The reason is that the Json parsing code is in a 3rd party library which is
not included in the default  map reduce/hadoop distribution. You have to
add them in your classpath at *runtime*. There are multiple ways to do it
(which also depends upon how you plan to run and package/deploy your code.)

Check out this:
https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Regards,
Shahab

On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <sp...@usc.edu> wrote:

> Hi,
>
> I am working on an assignment on Hadoop Map reduce. I am very new to Map
> Reduce.
>
> The assignment has many sections but for now I am trying to parse JSON
> data.
>
> The input (i.e. the value) to the map function is a single record of the form
>  xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
> and I am interested only in getting the frequency of value1.
>
> Following is the map-reduce job.
>
> public static class Map extends MapReduceBase
>         implements Mapper<LongWritable, Text, Text, IntWritable> {
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>
>     public void map(LongWritable key, Text value,
>             OutputCollector<Text, IntWritable> output, Reporter reporter)
>             throws IOException {
>         String line = value.toString();
>         // Split only after a closing brace, so tuple[1] is {'key':'value1'}.
>         String[] tuple = line.split("(?<=\\}),\\s");
>         try {
>             JSONObject obj = new JSONObject(tuple[1]);
>             String id = obj.getString("key");
>             word.set(id);
>             output.collect(word, one);
>         } catch (JSONException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> public static class Reduce extends MapReduceBase
>         implements Reducer<Text, IntWritable, Text, IntWritable> {
>     public void reduce(Text key, Iterator<IntWritable> values,
>             OutputCollector<Text, IntWritable> output, Reporter reporter)
>             throws IOException {
>         int sum = 0;
>         while (values.hasNext()) {
>             sum += values.next().get();
>         }
>         output.collect(key, new IntWritable(sum));
>     }
> }
>
> I successfully compiled the Java code using the JSON and Hadoop jars and
> created a jar. But when I run the Hadoop command I get the following
> exceptions.
>
>
> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
> job_local1121514690_0001
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
> attempt_local1121514690_0001_m_000000_0
> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
> java.lang.Exception: java.lang.RuntimeException: Error in configuring
> object
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 10 more
> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:344)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> ... 15 more
> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 22 more
> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
> job_local1121514690_0001
> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> at org.myorg.Wordcount.main(Wordcount.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
> PS: When I modify the same code to exclude the JSON parsing, i.e. find the
> frequency of the {'key':'value1'} section of the example input, all works well.
>
>
