Posted to common-user@hadoop.apache.org by "AMARNATH, Balachandar" <BA...@airbus.com> on 2013/03/06 06:37:13 UTC

Map reduce technique

Hi,

I am new to the map-reduce paradigm. I read in a tutorial that the 'map' function splits the data into key-value pairs. Does the map-reduce framework automatically split the data into pieces, or do we need to explicitly provide a method to split the data? If it does this automatically, how does it split an image file (by size, etc.)? I see that processing an image file as a whole will give different results than processing it in chunks.



With thanks and regards
Balachandar




The information in this e-mail is confidential. The contents may not be disclosed or used by anyone other than the addressee. Access to this e-mail by anyone else is unauthorised.
If you are not the intended recipient, please notify Airbus immediately and delete this e-mail.
Airbus cannot accept any responsibility for the accuracy or completeness of this e-mail as it has been sent over public networks. If you have any concerns over the content of this message or its Accuracy or Integrity, please contact Airbus immediately.
All outgoing e-mails from Airbus are checked using regularly updated virus scanning software but you should take whatever measures you deem to be appropriate to ensure that this message and any attachments are virus free.


RE: Map reduce technique

Posted by Samir Kumar Das Mohapatra <da...@adobe.com>.
I think you should look at the sequence file as the input format.

Basically, the way this works is: you will have a separate Java process that takes several image files, reads the raw bytes into memory, and stores the data as key-value pairs in a SequenceFile. Keep going and keep writing into HDFS. This may take a while, but you'll only have to do it once.
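For example, a rough sketch of what that separate process could look like (the class name and argument handling here are hypothetical, each image is assumed to fit in memory, and it uses the classic SequenceFile.Writer API; the file name becomes the key and the raw bytes the value):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagesToSequenceFile {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path imageDir = new Path(args[0]);   // directory of image files
        Path output = new Path(args[1]);     // SequenceFile to create in HDFS

        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(fs, conf, output,
                    Text.class, BytesWritable.class);
            for (FileStatus status : fs.listStatus(imageDir)) {
                // Read the whole image into memory as raw bytes.
                byte[] bytes = new byte[(int) status.getLen()];
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(bytes);
                } finally {
                    IOUtils.closeStream(in);
                }
                // One record per image: file name -> raw bytes.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(bytes));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}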

Regards,
Samir.


Re: Map reduce technique

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Balachandar,

In MapReduce, interpreting input files as key-value pairs is accomplished
through InputFormats. Some common InputFormats are TextInputFormat, which
uses lines in a text file as values and their byte offsets into the file as
keys, and KeyValueTextInputFormat, which interprets the first token on a line
as the key and the rest as the value. There is also a well-known
WholeFileInputFormat example, which uses an entire file as a value. If you
wanted to process an image file in a specific way, you would probably need
to supply your own InputFormat.
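If you end up writing one, here is a rough sketch of a whole-file
InputFormat (modeled on the WholeFileInputFormat example from Hadoop: The
Definitive Guide, using the new org.apache.hadoop.mapreduce API; treat it as
a starting point, not a drop-in class). Each map task receives one entire
file as a single BytesWritable value:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;  // never split: one file == one record == one map input
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        // Slurp the entire file into one value.
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(file);
            in.readFully(contents);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { /* stream already closed in nextKeyValue() */ }
}

Since isSplitable() returns false, the framework never chunks a file across
mappers, which addresses your concern about processing an image in pieces.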

Does that help?

-Sandy


RE: Map reduce technique

Posted by Samir Kumar Das Mohapatra <da...@adobe.com>.
A few points on the SequenceFile layout:
[inline image: SequenceFile layout diagram, linked to http://www.bodhtree.com/bigdata.php]
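For reference, in case the inline diagram does not come through: a SequenceFile starts with a header (the magic bytes 'SEQ', a version byte, the key and value class names, compression flags, and user metadata), followed by the records, with sync markers written periodically between records. Those sync markers are what allow MapReduce to split a large SequenceFile and let a reader seek to the nearest record boundary.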



RE: Map reduce technique

Posted by Samir Kumar Das Mohapatra <da...@adobe.com>.
            job.setInputFormatClass(SequenceFileInputFormat.class);

You just have to follow the Hadoop API from the Apache web site.

Hints:

1)  Create the sequence file prior to the job (Java algorithm).

Example POC (you will have to change it based on your requirements):



import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 5375-5384). O'Reilly Media. Kindle Edition.

public class SequenceFileWriteDemo {

    private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    public static void main(String[] args) throws IOException {
        // Local file path for the SequenceFile to create.
        String uri = "/home/hadoop/Desktop/Image/test_02.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(fs, conf, path,
                    key.getClass(), value.getClass());
            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                // System.out.printf("[%s]\t%s\t%s%n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}



Note: you have to convert all the image files into one sequence file first.



2)  Put it into HDFS.

3)  Write the map/reduce job based on the logic you need; a minimal driver sketch follows.
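For illustration, a driver for step 3 might look like this (the class and mapper names are hypothetical, and it assumes the new mapreduce API, e.g. Hadoop 2.x; it simply reads back the (IntWritable, Text) records written by the POC above and passes them through):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SequenceFileJobDriver {

    // Pass-through mapper: the SequenceFile's keys and values arrive
    // already deserialized, so there is nothing to parse here.
    public static class PassThroughMapper
            extends Mapper<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void map(IntWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sequencefile-demo");
        job.setJarByClass(SequenceFileJobDriver.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);  // the line hinted above
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // the sequence file in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}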




RE: Map reduce technique

Posted by "AMARNATH, Balachandar" <BA...@airbus.com>.
Thanks for the mail.

Can you please share a few links to start with?


Regards
Bala
