Posted to common-user@hadoop.apache.org by Mix Nin <pi...@gmail.com> on 2013/03/05 16:11:05 UTC

Transpose

Hi

I have data in a file as follows. There are 3 columns separated by a
semicolon (;), and each column has multiple values separated by a comma (,).

11,22,33;144,244,344;yny;

I need the output in the format below. It is like transposing the values
of each column.

11 144 y
22 244 n
33 344 y

Can we write a MapReduce program to achieve this? Could you help with how
to write the code?


Thanks

Re: Transpose

Posted by Michel Segel <mi...@hotmail.com>.
Yes, you can.
You read in the row as text input in each iteration of Mapper.map().
You then output to the collector three times, once for each row of the matrix.

Spin, sort, and reduce as needed.
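
A minimal sketch of the mapper Mike describes, assuming the whole "col1;col2;col3;" record arrives as a single text line; the class name TransposeMapper is illustrative, not from the thread:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TransposeMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // "11,22,33;144,244,344;yny;" -> ["11,22,33", "144,244,344", "yny"]
    String[] columns = line.toString().split(";");
    String[] col1 = columns[0].split(",");
    String[] col2 = columns[1].split(",");
    char[]   col3 = columns[2].toCharArray();

    // One output per row of the transposed matrix, keyed by row index
    // so a downstream sort/reduce can keep the rows in order.
    for (int row = 0; row < col1.length; row++) {
      context.write(new IntWritable(row),
                    new Text(col1[row] + " " + col2[row] + " " + col3[row]));
    }
  }
}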

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 5, 2013, at 9:11 AM, Mix Nin <pi...@gmail.com> wrote:

> Hi
> 
> I have data in a file as follows. There are 3 columns separated by a semicolon (;), and each column has multiple values separated by a comma (,).
> 
> 11,22,33;144,244,344;yny;
> 
> I need the output in the format below. It is like transposing the values of each column.
> 
> 11 144 y
> 22 244 n
> 33 344 y
> 
> Can we write a MapReduce program to achieve this? Could you help with how to write the code?
> 
> 
> Thanks

Re: Transpose

Posted by Michel Segel <mi...@hotmail.com>.
Sandy,
Remember KISS.

Don't try to read it in as anything but a plain text line.
It's really a 3x3 matrix that looks to be grouped by columns.

Your output will drop the initial key; you then parse the line and output it.
Without further explanation, it looks like each tuple is unique.
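
A minimal driver sketch for this keep-it-simple reading, assuming a mapper like the earlier TransposeMapper sketch (names are illustrative). With zero reducers each (rowIndex, rowText) pair is written straight out as "rowIndex<TAB>rowText"; emit a NullWritable key from the mapper instead if you want plain "11 144 y" lines:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TransposeDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "transpose");
    job.setJarByClass(TransposeDriver.class);
    job.setMapperClass(TransposeMapper.class);
    job.setNumReduceTasks(0);                  // map-only: mapper output is the final output
    job.setOutputKeyClass(IntWritable.class);  // row index
    job.setOutputValueClass(Text.class);       // e.g. "11 144 y"
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}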

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 5, 2013, at 11:27 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi,
> 
> Essentially, what you want to do is group your data points by their position within each column, and have each reduce call assemble one output row.  To make each record that the mapper processes be one of the columns, you can use TextInputFormat with conf.set("textinputformat.record.delimiter", ";").  Your mapper will then receive LongWritable keys giving the byte offset into the input file, with Text values, and it can tokenize each column string.
> 
> Emitting one map output per data point in each column, you can then use secondary sort to send the data to the right place in the right order (see http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/). Your composite key would look like (row index, i.e. the data point's position within its column; byte offset, i.e. the LongWritable passed in as the map input key). Each reduce call would then get all the points in a single row: you partition and group by row index, and sort within a reduce call's values by byte offset so that entries from earlier columns come before later ones.
> 
> Does that make sense?
> 
> Sandy
> 
> On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <pi...@gmail.com> wrote:
>> Hi
>> 
>> I have data in a file as follows. There are 3 columns separated by a semicolon (;), and each column has multiple values separated by a comma (,).
>> 
>> 11,22,33;144,244,344;yny;
>> 
>> I need the output in the format below. It is like transposing the values of each column.
>> 
>> 11 144 y
>> 22 244 n
>> 33 344 y
>> 
>> Can we write a MapReduce program to achieve this? Could you help with how to write the code?
>> 
>> 
>> Thanks
> 

Re: Transpose

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi,

Essentially, what you want to do is group your data points by their position
within each column, and have each reduce call assemble one output row.  To
make each record that the mapper processes be one of the columns, you can use
TextInputFormat with conf.set("textinputformat.record.delimiter", ";").  Your
mapper will then receive LongWritable keys giving the byte offset into the
input file, with Text values, and it can tokenize each column string.
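
A sketch of that mapper, under the assumptions above, using the composite RowColWritable key sketched after the next paragraph (class names are illustrative, not from the thread). Note that the sample's third column would need to be written "y,n,y" for comma tokenization to apply:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ColumnMapper
    extends Mapper<LongWritable, Text, RowColWritable, Text> {

  @Override
  protected void map(LongWritable byteOffset, Text column, Context context)
      throws IOException, InterruptedException {
    // With the ";" record delimiter, the value is one column (e.g. "11,22,33")
    // and the key is that column's byte offset into the file.
    String col = column.toString().trim();
    if (col.isEmpty()) {
      return;  // skip the empty trailer after the final ";"
    }
    String[] values = col.split(",");
    for (int row = 0; row < values.length; row++) {
      // Composite key: (row index, source column's byte offset)
      context.write(new RowColWritable(row, byteOffset.get()),
                    new Text(values[row]));
    }
  }
}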

Emitting one map output per data point in each column, you can then use
secondary sort to send the data to the right place in the right order (see
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/).
Your composite key would look like (row index, i.e. the data point's position
within its column; byte offset, i.e. the LongWritable passed in as the map
input key). Each reduce call would then get all the points in a single row:
you partition and group by row index, and sort within a reduce call's values
by byte offset so that entries from earlier columns come before later ones.
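
A rough sketch of that secondary-sort plumbing, under the same assumptions and with illustrative class names (each class would live in its own .java file):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

// Composite key (rowIndex, byteOffset): partition and group on rowIndex
// only, but sort on both so earlier columns reach the reducer first.
public class RowColWritable implements WritableComparable<RowColWritable> {
  private int row;   // data point's position within its column
  private long col;  // byte offset of the source column

  public RowColWritable() {}
  public RowColWritable(int row, long col) { this.row = row; this.col = col; }
  public int getRow() { return row; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(row);
    out.writeLong(col);
  }
  public void readFields(DataInput in) throws IOException {
    row = in.readInt();
    col = in.readLong();
  }
  public int compareTo(RowColWritable o) {
    int c = Integer.compare(row, o.row);
    return c != 0 ? c : Long.compare(col, o.col);
  }
}

// Send every data point of a given row to the same reducer.
public class RowPartitioner extends Partitioner<RowColWritable, Text> {
  public int getPartition(RowColWritable key, Text value, int numPartitions) {
    return (key.getRow() & Integer.MAX_VALUE) % numPartitions;
  }
}

// Group reduce input by row index only, ignoring the byte offset.
public class RowGroupingComparator extends WritableComparator {
  public RowGroupingComparator() { super(RowColWritable.class, true); }
  @Override
  public int compare(WritableComparable a, WritableComparable b) {
    return Integer.compare(((RowColWritable) a).getRow(),
                           ((RowColWritable) b).getRow());
  }
}

// Each reduce call sees one row's values, already ordered by byte offset.
public class RowReducer
    extends Reducer<RowColWritable, Text, NullWritable, Text> {
  @Override
  protected void reduce(RowColWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    StringBuilder line = new StringBuilder();
    for (Text v : values) {
      if (line.length() > 0) line.append(' ');
      line.append(v);
    }
    context.write(NullWritable.get(), new Text(line.toString()));
  }
}

In the driver you would wire these in with job.setMapOutputKeyClass(RowColWritable.class), job.setPartitionerClass(RowPartitioner.class), and job.setGroupingComparatorClass(RowGroupingComparator.class).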

Does that make sense?

Sandy

On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <pi...@gmail.com> wrote:

> Hi
>
> I have data in a file as follows. There are 3 columns separated by a
> semicolon (;), and each column has multiple values separated by a comma (,).
>
> 11,22,33;144,244,344;yny;
>
> I need the output in the format below. It is like transposing the values
> of each column.
>
> 11 144 y
> 22 244 n
> 33 344 y
>
> Can we write a MapReduce program to achieve this? Could you help with how
> to write the code?
>
>
> Thanks
>
