Posted to user@pig.apache.org by Subir S <su...@gmail.com> on 2012/06/21 10:16:35 UTC

Is it possible to implement transpose with PigLatin/any other MR language?

Hi,

Is it possible to implement a transpose operation, turning rows into columns
and vice versa?


i.e.

col1 col2 col3
col4 col5 col6
col7 col8 col9
col10 col11 col12

can this be converted to

col1 col4 col7 col10
col2 col5 col8 col11
col3 col6 col9 col12

Is this even possible with map reduce? If yes, which language helps to
achieve this faster?

Thanks

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by Robert Evans <ev...@yahoo-inc.com>.

@Subir

You can do it.  Here is some pseudo code for it in map/reduce.  It abuses Map/Reduce a little to be more performant, but it is definitely doable.  At the end you will get one file for each reducer you have configured.  If you want a single file, you can concatenate all of the files together, ordered by file name.  You should be able to do it in Pig too, but you will need an input format that gives you the offset, and you may need to sort each bag by offset inside the reducer.  This may cause Pig to have performance issues if it cannot keep the entire bag in memory to sort, which is why I did it in MR instead.

//Assuming a text input format where the key is the offset into the original input file, and there is only one input file. If there is more than one input file you need a way to include the ordering of the input files in the offset.
Map(LongWritable offset, Text line) {
    String[] parts = line.toString().split(",");
    for(int i = 0; i < parts.length; i++) {
        //The column index becomes the new row; the offset gives the position within that row
        collect(new ColumnOffsetKey(offset, i), new Text(parts[i]));
    }
}

//We need to know the max Columns ahead of time to get total order partitioning to work
int partition(ColumnOffsetKey key) {
  return (int)(((double)key.column/MaxColumns)*numPartitions);
}

//You probably want to put in a binary comparator for performance reasons
int compare(ColumnOffsetKey key1, ColumnOffsetKey key2) {
    //First sort by column(which will become the new row) next sort by offset which will tell us the new column ordering
    if(key1.column > key2.column) {
      return 1;
    } else if(key1.column < key2.column) {
      return -1;
    } else if(key1.offset > key2.offset) {
      return 1;
    } else if (key1.offset < key2.offset) {
      return -1;
    }
    return 0;
}

StringBuilder currentRow = null;
long currentRowNum = -1;

Reduce(ColumnOffsetKey key, Iterable<Text> parts) {
    //This is a bit ugly because we need to detect changes to the row across calls; there is probably a cleaner way to do this
    for(Text part : parts) {
        if(currentRowNum != key.column) {
            //Output the previous row if there is one
            if(currentRow != null) {
                collect(null, new Text(currentRow.toString()));
            }
            currentRow = new StringBuilder(part.toString());
            currentRowNum = key.column;
        } else {
            currentRow.append(',');
            currentRow.append(part.toString());
        }
    }
}

//This is called at the end of the reducer in the new API, or something like it; I don't remember the method name off the top of my head
cleanup() {
    //Flush the last row, which no later key change will trigger
    if(currentRow != null) {
        collect(null, new Text(currentRow.toString()));
    }
}
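
The same idea can be tried locally without Hadoop. The following is only a minimal Python sketch of the pseudo code above, not a Hadoop job: the function names map_phase and transpose are made up for illustration, the shuffle is simulated with an in-memory sort, and offsets are counted in characters, which matches byte offsets only for ASCII input.

```python
from itertools import groupby

def map_phase(lines):
    """Emit ((column, offset), cell) for every cell of every input line."""
    offset = 0
    for line in lines:
        for column, cell in enumerate(line.split(",")):
            yield (column, offset), cell
        # Advance past the line and its newline (assumption: ASCII, "\n" endings).
        offset += len(line) + 1

def transpose(lines):
    # Tuples sort by column first and offset second, the same order as compare().
    shuffled = sorted(map_phase(lines), key=lambda kv: kv[0])
    # Each run of equal column numbers becomes one output row.
    return [",".join(cell for _, cell in group)
            for _, group in groupby(shuffled, key=lambda kv: kv[0][0])]

rows = ["col1,col2,col3", "col4,col5,col6", "col7,col8,col9", "col10,col11,col12"]
for row in transpose(rows):
    print(row)
```

Because the sort key carries the offset, the cells of each new row come out in the order of the original lines, whatever order the mappers emitted them in.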

On 6/22/12 5:35 AM, "Subir S" <su...@gmail.com> wrote:

Thank you for the inputs!

@Norbert,
 But a Group By column number clause also does not guarantee the order of
columns to be preserved. Like even the row number should be known so that
may be in the end we can sort each row based on the row number using a
nested FOREACH. But after that  FOREACH since sorting is not preserved, for
other operations again data may be in wrong order in the row.

To me it seems like it is not possible to do this in MR.


On Fri, Jun 22, 2012 at 12:56 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> That may be true, I have not read through the code very closely, if you
> have multiple reduces,  so you can run it with a single reduce or you can
> write a custom partitioner to do it.  You only need to know the length of
> the column, and then you can divide them up appropriately, kind of like how
> the total order partitioner does it.
>
> --Bobby Evans
>
> On 6/21/12 1:15 PM, "Norbert Burger" <no...@gmail.com> wrote:
>
> While it may be fine for many cases, If I'm reading the Nectar code
> correctly, that transpose doesn't guarantee anything about the order of
> rows within each column.  In other words, transposing:
>
> a - b -c
> d - e - f
> g - h - i
>
> may give you different permutations of "a - d - g" as the first row,
> depending on shuffle order.  You can trivially avoid this with one
> mapper/reducer, but then you're not exploiting the framework.  Note that
> you can accomplish same with a higher-level language like PIg by using a
> UDF like LinkedIn's Enumerate [1] to tag each column, and then simply
> GROUPing BY column number.
>
> [1]
>
> https://raw.github.com/linkedin/datafu/master/src/java/datafu/pig/bags/Enumerate.java
>
> Norbert
>
> On Thu, Jun 21, 2012 at 5:00 AM, madhu phatak <ph...@gmail.com>
> wrote:
>
> > Hi,
> >  Its possible in Map/Reduce. Look into the code here
> >
> >
> https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce
> >
> >
> >
> > 2012/6/21 Subir S <su...@gmail.com>
> >
> > > Hi,
> > >
> > > Is it possible to implement transpose operation of rows into columns
> and
> > > vice versa...
> > >
> > >
> > > i.e.
> > >
> > > col1 col2 col3
> > > col4 col5 col6
> > > col7 col8 col9
> > > col10 col11 col12
> > >
> > > can this be converted to
> > >
> > > col1 col4 col7 col10
> > > col2 col5 col8 col11
> > > col3 col6 col9 col12
> > >
> > > Is this even possible with map reduce? If yes, which language helps to
> > > achieve this faster?
> > >
> > > Thanks
> > >
> >
> >
> >
> > --
> > https://github.com/zinnia-phatak-dev/Nectar
> >
>
>

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by Subir S <su...@gmail.com>.

Thank you for the inputs!

@Norbert,
 But a GROUP BY on column number does not guarantee that the order within
each column is preserved either. The row number would also have to be known,
so that at the end we can sort each row by row number using a nested
FOREACH. But since sorting is not preserved after that FOREACH, the data in
the row may again be in the wrong order for later operations.

To me it seems like it is not possible to do this in MR.


On Fri, Jun 22, 2012 at 12:56 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> That may be true, I have not read through the code very closely, if you
> have multiple reduces,  so you can run it with a single reduce or you can
> write a custom partitioner to do it.  You only need to know the length of
> the column, and then you can divide them up appropriately, kind of like how
> the total order partitioner does it.
>
> --Bobby Evans
>
> On 6/21/12 1:15 PM, "Norbert Burger" <no...@gmail.com> wrote:
>
> While it may be fine for many cases, If I'm reading the Nectar code
> correctly, that transpose doesn't guarantee anything about the order of
> rows within each column.  In other words, transposing:
>
> a - b -c
> d - e - f
> g - h - i
>
> may give you different permutations of "a - d - g" as the first row,
> depending on shuffle order.  You can trivially avoid this with one
> mapper/reducer, but then you're not exploiting the framework.  Note that
> you can accomplish same with a higher-level language like PIg by using a
> UDF like LinkedIn's Enumerate [1] to tag each column, and then simply
> GROUPing BY column number.
>
> [1]
>
> https://raw.github.com/linkedin/datafu/master/src/java/datafu/pig/bags/Enumerate.java
>
> Norbert
>
> On Thu, Jun 21, 2012 at 5:00 AM, madhu phatak <ph...@gmail.com>
> wrote:
>
> > Hi,
> >  Its possible in Map/Reduce. Look into the code here
> >
> >
> https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce
> >
> >
> >
> > 2012/6/21 Subir S <su...@gmail.com>
> >
> > > Hi,
> > >
> > > Is it possible to implement transpose operation of rows into columns
> and
> > > vice versa...
> > >
> > >
> > > i.e.
> > >
> > > col1 col2 col3
> > > col4 col5 col6
> > > col7 col8 col9
> > > col10 col11 col12
> > >
> > > can this be converted to
> > >
> > > col1 col4 col7 col10
> > > col2 col5 col8 col11
> > > col3 col6 col9 col12
> > >
> > > Is this even possible with map reduce? If yes, which language helps to
> > > achieve this faster?
> > >
> > > Thanks
> > >
> >
> >
> >
> > --
> > https://github.com/zinnia-phatak-dev/Nectar
> >
>
>

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by Robert Evans <ev...@yahoo-inc.com>.

That may be true if you have multiple reduces; I have not read through the code very closely.  So you can run it with a single reduce, or you can write a custom partitioner to do it.  You only need to know the length of the column, and then you can divide them up appropriately, kind of like how the total order partitioner does it.
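
As a rough illustration of that kind of partitioning, here is a small Python sketch. It is not the Hadoop Partitioner API; partition and outputs_by_reducer are hypothetical names, and the total number of columns is assumed to be known up front.

```python
def partition(column, max_columns, num_partitions):
    """Send contiguous ranges of column indexes to increasing reducer numbers."""
    return column * num_partitions // max_columns

def outputs_by_reducer(max_columns, num_partitions):
    """List which columns (new rows) each reducer would receive."""
    parts = [[] for _ in range(num_partitions)]
    for column in range(max_columns):
        parts[partition(column, max_columns, num_partitions)].append(column)
    return parts

# Ten columns spread over three reducers.
print(outputs_by_reducer(10, 3))
```

Since every column sent to reducer k is smaller than every column sent to reducer k+1, concatenating the reducer outputs in reducer order gives the transposed rows in the right order.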

--Bobby Evans

On 6/21/12 1:15 PM, "Norbert Burger" <no...@gmail.com> wrote:

While it may be fine for many cases, If I'm reading the Nectar code
correctly, that transpose doesn't guarantee anything about the order of
rows within each column.  In other words, transposing:

a - b -c
d - e - f
g - h - i

may give you different permutations of "a - d - g" as the first row,
depending on shuffle order.  You can trivially avoid this with one
mapper/reducer, but then you're not exploiting the framework.  Note that
you can accomplish same with a higher-level language like PIg by using a
UDF like LinkedIn's Enumerate [1] to tag each column, and then simply
GROUPing BY column number.

[1]
https://raw.github.com/linkedin/datafu/master/src/java/datafu/pig/bags/Enumerate.java

Norbert

On Thu, Jun 21, 2012 at 5:00 AM, madhu phatak <ph...@gmail.com> wrote:

> Hi,
>  Its possible in Map/Reduce. Look into the code here
>
> https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce
>
>
>
> 2012/6/21 Subir S <su...@gmail.com>
>
> > Hi,
> >
> > Is it possible to implement transpose operation of rows into columns and
> > vice versa...
> >
> >
> > i.e.
> >
> > col1 col2 col3
> > col4 col5 col6
> > col7 col8 col9
> > col10 col11 col12
> >
> > can this be converted to
> >
> > col1 col4 col7 col10
> > col2 col5 col8 col11
> > col3 col6 col9 col12
> >
> > Is this even possible with map reduce? If yes, which language helps to
> > achieve this faster?
> >
> > Thanks
> >
>
>
>
> --
> https://github.com/zinnia-phatak-dev/Nectar
>

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by Norbert Burger <no...@gmail.com>.

While it may be fine for many cases, if I'm reading the Nectar code
correctly, that transpose doesn't guarantee anything about the order of
rows within each column.  In other words, transposing:

a - b - c
d - e - f
g - h - i

may give you different permutations of "a - d - g" as the first row,
depending on shuffle order.  You can trivially avoid this with one
mapper/reducer, but then you're not exploiting the framework.  Note that
you can accomplish the same with a higher-level language like Pig by using a
UDF like LinkedIn's Enumerate [1] to tag each column, and then simply
GROUPing BY column number.

[1]
https://raw.github.com/linkedin/datafu/master/src/java/datafu/pig/bags/Enumerate.java
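
To make the ordering issue concrete, here is a small Python sketch, not Pig. It assumes each cell has been tagged with both its column and its row number (the row tag is an extra assumption, not something Enumerate is claimed to provide), simulates an arbitrary shuffle, groups by column number, and then sorts each group by row number. The name transpose_tagged is made up for illustration.

```python
import random
from collections import defaultdict

def transpose_tagged(matrix, seed=0):
    # Tag every cell with its (column, row) position.
    tagged = [(col, row, cell)
              for row, line in enumerate(matrix)
              for col, cell in enumerate(line)]
    # Simulate an arbitrary shuffle order between mappers and reducers.
    random.Random(seed).shuffle(tagged)
    # GROUP BY column number.
    groups = defaultdict(list)
    for col, row, cell in tagged:
        groups[col].append((row, cell))
    # Sorting each group by row number restores the order inside the new row.
    return [[cell for _, cell in sorted(groups[col])] for col in sorted(groups)]

matrix = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
print(transpose_tagged(matrix))
```

Without the row tag and the per-group sort, the first row could come out as any permutation of a, d, g, depending on the shuffle.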

Norbert

On Thu, Jun 21, 2012 at 5:00 AM, madhu phatak <ph...@gmail.com> wrote:

> Hi,
>  Its possible in Map/Reduce. Look into the code here
>
> https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce
>
>
>
> 2012/6/21 Subir S <su...@gmail.com>
>
> > Hi,
> >
> > Is it possible to implement transpose operation of rows into columns and
> > vice versa...
> >
> >
> > i.e.
> >
> > col1 col2 col3
> > col4 col5 col6
> > col7 col8 col9
> > col10 col11 col12
> >
> > can this be converted to
> >
> > col1 col4 col7 col10
> > col2 col5 col8 col11
> > col3 col6 col9 col12
> >
> > Is this even possible with map reduce? If yes, which language helps to
> > achieve this faster?
> >
> > Thanks
> >
>
>
>
> --
> https://github.com/zinnia-phatak-dev/Nectar
>

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by madhu phatak <ph...@gmail.com>.

Hi,
 It's possible in Map/Reduce. Look into the code here:
https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce



2012/6/21 Subir S <su...@gmail.com>

> Hi,
>
> Is it possible to implement transpose operation of rows into columns and
> vice versa...
>
>
> i.e.
>
> col1 col2 col3
> col4 col5 col6
> col7 col8 col9
> col10 col11 col12
>
> can this be converted to
>
> col1 col4 col7 col10
> col2 col5 col8 col11
> col3 col6 col9 col12
>
> Is this even possible with map reduce? If yes, which language helps to
> achieve this faster?
>
> Thanks
>



-- 
https://github.com/zinnia-phatak-dev/Nectar

Re: Is it possible to implement transpose with PigLatin/any other MR language?

Posted by Subir S <su...@gmail.com>.

That is great, Simone! I have not tried your suggestion yet, but will surely
try it.

@Robert, thank you, I will try that option too.

On Mon, Jun 25, 2012 at 3:12 PM, Simone Leo <si...@crs4.it> wrote:

> Hello,
>
> we recently added a tool for solving relatively simple problems like this
> one to Pydoop. The tool is called Pydoop Script:
>
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script
>
> Using Pydoop Script, I implemented the transposer in 14 lines of code:
>
> import struct
>
> def mapper(key, value, writer):
>  value = value.split()
>  for i, a in enumerate(value):
>    writer.emit(struct.pack(">q", i), "%s\t%s" % (key, a))
>
> def reducer(key, ivalue, writer):
>  vector = []
>  for v in ivalue:
>    v = v.split("\t")
>    v[0] = struct.unpack(">q", v[0])[0]
>    vector.append(v)
>  vector.sort()
>  vector = [v[1] for v in vector]
>  writer.emit(struct.unpack(">q", key)[0], "\t".join(vector))
>
> Here is the complete workflow:
>
> hadoop fs -put matrix.txt{,}
> pydoop script transpose.py matrix.txt t_matrix
> hadoop fs -get t_matrix{,}
> sort -mn -k1,1 -o t_matrix.txt t_matrix/part-0000*
>
> The final t_matrix.txt actually contains an additional first column with
> row indexes that should be removed (but this can probably be avoided if the
> transposed matrix acts as input for another job). Although the above
> implementation can be improved in several ways, it took me just about 30
> minutes to write and test after seeing your message.
>
> Cheers
>
> Simone
>
>
> On 06/21/2012 10:16 AM, Subir S wrote:
>
>> Hi,
>>
>> Is it possible to implement transpose operation of rows into columns and
>> vice versa...
>>
>>
>> i.e.
>>
>> col1 col2 col3
>> col4 col5 col6
>> col7 col8 col9
>> col10 col11 col12
>>
>> can this be converted to
>>
>> col1 col4 col7 col10
>> col2 col5 col8 col11
>> col3 col6 col9 col12
>>
>> Is this even possible with map reduce? If yes, which language helps to
>> achieve this faster?
>>
>> Thanks
>>
>>
> --
> Simone Leo
> Data Fusion - Distributed Computing
> CRS4
> POLARIS - Building #1
> Piscina Manna
> I-09010 Pula (CA) - Italy
> e-mail: simone.leo@crs4.it
> http://www.crs4.it
>
>
>
