Posted to dev@datafu.apache.org by Russell Jurney <ru...@gmail.com> on 2014/10/02 18:19:26 UTC
Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF -
which, given a field whose value contains a field name, and *,
returns the value of the field referenced by the field name
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/
-----------------------------------------------------------
(Updated Oct. 2, 2014, 4:19 p.m.)
Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
Summary (updated)
-----------------
DATAFU-69: Create SelectFieldByName UDF - which, given a field whose value contains a field name, and *, returns the value of the field referenced by the field name
Repository: datafu
Description
-------
Example use:
group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
    hour_rounded::sourceNameOrIp AS sourceNameOrIp,
    hour_rounded::destinationNameOrIp AS destinationNameOrIp,
    ...;
with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
with_value_substitution = FOREACH with_value_substitution GENERATE
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...;
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
    FLATTEN(group) AS (seriesType, groupValue, day),
    (int)COUNT_STAR(with_value_substitution) AS connections:int;
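The lookup performed by the UDF call in the example can be sketched in plain Java without any Pig dependencies. This is not the code in the patch: the class name, the method name, and the two parallel lists (one standing in for the input schema's field names, one for the tuple passed as *) are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the SelectFieldByName source from the patch.
class SelectFieldByNameSketch {

    /**
     * Returns the value of the field whose name equals fieldName,
     * or null when no field carries that name.
     *
     * fieldNames stands in for the aliases of the Pig input schema and
     * values for the tuple produced by `*`; both are assumptions.
     */
    static String select(String fieldName, List<String> fieldNames, List<String> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("schema and tuple sizes differ");
        }
        int index = fieldNames.indexOf(fieldName);
        return index < 0 ? null : values.get(index);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("groupField", "sourceNameOrIp", "destinationNameOrIp");
        List<String> row = Arrays.asList("sourceNameOrIp", "10.0.0.1", "10.0.0.2");
        // The first field of the row names the field to read from the same row.
        System.out.println(select(row.get(0), names, row)); // prints 10.0.0.1
    }
}
```

Returning null for an unknown name is a choice made for this sketch; a real UDF might instead raise an error.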
Diffs
-----
datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java PRE-CREATION
datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/25564/diff/
Testing
-------
This UDF replaced a very inefficient Pig script in which macros that ran many individual GROUP BYs took many minutes to plan.
Testing: unit tests, plus use on real data on a cluster.
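To illustrate why one parameterized GROUP can stand in for many separate ones, here is a rough Java sketch of the same idea: pair every row with every grouping field name, read that field's value by name, and count per (field, value) key in a single pass. The class, the helper methods, and the sample data are hypothetical and not taken from the patch or the original script; the day component of the grouping key is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and data are hypothetical, not from the patch.
class GroupFieldCountSketch {

    // Builds one row as a field-name -> value map, a stand-in for a Pig tuple.
    static Map<String, String> row(String source, String destination) {
        Map<String, String> r = new HashMap<>();
        r.put("sourceNameOrIp", source);
        r.put("destinationNameOrIp", destination);
        return r;
    }

    // Counts rows per (groupField, groupValue) pair in one pass over the data.
    static Map<String, Integer> count(List<String> groupFields, List<Map<String, String>> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> r : rows) {
            for (String field : groupFields) {                      // CROSS with the field names
                String value = r.get(field);                        // select the field by name
                counts.merge(field + "/" + value, 1, Integer::sum); // GROUP and count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> groupFields = new ArrayList<>();
        groupFields.add("sourceNameOrIp");
        groupFields.add("destinationNameOrIp");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("a", "b"));
        rows.add(row("a", "c"));
        System.out.println(count(groupFields, rows));
    }
}
```

Adding another grouping field here means adding one more name to the list rather than writing another grouping statement.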
Thanks,
Russell Jurney
Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF -
which, given a field whose value contains a field name, and *,
returns the value of the field referenced by the field name
Posted by Russell Jurney <ru...@gmail.com>.
> On Nov. 3, 2014, 10:08 p.m., Matthew Hayes wrote:
> >
Awesome, thanks!
- Russell
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review59650
-----------------------------------------------------------
Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF -
which, given a field whose value contains a field name, and *,
returns the value of the field referenced by the field name
Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review59650
-----------------------------------------------------------
Ship it!
- Matthew Hayes
Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF -
which, given a field whose value contains a field name, and *,
returns the value of the field referenced by the field name
Posted by Russell Jurney <ru...@gmail.com>.
(Updated Oct. 30, 2014, 9:13 p.m.)
Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
Changes
-------
Greatly simplified the implementation and tests; the UDF now assumes string input and returns string output.
Repository: datafu
Description
-------
Example use:
group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
    hour_rounded::sourceNameOrIp AS sourceNameOrIp,
    hour_rounded::destinationNameOrIp AS destinationNameOrIp,
    ...;
with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
with_value_substitution = FOREACH with_value_substitution GENERATE
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...;
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
    FLATTEN(group) AS (seriesType, groupValue, day),
    (int)COUNT_STAR(with_value_substitution) AS connections:int;
Diffs (updated)
-----
datafu-pig/src/main/java/datafu/pig/util/SelectStringFieldByName.java PRE-CREATION
datafu-pig/src/test/java/datafu/test/pig/util/SelectStringFieldByNameTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/25564/diff/
Testing
-------
This UDF replaced a very inefficient Pig script in which macros that ran many individual GROUP BYs took many minutes to plan.
Testing: unit tests, plus use on real data on a cluster.
Thanks,
Russell Jurney
Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF -
which, given a field whose value contains a field name, and *,
returns the value of the field referenced by the field name
Posted by Russell Jurney <ru...@gmail.com>.
(Updated Oct. 28, 2014, 7:28 p.m.)
Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
Changes
-------
Updated the patch with the new name, SelectStringFieldByName.
Repository: datafu
Description
-------
Example use:
group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
    hour_rounded::sourceNameOrIp AS sourceNameOrIp,
    hour_rounded::destinationNameOrIp AS destinationNameOrIp,
    ...;
with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
with_value_substitution = FOREACH with_value_substitution GENERATE
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...;
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
    FLATTEN(group) AS (seriesType, groupValue, day),
    (int)COUNT_STAR(with_value_substitution) AS connections:int;
Diffs (updated)
-----
datafu-pig/src/main/java/datafu/pig/util/SelectStringFieldByName.java PRE-CREATION
datafu-pig/src/test/java/datafu/test/pig/util/SelectStringFieldByNameTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/25564/diff/
Testing
-------
This UDF replaced a very inefficient Pig script in which macros that ran many individual GROUP BYs took many minutes to plan.
Testing: unit tests, plus use on real data on a cluster.
Thanks,
Russell Jurney