Posted to common-user@hadoop.apache.org by Michael Kintzer <ro...@gmail.com> on 2010/03/03 21:09:25 UTC

Separate mail list for streaming?

Hi,

Was curious if anyone else thought it would be useful to have a separate
mail list for discussion/issues specific to Hadoop Streaming?

Thanks,

Michael

Re: dataset

Posted by Gang Luo <lg...@yahoo.com.cn>.
That is a good idea, but it doesn't work in my case. I want to test whether my partitioner divides the workload properly. It is supposed to counteract skew, not generate it, so I still need a skewed data source. Any ideas?

Thanks,
-Gang

 


----- Original Message ----
From: Aaron Kimball <aa...@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2010/3/3 (Wed) 3:50:59 PM
Subject: Re: dataset

Look at writing your own Partitioner implementation to control which
records are sent to which reduce shards.

- Aaron

On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <lg...@yahoo.com.cn> wrote:

> Hi all,
> I want to generate some datasets with data skew to test my MapReduce jobs.
> I am using TPC-DS, but it seems I cannot control the level of data skew. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in compiling it on
> Linux yet. Please tell me how I can get a skewed dataset.
>
> Thanks.
> -Gang
>
>
>
>
>


Re: dataset

Posted by Aaron Kimball <aa...@cloudera.com>.
Look at writing your own Partitioner implementation to control which
records are sent to which reduce shards.

- Aaron

On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <lg...@yahoo.com.cn> wrote:

> Hi all,
> I want to generate some datasets with data skew to test my MapReduce jobs.
> I am using TPC-DS, but it seems I cannot control the level of data skew. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in compiling it on
> Linux yet. Please tell me how I can get a skewed dataset.
>
> Thanks.
> -Gang
>
>
>
>
>
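
Editor's sketch of the suggestion above, not code from the thread: a minimal plain-Python simulation of the routing decision a custom partitioner would make. In Hadoop itself this logic would presumably sit in the getPartition method of a Partitioner subclass; the function names, the hot_percent parameter, and the use of CRC32 as a stable hash are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def stable_hash(key):
    """Deterministic non-negative hash (Python's built-in hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8"))


def hash_partition(key, num_reducers):
    """Baseline: spread keys evenly by hash, as a default hash partitioner would."""
    return stable_hash(key) % num_reducers


def skewed_partition(key, num_reducers, hot_percent=50):
    """Hypothetical skewing rule: route roughly hot_percent% of the key space
    to reducer 0 and spread the remaining keys over all reducers."""
    h = stable_hash(key)
    if h % 100 < hot_percent:
        return 0
    # Use different bits of the hash for the fallback choice.
    return (h // 100) % num_reducers


def reducer_loads(keys, num_reducers, partition_fn):
    """Count how many records each reducer would receive."""
    counts = Counter(partition_fn(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


if __name__ == "__main__":
    keys = ["key%d" % i for i in range(1000)]
    print("hash partition  :", reducer_loads(keys, 4, hash_partition))
    print("skewed partition:", reducer_loads(keys, 4, skewed_partition))
```

As the follow-up reply in this thread points out, this approach creates skew at the partitioning step, so it does not help when the partitioner itself is the component under test.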

dataset

Posted by Gang Luo <lg...@yahoo.com.cn>.
Hi all,
I want to generate some datasets with data skew to test my MapReduce jobs. I am using TPC-DS, but it seems I cannot control the level of data skew. There is a suite from Microsoft that can generate skewed datasets based on TPC-D, but it only works on Windows. I haven't succeeded in compiling it on Linux yet. Please tell me how I can get a skewed dataset.

Thanks.
-Gang
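
Editor's sketch, not something proposed in the thread: one way to get a skewed data source without TPC-DS or the Microsoft suite is to draw keys from a Zipf-like distribution, where the key of rank r is chosen with weight 1 / r**skew. The key format, function names, and parameter values below are assumptions chosen for illustration; a skew of 0 gives uniform keys, and larger values concentrate more records on the first few keys.

```python
import random
from collections import Counter


def zipf_weights(num_keys, skew):
    """Unnormalized Zipf weights: the key of rank r gets weight 1 / r**skew."""
    return [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]


def generate_skewed_keys(num_records, num_keys, skew, seed=0):
    """Return num_records keys sampled with Zipf-like skew (seeded for repeatability)."""
    rng = random.Random(seed)
    keys = ["key%05d" % i for i in range(num_keys)]
    weights = zipf_weights(num_keys, skew)
    return rng.choices(keys, weights=weights, k=num_records)


def to_records(keys):
    """Format keys as tab-separated key/value lines, a common text input layout."""
    return ["%s\t%d" % (key, i) for i, key in enumerate(keys)]


if __name__ == "__main__":
    sample = generate_skewed_keys(num_records=10000, num_keys=100, skew=1.5)
    # Show how concentrated the load is on the most frequent keys.
    for key, count in Counter(sample).most_common(5):
        print(key, count)
```

Raising or lowering the skew argument is the knob for controlling the skew level; writing the output of to_records to a file gives a dataset that can be fed to a job as ordinary text input.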


Re: Separate mail list for streaming?

Posted by Michael Kintzer <ro...@gmail.com>.
I am, thanks. The community is great. But the noise level is sometimes a
little high for me as a newbie. A list dedicated to Streaming would be
easier to search and more focused. But I totally understand the
concern. No worries. I was just curious whether anyone else felt similarly.

-Michael

On Wed, Mar 3, 2010 at 12:51 PM, Aaron Kimball <aa...@cloudera.com> wrote:

> We've already got a lot of mailing lists :) If you send questions to
> mapreduce-user, are you not getting enough feedback?
>
> - Aaron
>
> On Wed, Mar 3, 2010 at 12:09 PM, Michael Kintzer
> <ro...@gmail.com> wrote:
>
> > Hi,
> >
> > Was curious if anyone else thought it would be useful to have a separate
> > mail list for discussion/issues specific to Hadoop Streaming?
> >
> > Thanks,
> >
> > Michael
> >
>

Re: Separate mail list for streaming?

Posted by Aaron Kimball <aa...@cloudera.com>.
We've already got a lot of mailing lists :) If you send questions to
mapreduce-user, are you not getting enough feedback?

- Aaron

On Wed, Mar 3, 2010 at 12:09 PM, Michael Kintzer
<ro...@gmail.com> wrote:

> Hi,
>
> Was curious if anyone else thought it would be useful to have a separate
> mail list for discussion/issues specific to Hadoop Streaming?
>
> Thanks,
>
> Michael
>