Posted to user@chukwa.apache.org by James Seigel <ja...@tynt.com> on 2011/06/01 22:01:47 UTC

Re: speeding up demux

Hello!

I am seriously considering what you are suggesting in this email, even though it goes against what would seem to make sense.  I have a couple of questions if anyone has the time to answer.

1) How stable is trunk right now?
2) Any performance improvements/degradations since 0.3?
3) Is there a pseudo change log between “trunk” and 0.4 that I could take a peek at at this point?
4) Does it compile? ;)

Cheers and thanks for your time!

James.

On 2011-05-27, at 9:58 AM, Eric Yang wrote:

> I would recommend skipping Chukwa 0.4 and going to trunk.  In addition, use HBaseWriter to stream data into HBase in parallel, so the data can be processed in near real time for demux.
> 
> Regards,
> Eric
> 
> On 5/26/11 8:30 PM, "Bill Graham" <bi...@gmail.com> wrote:
> 
> This seems possible, but one thing that would need to be changed is the directories that demux uses. For example:
> demuxProcessing/mrInput
> demuxProcessing/mrOutput
> 
> These would need to be dynamic directories with the timestamp or something else in them to keep two jobs from interfering with each other.
> 
> On Thu, May 26, 2011 at 8:23 PM, Corbin Hoenes <co...@tynt.com> wrote:
> Finding demux to be a bit too slow for our needs.  It seems like only 1 runs at a time; is there some technical reason why we couldn't run a couple in parallel?  If so any hints on how difficult it would be to run multiple demuxers at a time?
> 
> 
> 
>
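
Bill Graham's point above about per-job demux directories can be illustrated with a minimal Python sketch. This is not how Chukwa's demux is implemented (Chukwa is written in Java, and the thread only names the fixed paths demuxProcessing/mrInput and demuxProcessing/mrOutput). The helper `job_dirs`, its parameters, and the `mrInput_<timestamp>_<suffix>` naming scheme are assumptions made up for illustration.

```python
import posixpath
import time
import uuid


def job_dirs(base="demuxProcessing", now=None, job_id=None):
    """Return (input_dir, output_dir) unique to one demux run.

    Hypothetical helper: the thread only mentions the fixed paths
    demuxProcessing/mrInput and demuxProcessing/mrOutput.
    """
    # Millisecond timestamp keeps directories sortable by start time.
    stamp = int((time.time() if now is None else now) * 1000)
    # A random suffix guards against two jobs starting in the same millisecond.
    suffix = job_id if job_id is not None else uuid.uuid4().hex[:8]
    run = f"{stamp}_{suffix}"
    # posixpath gives forward-slash paths, as used on HDFS.
    return (
        posixpath.join(base, f"mrInput_{run}"),
        posixpath.join(base, f"mrOutput_{run}"),
    )


if __name__ == "__main__":
    # Two runs get two distinct pairs of directories.
    print(job_dirs())
    print(job_dirs())
```

With distinct directories per run, two demux jobs would no longer read from or write to the same mrInput/mrOutput paths, which is the interference Bill describes. Any cleanup or post-processing that expects the fixed names would need matching changes.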

Re: speeding up demux

Posted by Ariel Rabkin <as...@gmail.com>.

Yes.

Trunk should be stable at this point.

Performance should be about the same as 0.3 or 0.4 -- most Chukwa
development has been geared to features and bugfixes.

CHANGES.txt in trunk is the change list since 0.4.

Trunk compiles, last I checked.

--Ari

-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: speeding up demux

Posted by James Seigel <ja...@tynt.com>.

Thank you.  Just evaluating where we are going to go from 0.3.0  :)

J


On 2011-06-01, at 2:33 PM, Eric Yang wrote:

> Hi James,
> 
> 1) Trunk is more stable than any previous release, but it needs more documentation.
> 2) Performance is the same for the sequence file writer, and data availability is 200-300X faster if the data is streamed to HBase.
> 3) Check out http://svn.apache.org/repos/asf/incubator/chukwa/trunk/CHANGES.txt
> 4) Yes it does.  Let us know if there are any questions.
> 
> The setup instructions are located at: http://wiki.apache.org/hadoop/Chukwa_Quick_Start
> 
> Hope it works for you. :)
> 
> Regards,
> Eric
> 

Re: speeding up demux

Posted by Eric Yang <ey...@yahoo-inc.com>.

Hi James,

1) Trunk is more stable than any previous release, but it needs more documentation.
2) Performance is the same for the sequence file writer, and data availability is 200-300X faster if the data is streamed to HBase.
3) Check out http://svn.apache.org/repos/asf/incubator/chukwa/trunk/CHANGES.txt
4) Yes it does. Let us know if there are any questions.

The setup instructions are located at: http://wiki.apache.org/hadoop/Chukwa_Quick_Start

Hope it works for you. :)

Regards,
Eric
