Posted to mapreduce-user@hadoop.apache.org by Raj Hadoop <ha...@yahoo.com> on 2013/05/20 16:48:32 UTC

Low latency data access Vs High throughput of data

Hi,

I have a basic question on HDFS. I was reading that HDFS doesn't work well with
low-latency data access; rather, it is designed for high throughput
of data. Can you please explain in simple words the difference between
"low-latency data access" and "high throughput of data"?


Thanks,
Raj

Re: Low latency data access Vs High throughput of data

Posted by Rahul Bhattacharjee <ra...@gmail.com>.

Whoa! I know what latency is and what throughput is, but when someone asked
me this question, I was never able to answer it to my satisfaction. Now I
can.

Thanks a lot!


On Wed, May 22, 2013 at 12:21 AM, Jens Scheidtmann <
jens.scheidtmann@gmail.com> wrote:

> Hi Chris, hi Raj,
>
> In relational databases, the optimizer can be given different targets:
> - Return the first record as fast as possible, even if reading through the
>   whole dataset takes longer (low latency).
> - Return all rows as fast as possible, even if reading the first record
>   takes longer (high throughput).
>
> Best regards,
>
> Jens
>

Re: Low latency data access Vs High throughput of data

Posted by Jens Scheidtmann <je...@gmail.com>.

Hi Chris, hi Raj,

In relational databases, the optimizer can be given different targets:
- Return the first record as fast as possible, even if reading through the
  whole dataset takes longer (low latency).
- Return all rows as fast as possible, even if reading the first record
  takes longer (high throughput).
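
A rough way to picture the two optimizer targets is a toy cost model. The
sketch below is only an illustration: the Plan class, the pick_plan helper,
the plan names and all the millisecond figures are made up for this example
and do not come from any real database.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Toy cost model for one query plan (all numbers are made up)."""
    name: str
    startup_ms: int   # work done before the first row can be returned
    per_row_ms: int   # work done for each row returned

    def time_to_first_row(self) -> int:
        return self.startup_ms + self.per_row_ms

    def time_to_all_rows(self, n_rows: int) -> int:
        return self.startup_ms + n_rows * self.per_row_ms


def pick_plan(plans, n_rows, goal):
    """Pick the cheapest plan for the goal: 'first_row' or 'all_rows'."""
    if goal == "first_row":
        return min(plans, key=lambda p: p.time_to_first_row())
    if goal == "all_rows":
        return min(plans, key=lambda p: p.time_to_all_rows(n_rows))
    raise ValueError(f"unknown goal: {goal}")


# Hypothetical plans: one starts returning rows immediately but is slow per
# row, the other does a lot of work up front and is then cheap per row.
PLANS = [
    Plan("row-at-a-time", startup_ms=0, per_row_ms=10),
    Plan("bulk", startup_ms=5000, per_row_ms=1),
]

if __name__ == "__main__":
    for goal in ("first_row", "all_rows"):
        print(goal, "->", pick_plan(PLANS, 10_000, goal).name)
```

With these invented numbers, the same query gets a different plan depending
on whether the first row or the complete result matters most.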

Best regards,

Jens

Re: Low latency data access Vs High throughput of data

Posted by Raj Hadoop <ha...@yahoo.com>.

Hi Chris,

Thanks for the explanation.

Regards,
Raj

________________________________
 From: Chris Embree <ce...@gmail.com>
To: user@hadoop.apache.org; Raj Hadoop <ha...@yahoo.com> 
Sent: Monday, May 20, 2013 1:51 PM
Subject: Re: Low latency data access Vs High throughput of data

I'll take a swing at this one.

Low latency data access:  I hit the enter key (or submit button) and I expect results within seconds at most.  My database query time should be sub-second.
High throughput of data:  I want to scan millions of rows of data and count or sum some subset.  I expect this will take a few minutes (or much longer depending on complexity) to complete.  Think of more batch style jobs.

Caveats: This is really a MapReduce issue as well.  The setup and processing of M/R jobs add a bit of overhead.  There are a couple of projects working now to move toward lower-latency data access.

Also, HDFS stores data in blocks and distributes them across many nodes.  This means that there will (almost) always be some network data transfer required to get the final answer, and that "slows" things down a bit, depending on throughput and various other factors.

Hope that helps. :)

On Mon, May 20, 2013 at 10:48 AM, Raj Hadoop <ha...@yahoo.com> wrote:

> Hi,
>
> I have a basic question on HDFS. I was reading that HDFS doesn't work well
> with low latency data access. Rather it is designed for the high throughput
> of data. Can you please explain in simple words the difference between "Low
> latency data access Vs High throughput of data".
>
> Thanks,
> Raj

Re: Low latency data access Vs High throughput of data

Posted by Chris Embree <ce...@gmail.com>.

I'll take a swing at this one.

Low latency data access:  I hit the enter key (or submit button) and I
expect results within seconds at most.  My database query time should be
sub-second.
High throughput of data:  I want to scan millions of rows of data and count
or sum some subset.  I expect this will take a few minutes (or much longer
depending on complexity) to complete.  Think of more batch style jobs.

Caveats: This is really a MapReduce issue as well.  The setup and
processing of M/R jobs add a bit of overhead.  There are a couple of projects
working now to move toward lower-latency data access.
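
To put rough numbers on the job overhead point, here is a small
back-of-the-envelope sketch. The 20-second startup cost and the 100 MB/s scan
rate are assumptions picked for illustration, not measurements of any real
cluster, and the function names are hypothetical.

```python
def job_time_s(data_mb: int, startup_s: int, scan_mb_per_s: int) -> float:
    """Total wall-clock time: fixed job startup plus time spent scanning."""
    return startup_s + data_mb / scan_mb_per_s


def effective_throughput_mb_per_s(data_mb: int, startup_s: int,
                                  scan_mb_per_s: int) -> float:
    """Data processed per second once the startup overhead is included."""
    return data_mb / job_time_s(data_mb, startup_s, scan_mb_per_s)


# Assumed figures, for illustration only.
STARTUP_S = 20          # hypothetical fixed cost to launch a batch job
SCAN_MB_PER_S = 100     # hypothetical aggregate scan rate

if __name__ == "__main__":
    for data_mb in (100, 1_000_000):
        t = job_time_s(data_mb, STARTUP_S, SCAN_MB_PER_S)
        rate = effective_throughput_mb_per_s(data_mb, STARTUP_S, SCAN_MB_PER_S)
        print(f"{data_mb} MB: {t:.0f} s total, {rate:.1f} MB/s effective")
```

Under these assumptions the small request spends nearly all of its time on
startup, so its latency is poor, while the large scan barely notices the
startup cost and runs close to the full scan rate.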

Also, HDFS stores data in blocks and distributes them across many nodes.
This means that there will (almost) always be some network data transfer
required to get the final answer, and that "slows" things down a bit,
depending on throughput and various other factors.
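
The block distribution can be sketched with a toy placement function. This is
a simplification under stated assumptions: round-robin placement, no
replication, a 128 MB block size and node names like node-1 are all chosen
for the example, and real HDFS uses its own placement policy.

```python
import math
from typing import Dict, List


def place_blocks(file_mb: int, block_mb: int,
                 nodes: List[str]) -> Dict[str, List[int]]:
    """Split a file into blocks and assign them to nodes round-robin.

    Round-robin with no replication is an assumption of this sketch.
    """
    n_blocks = math.ceil(file_mb / block_mb)
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for block_id in range(n_blocks):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return placement


def remote_block_count(placement: Dict[str, List[int]], reader: str) -> int:
    """Blocks a reader on one node would have to fetch over the network."""
    return sum(len(blocks) for node, blocks in placement.items()
               if node != reader)


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical names
    placement = place_blocks(file_mb=1000, block_mb=128, nodes=nodes)
    print(placement)
    print("remote blocks for node-1:", remote_block_count(placement, "node-1"))
```

In this toy layout a single reader holds only a fraction of the blocks
locally, so most of the file has to cross the network before one final
answer can be produced.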

Hope that helps. :)

On Mon, May 20, 2013 at 10:48 AM, Raj Hadoop <ha...@yahoo.com> wrote:

> Hi,
>
> I have a basic question on HDFS. I was reading that HDFS doesn't work well
> with low latency data access. Rather it is designed for the high throughput
> of data. Can you please explain in simple words the difference between "Low
> latency data access Vs High throughput of data".
>
> Thanks,
> Raj
>
