Posted to hdfs-user@hadoop.apache.org by Wukang Lin <vb...@gmail.com> on 2013/08/07 18:28:08 UTC

Re: Is there any way to use an HDFS file as a circular buffer?

Hi Niels and Bertrand,
    Thank you for your great advice.
    In our scenario, we need to store a steady stream of binary data in a
circular storage layout; throughput and concurrency are the most important
requirements. The first approach seems workable, but since HDFS is not
friendly to small files, it may not be smooth enough. HBase is good, but not
appropriate for us in terms of either throughput or storage. MongoDB works
well for web applications, but it does not suit our scenario either.
    We need a distributed storage system with high throughput, HA, load
balancing and security. It might act much like HBase, managing many small
files (HFiles) as one large region, so that many small files are handled as
a single large one. Perhaps we should develop it ourselves.

Thank you.
Lin Wukang


2013/7/25 Niels Basjes <Ni...@basjes.nl>

> A circular file on hdfs is not possible.
>
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too
> many.
> - Put the data into an HBase table and do something similar.
> - Use a completely different technology like MongoDB, which has built-in
> support for a circular buffer (capped collection).
>
> Niels
>
> Hi all,
>    Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, then once the quota is exceeded, I can redirect the writer and write data from the beginning of the file again automatically.
>
>
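
For the first workaround quoted above (a rolling series of files where the
oldest is deleted), a minimal sketch using the Hadoop FileSystem API might
look like the following. The directory, segment naming and 10 GB budget are
illustrative assumptions, not something taken from this thread.

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Comparator;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Sketch: keep a bounded "ring" of segment files in one HDFS directory. */
    public class RollingHdfsWriter {
        private static final long MAX_TOTAL_BYTES = 10L * 1024 * 1024 * 1024; // assumed budget

        private final FileSystem fs;
        private final Path dir;

        public RollingHdfsWriter(Configuration conf, String dirUri) throws IOException {
            this.fs = FileSystem.get(conf);
            this.dir = new Path(dirUri); // e.g. "/data/ring" (hypothetical path)
        }

        /** Open a new segment file, named by the current timestamp. */
        public FSDataOutputStream newSegment() throws IOException {
            return fs.create(new Path(dir, "segment-" + System.currentTimeMillis() + ".bin"));
        }

        /** Delete the oldest segments until the directory fits the size budget again. */
        public void enforceBudget() throws IOException {
            FileStatus[] segments = fs.listStatus(dir);
            Arrays.sort(segments, Comparator.comparingLong(FileStatus::getModificationTime));
            long total = Arrays.stream(segments).mapToLong(FileStatus::getLen).sum();
            for (FileStatus oldest : segments) {
                if (total <= MAX_TOTAL_BYTES) {
                    break;
                }
                total -= oldest.getLen();
                fs.delete(oldest.getPath(), false); // delete one old segment file
            }
        }
    }

A writer would open a new segment whenever the current one reaches its roll
size and then call enforceBudget(), which gives the delete-oldest behaviour
without trying to reuse space inside a single HDFS file.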

Re: Is there any way to use an HDFS file as a circular buffer?

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Lin,

It might be worth checking out Apache Flume, which was built for highly
parallel ingest into HDFS.

-Sandy
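
As a rough illustration, a minimal Flume agent configuration for this kind of
ingest might look like the sketch below; the agent, source, channel and sink
names, the port and the HDFS path are placeholders, and a real deployment
would tune the channel and roll settings.

    # Hypothetical agent: devices send Avro events, the HDFS sink writes them out.
    agent1.sources  = avro-in
    agent1.channels = mem
    agent1.sinks    = hdfs-out

    agent1.sources.avro-in.type = avro
    agent1.sources.avro-in.bind = 0.0.0.0
    agent1.sources.avro-in.port = 41414
    agent1.sources.avro-in.channels = mem

    agent1.channels.mem.type = memory
    agent1.channels.mem.capacity = 100000

    agent1.sinks.hdfs-out.type = hdfs
    agent1.sinks.hdfs-out.hdfs.path = /data/devices
    agent1.sinks.hdfs-out.hdfs.fileType = DataStream
    agent1.sinks.hdfs-out.hdfs.rollSize = 134217728
    agent1.sinks.hdfs-out.hdfs.rollCount = 0
    agent1.sinks.hdfs-out.hdfs.rollInterval = 0
    agent1.sinks.hdfs-out.channel = mem

Flume only appends new files on HDFS, so the oldest files would still have to
be expired separately, for example with the delete-oldest approach discussed
earlier in the thread.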


On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris <af...@linkedin.com> wrote:

> If every device can send its information as an 'event', you could use a
> publish-subscribe messaging system like Apache Kafka (
> http://kafka.apache.org/). Kafka is designed to self-manage its storage
> by keeping only the last 'n' events of data, acting like a circular buffer.
> The device would publish its binary data to Kafka, and Hadoop would act as
> a subscriber to Kafka by consuming events. If you need a scheduler to make
> Hadoop process the Kafka events, look at Azkaban, as it supports both
> scheduling and job dependencies. (http://azkaban.github.io/azkaban2/)
>
> Remember Hadoop is batch processing, so reports won't happen in real time.
> If you need to run reports in real time, watch the Samza project, which
> uses YARN and Kafka to process real-time streaming data. (
> http://incubator.apache.org/projects/samza.html)
>
> On Aug 7, 2013, at 9:59 AM, Wukang Lin <vb...@gmail.com> wrote:
>
> > Hi Shekhar,
> >     Thank you for your replies. As far as I know, Storm is a distributed
> > computing framework, but what we need is a storage system where high
> > throughput and concurrency matter. We have thousands of devices, and each
> > device will produce a steady stream of binary data. The space for every
> > device is fixed, so it should reuse the space on disk. So, how can Storm
> > or Esper achieve that?
> >
> > Many Thanks
> > Lin Wukang
> >
> >
> > 2013/8/8 Shekhar Sharma <sh...@gmail.com>
> > Use a CEP tool like Esper or Storm, and you will be able to achieve that.
> > I can give you more input if you can provide me more details of what
> > you are trying to achieve.
> > Regards,
> > Som Shekhar Sharma
> > +91-8197243810
> >
> >
> > On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vb...@gmail.com>
> wrote:
> > Hi Niels and Bertrand,
> >     Thank you for your great advice.
> >     In our scenario, we need to store a steady stream of binary data in
> > a circular storage layout; throughput and concurrency are the most
> > important requirements. The first approach seems workable, but since
> > HDFS is not friendly to small files, it may not be smooth enough. HBase
> > is good, but not appropriate for us in terms of either throughput or
> > storage. MongoDB works well for web applications, but it does not suit
> > our scenario either.
> >     We need a distributed storage system with high throughput, HA, load
> > balancing and security. It might act much like HBase, managing many
> > small files (HFiles) as one large region, so that many small files are
> > handled as a single large one. Perhaps we should develop it ourselves.
> >
> > Thank you.
> > Lin Wukang
> >
> >
> > 2013/7/25 Niels Basjes <Ni...@basjes.nl>
> > A circular file on HDFS is not possible.
> >
> > Some of the ways around this limitation:
> > - Create a series of files and delete the oldest file when you have too
> > many.
> > - Put the data into an HBase table and do something similar.
> > - Use a completely different technology like MongoDB, which has built-in
> > support for a circular buffer (capped collection).
> >
> > Niels
> >
> > Hi all,
> >    Is there any way to use an HDFS file as a circular buffer? I mean, if
> > I set a quota on a directory in HDFS and write data to a file in that
> > directory continuously, then once the quota is exceeded, I can redirect
> > the writer and write data from the beginning of the file again
> > automatically.
> >
> >
> >
> >
>
>

Re: Is there any way to use an HDFS file as a circular buffer?

Posted by Adam Faris <af...@linkedin.com>.
If every device can send its information as an 'event', you could use a publish-subscribe messaging system like Apache Kafka (http://kafka.apache.org/). Kafka is designed to self-manage its storage by keeping only the last 'n' events of data, acting like a circular buffer. The device would publish its binary data to Kafka, and Hadoop would act as a subscriber to Kafka by consuming events. If you need a scheduler to make Hadoop process the Kafka events, look at Azkaban, as it supports both scheduling and job dependencies. (http://azkaban.github.io/azkaban2/)

Remember Hadoop is batch processing, so reports won't happen in real time. If you need to run reports in real time, watch the Samza project, which uses YARN and Kafka to process real-time streaming data. (http://incubator.apache.org/projects/samza.html)
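
As a rough illustration of the publish side, a device publisher using the
current Kafka Java producer API might look like the sketch below. The broker
address and topic name are placeholders, and the circular-buffer behaviour
itself comes from topic-level retention settings (retention.bytes or
retention.ms) rather than from the producer.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    /** Sketch: each device publishes its raw binary readings to a Kafka topic. */
    public class DevicePublisher implements AutoCloseable {
        private final KafkaProducer<String, byte[]> producer;

        public DevicePublisher(String brokers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", brokers); // e.g. "broker1:9092" (hypothetical)
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            this.producer = new KafkaProducer<>(props);
        }

        /** Key by device id so one device's data stays ordered within a partition. */
        public void publish(String deviceId, byte[] payload) {
            producer.send(new ProducerRecord<>("device-events", deviceId, payload));
        }

        @Override
        public void close() {
            producer.close();
        }
    }

A downstream Hadoop job would then consume the topic on its own schedule,
while Kafka discards its oldest log segments once the retention limit is
reached.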

On Aug 7, 2013, at 9:59 AM, Wukang Lin <vb...@gmail.com> wrote:

> Hi Shekhar,
>     Thank you for your replies. As far as I know, Storm is a distributed computing framework, but what we need is a storage system where high throughput and concurrency matter. We have thousands of devices, and each device will produce a steady stream of binary data. The space for every device is fixed, so it should reuse the space on disk. So, how can Storm or Esper achieve that?
> 
> Many Thanks
> Lin Wukang
> 
> 
> 2013/8/8 Shekhar Sharma <sh...@gmail.com>
> Use a CEP tool like Esper or Storm, and you will be able to achieve that.
> I can give you more input if you can provide me more details of what you are trying to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
> 
> 
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vb...@gmail.com> wrote:
> Hi Niels and Bertrand,
>     Thank you for your great advice.
>     In our scenario, we need to store a steady stream of binary data in a circular storage layout; throughput and concurrency are the most important requirements. The first approach seems workable, but since HDFS is not friendly to small files, it may not be smooth enough. HBase is good, but not appropriate for us in terms of either throughput or storage. MongoDB works well for web applications, but it does not suit our scenario either.
>     We need a distributed storage system with high throughput, HA, load balancing and security. It might act much like HBase, managing many small files (HFiles) as one large region, so that many small files are handled as a single large one. Perhaps we should develop it ourselves.
> 
> Thank you.
> Lin Wukang
> 
> 
> 2013/7/25 Niels Basjes <Ni...@basjes.nl>
> A circular file on HDFS is not possible.
> 
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too many.
> - Put the data into an HBase table and do something similar.
> - Use a completely different technology like MongoDB, which has built-in support for a circular buffer (capped collection).
> 
> Niels
> 
> Hi all,
>    Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, then once the quota is exceeded, I can redirect the writer and write data from the beginning of the file again automatically.
> 
> 
> 
> 


Re: Is there any way to use an HDFS file as a circular buffer?

Posted by Wukang Lin <vb...@gmail.com>.
Hi Shekhar,
    Thank you for your replies. As far as I know, Storm is a distributed
computing framework, but what we need is a storage system where high
throughput and concurrency matter. We have thousands of devices, and each
device will produce a steady stream of binary data. The space for every
device is fixed, so it should reuse the space on disk. So, how can Storm
or Esper achieve that?

Many Thanks
Lin Wukang


2013/8/8 Shekhar Sharma <sh...@gmail.com>

> Use a CEP tool like Esper or Storm, and you will be able to achieve that.
> I can give you more input if you can provide me more details of what
> you are trying to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vb...@gmail.com> wrote:
>
>> Hi Niels and Bertrand,
>>     Thank you for your great advice.
>>     In our scenario, we need to store a steady stream of binary data in
>> a circular storage layout; throughput and concurrency are the most
>> important requirements. The first approach seems workable, but since
>> HDFS is not friendly to small files, it may not be smooth enough. HBase
>> is good, but not appropriate for us in terms of either throughput or
>> storage. MongoDB works well for web applications, but it does not suit
>> our scenario either.
>>     We need a distributed storage system with high throughput, HA, load
>> balancing and security. It might act much like HBase, managing many
>> small files (HFiles) as one large region, so that many small files are
>> handled as a single large one. Perhaps we should develop it ourselves.
>>
>> Thank you.
>> Lin Wukang
>>
>>
>> 2013/7/25 Niels Basjes <Ni...@basjes.nl>
>>
>>> A circular file on HDFS is not possible.
>>>
>>> Some of the ways around this limitation:
>>> - Create a series of files and delete the oldest file when you have too
>>> many.
>>> - Put the data into an HBase table and do something similar.
>>> - Use a completely different technology like MongoDB, which has built-in
>>> support for a circular buffer (capped collection).
>>>
>>> Niels
>>>
>>> Hi all,
>>>    Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, then once the quota is exceeded, I can redirect the writer and write data from the beginning of the file again automatically.
>>>
>>>
>>
>

Re: Is there any way to use an HDFS file as a circular buffer?

Posted by Shekhar Sharma <sh...@gmail.com>.
Use a CEP tool like Esper or Storm, and you will be able to achieve that.
I can give you more input if you can provide me more details of what
you are trying to achieve.
Regards,
Som Shekhar Sharma
+91-8197243810


On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vb...@gmail.com> wrote:

> Hi Niels and Bertrand,
>     Thank you for your great advice.
>     In our scenario, we need to store a steady stream of binary data in a
> circular storage layout; throughput and concurrency are the most important
> requirements. The first approach seems workable, but since HDFS is not
> friendly to small files, it may not be smooth enough. HBase is good, but
> not appropriate for us in terms of either throughput or storage. MongoDB
> works well for web applications, but it does not suit our scenario either.
>     We need a distributed storage system with high throughput, HA, load
> balancing and security. It might act much like HBase, managing many small
> files (HFiles) as one large region, so that many small files are handled
> as a single large one. Perhaps we should develop it ourselves.
>
> Thank you.
> Lin Wukang
>
>
> 2013/7/25 Niels Basjes <Ni...@basjes.nl>
>
>> A circular file on HDFS is not possible.
>>
>> Some of the ways around this limitation:
>> - Create a series of files and delete the oldest file when you have too
>> many.
>> - Put the data into an HBase table and do something similar.
>> - Use a completely different technology like MongoDB, which has built-in
>> support for a circular buffer (capped collection).
>>
>> Niels
>>
>> Hi all,
>>    Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, then once the quota is exceeded, I can redirect the writer and write data from the beginning of the file again automatically.
>>
>>
>
