Posted to user@hadoop.apache.org by Lauren Taylor <ch...@gmail.com> on 2020/09/21 19:49:21 UTC

RE: Question About Feasibility of Hadoop Over P2P Architecture

Hello!

I am currently provisioning a P2P network (there are already a couple of
live networks that have been created, but we do not want to test this in
production, of course).

In this p2p network, I was looking at the best ways in which one could
distribute file storage (and access to it) efficiently.

The difference between this solution and BitTorrent (DHT / mainline DHT) is
*that all of the files that are uploaded to the network are meant to be
stored and distributed*.

Putting the complexities of that aside (the sustainability of that proposal
has been accounted for), I am wondering whether Apache Hadoop would be a
good fit to run on top of that system.

*Why I Ask*
The p2p structure of this protocol is absolutely essential to its
functioning. Thus, if I am going to leverage it for storage and
distribution, I must ensure that I'm not injecting something into the
ecosystem that could ultimately harm it (e.g., a DoS vulnerability).

*Hadoop-LAFS?*
I was on the Tahoe-LAFS website and saw that there was a proposal for
'Hadoop-LAFS', a deployment of Apache Hadoop on top of the Tahoe-LAFS
layer.

According to the project description on Google's Code Archive, the project:

"Provides an integration layer between Tahoe LAFS and Hadoop so Map Reduce
jobs can be run over encrypted data stored in Tahoe."

Any and all answers would help a ton, thank you!

Sincerely,
Buck Wiston

Re: Question About Feasibility of Hadoop Over P2P Architecture

Posted by Hariharan <ha...@gmail.com>.
As long as you have a filesystem implementation [1] for your p2p fs, Hadoop
(and other software, like Hive and Spark, that uses the Hadoop filesystem
layer) should work just fine. Performance may be a concern, though, so you
may have to tune your implementation as far as possible.

1.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html
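
Hariharan's suggestion amounts to writing a subclass of Hadoop's
org.apache.hadoop.fs.FileSystem for the p2p store; its abstract methods
include open, create, rename, delete, listStatus, mkdirs and getFileStatus,
and [1] describes the behavior each is expected to have. The stdlib-only
Java sketch below does not use Hadoop at all. It shows one hypothetical way
to structure the work: a small backend contract (P2PStore, an invented name)
that such a subclass could delegate to, plus an in-memory stand-in for the
network that can back unit tests before any peers are involved. The method
semantics here are assumptions, so check them against [1].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical backend contract (NOT a Hadoop API) that a custom
 * org.apache.hadoop.fs.FileSystem subclass could delegate to.
 * Comments name the FileSystem method each operation would help implement.
 */
interface P2PStore {
    void put(String path, byte[] data, boolean overwrite); // create(...)
    byte[] get(String path);                               // open(...)
    boolean rename(String src, String dst);                // rename(...)
    boolean delete(String path);                           // delete(...) for a single file
    List<String> list(String dir);                         // listStatus(...)
}

/** In-memory stand-in for the p2p network, e.g. for unit-testing the adapter. */
class InMemoryP2PStore implements P2PStore {
    // Flat namespace: full path -> file contents; directories are implied by paths.
    private final TreeMap<String, byte[]> files = new TreeMap<>();

    public void put(String path, byte[] data, boolean overwrite) {
        if (!overwrite && files.containsKey(path)) {
            throw new IllegalStateException("already exists: " + path);
        }
        files.put(path, data.clone());
    }

    public byte[] get(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("not found: " + path);
        }
        return data.clone();
    }

    // Design choice for this sketch: never overwrite an existing destination.
    public boolean rename(String src, String dst) {
        if (!files.containsKey(src) || files.containsKey(dst)) {
            return false;
        }
        files.put(dst, files.remove(src));
        return true;
    }

    public boolean delete(String path) {
        return files.remove(path) != null;
    }

    // Direct children of `dir` (files and implied subdirectories), sorted.
    public List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        TreeSet<String> children = new TreeSet<>();
        // Keys sharing the prefix are contiguous in a sorted map.
        for (String p : files.tailMap(prefix, true).keySet()) {
            if (!p.startsWith(prefix)) {
                break;
            }
            String rest = p.substring(prefix.length());
            int slash = rest.indexOf('/');
            children.add(prefix + (slash < 0 ? rest : rest.substring(0, slash)));
        }
        return new ArrayList<>(children);
    }
}
```

Keeping the network behind a narrow interface like this means the Hadoop
adapter and the p2p transport can be tested separately.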

Thanks,
Hariharan

On Wed, 23 Sep 2020, 22:27 Lauren Taylor, <ch...@gmail.com> wrote:

Re: Question About Feasibility of Hadoop Over P2P Architecture

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
HDFS isn't going to work here, but the Hadoop filesystem APIs could be
suitable for implementing.

Also look at what Apache Cassandra does; it uses a DHT to scatter data.
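
Steve's pointer to Cassandra is about DHT-style data placement. The general
technique is consistent hashing: peers and keys are hashed onto the same
ring, and each key lives on the next peer(s) clockwise, so adding or
removing a peer only moves the keys adjacent to it. Below is a minimal Java
sketch of that idea with virtual nodes and successor replicas. It is an
illustration under assumptions, not Cassandra's implementation: the
MD5-based hash, the vnode count, the class name HashRing, the peer names
and the block key are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/**
 * Minimal consistent-hashing ring with virtual nodes (illustrative sketch only).
 * Each peer owns several points on the ring; a key is stored on the first peer
 * found clockwise from the key's hash, and replicas go to the next distinct peers.
 */
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodesPerPeer;

    public HashRing(int vnodesPerPeer) {
        this.vnodesPerPeer = vnodesPerPeer;
    }

    /** Maps a string to a ring position: the first 8 bytes of its MD5 digest. */
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public void addPeer(String peer) {
        for (int i = 0; i < vnodesPerPeer; i++) {
            ring.put(position(peer + "#" + i), peer);
        }
    }

    public void removePeer(String peer) {
        for (int i = 0; i < vnodesPerPeer; i++) {
            // Two-argument remove only deletes the entry if it still belongs to this peer.
            ring.remove(position(peer + "#" + i), peer);
        }
    }

    /** Up to `replicas` distinct peers for `key`, walking clockwise and wrapping around. */
    public List<String> peersFor(String key, int replicas) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty() || replicas <= 0) {
            return result;
        }
        long h = position(key);
        List<Collection<String>> walk = List.of(
                ring.tailMap(h, true).values(),   // from the key to the end of the ring
                ring.headMap(h, false).values()); // then wrap around to the start
        for (Collection<String> segment : walk) {
            for (String peer : segment) {
                if (!result.contains(peer)) {
                    result.add(peer);
                    if (result.size() == replicas) {
                        return result;
                    }
                }
            }
        }
        return result; // fewer distinct peers than requested replicas
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(16);
        for (String peer : new String[] {"peer-a", "peer-b", "peer-c", "peer-d"}) {
            ring.addPeer(peer);
        }
        // Hypothetical block key; prints the three peers that would hold its replicas.
        System.out.println(ring.peersFor("/data/file1#block0", 3));
    }
}
```

One property relevant to the stability concern in the original question:
when a peer leaves, only the keys it owned change their primary owner; every
other key keeps the same primary peer.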

On Tue, 22 Sep 2020 at 06:17, Lauren Taylor <ch...@gmail.com> wrote:
