Posted to user@accumulo.apache.org by Russ Weeks <rw...@newbrightidea.com> on 2015/10/07 00:39:48 UTC

Anybody ever used the HDFS NFS Gateway?

I hope this isn't too off-topic. Any opinions re. its
completeness/quality/reliability?

(The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles -> Accumulo.
Relevance established!)

Thanks,
-Russ

Re: Anybody ever used the HDFS NFS Gateway?

Posted by David Medinets <da...@gmail.com>.
One aspect of creating RFiles for import into Accumulo that I don't
recall being mentioned before is the ability to archive them for future use.

On Tue, Oct 6, 2015 at 10:25 PM, Russ Weeks <rw...@newbrightidea.com>
wrote:

> Hi, Dylan,
>
> Yeah, writing RFiles instead of using BatchWriters
> (AccumuloFileOutputFormat vs. AccumuloOutputFormat) for efficiency and
> atomicity of ingest ("improved" atomicity if that even makes sense).
>
> I'm thinking about the NFS gateway just because the system that's
> producing the CSV is kind of a black box to me. It doesn't speak Hadoop, as
> Christopher alluded to, and I can't control its output format, but I can
> direct its output to a filesystem that it perceives to be local.
>
> My options are either an NFS write direct to HDFS via the gateway, or an
> NFS write to a conventional filesystem that I control, followed by some
> sort of inotify-driven migration from that server to HDFS.
>
> -Russ
>
> On Tue, Oct 6, 2015 at 6:12 PM Dylan Hutchison <dh...@uw.edu> wrote:
>
>> Hi Russ,
>>   I'm curious what you have in mind.  Are you looking for a solution more
>> efficient than running clients that read the CSV files and open
>> BatchWriters?
>>
>> Regards, Dylan
>>
>> On Tue, Oct 6, 2015 at 4:56 PM, Christopher <ct...@apache.org> wrote:
>>
>>> I haven't tried it, but it sounds like a cool use case. Might be a good
>>> alternative to distcp, more interoperable with tools which don't speak
>>> Hadoop.
>>>
>>> On Tue, Oct 6, 2015, 18:41 Russ Weeks <rw...@newbrightidea.com> wrote:
>>>
>>>> I hope this isn't too off-topic. Any opinions re. its
>>>> completeness/quality/reliability?
>>>>
>>>> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles ->
>>>> Accumulo. Relevance established!)
>>>>
>>>> Thanks,
>>>> -Russ
>>>>
>>>
>>

Re: Anybody ever used the HDFS NFS Gateway?

Posted by Russ Weeks <rw...@newbrightidea.com>.
Hi, Dylan,

Yeah, writing RFiles instead of using BatchWriters
(AccumuloFileOutputFormat vs. AccumuloOutputFormat) for efficiency and
atomicity of ingest ("improved" atomicity if that even makes sense).

I'm thinking about the NFS gateway just because the system that's producing
the CSV is kind of a black box to me. It doesn't speak Hadoop, as
Christopher alluded to, and I can't control its output format, but I can
direct its output to a filesystem that it perceives to be local.

My options are either an NFS write direct to HDFS via the gateway, or an
NFS write to a conventional filesystem that I control, followed by some
sort of inotify-driven migration from that server to HDFS.

-Russ

On Tue, Oct 6, 2015 at 6:12 PM Dylan Hutchison <dh...@uw.edu> wrote:

> Hi Russ,
>   I'm curious what you have in mind.  Are you looking for a solution more
> efficient than running clients that read the CSV files and open
> BatchWriters?
>
> Regards, Dylan
>
> On Tue, Oct 6, 2015 at 4:56 PM, Christopher <ct...@apache.org> wrote:
>
>> I haven't tried it, but it sounds like a cool use case. Might be a good
>> alternative to distcp, more interoperable with tools which don't speak
>> Hadoop.
>>
>> On Tue, Oct 6, 2015, 18:41 Russ Weeks <rw...@newbrightidea.com> wrote:
>>
>>> I hope this isn't too off-topic. Any opinions re. its
>>> completeness/quality/reliability?
>>>
>>> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles ->
>>> Accumulo. Relevance established!)
>>>
>>> Thanks,
>>> -Russ
>>>
>>
>

Re: Anybody ever used the HDFS NFS Gateway?

Posted by Dylan Hutchison <dh...@uw.edu>.
Hi Russ,
  I'm curious what you have in mind.  Are you looking for a solution more
efficient than running clients that read the CSV files and open
BatchWriters?

Regards, Dylan

On Tue, Oct 6, 2015 at 4:56 PM, Christopher <ct...@apache.org> wrote:

> I haven't tried it, but it sounds like a cool use case. Might be a good
> alternative to distcp, more interoperable with tools which don't speak
> Hadoop.
>
> On Tue, Oct 6, 2015, 18:41 Russ Weeks <rw...@newbrightidea.com> wrote:
>
>> I hope this isn't too off-topic. Any opinions re. its
>> completeness/quality/reliability?
>>
>> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles ->
>> Accumulo. Relevance established!)
>>
>> Thanks,
>> -Russ
>>
>

Re: Anybody ever used the HDFS NFS Gateway?

Posted by Christopher <ct...@apache.org>.
I haven't tried it, but it sounds like a cool use case. Might be a good
alternative to distcp, more interoperable with tools which don't speak
Hadoop.

On Tue, Oct 6, 2015, 18:41 Russ Weeks <rw...@newbrightidea.com> wrote:

> I hope this isn't too off-topic. Any opinions re. its
> completeness/quality/reliability?
>
> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles -> Accumulo.
> Relevance established!)
>
> Thanks,
> -Russ
>