Posted to dev@oodt.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/06/01 06:50:55 UTC
Re: Metadata based versioning

Hey Tom,

-----Original Message-----

From: Thomas Bennett <lm...@gmail.com>
Reply-To: "dev@oodt.apache.org" <de...@oodt.apache.org>
Date: Friday, May 30, 2014 6:26 AM
To: OODT <de...@oodt.apache.org>
Subject: Re: Metadata based versioning

>Hey Chris,
>
>Thanks for your reply.
>
>I think you may have clarified some of my understanding of oodt under the
>hood. Woot. (or is that woodt?)

Haha, woodt it is.

>
>Firstly, from OODT-72 I can see how the design decisions were made. It just
>so happens that I'm wanting the filename for versioning and suddenly I
>understand why FinalFileLocationExtractor is needed. I will now use it with
>confidence :-).
>
>Your use of the term 'client side data movement' confused me at first, so I
>had to think about it a bit. I was always under the impression (a naive
>misconception) that if your file manager existed on a "machine B" you would
>need to do a remote data transfer to use that file manager.
>
>But what you're saying is that the following setup is possible:
>
>   - Machine A (client): crawler + repository path + local data transfer
>   (i.e. machine A, or the 'client', does not need a file manager running
>   and does not need to do a remote data transfer to machine B)
>   - Machine B (server): file manager (does not need the repository path to
>   archive files)

Yes, this is totally possible. Imagine the following configuration:

Machine A: no file manager, but has the crawler, and can see the src + dest
paths with local data transfer (note *local* is a misnomer, b/c through
distributed file systems like NFS, Hadoop, Spark/Shark, GlusterFS, etc. we
can logically mount local commodity shared-nothing disks and federate them to
make them appear like one big one. Each of the preceding distributed file
system technologies has different strengths and benefits, but from OODT's
perspective, it can all be local even if it truly isn't).

Machine B: file manager

Use Case:

Ingest a file on machine A into the File Manager on machine B.
  - totally doable
  - crawler on A contacts (by default) http://B:9000/ and then ingests
    into file manager using client side transfer.
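
To make that flow concrete, here is a minimal Python sketch of client side
transfer. It is a toy model under stated assumptions, not OODT code:
ToyFileManager, client_side_ingest, and the metadata dictionary are
hypothetical names made up for illustration, and the "server" is an
in-process object rather than a file manager reached over the network at the
URL above. The only point it tries to show is that the client moves the bytes
itself and the server just records where they ended up.

```python
import shutil
import tempfile
from pathlib import Path


class ToyFileManager:
    """Hypothetical stand-in for the file manager on machine B.

    It only keeps a catalog of metadata and never reads or writes the
    product files itself.
    """

    def __init__(self):
        self.catalog = {}

    def ingest(self, product_name, metadata):
        # Record where the client says the file now lives.
        self.catalog[product_name] = dict(metadata)
        return product_name


def client_side_ingest(src_file, repo_dir, file_manager):
    """Runs on machine A: copy the file locally, then notify the server.

    repo_dir only has to be visible from machine A (it could be a mounted
    shared file system); the server never needs to see it.
    """
    src = Path(src_file)
    dest_dir = Path(repo_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # the "local" data transfer step

    # Illustrative metadata keys; the final location is known on the client
    # because the client performed the transfer.
    metadata = {"Filename": src.name, "FileLocation": str(dest_dir)}
    return file_manager.ingest(src.name, metadata)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        staging.mkdir()
        product = staging / "granule.dat"
        product.write_text("example payload")

        fm = ToyFileManager()
        client_side_ingest(product, Path(tmp) / "archive", fm)
        print(fm.catalog)
```

Because the copy happens on the client, the final file location is already
known there, which is why metadata extracted on the client can feed into
versioning.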

Make sense?

>
>Have I got the right idea?

Yep!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




>
>
>On 30 May 2014 07:01, Mattmann, Chris A (3980) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Tom,
>>
>> You've correctly discovered this. This was an intentional by-design
>> artifact of my belief that versioning and data movement should be
>> sort of co-located on the same machine. So if you do client side
>> data movement (which most people do), then the versioning should
>> happen alongside of it, and thus any metadata extraction present
>> there should be available during versioning for use in e.g., Metadata
>> based versioning.
>>
>> The rub comes in the issue where the metadata is generated on the
>> server side and you expect versioning to be available to the system.
>> One way of getting around this is taking a look at the way that
>> the FinalFileLocationExtractor [1] grabs the latest version of the
>> CoreMetKeys.FILE_LOCATION property and then makes it available for e.g.,
>> versioning.
>>
>> See discussion too in OODT-72 [2] for some rationale behind my
>> sentiments there. Happy to discuss!
>>
>> Cheers,
>> Chris
>>
>> [1] http://s.apache.org/bvd
>> [2] https://issues.apache.org/jira/browse/OODT-72
>>
>>