
Posted to dev@oodt.apache.org by Thomas Bennett <lm...@gmail.com> on 2014/05/27 12:53:14 UTC

Metadata based versioning

Hey,


When calling *XmlRpcFileManager.ingestProduct()*, I noticed that the
variable *"m"* (Metadata m = new Metadata()) is never updated with the
results of server side met extraction.

This means that metadata based versioning cannot work unless the metadata
used is client side metadata.

For example:

I use CoreMetExtractor on the server side to extract FileLocation and
Filename.

However, when *addMetadata(p,m)* is called it does the following steps:

   1. runs the server side met extraction (in my case CoreMetExtractor)
   2. updates the catalog
   3. returns true.

Since it only returns true, the updates made to the method's internal
version of m are never passed back to the caller and are lost.

Versioning happens after this step and I use Filename as part of my
versioner, which ends up getting set to 'null'.
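
The behaviour described above can be illustrated with a minimal,
self-contained Java sketch. This is not the OODT source: the class
IngestSketch, the methods extractOnServer, addMetadata,
addMetadataReturningMet and version, and the use of plain maps in place of
OODT's Metadata and catalog classes are all assumptions made for
illustration. It only shows why a boolean return value hides the
server-extracted keys from the caller, and what returning the merged
metadata would change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ingest flow discussed in this thread.
class IngestSketch {

    // Stand-in for server side met extraction: builds a NEW map holding the
    // client metadata plus the extracted keys. The caller's map is untouched.
    static Map<String, String> extractOnServer(String productPath,
                                               Map<String, String> clientMet) {
        Map<String, String> merged = new HashMap<>(clientMet);
        int slash = productPath.lastIndexOf('/');
        merged.put("FileLocation", slash >= 0 ? productPath.substring(0, slash) : "");
        merged.put("Filename", productPath.substring(slash + 1));
        return merged;
    }

    // Shape described above: extract, update the catalog, return true.
    static boolean addMetadata(String productPath, Map<String, String> m,
                               Map<String, String> catalog) {
        Map<String, String> merged = extractOnServer(productPath, m);
        catalog.putAll(merged);
        return true; // merged is dropped here; the caller still holds the old m
    }

    // Alternative raised in the questions below: return the merged metadata.
    static Map<String, String> addMetadataReturningMet(String productPath,
                                                       Map<String, String> m,
                                                       Map<String, String> catalog) {
        Map<String, String> merged = extractOnServer(productPath, m);
        catalog.putAll(merged);
        return merged;
    }

    // Toy versioner (an assumption): builds the archive path from Filename.
    static String version(Map<String, String> met) {
        return "/archive/" + met.get("Filename");
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        Map<String, String> m = new HashMap<>();

        // Boolean variant: the catalog gets Filename, the caller's m does not.
        addMetadata("/data/in/file.dat", m, catalog);
        System.out.println(version(m));        // prints /archive/null

        // Returning variant: the versioner can see the extracted Filename.
        Map<String, String> updated =
            addMetadataReturningMet("/data/in/file.dat", m, catalog);
        System.out.println(version(updated));  // prints /archive/file.dat
    }
}
```

In this sketch the catalog ends up correct in both variants; only the
metadata visible to the versioning step differs.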

Any reason why server side met extraction should not be used for product
versioning?
Any reason why addMetadata should not return the updated m?

Cheers,
Tom

Re: Metadata based versioning

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.

Hey Tom,

-----Original Message-----

From: Thomas Bennett <lm...@gmail.com>
Reply-To: "dev@oodt.apache.org" <de...@oodt.apache.org>
Date: Friday, May 30, 2014 6:26 AM
To: OODT <de...@oodt.apache.org>
Subject: Re: Metadata based versioning

>Hey Chris,
>
>Thanks for your reply.
>
>I think you may have clarified some of my understanding of oodt under the
>hood. Woot. (or is that woodt?)

Haha, woodt it is.

>
>Firstly, from OODT-72 I can see how the design decisions were made. It
>just so happens that I'm wanting the filename for versioning and suddenly
>I understand why FinalFileLocationExtractor is needed. I will now use it
>with confidence :-).
>
>Your use of the term 'client side data movement' confused me at first, so
>I had to think about it a bit. I was always under the impression (a naive
>misconception) that if your file manager existed on a "machine B" you
>would need to do a remote data transfer to use that file manager.
>
>But what you're saying is that the following setup is possible:
>
>   - Machine A (client): crawler + repository path + local data transfer
>   (i.e. machine A, the 'client', does not need a file manager running
>   and does not need to do a remote data transfer to machine B)
>   - Machine B (server): file manager (does not need the repository path
>   to archive files)

Yes, this is totally possible. Imagine the following configuration:

Machine A: no file manager, but has a crawler and can see the src + dest
paths with local data transfer. (Note that *local* is a misnomer, b/c
through distributed file systems like NFS, Hadoop, Spark/Shark, GlusterFS,
etc. we can logically mount local, commodity, shared-nothing disks and
federate them to make them appear like one big one. Each of the preceding
distributed file system technologies has different strengths and benefits,
but from OODT's perspective it can all be local even if it truly isn't.)

Machine B: file manager

Use Case:

Ingest a file on machine A into the File Manager on machine B.
  - totally doable
  - crawler on A contacts (by default) http://B:9000/ and then ingests
    into the file manager using client side transfer.

Make sense?

>
>Have I got the right idea?

Yep!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





Re: Metadata based versioning

Posted by Thomas Bennett <lm...@gmail.com>.

Hey Chris,

Thanks for your reply.

I think you may have clarified some of my understanding of oodt under the
hood. Woot. (or is that woodt?)

Firstly, from OODT-72 I can see how the design decisions were made. It just
so happens that I'm wanting the filename for versioning and suddenly I
understand why FinalFileLocationExtractor is needed. I will now use it with
confidence :-).

Your use of the term 'client side data movement' confused me at first, so I
had to think about it a bit. I was always under the impression (a naive
misconception) that if your file manager existed on a "machine B" you would
need to do a remote data transfer to use that file manager.

But what you're saying is that the following setup is possible:

   - Machine A (client): crawler + repository path + local data transfer
   (i.e. machine A, the 'client', does not need a file manager running and
   does not need to do a remote data transfer to machine B)
   - Machine B (server): file manager (does not need the repository path to
   archive files)

Have I got the right idea?

Cheers,
Tom

On 30 May 2014 07:01, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Tom,
>
> You've correctly discovered this. This was an intentional by-design
> artifact of my belief that versioning and data movement should be
> sort of co-located on the same machine. So if you do client side
> data movement (which most people do), then the versioning should
> happen alongside of it, and thus any metadata extraction present
> there should be available during versioning for use in e.g., Metadata
> based versioning.
>
> The rub comes in the issue where the metadata is generated on the
> server side and you expect versioning to be available to the system.
> One way of getting around this is taking a look at the way that
> the FinalFileLocationExtractor [1] grabs the latest version of the
> CoreMetKeys.FILE_LOCATION property and then makes it available for e.g.,
> versioning.
>
> See discussion too in OODT-72 [2] for some rationale behind my
> sentiments there. Happy to discuss!
>
> Cheers,
> Chris
>
> [1] http://s.apache.org/bvd
> [2] https://issues.apache.org/jira/browse/OODT-72
>
>

Re: Metadata based versioning

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.

Hey Tom,

You've correctly discovered this. This was an intentional by-design
artifact of my belief that versioning and data movement should be
sort of co-located on the same machine. So if you do client side
data movement (which most people do), then the versioning should
happen alongside of it, and thus any metadata extraction present
there should be available during versioning for use in e.g., Metadata
based versioning.

The rub comes in the issue where the metadata is generated on the
server side and you expect versioning to be available to the system.
One way of getting around this is taking a look at the way that
the FinalFileLocationExtractor [1] grabs the latest version of the
CoreMetKeys.FILE_LOCATION property and then makes it available for e.g.,
versioning.

See discussion too in OODT-72 [2] for some rationale behind my
sentiments there. Happy to discuss!

Cheers,
Chris

[1] http://s.apache.org/bvd
[2] https://issues.apache.org/jira/browse/OODT-72
