Posted to common-user@hadoop.apache.org by David Parks <da...@yahoo.com> on 2013/01/29 08:41:18 UTC

Tricks to upgrading Sequence Files?

Does anyone have any good tricks for upgrading a sequence file?

We maintain a sequence file like a flat-file DB, and the primary object
stored in it changed in recent development.

It's trivial to write a job to read in the sequence file, update the object,
and write it back out in the new format.

But since sequence files record the key/value class names in their headers,
I would either need to rename the model object with a version number or
change the header of each sequence file.

Just wondering if there are any nice tricks for this.


Re: Tricks to upgrading Sequence Files?

Posted by Terry Healy <th...@bnl.gov>.
Avro's versioning capability might help, if Avro could replace
SequenceFile in your workflow.

Just a thought.

-Terry

On 1/29/13 9:17 PM, David Parks wrote:
> I'll consider a patch to the SequenceFile, if we could manually override the
> sequence file input Key and Value that's read from the sequence file headers
> we'd have a clean solution.
>
> I don't like versioning my Model object because it's used by 10's of other
> classes and I don't want to risk less maintained classes continuing to use
> an old version.
>
> For the time being I just used 2 jobs. First I renamed the old Model Object
> to the original name, read it in, upgraded it, and wrote the new version
> with a different class name.
>
> Then I renamed the classes again so the new model object used the original
> name and read in the altered name and cloned it into the original name.
>
> All in all an hours work only, but having a cleaner process would be better.
> I'll add the request to JIRA at a minimum.
>
> Dave
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com] 
> Sent: Wednesday, January 30, 2013 2:32 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: Tricks to upgrading Sequence Files?
>
> This is a pretty interesting question, but unfortunately there isn't an
> inbuilt way in SequenceFiles itself to handle this. However, your key/value
> classes can be made to handle versioning perhaps - detecting if what they've
> read is of an older time and decoding it appropriately (while handling newer
> encoding separately, in the normal fashion).
> This would be much better than going down the classloader hack paths I
> think?
>
> On Tue, Jan 29, 2013 at 1:11 PM, David Parks <da...@yahoo.com> wrote:
>> Anyone have any good tricks for upgrading a sequence file.
>>
>>
>>
>> We maintain a sequence file like a flat file DB and the primary object 
>> in there changed in recent development.
>>
>>
>>
>> It's trivial to write a job to read in the sequence file, update the 
>> object, and write it back out in the new format.
>>
>>
>>
>> But since sequence files read and write the key/value class I would 
>> either need to rename the model object with a version number, or 
>> change the header of each sequence file.
>>
>>
>>
>> Just wondering if there are any nice tricks to this.
>
>
> --
> Harsh J
>


RE: Tricks to upgrading Sequence Files?

Posted by David Parks <da...@yahoo.com>.
I'll consider a patch to SequenceFile. If we could manually override the
input key and value classes that are read from the sequence file headers,
we'd have a clean solution.

I don't like versioning my Model object because it's used by tens of other
classes, and I don't want to risk less-maintained classes continuing to use
an old version.

For the time being I just used two jobs. First I renamed the old Model object
to the original name, read it in, upgraded it, and wrote the new version
with a different class name.

Then I renamed the classes again so that the new model object used the
original name, read in the records written under the altered name, and
cloned them into the original name.

All in all only an hour's work, but having a cleaner process would be better.
I'll add the request to JIRA at a minimum.

Dave
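
For readers following along, the first pass of the two-job workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the OldModel and NewModel classes, their fields, and the helper methods are hypothetical, and plain in-memory streams stand in for the sequence file reader and writer. The second pass would be a straight copy once the classes are renamed back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the first migration pass: in-memory streams
// play the role of the sequence file reader and writer.
class ModelMigration {

    // Old record layout (assumed for illustration): a name only.
    static class OldModel {
        String name;

        void readFields(DataInputStream in) throws IOException {
            name = in.readUTF();
        }
    }

    // New record layout (assumed for illustration): a name plus a count.
    static class NewModel {
        String name;
        int count;

        void write(DataOutputStream out) throws IOException {
            out.writeUTF(name);
            out.writeInt(count);
        }

        void readFields(DataInputStream in) throws IOException {
            name = in.readUTF();
            count = in.readInt();
        }
    }

    // The "upgrade it" step: copy the old fields, default the new ones.
    static NewModel upgrade(OldModel old) {
        NewModel m = new NewModel();
        m.name = old.name;
        m.count = 0;
        return m;
    }

    // Read every old record, upgrade it, and write it in the new layout.
    static byte[] migrate(byte[] oldData) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(oldData));
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            while (in.available() > 0) {
                OldModel old = new OldModel();
                old.readFields(in);
                upgrade(old).write(out);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: encode names the way the old writer would have.
    static byte[] oldData(String... names) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (String n : names) {
                out.writeUTF(n);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: decode a migrated stream back into NewModel records.
    static List<NewModel> readNew(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<NewModel> records = new ArrayList<>();
            while (in.available() > 0) {
                NewModel m = new NewModel();
                m.readFields(in);
                records.add(m);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the old and new layouts live in separately named classes, neither pass needs to guess which encoding it is looking at; the cost is the class renaming between passes that the message describes.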


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, January 30, 2013 2:32 AM
To: <us...@hadoop.apache.org>
Subject: Re: Tricks to upgrading Sequence Files?

This is a pretty interesting question, but unfortunately there isn't an
inbuilt way in SequenceFiles itself to handle this. However, your key/value
classes can be made to handle versioning perhaps - detecting if what they've
read is of an older time and decoding it appropriately (while handling newer
encoding separately, in the normal fashion).
This would be much better than going down the classloader hack paths I
think?

On Tue, Jan 29, 2013 at 1:11 PM, David Parks <da...@yahoo.com> wrote:
> Anyone have any good tricks for upgrading a sequence file.
>
>
>
> We maintain a sequence file like a flat file DB and the primary object 
> in there changed in recent development.
>
>
>
> It's trivial to write a job to read in the sequence file, update the 
> object, and write it back out in the new format.
>
>
>
> But since sequence files read and write the key/value class I would 
> either need to rename the model object with a version number, or 
> change the header of each sequence file.
>
>
>
> Just wondering if there are any nice tricks to this.



--
Harsh J


Re: Tricks to upgrading Sequence Files?

Posted by Harsh J <ha...@cloudera.com>.
This is a pretty interesting question, but unfortunately there isn't
a built-in way in SequenceFile itself to handle this. However, your
key/value classes can perhaps be made to handle versioning: detecting
whether what they've read is in an older format and decoding it
appropriately (while handling the newer encoding separately, in the
normal fashion). I think this would be much better than going down the
classloader-hack path.
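
A minimal sketch of that idea, under a loud assumption: every record has carried a leading version byte since the first release. If the old records have no such byte, the class needs some other way to tell the layouts apart. VersionedModel, its fields, and the helper methods are hypothetical. The class uses only java.io so that it runs on its own; in a real job it would implement org.apache.hadoop.io.Writable, whose write and readFields methods have the same signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model class that decodes both its old and new layouts.
class VersionedModel {
    static final byte CURRENT_VERSION = 2;

    String name = "";
    int count; // field added in version 2 (assumed for illustration)

    // Always writes the newest layout, tagged with its version byte.
    public void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION);
        out.writeUTF(name);
        out.writeInt(count);
    }

    // Reads the version byte first, then decodes whichever layout follows.
    public void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version < 1 || version > CURRENT_VERSION) {
            throw new IOException("Unsupported record version: " + version);
        }
        name = in.readUTF();
        // Version 1 records had no count, so fall back to a default.
        count = (version >= 2) ? in.readInt() : 0;
    }

    // Helper: serialize in the current layout.
    static byte[] toBytes(VersionedModel m) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            m.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: deserialize a record in any supported layout.
    static VersionedModel fromBytes(byte[] bytes) {
        try {
            VersionedModel m = new VersionedModel();
            m.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return m;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: fake a record as a version 1 writer would have produced it.
    static byte[] legacyBytes(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(1);
            out.writeUTF(name);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape the class name stored in the file header never changes, so existing files stay readable, and any job that reads and rewrites records upgrades them to the current layout as a side effect.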

On Tue, Jan 29, 2013 at 1:11 PM, David Parks <da...@yahoo.com> wrote:
> Anyone have any good tricks for upgrading a sequence file.
>
>
>
> We maintain a sequence file like a flat file DB and the primary object in
> there changed in recent development.
>
>
>
> It’s trivial to write a job to read in the sequence file, update the object,
> and write it back out in the new format.
>
>
>
> But since sequence files read and write the key/value class I would either
> need to rename the model object with a version number, or change the header
> of each sequence file.
>
>
>
> Just wondering if there are any nice tricks to this.



-- 
Harsh J
