Posted to user@pig.apache.org by Renato Marroquín Mogrovejo <re...@gmail.com> on 2010/10/24 22:14:33 UTC

Using data with Zebra

Hi there, I have some questions about Zebra usage.
All my data is already in HDFS, and I want to use the Zebra storers and
loaders, but I don't want to reprocess all my data just to get the .meta,
.schema and .btschema files. By the way, how are those files related? They
all keep the file's metadata, right?
Is there any way I can create the necessary files to use Zebra's loader and
storer functionality? Any advice or suggestion is highly appreciated.
Thanks in advance.


Renato M.

Re: Using data with Zebra

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.

Thanks for the pointers Yan!

Renato M.

2010/10/27 Yan Zhou <ya...@yahoo-inc.com>

> If you cannot change your input data generation process to generate
> input directly in Zebra, I can't see any alternative to keeping two sets
> of data.
>
> Regarding generating Zebra data, Pig is simpler than raw map/reduce and the
> performance should be fine too, provided there is a Pig loader for your
> input data format.
>
> Yan
>
> ------------------------------
>
> From: Renato Marroquín Mogrovejo [mailto:renatoj.marroquin@gmail.com]
> Sent: Wednesday, October 27, 2010 9:29 AM
> To: Yan Zhou; user@pig.apache.org
> Subject: Re: Using data with Zebra
>
> Thanks Yan!
>
> Just a couple of questions. I have too much data to simply delete it and
> reprocess it all, and if I reprocess all my HDFS data I will end up with the
> same data duplicated: one copy in Zebra and one as regular HDFS data. What
> approach would you suggest? And would it be better to use Pig or raw
> MapReduce?
>
> Renato M.
>
> 2010/10/25 Yan Zhou <ya...@yahoo-inc.com>
>
> .schema is a column group's schema file; .btschema is the Zebra table's
> schema file; .meta is a column group's index file.
>
> The bottom line is that they are all internal files maintained by Zebra,
> and users should not access or manipulate them directly. Also, the storage
> format used by Zebra is probably different from that of your data already
> on HDFS.
>
> In summary, you have to use Zebra to generate Zebra data; Zebra cannot use
> any other data format.
>
> Yan

RE: Using data with Zebra

Posted by Yan Zhou <ya...@yahoo-inc.com>.

If you cannot change your input data generation process to generate input directly in Zebra, I can't see any alternative to keeping two sets of data.

Regarding generating Zebra data, Pig is simpler than raw map/reduce and the performance should be fine too, provided there is a Pig loader for your input data format.

Yan
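
For readers following along, the Pig route described above might look roughly like the sketch below. It is not taken from the thread: the jar location, the HDFS paths, the tab-delimited input format, the three-column schema and the column-group layout are all assumptions made for illustration. The only Zebra-specific piece is the storer class, org.apache.hadoop.zebra.pig.TableStorer, whose string argument is a storage hint that assigns columns to column groups.

```pig
-- Hypothetical location of the Zebra jar; adjust to your installation.
REGISTER /path/to/zebra.jar;

-- Read the existing HDFS data with whatever Pig loader matches its format.
-- PigStorage with a tab delimiter and this schema are assumptions.
raw = LOAD '/data/existing/input' USING PigStorage('\t')
      AS (id:long, name:chararray, score:double);

-- Write a Zebra table. Zebra creates and maintains its internal
-- .btschema, .schema and .meta files itself during the store.
-- The storage hint puts id and name in one column group and score in another.
STORE raw INTO '/data/zebra/input_table'
      USING org.apache.hadoop.zebra.pig.TableStorer('[id, name]; [score]');
```

Once the store has finished, a later script should be able to read the table back with org.apache.hadoop.zebra.pig.TableLoader, optionally passing a projection such as 'id, score' so that only the needed column groups are read. Check the Zebra documentation for your Pig release before relying on the exact hint syntax.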


Re: Using data with Zebra

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.

Thanks Yan!

Just a couple of questions. I have too much data to simply delete it and
reprocess it all, and if I reprocess all my HDFS data I will end up with the
same data duplicated: one copy in Zebra and one as regular HDFS data. What
approach would you suggest? And would it be better to use Pig or raw
MapReduce?

Renato M.


RE: Using data with Zebra

Posted by Yan Zhou <ya...@yahoo-inc.com>.

.schema is a column group's schema file; .btschema is the Zebra table's schema file; .meta is a column group's index file.

The bottom line is that they are all internal files maintained by Zebra, and users should not access or manipulate them directly. Also, the storage format used by Zebra is probably different from that of your data already on HDFS.

In summary, you have to use Zebra to generate Zebra data; Zebra cannot use any other data format.

Yan
