Posted to user@gora.apache.org by Mike Baranczak <mb...@gmail.com> on 2012/10/09 23:09:19 UTC

DataFileAvroStore vs. AvroStore

I'm trying to get Nutch 2 set up. What's the difference between those two data stores? I've read the javadocs, and I'm still confused.

I already asked on the Nutch mailing list, but nobody seems to know.

-MB


Re: DataFileAvroStore vs. AvroStore

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Guys,

On Wed, Oct 10, 2012 at 12:07 AM, Enis Söztutar <en...@apache.org> wrote:

> You should use DataFileAvroStore. Is there any reason you are using a
> file-backed data store for Nutch? I am not sure this is tested enough.

I would back this up. So far I have done very little work
with DataFileAvroStore. Please see my other thread for my
particular problems. We will be working towards a solution ASAP.

Lewis

Re: DataFileAvroStore vs. AvroStore

Posted by Enis Söztutar <en...@apache.org>.
Sorry, it's been some time since I last looked into these. AvroStore uses
files and writes data with DatumWriter directly, whereas DataFileAvroStore
uses the Avro data file format. This format supports blocks, so the files
can be split for MapReduce tasks.
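
To illustrate why a block-based format can be split, here is a toy sketch in plain Java. It does not use Avro or Gora, and it is not the real Avro file layout: the class SplitSketch, the 4-byte SYNC marker and the one-byte length prefix are all hypothetical, chosen only for illustration (real Avro data files use a 16-byte sync marker between blocks). The idea it shows is that a reader handed an arbitrary byte range can scan forward to the next marker and start reading whole blocks from there, which a raw stream of back-to-back records does not allow.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Hypothetical 4-byte marker for this toy format (an assumption, not Avro's layout).
    static final byte[] SYNC = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};

    // Writes each block as: SYNC marker, one length byte, then the payload bytes.
    // Payloads must be shorter than 256 bytes and must not contain the marker.
    static byte[] writeBlocks(List<String> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String block : blocks) {
            byte[] payload = block.getBytes(StandardCharsets.UTF_8);
            out.write(SYNC, 0, SYNC.length);
            out.write(payload.length);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Returns the index of the first marker at or after 'from', or -1 if there is none.
    static int indexOfSync(byte[] data, int from) {
        for (int i = from; i + SYNC.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + SYNC.length), SYNC)) {
                return i;
            }
        }
        return -1;
    }

    // Reads every block whose marker starts inside [start, end).
    // A block that begins before 'start' is left to the previous split.
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> result = new ArrayList<>();
        int pos = start;
        while (pos < end) {
            int sync = indexOfSync(file, pos);
            if (sync < 0 || sync >= end) {
                break;
            }
            int length = file[sync + SYNC.length] & 0xFF;
            int payloadStart = sync + SYNC.length + 1;
            result.add(new String(file, payloadStart, length, StandardCharsets.UTF_8));
            pos = payloadStart + length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] file = writeBlocks(Arrays.asList("a1", "b22", "c333"));
        // Two "tasks" each take half of the 24-byte file; the boundary at byte 12
        // falls in the middle of the second block.
        System.out.println(readSplit(file, 0, 12));   // prints [a1, b22]
        System.out.println(readSplit(file, 12, 24));  // prints [c333]
    }
}
```

Each block is read by exactly one split even though the boundary cuts through a block, which is roughly the property MapReduce input splitting relies on.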

Yes, all FileBackedDataStores work on top of files stored in a Hadoop file
system. Even the local file system should work.

Enis

On Tue, Oct 9, 2012 at 4:31 PM, Mike Baranczak <mb...@gmail.com> wrote:

> On Oct 9, 2012, at 7:07 PM, Enis Söztutar wrote:
>
> > Hi Mike,
> >
> > You should use DataFileAvroStore.
>
> OK, but why?
>
>
> > Is there any reason you are using a file-backed data store for Nutch? I
> > am not sure this is tested enough.
>
> Well, right now I'm not using anything. I'm still trying to figure out
> which data store I want. I picked these because I wanted to keep things
> simple: they don't require setting up any servers besides Hadoop with HDFS
> (they don't, right?)
>
> -MB
>
>

Re: DataFileAvroStore vs. AvroStore

Posted by Mike Baranczak <mb...@gmail.com>.
On Oct 9, 2012, at 7:07 PM, Enis Söztutar wrote:

> Hi Mike, 
> 
> You should use DataFileAvroStore.

OK, but why?


> Is there any reason you are using a file-backed data store for Nutch? I am not sure this is tested enough.

Well, right now I'm not using anything. I'm still trying to figure out which data store I want. I picked these because I wanted to keep things simple: they don't require setting up any servers besides Hadoop with HDFS (they don't, right?)

-MB


Re: DataFileAvroStore vs. AvroStore

Posted by Enis Söztutar <en...@apache.org>.
Hi Mike,

You should use DataFileAvroStore. Is there any reason you are using a
file-backed data store for Nutch? I am not sure this is tested enough.

Cheers,
Enis

On Tue, Oct 9, 2012 at 2:32 PM, Mike Baranczak <mb...@gmail.com> wrote:

> On Oct 9, 2012, at 5:20 PM, Renato Marroquín Mogrovejo wrote:
>
> > Hi Mike,
> >
> > What data stores are you talking about?
>
>
> http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/avro/store/DataFileAvroStore.html
>
> http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/avro/store/AvroStore.html
>
> The Nutch 2.1 config file tells me that I can use any one of several data
> store implementations. I'm asking specifically about these ones. How do
> they work, and what's the difference between them?
>
> -MB

Re: DataFileAvroStore vs. AvroStore

Posted by Mike Baranczak <mb...@gmail.com>.
On Oct 9, 2012, at 5:20 PM, Renato Marroquín Mogrovejo wrote:

> Hi Mike,
> 
> What data stores are you talking about?

http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/avro/store/DataFileAvroStore.html
http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/avro/store/AvroStore.html

The Nutch 2.1 config file tells me that I can use any one of several data store implementations. I'm asking specifically about these ones. How do they work, and what's the difference between them?

-MB

Re: DataFileAvroStore vs. AvroStore

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Mike,

What data stores are you talking about? Nutch uses Gora as a
persistence layer. Gora makes it easier to access NoSQL data
stores. You can decide which data store to use, e.g. HBase, Cassandra,
Accumulo, and others. But all of these data stores are separate
systems from Nutch. Gora just tries to make it easier to persist data.
Is that your question?


Renato M.

2012/10/9 Mike Baranczak <mb...@gmail.com>:
> I'm trying to get Nutch 2 set up. What's the difference between those two data stores? I've read the javadocs, and I'm still confused.
>
> I already asked on the Nutch mailing list, but nobody seems to know.
>
> -MB
>