Posted to user@accumulo.apache.org by Kesten Broughton <kb...@21ct.com> on 2014/02/11 17:10:37 UTC

ingest problems

Hi there,

We have been experimenting with accumulo for about two months now.  Our biggest pain point has been ingest.
Often we will have the ingest process fail 2 or 3 times about 3/4 of the way through an ingest, and then on a final try it works, without any changes.

Once the ingest works, the cluster is usually stable for querying for weeks or months only requiring the occasional start-all.sh if there is a problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest story to be able to commit to accumulo.
Our cluster architecture has been:
3 hdfs datanodes, with the name node, secondary namenode, and accumulo master each colocated with a datanode, and a zookeeper server on each node.
We realize this is not optimal and are transitioning to separate hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the whole cluster and we have to re-init accumulo with an accumulo init destroying all our data.
We have experienced this on at least three different clusters of this description.

The most recent attempt was on a 65GB dataset.  The cluster had been up for over 24 hours.  The ingest test takes 40 mins, and about 5 mins in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had logs filled with zookeeper connection errors.  We were unable to recover the cluster and had to re-init.

I know a vague description of problems is difficult to respond to, and the next time we have an ingest failure, I will bring specifics forward.  But I’m writing to ask:
1.  Are ingest failures a known fail point for accumulo, or are we perhaps unlucky/mis-configured?
2.  Are there any guidelines for capturing ingest failures / determining root causes when errors don’t show up in the logs?
3.  Are there any means of checkpointing a data ingest, so that if a failure were to occur at hour 23.5 we could roll back to hour 23 and continue?  Client code could checkpoint and restart at the last one, but if the underlying accumulo cluster can’t be recovered, that’s of no use.

thanks,

kesten

Re: ingest problems

Posted by Kesten Broughton <kb...@21ct.com>.
Hi Sean,
Thanks for your detailed questions.  We will add them to the automated log gathering bundle we have.
Responses inline.  It’s not a ton to go on, but now we will be ready to capture every relevant detail for our next ingest tests.


Hi Kesten!

Could you tell us:

1) Accumulo version

accumulo-native-1.5.0  Running on CentOS 6.5, using the RPM distribution package from Apache, on jdk 1.7.0_u25

[user@node-hdfs02:accumulo-1.5.0]$ rpm -q accumulo

accumulo-1.5.0-1.noarch

2) HDFS + ZooKeeper versions

Hadoop 1.2.0.1.3.2.0-111 and Zookeeper version: 3.4.5-111--1, built on 08/20/2013 01:42 GMT

[user@node-hdfs02:accumulo-1.5.0]$ echo status | nc 10.x.y.67 2181

Zookeeper version: 3.4.5-111--1, built on 08/20/2013 01:42 GMT

Clients:

 /10.x.y.67:48232[1](queued=0,recved=73830,sent=73830)

 /10.x.y.67:57837[0](queued=0,recved=1,sent=0)

 /10.x.y.67:49991[1](queued=0,recved=41486,sent=41492)

 /10.x.y.66:41154[1](queued=0,recved=245163,sent=245163)


Latency min/avg/max: 0/0/244

Received: 1943601

Sent: 1943610

Connections: 4

Outstanding: 0

Zxid: 0x1500088030

Mode: leader

Node count: 503

[user@node-hdfs02:accumulo-1.5.0]$ hadoop version

Hadoop 1.2.0.1.3.2.0-111

Subversion git://c64-s8/ on branch comanche-branch-1 -r 3e43bec958e627d53f02d2842f6fac24a93110a9

Compiled by jenkins on Mon Aug 19 18:34:32 PDT 2013

From source with checksum cf234891d3fd875413caf539bc5aa5ce

This command was run using /usr/lib/hadoop/hadoop-core-1.2.0.1.3.2.0-111.jar


3) are you using the BatchWriter API, or bulk ingest?

Using the BatchWriter API

4) what does your table design look like?

We are loading a directed graph model representation, with attributed vertices and unattributed edges. Vertex identifiers are used as Row ID values, and the column family distinguishes between attribute and relation entries.

Row ID                                | Family | Qualifier                                                    | Visibility | Timestamp | Value
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c  | A      | attribute1                                                   | <unset>    | <default> | value1
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c  | A      | attribute2                                                   | <unset>    | <default> | value2
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c  | R      | relation_name_outgoing/87fd1ea5-0769-3086-b328-c5272ac7d65c  | <unset>    | <default> | 1
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c  | RI     | relation_name_incoming/4231ea5-c527-3086-d65c-0a6c2ac7b328   | <unset>    | <default> |
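
(For illustration, a minimal sketch of how mutations matching this layout might be built with the BatchWriter API; the table name "graph", the method name, and the sample values are hypothetical, not our actual ingest code:)

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    public class GraphMutationSketch {
        // Writes the attribute and outgoing-edge entries for one vertex.
        static void writeVertex(Connector conn, String vertexId, String neighborId)
                throws Exception {
            BatchWriter writer = conn.createBatchWriter("graph", new BatchWriterConfig());
            try {
                Mutation m = new Mutation(vertexId);
                // Attribute entries: family "A", qualifier = attribute name.
                m.put("A", "attribute1", new Value("value1".getBytes()));
                m.put("A", "attribute2", new Value("value2".getBytes()));
                // Outgoing edge: family "R", qualifier = relation name + "/" + destination UUID.
                m.put("R", "relation_name_outgoing/" + neighborId, new Value("1".getBytes()));
                writer.addMutation(m);
            } finally {
                writer.close();
            }
        }
    }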



5) what does your source data look like?

The dataset is 65Gb and we have about 200 million documents.  The source data is coming from files of JSON documents, with each document fully defining a vertex or edge.

The vertex files are gzipped with a line for each document, with each line containing a UUID and JSON document pair separated by a space.

The edge files are gzipped with a line for each document, with each line containing the source UUID, destination UUID, and label each separated by a space.
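
(As a rough sketch of reading one of the vertex files described above, assuming each line is "<uuid> <json>" split on the first space; the file name is hypothetical:)

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.util.zip.GZIPInputStream;

    public class VertexFileReaderSketch {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(new FileInputStream("vertices-part-000.gz")), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                int split = line.indexOf(' ');
                String uuid = line.substring(0, split);   // vertex UUID
                String json = line.substring(split + 1);  // full JSON document
                // ... turn uuid + json into a Mutation and hand it to a BatchWriter
            }
            in.close();
        }
    }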


6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores

For our datacenter baremetal cluster:

Dell 720s, 256 Gb ram, 7 Tb as 24 x 300Gb disks, 24 hyperthreaded cores.  We have 2 disks in raid 10 for the root partition, /dev/sdw mounted at /var and 8 disks mounted as /data/1/dfs/dn to /data/8

They are connected by 10 Gb ethernet.

For our virtual clusters:

Each node is running as a VMware-deployed VM with the same configuration:

  *   4 CPU cores
  *   16 GB memory
  *   100 GB (ext4 fs, no lvm)
  *   CentOS 6.5

7) could you post your config files (minus any passwords, usernames, machine names, or instance secrets) in a gist or pastebin so that I can see them?

Summary:  We use ambari to deploy the hdfs cluster and usually start from the 2Gb example accumulo-env.sh.  On one occasion we used the 3Gb example and increased the in-memory map size and related settings.

I have attached the ambari snapshots of the modified hdfs java heap size settings.  In this instance we cranked them way up, to 4Gb, but we have returned to 2Gb, as there are indications that with fewer than 2,000 blocks there is no need for such a large heap and it could be detrimental.  The second attachment shows the extra hdfs-site.xml settings being used, such as durable.sync.

Redacted config files gist:

https://gist.github.com/darKoram/8dcc63e212d052c70e29

8) could you describe what the failure mode looks like a bit? Does the monitor come up? Does a table remain offline or with unrecovered tablets?

Typically, the monitor will be up, as long as the accumulo master process didn't die.  The dashboard log will show errors.  We have seen lots of zookeeper connection errors, IOExceptions, ThriftTransport errors, errors referencing the !0 (metadata) table, and others, plus warnings about gc collection.  See attachments for more.

If the cluster fails on ingest, when we bring it back up we may see the names of the tables, but dashes for all other entries in the table.

Morgan

In the initial failure, one of the tablet server processes will disappear from one of the cluster nodes, and the client will fail shortly after with org.apache.accumulo.core.client.TimedOutException talking to the node that failed. It is possible to connect to the cluster with the accumulo shell; however, any access to the target table will cause the shell to hang. Likewise the monitor webpage hangs when attempting to load at this point.

Attempting to restart the failed node individually usually doesn't work at this point, as the tablet server exits when started (start-here.sh). Stopping and restarting the cluster of accumulo processes (stop-all.sh, start-all.sh) will allow us to get the accumulo processes running again on all nodes. However, at this point any access to the table that was being loaded will hang (scan, delete, load). From the monitor there aren't any errors shown, and it indicates the expected number of tablets are online.


From: Sean Busbey <bu...@clouderagovt.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, February 11, 2014 at 10:24 AM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: ingest problems


Hi Kesten!

Could you tell us:

1) Accumulo version

2) HDFS + ZooKeeper versions

3) are you using the BatchWriter API, or bulk ingest?

4) what does your table design look like?

5) what does your source data look like?

6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores.

7) could you post your config files (minus any passwords, usernames, machine names, or instance secrets) in a gist or pastebin so that I can see them?

8) could you describe what the failure mode looks like a bit? Does the monitor come up? Does a table remain offline or with unrecovered tablets?

On Feb 11, 2014 10:11 AM, "Kesten Broughton" <kb...@21ct.com>> wrote:
Hi there,

We have been experimenting with accumulo for about two months now.  Our biggest painpoint has been on ingest.
Often we will have ingest process fail 2 or 3 times 3/4 of the way through an ingest and then on a final try it works, without any changes.

Once the ingest works, the cluster is usually stable for querying for weeks or months only requiring the occasional start-all.sh if there is a problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest story to be able to commit to accumulo.
Our cluster architecture has been:
3 hdfs datanodes overlaid with name node, secondary nn and accumulo master each collocated with a datanode, and a zookeeper server on each.
We realize this is not optimal and are transitioning to separate hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the whole cluster and we have to re-init accumulo with an accumulo init destroying all our data.
We have experienced this on at least three different clusters of this description.

The most recent attempt was on a 65GB dataset.   The cluster had been up for over 24 hours.  The ingest test takes 40 mins and about 5 mins in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had logs filled with zookeeper connection errors.  We were unable to recover the cluster and had to re-init.

I know a vague description of problems is difficult to respond to, and the next time we have an ingest failure, i will bring specifics forward.  But I’m writing to know if
1.  Ingest failures are a known fail point for accumulo, or if we are perhaps unlucky/mis-configured.
2.  Are there any guidelines for capturing ingest failures / determining root causes when errors don’t show up in the logs
3.  Are there any means of checkpointing a data ingest, so that if a failure were to occur at hour 23.5 we could roll back to hour 23 and continue.  Client code could checkpoint and restart at the last one, but if the underlying accumulo cluster can’t be recovered, that’s of no use.

thanks,

kesten

Re: ingest problems

Posted by Sean Busbey <bu...@clouderagovt.com>.
Hi Kesten!

Could you tell us:

1) Accumulo version

2) HDFS + ZooKeeper versions

3) are you using the BatchWriter API, or bulk ingest?

4) what does your table design look like?

5) what does your source data look like?

6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores.

7) could you post your config files (minus any passwords, usernames,
machine names, or instance secrets) in a gist or pastebin so that I can see
them?

8) could you describe what the failure mode looks like a bit? Does the
monitor come up? Does a table remain offline or with unrecovered tablets?
On Feb 11, 2014 10:11 AM, "Kesten Broughton" <kb...@21ct.com> wrote:

> Hi there,
>
> We have been experimenting with accumulo for about two months now.  Our
> biggest painpoint has been on ingest.
> Often we will have ingest process fail 2 or 3 times 3/4 of the way
> through an ingest and then on a final try it works, without any changes.
>
> Once the ingest works, the cluster is usually stable for querying for
> weeks or months only requiring the occasional start-all.sh if there is a
> problem.
>
> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
> story to be able to commit to accumulo.
> Our cluster architecture has been:
> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo master
> each collocated with a datanode, and a zookeeper server on each.
> We realize this is not optimal and are transitioning to separate hardware
> for zookeepers and name/secondary/accumulomaster nodes.
> However, the big concern is that sometimes a failed ingest will bork the
> whole cluster and we have to re-init accumulo with an accumulo init
> destroying all our data.
> We have experienced this on at least three different clusters of this
> description.
>
> The most recent attempt was on a 65GB dataset.   The cluster had been up
> for over 24 hours.  The ingest test takes 40 mins and about 5 mins in, one
> of the datanodes failed.
> There were no error logs on the failed node, and the two other nodes had
> logs filled with zookeeper connection errors.  We were unable to recover
> the cluster and had to re-init.
>
> I know a vague description of problems is difficult to respond to, and the
> next time we have an ingest failure, i will bring specifics forward.  But
> I’m writing to know if
> 1.  Ingest failures are a known fail point for accumulo, or if we are
> perhaps unlucky/mis-configured.
> 2.  Are there any guidelines for capturing ingest failures / determining
> root causes when errors don’t show up in the logs
> 3.  Are there any means of checkpointing a data ingest, so that if a
> failure were to occur at hour 23.5 we could roll back to hour 23 and
> continue.  Client code could checkpoint and restart at the last one, but if
> the underlying accumulo cluster can’t be recovered, that’s of no use.
>
> thanks,
>
> kesten
>

Re: ingest problems

Posted by Kesten Broughton <kb...@21ct.com>.
Hi keith,

My role has been mostly standing up the clusters and triaging failures.
I got the following response from one of our engineers working on the actual ingest:

We are creating a separate Mutation per edge while loading. The edges for a single vertex share the same RowID value, but each edge will have a different qualifier.
I don’t have the original gc logs from the accumulo eval cluster when the error occurred. I checked the backup of the log files, and they got overwritten by newer log files after the 3rd node crashed.

From: Keith Turner <ke...@deenlo.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, February 11, 2014 at 2:08 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: ingest problems

When you add edges are you by chance creating one mutation and adding a lot of edges to it?   This could create a large mutation, which would have to fit in JVM memory on the tserver (which looks like it's 1g).

Accumulo logs messages about what's going on w/ the java GC every few seconds.   Try grepping the tserver logs for GC, what does this look like?


On Tue, Feb 11, 2014 at 2:23 PM, Kesten Broughton <kb...@21ct.com>> wrote:
Hi david,

Responses inline


What is the average load on the servers while the ingest runs?

We are seeing ingest rates (ingest column on accumulo dashboard) of 200-400k.  Load is low, perhaps up to 1 on a 4 core vm.  Less on bare-metal.  Often we see only one tablet server (of two) ingesting.  However, both show they are online.  Sometimes it is just highly skewed.  We are now running pre-split ingests.

How large are the mutations?

How do we determine this?

What are your heap sizes?

Typically we are running with configs based on the example 2Gb accumulo-site.xml.  Our block count is under 2000.   See config bundle for more details.

How much memory do the servers have?

metal hdfs cluster - 256 Gb, 2nd metal cluster - 128 Gb, virtual boxes - 16Gb

Can you move beyond a three node cluster?

We are moving to this now.   ETA 2 days for virtualized stack of 9 nodes, 8 core, 64 Gb, fully separating 3 zookeepers, master nodes and 3 datanodes.  ETA for metal version of this 1-2 weeks.

Are you querying while writing to the same table?

No


From: David Medinets <da...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, February 11, 2014 at 10:55 AM
To: accumulo-user <us...@accumulo.apache.org>
Subject: Re: ingest problems

My cluster ingests data every night. We use a map-reduce program to generate rFiles. Then import those files into Accumulo. No hiccups. No instability. I've also used map-reduce to directly write mutations. Haven't seen any issues there either.

What is the average load on the servers while the ingest runs?
How large are the mutations?
What are your heap sizes?
How much memory do the servers have?
Can you move beyond a three node cluster?
Are you querying while writing to the same table?



On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com>> wrote:
On 2/11/14, 11:10 AM, Kesten Broughton wrote:
Hi there,

We have been experimenting with accumulo for about two months now.  Our
biggest painpoint has been on ingest.
Often we will have ingest process fail 2 or 3 times 3/4 of the way
through an ingest and then on a final try it works, without any changes.

Funny, most times I hear that people consider Accumulo to handle ingest fairly well, but let's see what we can do to help.

We need a bit more information than what you provided here though: what's your "ingest process"? Are you using some other workflow library? Are you running MapReduce? Do you just have a Java class with a main method that uses a BatchWriter?

The fact that it "works sometimes" implies that the problem might be resource related.


Once the ingest works, the cluster is usually stable for querying for
weeks or months only requiring the occasional start-all.sh if there is a
problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest
story to be able to commit to accumulo.

You should be able to run ingest 24/7 with Accumulo without it falling over (I do regularly to stress-test it). The limitation should only be the disk-space you have available.


Our cluster architecture has been:
3 hdfs datanodes overlaid with name node, secondary nn and accumulo
master each collocated with a datanode, and a zookeeper server on each.
We realize this is not optimal and are transitioning to separate
hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the
whole cluster and we have to re-init accumulo with an accumulo init
destroying all our data.
We have experienced this on at least three different clusters of this
description.

Can you be more specific than "bork the whole cluster"? Unless you're hitting a really nasty bug, there shouldn't be any way that a client writing data into Accumulo will destroy an instance.


The most recent attempt was on a 65GB dataset.   The cluster had been up
for over 24 hours.  The ingest test takes 40 mins and about 5 mins
in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had
logs filled with zookeeper connection errors.  We were unable to recover
the cluster and had to re-init.

Check both the log4j logs and the stdout/stderr redirection files for the datanode process. Typically, if you get an OOME, log4j gets torn down before that exception can be printed to the normal log files. "Silent" failures seem indicative of lack of physical resources (over-subscribed the node) on the box or insufficient resources provided to the processes (-Xmx was too small for the process).


I know a vague description of problems is difficult to respond to, and
the next time we have an ingest failure, i will bring specifics forward.
  But I’m writing to know if
1.  Ingest failures are a known fail point for accumulo, or if we are
perhaps unlucky/mis-configured.

No -- something else is going on here.


2.  Are there any guidelines for capturing ingest failures / determining
root causes when errors don’t show up in the logs

For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper versions, OS and Java versions. Capturing log files and stdout/stderr files are important; beware that if you restart the Accumulo process on that node, it will overwrite the stdout/stderr files, so make sure to copy them out of the way.


3.  Are there any means of checkpointing a data ingest, so that if a
failure were to occur at hour 23.5 we could roll back to hour 23 and
continue.  Client code could checkpoint and restart at the last one, but
if the underlying accumulo cluster can’t be recovered, that’s of no use.

You can do anything you want in your client ingest code :)

Assuming that you're using a BatchWriter, if you manually call flush() and it returns without Exception, you can assume that all data up to that point written with that BatchWriter instance is "ingested". This can easily be extrapolated: if you're ingesting CSV files, ensure that a flush() happens every 1000 lines and record that somewhere so that your ingest process can advance itself to the appropriate place in the CSV file and proceed from where it left off.

thanks,

kesten



Re: ingest problems

Posted by Kesten Broughton <kb...@21ct.com>.
We are now pre-splitting data for ingest.  Still seeing the error that Morgan mentioned in the Q8 response to Sean Busbey.
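
(For reference, a minimal sketch of pre-splitting from the Java API, under the assumption that row IDs are UUIDs, so split points can be spread across the leading hex digit; the table name "graph" is hypothetical:)

    import java.util.TreeSet;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class PreSplitSketch {
        static void addSplits(Connector conn) throws Exception {
            TreeSet<Text> splits = new TreeSet<Text>();
            // One split point per leading hex digit gives 16 roughly even tablets.
            for (char c : "123456789abcdef".toCharArray()) {
                splits.add(new Text(String.valueOf(c)));
            }
            conn.tableOperations().addSplits("graph", splits);
        }
    }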


From: David Medinets <da...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, February 11, 2014 at 2:20 PM
To: accumulo-user <us...@accumulo.apache.org>
Subject: Re: ingest problems

In your example, the row ID is "000c35b2-ee6c-339e-9e6a-65a9bccbfa2c". If you are using one UUID
for all of the ingested data, then you'd be creating one large row and just one tablet would
be ingesting the information.

If you are using more than one UUID in the row fields, are you pre-splitting the accumulo table before the ingest process?



On Tue, Feb 11, 2014 at 3:08 PM, Keith Turner <ke...@deenlo.com>> wrote:
When you add edges are you by chance creating one mutation and adding a lot of edges to it?   This could create a large mutation, which would have to fit in JVM memory on the tserver (which looks like its 1g).

Accumulo logs messages about whats going on w/ the java GC every few seconds.   Try grepping the tserver logs for GC, what does this look like?


On Tue, Feb 11, 2014 at 2:23 PM, Kesten Broughton <kb...@21ct.com>> wrote:
Hi david,

Responses inline


What is the average load on the servers while the ingest runs?

We are seeing ingest rates (ingest column on accumulo dashboard) of 200-400k.  Load is low, perhaps up to 1 on a 4 core vm.  Less on bare-metal.  Often we see only one tablet server (of two) ingesting.  However, both show they are online.  Sometimes it is just highly skewed.  We are now running pre-split ingests.

How large are the mutations?

How do we determine this?

What are your heap sizes?

Typically we are running with configs based on the example 2Gb accumulo-site.xml.  Our block count is under 2000.   See config bundle for more details.

How much memory do the servers have?

metal hdfs cluster - 256 Gb, 2nd metal cluster - 128 Gb, virtual boxes - 16Gb

Can you move beyond a three node cluster?

We are moving to this now.   ETA 2 days for virtualized stack of 9 nodes, 8 core, 64 Gb, fully separating 3 zookeepers, master nodes and 3 datanodes.  ETA for metal version of this 1-2 weeks.

Are you querying while writing to the same table?

No


From: David Medinets <da...@gmail.com>>
Reply-To: "user@accumulo.apache.org<ma...@accumulo.apache.org>" <us...@accumulo.apache.org>>
Date: Tuesday, February 11, 2014 at 10:55 AM
To: accumulo-user <us...@accumulo.apache.org>>
Subject: Re: ingest problems

My cluster ingests data every night. We use a map-reduce program to generate rFiles. Then import those files into Accumulo. No hiccups. No instability. I've also used map-reduce to directly write mutations. Haven't seen any issues there either.

What is the average load on the servers while the ingest runs?
How large are the mutations?
What are your heap sizes?
How much memory do the servers have?
Can you move beyond a three node cluster?
Are you querying while writing to the same table?



On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com>> wrote:
On 2/11/14, 11:10 AM, Kesten Broughton wrote:
Hi there,

We have been experimenting with accumulo for about two months now.  Our
biggest painpoint has been on ingest.
Often we will have ingest process fail 2 or 3 times 3/4 of the way
through an ingest and then on a final try it works, without any changes.

Funny, most times I hear that people consider Accumulo to handles ingest fairly well, but let's see what we can do to help.

We need a bit more information than what you provided here though: what's your "ingest process"? Are you using some other workflow library? Are you running MapReduce? Do you just have a Java class with a main method that uses a BatchWriter?

The fact that it "works sometimes" implies that the problem might be resource related.


Once the ingest works, the cluster is usually stable for querying for
weeks or months only requiring the occasional start-all.sh if there is a
problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest
story to be able to commit to accumulo.

You should be able to run ingest 24/7 with Accumulo without it falling over (I do regularly to stress-test it). The limitation should only be the disk-space you have available.


Our cluster architecture has been:
3 hdfs datanodes overlaid with name node, secondary nn and accumulo
master each collocated with a datanode, and a zookeeper server on each.
We realize this is not optimal and are transitioning to separate
hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the
whole cluster and we have to re-init accumulo with an accumulo init
destroying all our data.
We have experienced this on at least three different clusters of this
description.

Can you be more specific than "bork the whole cluster"? Unless you're hitting a really nasty bug, there shouldn't be any way that a client writing data into Accumulo will destroy an instance.


The most recent attempt was on a 65GB dataset.   The cluster had been up
for over 24 hours.  The ingest test takes 40 mins and about 5 mins
in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had
logs filled with zookeeper connection errors.  We were unable to recover
the cluster and had to re-init.

Check both the log4j logs and the stdout/stderr redirection files for the datanode process. Typically, if you get an OOME, log4j gets torn down before that exception can be printed to the normal log files. "Silent" failures seem indicative of lack of physical resources (over-subscribed the node) on the box or insufficient resources provided to the processes (-Xmx was too small for the process).


I know a vague description of problems is difficult to respond to, and
the next time we have an ingest failure, i will bring specifics forward.
  But I’m writing to know if
1.  Ingest failures are a known fail point for accumulo, or if we are
perhaps unlucky/mis-configured.

No -- something else is going on here.


2.  Are there any guidelines for capturing ingest failures / determining
root causes when errors don’t show up in the logs

For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper versions, OS and Java versions. Capturing log files and stdout/stderr files are important; beware that if you restart the Accumulo process on that node, it will overwrite the stdout/stderr files, so make sure to copy them out of the way.


3.  Are there any means of checkpointing a data ingest, so that if a
failure were to occur at hour 23.5 we could roll back to hour 23 and
continue.  Client code could checkpoint and restart at the last one, but
if the underlying accumulo cluster can’t be recovered, that’s of no use.

You can do anything you want in your client ingest code :)

Assuming that you're using a BatchWriter, if you manually call flush() and it returns without Exception, you can assume that all data up to that point written with that BatchWriter instance is "ingested". This can easily extrapolated: if you're ingesting CSV files, ensure that a flush() happens every 1000lines and denote that somewhere that your ingest process can advance itself to the appropriate place in the CSV file and proceed from where it left off.

thanks,

kesten




Re: ingest problems

Posted by David Medinets <da...@gmail.com>.
In your example, the row ID is "000c35b2-ee6c-339e-9e6a-65a9bccbfa2c". If
you are using one UUID
for all of the ingested data, then you'd be creating one large row and just
one tablet would
be ingesting the information.

If you are using more than one UUID in the row fields, are you
pre-splitting the accumulo table before the ingest process?



On Tue, Feb 11, 2014 at 3:08 PM, Keith Turner <ke...@deenlo.com> wrote:

> When you add edges are you by chance creating one mutation and adding a
> lot of edges to it?   This could create a large mutation, which would have
> to fit in JVM memory on the tserver (which looks like its 1g).
>
> Accumulo logs messages about whats going on w/ the java GC every few
> seconds.   Try grepping the tserver logs for GC, what does this look like?
>
>
> On Tue, Feb 11, 2014 at 2:23 PM, Kesten Broughton <kb...@21ct.com>wrote:
>
>> Hi david,
>>
>> Responses inline
>>
>> What is the average load on the servers while the ingest runs?
>>
>> We are seeing ingest rates (ingest column on accumulo dashboard) of
>> 200-400k.  Load is low, perhaps up to 1 on a 4 core vm.  Less on
>> bare-metal.  Often we see only one tablet server (of two) ingesting.
>>  However, both show they are online.  Sometimes it is just highly skewed.
>>  We are now running pre-split ingests.
>>
>> How large are the mutations?
>>
>> How do we determine this?
>>
>> What are your heap sizes?
>>
>> Typically we are running with configs based on the example 2Gb
>> accumulo-site.xml.  Our block count is under 2000.   See config bundle for
>> more details.
>>
>> How much memory do the servers have?
>>
>> metal hdfs cluster - 256 Gb, 2nd metal cluster - 128 Gb, virtual boxes -
>> 16Gb
>>
>> Can you move beyond a three node cluster?
>>
>> We are moving to this now.   ETA 2 days for virtualized stack of 9 nodes,
>> 8 core, 64 Gb, fully separating 3 zookeepers, master nodes and 3 datanodes.
>>  ETA for metal version of this 1-2 weeks.
>>
>> Are you querying while writing to the same table?
>>
>> No
>>
>>
>> From: David Medinets <da...@gmail.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, February 11, 2014 at 10:55 AM
>> To: accumulo-user <us...@accumulo.apache.org>
>> Subject: Re: ingest problems
>>
>> My cluster ingests data every night. We use a map-reduce program to
>> generate rFiles. Then import those files into Accumulo. No hiccups. No
>> instability. I've also used map-reduce to directly write mutations. Haven't
>> seen any issues there either.
>>
>> What is the average load on the servers while the ingest runs?
>> How large are the mutations?
>> What are your heap sizes?
>> How much memory do the servers have?
>> Can you move beyond a three node cluster?
>> Are you querying while writing to the same table?
>>
>>
>>
>> On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com>wrote:
>>
>>> On 2/11/14, 11:10 AM, Kesten Broughton wrote:
>>>
>>>> Hi there,
>>>>
>>>> We have been experimenting with accumulo for about two months now.  Our
>>>> biggest painpoint has been on ingest.
>>>> Often we will have ingest process fail 2 or 3 times 3/4 of the way
>>>> through an ingest and then on a final try it works, without any changes.
>>>>
>>>
>>> Funny, most times I hear that people consider Accumulo to handles ingest
>>> fairly well, but let's see what we can do to help.
>>>
>>> We need a bit more information than what you provided here though:
>>> what's your "ingest process"? Are you using some other workflow library?
>>> Are you running MapReduce? Do you just have a Java class with a main method
>>> that uses a BatchWriter?
>>>
>>> The fact that it "works sometimes" implies that the problem might be
>>> resource related.
>>>
>>>
>>> Once the ingest works, the cluster is usually stable for querying for
>>>> weeks or months only requiring the occasional start-all.sh if there is a
>>>> problem.
>>>>
>>>> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
>>>> story to be able to commit to accumulo.
>>>>
>>>
>>> You should be able to run ingest 24/7 with Accumulo without it falling
>>> over (I do regularly to stress-test it). The limitation should only be the
>>> disk-space you have available.
>>>
>>>
>>> Our cluster architecture has been:
>>>> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo
>>>> master each collocated with a datanode, and a zookeeper server on each.
>>>> We realize this is not optimal and are transitioning to separate
>>>> hardware for zookeepers and name/secondary/accumulomaster nodes.
>>>> However, the big concern is that sometimes a failed ingest will bork the
>>>> whole cluster and we have to re-init accumulo with an accumulo init
>>>> destroying all our data.
>>>> We have experienced this on at least three different clusters of this
>>>> description.
>>>>
>>>
>>> Can you be more specific than "bork the whole cluster"? Unless you're
>>> hitting a really nasty bug, there shouldn't be any way that a client
>>> writing data into Accumulo will destroy an instance.
>>>
>>>
>>> The most recent attempt was on a 65GB dataset.   The cluster had been up
>>>> for over 24 hours.  The ingest test takes 40 mins and about 5 mins
>>>> in, one of the datanodes failed.
>>>> There were no error logs on the failed node, and the two other nodes had
>>>> logs filled with zookeeper connection errors.  We were unable to recover
>>>> the cluster and had to re-init.
>>>>
>>>
>>> Check both the log4j logs and the stdout/stderr redirection files for
>>> the datanode process. Typically, if you get an OOME, log4j gets torn down
>>> before that exception can be printed to the normal log files. "Silent"
>>> failures seem indicative of lack of physical resources (over-subscribed the
>>> node) on the box or insufficient resources provided to the processes (-Xmx
>>> was too small for the process).
>>>
>>>
>>> I know a vague description of problems is difficult to respond to, and
>>>> the next time we have an ingest failure, i will bring specifics forward.
>>>>   But I'm writing to know if
>>>> 1.  Ingest failures are a known fail point for accumulo, or if we are
>>>> perhaps unlucky/mis-configured.
>>>>
>>>
>>> No -- something else is going on here.
>>>
>>>
>>> 2.  Are there any guidelines for capturing ingest failures / determining
>>>> root causes when errors don't show up in the logs
>>>>
>>>
>>> For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper
>>> versions, OS and Java versions. Capturing log files and stdout/stderr files
>>> are important; beware that if you restart the Accumulo process on that
>>> node, it will overwrite the stdout/stderr files, so make sure to copy them
>>> out of the way.
>>>
>>>
>>> 3.  Are there any means of checkpointing a data ingest, so that if a
>>>> failure were to occur at hour 23.5 we could roll back to hour 23 and
>>>> continue.  Client code could checkpoint and restart at the last one, but
>>>> if the underlying accumulo cluster can't be recovered, that's of no use.
>>>>
>>>
>>> You can do anything you want in your client ingest code :)
>>>
>>> Assuming that you're using a BatchWriter, if you manually call flush()
>>> and it returns without Exception, you can assume that all data up to that
>>> point written with that BatchWriter instance is "ingested". This can easily
>>> extrapolated: if you're ingesting CSV files, ensure that a flush() happens
>>> every 1000lines and denote that somewhere that your ingest process can
>>> advance itself to the appropriate place in the CSV file and proceed from
>>> where it left off.
>>>
>>> thanks,
>>>>
>>>> kesten
>>>>
>>>
>>
>

Re: ingest problems

Posted by Keith Turner <ke...@deenlo.com>.
When you add edges are you by chance creating one mutation and adding a lot
of edges to it?   This could create a large mutation, which would have to
fit in JVM memory on the tserver (which looks like it's 1g).

Accumulo logs messages about what's going on w/ the java GC every few
seconds.   Try grepping the tserver logs for GC, what does this look like?
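
(For reference, one way the mutation size could be gauged from the ingest client, as a sketch: Mutation.numBytes() reports the serialized size and size() the number of entries; the warning threshold below is an arbitrary example:)

    import org.apache.accumulo.core.data.Mutation;

    public final class MutationSizeCheck {
        private static final long WARN_BYTES = 10L * 1024 * 1024; // hypothetical threshold

        // Log mutations that are large enough to strain tserver memory.
        static void warnIfLarge(Mutation m) {
            if (m.numBytes() > WARN_BYTES) {
                System.err.println("large mutation: " + m.numBytes() + " bytes, "
                    + m.size() + " entries");
            }
        }
    }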


On Tue, Feb 11, 2014 at 2:23 PM, Kesten Broughton <kb...@21ct.com>wrote:

> Hi david,
>
> Responses inline
>
> What is the average load on the servers while the ingest runs?
>
> We are seeing ingest rates (ingest column on accumulo dashboard) of
> 200-400k.  Load is low, perhaps up to 1 on a 4 core vm.  Less on
> bare-metal.  Often we see only one tablet server (of two) ingesting.
>  However, both show they are online.  Sometimes it is just highly skewed.
>  We are now running pre-split ingests.
>
> How large are the mutations?
>
> How do we determine this?
>
> What are your heap sizes?
>
> Typically we are running with configs based on the example 2Gb
> accumulo-site.xml.  Our block count is under 2000.   See config bundle for
> more details.
>
> How much memory do the servers have?
>
> metal hdfs cluster - 256 Gb, 2nd metal cluster - 128 Gb, virtual boxes -
> 16Gb
>
> Can you move beyond a three node cluster?
>
> We are moving to this now.   ETA 2 days for virtualized stack of 9 nodes,
> 8 core, 64 Gb, fully separating 3 zookeepers, master nodes and 3 datanodes.
>  ETA for metal version of this 1-2 weeks.
>
> Are you querying while writing to the same table?
>
> No
>
>
> From: David Medinets <da...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, February 11, 2014 at 10:55 AM
> To: accumulo-user <us...@accumulo.apache.org>
> Subject: Re: ingest problems
>
> My cluster ingests data every night. We use a map-reduce program to
> generate rFiles. Then import those files into Accumulo. No hiccups. No
> instability. I've also used map-reduce to directly write mutations. Haven't
> seen any issues there either.
>
> What is the average load on the servers while the ingest runs?
> How large are the mutations?
> What are your heap sizes?
> How much memory do the servers have?
> Can you move beyond a three node cluster?
> Are you querying while writing to the same table?
>
>
>
> On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com> wrote:
>
>> On 2/11/14, 11:10 AM, Kesten Broughton wrote:
>>
>>> Hi there,
>>>
>>> We have been experimenting with accumulo for about two months now.  Our
>>> biggest painpoint has been on ingest.
>>> Often we will have ingest process fail 2 or 3 times 3/4 of the way
>>> through an ingest and then on a final try it works, without any changes.
>>>
>>
>> Funny, most times I hear that people consider Accumulo to handles ingest
>> fairly well, but let's see what we can do to help.
>>
>> We need a bit more information than what you provided here though: what's
>> your "ingest process"? Are you using some other workflow library? Are you
>> running MapReduce? Do you just have a Java class with a main method that
>> uses a BatchWriter?
>>
>> The fact that it "works sometimes" implies that the problem might be
>> resource related.
>>
>>
>> Once the ingest works, the cluster is usually stable for querying for
>>> weeks or months only requiring the occasional start-all.sh if there is a
>>> problem.
>>>
>>> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
>>> story to be able to commit to accumulo.
>>>
>>
>> You should be able to run ingest 24/7 with Accumulo without it falling
>> over (I do regularly to stress-test it). The limitation should only be the
>> disk-space you have available.
>>
>>
>> Our cluster architecture has been:
>>> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo
>>> master each collocated with a datanode, and a zookeeper server on each.
>>> We realize this is not optimal and are transitioning to separate
>>> hardware for zookeepers and name/secondary/accumulomaster nodes.
>>> However, the big concern is that sometimes a failed ingest will bork the
>>> whole cluster and we have to re-init accumulo with an accumulo init
>>> destroying all our data.
>>> We have experienced this on at least three different clusters of this
>>> description.
>>>
>>
>> Can you be more specific than "bork the whole cluster"? Unless you're
>> hitting a really nasty bug, there shouldn't be any way that a client
>> writing data into Accumulo will destroy an instance.
>>
>>
>> The most recent attempt was on a 65GB dataset.   The cluster had been up
>>> for over 24 hours.  The ingest test takes 40 mins and about 5 mins
>>> in, one of the datanodes failed.
>>> There were no error logs on the failed node, and the two other nodes had
>>> logs filled with zookeeper connection errors.  We were unable to recover
>>> the cluster and had to re-init.
>>>
>>
>> Check both the log4j logs and the stdout/stderr redirection files for the
>> datanode process. Typically, if you get an OOME, log4j gets torn down
>> before that exception can be printed to the normal log files. "Silent"
>> failures seem indicative of lack of physical resources (over-subscribed the
>> node) on the box or insufficient resources provided to the processes (-Xmx
>> was too small for the process).
>>
>>
>> I know a vague description of problems is difficult to respond to, and
>>> the next time we have an ingest failure, i will bring specifics forward.
>>>   But I'm writing to know if
>>> 1.  Ingest failures are a known fail point for accumulo, or if we are
>>> perhaps unlucky/mis-configured.
>>>
>>
>> No -- something else is going on here.
>>
>>
>> 2.  Are there any guidelines for capturing ingest failures / determining
>>> root causes when errors don't show up in the logs
>>>
>>
>> For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper
>> versions, OS and Java versions. Capturing log files and stdout/stderr files
>> are important; beware that if you restart the Accumulo process on that
>> node, it will overwrite the stdout/stderr files, so make sure to copy them
>> out of the way.
>>
>>
>> 3.  Are there any means of checkpointing a data ingest, so that if a
>>> failure were to occur at hour 23.5 we could roll back to hour 23 and
>>> continue.  Client code could checkpoint and restart at the last one, but
>>> if the underlying accumulo cluster can't be recovered, that's of no use.
>>>
>>
>> You can do anything you want in your client ingest code :)
>>
>> Assuming that you're using a BatchWriter, if you manually call flush()
>> and it returns without Exception, you can assume that all data up to that
>> point written with that BatchWriter instance is "ingested". This can easily
>> extrapolated: if you're ingesting CSV files, ensure that a flush() happens
>> every 1000lines and denote that somewhere that your ingest process can
>> advance itself to the appropriate place in the CSV file and proceed from
>> where it left off.
>>
>> thanks,
>>>
>>> kesten
>>>
>>
>

Re: ingest problems

Posted by Kesten Broughton <kb...@21ct.com>.
Hi david,

Responses inline


What is the average load on the servers while the ingest runs?

We are seeing ingest rates (ingest column on accumulo dashboard) of 200-400k.  Load is low, perhaps up to 1 on a 4 core vm.  Less on bare-metal.  Often we see only one tablet server (of two) ingesting.  However, both show they are online.  Sometimes it is just highly skewed.  We are now running pre-split ingests.

How large are the mutations?

How do we determine this?

What are your heap sizes?

Typically we are running with configs based on the example 2Gb accumulo-site.xml.  Our block count is under 2000.   See config bundle for more details.

How much memory do the servers have?

metal hdfs cluster - 256 Gb, 2nd metal cluster - 128 Gb, virtual boxes - 16Gb

Can you move beyond a three node cluster?

We are moving to this now.   ETA 2 days for virtualized stack of 9 nodes, 8 core, 64 Gb, fully separating 3 zookeepers, master nodes and 3 datanodes.  ETA for metal version of this 1-2 weeks.

Are you querying while writing to the same table?

No

[attachment: acc-tframed-transport.png]

From: David Medinets <da...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, February 11, 2014 at 10:55 AM
To: accumulo-user <us...@accumulo.apache.org>
Subject: Re: ingest problems

My cluster ingests data every night. We use a map-reduce program to generate rFiles. Then import those files into Accumulo. No hiccups. No instability. I've also used map-reduce to directly write mutations. Haven't seen any issues there either.

What is the average load on the servers while the ingest runs?
How large are the mutations?
What are your heap sizes?
How much memory do the servers have?
Can you move beyond a three node cluster?
Are you querying while writing to the same table?



On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com>> wrote:
On 2/11/14, 11:10 AM, Kesten Broughton wrote:
Hi there,

We have been experimenting with accumulo for about two months now.  Our
biggest painpoint has been on ingest.
Often we will have ingest process fail 2 or 3 times 3/4 of the way
through an ingest and then on a final try it works, without any changes.

Funny, most times I hear that people consider Accumulo to handles ingest fairly well, but let's see what we can do to help.

We need a bit more information than what you provided here though: what's your "ingest process"? Are you using some other workflow library? Are you running MapReduce? Do you just have a Java class with a main method that uses a BatchWriter?

The fact that it "works sometimes" implies that the problem might be resource related.


Once the ingest works, the cluster is usually stable for querying for
weeks or months only requiring the occasional start-all.sh if there is a
problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest
story to be able to commit to accumulo.

You should be able to run ingest 24/7 with Accumulo without it falling over (I do regularly to stress-test it). The limitation should only be the disk-space you have available.


Our cluster architecture has been:
3 hdfs datanodes overlaid with name node, secondary nn and accumulo
master each collocated with a datanode, and a zookeeper server on each.
We realize this is not optimal and are transitioning to separate
hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the
whole cluster and we have to re-init accumulo with an accumulo init
destroying all our data.
We have experienced this on at least three different clusters of this
description.

Can you be more specific than "bork the whole cluster"? Unless you're hitting a really nasty bug, there shouldn't be any way that a client writing data into Accumulo will destroy an instance.


The most recent attempt was on a 65GB dataset.   The cluster had been up
for over 24 hours.  The ingest test takes 40 mins and about 5 mins
in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had
logs filled with zookeeper connection errors.  We were unable to recover
the cluster and had to re-init.

Check both the log4j logs and the stdout/stderr redirection files for the datanode process. Typically, if you get an OOME, log4j gets torn down before that exception can be printed to the normal log files. "Silent" failures seem indicative of lack of physical resources (over-subscribed the node) on the box or insufficient resources provided to the processes (-Xmx was too small for the process).


I know a vague description of problems is difficult to respond to, and
the next time we have an ingest failure, i will bring specifics forward.
  But I’m writing to know if
1.  Ingest failures are a known fail point for accumulo, or if we are
perhaps unlucky/mis-configured.

No -- something else is going on here.


2.  Are there any guidelines for capturing ingest failures / determining
root causes when errors don’t show up in the logs

For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper versions, OS and Java versions. Capturing log files and stdout/stderr files are important; beware that if you restart the Accumulo process on that node, it will overwrite the stdout/stderr files, so make sure to copy them out of the way.


3.  Are there any means of checkpointing a data ingest, so that if a
failure were to occur at hour 23.5 we could roll back to hour 23 and
continue.  Client code could checkpoint and restart at the last one, but
if the underlying accumulo cluster can’t be recovered, that’s of no use.

You can do anything you want in your client ingest code :)

Assuming that you're using a BatchWriter, if you manually call flush() and it returns without Exception, you can assume that all data up to that point written with that BatchWriter instance is "ingested". This can easily extrapolated: if you're ingesting CSV files, ensure that a flush() happens every 1000lines and denote that somewhere that your ingest process can advance itself to the appropriate place in the CSV file and proceed from where it left off.

thanks,

kesten


Re: ingest problems

Posted by David Medinets <da...@gmail.com>.
My cluster ingests data every night. We use a map-reduce program to
generate RFiles, then import those files into Accumulo. No hiccups. No
instability. I've also used map-reduce to directly write mutations. Haven't
seen any issues there either.

What is the average load on the servers while the ingest runs?
How large are the mutations?
What are your heap sizes?
How much memory do the servers have?
Can you move beyond a three node cluster?
Are you querying while writing to the same table?
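
(For context, a minimal sketch of the bulk-import step described above, assuming a MapReduce job already wrote RFiles under a hypothetical /tmp/bulk/files directory and that an empty /tmp/bulk/failures directory exists; table name is hypothetical:)

    import org.apache.accumulo.core.client.Connector;

    public class BulkImportSketch {
        static void importRFiles(Connector conn) throws Exception {
            // Moves the RFiles into the table; files that cannot be imported
            // are placed into the failures directory for inspection.
            conn.tableOperations().importDirectory(
                "graph", "/tmp/bulk/files", "/tmp/bulk/failures", false);
        }
    }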



On Tue, Feb 11, 2014 at 11:28 AM, Josh Elser <jo...@gmail.com> wrote:

> On 2/11/14, 11:10 AM, Kesten Broughton wrote:
>
>> Hi there,
>>
>> We have been experimenting with accumulo for about two months now.  Our
>> biggest painpoint has been on ingest.
>> Often we will have ingest process fail 2 or 3 times 3/4 of the way
>> through an ingest and then on a final try it works, without any changes.
>>
>
> Funny, most times I hear that people consider Accumulo to handles ingest
> fairly well, but let's see what we can do to help.
>
> We need a bit more information than what you provided here though: what's
> your "ingest process"? Are you using some other workflow library? Are you
> running MapReduce? Do you just have a Java class with a main method that
> uses a BatchWriter?
>
> The fact that it "works sometimes" implies that the problem might be
> resource related.
>
>
>  Once the ingest works, the cluster is usually stable for querying for
>> weeks or months only requiring the occasional start-all.sh if there is a
>> problem.
>>
>> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
>> story to be able to commit to accumulo.
>>
>
> You should be able to run ingest 24/7 with Accumulo without it falling
> over (I do regularly to stress-test it). The limitation should only be the
> disk-space you have available.
>
>
>  Our cluster architecture has been:
>> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo
>> master each collocated with a datanode, and a zookeeper server on each.
>> We realize this is not optimal and are transitioning to separate
>> hardware for zookeepers and name/secondary/accumulomaster nodes.
>> However, the big concern is that sometimes a failed ingest will bork the
>> whole cluster and we have to re-init accumulo with an accumulo init
>> destroying all our data.
>> We have experienced this on at least three different clusters of this
>> description.
>>
>
> Can you be more specific than "bork the whole cluster"? Unless you're
> hitting a really nasty bug, there shouldn't be any way that a client
> writing data into Accumulo will destroy an instance.
>
>
>  The most recent attempt was on a 65GB dataset.   The cluster had been up
>> for over 24 hours.  The ingest test takes 40 mins and about 5 mins
>> in, one of the datanodes failed.
>> There were no error logs on the failed node, and the two other nodes had
>> logs filled with zookeeper connection errors.  We were unable to recover
>> the cluster and had to re-init.
>>
>
> Check both the log4j logs and the stdout/stderr redirection files for the
> datanode process. Typically, if you get an OOME, log4j gets torn down
> before that exception can be printed to the normal log files. "Silent"
> failures seem indicative of lack of physical resources (over-subscribed the
> node) on the box or insufficient resources provided to the processes (-Xmx
> was too small for the process).
>
>
>  I know a vague description of problems is difficult to respond to, and
>> the next time we have an ingest failure, i will bring specifics forward.
>>   But I'm writing to know if
>> 1.  Ingest failures are a known fail point for accumulo, or if we are
>> perhaps unlucky/mis-configured.
>>
>
> No -- something else is going on here.
>
>
>  2.  Are there any guidelines for capturing ingest failures / determining
>> root causes when errors don't show up in the logs
>>
>
> For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper
> versions, OS and Java versions. Capturing log files and stdout/stderr files
> are important; beware that if you restart the Accumulo process on that
> node, it will overwrite the stdout/stderr files, so make sure to copy them
> out of the way.
>
>
>  3.  Are there any means of checkpointing a data ingest, so that if a
>> failure were to occur at hour 23.5 we could roll back to hour 23 and
>> continue.  Client code could checkpoint and restart at the last one, but
>> if the underlying accumulo cluster can't be recovered, that's of no use.
>>
>
> You can do anything you want in your client ingest code :)
>
> Assuming that you're using a BatchWriter, if you manually call flush() and
> it returns without Exception, you can assume that all data up to that point
> written with that BatchWriter instance is "ingested". This can easily
> extrapolated: if you're ingesting CSV files, ensure that a flush() happens
> every 1000lines and denote that somewhere that your ingest process can
> advance itself to the appropriate place in the CSV file and proceed from
> where it left off.
>
>  thanks,
>>
>> kesten
>>
>

Re: ingest problems

Posted by Josh Elser <jo...@gmail.com>.
On 2/11/14, 11:10 AM, Kesten Broughton wrote:
> Hi there,
>
> We have been experimenting with accumulo for about two months now.  Our
> biggest painpoint has been on ingest.
> Often we will have ingest process fail 2 or 3 times 3/4 of the way
> through an ingest and then on a final try it works, without any changes.

Funny, most times I hear that people consider Accumulo to handle ingest
fairly well, but let's see what we can do to help.

We need a bit more information than what you provided here though: 
what's your "ingest process"? Are you using some other workflow library? 
Are you running MapReduce? Do you just have a Java class with a main 
method that uses a BatchWriter?

The fact that it "works sometimes" implies that the problem might be 
resource related.

> Once the ingest works, the cluster is usually stable for querying for
> weeks or months only requiring the occasional start-all.sh if there is a
> problem.
>
> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
> story to be able to commit to accumulo.

You should be able to run ingest 24/7 with Accumulo without it falling 
over (I do regularly to stress-test it). The limitation should only be 
the disk-space you have available.

> Our cluster architecture has been:
> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo
> master each collocated with a datanode, and a zookeeper server on each.
> We realize this is not optimal and are transitioning to separate
> hardware for zookeepers and name/secondary/accumulomaster nodes.
> However, the big concern is that sometimes a failed ingest will bork the
> whole cluster and we have to re-init accumulo with an accumulo init
> destroying all our data.
> We have experienced this on at least three different clusters of this
> description.

Can you be more specific than "bork the whole cluster"? Unless you're 
hitting a really nasty bug, there shouldn't be any way that a client 
writing data into Accumulo will destroy an instance.

> The most recent attempt was on a 65GB dataset.   The cluster had been up
> for over 24 hours.  The ingest test takes 40 mins and about 5 mins
> in, one of the datanodes failed.
> There were no error logs on the failed node, and the two other nodes had
> logs filled with zookeeper connection errors.  We were unable to recover
> the cluster and had to re-init.

Check both the log4j logs and the stdout/stderr redirection files for 
the datanode process. Typically, if you get an OOME, log4j gets torn 
down before that exception can be printed to the normal log files. 
"Silent" failures seem indicative of lack of physical resources 
(over-subscribed the node) on the box or insufficient resources provided 
to the processes (-Xmx was too small for the process).

> I know a vague description of problems is difficult to respond to, and
> the next time we have an ingest failure, i will bring specifics forward.
>   But I’m writing to know if
> 1.  Ingest failures are a known fail point for accumulo, or if we are
> perhaps unlucky/mis-configured.

No -- something else is going on here.

> 2.  Are there any guidelines for capturing ingest failures / determining
> root causes when errors don’t show up in the logs

For any help request, be sure to gather Accumulo, Hadoop and ZooKeeper 
versions, OS and Java versions. Capturing log files and stdout/stderr 
files are important; beware that if you restart the Accumulo process on 
that node, it will overwrite the stdout/stderr files, so make sure to 
copy them out of the way.

> 3.  Are there any means of checkpointing a data ingest, so that if a
> failure were to occur at hour 23.5 we could roll back to hour 23 and
> continue.  Client code could checkpoint and restart at the last one, but
> if the underlying accumulo cluster can’t be recovered, that’s of no use.

You can do anything you want in your client ingest code :)

Assuming that you're using a BatchWriter, if you manually call flush() 
and it returns without Exception, you can assume that all data up to 
that point written with that BatchWriter instance is "ingested". This 
can easily be extrapolated: if you're ingesting CSV files, ensure that a 
flush() happens every 1,000 lines and record that position somewhere so 
your ingest process can advance itself to the appropriate place in the 
CSV file and proceed from where it left off.
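
For example, here is a minimal sketch of that pattern against the 1.5 client
API. The instance name, zookeepers, credentials, table name, CSV layout, and
checkpoint file are all hypothetical placeholders -- adapt them to your setup.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class CheckpointedCsvIngest {

      public static void main(String[] args) throws Exception {
        // Hypothetical instance/credentials/table names -- substitute your own.
        Connector conn = new ZooKeeperInstance("myInstance",
            "zk1:2181,zk2:2181,zk3:2181")
            .getConnector("ingest", new PasswordToken("secret"));
        BatchWriter bw = conn.createBatchWriter("mytable", new BatchWriterConfig());

        File checkpoint = new File("ingest.checkpoint");
        long resumeAt = readCheckpoint(checkpoint); // last line known durably ingested
        long lineNum = 0;

        BufferedReader in = new BufferedReader(new FileReader("data.csv"));
        String line;
        while ((line = in.readLine()) != null) {
          lineNum++;
          if (lineNum <= resumeAt)
            continue;                               // already ingested before the failure
          bw.addMutation(toMutation(line));
          if (lineNum % 1000 == 0) {
            bw.flush();                             // throws MutationsRejectedException on failure
            writeCheckpoint(checkpoint, lineNum);   // only advance after a clean flush
          }
        }
        bw.close();                                 // final implicit flush
        writeCheckpoint(checkpoint, lineNum);
        in.close();
      }

      // Hypothetical CSV layout: row,family,qualifier,value
      static Mutation toMutation(String csvLine) {
        String[] f = csvLine.split(",", 4);
        Mutation m = new Mutation(new Text(f[0]));
        m.put(new Text(f[1]), new Text(f[2]),
            new Value(f[3].getBytes(StandardCharsets.UTF_8)));
        return m;
      }

      static long readCheckpoint(File f) throws IOException {
        if (!f.exists())
          return 0;
        return Long.parseLong(
            new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8).trim());
      }

      static void writeCheckpoint(File f, long lineNum) throws IOException {
        Files.write(f.toPath(), Long.toString(lineNum).getBytes(StandardCharsets.UTF_8));
      }
    }

The important part is that the checkpoint only advances after flush() returns 
cleanly, so on restart you never skip rows that were still buffered in the 
BatchWriter when the failure happened.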

> thanks,
>
> kesten

Re: ingest problems

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Feb 11, 2014 at 11:10 AM, Kesten Broughton <kb...@21ct.com> wrote:

> Hi there,
>
> We have been experimenting with accumulo for about two months now.  Our
> biggest painpoint has been on ingest.
> Often we will have ingest process fail 2 or 3 times 3/4 of the way
> through an ingest and then on a final try it works, without any changes.
>
> Once the ingest works, the cluster is usually stable for querying for
> weeks or months only requiring the occasional start-all.sh if there is a
> problem.
>
> Sometimes our ingest can be 24 hours long, and we need a stronger ingest
> story to be able to commit to accumulo.
> Our cluster architecture has been:
> 3 hdfs datanodes overlaid with name node, secondary nn and accumulo master
> each collocated with a datanode, and a zookeeper server on each.
> We realize this is not optimal and are transitioning to separate hardware
> for zookeepers and name/secondary/accumulomaster nodes.
> However, the big concern is that sometimes a failed ingest will bork the
> whole cluster and we have to re-init accumulo with an accumulo init
> destroying all our data.
> We have experienced this on at least three different clusters of this
> description.
>
> The most recent attempt was on a 65GB dataset.   The cluster had been up
> for over 24 hours.  The ingest test takes 40 mins and about 5 mins in, one
> of the datanodes failed.
> There were no error logs on the failed node, and the two other nodes had
> logs filled with zookeeper connection errors.  We were unable to recover
> the cluster and had to re-init.
>
> I know a vague description of problems is difficult to respond to, and the
> next time we have an ingest failure, i will bring specifics forward.  But
> I'm writing to know if
> 1.  Ingest failures are a known fail point for accumulo, or if we are
> perhaps unlucky/mis-configured.
> 2.  Are there any guidelines for capturing ingest failures / determining
> root causes when errors don't show up in the logs
> 3.  Are there any means of checkpointing a data ingest, so that if a
> failure were to occur at hour 23.5 we could roll back to hour 23 and
> continue.  Client code could checkpoint and restart at the last one, but if
> the underlying accumulo cluster can't be recovered, that's of no use.
>

For checkpointing, in 1.5 you can clone a table, take the clone offline, and
then export it.  The offline clone will prevent Accumulo from
deleting any files for the snapshot.  The export will preserve the
metadata related to the snapshot outside of Accumulo.  Later you can do all
of this again and delete the previous clone.
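
As a rough sketch, those three steps with the 1.5 Java API look something like
the following (the instance, credentials, table name, and export directory are
placeholders; the equivalent shell commands are clonetable, offline, and
exporttable):

    import java.util.Collections;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.TableOperations;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class SnapshotTable {
      public static void main(String[] args) throws Exception {
        // Hypothetical instance/credentials/table/export-dir -- substitute your own.
        Connector conn = new ZooKeeperInstance("myInstance",
            "zk1:2181,zk2:2181,zk3:2181")
            .getConnector("root", new PasswordToken("secret"));
        TableOperations ops = conn.tableOperations();

        String snapshot = "mytable_ckpt_" + System.currentTimeMillis();

        // 1. Clone the live table (flush=true so in-memory data is included).
        ops.clone("mytable", snapshot, true,
            Collections.<String,String> emptyMap(), Collections.<String> emptySet());

        // 2. Take the clone offline; an offline clone pins its files so the
        //    Accumulo garbage collector will not delete them.
        ops.offline(snapshot);

        // 3. Export the clone's metadata (plus a distcp file list) to a
        //    directory in HDFS, outside of Accumulo, for later re-import.
        ops.exportTable(snapshot, "/accumulo-exports/" + snapshot);
      }
    }

At the next checkpoint, repeat the clone/offline/export with a new snapshot
name and then delete the previous clone.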


>
> thanks,
>
> kesten
>

Re: ingest problems

Posted by Kesten Broughton <kb...@21ct.com>.
Jeremy,

Our ingest tests using the BatchWriter into Accumulo showed that none of CPU, network, or disk I/O were maxed out.
We were seeing a CPU load of 1 out of 4 on a 4-core VM, for example.

I will answer the hardware question in my response to Sean Busbey.

From: "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>>
Date: Tuesday, February 11, 2014 at 10:17 AM
To: Kesten Broughton <kb...@21ct.com>>
Cc: "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>>
Subject: Re: ingest problems

What are the ingest rates you are seeing?
Can you describe your hardware more precisely?

On Feb 11, 2014, at 11:10 AM, Kesten Broughton <kb...@21ct.com> wrote:

Hi there,

We have been experimenting with accumulo for about two months now.  Our biggest painpoint has been on ingest.
Often we will have ingest process fail 2 or 3 times 3/4 of the way through an ingest and then on a final try it works, without any changes.

Once the ingest works, the cluster is usually stable for querying for weeks or months only requiring the occasional start-all.sh if there is a problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest story to be able to commit to accumulo.
Our cluster architecture has been:
3 hdfs datanodes overlaid with name node, secondary nn and accumulo master each collocated with a datanode, and a zookeeper server on each.
We realize this is not optimal and are transitioning to separate hardware for zookeepers and name/secondary/accumulomaster nodes.
However, the big concern is that sometimes a failed ingest will bork the whole cluster and we have to re-init accumulo with an accumulo init destroying all our data.
We have experienced this on at least three different clusters of this description.

The most recent attempt was on a 65GB dataset.   The cluster had been up for over 24 hours.  The ingest test takes 40 mins and about 5 mins in, one of the datanodes failed.
There were no error logs on the failed node, and the two other nodes had logs filled with zookeeper connection errors.  We were unable to recover the cluster and had to re-init.

I know a vague description of problems is difficult to respond to, and the next time we have an ingest failure, i will bring specifics forward.  But I’m writing to know if
1.  Ingest failures are a known fail point for accumulo, or if we are perhaps unlucky/mis-configured.
2.  Are there any guidelines for capturing ingest failures / determining root causes when errors don’t show up in the logs
3.  Are there any means of checkpointing a data ingest, so that if a failure were to occur at hour 23.5 we could roll back to hour 23 and continue.  Client code could checkpoint and restart at the last one, but if the underlying accumulo cluster can’t be recovered, that’s of no use.

thanks,

kesten