Posted to user@cassandra.apache.org by Trevor Francis <tr...@tgrahamcapital.com> on 2012/04/19 19:04:19 UTC

High Log Storage

I have a web application that generates multiple log files in a log file directory. On a particularly chatty box, up to 2000 entries per second are written to those log files. We are looking for a solution that tails that directory and inserts new entries into a Cassandra database.

The fields in the log file are pipe delimited, but we can use any delimiter. We want to structure the data so that each data point gets its own column when it is inserted into Cassandra.
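
To illustrate the layout described above, here is a minimal Python sketch that splits one pipe-delimited entry into a column-name-to-value mapping. The field names and the sample line are hypothetical, since the actual log format is not shown in this thread, and the step that writes the mapping to Cassandra is left out.

```python
# Hypothetical field names; the real log layout is not given in this thread.
FIELDS = ["timestamp", "host", "level", "message"]


def parse_line(line, fields=FIELDS, delimiter="|"):
    """Split one log entry into a {column_name: value} mapping."""
    # Limit the number of splits so the last field may itself contain the delimiter.
    values = line.rstrip("\n").split(delimiter, len(fields) - 1)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


# Made-up sample entry, for illustration only.
row = parse_line("2012-04-19T19:04:19Z|web01|INFO|user login\n")
```

Each key in the resulting mapping would become one column name, and each value one column value, in the row stored for that entry.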

We set up Flume to handle this, but the Cassandra sink isn't robust enough to handle even one chatty machine. We may have up to 200 machines.

Any suggestions on a tool that can do this reliably? Data not making it into the Cassandra database will cause huge problems, so that is a factor to consider.

Regards,

Trevor Francis



Re: High Log Storage

Posted by bi...@dehora.net.
Try writing them through Kafka. It should handle that load.
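
A rough sketch of the collecting side, in Python, assuming the logs are plain append-only text files: it reads whatever complete lines have been appended to each file in a directory since the last pass and hands them to a `publish` callable. `publish`, `read_new_lines`, and `drain_directory` are hypothetical names; `publish` stands in for whatever Kafka producer call is used, and no real Kafka client API is shown here. Log rotation and truncation are not handled.

```python
import os


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so a half-written entry
    # is picked up on a later pass instead of being sent truncated.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    lines = text.split("\n")[:-1]
    return lines, offset + end


def drain_directory(log_dir, offsets, publish):
    """Send every new complete line found in log_dir to publish(line)."""
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        lines, new_offset = read_new_lines(path, offsets.get(path, 0))
        for line in lines:
            publish(line)
        # Advance the offset only after every line was handed off, so a
        # failure in publish replays this batch (duplicates are possible).
        offsets[path] = new_offset
```

Calling `drain_directory` in a loop with a short sleep, and persisting `offsets` between runs, would give at-least-once delivery to the queue under these assumptions. A separate consumer would then write to Cassandra at its own pace.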

Bill
