Posted to user@cassandra.apache.org by Colin <co...@cloudeventprocessing.com> on 2010/12/12 15:57:25 UTC
Unsubscribe
Unsubscribe
Please
Sent from my iPad
On Dec 12, 2010, at 1:26 AM, Dave Martin <mo...@googlemail.com> wrote:
> Hi there,
>
> I see the following:
>
> 1) Add 8,000,000 columns to a single row. Each column name is a UUID.
> 2) Use cassandra-cli to run count keyspace.cf['myGUID']
>
> The following is reported in the logs:
>
> ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java (line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407) Internal error processing get_count
> java.lang.OutOfMemoryError: Java heap space
>
> and Cassandra falls over. I see the same behaviour with 0.6.6.
>
> Increasing the memory allocation with the -Xmx & -Xms args to 4GB allows the count to return in this particular example (i.e. no OutOfMemory is thrown).
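(For context: in 0.6, get_count reads the entire row into memory on the server before returning a count, so its heap cost scales with the number of columns. One client-side alternative is to count in fixed-size pages using get_slice with a SliceRange. Below is a minimal Python sketch of just the paging logic, with `fetch_page` as a hypothetical stand-in for a real get_slice call; it is an illustration of the approach, not code from this thread.)

```python
import bisect

def paged_count(fetch_page, page_size=1000):
    """Count columns by fetching fixed-size pages of sorted column names.

    fetch_page(start, count) must return up to `count` column names >= `start`
    in sorted order ("" meaning from the beginning) -- the same contract as a
    Thrift get_slice with a SliceRange. Consecutive pages overlap by one
    column (the previous page's last name), which we drop to avoid
    double-counting.
    """
    total = 0
    start = ""
    while True:
        page = fetch_page(start, page_size)
        if start:
            page = page[1:]  # first column repeats the previous page's last
        if not page:
            break
        total += len(page)
        start = page[-1]
    return total

# In-memory stand-in for a wide Cassandra row: 10,000 sorted column names.
columns = sorted("col%05d" % i for i in range(10000))

def fetch_page(start, count):
    i = bisect.bisect_left(columns, start)
    return columns[i:i + count]

print(paged_count(fetch_page, page_size=1000))  # 10000
```

Each round trip holds only one page in memory on either side, which is why paging avoids the OutOfMemoryError that a single get_count over 8 million columns can trigger.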
>
> Here's the Scala code that was run to load the columns, which uses the Akka persistence API:
>
> object ColumnTest {
>   def main(args: Array[String]): Unit = {
>     println("Super column test starting")
>     val hosts = Array("localhost")
>     val sessions = new CassandraSessionPool("occurrence", StackPool(SocketProvider("localhost", 9160)), Protocol.Binary, ConsistencyLevel.ONE)
>     val session = sessions.newSession
>     loadRow("myGUID", 8000000, session)
>     session.close
>   }
>
>   def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
>     print("loading: " + key + ", with columns: " + noOfColumns)
>     val start = System.currentTimeMillis
>     val rawPath = new ColumnPath("dr")
>     for (i <- 0 until noOfColumns) {
>       val recordUuid = UUID.randomUUID.toString
>       session ++| (key, rawPath.setColumn(recordUuid.getBytes), "1".getBytes, System.currentTimeMillis)
>       session.flush
>     }
>     val finish = System.currentTimeMillis
>     print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
>   }
> }
>
> Here's the configuration used:
>
> # Arguments to pass to the JVM
> JVM_OPTS=" \
> -ea \
> -Xms1G \
> -Xmx2G \
> -XX:+UseParNewGC \
> -XX:+UseConcMarkSweepGC \
> -XX:+CMSParallelRemarkEnabled \
> -XX:SurvivorRatio=8 \
> -XX:MaxTenuringThreshold=1 \
> -XX:CMSInitiatingOccupancyFraction=75 \
> -XX:+UseCMSInitiatingOccupancyOnly \
> -XX:+HeapDumpOnOutOfMemoryError \
> -Dcom.sun.management.jmxremote.port=8080 \
> -Dcom.sun.management.jmxremote.ssl=false \
> -Dcom.sun.management.jmxremote.authenticate=false"
>
> Admittedly the resource allocation is small, but I wondered if there should be some configuration guidelines (e.g. memory allocation vs number of columns supported).
>
> I'm running this on my MBP with a single node, and Java is as follows:
>
> $ java -version
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
> Here's the CF definition:
>
> <Keyspace Name="occurrence">
>   <ColumnFamily Name="dr"
>                 CompareWith="UTF8Type"
>                 Comment="The column family for dataset tracking"/>
>   <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>   <ReplicationFactor>1</ReplicationFactor>
>   <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
> </Keyspace>
>
> Apologies in advance if this is a known issue or a known limitation of 0.6.x.
> I had wondered if I was hitting the 2GB row limit of the 0.6.x releases, but 8 million columns comes to roughly 300MB in this particular case.
> I guess it may also be a result of the limitations of Thrift (i.e. no streaming capabilities).
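(A rough back-of-envelope check of that 300MB figure, using assumed rather than measured per-column sizes: each column here is a 36-byte UUID string name plus a 1-byte value, and the per-column overhead below is an assumption for illustration.)

```python
# Rough row-size estimate for 8 million UUID-named columns.
# The overhead figure is an assumption (timestamp + length fields),
# not a measured Cassandra constant.
n_columns = 8000000
name_bytes = 36       # UUID rendered as a string, e.g. "d3b07384-..."
value_bytes = 1       # "1".getBytes
overhead_bytes = 15   # assumed: 8-byte timestamp plus length/flag fields
per_column = name_bytes + value_bytes + overhead_bytes
total_mb = n_columns * per_column / (1024.0 * 1024.0)
print(round(total_mb))  # 397
```

With these assumptions the row lands in the same few-hundred-MB range as the 300MB estimate above, well under the 2GB row limit, which supports the conclusion that the OOM comes from get_count's memory behavior rather than from row size.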
>
> Any thoughts appreciated,
>
> Dave
Re: Unsubscribe
Posted by Peter Schuller <pe...@infidyne.com>.
> Unsubscribe
http://wiki.apache.org/cassandra/FAQ#unsubscribe
--
/ Peter Schuller