Configuring Cassandra

There are four configuration files for Cassandra:

File                          Description
cassandra.yaml                Cassandra options
jvm.options                   JVM static settings
cassandra-env.sh              Bash script file to pass additional options to the JVM
cassandra-rackdc.properties   Topology for most snitches

NOTE: cassandra-topology.properties is for the older PropertyFileSnitch. It is rarely used.

File Locations

Packaged Installations

For packaged installations, the configuration files are located in either /etc/cassandra or /etc/cassandra/conf.

Stand-alone Installations

For stand-alone installations, the configuration files are located in install_directory/conf.
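To check which of these locations a given node actually uses, a small helper can probe each candidate directory for cassandra.yaml. This is a minimal sketch: the function name find_cassandra_conf and the /opt/cassandra fallback are illustrative assumptions, not part of Cassandra.

```shell
# Print the first candidate directory that contains cassandra.yaml.
# Returns non-zero when none of the candidates match.
find_cassandra_conf() {
  for dir in "$@"; do
    if [ -f "$dir/cassandra.yaml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Packaged locations first, then a stand-alone install.
# /opt/cassandra is only a hypothetical stand-in for install_directory.
find_cassandra_conf /etc/cassandra /etc/cassandra/conf \
  "${CASSANDRA_HOME:-/opt/cassandra}/conf" \
  || echo "cassandra.yaml not found in the usual locations" >&2
```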

Configuration Files

NOTE: Each node in the cluster must be configured.

cassandra.yaml

num_tokens
Set this value to 16.

cluster_name
(Default: Test Cluster) The name of the cluster. This setting prevents nodes in one logical cluster from joining another. All nodes in a cluster must have the same cluster_name.

listen_address
(Default: localhost) The IP address or hostname that Cassandra binds to for connecting this node to other nodes. Set this parameter or listen_interface, not both. Correct settings for various use cases:

  • Single-node installations: leave set to the default, localhost.
  • Node in a multi-node installation: set this property to the node's IP address or hostname, or set listen_interface.
  • Node in a multi-network or multi-datacenter installation, within an EC2 environment that supports automatic switching between public and private interfaces: set listen_address to the node's IP address or hostname.
  • Node with two physical network interfaces in a multi-datacenter installation or a Cassandra cluster deployed across multiple Amazon EC2 regions using the Ec2MultiRegionSnitch:
    • Set listen_address to this node's private IP or hostname, or set listen_interface (for communication within the local datacenter).
    • Set broadcast_address to the second IP or hostname (for communication between datacenters).
    • Set listen_on_broadcast_address to true.
    • If this node is a seed node, add the node's public IP address or hostname to the seeds list.
    • Open the storage_port or ssl_storage_port on the public IP firewall.
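The multi-region case above comes down to three cassandra.yaml settings per node. The sketch below prints them for a given pair of addresses. The function name multi_region_settings and the public address 203.0.113.10 are illustrative assumptions; substitute the node's real private and public addresses.

```shell
# Print the cassandra.yaml settings for a node with a private address
# (traffic inside the local datacenter) and a second address
# (traffic between datacenters).
multi_region_settings() {
  private_ip=$1
  public_ip=$2
  cat <<EOF
listen_address: $private_ip
broadcast_address: $public_ip
listen_on_broadcast_address: true
EOF
}

# 203.0.113.10 is a placeholder public address, not a value from this guide.
multi_region_settings 10.255.1.101 203.0.113.10
```

The seeds list and the firewall rule for storage_port or ssl_storage_port still have to be handled separately, as described in the list.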

WARNING:

Never set listen_address to 0.0.0.0. It is always wrong.

Do not set values for both listen_address and listen_interface on the same node.
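Both warnings can be checked mechanically before a node is started. The following is a rough sketch that assumes the settings appear as uncommented, top-level keys in cassandra.yaml. The function name check_listen_settings is hypothetical, and the parsing is deliberately simple rather than a full YAML parser.

```shell
# Check listen_address / listen_interface in a cassandra.yaml file.
# Prints "ok" and returns 0 when neither warning applies.
check_listen_settings() {
  file=$1
  # Take the value of each uncommented key, dropping trailing comments,
  # quotes, and spaces.
  addr=$(sed -n 's/^listen_address:[[:space:]]*\([^#]*\).*/\1/p' "$file" \
    | tr -d "\"' " | head -n 1)
  iface=$(sed -n 's/^listen_interface:[[:space:]]*\([^#]*\).*/\1/p' "$file" \
    | tr -d "\"' " | head -n 1)
  if [ "$addr" = "0.0.0.0" ]; then
    echo "listen_address must not be 0.0.0.0"
    return 1
  fi
  if [ -n "$addr" ] && [ -n "$iface" ]; then
    echo "set listen_address or listen_interface, not both"
    return 1
  fi
  echo "ok"
}

# Usage (path depends on the installation type):
#   check_listen_settings /etc/cassandra/cassandra.yaml
```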

seed_provider

The addresses of hosts designated as contact points in the cluster. A joining node contacts one of the nodes in the seeds list to learn the topology of the ring.

- seeds: "127.0.0.1"   # this is the default.

If you have a single node cluster, use the default value for the seed.

If you have a 3 node cluster, designate at least 2 nodes in the seed list. For example, assume you have a 3 node cluster where the nodes have the IP addresses 10.255.1.101, 10.255.1.102, and 10.255.1.103:

- seeds: "10.255.1.101, 10.255.1.103"

The seed list is the same for all nodes. Do not include all 3 nodes as seed nodes.
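Because the seed list must be identical on every node, it can help to generate the stanza once and copy it to each cassandra.yaml. The sketch below prints a seed_provider block for the addresses it is given. The function name seed_provider_stanza is an assumption for illustration; SimpleSeedProvider is the seed provider class shipped with Cassandra.

```shell
# Print a seed_provider stanza for cassandra.yaml.
# Arguments: one or more seed addresses.
seed_provider_stanza() {
  # Join all arguments with commas inside a subshell so IFS is not
  # changed for the caller.
  seeds=$(IFS=,; printf '%s' "$*")
  cat <<EOF
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "$seeds"
EOF
}

# Two of the three example nodes act as seeds.
seed_provider_stanza 10.255.1.101 10.255.1.103
```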

Directories

If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.

commitlog_directory
The directory where the commit log is stored. Default locations:

  • Package installations: /var/lib/cassandra/commitlog
  • Tarball installations: install_location/data/commitlog

For optimal write performance, place the commit log on a separate disk partition, or (ideally) a separate physical device from the data file directories. Because the commit log is append only, an HDD is acceptable for this purpose.
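One quick way to see whether the commit log and the data directories share a partition is to compare the device that df reports for each path. This is only a sketch: the function names are hypothetical, the paths are the package defaults, and df shows the mounted filesystem rather than the underlying physical disk, so two partitions on the same drive still count as separate here.

```shell
# Print the device (first column of `df -P`) that holds the given path.
device_of() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { print $1 }'
}

# Succeed only when the two paths are on different devices.
on_separate_devices() {
  [ "$(device_of "$1")" != "$(device_of "$2")" ]
}

if on_separate_devices /var/lib/cassandra/commitlog /var/lib/cassandra/data; then
  echo "commit log and data are on separate devices"
else
  echo "commit log and data share a device (or a path is missing)"
fi
```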

data_file_directories
The directory locations where table data is stored (in SSTables). Cassandra distributes data evenly across the locations, subject to the granularity of the configured compaction strategy. Default locations:

  • Package installations: /var/lib/cassandra/data
  • Tarball installations: install_location/data/data

As a production best practice, use RAID 0 and SSDs.

saved_caches_directory
The location where Cassandra will save key and row caches. The default locations are:

  • Package installations: /var/lib/cassandra/saved_caches
  • Tarball installations: install_location/data/saved_caches

rpc_address
Set this address to the IP address or hostname of the node.

batch_size_fail_threshold_in_kb
Set this value to 500.

cassandra-rackdc.properties

Several snitch options use the cassandra-rackdc.properties configuration file to determine which datacenters and racks cluster nodes belong to. Information about the network topology allows requests to be routed efficiently and replicas to be distributed evenly. The following snitches can be configured here:

  • GossipingPropertyFileSnitch
  • Ec2Snitch (AWS EC2 single-region snitch)
  • Ec2MultiRegionSnitch (AWS EC2 multi-region snitch)

GossipingPropertyFileSnitch

The file has two options: dc and rack.

dc is the name you have given to the Cassandra datacenter. The default name is dc1. It is common to give the Cassandra datacenter the same name as the physical datacenter in which it resides. The dc name cannot contain spaces or dashes.

rack is the logical rack. The default is rack1.

On a 3 node cluster in a single datacenter, the dc name is the same but the rack name should be different for each node. For example:

Host IP        dc              rack
10.255.1.101   my_datacenter   r1
10.255.1.102   my_datacenter   r2
10.255.1.103   my_datacenter   r3
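Each row of the example corresponds to a two-line cassandra-rackdc.properties file on that node. The sketch below prints the file contents for one node and rejects dc names containing spaces or dashes, as required above. The function name rackdc_properties is an illustrative assumption.

```shell
# Print the contents of cassandra-rackdc.properties for one node.
# Arguments: dc name, rack name.
rackdc_properties() {
  dc=$1
  rack=$2
  case $dc in
    *[[:space:]-]*)
      echo "dc name must not contain spaces or dashes: $dc" >&2
      return 1
      ;;
  esac
  printf 'dc=%s\nrack=%s\n' "$dc" "$rack"
}

# Contents for the first node in the example (10.255.1.101).
rackdc_properties my_datacenter r1
```

On the other two nodes only the rack value changes (r2 and r3).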

jvm.options file

This file controls options for the Java Virtual Machine. Most important are setting the heap size and the Garbage Collector settings.

Heap Size

If you have installed Cassandra on a server with 16GB or more of RAM, find these two lines:

#-Xms4G
#-Xmx4G

and replace them with:

-Xms8G
-Xmx8G

If the server has less than 16GB of memory, there is no need to change anything. Cassandra will use the default memory assigned to the JVM heap. This is automatically calculated using the following formula: max(min(½ RAM, 1024MB), min(¼ RAM, 8192MB))
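To see what the formula yields for a particular server, it can be evaluated with integer arithmetic. The following is a small sketch of that calculation, taking system RAM in MB. The function name default_heap_mb is hypothetical and is not the name used inside Cassandra's own scripts.

```shell
# Default heap size in MB for a given amount of system RAM in MB:
# max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))
default_heap_mb() {
  ram_mb=$1
  half=$(( ram_mb / 2 ))
  quarter=$(( ram_mb / 4 ))
  if [ "$half" -gt 1024 ]; then half=1024; fi
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  if [ "$half" -gt "$quarter" ]; then
    echo "$half"
  else
    echo "$quarter"
  fi
}

default_heap_mb 2048    # 2GB of RAM  -> 1024
default_heap_mb 8192    # 8GB of RAM  -> 2048
default_heap_mb 65536   # 64GB of RAM -> 8192
```

For servers with 4GB of RAM or less the result is half of RAM, capped at 1024MB; above that, the second term (a quarter of RAM, capped at 8192MB) determines the result.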

Unless instructed to do otherwise, use the default garbage collectors, CMS (Concurrent Mark Sweep) and ParNew (Parallel New), as well as the default GC settings.

cassandra-env.sh file

If you had to create a special tmp directory, add these two lines to the end of the cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Djna.tmpdir=/kdp_cassandra/tmp"
JVM_OPTS="$JVM_OPTS -Djava.io.tmpdir=/kdp_cassandra/tmp"

If Cassandra logs are being stored under /kdp_cassandra, add the following:

JVM_OPTS="$JVM_OPTS -Dcassandra.logdir=/kdp_cassandra/log"