In my previous article I explained how to configure multi-tenant logging in WSO2 Stratos. In this post I am going to explain how to create an external carbonized Cassandra cluster and point BAM to it, so that our logs are stored in an external Cassandra cluster. If you have a production deployment and want to store big data, for example the daily logs of all servers, you need an external Cassandra cluster for high availability and high performance.
So, to build a carbonized Cassandra cluster we are going to use WSO2 Storage Server. WSO2 Storage Server provides a rich set of tools to create and manage storages such as relational data storages, Cassandra, and HDFS file systems.
To start, you need to download WSO2 Storage Server.
Before we begin you need a basic understanding of Cassandra in order to see clearly what we are trying to do. Unlike most database deployments, Cassandra does not use a master/slave architecture; it uses a peer-to-peer implementation that avoids the pitfalls, latency problems, single points of failure, and performance hits associated with master/slave setups. This makes Cassandra more highly available and efficient.
So, basically, this is what happens when we write to Cassandra: the client writes to any node in the cluster, that node acts as the coordinator and replicates the write to the other nodes (and zones), and those nodes return acknowledgements to the coordinator. The coordinator then returns an acknowledgement to the client, and the data is written to the node's commit log on disk. If a node goes offline, hinted handoff completes the write when the node comes back up.
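The write path above can be sketched as a toy simulation. This is purely illustrative: the class and method names are my own, not Cassandra APIs, and it only models the coordinator/replica/hint flow described in the paragraph.

```python
# Toy simulation of Cassandra's write path: a coordinator replicates a
# write to its replica nodes, and stores a hint for any replica that is
# down so the write can be replayed when it returns (hinted handoff).
# All names here are illustrative, not Cassandra APIs.

class Node:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}       # local store (stands in for commit log + memtable)

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []      # (node, key, value) writes waiting for a node to return

    def write(self, key, value):
        acks = 0
        for node in self.replicas:
            if node.online:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))   # hinted handoff
        return acks          # acknowledgements returned to the client

    def replay_hints(self):
        remaining = []
        for node, key, value in self.hints:
            if node.online:
                node.data[key] = value                  # complete the missed write
            else:
                remaining.append((node, key, value))
        self.hints = remaining

nodes = [Node("node0"), Node("node1"), Node("node2")]
coord = Coordinator(nodes)
nodes[2].online = False                 # one replica is down
acks = coord.write("log:2013-06-01", "payload")
nodes[2].online = True                  # the node comes back up
coord.replay_hints()                    # hinted handoff completes the write
```

Even though node2 missed the original write, the replayed hint brings it back in sync, which is why a temporary outage does not lose data.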
So let's begin configuring the carbonized Cassandra cluster.
Storage Server Management node deployment steps
cassandra-component.xml
This file points to the backend Cassandra cluster.
<Cassandra>
<Cluster>
<Name>SSCluster</Name>
<DefaultPort>9160</DefaultPort>
<Nodes>node0:9160,node1:9160,node2:9160,node3:9160</Nodes>
<AutoDiscovery disable="true" delay="1000"/>
</Cluster>
</Cassandra>
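The `<Nodes>` element is a comma-separated list of `host:port` entries. As a quick illustration of that format (not the component's actual parsing code), such a value can be split into connection endpoints like this:

```python
# Parse a <Nodes> value like the one above into (host, port) pairs.
# Purely illustrative of the comma-separated host:port format.
nodes_value = "node0:9160,node1:9160,node2:9160,node3:9160"

endpoints = []
for entry in nodes_value.split(","):
    host, port = entry.rsplit(":", 1)   # split on the last ':' only
    endpoints.append((host, int(port)))

print(endpoints[0])   # -> ('node0', 9160)
```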
cassandra-auth.xml
You have to create a system user with admin privileges to communicate with the Cassandra backend, and configure cassandra-auth.xml with that user and the remote shared-key publisher service URL.
<Cassandra>
<EPR>https://cassandra.cluster.backend.ip:9443/services/CassandraSharedKeyPublisher</EPR>
<User>admin</User>
<Password>admin</Password>
</Cassandra>
For the EPR you can give the IP of the first backend server node. You also need to take the port offset into account when providing the port, as it will not be adjusted automatically.
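In other words, the effective management port is the default port plus the `<Offset>` configured in that node's carbon.xml. A quick sanity check, assuming the default HTTPS servlet port of 9443:

```python
# WSO2 servers expose ports as default + <Offset> from carbon.xml,
# so the EPR port must be computed by hand.
DEFAULT_HTTPS_PORT = 9443

def effective_port(offset):
    """Return the actual management port for a server with the given offset."""
    return DEFAULT_HTTPS_PORT + offset

print(effective_port(0))   # -> 9443, a server with no offset
print(effective_port(2))   # -> 9445, e.g. a node started with <Offset>2</Offset>
```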
Since we also perform many HDFS/Hadoop functions (to complete the BAM story) using relational data storage, we need to configure rss-config.xml as well.
rss-config.xml
There you need to provide the datasource properties (the JDBC URL of the MySQL server and the user credentials) accordingly.
<dataSourceProps>
<property name="URL">jdbc:mysql://mysql.stratos-local.wso2.com:3306/rss_db</property>
<property name="user">root</property>
<property name="password">root</property>
</dataSourceProps>
Storage Server Cassandra cluster deployment steps
Node Configuration
cassandra.yaml
Start the node with the default seed configuration:
- seeds: "127.0.0.1"
Edit the cluster listening address:
listen_address: cassandra.node.ip
Edit the Thrift listening address:
rpc_address: cassandra.node.ip
Both of these are the IP address of the machine that Cassandra itself resides on.
If another node is acting as the seed node, you can point this node at the seed node to bootstrap, i.e.
- seeds: "ip.of.seed.node"
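The per-node cassandra.yaml edits above can be sketched as a tiny templating helper. The IPs are placeholders for your own addresses; nothing here is a Cassandra API, it just generates the three override lines described above:

```python
# Generate the per-node cassandra.yaml overrides described above.
# node_ip / seed_ip are placeholders for your deployment's addresses.
def node_overrides(node_ip, seed_ip):
    return "\n".join([
        f'- seeds: "{seed_ip}"',          # the seed this node bootstraps from
        f"listen_address: {node_ip}",     # cluster (gossip) listening address
        f"rpc_address: {node_ip}",        # Thrift listening address
    ])

print(node_overrides("10.0.0.2", "10.0.0.1"))
```

Running this once per node, with the first node's IP as the shared seed, gives each machine a consistent set of overrides.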
Configuring BAM to talk to External Cassandra
Once we have configured the external carbonized Cassandra cluster, we need to tell BAM that we are no longer using the Cassandra instance built into BAM, but the external cluster instead. To do that, start the BAM server with the following system property:
-Ddisable.cassandra.server.startup=true
You also need to configure the following configuration files.
cassandra-component.xml
<Cassandra>
<Cluster>
<Name>ClusterOne</Name> <!--The cluster name; any value will do-->
<Nodes>node0.cassandra.com:9160,node1.cassandra.com:9160,node2.cassandra.com:9160</Nodes> <!--Hosts and ports of the backend cluster nodes. Refer to the diagram to identify them. Remember to account for each backend node's port offset when providing the port here-->
<AutoDiscovery disable="false" delay="1000"/>
</Cluster>
</Cassandra>
cassandra-auth.xml
<Cassandra>
<EPR>https://cassandra.cluster.backend.ip:9443/services/CassandraSharedKeyPublisher</EPR>
<User>admin</User>
<Password>admin</Password>
</Cassandra>
rss-config.xml
<dataSourceProps>
<property name="URL">jdbc:mysql://mysql.stratos-local.wso2.com:3306/rss_db</property>
<property name="user">root</property>
<property name="password">root</property>
</dataSourceProps>
Also, if you are using BAM analytics, you need to configure hive-site.xml as well.
hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mysql.server.url/hive_db</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hdfs.url:hdfs_port</value> <!--normally the port is 9000 -->
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs.url:hdfs_job_tracker_port</value> <!--normally the port is 9001 -->
</property>
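After editing hive-site.xml it is easy to double-check the result, because Hadoop-style `<property><name>/<value>` pairs can be read with a few lines of standard-library XML parsing. This is an illustrative helper, not part of BAM, and the URLs are the same placeholders used above:

```python
import xml.etree.ElementTree as ET

# Read Hadoop-style <property><name>/<value> pairs from a hive-site.xml
# fragment into a dict, so the required settings can be verified at a glance.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql.server.url/hive_db</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdfs.url:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs.url:9001</value>
  </property>
</configuration>"""

props = {p.findtext("name"): p.findtext("value")
         for p in ET.fromstring(HIVE_SITE).iter("property")}

# The three settings BAM analytics needs, per the snippet above.
for required in ("javax.jdo.option.ConnectionURL",
                 "fs.default.name",
                 "mapred.job.tracker"):
    assert required in props, f"missing {required}"

print(props["fs.default.name"])   # -> hdfs://hdfs.url:9000
```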