Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Thursday, March 21, 2013

How to set up a Cassandra cluster using WSO2 Storage Server

In my previous article I explained how to configure multi-tenant logging in WSO2 Stratos. In this post I am going to explain how to create an external carbonized Cassandra cluster and point BAM to it, so that our logs are stored in the external cluster. If you have a production deployment and want to store big data, for example the daily logs of all servers, then you need an external Cassandra cluster for high availability and high performance.

To get a carbonized Cassandra cluster we are going to use WSO2 Storage Server. WSO2 Storage Server provides a rich set of tools to create and manage storage such as relational data stores, Cassandra and HDFS file systems.

To start you need to download WSO2 Storage Server.

Before we begin, you need a basic understanding of Cassandra to follow what we are trying to do. Cassandra does not use a master/slave architecture. It uses a peer-to-peer design, which avoids the pitfalls, latency problems, single points of failure and performance hits associated with master/slave setups. This makes Cassandra more highly available and efficient.


So basically, what happens when we write to Cassandra is this: the client writes to any node in the cluster, and that node acts as the coordinator. The coordinator replicates the write to the replica nodes and zones, and those nodes return acknowledgements to the coordinator. The coordinator then returns an ack to the client. On each node the data is written to the internal commit log on disk. If a node is offline, hinted handoff completes the write when the node comes back up.

So let's begin configuring carbonized Cassandra.
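The write path above can be sketched as a toy model. This is only an illustration of the idea, not how Cassandra is implemented: the `Node` and `Coordinator` classes, the node names and the sample key are all made up for the example, and the coordinator is modelled as a separate object even though in Cassandra any node can play that role.

```python
# Toy model of the write path described above; not Cassandra's real implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class Coordinator:
    """Receives a client write and forwards it to every replica node."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (node, key, value) writes waiting for a node that is down

    def write(self, key, value):
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value  # replica applies the write and acknowledges
                acks += 1
            else:
                self.hints.append((node, key, value))  # remember the write for later
        return acks

    def replay_hints(self):
        """Hinted handoff: deliver stored writes to nodes that are back up."""
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining


nodes = [Node("node0"), Node("node1"), Node("node2")]
nodes[2].up = False                      # node2 is offline during the write
coordinator = Coordinator(nodes)
acks = coordinator.write("log-1", "server started")
nodes[2].up = True                       # node2 comes back
coordinator.replay_hints()               # the missed write is delivered now
```

After the replay, `node2` holds the same entry as the other two nodes, which is the effect hinted handoff is meant to achieve.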

Storage Server Management node deployment steps


cassandra-component.xml

cassandra-component.xml points to the backend Cassandra cluster.

<Cassandra>
    <Cluster>
        <Name>SSCluster</Name>
        <DefaultPort>9160</DefaultPort>
        <Nodes>node0:9160,node1:9160,node2:9160,node3:9160</Nodes>
        <AutoDiscovery disable="true" delay="1000"/>
    </Cluster>
</Cassandra>
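If you want to sanity-check the node list before starting the server, a small script along these lines may help. It is a sketch only: `parse_nodes` is a hypothetical helper, and falling back to `DefaultPort` when an entry has no port is my assumption about how the file is meant to be read, not something confirmed by the Storage Server documentation.

```python
import xml.etree.ElementTree as ET

# Same structure as the cassandra-component.xml shown above.
CONFIG = """
<Cassandra>
  <Cluster>
    <Name>SSCluster</Name>
    <DefaultPort>9160</DefaultPort>
    <Nodes>node0:9160,node1:9160,node2:9160,node3:9160</Nodes>
    <AutoDiscovery disable="true" delay="1000"/>
  </Cluster>
</Cassandra>
"""


def parse_nodes(xml_text):
    """Return (host, port) pairs from <Nodes>.

    Assumption: an entry without a port falls back to <DefaultPort>.
    """
    cluster = ET.fromstring(xml_text).find("Cluster")
    default_port = int(cluster.findtext("DefaultPort", default="9160"))
    nodes = []
    for entry in cluster.findtext("Nodes", default="").split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        nodes.append((host, int(port) if port else default_port))
    return nodes


nodes = parse_nodes(CONFIG)
for host, port in nodes:
    print(host, port)
```

A typo such as a missing comma or a non-numeric port shows up here as a wrong pair or a `ValueError`, rather than as a connection failure later on.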

cassandra-auth.xml  

You have to create a system user with admin privileges to communicate with the Cassandra backend, and configure cassandra-auth.xml with that user and the URL of the remote shared key publisher service.

<Cassandra>
    <EPR>https://cassandra.cluster.backend.ip:9443/services/CassandraSharedKeyPublisher</EPR>
    <User>admin</User>
    <Password>admin</Password>
</Cassandra>

For the EPR you can give the IP of the first backend server node. You also need to account for the port offset of that node when providing the port, as it will not be adjusted automatically.

Since we perform many HDFS/Hadoop functions (to complete the BAM story) using relational data storage, we also need to configure rss-config.xml.
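To make the offset arithmetic concrete, here is a minimal sketch. It assumes the backend node uses the default Carbon HTTPS port 9443 plus whatever port offset that node is started with. `shared_key_epr` is a hypothetical helper for illustration, not part of any WSO2 API.

```python
CARBON_DEFAULT_HTTPS_PORT = 9443  # default Carbon management HTTPS port


def shared_key_epr(host, offset=0):
    """Build the EPR for a backend node started with the given port offset."""
    port = CARBON_DEFAULT_HTTPS_PORT + offset
    return f"https://{host}:{port}/services/CassandraSharedKeyPublisher"


# A backend node running with an offset of 1 listens on 9444, not 9443.
epr = shared_key_epr("cassandra.cluster.backend.ip", offset=1)
print(epr)
```

The resulting value is what would go into the EPR element of cassandra-auth.xml for that node.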

rss-config.xml


There you need to set the datasource properties (the JDBC URL of the MySQL server and the user credentials) accordingly.


<dataSourceProps>
    <property name="URL">jdbc:mysql://mysql.stratos-local.wso2.com:3306/rss_db</property>
    <property name="user">root</property>
    <property name="password">root</property>
</dataSourceProps>

Storage Server Cassandra cluster deployment steps

All the nodes in the Storage Server Cassandra cluster should be configured to use a common Carbon user base, like any other Carbon server. You have to update user-mgt.xml and registry.xml with the correct configurations.

Node Configuration


cassandra.yaml

Start the node with default seed configuration

- seeds: "127.0.0.1"

Edit the cluster listening address

listen_address: cassandra.node.ip

Edit the Thrift listening address

rpc_address: cassandra.node.ip

Both of these are the IP address of the machine on which Cassandra resides.

If you have another node acting as the seed node, you can use that seed node to bootstrap

i.e.

rpc_address: ip.of.cassandra.api
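One thing that is easy to get wrong in cassandra.yaml is the space after the colon: YAML only treats `key: value` as a setting when the colon is followed by a space. The following is a rough check for that mistake, written without any YAML library. `find_missing_spaces` is a hypothetical helper and the 10.0.0.5 address is just sample data.

```python
import re

# A key immediately followed by ":" and a non-space character,
# e.g. "listen_address:10.0.0.5" or "rpc_address::10.0.0.5".
MISSING_SPACE = re.compile(r"^\s*(?:-\s+)?([A-Za-z_]\w*):(\S.*)$")


def find_missing_spaces(yaml_text):
    """Return (line_number, key) for lines where the colon has no space after it."""
    problems = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore comments
        match = MISSING_SPACE.match(line)
        if match:
            problems.append((number, match.group(1)))
    return problems


sample = """\
listen_address:10.0.0.5
rpc_address::10.0.0.5
- seeds: "127.0.0.1"
rpc_address: 10.0.0.5
"""
problems = find_missing_spaces(sample)
for number, key in problems:
    print(f"line {number}: add a space after '{key}:'")
```

This is a plain text check, so it only catches this one pattern. It does not validate the rest of the file.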

Configuring BAM to talk to External Cassandra


Once the external carbonized Cassandra is configured, we need to tell BAM to stop using its built-in Cassandra and use the external one instead. To do that, start the BAM server with the following system property.


-Ddisable.cassandra.server.startup=true

You also need to configure the following configuration files.

cassandra-component.xml


<Cassandra>
    <Cluster>
        <Name>ClusterOne</Name> <!--This is not important-->
        <!--Host and port of the first node of the backend cluster. Refer the diagram to identify the first node. Need to consider offset of the backend node when providing the port here-->
        <Nodes>node0.cassandra.com:9160,node1.cassandra.com:9160,node2.cassandra.com:9160</Nodes>
        <AutoDiscovery disable="false" delay="1000"/>
    </Cluster>
</Cassandra>

cassandra-auth.xml  

<Cassandra>
    <EPR>https://cassandra.cluster.backend.ip:9443/services/CassandraSharedKeyPublisher</EPR>
    <User>admin</User>
    <Password>admin</Password>
</Cassandra>


rss-config.xml

<dataSourceProps>
    <property name="URL">jdbc:mysql://mysql.stratos-local.wso2.com:3306/rss_db</property>
    <property name="user">root</property>
    <property name="password">root</property>
</dataSourceProps>


Also, if you are using BAM analytics, you need to configure hive-site.xml as well.

hive-site.xml

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql.server.url/hive_db</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
    <name>fs.default.name</name>
    <value>hdfs://hdfs.url:hdfs_port</value> <!--normally the port is 9000 -->
</property>

<property>
    <name>mapred.job.tracker</name>
    <value>hdfs.url:hdfs_job_tracker_port</value> <!--normally the port is 9001 -->
</property>
