Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Wednesday, September 26, 2012

How Distributed Logging Works in WSO2 Stratos.

Why do we need distributed logging?


Stratos is a distributed, clustered setup in which several servers, such as ESB Servers, Application Servers, Identity Servers, Governance Servers, Data Services Servers, etc., are deployed together to serve as a Platform as a Service. Each of these servers is deployed in a clustered environment, where there is more than one node for a given server, and new nodes are spawned dynamically inside the cluster depending on demand. All of these servers are fronted by an Elastic Load Balancer, which dispatches incoming requests to a selected node in a round-robin fashion.

What would you do when an error occurs in a deployment like the one above, where 13 different types of servers are running in production and each of them is clustered and load balanced across 50+ nodes? It would be a nightmare for system administrators to log in to each server and grep the logs to identify the exact cause of the error. This is why a distributed application deployment needs centralized application logs. These centralized logs should be kept in a highly scalable data store, in an ordered manner and with easy access, so that users (administrators and developers) can quickly reach the relevant logs whenever something unexpected happens and pinpoint the exact cause of the issue with the least amount of filtering.

When designing a logging system like this, there are several things you need to consider.
  1. Capturing the right information in the LogEvent – Make sure all the information you need to monitor your logs is aggregated into the LogEvent. In a cloud deployment, the basic log details (logger, date, log level) are not enough to pinpoint a critical issue. You also need tenant information (user/domain), host information (to identify which node is sending what), the name of the server (which server the log is coming from), and so on. This information is critical for analyzing and monitoring logs efficiently.
  2. Send logs to the centralized system in a non-blocking, asynchronous manner so that monitoring does not affect the performance of the applications.
  3. High availability and scalability.
  4. Security – Stratos can be deployed and hosted in public clouds, so it is important to make sure the logging system is highly secure.
  5. How to display system/application logs efficiently, with filtering options and log rotation.

These are the five main aspects we were concerned with when designing the distributed logging architecture. Since Stratos supports multitenancy, we also made sure that logs can be separated by tenant, service, and application.

MT-Logging with WSO2 BAM 2.0

WSO2 BAM 2.0 provides a rich set of tools for aggregating, analyzing, and presenting large-scale data sets, and any monitoring scenario can easily be modeled on the BAM architecture. We selected WSO2 BAM as the backbone of our logging architecture mainly because it provides high performance and non-intrusiveness along with high scalability and security. Since those are exactly the factors a distributed logging system needs, WSO2 BAM became the ideal candidate for the MT-Logging architecture.

Publishing Logs to BAM  


We implemented a Log4j appender to send LogEvents to BAM, using the BAM data agents to get the log data across. The BAM data agents send data over the Thrift protocol, which gives us high message throughput and is non-blocking and asynchronous. When publishing log events to BAM, we make sure a data stream is created per tenant, per server, per date. When the data stream is initialized, a unique column family is created per tenant, per server, per date, and the logs are stored in that column family in a predefined keyspace in the Cassandra cluster.
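To give a concrete picture, here is a minimal sketch of what such an appender could look like. It assumes log4j 1.x (AppenderSkeleton, LoggingEvent) and the Thrift DataPublisher from the WSO2 databridge agent as it appears in the BAM 2.0 samples; the class name, connection details and hard-coded tenant/server/node values are illustrative placeholders, not the actual Stratos implementation.

import java.util.Arrays;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;
import org.wso2.carbon.databridge.agent.thrift.DataPublisher;

// Illustrative appender: pushes every LoggingEvent to BAM over Thrift.
public class BamLogAppender extends AppenderSkeleton {

    // JSON stream definition, i.e. the data model shown below.
    private static final String LOG_STREAM_DEF = "...";

    private DataPublisher publisher;
    private String streamId;

    @Override
    public void activateOptions() {
        try {
            // Connect to the BAM Thrift receiver and register the per-tenant/per-server/per-date stream.
            publisher = new DataPublisher("tcp://localhost:7611", "admin", "admin");
            streamId = publisher.defineStream(LOG_STREAM_DEF);
        } catch (Exception e) {
            errorHandler.error("Cannot connect to BAM", e, 0);
        }
    }

    @Override
    protected void append(LoggingEvent event) {
        if (publisher == null) {
            return;
        }
        String[] stack = event.getThrowableStrRep();
        try {
            // The payload order must match the data model below.
            publisher.publish(streamId,
                    new Object[]{"external"},                            // metaData: clientType
                    null,                                                // no correlation data
                    new Object[]{
                            "0",                                         // tenantID (taken from the running server in practice)
                            "AS",                                        // serverName
                            "myWebApp",                                  // appName
                            event.getTimeStamp(),                        // logTime
                            event.getLevel().toString(),                 // priority
                            String.valueOf(event.getMessage()),          // message
                            event.getLoggerName(),                       // logger
                            "10.0.0.1",                                  // ip
                            "node1",                                     // instance
                            stack == null ? "" : Arrays.toString(stack)  // stacktrace
                    });
        } catch (Exception e) {
            errorHandler.error("Cannot publish log event", e, 0);
        }
    }

    @Override
    public void close() {
        if (publisher != null) {
            publisher.stop();
        }
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}

Such an appender is wired up in the log4j configuration like any other appender, so application code keeps using plain log4j calls.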



The data stream defines the set of information that is stored for a particular LogEvent and can be modeled as a data model.

The data model used for a log event:


{'name':'log.tenantId.applicationName.date','version':'1.0.0', 'nickName':'Logs', 'description':'Logging Event',
'metaData':[{'name':'clientType','type':'STRING'} ], 
'payloadData':[
   {'name':'tenantID','type':'STRING'},
   {'name':'serverName','type':'STRING'},
   {'name':'appName','type':'STRING'},
   {'name':'logTime','type':'LONG'},
   {'name':'priority','type':'STRING'},
   {'name':'message','type':'STRING'},
   {'name':'logger','type':'STRING'},
   {'name':'ip','type':'STRING'},
   {'name':'instance','type':'STRING'},
   {'name':'stacktrace','type':'STRING'}
 ] }


We extend org.apache.log4j.PatternLayout in order to capture tenant, server, and node information and wrap it with the log4j LogEvent.
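As a rough sketch of that idea (in the real Stratos layout these details come from the running server; the hard-coded values here are placeholders):

import org.apache.log4j.PatternLayout;
import org.apache.log4j.spi.LoggingEvent;

// Illustrative layout: prefixes every formatted log line with tenant/server/node details.
public class TenantAwarePatternLayout extends PatternLayout {

    @Override
    public String format(LoggingEvent event) {
        // Hard-coded only to keep the sketch self-contained.
        String tenantDomain = "example.com";
        String serverName = "WSO2 Application Server";
        String instanceId = "node1";

        return "[" + tenantDomain + "][" + serverName + "][" + instanceId + "] " + super.format(event);
    }
}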

Log Rotation and Archiving


Once we send the log events to BAM, the logs are saved in a Cassandra cluster. WSO2 BAM provides a rich set of tools to create analytics and scheduled tasks, so we use these Hadoop tasks to rotate the logs daily, archive them, and store them in a secure environment. To do that we use a Hive query that runs daily as a cron job. It reads the Cassandra data store, retrieves all the column families per tenant per application, and archives them in gzip format.



The Hive query used to rotate the logs daily:

set logs_column_family = %s;
set file_path= %s;
drop table LogStats;
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;

CREATE EXTERNAL TABLE IF NOT EXISTS LogStats (key STRING,
payload_tenantID STRING,payload_serverName STRING,
payload_appName STRING,payload_message STRING,
payload_stacktrace STRING,
payload_logger STRING,
payload_priority STRING,payload_logTime BIGINT) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' 
WITH SERDEPROPERTIES ( "cassandra.host" = %s,
"cassandra.port" = %s,"cassandra.ks.name" = %s,
"cassandra.ks.username" = %s,"cassandra.ks.password" = %s,
"cassandra.cf.name" = ${hiveconf:logs_column_family},
"cassandra.columns.mapping" = 
":key,payload_tenantID,
payload_serverName,payload_appName,payload_message,
payload_stacktrace,payload_logger,payload_priority,
payload_logTime" );
INSERT OVERWRITE DIRECTORY 'file:///${hiveconf:file_path}'
select 
concat('TID[',payload_tenantID, ']\t',
'Server[',payload_serverName,']\t',
'Application[',payload_appName,']\t',
'Message[',payload_message,']\t',
'Stacktrace ',payload_stacktrace,'\t',
'Logger{',payload_logger,'}\t',
'Priority[',payload_priority,']\t'),
concat('LogTime[',
(from_unixtime(cast(payload_logTime/1000 as BIGINT),'yyyy-MM-dd HH:mm:ss.SSS' )),']\n') as LogTime from LogStats
ORDER BY LogTime

Once the logs are archived, we send them to an HDFS file system. The archived logs can then be further analysed using map-reduce jobs for long-term data analytics.
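For completeness, here is a minimal sketch of how an archived file could be pushed to HDFS with the standard Hadoop FileSystem API; the namenode URL and the paths are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative upload: copies a day's gzipped log archive into HDFS so it can
// later be crunched by map-reduce jobs for long-term analytics.
public class LogArchiveUploader {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        Path localArchive = new Path("/mnt/log-archives/tenant1/AS/2012-09-26.gz");
        Path hdfsTarget = new Path("/logs/tenant1/AS/2012-09-26.gz");

        hdfs.copyFromLocalFile(localArchive, hdfsTarget);
        hdfs.close();
    }
}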



Advantages of sending Logs to WSO2 BAM

  1. Asynchronous and non-blocking data publishing
  2. Receives and stores log events in a Cassandra cluster, which is a highly scalable big data repository
  3. A rich set of tools for analytics
  4. Can be combined with WSO2 CEP for real-time log event analysis
  5. Can provide logging toolboxes and dashboards for system administrators using WSO2 BAM
  6. High performance and non-intrusiveness
  7. Big data analysis
    1. Daily log analytics - analyse the Cassandra data store
    2. Long-term log analytics - analyse the HDFS file system using map-reduce

Monitoring and Analyzing System Logs 

  • Using the Log Viewer
    Both application and system logs can be displayed from the management console of a given product. Simply log in to the management console; under Monitor there are two links: 1) System Logs, which shows the system logs of the running server, and 2) Application Logs, which shows application-level logs (services/web applications) for a selected application. This makes it easy for users to filter the logs by the application they develop and monitor logs down to the application level.
  • Dashboards and Reports
    System administrators can log in to WSO2 BAM and create their own dashboards and reports, so they can monitor their logs according to their key performance indicators, for example the number of fatal errors that occur in a given month for a given node.
  • SMS Alerts and Emails
    Not just dashboards and reports ... by combining WSO2 BAM with WSO2 CEP you can get real-time alerts such as triggered emails and SMS messages, so system administrators instantly get to know when the system is behaving unexpectedly.
View Logs Using the Log Viewer - Current Log

View Logs Using the Log Viewer - Archived Logs


All of these monitoring capabilities can be built into your deployment using the Stratos distributed logging system, so you don't have to keep going to the system administrators for logs whenever something goes wrong in your application :).

Tuesday, September 11, 2012

Sneak Peek at WSO2 BAM 2.0 & How to Install BAM Data Agents in WSO2 Products

In a fast-growing company, enterprise data plays a major role when it comes to decision making and other top-level business activities. When I say enterprise data, it can be anything that is an asset to your company, for example:
  1. Data coming into your system (employee information, product information, sales data, etc.)
  2. Mediation data (who is accessing your services/applications, when, and how)
  3. Request, response, and fault counts for your services.
This raw data can be crucial, and it is very important to analyze it and convert it into information in order to provide business intelligence for decision makers.
WSO2 BAM is mostly used in SOA environments, because in an SOA environment monitoring your data means monitoring your services. Most business functionality is exposed as services; for example, if you want to insert a set of records into your database you will expose data services to do it, and if you want to mediate some services you create proxy services. You can monitor this kind of data using the WSO2 Business Activity Monitor server.
If you look at the BAM high-level architecture, BAM can be modelled around three major components:
  1. Aggregation – BAM data agents, which publish data into the BAM data storage
  2. Analytics – analyzers, which analyze the data in the BAM storage
  3. Presentation – gadgets, which display key performance indicators from the analyzed information


Any business monitoring scenario can be modelled with these components; the whole BAM architecture is built around them.

Aggregation – basically this means getting your data into BAM (capturing/collecting data into BAM). Data is sent to BAM as events: you capture the important pieces of data and model them as an event stream. I will give more details on event streams in my next posts :) but for the moment all you need to know is to capture as much data as possible when sending data to BAM, because more data means more business intelligence.
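To give a feel for what a data agent looks like in code, here is a minimal sketch that defines a custom stream and publishes a single event, using the Thrift DataPublisher as it appears in the BAM 2.0 data agent samples; the stream name, fields and values are made up for illustration, and the exact package/method names may vary between versions.

import org.wso2.carbon.databridge.agent.thrift.DataPublisher;

// Illustrative data agent: defines a custom event stream and publishes one event to BAM.
public class SalesEventPublisher {

    public static void main(String[] args) throws Exception {
        // In practice the agent may also need javax.net.ssl.trustStore /
        // javax.net.ssl.trustStorePassword system properties pointing to the client trust store.
        DataPublisher publisher = new DataPublisher("tcp://localhost:7611", "admin", "admin");

        // Capture as much useful data as possible in the stream definition.
        String streamId = publisher.defineStream("{"
                + "'name':'com.example.sales','version':'1.0.0',"
                + "'nickName':'Sales','description':'Phone sales events',"
                + "'metaData':[{'name':'clientType','type':'STRING'}],"
                + "'payloadData':["
                + "  {'name':'product','type':'STRING'},"
                + "  {'name':'quantity','type':'INT'},"
                + "  {'name':'total','type':'DOUBLE'},"
                + "  {'name':'user','type':'STRING'}"
                + "]}");

        // Payload values must follow the order and types declared above.
        publisher.publish(streamId,
                new Object[]{"external"},                        // metaData
                null,                                            // correlationData
                new Object[]{"GT-I9300", 2, 1200.0d, "james"});  // payloadData

        publisher.stop();
    }
}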

Analytics – After you capture your data, you need to make it meaningful (basically converting data into information) by analyzing the stored data. To do that you perform data operations (aggregation, merging, sorting, ordering) and ultimately build the key performance indicators needed for business intelligence. In BAM you can write your own custom analyzers using Hive/Hadoop, and these queries can be saved and scheduled. We will dig deeper into how to write Hive queries with BAM and how to analyze data and create KPIs in future posts.

Presentation – Once you analyze your data, you might want to visualize your KPIs (using bar charts, tables) and build your own dashboards to present the information to decision makers. Beyond visualization, you might need to send the analyzed data to the relevant parties, for example as daily reports or alerts to top management. WSO2 BAM provides a rich set of tools to do these tasks without much coding effort: a gadget editor gives you a drag-and-drop way to create gadgets from your analyzed data, and you can connect it to reporting or to complex event processing to send emails, etc., just with a set of configurations.

Reference: http://mackiemathew.files.wordpress.com/2012/01/bamarchitecture.png

Now that we have discussed the main architecture behind BAM, and a bit about its functionality (you can go through the BAM samples and install them to get more in-depth knowledge of BAM), we can move on to a practical example.

Let's see how we can use BAM to monitor your services.

For this I am going to use WSO2 ESB and we will see how we can monitor service statistics using BAM.

First we need to install BAM data agents inside ESB.

How to install BAM data agents in ESB?


Download the P2 profile, which contains the BAM data agent features, and download the latest WSO2 ESB pack. Go to ESB_HOME/bin and start the server.
Log in to the Management Console of WSO2 ESB using your user credentials (the default is admin/admin).

Go to Configure -> Features and click on Add Repository.

Give a meaningful name for the repository (I am going to use My_REPO), give the path to your P2 profile, and click Add.

Once you add the repository you will be redirected to a page that lists the available features. You need to install the BAM data agent features, so type BAM in "Filter by feature name", untick "Group features by category", and click Find.

Tick the following features: BAM Mediation Data Agent Aggregate, BAM Mediator Aggregated, BAM Service Data Agent Aggregate, and click Install.


Read and accept the license agreement and click Next. Once the installation is finished, restart the server.

Once you have installed the BAM data agents, you will see the newly installed features under Configure:
1. Service Data Publishing, 2. Mediation Data Publishing, 3. BAM Server Profiling.

How to Configure the Service Data Agent?


Click on Service Data Publishing to send your service data to WSO2 BAM. To do that you first need to download and start WSO2 BAM (if you are running the ESB on the same machine, change the port offset: in BAM_HOME/repository/conf/carbon.xml set <Offset> to 1).

Go to ${WSO2ESB_HOME}/repository/conf/etc and open the bam.xml file. Make sure that ServiceDataPublishing is enabled.
The bam.xml file should have the following configuration:




   <BamConfig>
        <ServiceDataPublishing>enable</ServiceDataPublishing>
   </BamConfig>


NOTE: By default this is disabled, and BAM publishing won't occur even if you follow the steps below and change the settings in the UI. Therefore, enable ServiceDataPublishing before using it with WSO2 BAM.
Then, on the ESB side, go to Service Data Publishing, give the BAM server URL, which is the Thrift port opened by the BAM server (tcp://localhost:7611), give the BAM credentials (the default is admin/admin), and click Update.

Now, if you invoke some services through the ESB, the service data will get published to BAM. To confirm that your data is in BAM, go to the BAM Management Console and check it by connecting to the Cassandra Explorer.

In this post I have only given an intro to BAM 2.0 and how to send service data to BAM. In my next post I will explain how to send your own enterprise data to BAM and how you can do analytics and presentation using the WSO2 BAM toolkits.