Facing the challenges of multi-cloud SLAs

February 20, 2020

Before cloud services emerged, IT departments managed every aspect of the IT infrastructure: hardware, software, network performance and security, database maintenance, and the synchronization processes running on the company's network. The framework for carrying out all of these tasks in-house was set out in a Service Level Agreement (SLA) between IT professionals and their users.

The main issue with migrating applications to the public cloud is keeping the same scope of services that the in-house SLA covered while no longer having the means to exercise the same level of control over the resources.

A traditional SLA is a specific agreement between service vendors and service users that incorporates accurate, well-defined performance metrics for the entire scope of services. Such an agreement usually includes a provision for penalties if its requirements are breached or not fulfilled.

By contrast, an SLA between cloud vendors and cloud users is based on the availability of individual infrastructure components. In such agreements, all metrics are defined separately: separate availability figures are specified for networking, storage, virtual machines, and more intricate services such as managed databases. Usually, such agreements stipulate a fixed, non-negotiable compensation for availability failures, issued as credit against cloud usage charges. Cloud service users, however, are more interested in a single, comprehensive service agreement than in having to rely on separate infrastructure availability figures.
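To see why separate figures are hard to rely on, consider a rough sketch of how per-component availability combines into an end-to-end figure. It assumes that the application needs every component at the same time and that failures are independent, which real outages often are not. The component names and percentages are hypothetical, not taken from any vendor's SLA.

```python
from math import prod


def composite_availability(component_availabilities):
    """Availability of a service that needs every component up at once.

    Assumes independent failures (a simplification).
    """
    return prod(component_availabilities)


# Hypothetical per-component SLA figures, as fractions of time available.
components = {
    "network": 0.9999,
    "storage": 0.9999,
    "virtual_machines": 0.9995,
    "managed_database": 0.9995,
}

end_to_end = composite_availability(components.values())
print(f"End-to-end availability: {end_to_end:.4%}")
```

Under these assumptions the combined figure is lower than the weakest individual figure, which is one reason a customer cannot read a single service-level promise off a list of component SLAs.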

What complicates matters for cloud computing customers is the lack of a clear, well-defined document that lays down a framework and guidance for cloud vendors. Each cloud vendor draws up its own SLA, which makes it difficult for customers to understand exactly what services they can expect. Furthermore, the lack of a standardized SLA hinders the migration of data between cloud vendors and forces a customer to stay with a single vendor.

According to cloud engineering experts, one way to resolve the uncertainties of a multi-cloud environment is to create a standardized platform for monitoring resources and infrastructure from various cloud providers. This platform should translate the SLA performance metrics of different cloud providers into one universal language. The metrics should reflect the results delivered under each provider's service offering. The platform should also provide comprehensive monitoring of infrastructure made up of different vendors' components, detect deficiencies and faults, test for SLA breaches, and alert providers to any communication failures.
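As an illustration only, the idea of translating provider metrics into a common form and testing for breaches might be sketched as follows. The provider names, the two input formats, and all figures are hypothetical; real providers publish their own formats and measurement rules.

```python
from dataclasses import dataclass


@dataclass
class SlaMetric:
    """Common representation: availability as a fraction of the period."""
    provider: str
    service: str
    promised_availability: float
    measured_availability: float


def from_percent(provider, service, promised_pct, measured_pct):
    """Normalize a report that expresses availability as a percentage."""
    return SlaMetric(provider, service, promised_pct / 100, measured_pct / 100)


def from_downtime(provider, service, allowed_min, observed_min, period_min):
    """Normalize a report that expresses availability as minutes of downtime."""
    return SlaMetric(
        provider,
        service,
        1 - allowed_min / period_min,
        1 - observed_min / period_min,
    )


def find_breaches(metrics):
    """Return the metrics whose measured availability fell below the promise."""
    return [m for m in metrics if m.measured_availability < m.promised_availability]


MINUTES_IN_30_DAYS = 30 * 24 * 60

# Hypothetical monthly reports from two providers in different formats.
metrics = [
    from_percent("provider-a", "storage", 99.9, 99.95),
    from_downtime("provider-b", "database", 21.6, 45.0, MINUTES_IN_30_DAYS),
]

for breach in find_breaches(metrics):
    print(f"SLA breach: {breach.provider}/{breach.service}")
```

Once both reports are expressed in the same unit, a single comparison is enough to flag the provider that missed its target, regardless of how that provider originally stated its SLA.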

Cross-checking multiple availability metrics across cloud providers in a standardized cloud management platform can inform decisions that are consistent with a multi-cloud architecture.

Implementing a standardized SLA is also important because it gives customers a mechanism to hold a vendor accountable. Because cloud computing evolves rapidly, an SLA prepared today can become obsolete tomorrow. Under intense competition in the cloud market and pressure to lower prices, the quality of services offered by some vendors may decline. For instance, a prominent cloud vendor can offer 99.99% service uptime, which allows roughly 52 minutes of downtime per year, along with a 10% discount on the service charge for the month in which excess downtime occurs. In other words, the vendor builds a discount into the agreement, already knowing that its infrastructure will not be able to meet the 99.99% uptime. If a customer really requires 99.99% uptime, the discount will be of no use to them, as it will not make up for the revenue lost during the downtime.
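The arithmetic behind uptime percentages and service credits is easy to check. The sketch below converts an uptime percentage into the downtime it allows per year, then compares a service credit with lost revenue. The revenue, fee, and outage length are hypothetical figures chosen only to illustrate the gap.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime permitted in a period at a given uptime percentage."""
    return (1 - uptime_percent / 100) * period_minutes


for uptime in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime allows about {minutes:.1f} minutes of downtime per year")


def net_loss(revenue_per_minute, downtime_minutes, monthly_fee, credit_rate):
    """Revenue lost during an outage minus the service credit received."""
    lost_revenue = revenue_per_minute * downtime_minutes
    credit = monthly_fee * credit_rate
    return lost_revenue - credit


# Hypothetical customer: $1,000 of revenue per minute, a $2,000 monthly
# cloud bill, a 10-minute outage, and a 10% service credit.
loss = net_loss(1_000, 10, 2_000, 0.10)
print(f"Net loss after credit: ${loss:,.0f}")
```

With these numbers the credit covers only a small fraction of the lost revenue, which is the point made above: a discount on the bill is scaled to the vendor's fee, not to the customer's losses.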