In this tutorial, we will demonstrate how to use Drone to create an automated chaos gauntlet for our services and applications. We will use Drone to inject a controlled amount of failure with Gremlin while building new services. We’ll also share how to use Gremlin to automatically run a chaos gauntlet when deploying new services to staging and production environments.

Gremlin can be easily integrated with Drone using the Gremlin API.


Prerequisites

Before you begin this tutorial, you’ll need the following:

Step 1: Choose a repository

In this step, you’ll choose a repository. For example, I am choosing to build the Drone_Memcache…


This collection of short videos demonstrates how to run the following attacks in 60 seconds. It is a quick and simple way to share the why, how, and what of these Chaos Engineering attacks.

Turn your sound on 🔊

  • Shutdown (scheduled / automated) — aka Chaos Monkey
  • CPU
  • Memory
  • Disk
  • IO
  • more coming soon…

Questions? Join gremlin.com/slack and ask in the #questions channel.

Chaos Engineering in 60 Seconds — How to run a Scheduled Shutdown Attack On A Host with Gremlin ( aka Chaos Monkey 🐒 )

Chaos Engineering in 60 Seconds — How to run a CPU Attack On A Host with Gremlin

Chaos Engineering in 60 Seconds — How to run a Memory Attack On A Host with Gremlin

Chaos Engineering in 60 Seconds — How to run a Disk Attack On A Host with Gremlin

Chaos Engineering in 60 Seconds — How to run an I/O Attack On A Host with Gremlin


In this tutorial, we’ll be using Gremlin to run a blackhole attack that blocks an external address. In this example, we’ll block access to example.com.

First, we’ll launch our container, a simple nginx container:

sudo docker run -l service=curl --name curl -d nginx

Obtain the container ID; in this example it is 30d570653c9f:

docker ps
CONTAINER ID   IMAGE   COMMAND                  CREATED          STATUS          PORTS    NAMES
30d570653c9f   nginx   "/docker-entrypoint.…"   22 seconds ago   Up 21 seconds   80/tcp   curl

Now we’ll hop in the container to run curl and confirm the blackhole is working as expected:

docker exec -it 30d570653c9f /bin/bash

Use curl to fetch example.com:

curl example.com …
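The same before-and-after check can be scripted. Below is a minimal sketch, assuming Python is available wherever the check runs (the stock nginx image may not ship it, in which case curl remains the simpler tool). The function name `is_reachable` and its defaults are hypothetical and are not part of Gremlin or Docker.

```python
import socket


def is_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts alike.
        # A blackhole typically shows up as a timeout because packets are dropped.
        return False
```

Calling `is_reachable("example.com")` before the attack and again while it is running should flip from True to False if the blackhole is doing its job.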

In this tutorial, we will demonstrate how to use Jenkins to create an automated chaos gauntlet. This will be done using Jenkins Pipelines and Stages to inject a controlled amount of failure with Gremlin. We then add a final stage that allows you to optionally halt the attack from the pipeline, rather than having to wait for the full duration of the attack.

Gremlin can be easily integrated into your Jenkins pipelines using the Gremlin API.
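To make the "start an attack, then optionally halt it" flow concrete, here is a rough sketch of the two HTTP requests such a pipeline might send. Everything specific in it is an assumption to verify against the Gremlin API reference before use: the base URL, the `/attacks/new` and `/attacks/{id}` paths, the `Key` authorization scheme, and the payload shape. The helper names are hypothetical. The code only builds the requests and never sends them.

```python
import json
import urllib.request

# Assumed base URL; confirm against the Gremlin API reference.
GREMLIN_API = "https://api.gremlin.com/v1"


def build_cpu_attack_request(api_key: str, host_id: str, seconds: int = 60) -> urllib.request.Request:
    """Build (but do not send) a request that would start a CPU attack."""
    # Assumed payload shape: one core ("-c 1") for `seconds` ("-l"), on one exact host.
    payload = {
        "command": {"type": "cpu", "args": ["-c", "1", "-l", str(seconds)]},
        "target": {"type": "Exact", "hosts": {"ids": [host_id]}},
    }
    return urllib.request.Request(
        f"{GREMLIN_API}/attacks/new",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Key {api_key}",
        },
        method="POST",
    )


def build_halt_request(api_key: str, attack_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that would halt a running attack."""
    return urllib.request.Request(
        f"{GREMLIN_API}/attacks/{attack_id}",
        headers={"Authorization": f"Key {api_key}"},
        method="DELETE",
    )
```

In a pipeline, the first request would belong to the attack stage and the second to the optional final halt stage; passing either object to `urllib.request.urlopen` would actually send it.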

If you’d like to see a video of this in action before you create it yourself, you can view it below:

Prerequisites

Before you begin this tutorial, you’ll need the following:



Overview

Understanding the relative reliability of your services will enable you to prioritize your reliability efforts. Determining your critical path, understanding all services along this path, and then calculating their reliability scores will enable you to identify the top services you need to focus on improving immediately.

Which Services Should I Score First?

I recommend that you first roll this out as a pilot with critical services. If you don’t yet know which of your services are critical, I recommend spending a day in a workshop-style sync mapping out your critical path and all the services along it. …



Gremlin is a simple, safe, and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Cockroach DB is an elastic, indestructible SQL database for developers building modern applications.

This tutorial will teach you how to do Chaos Engineering on Cockroach DB using Gremlin.

This tutorial shows:

  • How to install Cockroach DB
  • How to install Gremlin
  • How to practice Chaos Engineering on Cockroach DB — specific use cases and examples

Chaos Engineering Hypothesis

For the purposes of this tutorial, we will run Chaos Engineering experiments on Cockroach DB. We will focus on a specific set of use cases that we have crafted into Gremlin Scenarios including understanding clock skew constraints and understanding what happens when our primaries and replicas fail. …



This tutorial shares how you can utilize the Gremlin Time Travel attack to change clock time. This attack is cloud-agnostic and will work across AWS, GCP, Azure, DigitalOcean, and more.

Here are a few reasons to use the Time Travel attack:

  • Ensure your systems can effectively handle certificate expiration
  • Prepare for unknown-unknown incidents caused by clock skew
  • Prepare for unexpected downtime
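As an illustration of the certificate expiration case, the sketch below works out how far a clock has to move before a certificate's validity window no longer covers it. It is a simplified model, not how Gremlin implements Time Travel; the function names and the example dates in the usage note are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def cert_valid_at(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """A certificate is valid only while `now` falls inside its validity window."""
    return not_before <= now <= not_after


def first_failing_offset(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    offsets: Iterable[timedelta],
) -> Optional[timedelta]:
    """Return the first clock offset at which the certificate stops being valid, or None."""
    for offset in offsets:
        if not cert_valid_at(not_before, not_after, now + offset):
            return offset
    return None
```

For a certificate valid through 31 December 2024 and a current date of 1 December 2024, trying offsets of 7, 29, 31, and 60 days would report 31 days as the first shift that breaks validation, which gives a starting point for how far to move the clock in an experiment.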

Prerequisites

  • A Gremlin account (sign up here)
  • Your Gremlin daemon credentials
  • A Kubernetes cluster

Time Travel a Kubernetes node using Gremlin

A Kubernetes cluster commonly has one primary and two or more nodes that are replicated from the primary. When the primary dies, the nodes are ready to replace it. …



Gremlin is a simple, safe, and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Amazon Simple Notification Service (SNS) provides fully managed pub/sub messaging, SMS, email, and mobile push notifications. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.

This tutorial shows:

  • How to create a Gremlin Scenario for Amazon SNS
  • How to measure the impact/results of your Gremlin Scenario

Chaos Engineering Hypothesis

For the purposes of this tutorial, we will run a Gremlin Scenario on Amazon SNS. We will focus on network-related Chaos Engineering attacks.

When Amazon Simple Notification Service (SNS) is unreachable from our API servers, we are not able to dispatch events to our event pipeline. As a result, some Tier2/Tier3 services will respond slowly or not at all (e.g. Slack Integration, Mixpanel, Salesforce…


Chaos Engineering involves running thoughtful, planned experiments that teach us how our systems behave in the face of failure.

These experiments follow three steps:


3 steps in a Chaos Engineering experiment

You start by forming a hypothesis about how a system should behave when something goes wrong.

Then, you design the smallest possible experiment to test it in your system.

Finally, you measure the impact of the failure at each step, looking for signs of success or failure. When the experiment is over, you have a better understanding of your system’s real-world behavior.
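The three steps can be sketched as a small harness. This is one possible shape under stated assumptions, not a Gremlin API: the `Experiment` fields, the single numeric metric, and the threshold rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    hypothesis: str                      # step 1: what we expect to happen
    inject_failure: Callable[[], None]   # step 2: the smallest possible failure
    rollback: Callable[[], None]         # undo the failure when done
    measure: Callable[[], float]         # step 3: a metric such as latency in ms
    threshold: float                     # hypothesis holds if the metric stays at or below this


def run_experiment(exp: Experiment) -> dict:
    """Measure a baseline, inject the failure, measure again, and always roll back."""
    baseline = exp.measure()
    exp.inject_failure()
    try:
        during = exp.measure()
    finally:
        # Roll back even if measuring fails, so the failure never outlives the experiment.
        exp.rollback()
    return {
        "hypothesis": exp.hypothesis,
        "baseline": baseline,
        "during": during,
        "held": during <= exp.threshold,
    }
```

The `finally` block reflects the safety concern behind keeping experiments small: whatever happens during measurement, the injected failure is removed before results are reported.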


Over the last 4.5 years, Kubernetes has dramatically improved in terms of usability, and it’s now easier than ever to get started with it. Cloud providers like AWS now have managed Kubernetes products that create and manage your clusters for you. This is a huge change compared to rolling your own Kubernetes cluster.

One of the most interesting shifts in our industry I have seen over the last 2 years is that more and more companies are now running Kubernetes with their Production workloads. This is where things start to get interesting for SREs. …

About

Tammy Bryant (Butow)

Principal Site Reliability Engineer @GremlinInc http://gremlin.com | Chaos Engineering ☁️ 💻 ⚡️💀 Previously @DigitalOcean @Dropbox @NAB @QUT
