Tools for keeping your cloud operating in top form. Chaos Monkey is a resiliency tool that helps applications tolerate random instance failures.

Overview

PROJECT STATUS: RETIRED

The Simian Army project is no longer actively maintained. Some of the Simian Army functionality has been moved to other Netflix projects:

  • A newer version of Chaos Monkey is available as a standalone service.
  • Swabbie is a new standalone service that will replace the functionality provided by Janitor Monkey.
  • Conformity Monkey functionality will be rolled into other Spinnaker backend services.

DESCRIPTION

The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.

DETAILS

Please see the wiki.

SUPPORT

Simian Army Google group

Because the project is no longer maintained, there is a good chance that nobody will be able to answer a support question.

LICENSE

Copyright 2012-2016 Netflix, Inc.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • More chaos types

    Add a bunch more chaos types, using the new script support.

    Also make sure that the Apache license is on all the files I added.

    The chaos monkey army is documented here: https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army

    opened by justinsb 15
  • Added the ability to fine-tune the probability on a per-ASG basis.

    Because not all ASGs are created equal, we needed a way to tune the killing of instances and didn't want to add an entry in chaos.properties for every new ASG. This PR allows you to add a tag to your ASG with the key "chaosMonkey.aggressionCoefficient"; the effective probability for that ASG is multiplied by the tag's value. For example, if my ASG effective probability is 0.5, I can tag my Cassandra ASGs with chaosMonkey.aggressionCoefficient=0.5 so that they have an effective probability of 0.25.

    opened by jeffggardner 12
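
The arithmetic described in this PR can be sketched as follows. This is a minimal illustration, not the project's implementation: the class and method names are hypothetical, only the tag key chaosMonkey.aggressionCoefficient comes from the description, and treating a missing tag as a coefficient of 1.0 and clamping the result to the range 0 to 1 are assumptions.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of per-ASG probability tuning; not SimianArmy's actual code. */
final class AggressionCoefficientSketch {
    // Tag key quoted in the PR description.
    static final String TAG_KEY = "chaosMonkey.aggressionCoefficient";

    /**
     * Multiplies the ASG's effective probability by the coefficient found in its tags.
     * Assumption: a missing tag behaves like a coefficient of 1.0 (no change).
     */
    static double tunedProbability(double effectiveProbability, Map<String, String> asgTags) {
        String raw = asgTags.get(TAG_KEY);
        double coefficient = (raw == null) ? 1.0 : Double.parseDouble(raw);
        double tuned = effectiveProbability * coefficient;
        // Assumption: clamp so the result stays a valid probability.
        return Math.max(0.0, Math.min(1.0, tuned));
    }

    public static void main(String[] args) {
        Map<String, String> cassandraTags = Collections.singletonMap(TAG_KEY, "0.5");
        System.out.println(tunedProbability(0.5, cassandraTags)); // prints 0.25
    }
}
```

With the values from the example, an effective probability of 0.5 and a coefficient of 0.5 give 0.25; an untagged ASG keeps its original probability under the assumption above.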
  • Support for notification via SNS

    It seems odd that the only notification of terminations is via SES, not SNS.

    Suggested feature: for each termination, send a JSON message to some configured SNS topic, including at least the ID of the terminated instance and the ARN of the Auto Scaling group of which it was a member.

    opened by rvedotrc 12
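
One possible shape for the message suggested in this issue is sketched below. Everything here is an assumption made for illustration: the class name, the JSON field names instanceId and autoScalingGroupArn, and the placeholder values. Publishing to an SNS topic is deliberately left out, and the escaping covers only backslashes and double quotes, so values are assumed to contain no control characters.

```java
/** Hypothetical sketch of a termination notification payload; field names are assumptions. */
final class TerminationMessageSketch {
    /** Builds a small JSON document carrying the two fields the issue asks for. */
    static String toJson(String instanceId, String asgArn) {
        return "{\"instanceId\":\"" + escape(instanceId)
                + "\",\"autoScalingGroupArn\":\"" + escape(asgArn) + "\"}";
    }

    /** Escapes backslashes and double quotes so the values stay valid JSON strings. */
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(toJson("i-0123456789abcdef0", "arn:example"));
    }
}
```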
  • Add vSphere crawler only to crawl target application

    vSphere has no concept similar to the AWS ASG, and automatically grouping all VMs by parent folder name spreads chaos across the whole vSphere installation. Treating all VMs under the same absolute folder path as a group is a more precise way to inject faults into target applications. So I added a VsphereChaosCrawler that crawls only the VMs under the target folder.

    opened by DanielXiao 12
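
The difference between grouping by folder name and grouping by absolute folder path can be sketched as follows. This is not the VsphereChaosCrawler from the PR: the class, the Vm record, and the example paths are hypothetical, no vSphere API is involved, and matching the path exactly (rather than including subfolders) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of folder-scoped crawling; not the VsphereChaosCrawler from the PR. */
final class FolderCrawlerSketch {
    /** Minimal stand-in for a VM record: its name and the absolute path of its folder. */
    static final class Vm {
        final String name;
        final String folderPath;

        Vm(String name, String folderPath) {
            this.name = name;
            this.folderPath = folderPath;
        }
    }

    /** Returns the names of the VMs whose folder path equals the target path exactly. */
    static List<String> vmsInFolder(List<Vm> inventory, String targetFolderPath) {
        List<String> names = new ArrayList<>();
        for (Vm vm : inventory) {
            if (vm.folderPath.equals(targetFolderPath)) {
                names.add(vm.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Two folders share the name "app" but live under different absolute paths.
        List<Vm> inventory = Arrays.asList(
                new Vm("web-1", "/dc1/vm/app"),
                new Vm("web-2", "/dc1/vm/app"),
                new Vm("db-1", "/dc2/vm/app"));
        System.out.println(vmsInFolder(inventory, "/dc1/vm/app")); // prints [web-1, web-2]
    }
}
```

Grouping by the folder name "app" alone would have put all three VMs in one group; matching on the absolute path keeps db-1 out of the target set.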
  • Abstraction of email transport and message generation.

    This request contains one commit that was merged in from SimianArmy on Netflix:master (removal of java 1.6 requirement). Tests run (and new tests written), no regressions. Comments welcome!

    The purpose of this PR is to start work on abstraction of "cloud" and "cluster" ideas (all still abstract) from some of the tightly coupled AWS dependencies that exist in those base classes. This particular branch allows a non-aws user to send notifications through a specified SMTP server. Our particular use case is that we are a cloud provider (rather than an app developer who deploys to a cloud provider). SimianArmy has the same intrinsic value for cloud providers (taking out machines to see if the cluster will survive without interrupting the user experience), but we obviously can't route mail through the AWS email service.

    opened by mgeis 10
  • Option for non-SDB MonkeyRecorder

    This change uses a property to load a MonkeyRecorder, so an alternate can be used when SDB is not available/desired. This is critical for non-AWS cloud environments, where there may not be a natural equivalent for SDB.

    opened by huxoll 10
  • Gradle Build Failure

    [DEBUG] [org.gradle.configuration.project.BuildScriptProcessor] Timing: Running the build script took 3.716 secs
    [ERROR] [org.gradle.BuildExceptionReporter]
    [ERROR] [org.gradle.BuildExceptionReporter] FAILURE: Build failed with an exception.
    [ERROR] [org.gradle.BuildExceptionReporter]
    [ERROR] [org.gradle.BuildExceptionReporter] * What went wrong:
    [ERROR] [org.gradle.BuildExceptionReporter] nebula/plugin/netflixossproject/NetflixOssProjectPlugin : Unsupported major.minor version 51.0
    [ERROR] [org.gradle.BuildExceptionReporter]
    [ERROR] [org.gradle.BuildExceptionReporter] * Try:
    [ERROR] [org.gradle.BuildExceptionReporter] Run with --stacktrace option to get the stack trace.
    [LIFECYCLE] [org.gradle.BuildResultLogger]
    [LIFECYCLE] [org.gradle.BuildResultLogger] BUILD FAILED
    [LIFECYCLE] [org.gradle.BuildResultLogger]
    [LIFECYCLE] [org.gradle.BuildResultLogger] Total time: 6.445 secs

    java version "1.6.0_45"
    Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
    Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

    opened by mwhite14 8
  • Email notification changes

    Hi,

    We've made the following changes:

    • CloudFormationChaosMonkey class to enable email notifications when no-suffix group names are used.
    • Modified BasicChaosMonkey and BasicChaosEmailNotifier classes to enable global email notifications, e.g. simianarmy.chaos.notification.global.receiverEmail = [email protected] enables global email notifications. Added a corresponding unit test.
    • Refactored TestChaosMonkeyContext to remove duplicate code
    opened by nicktgr15 8
  • Cloud Formation Auto Generated Names and ASG opt in/out

    Hi

    When using cloud formation, the ASG created has a random string appended, such as:

    foo-my-app-MyAutoScalingGroup-1727U49KMHGUT

    We are currently using the opt in feature with Chaosmonkey to select a subset of our ASGs for Chaosmonkey to target.

    So we put 'simianarmy.chaos.ASG.foo-my-app-MyAutoScalingGroup-1727U49KMHGUT.enabled =1' into the properties file.

    However this can be a little problematic as occasionally we update the stack. This creates a new random string as a suffix to the ASG name and causes the opt-in configuration to fail to match the previously matched ASG.

    Is there a way we can use some kind of pattern matching? If not, could we submit a patch to solve this problem?

    Dip

    opened by dipthegeezer 8
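
One way to approach the matching problem raised in this issue is to strip the generated suffix before building the property key, as sketched below. This is a hypothetical helper, not SimianArmy code: the class and method names are invented, and the regular expression assumes the suffix is a final hyphen followed only by uppercase letters and digits, as in the example name. A group whose real name ends in such a segment (for example "foo-PROD") would be stripped too, so this is a sketch rather than a general solution.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of suffix-insensitive ASG matching; not SimianArmy's actual code. */
final class AsgSuffixSketch {
    // Assumption: the generated suffix is a trailing "-" plus uppercase letters and digits.
    private static final Pattern GENERATED_SUFFIX = Pattern.compile("-[A-Z0-9]+$");

    /** Removes the assumed CloudFormation-generated suffix from an ASG name. */
    static String stripSuffix(String asgName) {
        return GENERATED_SUFFIX.matcher(asgName).replaceFirst("");
    }

    /** Builds an opt-in property key from the suffix-stripped name. */
    static String enabledPropertyKey(String asgName) {
        return "simianarmy.chaos.ASG." + stripSuffix(asgName) + ".enabled";
    }

    public static void main(String[] args) {
        // prints simianarmy.chaos.ASG.foo-my-app-MyAutoScalingGroup.enabled
        System.out.println(enabledPropertyKey("foo-my-app-MyAutoScalingGroup-1727U49KMHGUT"));
    }
}
```

Because the key no longer contains the random suffix, a stack update that regenerates the suffix would still resolve to the same property under these assumptions.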
  • Added VSphere support

    Hi Cory

    I'm finally done with adapting ChaosMonkey to run against VSphere.

    • The configuration file for this is client.properties.
    • The basic strategy for termination is to set a custom property in the VM and then reset it. This triggers a kickstart on reboot by our infrastructure code governing the VSphere installation. I'm pretty sure others will have their own strategies here in the future.
    • I refactored the code so I could introduce the infrastructure needed to make the client configurable in order to introduce the VSphereClient. I also tried to make the change to the overall structure as small as possible.
    • The persistence is still Amazon SimpleDB. I may add MySQL support later, since this is more how we roll here rather than using public clouds.
    • I couldn't get my Windows Eclipse with Jalopy to cooperate on the NewlineAtEndOfFile rule, so I commented that out in codequality/checkstyle.xml. It's probably best if you just ignore this change.
    • I enabled the Travis CI build since it's free and a three-liner.
    • I added an "EventReport" to the context. I use that to log a final score after a monkey run on which VMs were terminated. This is pure convenience on my part since I find this easier to read in the logs.
    • The test coverage on class VSphereServiceConnection is pretty low. This is the class that encapsulates the connection to vSphere. The VSphere API is pretty hard to unit test, so I went as far as I deem feasible.

    If you have any questions regarding my changes, do not hesitate to get back to me.

    Cheers Ingmar

    opened by IngmarKrusch 8
  • Don't take false MonkeyTime property.

    Before, the simianarmy.calendar.isMonkeyTime could have three states: true, false, or unset. If unset, the property is ignored; if true, it's always monkey time; if false, it's never monkey time.

    Arguably, given the documentation, this is the correct behavior.

    opened by csm 7
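
The three states described in this PR can be sketched as follows. The "before" method follows the description directly; the "after" method is only an interpretation of the PR title (a false value no longer forces "never monkey time"), which also matches the wording of the v2.5.0 release note "BasicCalendar: isMonkeyTime=false should execute normally". The class and method names are hypothetical and this is not the project's BasicCalendar code.

```java
/** Hypothetical sketch of the isMonkeyTime override logic; not SimianArmy's BasicCalendar. */
final class MonkeyTimeSketch {
    /**
     * Behavior described as "before".
     * override is the value of simianarmy.calendar.isMonkeyTime, or null when unset.
     */
    static boolean before(Boolean override, boolean calendarResult) {
        if (override == null) {
            return calendarResult; // unset: property ignored, normal calendar check applies
        }
        return override;           // true: always monkey time; false: never monkey time
    }

    /** Assumed behavior after the change: only an explicit true overrides the calendar. */
    static boolean after(Boolean override, boolean calendarResult) {
        return Boolean.TRUE.equals(override) || calendarResult;
    }

    public static void main(String[] args) {
        System.out.println(before(false, true)); // prints false
        System.out.println(after(false, true));  // prints true
    }
}
```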
  • Simian Army not deleting the instance : EC2 is in VPC and is ignored.

    I am following the documentation to run simian army

    Following the steps in https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide#setup-simpledb-table and unleashing the monkey, instead of getting

    Terminated i-8b55fbb8 from group monkey-target,

    I am getting

    2020-06-01 14:43:15.581 - INFO InstanceInSecurityGroup - [InstanceInSecurityGroup.java:153] Instance i-0fba7806cbf15b213 is in VPC and is ignored.

    opened by arjunsreepad 0
  • CVEs in the dependencies are in the execution path of your project

    Hello, your project uses some dependencies with CVEs. I found that the buggy methods of the CVEs are in the program execution path of your project, which puts your project at risk. I have suggested some version updates. See below for more details:

    • Vulnerable Dependency: org.apache.httpcomponents : httpclient : 4.3

    • Call Chain to Buggy Methods:

      • Some files in your project call the library method org.apache.http.impl.client.HttpClientBuilder.build(), which can reach the buggy method of CVE-2013-4366.

        • Files in your project: src/main/java/com/netflix/simianarmy/client/MonkeyRestClient.java
        • One possible call chain:
        org.apache.http.impl.client.HttpClientBuilder.build() [buggy method]
        
    • Update suggestion: version 4.5.11. Version 4.5.11 is a safe version without CVEs. From 4.3 to 4.5.11, 2 of the APIs (called 2 times in your project) were modified.

    opened by CleWang 0
  • Can chaos monkey terminate instance not in default namespace?

    # kubectl get pods -n chaos
    NAME                                 READY   STATUS    RESTARTS   AGE
    testforcw-test-test1530-v000-m8nhm   1/1     Running   0          122m
    
    # chaosmonkey eligible testforcw chaos-account --cluster=testforcw-test-test1530 --region=chaos
    testforcw-test-test1530-v000-m8nhm
    
    # chaosmonkey terminate testforcw chaos-account --cluster=testforcw-test-test1530 --region=chaos
    [ 8222] 2019/06/13 17:39:49 Picked: {testforcw chaos-account chaos test testforcw-test-test1530 testforcw-test-test1530-v000 testforcw-test-test1530-v000-m8nhm kubernetes}
    

    But I still get the error "Failed to delete pod testforcw-test-test1530-v000-m8nhm in default", and the pod was neither terminated nor restarted.

    Does anyone have any idea about this problem?

    opened by Cwwwwww 0
  • .travis.yml: The 'sudo' tag is now deprecated in Travis CI

    opened by cclauss 0
  • Chaos Monkey support for spinnaker running as microservices

    Hi all, I am trying to use Chaos Monkey with Spinnaker running on a Kubernetes cluster. Spinnaker runs its services separately in containers.

    I am trying to enable the Chaos Monkey support for Spinnaker described at the link below: https://netflix.github.io/chaosmonkey/How-to-deploy/

    But the settings.js file is not found in /var/www.

    I went inside the deck container and changed the settings.js file in /opt/deck/html (location inside the container). But I still do not see the Chaos Monkey enable feature in the Spinnaker UI.

    I am using spinnaker version 1.7.6

    Can anyone help me out?

    opened by Manish-Savanur-ML 5
Releases(v2.5.3)
  • v2.5.2(Sep 20, 2016)

    Various feature adds and bug fixes since 2.5.1

    • #266 Add possibility to load custom calendar, add bavarian holidays
    • #268 Add option to use RDS for resource tracking
    • #269 Add support for exclusion rules to JanitorRuleEngine
    • #272 Changes for Netflix environment compatibility

    Source code(tar.gz)
    Source code(zip)
  • v2.5.1(Jul 8, 2016)

    Various feature adds and bug fixes since 2.5.0

    • #241 Set jdk version to 8
    • #242 Fix javadoc errors
    • #243 Fix javadoc errors
    • #244 Replace the deprecated Eureka's DiscoveryManager
    • #245 Switch builds to Travis
    • #247 Fix dependency
    • #248 Fix dependency
    • #249 Fix dependency
    • #251 Email validation update
    • #253 Add event ID to list of chaos events
    • #254 Use correct strategy name in default properties
    • #255 Stop Conformity Monkey on destroy
    • #256 Adding feature to specify resource types untaggedResource Rule
    • #257 Make Conformity Monkey Notify Based on owner tag value
    • #261 Updating Resources to have application/json content-type

    Source code(tar.gz)
    Source code(zip)
  • v2.5.0(Feb 24, 2016)

    Various feature adds and bug fixes since 2.4.

    • #122 Commenting out part of test that verifies DST cutover
    • #128 We don’t want JM to crash over an empty attachment
    • #129 PZ add org.jclouds.api:ec2:1.6.0 dependency to fix chaos type issue
    • #133 Upgrade gradle to 1.12
    • #134 Fixed cobertura reports
    • #135 MaxTerminationsPerDay was checked only once
    • #136 The original condition should be returned in addition to the new one
    • #138 Add cross-zone load balancing conformity rule
    • #140 Record the suffix-stripped version of termination events by the CloudFormationChaosMonkey
    • #145 Fix httpclient config
    • #147 Dynamic versions in the build.gradle file replaced by specific versions
    • #148 Switch gradle download to https
    • #149 Only log properties that are safe to log
    • #151 Add optional proxy configuration to client.properties
    • #170 Netflixossbuild
    • #174 Add accountname to emails from JanitorMonkey
    • #183 Updated pr170
    • #185 OpsWorks-aware Janitor monkey
    • #187 Update jclouds to 1.9.0 to use ssh-agent feature
    • #191 Configurable Global Monkey owner tag key
    • #193 Fix test - use calendar with correct timezones
    • #195 New rule: UntaggedRule
    • #198 Upgraded AWS SDK to 1.10.5.1
    • #201 Email regex fix
    • #202 Override SES client region with an optional property and a SimpleDB fix
    • #203 these files were missing copyright headers
    • #204 remove duplicate copyright notices
    • #205 Fix warnings, add SimpleDB max retry
    • #208 Set gradle version to 2.2.1 for compatibility with Nebula NetflixOSS
    • #210 publish jars also
    • #211 Fix NPE with Edda ASG Janitor crawler
    • #212 replace hardcoded "owner" with property
    • #213 add GET route v1/api/janitor for ELB healthcheck
    • #215 Add Servo dependency. Add JMX/Servo metrics monitoring
    • #216 Janitor getters for metrics should be public
    • #217 since will number -> since should be number
    • #218 Change janitor metrics from counter to gauge
    • #220 EddaInstanceJanitorCrawler: breakup edda queries for image ids
    • #222 Janitor Monkey recorder changes
    • #223 Add a URL target to add events through HTTP GET; more Calendar logging
    • #224 BasicCalendar: isMonkeyTime=false should execute normally
    • #225 Fix parenthesis in wrong place
    • #226 Render a simple HTML response for opting in/out of Janitor resources
    • #227 NoGeneratedAMIRule: Add a property to override owner email
    • #228 Added the ability to fine-tune the probability on a per-ASG basis
    • #230 Add a prepareToRun() method to Janitor Monkey
    • #231 Adds proxy support to SES and some tests
    • #232 Fixing error in building of the termination reason string
    • #233 Adding AWS region detection
    • #235 Update tests for AWS 17 characters resources

    Source code(tar.gz)
    Source code(zip)
Owner
Netflix, Inc.
Netflix Open Source Platform