The metric correlation component of Etsy's Kale system

Oculus is an Archived Project

Oculus is no longer actively maintained. Your mileage with patches may vary.

Oculus

Oculus is the anomaly correlation component of Etsy's Kale system.

It lets you search for metrics, using your choice of comparison algorithms:

search algorithms

and shows you other metrics which are similar:

search results

You can even save interesting metrics into a collection, complete with your own notes:

save collection

And then if Oculus finds matches in your saved collections, it'll show you this alongside your other search results:

collection results

Installation Overview

Oculus consists of the following components:

  • Sinatra Web App - the Oculus front end, used to search for metric correlations
  • Skyline Import Script and Cronjob - Used to import data from Skyline into Elasticsearch
  • ElasticSearch Cluster - The search backend for Oculus
  • Worker Box(es) - Used to process metrics from Skyline and import them into Elasticsearch
  • Working Skyline install - Oculus is fed using data from Skyline's data store. Without Skyline, it won't have any data to look at.

Where recommended server specs have been mentioned in this document, they're based on Etsy's usage of Oculus which is based around 250k metrics. Adjust as necessary for your own metric volume.

It's recommended that you work through the sections in this README in the following order:

  • ElasticSearch
  • Workers
  • Oculus Config File
  • Skyline Import Script and Cronjob
  • WebApp

Following the instructions in this order should result in a working Oculus setup with all the moving parts running and functioning correctly. Please note that these instructions are geared towards installing Oculus components on separate boxes. If you're installing everything on one box, then you can ignore replicated steps such as cloning the Oculus code - this only needs to be done once.

Also provided, but not necessary to get Oculus up and running are some utility scripts, in the section headed

  • Misc Utility Scripts

## ElasticSearch

For Oculus to function properly, you will ideally need at least two ElasticSearch servers in separate clusters, which Oculus will rotate through. Oculus requires the addition of custom scoring plugins, but otherwise uses ElasticSearch out of the box.

### Recommended Server Spec

  • At least 8GB RAM
  • Quad Core Xeon 5620 CPU or comparable
  • 1GB disk space

### Installation and Plugin Build (Applies to all cluster nodes)

  • Install the Java JDK (on CentOS, this is yum install jdk)
  • Download and extract elasticsearch from http://www.elasticsearch.org/download/ - here we'll assume /opt/elasticsearch
    • Oculus has been tested with version 0.20.5 - it will currently not build on version 0.90 and above
  • Clone the Oculus repository to somewhere on your server - here we'll assume /opt/oculus
  • Run the command mkdir elasticsearch-oculus-plugin
  • Copy /opt/oculus/resources/elasticsearch-oculus-plugin to /opt/elasticsearch/elasticsearch-oculus-plugin
  • cd to /opt/elasticsearch/elasticsearch-oculus-plugin
  • Run the following command:
    rake build
    
  • If successful, this will create a file called OculusPlugins.jar
  • Copy OculusPlugins.jar to /opt/elasticsearch/lib/OculusPlugins.jar

### Configuration

As noted above, the Elasticsearch servers which Oculus uses cannot be in the same cluster - Oculus needs at least two separate servers (and clusters) to rotate between. To that end, the ElasticSearch configuration file needs to have the cluster name and node name manually specified. For simplicity, these instructions assume that you're going to name each cluster and node after the hostname of the server, but any name can be used.

The scoring plugins which Oculus uses for searching must also be added to the configuration file.

Change the following lines in /opt/elasticsearch/config/elasticsearch.yml, changing the cluster and nodename to match the servers you're using:

cluster.name: oculussearch01.mydomain.com
node.name: oculussearch01.mydomain.com

Add the following lines to /opt/elasticsearch/config/elasticsearch.yml to register the Oculus scoring plugins:

script.native:
  oculus_euclidian.type: com.etsy.oculus.tsscorers.EuclidianScriptFactory
  oculus_dtw.type: com.etsy.oculus.tsscorers.DTWScriptFactory
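
The two scorer names registered above refer to Euclidean distance and dynamic time warping (DTW). As a rough illustration of how the two measures differ, here is a minimal Ruby sketch. It is not the code inside OculusPlugins.jar: the function names are hypothetical, and treating `radius` as the width of the warping window is an assumption made for this example.

```ruby
# Illustrative only: hypothetical helpers, not the Oculus plugin code.

# Euclidean distance compares two equal-length series point by point.
def euclidean_distance(a, b)
  raise ArgumentError, 'series must have equal length' unless a.length == b.length
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# DTW lets points in one series align with nearby points in the other.
# ASSUMPTION: radius limits how far apart aligned indexes may be.
def dtw_distance(a, b, radius)
  n = a.length
  m = b.length
  window = [radius, (n - m).abs].max
  cost = Array.new(n + 1) { Array.new(m + 1, Float::INFINITY) }
  cost[0][0] = 0.0
  (1..n).each do |i|
    lo = [1, i - window].max
    hi = [m, i + window].min
    (lo..hi).each do |j|
      step = (a[i - 1] - b[j - 1]).abs
      # Cheapest way to reach (i, j): from above, from the left, or diagonally.
      cost[i][j] = step + [cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]].min
    end
  end
  cost[n][m]
end
```

With the series `[1, 2, 3, 3]` and `[1, 2, 2, 3]`, `euclidean_distance` gives 1.0 while `dtw_distance` with a radius of 1 gives 0.0, because DTW tolerates the second series lagging by one point.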

### Start Up ElasticSearch

Once you've installed ElasticSearch, built the Oculus scoring plugins and edited the configuration file, you can start it up by running the following command from /opt/elasticsearch:

bin/elasticsearch

You can verify that Elasticsearch is up and running by visiting http://localhost:9200 on the box you installed ElasticSearch on. If all is well, you should see JSON similar to the below:

{
  "ok" : true,
  "status" : 200,
  "name" : "oculussearch01",
  "version" : {
    "number" : "0.20.5",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}
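
If you'd rather script this check than eyeball it, something like the following Ruby sketch could work. The helper names `check_es_root` and `fetch_es_root` are made up for this example, and the field names are taken from the sample response above, so treat it as a starting point rather than a definitive check.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: inspects the JSON body returned by the ElasticSearch
# root endpoint and returns [ok, message].
def check_es_root(body)
  info = JSON.parse(body)
  unless info['status'] == 200
    return [false, "unexpected status #{info['status'].inspect}"]
  end
  version = info.dig('version', 'number')
  [true, "node #{info['name']} is up, version #{version}"]
rescue JSON::ParserError => e
  [false, "response was not JSON: #{e.message}"]
end

# Hypothetical helper: fetches the root endpoint. The default URL assumes
# you are running this on the ElasticSearch box itself.
def fetch_es_root(url = 'http://localhost:9200')
  Net::HTTP.get(URI(url))
end
```

On the ElasticSearch box, `puts check_es_root(fetch_es_root).last` would then print a one-line summary.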

## Workers

Oculus' import and backend functions are handled by a cluster of Worker boxes running Resque (https://github.com/resque/resque). You'll need a Worker Master, running Redis (the central data store used by Resque) and some resque workers, and optionally some more Worker Slave boxes running extra Resque workers.

### Recommended Server Spec

  • At least 12GB RAM
  • Quad Core Xeon 5620 CPU or comparable
  • 1GB disk space

### Worker Master Box

The first worker box you'll need to get up and running is the Master box running Redis.

Here's how to get it up and running:

  • Install redis (on CentOS, yum install redis)
  • Start up Redis (on CentOS, service redis start)
  • Verify you can connect to Redis by running redis-cli
    • If everything is working, you should see this prompt: redis 127.0.0.1:6379>
  • Type exit to return to the shell.
  • Clone the Oculus repository to somewhere on your server - here we'll assume /opt/oculus
  • cd into the directory where you cloned Oculus
  • run the command bundle install
  • Edit the file Rakefile, changing the URL on line 66 to match the URL to your redis instance
  • Run the command mkdir /var/run/oculus and make sure the user you will run your workers as can read and write to it.
  • Run the command mkdir /var/log/oculus and make sure the user you will run your workers as can read and write to it.
  • To start the workers, from the directory where you installed Oculus run rake resque:start_workers
  • Optional Steps
    • You can start up a Resque web console to monitor what's going on by running resque-web from the binary directory of the resque Gem (for example /usr/lib64/ruby/gems/1.9.1/gems/resque-1.23.0/bin/resque-web)
    • If you want to edit the number of Resque workers run on the master box, edit lines 21 and 22 of the Rakefile (changing the number '22' to whatever you like), then run rake resque:restart_workers
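
Before starting the workers, it can be handy to confirm that the run and log directories from the steps above are usable by the worker user. A minimal Ruby sketch, assuming you run it as that user (`unusable_dirs` is a hypothetical helper, not part of Oculus):

```ruby
# Hypothetical helper: returns the directories that are missing, or that
# the current user cannot both read and write.
def unusable_dirs(dirs)
  dirs.reject do |dir|
    File.directory?(dir) && File.readable?(dir) && File.writable?(dir)
  end
end
```

For example, `unusable_dirs(%w[/var/run/oculus /var/log/oculus])` returns an empty array when both directories are ready.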

### Worker Slave Box(es) - Optional

Once you've got your Worker Master set up and running Redis, you can also optionally set up any number of Worker Slave boxes (which just run Resque Workers) you want - we run 3 at Etsy.

  • Clone the Oculus repository to somewhere on your server - here we'll assume /opt/oculus
  • cd into the directory where you cloned Oculus
  • run the command bundle install
  • Edit the file Rakefile, changing the URL on line 66 to match the URL to your redis instance on the Worker Master
  • Run the command mkdir /var/run/oculus and make sure the user you will run your workers as can read and write to it.
  • Run the command mkdir /var/log/oculus and make sure the user you will run your workers as can read and write to it.
  • To start the workers, from the directory where you installed Oculus, run rake resque:start_workers
  • Optional Steps
    • If you want to edit the number of Resque workers run on this box, edit lines 21 and 22 of the Rakefile (changing the number '22' to whatever you like), then run rake resque:restart_workers

## Oculus Config File

Now that you've got your Elasticsearch servers all prepped and your Workers happily waiting for work, the next step before we can start chucking data at Oculus is to set up the Oculus config file - config/config.yml. Here's a sample config file:

results_explain: 0
elasticsearch:
  servers:
     - "http://oculussearch01.mycompany.com:9200"
     - "http://oculussearch02.mycompany.com:9200"
  index: "metrics"
  timeout: 30
  phrase_slop: 20
  scorers:
    dtw:
      radius: 5
      scale_points: 25
    euclidian:
      scale_points: 25
skyline:
  host: "skyline.mycompany.com"
  port: 6379
  listener_port: 2015
redis:
  host: "oculusredis.mycompany.com"
  port: 6379

At this stage, all you need to do is the following:

  • Add your elasticsearch servers (one per line) to the elasticsearch->servers section
  • Add the hostname and port of your Worker Master's redis instance to the redis section
  • Add your Skyline host, port and listener port to the skyline section.

You can ignore the other settings under elasticsearch for now - have a look at the "Help" page in Oculus once you've gotten it up and running if you want to know more about these. For now, the defaults will suffice!

You should make sure that your config file is pushed out to all of your Oculus servers.
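
As a quick sanity check before pushing the file out, a sketch along these lines can flag the settings mentioned above when they are missing. `config_problems` is a hypothetical helper written for this README; it only looks at the keys shown in the sample config and does not reflect how Oculus itself reads the file.

```ruby
require 'yaml'

# Hypothetical helper: returns a list of human-readable problems found in
# the text of an Oculus config.yml. An empty list means nothing was flagged.
def config_problems(yaml_text)
  config = YAML.safe_load(yaml_text)
  config = {} unless config.is_a?(Hash)
  problems = []

  # Oculus rotates between servers, so expect at least two entries.
  servers = config.dig('elasticsearch', 'servers')
  unless servers.is_a?(Array) && servers.length >= 2
    problems << 'elasticsearch.servers should list at least two servers'
  end

  required = { 'redis' => %w[host port], 'skyline' => %w[host port listener_port] }
  required.each do |section, keys|
    values = config[section].is_a?(Hash) ? config[section] : {}
    keys.each do |key|
      problems << "#{section}.#{key} is missing" if values[key].nil?
    end
  end
  problems
end
```

From the Oculus checkout, `puts config_problems(File.read('config/config.yml'))` prints one line per problem and nothing if the file looks complete.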

## Skyline Import Script and Cronjob

Now that you've got your search servers all ready to populate, you'll need to set up the script that imports data from Skyline into your search indexes. This can run on any of the servers you're using to run Oculus.

  • Double check that you've configured your Oculus config file correctly as specified in the above section.
  • cd into the scripts directory under the directory where you cloned Oculus.
  • Run ./import.rb
  • If all is working correctly, you should see output similar to this:
 Active ES Server: http://oculussearch01.mycompany.com:9200
 Next ES Server: http://oculussearch02.mycompany.com:9200
 Recreating indexes
 Creating redis jobs...
 Getting unique metric names
 Found 250578 metric names
 187 workers working
 311 process_redis_metrics jobs left to run
 187 workers working
 309 process_redis_metrics jobs left to run
 187 workers working
 148 process_redis_metrics jobs left to run
 187 workers working
 113 process_redis_metrics jobs left to run
 187 workers working
 110 process_redis_metrics jobs left to run
 153 workers working
 0 process_redis_metrics jobs left to run
 Setting active search server to http://oculussearch02.mycompany.com:9200
 Oculus import finished in 36.634672763 seconds
  • Next, create a directory for the cronjob to log into (here we've used /var/log/oculus)
  • Create a cronjob as follows, changing the full path to where you cloned the Oculus code and the cron frequency as applicable:
*/2 * * * * /opt/oculus/scripts/import.rb > /var/log/oculus/import.log 2>&1
  • Your Oculus search index will now be updated from Skyline every 2 minutes.
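
The sample output above hints at how the rotation works: each run fills the "next" server while searches keep using the "active" one, and the active pointer moves once the import finishes. A minimal sketch of that idea in Ruby, with a hypothetical `next_server` helper (this is an illustration, not the code in import.rb):

```ruby
# Hypothetical helper: given the configured servers and the currently
# active one, returns the server the next import should populate.
def next_server(servers, active)
  raise ArgumentError, 'need at least two servers to rotate' if servers.length < 2
  index = servers.index(active)
  # If the active server is unknown, start from the first configured one.
  return servers.first if index.nil?
  servers[(index + 1) % servers.length]
end

servers = [
  'http://oculussearch01.mycompany.com:9200',
  'http://oculussearch02.mycompany.com:9200'
]
active = servers.first
target = next_server(servers, active)
# ... an import would recreate and populate the index on target here ...
active = target # searches only switch over once the import has finished
```

With two servers this simply alternates between them on every run, which matches the "Active ES Server" and "Next ES Server" lines in the output.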

## Web App Install

Now that we've got all of the moving parts set up and started updating our search indexes with metric data, the final step is to get the Oculus front end web app set up.

  • Clone the Oculus repository to somewhere on your server - here we'll assume /opt/oculus
  • cd into the directory where you cloned Oculus
  • run the command bundle install
  • Make sure you've got your correctly configured configuration file in the config directory under your Oculus checkout
  • Run the command thin start
  • If everything's working correctly, you should see output similar to the following:
>> Using rack adapter
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on 0.0.0.0:3000, CTRL+C to stop
  • The final step is to log onto the Oculus admin console (http://yourserver:3000/admin, username admin and password admin), then click "Reinitialize Collections." Click OK to confirm.
  • You're all done! Oculus should now be running at http://yourserver:3000 - you can now start searching for metrics!!