Oleksandr Sviridenko edited this page Aug 16, 2013 · 17 revisions

Objective

Groningen is an autonomic computing system designed to optimize JVM settings, such as those controlling garbage collection, through iterative experimentation guided by a generational genetic algorithm. By iteratively modifying and monitoring Java programs, it can tune a service without human intervention to find near-optimal JVM garbage collector settings.

Results are published to a console that a service owner can use to determine the best settings to feed into their release process as updates to the current production defaults. In this way, users can re-tune their service on each release as the aspects influencing Java GC performance change: traffic patterns and volume, differences in machine performance, and changes in the service's code base.

The general strategy is to experiment iteratively on the running Java service, mutating a bank of canary tasks within a production job to evolve optimized configurations autonomically via a generational genetic algorithm. Groningen iterates through subsequent generations of experiments until near-optimal settings are found. The best settings are displayed to the user as experiments complete.

Background

Java garbage collection (GC) is time consuming to fine-tune. Setting up the JVM properly is challenging and requires iteration: create a JVM parameter set, monitor the application under load, review the GC statistics, and repeat until it "all performs well enough".

The most important aspect of tuning JVM GC settings is the amount of time the application is paused. When the application is paused, it cannot perform useful work. Ideally the pause time would be zero, but in any real application it never is. Therefore, we want to tune the GC to minimize pause time, which minimizes latency and maximizes throughput.

Researchers have produced a wealth of guidelines, techniques, and tools for JVM GC tuning. Cloud application providers have tailored these to their massive scale and low-latency requirements. At this point in time, tuning a Java-based cloud service is an "art" that takes time and patience from application developers as they iterate through the process. There is no single solution or process for tuning every type of service in every environment; it takes trial and error guided by a good understanding of how the service works and behaves.

Tools exist to assist developers in gathering the statistics required to properly tune Java GC. Such tools often have drawbacks: they require applications to run with settings that are not enabled by default, and in this configuration the applications, along with the supporting infrastructure, are more heavily loaded.

The Java virtual machine can be configured to dump garbage collection information into logs written to STDOUT or a file. The GC logs are by far the most detailed source of information on how the GC performed within a real-world Java application.
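For example, a JDK 6/7-era HotSpot JVM (the generation this page targets) can be asked to write detailed GC logs with flags along these lines; the log path shown is illustrative:

```
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log
```

Note that these flags are not enabled by default, which is exactly the drawback mentioned above.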

We note that an application's code base, JDK, and supporting libraries change over time. Workload, performance profile, and even the relative performance of various infrastructure components change as well. Because of these changes, people must periodically re-tune their Java garbage collection settings or suffer the consequences of a poorly running service and/or wasted resources.

IBM popularized the concept of autonomic computing, whose central premise is "creating a new capacity where important computing operations can run without the need for human intervention". IBM correctly argues that as computing scale grows much faster than human scale, it follows that computers will have to look after themselves - or else.

Evolutionary algorithms (EAs) are inspired by the biological model of evolution and natural selection first proposed by Charles Darwin in 1859. In the natural world, evolution helps species adapt to their environments. Environmental factors that influence the survival prospects of an organism include climate, availability of food, and the dangers of predators.

Evolutionary algorithms are based on a simplified model of this biological evolution. To solve a particular problem, we create an environment in which potential solutions can evolve. The environment is shaped by the parameters of the problem and encourages the evolution of good solutions. The most common type of evolutionary algorithm is the generational genetic algorithm.
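To make the generational model concrete, here is a minimal self-contained sketch of a generational genetic algorithm. It is a toy, not Groningen's code: candidates are integer arrays, and the fitness function is simply the sum of the genes, standing in for a real GC score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy generational genetic algorithm: score, select, cross over, mutate,
// then replace the whole population with the new generation.
public class ToyGeneticAlgorithm {
    static final Random RNG = new Random(42);

    static int fitness(int[] genes) {
        int sum = 0;
        for (int g : genes) sum += g;
        return sum;
    }

    // Tournament selection: pick the fitter of two random candidates.
    static int[] select(List<int[]> pop) {
        int[] a = pop.get(RNG.nextInt(pop.size()));
        int[] b = pop.get(RNG.nextInt(pop.size()));
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Single-point crossover of two parents.
    static int[] crossover(int[] p1, int[] p2) {
        int cut = RNG.nextInt(p1.length);
        int[] child = new int[p1.length];
        for (int i = 0; i < p1.length; i++) child[i] = i < cut ? p1[i] : p2[i];
        return child;
    }

    // Each gene is replaced with a random value with a small probability.
    static void mutate(int[] genes, double rate) {
        for (int i = 0; i < genes.length; i++)
            if (RNG.nextDouble() < rate) genes[i] = RNG.nextInt(100);
    }

    public static int[] evolve(int popSize, int geneCount, int generations) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) {
            int[] genes = new int[geneCount];
            for (int j = 0; j < geneCount; j++) genes[j] = RNG.nextInt(100);
            pop.add(genes);
        }
        for (int gen = 0; gen < generations; gen++) {
            List<int[]> next = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                int[] child = crossover(select(pop), select(pop));
                mutate(child, 0.05);
                next.add(child);
            }
            pop = next; // generational replacement
        }
        int[] best = pop.get(0);
        for (int[] c : pop) if (fitness(c) > fitness(best)) best = c;
        return best;
    }
}
```

In Groningen the fitness is computed from real GC log data and the genes encode JVM settings, but the generational loop has the same shape.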

The Watchmaker Framework is an extensible, high-performance, object-oriented framework for implementing platform-independent evolutionary/genetic algorithms in Java. The framework provides type-safe evolution for arbitrary types via a non-invasive API. The Watchmaker Framework is Open Source software, free to download and use subject to the terms of the Apache Software Licence, Version 2.0.

Overview

Groningen is a Java-based application designed to tune the JVM garbage collector autonomically. The intention is to remove the time-consuming human component of iteratively "exploring" the best JVM settings, so that tuning is performed more evenly across all services and more frequently, keeping up with dynamic performance profiles throughout the cloud. You give Groningen a search space that defines all possible JVM settings to experiment with, and it will automatically search that space to find the best-performing sets. The tool is intended to tune applications on an ongoing basis. It does not roam around finding and fixing problems by itself; rather, it is designed to be "pointed at" an application and left to its own devices. Groningen will work on the problem without human interaction and eventually inform its human operator, via the groningenz console, of near-optimal settings that are likely to improve application performance. In this way, it is a labor-saving device.

The technique is intrusive in that we need to restart application instances and change their JVM settings in production many times before arriving at well-performing experimental tasks. At the same time, we require real system stimuli to properly identify optimal settings. For these reasons, the set of instances under test must be either a subset of the live application (for example, a canary) or a duplicate set of instances processing replicated traffic on unused infrastructure resources, such as a QA environment replaying representative traffic.

The following diagram lays out the major components of our design:

Overview Diagram

When Groningen starts, it creates an initial experimental population and runs each experiment in production as mutated canary tasks under real load to produce real-world data. GC logs are scraped by Groningen at the end of a process's life cycle using cloud services and stored in memory in the Experiment Database. In practice, the Extractor is a thread pool that processes logs as the Executor detects tasks have restarted during the experiment. The Executor restarts tasks at the start and end of the experiment to set and clear the experimental JVM settings.

Next, a validation stage is performed by the Validator object to check that the experimental data collected qualifies as a successful experimental run and can be trusted. After validation, the Hypothesizer object uses GC data from the Experiment Database to evaluate the fitness function that scores each experiment. The Hypothesizer then evolves the population, favoring the experiments with the highest scores.

In this way, a new generation of experiments is created. At this point Groningen may decide it has found a near-optimal configuration (called stagnation) and exit. Otherwise, it continues the process by using the Generator object to create the protobufs containing experimental JVM settings for each task in the experiment, and then the Executor to restart the experimental tasks.

Infrastructure

RPC Interface

The RPC interface is under development. It provides programmatic access to control Groningen from another application for the purpose of embedding Groningen deeply into cloud operating system infrastructure. This enables fully automatic continuous tuning of Java cloud applications in production.

Command Line Arguments

Groningen currently takes most configuration parameters through the configuration subsystem described below. The command line directs the configuration system where to find its input and what format that input is in.

Required

Configuration system locator: a single argument - a URL-like string that describes where the configuration system can find its configuration information. This argument has a tiered scheme portion that depends on the backend. The URL takes one of the following forms:

  • for text formatted protobuf files: proto:txtfile:[refresh=<refresh polling interval>:]/fileutil/readable/path/to/textformat/protobuf/file
  • for binary protobuf files: proto:binfile:[refresh=<refresh polling interval>:]/fileutil/readable/path/to/binary/protobuf/file
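A hypothetical sketch of parsing this locator format follows; the class name and field layout are illustrative, not Groningen's actual code.

```java
// Parses locators of the form proto:<backend>:[refresh=<seconds>:]<path>.
// Names here are illustrative assumptions, not the real Groningen API.
public class ConfigLocator {
    public final String scheme;          // always "proto" for current backends
    public final String backend;         // "txtfile" or "binfile"
    public final Integer refreshSeconds; // null when no refresh= segment given
    public final String path;

    public ConfigLocator(String scheme, String backend, Integer refresh, String path) {
        this.scheme = scheme;
        this.backend = backend;
        this.refreshSeconds = refresh;
        this.path = path;
    }

    public static ConfigLocator parse(String locator) {
        String[] parts = locator.split(":", 4);
        if (parts.length < 3 || !parts[0].equals("proto"))
            throw new IllegalArgumentException("unsupported locator: " + locator);
        String backend = parts[1];
        if (parts[2].startsWith("refresh=")) {
            if (parts.length != 4)
                throw new IllegalArgumentException("missing path: " + locator);
            int refresh = Integer.parseInt(parts[2].substring("refresh=".length()));
            return new ConfigLocator(parts[0], backend, refresh, parts[3]);
        }
        // No optional refresh segment: everything after "proto:<backend>:" is the path.
        String path = locator.substring(parts[0].length() + backend.length() + 2);
        return new ConfigLocator(parts[0], backend, null, path);
    }
}
```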

Configuration System

Overview

The configuration system is used as a central configuration store reducing the use of flags as much as possible in order to give users a single point of configuration.

Groningen's configuration subsystem allows for multiple implementations, which are instantiated and initialized via a factory class and presented to the user through the GroningenConfig interface. The object initialized is a manager for the requested configuration system type; it in turn instantiates and initializes any subsystems it might require and serves as the source of GroningenConfig objects. The configuration manager can then be polled for objects implementing the GroningenConfig interface. The GroningenConfig interface provides access to a protocol buffer of user-defined parameters that tune the different stages of the pipeline, search-space restrictions for the JVM garbage collector related arguments, and the datacenter and job layout for the tasks to which the permuted arguments are applied. The configuration system also allows the user to specify sets of JVM arguments to include in the initial population of the experiment.

Each cell configuration is independent, so jobs with different names, running as different users, and/or accessing different paths within the task's JVM parameters can be grouped into the same experiment. Multiple jobs can be utilized in the same cell. The cell, job, task hierarchy is captured in the structure of GroningenConfig and its CellConfig, JobConfig, and TaskConfig subinterfaces.

The configuration system should be able to detect and incorporate updates to the configuration information without having to restart Groningen where possible. Updates should be immutable once generated such that a single configuration can be used throughout an iteration of the pipeline. Some items will not be able to be updated past Groningen startup (and possibly even when Groningen is restarted once checkpointing has been enabled) such as certain hypothesizer configuration points like population size due to limits in the external GA libraries we are utilizing.

Adding a configuration parameter to the general set of settings for the Groningen pipeline should be made easy. Thus, a separate protobuf is used to encapsulate the general purpose parameters. Adding a variable consists of:

  • add entry to groningen/protobuf/groningen_config.proto including a default value if desired
  • if no default value and a nonnull guarantee is expected, add a test to the GroningenConfig implementing classes
  • potentially add a relevant test case to the implementing classes' tests
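The first step might look like the following; the message name, field name, tag number, and default are all hypothetical, shown only to illustrate the proto2-style edit:

```proto
// Hypothetical addition to groningen/protobuf/groningen_config.proto.
// The field name and tag number below are illustrative, not real entries.
message GroningenParams {
  optional int32 example_new_param = 42 [default = 10];
}
```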

Protocol Buffer Configuration System

Protocol Buffer Configuration System Overview Diagram

Protocol Buffers allow for a clean, widely adopted interface. They have been employed as the basis of the initial configuration backend with support from binary or text format protobuf files on any filesystem. Binary formatted files are expected to have been written without delimitation or RecordIO markers as a single ProgramConfiguration message.

Variables will often have a common value across cells and jobs. As such, the user and partial task paths should be implemented with a hierarchical scoping or search pattern: start at the job level; if no value is present there, look in the cell that is the parent of the job; and if no value has been included in the cell, look in the global settings. Since these variables must contain a value in order to operate on the jobs, failure to find a value should be considered an error.
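The scoping rule above can be sketched as follows; the class and method names are illustrative, not Groningen's real configuration API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the job -> cell -> global search pattern described above.
public class ScopedLookup {
    private final Map<String, String> global = new HashMap<>();
    private final Map<String, Map<String, String>> cells = new HashMap<>();
    private final Map<String, Map<String, String>> jobs = new HashMap<>();

    public void setGlobal(String key, String value) { global.put(key, value); }

    public void setCell(String cell, String key, String value) {
        cells.computeIfAbsent(cell, c -> new HashMap<>()).put(key, value);
    }

    public void setJob(String job, String key, String value) {
        jobs.computeIfAbsent(job, j -> new HashMap<>()).put(key, value);
    }

    // Search job scope first, then the job's parent cell, then global settings.
    // A value must exist somewhere, so a complete miss is an error.
    public String lookup(String cell, String job, String key) {
        Map<String, String> jobScope = jobs.get(job);
        if (jobScope != null && jobScope.containsKey(key)) return jobScope.get(key);
        Map<String, String> cellScope = cells.get(cell);
        if (cellScope != null && cellScope.containsKey(key)) return cellScope.get(key);
        if (global.containsKey(key)) return global.get(key);
        throw new IllegalStateException("no value found for " + key);
    }
}
```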

Output

The primary output of Groningen is the UI, a.k.a. "the console". It is an HTTP servlet running on port 8080 by default. This is a screenshot of the Groningen console:

![Groningen console screenshot](https://lh3.googleusercontent.com/r1rvnnkjd4cd52Zbc2yeqi5r6dEaw_t3nOF88B_GryoI1tInvKhpy4JFjRZppudFTVSR799_-FnTFa-hNXDruOPZemlnXch5oyKQ8GB4PMYKRHQL6du8PTrn)

You have the ability to select any experiment being tuned by Groningen. Then you can review the best performing subjects from the experiment, review the history and download the entire history as a CSV file. You can also download the configuration protocol buffer for each experiment pipeline from the history view.

Detailed Design

Hypothesizer

This is the first stage of the processing pipeline for a Groningen optimization iteration. The outputs of the Validator and Extractor objects are used by the Hypothesizer object to produce a new set of experiments to run. The intention is to create the "best of breed" by scoring the last set of completed experiments from the Experiment Database. On the very first iteration, the Hypothesizer generates a random set of experimental settings.

A generational genetic algorithm is used to create experiments. The chromosomes are a simple array of integers representing all of the appropriate JVM settings for the GC algorithm being studied. The Watchmaker Framework is used for the GA engine. Please take a look at its User Manual for more on the API; it also contains a reasonably good crash course on GAs.

The scores of the completed experiments are computed from the Experiment Database using a fitness function. A better score implies that a particular experiment is more likely to be crossed with another experiment (generally called a chromosome) to produce the next generation. Chromosomes are mutated with a slight probability to avoid getting stuck in a sub-optimal local maximum.

JVM Settings

Keep in mind that one JVM setting from the list below is incremented or decremented by a certain amount per experiment. The following list of JVM settings is representative of what the Hypothesizer proposes to change; please look at the source code for the complete list.

  • Collectors
    • Serial
      • -XX:+UseSerialGC
    • Parallel
      • -XX:+UseParallelGC
      • -XX:+UseParallelOldGC
      • -XX:ParallelGCThreads=<N>
    • Concurrent
      • -XX:+UseConcMarkSweepGC
      • -XX:CMSInitiatingOccupancyFraction=<N>
      • -XX:+CMSIncrementalMode
      • -XX:+CMSIncrementalPacing
      • -XX:CMSIncrementalDutyCycle=<N>
      • -XX:CMSIncrementalDutyCycleMin=<N>
      • -XX:CMSIncrementalSafetyFactor=<N>
      • -XX:CMSIncrementalOffset=<N>
      • -XX:CMSExpAvgFactor=<N>
  • Heap
    • -Xms<min> bounded above by -Xmx<max>
    • -XX:MinHeapFreeRatio=<minimum>
    • -XX:MaxHeapFreeRatio=<maximum>
  • Generations
    • -XX:NewSize=<N>
    • -XX:MaxNewSize=<N>
    • -XX:NewRatio=<N>
    • -XX:SurvivorRatio=<N>
  • Ergonomics
    • -XX:MaxGCPauseMillis=<N>
    • -XX:GCTimeRatio=<N>
    • -XX:YoungGenerationSizeIncrement=<Y>
    • -XX:TenuredGenerationSizeIncrement=<T>
    • -XX:AdaptiveSizeDecrementScaleFactor=<D>
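Since the chromosomes are integer arrays, each gene must be rendered into one of the flags above when a task is launched. The following toy encoder shows the idea for a few of the heap and generation settings; the gene order and the four-setting subset are assumptions for illustration, not Groningen's real encoding.

```java
// Illustrative only: maps a 4-gene chromosome onto a subset of the JVM
// settings listed above. Groningen's real encoding covers far more flags.
public class ChromosomeEncoder {
    public static String[] toJvmFlags(int[] genes) {
        if (genes.length != 4) throw new IllegalArgumentException("expected 4 genes");
        return new String[] {
            "-Xms" + genes[0] + "m",
            // -Xmx must never be below -Xms, so clamp it up if needed.
            "-Xmx" + Math.max(genes[0], genes[1]) + "m",
            "-XX:NewRatio=" + genes[2],
            "-XX:SurvivorRatio=" + genes[3],
        };
    }
}
```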

Fitness Function

The fitness function used for scoring the experiments is configurable and is specific to the application type being tuned. In general we follow three optimization goals: latency, throughput, and footprint. This is important, and in the end it is responsible for finding the near-optimal settings. The fitness function is a multivariate linear equation: Ax + By + Cz, where A, B, and C are user-defined constants.

Latency Goal

The latency goal is measured as the inverse of the 99th percentile latency that your application experienced due to pausing threads, which is a conservative measure of the extra latency imposed on your interactive RPCs by GC. The bigger the score, the lower your latency is.

Throughput Goal

The inverse of the total amount of wall clock time the JVM paused application threads, which is a measure of how much total processing time was available to your application. The bigger this score, the more time you have for processing.

Footprint Goal

The amount of memory devoted to the application is the memory footprint and should be minimized to avoid resource waste. This is measured as the inverse of the JVM heap max size in megabytes.
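Putting the three goals together, the Ax + By + Cz score can be sketched as below. The class name and units are illustrative assumptions; the structure (user-defined weights applied to the three inverse measures) follows the description above.

```java
// Sketch of the multivariate linear fitness function Ax + By + Cz:
// x = inverse of 99th-percentile pause latency (latency goal),
// y = inverse of total pause wall-clock time (throughput goal),
// z = inverse of max heap size (footprint goal).
public class FitnessFunction {
    private final double latencyWeight;    // A, user defined
    private final double throughputWeight; // B, user defined
    private final double footprintWeight;  // C, user defined

    public FitnessFunction(double a, double b, double c) {
        this.latencyWeight = a;
        this.throughputWeight = b;
        this.footprintWeight = c;
    }

    public double score(double p99PauseMs, double totalPauseSec, double maxHeapMb) {
        return latencyWeight / p99PauseMs
             + throughputWeight / totalPauseSec
             + footprintWeight / maxHeapMb;
    }
}
```

Because each term is an inverse, lower pauses and a smaller heap both raise the score, matching "the bigger the score, the better" in each goal above.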

Generator

The Generator object queries the Experiment Database to build an experiment from the output of the Hypothesizer. The results are written to Colossus protobuf files so that restarting tasks can read the mutated JVM settings.

Executor

The Executor starts experiments and monitors them in production in case of any problems during the execution of the experiment. If, for example, an experimental task is restarting too often, it is reset to the default JVM settings and marked as such so that the Validator can invalidate it later in the Groningen pipeline.

The Executor uses an Experiment object to get the list of processes to restart, then restarts them across their jobs and cells. The tasks must be restarted so that they pick up the JVM settings produced by the Generator, starting the next round of experiments.

Once the experiment is started, the Executor polls tasks iteratively to determine their uptime by scraping varz. This is used to determine if a task has restarted. If it restarts too often it is reset to default JVM settings.
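The uptime-based restart check can be sketched as follows; the class and method names are illustrative, not the Executor's real API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of restart detection by uptime polling: if a task's reported uptime
// drops below the previously observed value, the process must have restarted.
public class RestartDetector {
    private final Map<Integer, Long> lastUptimeSec = new HashMap<>();
    private final Map<Integer, Integer> restartCounts = new HashMap<>();

    // Record a polled uptime sample; returns true if a restart was detected.
    public boolean recordUptime(int taskId, long uptimeSec) {
        Long previous = lastUptimeSec.put(taskId, uptimeSec);
        if (previous != null && uptimeSec < previous) {
            restartCounts.merge(taskId, 1, Integer::sum);
            return true;
        }
        return false;
    }

    public int restarts(int taskId) {
        return restartCounts.getOrDefault(taskId, 0);
    }
}
```

A task whose restart count climbs past the Executor's limit would then be reset to default JVM settings, as described above.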

The Executor shares data on task restarts it records while monitoring the tasks. This data is indexed in an InMemoryCache by gcRecordId. The Validator uses this data to determine if a task is valid.

Extractor

The Extractor object is responsible for extracting GC data from the experiment. It scrapes GC logs from the mutant canary processes and stores them as files on local disk accessible to Groningen. The raw JVM GC logs are then opened, parsed, collated, and stored in the Experiment Database.

Extractor objects running as threads in a pool perform GC log processing during and immediately following experiments, as the Executor detects that tasks have restarted. The Executor triggers Extractor log processing as soon as possible to prevent overwriting the prior task's logs in the degenerate case of a rapidly flapping mutant canary.

All of the data present in the logs is recorded in as much detail as possible in the Experiment Database. This provides a rich data set for Groningen to use as input to the fitness function that determines how successful a specific mutation is in a given generation. Importantly, the pause time of each GC event is recorded and used by the Hypothesizer to create the latency and throughput fitness scores.
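As a minimal sketch of the parsing step, the snippet below pulls per-event pause times out of JDK 6/7-era HotSpot GC log lines (the format contemporary with this page). The single regex is a deliberate simplification; real GC logs have many more line shapes than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts GC pause durations from log lines ending in ", <seconds> secs]".
public class GcLogPauses {
    private static final Pattern PAUSE = Pattern.compile(", ([0-9.]+) secs\\]");

    public static List<Double> extractPauses(List<String> logLines) {
        List<Double> pauses = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) pauses.add(Double.parseDouble(m.group(1)));
        }
        return pauses;
    }
}
```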

Validator

The Validator pipeline stage performs validation to prevent mutants that are flapping excessively, and other such abnormal and degenerate behavior, from propagating to the next generation. For example, the Validator enforces a validity threshold, which is a non-negative integer. Importantly, this threshold should not be greater than the threshold used by the Executor when resetting badly behaved tasks. The Executor shares with the Validator the task behavioral data (e.g. task restarts) it records while monitoring the tasks. This data is indexed in the Experiment Database. The Validator uses this data to determine whether a task is valid.

When a task is not valid, the Validator sets a valid flag to false in the Experiment Database. This validity data is used by InMemoryCache objects implementing the ComputeScore interface to set the score to 0 for invalid tasks when the Hypothesizer creates the next generation.
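The threshold check and score zeroing can be sketched like this; the class name and method shapes are assumptions for illustration.

```java
// Sketch of the validity rule: a task whose observed restart count exceeds
// the non-negative validity threshold is invalid, and invalid tasks
// contribute a zero score when the next generation is bred.
public class TaskValidator {
    private final int restartThreshold;

    public TaskValidator(int restartThreshold) {
        if (restartThreshold < 0)
            throw new IllegalArgumentException("threshold must be non-negative");
        this.restartThreshold = restartThreshold;
    }

    public boolean isValid(int observedRestarts) {
        return observedRestarts <= restartThreshold;
    }

    // Invalid tasks score zero so the GA breeds away from them.
    public double adjustedScore(int observedRestarts, double rawScore) {
        return isValid(observedRestarts) ? rawScore : 0.0;
    }
}
```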

Experiment Database

The Experiment Database holds Groningen data generated and used by each of the major Groningen components. In memory caching of data allows fast read and write access to the various stages of the Groningen evolutionary pipeline. Data is persisted in event logs.

Data Store Abstraction

Experiment data is stored in memory using two data store abstractions. Both experiment historical data and pipeline state data are saved, and both can be persisted using a storage medium that generally abstracts a file interface.

In-memory data storage is effective when Groningen is used as a tool to solve a particular problem. Persistent storage is useful when it is used as a service to continuously tune many processes in a cloud.

Multiple Pipelines

Groningen supports multiple experiment pipelines running concurrently. The intention is that a single Groningen instance can support the JVM tuning needs of an entire datacenter.