Memory Machine Cloud Edition

Fault Tolerance Service

Save up to 90% by using Spot Instances

Organizations can save up to 90% by running their apps on Spot instances, but only if those apps can recover gracefully when the cloud service provider reclaims the instance. The risk is significant: the probability of an interruption can be as high as 20% for the instances that offer the largest savings.

Stateful, Non-Fault-Tolerant, Long-Running Apps Not Recommended

Below are examples of stateful, non-fault-tolerant, and long-running apps that cloud service providers recommend you do not deploy on low-cost Spot instances.

AI/ML

EDA

Video Rendering

Financial Analytics

Genomics

Geophysical

Risk

CFD

Memory Machine™ Cloud Edition

Fault Tolerance Service

The Fault Tolerance Service is one of three new cloud services included with Memory Machine Cloud Edition.

The Fault Tolerance Service enables stateful, non-fault-tolerant, and long-running apps to gracefully recover from Spot terminations.

How It Works

Memory Machine’s Fault Tolerance Service is a cloud-based Software-as-a-Service offering designed to give users the best experience at the lowest cost, while giving cloud IT pros control over how fault tolerance works in their app environment.

The service consists of two components: the Management Center, and agents that run inside a cloud VM alongside the user applications. The FT service works in conjunction with app job schedulers, cloud management services, Spot instances, and cloud storage services.

The FT service is easy to integrate. Users provide information about the Spot instance, the snapshot frequency, and the storage to use. After installation, the fault tolerance service runs automatically.

Snapshots are taken at user-configured intervals and sent asynchronously to persistent cloud storage. After a Spot instance is reclaimed, a new instance is automatically allocated based on the services and policies deployed at setup. The Management Center then restores the latest snapshot from the storage service, and the app resumes execution.
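The snapshot-and-restore cycle described above can be illustrated with a small, generic sketch. This is not MemVerge code or the Memory Machine API: every name here (SnapshotStore, SpotReclaimed, run_job, run_with_recovery) is hypothetical, a local directory stands in for persistent cloud storage, the reclaim is simulated with an exception, and snapshots are written synchronously for simplicity, whereas the service sends them asynchronously.

```python
import json
import tempfile
from pathlib import Path


class SnapshotStore:
    """Hypothetical stand-in for persistent cloud storage (a local directory)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, step: int, state: dict) -> None:
        # Write to a temp file and rename it, so a crash never leaves
        # a half-written snapshot as the "latest" one.
        tmp = self.root / "latest.json.tmp"
        tmp.write_text(json.dumps({"step": step, "state": state}))
        tmp.replace(self.root / "latest.json")

    def load_latest(self):
        path = self.root / "latest.json"
        if not path.exists():
            return None
        return json.loads(path.read_text())


class SpotReclaimed(Exception):
    """Raised to simulate the provider reclaiming the Spot instance."""


def run_job(store, total_steps, snapshot_every, reclaim_at=None):
    # Resume from the latest snapshot if one exists, else start fresh.
    snap = store.load_latest()
    step = snap["step"] if snap else 0
    state = snap["state"] if snap else {"total": 0}
    while step < total_steps:
        if reclaim_at is not None and step == reclaim_at:
            raise SpotReclaimed(f"instance reclaimed at step {step}")
        state["total"] += step  # the "work" done in this step
        step += 1
        if step % snapshot_every == 0:
            store.save(step, state)
    return state


def run_with_recovery(store, total_steps, snapshot_every, reclaim_at):
    try:
        return run_job(store, total_steps, snapshot_every, reclaim_at)
    except SpotReclaimed:
        # Simulates a replacement instance: restore the latest snapshot
        # and continue; only work done since that snapshot is repeated.
        return run_job(store, total_steps, snapshot_every)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        result = run_with_recovery(SnapshotStore(Path(d)), 10, 3, 7)
        print(result)
```

In this sketch the job is interrupted at step 7, resumes from the snapshot taken at step 6, and repeats only one step of work. A shorter snapshot interval reduces the work repeated after a reclaim at the cost of more frequent writes to storage.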

How Much You Can Save

Case Study: SplAdder

Case Study: PlantTribes

Case Study: DeNovo Gene Sequence Assembly

MemVerge Fault Tolerance Early Access Program

What You Get

  1. Memory Machine software with SpotOn service.
  2. Free professional services needed to configure various services for automated recovery and restart.
  3. A free license for Memory Machine for 6 months.
  4. Free white glove support for 6 months.

What You Have To Do

  1. Deploy a non-fault-tolerant and/or long-running workload on AWS.
  2. Pay for services (AWS) not provided by MemVerge.
  3. Provide ongoing feedback to MemVerge about the status of the deployment and operation.