November 12th, 2015
Amazon Web Services is a cloud platform that can add to the flexibility and work capacity of any data analytics team.
Our presentation aims to give you:
Analysis
Security
The cloud is computers you don't own.
11 regions,
28 availability zones,
50 services,
$1.6 billion in revenue.
AWS's capacity is estimated to be four times that of its nearest ten competitors combined, including Microsoft Azure, Google Cloud, and IBM Cloud Services.
What you buy: rentable laptops.
What Amazon says: "Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers."
PERFORMANCE COMPUTING
Projects that have short-term or intermittent need for resource intensive processing.
HIGH AVAILABILITY SERVERS
Server-dependent systems that require 99.99%* uptime.
Rent instances by the hour, priced by the number of CPUs, amount of RAM, and data storage capacity. Choose the Amazon Machine Image (AMI) loaded onto each instance; each AMI comes with a pre-loaded operating system and software configuration.
Instance hardware configurations are optimized for general purpose, memory, processor, GPU, or storage use.
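To make hourly rental concrete, here is a minimal cost sketch. The instance names and rates are hypothetical placeholders, not real AWS prices, and the sketch assumes that every started hour is billed in full.

```python
import math

# Hypothetical hourly rates in USD, for illustration only; not real AWS prices.
HOURLY_RATE_USD = {
    "small-general-purpose": 0.05,
    "large-memory-optimized": 1.00,
}


def estimate_cost(instance_kind, hours_used, instance_count=1):
    """Estimate a rental bill, assuming each started hour is billed in full."""
    billable_hours = math.ceil(hours_used)
    total = HOURLY_RATE_USD[instance_kind] * billable_hours * instance_count
    return round(total, 2)


if __name__ == "__main__":
    # Four large instances for a 6.5-hour job are billed as 7 hours each.
    print(estimate_cost("large-memory-optimized", 6.5, instance_count=4))
```

The point of the sketch is that the bill scales with both machine size and running time, so short, intensive jobs can be cheap even on large instances.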
What it's for: setting up and running your database for you.
What Amazon says: "Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale relational databases in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business."
ANALYZE LARGE DATA
We have a 100 GB database extract (i.e. structured data) and want to analyze it using SQL.
SINGLE SOURCE OF TRUTH
We want to give multiple people read-only access to the same data at the same time.
DATA SOURCE FOR EC2
From an EC2 instance, accessing data stored on AWS is faster than accessing data stored on Summit's local network.
Choose the amount of storage, and how fast it is, separately from CPU and RAM.
You can tailor an RDS instance to different workloads. You can also change the configuration on the fly to respond to fluctuating demands and requirements.
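Changing an RDS configuration on the fly might look like the sketch below, which uses the `modify_db_instance` call from boto3, the AWS SDK for Python. The identifier `analytics-db`, the instance class, the storage size, and the region are placeholder assumptions. Only `resize_database` needs boto3 and AWS credentials; the request builder runs without them.

```python
def build_resize_request(db_identifier, instance_class=None, storage_gb=None,
                         apply_immediately=False):
    """Build keyword arguments for the RDS ModifyDBInstance API call."""
    request = {
        "DBInstanceIdentifier": db_identifier,
        "ApplyImmediately": apply_immediately,
    }
    if instance_class is not None:
        request["DBInstanceClass"] = instance_class  # sets CPU and RAM
    if storage_gb is not None:
        request["AllocatedStorage"] = storage_gb  # storage size, set separately
    return request


def resize_database(request, region_name="us-east-1"):
    """Send the request to AWS; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without it

    rds = boto3.client("rds", region_name=region_name)
    return rds.modify_db_instance(**request)


if __name__ == "__main__":
    # Placeholder values: grow storage and move to a larger instance class.
    print(build_resize_request("analytics-db",
                               instance_class="db.m3.large",
                               storage_gb=200))
```

Because storage and instance class are separate parameters, either one can be changed without touching the other.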
What it's for: processing and analyzing massive amounts of data.
What Amazon says: "Amazon Elastic MapReduce (EMR) simplifies big data processing, providing a managed Hadoop/Spark framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances."
MASSIVE DATA SETS
Process, store, and produce summaries on hundreds of gigabytes of data, quickly.
MACHINE LEARNING
Modeling techniques that would require a single instance to run for weeks.
Hardware
Software
Cost: 1 cent to 27 cents per machine per hour, not including the cost of the EC2 instances.
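The MapReduce model behind EMR's managed Hadoop framework can be illustrated on a single machine. The sketch below is a toy word count, not EMR code: the function names are made up for illustration, and on a real cluster the framework would spread the map and reduce steps across many EC2 instances.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single summary."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data", "Big clusters"]))
```

Each record is mapped independently and each key is reduced independently, which is what lets the work be split across as many machines as the data requires.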
What you need to know: AWS services can be used in compliance with HIPAA, PCI, and many other security standards.
What Amazon says: "Amazon Web Services Cloud Compliance enables customers to understand the robust controls in place at AWS to maintain security and data protection in the cloud. As systems are built on top of AWS cloud infrastructure, compliance responsibilities will be shared. By tying together governance-focused, audit-friendly service features with applicable compliance or audit standards, AWS Compliance enablers build on traditional programs; helping customers to establish and operate in an AWS security control environment."
SECURITY OF THE CLOUD
Amazon's guarantees to its customers about the security of its systems as sold.
SECURITY IN THE CLOUD
Our responsibilities as cloud users in securing our systems.
GovCloud is a special FedRAMP- and ITAR-compliant region of AWS, physically and logically accessible only from within the United States of America.
Use of GovCloud requires authorization from Amazon.
What you need to know: only your team can access your AWS resources.
What Amazon says: "Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the Amazon Web Services (AWS) Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways."
Every AWS product we have discussed relies on an explicit VPC configuration.
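Choosing an IP address range and dividing it into subnets can be planned offline before touching AWS. The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/16` range, the `/24` subnet size, and the helper names are illustrative assumptions, not a recommended layout.

```python
import ipaddress
from itertools import islice


def plan_subnets(network, new_prefix, count):
    """Return the first `count` equally sized subnets of `network`."""
    return list(islice(network.subnets(new_prefix=new_prefix), count))


def find_subnet(address, subnets):
    """Return the subnet that contains `address`, or None if there is none."""
    ip = ipaddress.ip_address(address)
    for subnet in subnets:
        if ip in subnet:
            return subnet
    return None


if __name__ == "__main__":
    # Hypothetical private range for the whole VPC.
    vpc_range = ipaddress.ip_network("10.0.0.0/16")
    subnets = plan_subnets(vpc_range, new_prefix=24, count=3)
    for subnet in subnets:
        print(subnet, "holds", subnet.num_addresses, "addresses")
    print(find_subnet("10.0.1.25", subnets))
```

A plan like this only covers addressing; route tables, gateways, and access rules still have to be configured in the VPC itself.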
For many configurations, AWS VPC is free. The exception is when a configuration requires a router internal to the VPC. Each router costs 8 cents an hour, or about $58 a month.
Configuring your VPC to be secure requires a technical understanding of network security. Budget for at least several hours of professional IT consultation to review your security before putting any client work into the cloud.
What you should hear: you can give your team access to AWS without worrying someone will launch ten $7-per-hour instances before everyone leaves for holiday vacation.
What Amazon says: "AWS Identity and Access Management (IAM) enables you to securely control access to AWS services and resources for your users. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources."
Users can be given unique logins, including MFA tokens.
Users can be assigned permissions.
Users can be lumped into groups, which can be assigned permissions.
Permissions define whether users can log in to the AWS console, launch different AWS products, access AWS resources, and more.
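One way to express a permission is an IAM policy document, which is plain JSON. The sketch below builds a policy that denies launching any EC2 instance whose type is not on an approved list, using the `ec2:InstanceType` condition key. The function name, the statement ID, and the approved types are illustrative assumptions. The policy only denies; users still need a separate policy that allows them to launch instances at all.

```python
import json


def build_instance_type_policy(allowed_types):
    """Build an IAM policy that denies launching unapproved instance types."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Deny applies when the requested type is not in the list.
                    "StringNotEquals": {"ec2:InstanceType": list(allowed_types)}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical approved list: only small, inexpensive instance types.
    policy = build_instance_type_policy(["t2.micro", "t2.small"])
    print(json.dumps(policy, indent=2))
```

Attaching a policy like this to a group, rather than to individual users, applies the limit to everyone in the group at once.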