Setting up Cluster on Amazon Web Services (AWS)
Amazon Web Services (AWS) is a comprehensive, evolving cloud computing platform that offers a suite of cloud-computing services. The services provided by this platform that are important for SnappyData are Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). You can set up a SnappyData cluster on Amazon Web Services using one of the following options:
SnappyData EC2 Scripts
The SnappyData EC2 scripts enable you to quickly launch and manage SnappyData clusters on Amazon EC2 instances. They also allow you to provide custom configuration for the cluster via SnappyData configuration files, before launching the cluster.
The snappy-ec2 script is the entry point for these EC2 scripts and is derived from the spark-ec2 script available in Apache Spark 1.6.
The EC2 scripts are provided on an experimental basis. Feel free to try them out and provide your feedback via GitHub issues.
- Ensure that you have an existing AWS account with the required permissions to launch EC2 resources.
- Create an EC2 Key Pair in the region where you want to launch the SnappyData Cloud cluster.
Refer to the Amazon Web Services EC2 documentation for more information on generating your own EC2 Key Pair.
Using the AWS Secret Access Key and the Access Key ID, set the two environment variables AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID. You can find information about generating these keys on the AWS IAM console page.
If you already have set up the AWS Command Line Interface on your local machine, the script automatically detects and uses the credentials from the AWS credentials file.
export AWS_SECRET_ACCESS_KEY=abcD12efGH34ijkL56mnoP78qrsT910uvwXYZ1112
export AWS_ACCESS_KEY_ID=A1B2C3D4E5F6G7H8I9J10
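If you prefer the AWS CLI credentials file over environment variables, a typical shared credentials file looks like the following sketch (the key values shown are placeholders, matching the ones above):

```ini
# ~/.aws/credentials (placeholder values)
[default]
aws_access_key_id = A1B2C3D4E5F6G7H8I9J10
aws_secret_access_key = abcD12efGH34ijkL56mnoP78qrsT910uvwXYZ1112
```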
- Ensure Python v 2.7 or later is installed on your local computer.
Deploying SnappyData Cluster with EC2 Scripts
In the command prompt, go to the directory where the snappydata-ec2-<version>.tar.gz is extracted, or to the aws/ec2 directory where the SnappyData cloud tools repository is cloned locally.
./snappy-ec2 -k <your-key-name> -i <your-keyfile-path> <action> <your-cluster-name> [options]
<your-key-name> refers to the name of the EC2 key pair.
<your-keyfile-path> refers to the path to the key file.
<action> refers to the action to be performed. Some of the available actions are launch, stop, and start; the launch action creates a new cluster, while actions such as stop and start work on existing clusters.
By default, the script starts one instance of a locator, lead, and server each. The script identifies each cluster by its unique cluster name that you provide and internally ties the members (locators, leads, and stores/servers) of the cluster with EC2 security groups, whose names are derived from the cluster name.
When running the script, you can also specify options to configure the cluster such as the number of stores in the cluster and the region where the EC2 instances should be launched.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --stores=2 --with-zeppelin --region=us-west-1 launch my-cluster
The above example launches a SnappyData cluster named my-cluster with two stores (servers). The locator is associated with a security group named my-cluster-locator, and the servers are associated with the my-cluster-store security group.
The cluster is launched in the N. California (us-west-1) region on AWS and has an Apache Zeppelin server running on the instance where the lead is running.
The example assumes that you have the key file (my-ec2-key.pem) in your home directory for EC2 Key Pair named 'my-ec2-key'.
Assuming IAM role in the AWS EC2 Scripts
An IAM user in AWS can gain additional (or different) permissions, or get permissions to perform actions in a different AWS account through EC2 scripts. You can configure the AWS EC2 scripts to use an IAM role by passing the following properties:
- assume-role-arn: The Amazon Resource Name (ARN) of the IAM role to be assumed. This IAM role's credentials are used to launch the cluster. If you are using the switch role functionality, this property is mandatory.
- assume-role-timeout: Timeout in seconds for the temporary credentials of the assumed IAM role. The minimum is 900 seconds and the maximum is 3600 seconds.
- assume-role-session-name: Name of the session in which this IAM role is assumed by the user.
./snappy-ec2 -k <your-key-name> -i <your-keyfile-path> launch snap_ec2_cluster --with-zeppelin --authorized-address=<authorized-ip-address> --assume-role-arn=<role-arn> --assume-role-timeout=<timeout> --assume-role-session-name=<name-for-session>
By default, the cluster is launched in the N. Virginia (us-east-1) region on AWS. To launch the cluster in a specific region, use the --region option.
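For example, the following command launches the cluster in the Ireland (eu-west-1) region; the region, key pair, and cluster name here are illustrative:

```shell
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --region=eu-west-1 launch my-cluster
```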
This section covers the following:
- Using custom build
- Specifying Properties
- Stopping the Cluster
- Resuming the Cluster
- Adding Servers to the Cluster
- Listing Members of the Cluster
- Connecting to the Cluster
- Destroying the Cluster
- Starting Cluster with Apache Zeppelin
- More Options
Using Custom build
This script by default uses the SnappyData OSS build available on the GitHub releases page to launch the cluster.
To select a version of the OSS build available on GitHub, use the --snappydata-version option.
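For example, the following command launches a cluster with a specific released version; the version number shown is illustrative:

```shell
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --snappydata-version=1.0.2 launch my-cluster
```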
You can also provide your own SnappyData build to the script to launch the cluster, by passing the --snappydata-tarball option to the launch command. The build can be present either on a local filesystem or as a resource on the web.
For example, to use the SnappyData Enterprise build to launch the cluster, download the build tarball from www.snappydata.io/download to your local machine and give its path as the value to the above option.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --snappydata-tarball="/home/ec2-user/snappydata/distributions/snappydata-1.1.0-bin.tar.gz"
Alternatively, you can also put your build file on a public web server and provide its URL to this option.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --snappydata-tarball="https://s3-us-east-2.amazonaws.com/mybucket/distributions/snappydata-1.1.0-bin.tar.gz"
The build file should be in .tar.gz format.
Specifying Properties
You can specify the configuration for the cluster via command line options. Use --locator-conf to specify the configuration properties for all the locators in the cluster. Similarly, --server-conf and --lead-conf allow you to specify the configuration properties for the servers and leads in the cluster, respectively.
Following is a sample configuration for all the three processes in a SnappyData cluster:
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --stores=2 launch my-cluster \
  --locator-conf="-peer-discovery-port=9999 -heap-size=1024m" \
  --lead-conf="-spark.executor.cores=10 -heap-size=4096m -spark.ui.port=3333" \
  --server-conf="-client-port=1530"
The utility also reads snappy-env.sh, if present in the directory where helper scripts are present.
- The earlier method of specifying the configuration properties by placing the actual configuration files in the directory, where helper scripts are available, is discontinued.
- Ensure that the configuration properties specified are correct. Otherwise, launching the SnappyData cluster may fail but the EC2 instances would still be running.
Stopping the Cluster
When you stop a cluster, it shuts down the EC2 instances and any data saved on the local instance stores is lost. However, the data saved on EBS volumes is retained, unless spot instances are used.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem stop cluster-name
Resuming the Cluster
When you start a cluster, it uses the existing EC2 instances associated with the cluster name and launches SnappyData processes on them.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem start cluster-name
The start command, or the launch command with the --resume option, ignores the --stores option and launches the SnappyData cluster on the existing instances. However, if configuration options are provided, they are read and processed, overriding the values that were provided when the cluster was launched or started previously.
Adding Servers to the Cluster
Adding servers is not yet supported by the script. You must manually launch an instance with the (cluster-name)-stores security group, and then run the launch command with the --resume option.
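A sketch of resuming an existing cluster with the --resume option; the key pair and cluster name are illustrative:

```shell
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --resume
```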
Listing Members of the Cluster
To get the first locator's hostname:
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem get-locator cluster-name
Similarly, use the get-lead command to get the first lead's hostname.
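For example, to print the first lead's hostname:

```shell
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem get-lead cluster-name
```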
Connecting to the Cluster
You can connect to any instance of a cluster with SSH using the login command. It logs you into the first lead instance. You can then use SSH to connect to any other member of the cluster without a password. The SnappyData product directory is located at /opt/snappydata/ on all the members.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem login cluster-name
Destroying the Cluster
Destroying a cluster permanently destroys all the data on the local instance stores and on the attached EBS volumes.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem destroy cluster-name
This also deletes the security groups created for this cluster.
Starting Cluster with Apache Zeppelin
Optionally, you can start an instance of Apache Zeppelin server with the cluster. Apache Zeppelin provides a web-based interactive notebook that is pre-configured to communicate with the SnappyData cluster. The Zeppelin server is launched on the same EC2 instance where the lead node is running.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --with-zeppelin launch cluster-name
For a complete list of options provided by the script, simply run ./snappy-ec2. The options are also provided below for quick reference.
Usage: snappy-ec2 [options] <action> <cluster_name>

<action> can be: launch, destroy, login, stop, start, get-locator, get-lead, reboot-cluster

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -s STORES, --stores=STORES
                        Number of stores to launch (default: 1)
  --locators=LOCATORS   Number of locator nodes to launch (default: 1)
  --leads=LEADS         Number of lead nodes to launch (default: 1)
  -w WAIT, --wait=WAIT  DEPRECATED (no longer necessary) - Seconds to wait
                        for nodes to start
  -k KEY_PAIR, --key-pair=KEY_PAIR
                        Name of the key pair to use on instances
  -i IDENTITY_FILE, --identity-file=IDENTITY_FILE
                        SSH private key file to use for logging into instances
  -p PROFILE, --profile=PROFILE
                        If you have multiple profiles (AWS or boto config),
                        you can configure additional, named profiles by using
                        this option (default: none)
  -t INSTANCE_TYPE, --instance-type=INSTANCE_TYPE
                        Type of server and lead instance to launch
                        (default: m4.large). WARNING: must be 64-bit;
                        small instances won't work
  --locator-instance-type=LOCATOR_INSTANCE_TYPE
                        Locator instance type (default: t2.medium)
  -r REGION, --region=REGION
                        EC2 region used to launch instances in, or to find
                        them in (default: us-east-1)
  -z ZONE, --zone=ZONE  Availability zone to launch instances in, or 'all' to
                        spread stores across multiple (an additional $0.01/Gb
                        for bandwidth between zones applies) (default: a
                        single zone chosen at random)
  -a AMI, --ami=AMI     Amazon Machine Image ID to use
  --snappydata-tarball=SNAPPYDATA_TARBALL
                        HTTP URL or local file path of the SnappyData
                        distribution tarball with which the cluster will be
                        launched. (default: )
  --locator-conf=LOCATOR_CONF
                        Configuration properties for locators (default: )
  --server-conf=SERVER_CONF
                        Configuration properties for servers (default: )
  --lead-conf=LEAD_CONF
                        Configuration properties for leads (default: )
  -v SNAPPYDATA_VERSION, --snappydata-version=SNAPPYDATA_VERSION
                        Version of SnappyData to use: 'X.Y.Z' (default: LATEST)
  --with-zeppelin       Launch Apache Zeppelin server with the cluster. It'll
                        be launched on the same instance where lead node will
                        be running.
  --deploy-root-dir=DEPLOY_ROOT_DIR
                        A directory to copy into / on the first locator. Must
                        be absolute. Note that a trailing slash is handled as
                        per rsync: If you omit it, the last directory of the
                        --deploy-root-dir path will be created in / before
                        copying its contents. If you append the trailing
                        slash, the directory is not created and its contents
                        are copied directly into /. (default: none)
  -D [ADDRESS:]PORT     Use SSH dynamic port forwarding to create a SOCKS
                        proxy at the given local address (for use with login)
  --resume              Resume installation on a previously launched cluster
                        (for debugging)
  --root-ebs-vol-size=SIZE
                        Size (in GB) of root EBS volume for servers and leads.
                        SnappyData is installed on root volume.
  --root-ebs-vol-size-locator=SIZE
                        Size (in GB) of root EBS volume for locators.
                        SnappyData is installed on root volume.
  --ebs-vol-size=SIZE   Size (in GB) of each additional EBS volume to be
                        attached.
  --ebs-vol-type=EBS_VOL_TYPE
                        EBS volume type (e.g. 'gp2', 'standard').
  --ebs-vol-num=EBS_VOL_NUM
                        Number of EBS volumes to attach to each node as
                        /vol[x]. The volumes will be deleted when the
                        instances terminate. Only possible on EBS-backed AMIs.
                        EBS volumes are only attached if --ebs-vol-size > 0.
                        Only supports up to 8 EBS volumes.
  --placement-group=PLACEMENT_GROUP
                        Which placement group to try and launch instances
                        into. Assumes placement group is already created.
  --spot-price=PRICE    If specified, launch stores as spot instances with the
                        given maximum price (in dollars)
  -u USER, --user=USER  The SSH user you want to connect as (default: ec2-user)
  --delete-groups       When destroying a cluster, delete the security groups
                        that were created
  --use-existing-locator
                        Launch fresh stores, but use an existing stopped
                        locator if possible
  --user-data=USER_DATA
                        Path to a user-data file (most AMIs interpret this as
                        an initialization script)
  --authorized-address=AUTHORIZED_ADDRESS
                        Address to authorize on created security groups
                        (default: 0.0.0.0/0)
  --additional-security-group=ADDITIONAL_SECURITY_GROUP
                        Additional security group to place the machines in
  --additional-tags=ADDITIONAL_TAGS
                        Additional tags to set on the machines; tags are
                        comma-separated, while name and value are colon
                        separated; ex: "Task:MySnappyProject,Env:production"
  --copy-aws-credentials
                        Add AWS credentials to hadoop configuration to allow
                        Snappy to access S3
  --subnet-id=SUBNET_ID
                        VPC subnet to launch instances in
  --vpc-id=VPC_ID       VPC to launch instances in
  --private-ips         Use private IPs for instances rather than public if
                        VPC/subnet requires that.
  --instance-initiated-shutdown-behavior=INSTANCE_INITIATED_SHUTDOWN_BEHAVIOR
                        Whether instances should terminate when shut down or
                        just stop
  --instance-profile-name=INSTANCE_PROFILE_NAME
                        IAM profile name to launch instances under
  --assume-role-arn=ASSUME_ROLE_ARN
                        The Amazon Resource Name (ARN) of the IAM role to be
                        assumed. This IAM role's credentials are used to
                        launch the cluster. If you are using the switch role
                        functionality, this property is mandatory.
  --assume-role-timeout=ASSUME_ROLE_TIMEOUT
                        Timeout in seconds for the temporary credentials of
                        the assumed IAM role, min is 900 seconds and max is
                        3600 seconds.
  --assume-role-session-name=ASSUME_ROLE_SESSION_NAME
                        Name of this session in which this IAM role is assumed
                        by the user.
Launching the cluster on a custom AMI (specified via the --ami option) does not work if the user 'ec2-user' does not have sudo permissions.
Support for option
AWS Management Console
The AMI of the latest SnappyData release (1.1.0) is not available on AWS.
You can launch a SnappyData cluster on Amazon EC2 instance(s) using the AMI provided by SnappyData. For more information on launching an EC2 instance, refer to the AWS documentation. This section covers the following:
Ensure that you have an existing AWS account with required permissions to launch the EC2 resources.
Deploying SnappyData Cluster with AWS Management Console
To launch the instance and start the SnappyData cluster:
Open the Amazon EC2 console and sign in using your AWS login credentials.
The current region is displayed at the top of the screen. Select the region where you want to launch the instance.
Click Launch Instance from the Amazon EC2 console dashboard.
On the Choose an Amazon Machine Image (AMI) page, select Community AMIs from the left pane.
Enter SnappyData in the search box, and press Enter on your keyboard.
The search result is displayed. From the search results, click Select to choose the AMI with the latest release version.
On the Choose an Instance Type page, select the instance type as per the requirement of your use case and then click Review and Launch to launch the instance with default configurations.
You can also continue customizing your instance before you launch the instance. Refer to the AWS documentation for more information.
When configuring the security groups, ensure that you open at least ports 22 (for SSH access to the EC2 instance) and 5050 (for access to Snappy UI).
You are directed to the last step Review Instance Launch. Check the details of your instance, and click Launch.
In the Select an existing key pair or create a new key pair dialog box, select a key pair.
Click Launch. The Launch Status page is displayed.
Click View Instances. The dashboard which lists the instances is displayed.
Click Refresh to view the updated list and the status of the instance creation.
Once the status of the instance changes to running, you have successfully created and launched the instance with the SnappyData AMI.
Use SSH to connect to the instance using the Ubuntu username. You require:
- The private key file of the key pair with which the instance was launched, and
- Details of the public hostname or IP address of the instance.
Refer to the AWS documentation for more information on accessing an EC2 instance.
The public hostname/IP address information is available on the EC2 dashboard > Description tab.
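For example, assuming the key file is in the current directory and <public-hostname> is taken from the EC2 dashboard:

```shell
ssh -i my-ec2-key.pem ubuntu@<public-hostname>
```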
The SnappyData product distribution is already downloaded and extracted in the /opt/snappydata directory and Java 8 is installed.
Go to the /opt/snappydata directory. Run the following command to start a basic cluster with one data node, one lead, and one locator.
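A minimal sketch of the start command, assuming the standard snappy-start-all.sh script shipped under the distribution's sbin directory:

```shell
cd /opt/snappydata
# starts one locator, one server, and one lead, as configured under conf/
./sbin/snappy-start-all.sh
```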