AWS ParallelCluster with the Slurm Workload Manager

🐙 Ravin$ conda create -n awscli

🐙 Ravin$ conda activate awscli

(awscli) 🐙 Ravin$ conda install -c conda-forge awscli

(awscli) 🐙 Ravin$ aws configure

AWS Access Key ID: ***************
AWS Secret Access Key: *****************
Default region name [us-east-1]: us-east-1
Default output format [None]:
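`aws configure` saves the access keys in ~/.aws/credentials and the region and output format in ~/.aws/config, both under a [default] profile. To check from a script which region the CLI will use without printing the secrets, a minimal sketch with Python's standard configparser module could look like this (`read_default_profile` and `has_default_credentials` are hypothetical helper names, not part of the AWS CLI):

```python
import configparser
from pathlib import Path


def read_default_profile(aws_dir: Path) -> dict:
    """Return the region and output format stored for the [default] profile."""
    config = configparser.ConfigParser()
    # read() silently skips missing files, so an unconfigured machine yields None values
    config.read(aws_dir / "config")
    section = config["default"] if config.has_section("default") else {}
    return {
        "region": section.get("region"),
        "output": section.get("output"),
    }


def has_default_credentials(aws_dir: Path) -> bool:
    """Check that [default] has both keys, without returning the secret values."""
    creds = configparser.ConfigParser()
    creds.read(aws_dir / "credentials")
    return (
        creds.has_option("default", "aws_access_key_id")
        and creds.has_option("default", "aws_secret_access_key")
    )
```

For example, `read_default_profile(Path.home() / ".aws")` should report us-east-1 as the region after the answers above.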

A few useful AWS CLI commands


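# show help for the IAM commands
(awscli) 🐙 Ravin$ aws iam help

# list the IAM users in the account
(awscli) 🐙 Ravin$ aws iam list-users

# show details of the IAM user whose access keys are configured
(awscli) 🐙 Ravin$ aws iam get-user

# list the S3 buckets
(awscli) 🐙 Ravin$ aws s3 ls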
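Every AWS CLI command accepts `--output json`, which makes the results easy to consume from a script. As a sketch (assuming Python 3.7+ and `aws` on the PATH of the awscli env; `run_aws` and `summarize_user` are hypothetical helper names), the `aws iam get-user` response can be reduced to the user name and ARN:

```python
import json
import subprocess


def run_aws(*args: str) -> str:
    """Run an AWS CLI command with JSON output and return its stdout."""
    result = subprocess.run(
        ["aws", *args, "--output", "json"],
        capture_output=True, text=True, check=True,  # raise if the CLI exits non-zero
    )
    return result.stdout


def summarize_user(get_user_json: str) -> str:
    """Turn `aws iam get-user` JSON into a one-line 'name (arn)' summary."""
    user = json.loads(get_user_json)["User"]
    return f"{user['UserName']} ({user['Arn']})"
```

Usage: `print(summarize_user(run_aws("iam", "get-user")))`.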

Install aws-parallelcluster


(awscli) 🐙 Ravin$ conda install -c conda-forge aws-parallelcluster

More information: https://github.com/aws/aws-parallelcluster

Configure the cluster settings

(awscli) 🐙 Ravin$ pcluster configure

AWS Region ID [us-east-1]: us-east-1
EC2 Key Pair Name [testslurm]: 1 # pick the key pair by its number in the list of allowed values
Allowed values for Scheduler: 3 # 3. slurm
Allowed values for Operating System: 1 # 1. alinux
Minimum cluster size (instances) [2]: 1 ## minimum number of compute instances
Maximum cluster size (instances) [10]: 10
Master instance type [t2.micro]: t2.micro
Compute instance type [t2.micro]: t2.micro
Automate VPC creation? (y/n) [n]: n # Enter 'n' if you already have a VPC suitable for the cluster.
VPC ID [vpc-3ac9c740]: 1 # pick the VPC by its number in the list of allowed values
Automate Subnet creation? (y/n) [y]: n
Allowed values for Master Subnet ID:
Master Subnet ID [subnet-06d2d9837abd57fd4]: 1
Compute Subnet ID [subnet-0b41b85f10cb527de]: 1
Configuration file written to /Users/ravinpoudel/.parallelcluster/config
You can edit your configuration file or simply run 'pcluster create -c /Users/ravinpoudel/.parallelcluster/config cluster-name' to create your cluster
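The answers above end up in an INI-style file. In ParallelCluster 2.x it typically has a [cluster default] section (key_name, scheduler, instance types, initial_queue_size, max_queue_size, vpc_settings) that points to a [vpc default] section holding the VPC and subnet IDs; the exact keys depend on the version, so check the generated file. A quick sanity check before `pcluster create` could be sketched with configparser (`check_cluster_config` is a hypothetical helper, and the key names and the fallback max of 10 are assumptions based on the 2.x format):

```python
import configparser


def check_cluster_config(text: str, template: str = "default") -> list:
    """Return problems found in a ParallelCluster 2.x style config (empty list = OK).

    Assumes a [cluster <template>] section that references a [vpc <name>]
    section through vpc_settings.
    """
    config = configparser.ConfigParser()
    config.read_string(text)
    cluster_section = f"cluster {template}"
    if not config.has_section(cluster_section):
        return [f"missing [{cluster_section}] section"]
    cluster = config[cluster_section]
    problems = []
    for key in ("key_name", "scheduler", "master_instance_type", "compute_instance_type"):
        if key not in cluster:
            problems.append(f"{key} is not set")
    # The minimum cluster size should not exceed the maximum
    initial = cluster.getint("initial_queue_size", fallback=0)
    maximum = cluster.getint("max_queue_size", fallback=10)
    if initial > maximum:
        problems.append(f"initial_queue_size ({initial}) exceeds max_queue_size ({maximum})")
    # vpc_settings must name an existing [vpc ...] section
    vpc = cluster.get("vpc_settings")
    if vpc and not config.has_section(f"vpc {vpc}"):
        problems.append(f"vpc_settings points to missing [vpc {vpc}] section")
    return problems
```

Pass it the contents of /Users/ravinpoudel/.parallelcluster/config; an empty list means none of these checks failed.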


Subcommands that can be passed to pcluster


    create              Creates a new cluster.
    update              Updates a running cluster using the values in the config file.
    delete              Deletes a cluster.
    start               Starts the compute fleet for a cluster that has been stopped.
    stop                Stops the compute fleet, leaving the master server running.
    status              Pulls the current status of the cluster.
    list                Displays a list of stacks associated with AWS ParallelCluster.
    instances           Displays a list of all instances in a cluster.
    ssh                 Connects to the master instance using SSH.
    createami           (Linux/macOS) Creates a custom AMI to use with AWS ParallelCluster.
    configure           Start the AWS ParallelCluster configuration.
    version             Displays the version of AWS ParallelCluster.
    dcv                 The dcv command permits to use NICE DCV related features.
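To drive these subcommands from a Python script instead of typing them, a thin wrapper can refuse anything that is not in the list above. This is a sketch: `pcluster()` here is a hypothetical helper, the subcommand set is copied from the listing above (ParallelCluster 2.x), and `dry_run=True` only builds the command without running it.

```python
import subprocess

# Subcommands copied from the pcluster help listing (ParallelCluster 2.x)
PCLUSTER_SUBCOMMANDS = {
    "create", "update", "delete", "start", "stop", "status", "list",
    "instances", "ssh", "createami", "configure", "version", "dcv",
}


def pcluster(subcommand: str, *args: str, dry_run: bool = False):
    """Run `pcluster <subcommand> ...` and return stdout, or just the argv if dry_run."""
    if subcommand not in PCLUSTER_SUBCOMMANDS:
        raise ValueError(f"unknown pcluster subcommand: {subcommand!r}")
    argv = ["pcluster", subcommand, *args]
    if dry_run:
        return argv
    # check=True raises CalledProcessError if pcluster exits with a non-zero status
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

For example, `pcluster("status", "mycluster")` runs the same command as typing `pcluster status mycluster`.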
    
# list the clusters that already exist
(awscli) 🐙 Ravin$ pcluster list


# Create a cluster named mycluster from the configuration written by pcluster configure
(awscli) 🐙 Ravin$ pcluster create mycluster

Beginning cluster creation for cluster: mycluster
Creating stack named: parallelcluster-mycluster
Status: ParallelClusterPolicies - CREATE_COMPLETE

Status: MasterServerLaunchTemplate - CREATE_IN_PROGRESS

Status: ComputeFleet - CREATE_IN_PROGRESS

Status: parallelcluster-mycluster - CREATE_COMPLETE
MasterPublicIP: *****************
ClusterUser: ec2-user
MasterPrivateIP: **************
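The create output (like the status output) is a series of `Key: value` lines. A small parser can pull out values such as MasterPublicIP for later use. This is a sketch that assumes the line format shown above; `parse_pcluster_output` is a hypothetical name.

```python
def parse_pcluster_output(text: str) -> dict:
    """Collect `Key: value` lines from pcluster create/status output into a dict.

    Lines whose key contains spaces (e.g. "Creating stack named: ...") are skipped.
    Later lines overwrite earlier ones, so repeated `Status:` lines keep the last value.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip()
        if sep and key and " " not in key:
            info[key] = value.strip()
    return info
```

Because later lines win, `info["Status"]` holds the final status line once creation finishes.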


### Check the created cluster
(awscli) 🐙 Ravin$ pcluster list
mycluster CREATE_COMPLETE 2.6.1
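`pcluster list` prints one line per cluster: the name, the CloudFormation stack status, and the ParallelCluster version. Assuming that three-column layout, the rows can be split like this (`ClusterRow` and `parse_pcluster_list` are hypothetical names):

```python
from typing import NamedTuple


class ClusterRow(NamedTuple):
    name: str
    status: str
    version: str


def parse_pcluster_list(text: str) -> list:
    """Split `pcluster list` output (name, stack status, version per line) into rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skip blank or unexpected lines
            rows.append(ClusterRow(*fields))
    return rows
```

This is handy at the end of a session to confirm that no cluster is still up before logging off.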

### Connect to the master node over SSH
(awscli) 🐙 Ravin$ pcluster ssh mycluster -i testslurm.pem


Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '******' (ECDSA) to the list of known hosts.
Last login: Wed Apr 29 16:06:58 2020

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/
7 package(s) needed for security, out of 10 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-31-92-148 ~]$ sbatch -h
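`pcluster ssh` looks up the master node's public IP and logs in as the cluster user (ec2-user here), passing extra arguments such as `-i testslurm.pem` on to ssh. If you already have MasterPublicIP from the create output, a roughly equivalent plain ssh command can be built like this (a sketch; `master_ssh_command` and `as_shell_string` are hypothetical helpers):

```python
import shlex


def master_ssh_command(master_ip: str, key_file: str, user: str = "ec2-user") -> list:
    """Build an ssh argv that logs in to the master node with the given key pair file."""
    return ["ssh", "-i", key_file, f"{user}@{master_ip}"]


def as_shell_string(argv: list) -> str:
    """Quote each argument so the command can be pasted into a terminal."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

ssh ignores private keys that other users can read, so run `chmod 400 testslurm.pem` first if it warns about an unprotected key file.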

Managing the running cluster


(awscli) 🐙 Ravin$ pcluster status mycluster

Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: *******
ClusterUser: ec2-user
MasterPrivateIP: ***********
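Cluster creation and deletion take several minutes, so a script usually polls the status until it reaches a target value. A generic polling sketch follows; the timeout and interval defaults are arbitrary assumptions, and `get_status` is any callable you supply, for example one that runs `pcluster status mycluster` and returns the text after `Status:`.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str,
                    timeout: float = 1800, interval: float = 30) -> bool:
    """Poll get_status() until it returns target.

    Returns True on success, or False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # wait before asking again
```

Passing the callable in keeps the loop testable without a real cluster.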

# stop the compute fleet; the master node keeps running
(awscli) 🐙 Ravin$ pcluster stop mycluster
Stopping compute fleet : mycluster


## Delete cluster
(awscli) 🐙 Ravin$ pcluster delete mycluster
Deleting: mycluster
Status: DynamoDBTable - DELETE_COMPLETE
Cluster deleted successfully.

Install Anaconda and create a conda env on the cluster

Copy the link address of an installer from the Anaconda installer archive: https://repo.anaconda.com/archive/index.html

The one I selected: https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh


[ec2-user@ip-172-31-86-199 ~]$ wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
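Before running the installer, it is worth checking the download against the hash published on the archive page, if a SHA256 hash is listed for it. On the master node, `sha256sum Anaconda3-2020.02-Linux-x86_64.sh` does this directly; the same check as a small Python 3 sketch (`sha256_of` and `verify_installer` are hypothetical helper names):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a large file in 1 MiB chunks so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_installer(path: str, expected_sha256: str) -> bool:
    """Compare against the published hash (whitespace and letter case ignored)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```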


[ec2-user@ip-172-31-86-199 ~]$ bash Anaconda3-2020.02-Linux-x86_64.sh

Anaconda3 will now be installed into this location:
/home/ec2-user/anaconda3
...
Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no] yes


(base) [ec2-user@ip-172-31-86-199 ~]$ source .bashrc


############# Test that conda works ########

(base) [ec2-user@ip-172-31-86-199 ~]$ conda -h

What's in test.sh?


########### test.sh #########

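#!/bin/bash

#SBATCH --job-name=test
#SBATCH --output=output.log
#SBATCH --ntasks=1
#SBATCH --time=00:01:00

# A non-interactive batch shell does not read ~/.bashrc (where conda init put its setup),
# so load conda's shell functions explicitly before activating the env
source /home/ec2-user/anaconda3/etc/profile.d/conda.sh
conda activate testslum
conda list

pwd; hostname; date

echo "Running test script on a single CPU core"
python /home/ec2-user/test.py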
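The #SBATCH lines are ordinary comments to bash but are read by sbatch as options, and sbatch stops looking for them at the first executable line, so they must stay at the top of the script. If you submit many similar jobs, generating the header keeps it consistent. The following is a sketch with a hypothetical `render_sbatch` helper that only accepts HH:MM:SS time limits (Slurm itself accepts more formats):

```python
import re

# This sketch only allows HH:MM:SS; Slurm also understands other --time formats.
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def render_sbatch(job_name: str, command: str, ntasks: int = 1,
                  time_limit: str = "00:01:00", output: str = "output.log") -> str:
    """Return the text of a minimal Slurm batch script with #SBATCH header lines."""
    if not TIME_RE.match(time_limit):
        raise ValueError("time_limit must look like HH:MM:SS")
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={output}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "",  # directives end here; commands follow
        command,
        "",
    ]
    return "\n".join(lines)
```

Writing the result to a file and passing that file to sbatch gives the same header as test.sh above.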

What's in test.py?


########### test.py #######################

#!/usr/bin/env python

import sys
print(sys.version)
print("I am testing slurm in aws cluster")
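Slurm exports variables such as SLURM_JOB_ID, SLURM_JOB_NAME, SLURM_NTASKS and SLURMD_NODENAME into the job's environment, so the test script can also report which job and node it ran on. An extended test.py could look like this (a sketch; `slurm_context` and `report` are hypothetical names, and the syntax works in the Python 3.6 env created in the next step):

```python
#!/usr/bin/env python
import os
import socket
import sys

# Variables Slurm sets inside a batch job; outside Slurm they are simply absent.
SLURM_VARS = ("SLURM_JOB_ID", "SLURM_JOB_NAME", "SLURM_NTASKS", "SLURMD_NODENAME")


def slurm_context(env=None) -> dict:
    """Pick the Slurm job variables out of the environment (missing ones map to None)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in SLURM_VARS}


def report() -> str:
    """Python version, host name, and Slurm job variables, one per line."""
    lines = [sys.version, f"host: {socket.gethostname()}"]
    for name, value in slurm_context().items():
        lines.append(f"{name}={value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("I am testing slurm in aws cluster")
    print(report())
```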

Create a conda env with the required packages, then run the code with Slurm


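(base) [ec2-user@ip-172-31-86-199 ~]$ conda create -n testslum python=3.6

(base) [ec2-user@ip-172-31-86-199 ~]$ conda activate testslum

# optional: r-sys is an R package; test.py only needs Python's built-in sys module
(testslum) [ec2-user@ip-172-31-86-199 ~]$ conda install -c conda-forge r-sys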

(testslum) [ec2-user@ip-172-31-86-199 ~]$ sbatch test.sh
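sbatch replies with a line like `Submitted batch job <id>`, or with just the ID (optionally followed by `;cluster`) when given `--parsable`. To track a job from a script, the ID can be extracted like this. This is a sketch: `parse_job_id` and `submit` are hypothetical helpers, written to run in the Python 3.6 env, hence `stdout=PIPE` instead of `capture_output`.

```python
import re
import subprocess

_SUBMITTED_RE = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job ID from sbatch output, with or without --parsable."""
    text = sbatch_stdout.strip()
    match = _SUBMITTED_RE.search(text)
    if match:
        return int(match.group(1))
    # --parsable prints "jobid" or "jobid;cluster"
    return int(text.split(";")[0])


def submit(script: str) -> int:
    """Submit a batch script with sbatch (must run where Slurm is installed)."""
    out = subprocess.run(
        ["sbatch", "--parsable", script],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_job_id(out)
```

With the ID, `squeue -j <id>` shows whether the job is still queued or running; the job's output goes to output.log, as set in test.sh.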

######### Shutting down the cluster ###########
(awscli) 🐙 Ravin$ pcluster list
(awscli) 🐙 Ravin$ pcluster stop mycluster
(awscli) 🐙 Ravin$ pcluster delete mycluster

Deleting: mycluster
Status: SQS - DELETE_COMPLETE
Cluster deleted successfully.


(awscli) 🐙 Ravin$

(awscli) 🐙 Ravin$ pcluster list