AWS ParallelCluster - Slurm Workload Manager
Ravin$ conda create -n awscli
Ravin$ conda activate awscli
(awscli) Ravin$ conda install -c conda-forge awscli
(awscli) Ravin$ aws configure
AWS Access Key ID: ***************
AWS Secret Access Key: *****************
Default region name [us-east-1]: us-east-1
Default output format [None]:
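As a rough illustration of what `aws configure` leaves behind: by default the AWS CLI stores the answers in two INI-style files, `~/.aws/credentials` (the key pair) and `~/.aws/config` (region and output format). The sketch below parses placeholder contents in that layout. The key values and the `read_profile` helper are made up for illustration and are not part of the AWS CLI.

```python
import configparser

# Placeholder file contents in the layout `aws configure` writes by default.
# The key values are invented; never paste real credentials into code.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRET
"""

SAMPLE_CONFIG = """\
[default]
region = us-east-1
"""


def read_profile(credentials_text, config_text, profile="default"):
    """Report which settings a profile defines, without exposing the secrets."""
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)
    conf = configparser.ConfigParser()
    conf.read_string(config_text)
    return {
        "key_id_set": creds.has_option(profile, "aws_access_key_id"),
        "secret_set": creds.has_option(profile, "aws_secret_access_key"),
        "region": conf.get(profile, "region", fallback=None),
    }


summary = read_profile(SAMPLE_CREDENTIALS, SAMPLE_CONFIG)
print(summary)
```

To inspect a real setup, the same function could be fed the text of the two files under `~/.aws/`.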
Some features of the AWS CLI
(awscli) Ravin$ aws iam help
(awscli) Ravin$ aws iam list-users
(awscli) Ravin$ aws iam get-user
(awscli) Ravin$ aws s3 ls
Install aws-parallelcluster
(awscli) Ravin$ conda install -c conda-forge aws-parallelcluster
More information: https://github.com/aws/aws-parallelcluster
Configure settings for the cluster
(awscli) Ravin$ pcluster configure
AWS Region ID [us-east-1]: us-east-1
EC2 Key Pair Name [testslurm]: 1
Allowed values for Scheduler: 3 # 3. slurm
Allowed values for Operating System: 1 # 1. alinux
Minimum cluster size (instances) [2]: 1 # smallest number of compute instances for the cluster
Maximum cluster size (instances) [10]: 10
Master instance type [t2.micro]: t2.micro
Compute instance type [t2.micro]: t2.micro
Automate VPC creation? (y/n) [n]: n # Enter 'n' if you already have a VPC suitable for the cluster.
VPC ID [vpc-3ac9c740]: 1
Automate Subnet creation? (y/n) [y]: n
Allowed values for Master Subnet ID:
Master Subnet ID [subnet-06d2d9837abd57fd4]: 1
Compute Subnet ID [subnet-0b41b85f10cb527de]: 1
Configuration file written to /Users/ravinpoudel/.parallelcluster/config
You can edit your configuration file or simply run 'pcluster create -c /Users/ravinpoudel/.parallelcluster/config cluster-name' to create your cluster
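The file written by `pcluster configure` is an INI-style text file, so it can be checked programmatically before creating the cluster. The sketch below is an assumption about what a ParallelCluster 2.x config typically looks like for the answers given above: the section and key names follow the 2.x configuration format as I understand it and may differ in other versions, the VPC and subnet IDs are placeholders, and `summarize_cluster` is a hypothetical helper.

```python
import configparser

# Assumed example of a ParallelCluster 2.x config file.
# VPC and subnet IDs are placeholders, not values from a real account.
SAMPLE_PCLUSTER_CONFIG = """\
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default

[cluster default]
key_name = testslurm
scheduler = slurm
base_os = alinux
initial_queue_size = 1
max_queue_size = 10
master_instance_type = t2.micro
compute_instance_type = t2.micro
vpc_settings = default

[vpc default]
vpc_id = vpc-EXAMPLE
master_subnet_id = subnet-EXAMPLE1
compute_subnet_id = subnet-EXAMPLE2
"""


def summarize_cluster(config_text):
    """Return the main settings of the cluster template named in [global]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    template = parser.get("global", "cluster_template")
    cluster = parser["cluster " + template]
    return {
        "region": parser.get("aws", "aws_region_name"),
        "scheduler": cluster["scheduler"],
        "min_nodes": cluster.getint("initial_queue_size"),
        "max_nodes": cluster.getint("max_queue_size"),
        "compute_type": cluster["compute_instance_type"],
    }


cluster_summary = summarize_cluster(SAMPLE_PCLUSTER_CONFIG)
print(cluster_summary)
```

Reading the real file instead only requires replacing `read_string` with `parser.read(path)` on the path printed by `pcluster configure`.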
Subcommands that can be passed to pcluster
create     Creates a new cluster.
update     Updates a running cluster using the values in the config file.
delete     Deletes a cluster.
start      Starts the compute fleet for a cluster that has been stopped.
stop       Stops the compute fleet, leaving the master server running.
status     Pulls the current status of the cluster.
list       Displays a list of stacks associated with AWS ParallelCluster.
instances  Displays a list of all instances in a cluster.
ssh        Connects to the master instance using SSH.
createami  (Linux/macOS) Creates a custom AMI to use with AWS ParallelCluster.
configure  Starts the AWS ParallelCluster configuration.
version    Displays the version of AWS ParallelCluster.
dcv        Provides access to NICE DCV features.
# List the existing clusters
(awscli) Ravin$ pcluster list
# Create a cluster from the configuration defined above
(awscli) Ravin$ pcluster create mycluster
Beginning cluster creation for cluster: mycluster
Creating stack named: parallelcluster-mycluster
Status: ParallelClusterPolicies - CREATE_COMPLETE
Status: MasterServerLaunchTemplate - CREATE_IN_PROGRESS
Status: ComputeFleet - CREATE_IN_PROGRESS
Status: parallelcluster-mycluster - CREATE_COMPLETE
MasterPublicIP: *****************
ClusterUser: ec2-user
MasterPrivateIP: **************
### Check the created cluster
(awscli) Ravin$ pcluster list
mycluster CREATE_COMPLETE 2.6.1
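For scripting, the `pcluster list` output can be split into fields. This is a minimal sketch that assumes the three-column layout shown above (name, stack status, version); `parse_cluster_list` is a hypothetical helper, not part of ParallelCluster.

```python
def parse_cluster_list(output):
    """Parse `pcluster list` text into a list of dicts, one per cluster."""
    clusters = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines or lines in an unexpected format
        name, status, version = fields
        clusters.append({"name": name, "status": status, "version": version})
    return clusters


# Sample line copied from the session above.
sample_output = "mycluster CREATE_COMPLETE 2.6.1\n"
clusters = parse_cluster_list(sample_output)

# Names of clusters whose stack finished creating and can be connected to.
ready = [c["name"] for c in clusters if c["status"] == "CREATE_COMPLETE"]
print(ready)
```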
### Connect to the cluster over SSH
(awscli) Ravin$ pcluster ssh mycluster -i testslurm.pem
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '******' (ECDSA) to the list of known hosts.
Last login: Wed Apr 29 16:06:58 2020
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/
7 package(s) needed for security, out of 10 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-172-31-92-148 ~]$ sbatch -h
Running the cluster
(awscli) Ravin$ pcluster status mycluster
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: *******
ClusterUser: ec2-user
MasterPrivateIP: ***********
(awscli) Ravin$ pcluster stop mycluster
Stopping compute fleet : mycluster
## Delete the cluster
(awscli) Ravin$ pcluster delete mycluster
Deleting: mycluster
Status: DynamoDBTable - DELETE_COMPLETE
Cluster deleted successfully.
Create a conda environment on the cluster
Copy the link address of an installer from the Anaconda installer archive: https://repo.anaconda.com/archive/index.html
The installer used here: https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
[ec2-user@ip-172-31-86-199 ~]$ wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
[ec2-user@ip-172-31-86-199 ~]$ bash Anaconda3-2020.02-Linux-x86_64.sh
Anaconda3 will now be installed into this location:
/home/ec2-user/anaconda3
...
Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no] **Yes**
(base) [ec2-user@ip-172-31-86-199 ~]$ source .bashrc
############# Test that conda works ########
(base) [ec2-user@ip-172-31-86-199 ~]$ conda -h
What's in test.sh?
########### test.sh #########
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=output.log
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
source activate testslum
conda list
pwd; hostname; date
echo "Running test script on a single CPU core"
python /home/ec2-user/test.py
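The `#SBATCH` lines at the top of test.sh are options read by `sbatch`: `--job-name` labels the job, `--output` names the file that receives the job's output, `--ntasks=1` requests a single task, and `--time=00:01:00` sets a one-minute wall-clock limit. As a sketch of how such a script could be generated rather than typed by hand, the hypothetical helper `render_sbatch_script` below rebuilds the header from a dictionary; it is an illustration, not a Slurm API.

```python
def render_sbatch_script(directives, commands):
    """Build a Slurm batch script from #SBATCH options and shell commands."""
    lines = ["#!/bin/bash"]
    for option, value in directives.items():
        # Each entry becomes one long-form sbatch option, e.g. --ntasks=1.
        lines.append(f"#SBATCH --{option}={value}")
    lines.extend(commands)
    return "\n".join(lines) + "\n"


# Same options and commands as test.sh above.
script_text = render_sbatch_script(
    {"job-name": "test", "output": "output.log", "ntasks": 1, "time": "00:01:00"},
    ["source activate testslum", "python /home/ec2-user/test.py"],
)
print(script_text)
```

Writing `script_text` to a file and passing that file to `sbatch` would submit an equivalent job.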
What's in test.py?
########### test.py #######################
#!/usr/bin/env python
import sys
print(sys.version)
print("I am testing slurm in aws cluster")
Create a conda environment with the required packages, then run the code in Slurm
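(base) [ec2-user@ip-172-31-86-199 ~]$ conda create -n testslum python=3.6
(base) [ec2-user@ip-172-31-86-199 ~]$ conda activate testslum
(testslum) [ec2-user@ip-172-31-86-199 ~]$ conda install -c conda-forge r-sys
# r-sys is an R package; test.py only needs sys from the Python standard library, so this install is optional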
(testslum) [ec2-user@ip-172-31-86-199 ~]$ sbatch test.sh
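On success, `sbatch` prints a line of the form `Submitted batch job <id>`. The sketch below shows one way to capture that ID from Python so later steps can refer to the job. `parse_job_id` and `submit` are hypothetical helpers, and the job number in the demonstration is made up; `submit` only works on a host where Slurm is installed, such as the cluster's master instance.

```python
import re
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def submit(script_path):
    """Run sbatch on a Slurm host and return the ID of the new job."""
    result = subprocess.run(
        ["sbatch", script_path], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


# Offline demonstration with an invented job number; no Slurm needed here.
job_id = parse_job_id("Submitted batch job 2\n")
print(job_id)
```

Once the job is submitted, `squeue` lists it while it is pending or running, and its output ends up in output.log, the file named by `--output` in test.sh.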
######### Ending the cluster ###########
(awscli) Ravin$ pcluster list
(awscli) Ravin$ pcluster stop mycluster
(awscli) Ravin$ pcluster delete mycluster
Deleting: mycluster
Status: SQS - DELETE_COMPLETE
Cluster deleted successfully.
(awscli) Ravin$ pcluster list