Learning objectives

  1. Understanding Basic R Operations:
    • Learn to perform basic arithmetic operations such as addition, subtraction, multiplication, division, and exponentiation in R.
    • Understand how to assign values to variables.
    • Become familiar with different methods of subsetting vectors, including indexing, logical conditions, and functions like subset().
  2. Working with Basic Data Structures in R:
    • Gain knowledge about fundamental data structures in R such as vectors, matrices, arrays, lists, and data frames.
    • Understand how to create and manipulate these data structures.
    • Learn about subsetting data frames based on specific conditions using logical operators (&, |, !).
  3. Logical Operators in Data Subsetting:
    • Understand the usage of logical operators (& for “AND”, | for “OR”, ! for “NOT”) in subsetting data frames.
    • Learn to combine logical operators to create complex conditions for data subsetting.
  4. Practical Application:
    • Apply the learned concepts in practical scenarios such as subsetting data frames based on multiple conditions.
    • Execute the R code snippets provided in the document to practice and reinforce understanding.


Getting started

Open R on your computer and look at the command window. It displays a “>” symbol, referred to as the “prompt”. The prompt indicates that R is idle and waiting for user input, that is, for commands to execute. In general, each command in R generates output (some commands explicitly suppress their output, as we will see later). If you type a command at the prompt and press Enter, R executes it immediately. If execution finishes without errors, the output of your command is printed on the next line(s) of the command window, right after the command itself.

Note also that the symbol “#” indicates the start of a comment in R: everything from that symbol through the end of the line is completely ignored by R.

We will often use this feature to add inline comments to the code snippets. Whether you copy just the command from these notes or the full line (command + comment) into your R session, both will execute and produce exactly the same result (try it, and make sure you do not copy the prompt itself!). For example, in the snippet below the first line is the command followed by a comment, and the line starting with ## shows the output that R prints:

2 + 2 # comment, ignored by R; command in this line adds 2 and 2
## [1] 4

In R, you can perform a wide range of operations for data manipulation, mathematical computations, statistical analysis, and more. Here’s a brief overview of some of the basic operations you can perform in R:

Arithmetic Operations:

R supports basic arithmetic operations such as addition, subtraction, multiplication, division, and exponentiation.

# Addition
3 + 5
## [1] 8
# Subtraction
10 - 3
## [1] 7
# Multiplication
4 * 6
## [1] 24
# Division
20 / 5
## [1] 4
# Exponentiation
2 ^ 3
## [1] 8
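
When several operators appear in one expression, R follows the usual mathematical order of operations. The short sketch below is an illustration added to these notes (the particular numbers are arbitrary) and shows how parentheses change the result:

```r
# Multiplication is evaluated before addition
2 + 3 * 4      # Output: 14
# Parentheses force the addition to happen first
(2 + 3) * 4    # Output: 20
# Exponentiation is evaluated before multiplication
2 * 3 ^ 2      # Output: 18
```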

Assignment:

Assign values to variables using <- or =.

x <- 10
y = 5
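
Assignment is one of the commands that prints nothing. As a small illustration (the names z and w are made up for this example), you can display a stored value by typing the variable name, or wrap the assignment in parentheses to assign and print in one step:

```r
x <- 10
y = 5

x             # typing a name prints its value; Output: 10
x + y         # variables can be used in arithmetic; Output: 15

z <- x * y    # assignment itself prints nothing
z             # Output: 50

(w <- x - y)  # parentheses around an assignment also print the value; Output: 5
```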

Subsetting:

Subsetting vectors in R allows you to extract specific elements or subsets of elements from a vector based on certain criteria. There are different ways to subset vectors, including indexing, logical conditions, and using functions like subset(). Here are some examples:

# Create a numeric vector
nums <- c(10, 20, 30, 40, 50)

# Check the number of elements in the vector
length(nums)
## [1] 5
# Check the type of the elements in the vector
mode(nums)
## [1] "numeric"
# Extract the second element of the vector
nums[2]  # Output: 20
## [1] 20
# Extract elements 2 to 4
nums[2:4]  # Output: 20 30 40
## [1] 20 30 40

Indexing

You can subset a vector using square brackets [ ] with numeric indices indicating the positions of the elements you want to extract.
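
The indices do not have to be a single position or a consecutive range. The following sketch, which reuses the nums vector from above, suggests a few other patterns you may want to try:

```r
nums <- c(10, 20, 30, 40, 50)  # same vector as above

# Pick arbitrary positions with a vector of indices
nums[c(1, 5)]   # Output: 10 50
# The order of the indices determines the order of the result
nums[c(5, 1)]   # Output: 50 10
# Asking for a position beyond the end of the vector gives NA
nums[6]         # Output: NA
```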

Logical Indexing

You can use logical conditions to subset a vector based on certain criteria.

# Create a logical vector
logic <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

# Check the type of the elements
mode(logic)
## [1] "logical"
is.logical(logic)
## [1] TRUE
# Subset elements where the corresponding logical value is TRUE
nums[logic]  # Output: 10 30 50
## [1] 10 30 50
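
In practice the logical vector is usually not typed by hand but produced by a condition. A minimal sketch, again using nums (the threshold 25 is an arbitrary choice for illustration):

```r
nums <- c(10, 20, 30, 40, 50)  # same vector as above

# A comparison returns one TRUE/FALSE per element
nums > 25         # Output: FALSE FALSE TRUE TRUE TRUE
# Use the comparison directly inside the brackets
nums[nums > 25]   # Output: 30 40 50
# which() reports the positions where the condition is TRUE
which(nums > 25)  # Output: 3 4 5
```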

Negative Indexing

You can use negative indices to exclude certain elements from the vector.

# Exclude the third element from the vector
nums[-3]  # Output: 10 20 40 50
## [1] 10 20 40 50
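
Negative indices can also exclude several elements at once. The sketch below is an added illustration using nums; note that subsetting returns a new vector and leaves the original unchanged:

```r
nums <- c(10, 20, 30, 40, 50)  # same vector as above

# Exclude the first two elements
nums[-c(1, 2)]        # Output: 30 40 50
# Exclude the last element, whatever the length of the vector
nums[-length(nums)]   # Output: 10 20 30 40
# The original vector still has all five elements
length(nums)          # Output: 5
```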

Using Functions

You can use functions like subset() to subset vectors based on conditions.

# Create a vector
fruits <- c("apple", "banana", "orange", "grape", "kiwi")

# Check the type of the elements
mode(fruits)
## [1] "character"
is.character(fruits)
## [1] TRUE
# Subset elements that start with the letter 'a'
subset(fruits, substr(fruits, 1, 1) == "a")  # Output: apple
## [1] "apple"
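
For vectors, subset() gives the same result as logical indexing with brackets, with one difference worth knowing: it drops elements for which the condition is NA. The sketch below illustrates this; the vector v is a made-up example and not part of the original notes:

```r
nums <- c(10, 20, 30, 40, 50)  # same vector as above

# subset() and bracket indexing agree when there are no missing values
subset(nums, nums > 20 & nums < 50)   # Output: 30 40
nums[nums > 20 & nums < 50]           # Output: 30 40

# Hypothetical vector with a missing value
v <- c(5, NA, 15)
v[v > 8]         # brackets keep the NA; Output: NA 15
subset(v, v > 8) # subset() drops it;    Output: 15
```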

These are just a few examples of how you can subset vectors in R. Subsetting is a powerful feature that allows you to extract specific elements or subsets of elements from vectors, which is essential for data manipulation and analysis.

Basic data structures in R

In R, there are several basic data structures that are fundamental for storing and manipulating data. These data structures include:

Vectors

Vectors are one-dimensional arrays that can hold elements of the same data type, such as numeric, character, logical, etc.

numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "orange")
logical_vector <- c(TRUE, FALSE, TRUE)
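
Because all elements of a vector must share one type, R silently converts mixed input to a common type. A small sketch of this behaviour (the names mixed and num_logical are chosen only for this example):

```r
# Mixing numbers, text, and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
mixed         # Output: "1" "a" "TRUE"
mode(mixed)   # Output: "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_logical <- c(1, TRUE, FALSE)
num_logical        # Output: 1 1 0
mode(num_logical)  # Output: "numeric"
```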

Matrices

Matrices are two-dimensional arrays with rows and columns that contain elements of the same data type.

my_matrix <- matrix(1:9, nrow = 3, ncol = 3)

# check the dimensions of the matrix object
dim(my_matrix)
## [1] 3 3
# inspect the structure and elements of the matrix object
str(my_matrix)
##  int [1:3, 1:3] 1 2 3 4 5 6 7 8 9

Here dim(my_matrix) returns 3 3: the first value is the number of rows and the second is the number of columns.
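
The str() output above lists the values in the order 1 2 3 4 ..., which reflects that matrix() fills column by column by default. The following sketch shows how to pick out elements by row and column, and how byrow = TRUE changes the fill order (by_row is a name introduced here for illustration):

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)  # same matrix as above

# Element in row 2, column 3
my_matrix[2, 3]   # Output: 8
# Entire first row (leave the column index empty)
my_matrix[1, ]    # Output: 1 4 7
# Entire second column (leave the row index empty)
my_matrix[, 2]    # Output: 4 5 6

# Fill row by row instead of column by column
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
by_row[1, ]       # Output: 1 2 3
```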

Arrays

Arrays are multi-dimensional extensions of matrices that can have more than two dimensions.

my_array <- array(1:12, dim = c(3, 2, 2))
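
An array is indexed like a matrix, with one index per dimension. As a brief sketch using my_array from above (which can be read as two 3 x 2 matrices stacked along the third dimension):

```r
my_array <- array(1:12, dim = c(3, 2, 2))  # same array as above

# Three dimensions: 3 rows, 2 columns, 2 "layers"
dim(my_array)         # Output: 3 2 2
# Element in row 1, column 2 of the second layer
my_array[1, 2, 2]     # Output: 10
# The whole first layer is a 3 x 2 matrix holding the values 1 to 6
dim(my_array[, , 1])  # Output: 3 2
```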

Lists

Lists are collections of elements that can be of different data types, including other lists.

my_list <- list(
   numeric_vector = c(1, 2, 3),
   character_vector = c("apple", "banana", "orange"),
   nested_list = list(inner_numeric_vector = c(4, 5, 6))
)

# check the length of the list object
length(my_list)
## [1] 3
# inspect the structure of the list and the type of each component
str(my_list)
## List of 3
##  $ numeric_vector  : num [1:3] 1 2 3
##  $ character_vector: chr [1:3] "apple" "banana" "orange"
##  $ nested_list     :List of 1
##   ..$ inner_numeric_vector: num [1:3] 4 5 6
# get the names of the components in the list
names(my_list)
## [1] "numeric_vector"   "character_vector" "nested_list"
# select a component from the list object
# by position (single brackets return a list containing that component)
my_list[1]
## $numeric_vector
## [1] 1 2 3
# or by name
my_list$numeric_vector
## [1] 1 2 3
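
Notice that my_list[1] printed the component together with its name, while my_list$numeric_vector printed only the values. Single brackets return a smaller list, whereas double brackets (or $) return the component itself. A short sketch of the difference:

```r
my_list <- list(
   numeric_vector = c(1, 2, 3),
   character_vector = c("apple", "banana", "orange"),
   nested_list = list(inner_numeric_vector = c(4, 5, 6))
)  # same list as above

class(my_list[1])     # still a list;          Output: "list"
class(my_list[[1]])   # the component itself;  Output: "numeric"

# Double brackets also accept a name, and the result can be indexed further
my_list[["character_vector"]][2]          # Output: "banana"
# $ can be chained to reach into a nested list
my_list$nested_list$inner_numeric_vector  # Output: 4 5 6
```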

Data Frames

Data frames are two-dimensional structures similar to matrices, but columns can contain elements of different data types. They are commonly used for storing structured data.

my_df <- data.frame(
   Name = c("Alice", "Bob", "Charlie"),
   Age = c(25, 30, 35),
   Married = c(TRUE, FALSE, TRUE)
)

# check the dimensions of the data frame
dim(my_df)
## [1] 3 3
# inspect the elements
str(my_df)
## 'data.frame':    3 obs. of  3 variables:
##  $ Name   : chr  "Alice" "Bob" "Charlie"
##  $ Age    : num  25 30 35
##  $ Married: logi  TRUE FALSE TRUE
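
Each column of a data frame is a vector, so the vector tools from earlier apply to it. The sketch below shows a few common ways to look inside my_df; stringsAsFactors = FALSE is added here as a precaution so that Name stays a character column on older R versions too, matching the str() output above:

```r
my_df <- data.frame(
   Name = c("Alice", "Bob", "Charlie"),
   Age = c(25, 30, 35),
   Married = c(TRUE, FALSE, TRUE),
   stringsAsFactors = FALSE  # assumption: keep Name as character in any R version
)

nrow(my_df)         # number of rows;     Output: 3
names(my_df)        # column names;       Output: "Name" "Age" "Married"
my_df$Age           # one column, by name; Output: 25 30 35
mean(my_df$Age)     # columns are vectors; Output: 30
my_df[2, "Name"]    # row 2 of column Name; Output: "Bob"
```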

Subsetting data frames

# Subset rows where Age is greater than 30
subset_df <- my_df[my_df$Age > 30, ]

# Subset only the Name and Married columns
subset_df <- my_df[, c("Name", "Married")]

# Subset rows where Age is greater than 25 and Married is TRUE
subset_df <- my_df[my_df$Age > 25 & my_df$Married == TRUE, ]

# Subset rows where Age is less than or equal to 30
subset_df <- subset(my_df, Age <= 30)
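
Rows and columns can also be selected in a single step. As a sketch (the result name young is made up for this example, and stringsAsFactors = FALSE is added only to keep Name a character column in any R version), subset() accepts a select argument for columns, and brackets accept a row condition and column names together:

```r
my_df <- data.frame(
   Name = c("Alice", "Bob", "Charlie"),
   Age = c(25, 30, 35),
   Married = c(TRUE, FALSE, TRUE),
   stringsAsFactors = FALSE
)

# Keep rows with Age <= 30 and only the Name and Age columns
young <- subset(my_df, Age <= 30, select = c(Name, Age))
young$Name   # Output: "Alice" "Bob"
ncol(young)  # Output: 2

# Bracket form: row condition before the comma, column name after it
my_df[my_df$Age > 30, "Name"]   # Output: "Charlie"
```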

Logical operators

In R, you can use the logical operators & for “AND”, | for “OR”, and ! for “NOT” when subsetting data frames. Here are examples demonstrating their use:

  1. “AND” (&):
# Subset rows where Age is greater than 25 AND Married is TRUE
subset_df <- my_df[my_df$Age > 25 & my_df$Married == TRUE, ]
subset_df
##      Name Age Married
## 3 Charlie  35    TRUE
  2. “OR” (|):
# Subset rows where Age is less than 30 OR Married is TRUE
subset_df <- my_df[my_df$Age < 30 | my_df$Married == TRUE, ]
subset_df
##      Name Age Married
## 1   Alice  25    TRUE
## 3 Charlie  35    TRUE
  3. “NOT” (!), used here in its “not equal to” form !=:
# Subset rows where Name is NOT "Bob"
subset_df <- my_df[my_df$Name != "Bob", ]
subset_df
##      Name Age Married
## 1   Alice  25    TRUE
## 3 Charlie  35    TRUE
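
The ! operator can also be applied on its own to a logical column or to a whole condition. A minimal sketch using my_df (the result names not_married and not_over_25 are introduced only for this illustration):

```r
my_df <- data.frame(
   Name = c("Alice", "Bob", "Charlie"),
   Age = c(25, 30, 35),
   Married = c(TRUE, FALSE, TRUE),
   stringsAsFactors = FALSE
)

# ! flips every TRUE to FALSE and every FALSE to TRUE
!my_df$Married     # Output: FALSE TRUE FALSE

# Rows where Married is NOT TRUE
not_married <- my_df[!my_df$Married, ]
not_married$Name   # Output: "Bob"

# Negating a whole condition: NOT (Age > 25) is the same as Age <= 25
not_over_25 <- my_df[!(my_df$Age > 25), ]
not_over_25$Name   # Output: "Alice"
```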

We can combine these logical operators to create more complex conditions when subsetting data frames in R. For example:

Let’s say we want to subset my_df to include rows where the Age is greater than 25 OR (Married is TRUE AND Name is not “Bob”):

# Subset rows where Age is greater than 25 OR (Married is TRUE AND Name is not "Bob")
subset_df <- my_df[my_df$Age > 25 | (my_df$Married == TRUE & my_df$Name != "Bob"), ]
subset_df
##      Name Age Married
## 1   Alice  25    TRUE
## 2     Bob  30   FALSE
## 3 Charlie  35    TRUE
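
The parentheses in the condition above matter for readability and, in other cases, for the result: R evaluates & before |, much as multiplication is evaluated before addition. The sketch below compares two groupings of the same three tests (the names a and b are chosen only for this illustration):

```r
my_df <- data.frame(
   Name = c("Alice", "Bob", "Charlie"),
   Age = c(25, 30, 35),
   Married = c(TRUE, FALSE, TRUE),
   stringsAsFactors = FALSE
)

# Without parentheses, & is applied first: Age > 25 OR (Married AND not Bob)
a <- my_df$Age > 25 | my_df$Married & my_df$Name != "Bob"
my_df$Name[a]   # Output: "Alice" "Bob" "Charlie"

# Grouping the OR first gives a different condition: (Age > 25 OR Married) AND not Bob
b <- (my_df$Age > 25 | my_df$Married) & my_df$Name != "Bob"
my_df$Name[b]   # Output: "Alice" "Charlie"
```

Writing the parentheses explicitly, as in the example above, avoids having to remember this rule.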