Hey guys, what’s up? In this R programming tutorial I’ll teach you about data frames in R: how to create them, how to access their elements, and how to modify them in a program. So, let’s start.
What is a Data Frame in R?
In the R programming language, a data frame is a two-dimensional data structure. It is a special case of a list in which every component has the same length: each row holds one observation and each column holds one measurement (variable). The vectors we pass must therefore all have the same length. But don’t forget that they may contain different types of data.
# Author: Programiz
int_vec <- c(1, 2, 3)
char_vec <- c("a", "b", "c")
bool_vec <- c(TRUE, TRUE, FALSE)
data_frame <- data.frame(int_vec, char_vec, bool_vec)
Characteristics of R Data Frame
Let’s discuss the characteristics of a data frame in the R programming language.
- Column names have to be non-empty.
- Row names have to be unique.
- A data frame can hold numeric, character, factor, or logical data.
- Each column has to contain the same number of data items.
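The rules above can be checked directly. The sketch below is only an illustration with made-up values. It uses base R alone and wraps the failing calls in tryCatch() so the script keeps running after each error.

```r
# Columns of unequal length are rejected by data.frame()
# (lengths 3 and 2 cannot be recycled to a common length).
unequal <- tryCatch(
  data.frame(a = 1:3, b = c("x", "y")),
  error = function(e) "error"
)

# Duplicate row names are rejected as well.
df <- data.frame(a = 1:2, b = c("x", "y"))
dup_rows <- tryCatch(
  { rownames(df) <- c("r1", "r1"); "ok" },
  error = function(e) "error"
)

print(unequal)   # "error"
print(dup_rows)  # "error"
```

R also prints a warning about the non-unique row name before raising the error; only the error is caught here.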
To check whether a variable is a data frame, we can use the class() function.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> typeof(x)    # data frame is a special case of list
[1] "list"
> class(x)
[1] "data.frame"
In the example above, x is a list of three components, and each component is a vector with two elements. There are some useful functions for data frames that you should know about. I am going to show them below:
> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x)    # returns length of the list, same as ncol()
[1] 3
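Besides class(), base R offers is.data.frame() for the same check, and dim() returns the row and column counts in one call. A small sketch, rebuilding x with the values shown above:

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"))

is.data.frame(x)   # TRUE
is.list(x)         # TRUE, a data frame is also a list
dim(x)             # 2 3  (rows, columns)

# length() counts list components, i.e. columns, not rows
length(x) == ncol(x)   # TRUE
```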
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x)    # structure of x
'data.frame': 2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
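Here, the third column, Name, is a factor instead of a character vector. This output comes from an R version earlier than 4.0.0, where data.frame() converted character vectors into factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so Name would already be a character vector. In older versions, we can pass the argument stringsAsFactors = FALSE to suppress the conversion: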
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
> str(x)    # now the third column is a character vector
'data.frame': 2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
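The default value of stringsAsFactors changed in R 4.0.0 (from TRUE to FALSE), so a script that has to behave the same on every version can set the argument explicitly. A minimal sketch with the same example names:

```r
as_factor <- data.frame(Name = c("John", "Dora"), stringsAsFactors = TRUE)
as_char   <- data.frame(Name = c("John", "Dora"), stringsAsFactors = FALSE)

class(as_factor$Name)   # "factor"
class(as_char$Name)     # "character"

# An existing factor column can be converted back to character.
as_factor$Name <- as.character(as_factor$Name)
class(as_factor$Name)   # "character"
```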
The R programming language has many data input functions, such as read.table(), read.csv(), read.delim(), and read.fwf(). These functions read data from a file and return it as a data frame.
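As a quick illustration of that, the sketch below calls read.csv() on inline text instead of a real file, so it runs without any data on disk. The CSV content is made up for the example; with a real file you would pass its path as the first argument.

```r
# Inline CSV text stands in for a file on disk (hypothetical data).
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

people <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(people)   # "data.frame"
str(people)     # shows the type of each column
```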
To access the columns of a data frame, we can use the [, [[ or $ operator. [[ and $ behave similarly. Indexing with [ returns a data frame, whereas [[ and $ reduce the result to a vector.
> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"
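The difference between the three operators is easiest to see by checking the class of each result. A short sketch, rebuilding x with the same values as above:

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

class(x["Age"])     # "data.frame"
class(x[["Age"]])   # "numeric"
class(x$Age)        # "numeric"

# [ can also select several columns at once; [[ and $ cannot.
x[c("SN", "Name")]
```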
By providing an index for both row and column, we can access a data frame like a matrix. To list the available datasets, we can use the command library(help = "datasets"). Below, we use the trees dataset of black cherry trees, which contains the columns Girth, Height and Volume. We can use the str() and head() functions to examine a data frame.
> str(trees)
'data.frame': 31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> head(trees,n=3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
> trees[2:3,]    # select 2nd and 3rd row
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
> trees[trees$Height > 82,]    # selects rows with Height greater than 82
   Girth Height Volume
6   10.8     83   19.7
17  12.9     85   33.8
18  13.3     86   27.4
31  20.6     87   77.0
> trees[10:12,2]
[1] 75 79 76
Since we extracted data from a single column, the value returned in the last case is a vector. We can avoid this behavior by passing the argument drop = FALSE, as follows.
> trees[10:12,2, drop = FALSE]
   Height
10     75
11     79
12     76
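Row conditions and drop = FALSE can be combined in one call. The sketch below uses the same trees dataset and compares the two return types; the variable names tall and tall_df are just illustrative.

```r
# trees ships with base R (package datasets).
tall    <- trees[trees$Height > 82, "Girth"]                # numeric vector
tall_df <- trees[trees$Height > 82, "Girth", drop = FALSE]  # one-column data frame

is.vector(tall)         # TRUE
is.data.frame(tall_df)  # TRUE
nrow(tall_df)           # 4
```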
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x    # change Age in row 1
  SN Age Name
1  1  20 John
2  2  15 Dora
> rbind(x,list(1,16,"Paul"))    # add a row
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
> cbind(x,State=c("NY","FL"))    # add a column
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x$State <- c("NY","FL"); x    # another way to add a column
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
> x$State <- NULL    # delete a column
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x <- x[-1,]    # delete the first row
> x
  SN Age Name
2  2  15 Dora
Slice Data Frame
Assume our data frame looks like this:
  ID       items store price
1 10        book  TRUE   2.5
2 20         pen FALSE   8.0
3 30    textbook  TRUE  10.0
4 40 pencil_case FALSE   7.0
We can slice a data frame by specifying the rows, the columns, or both. The command is df[A, B], where A represents the rows and B represents the columns.
## Select row 1 in column 2
df[1,2]
Output:
--------------
[1] book
Levels: book pen pencil_case textbook

## Select Rows 1 to 2
df[1:2,]
Output:
-------------
  ID items store price
1 10  book  TRUE   2.5
2 20   pen FALSE   8.0

## Select Columns 1
df[,1]
Output:
-------------
[1] 10 20 30 40

## Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]
Output:
-------------
  store price
1  TRUE   2.5
2 FALSE   8.0
3  TRUE  10.0
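The tutorial does not show how df was built, so the sketch below assumes one possible definition that matches the table above. stringsAsFactors = TRUE is also an assumption, chosen so that the items column prints with levels as in the first output.

```r
# Rebuild the example data frame (values taken from the table above).
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7),
  stringsAsFactors = TRUE  # assumed, to match the factor output shown
)

df[1, 2]        # row 1, column 2
df[1:2, ]       # rows 1 to 2, all columns
df[, 1]         # column 1 as a vector
df[1:3, 3:4]    # rows 1 to 3, columns 3 to 4
```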
We can also select columns by name. The code below extracts two columns, ID and store.
# Slice with column names
df[, c('ID', 'store')]
After executing this, our output looks like this:
  ID store
1 10  TRUE
2 20 FALSE
3 30  TRUE
4 40 FALSE
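Column names can be combined with a row condition in the same df[A, B] call. A minimal sketch, again assuming a definition of df that matches the table shown earlier; the threshold of 5 and the name expensive are only for illustration.

```r
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# Rows where price is above 5, keeping only the ID and price columns.
expensive <- df[df$price > 5, c("ID", "price")]
expensive
```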
Today we have learned many things about data frames in the R programming language. I hope you guys understood everything clearly. So, guys, that’s all for today. Later we will discuss another topic in R programming. Till then, take care. Happy coding!