Data Frame in R || data.frame() R programming language

Hey guys, what’s up? In this R programming tutorial I’ll teach you about data frames in R: how to create them, access their elements, and modify them in a program. So, let’s start.

What is a Data Frame in R?

In the R programming language, a data frame is a two-dimensional data structure. It is a special case of a list in which every component is a vector of the same length: each component forms a column (a variable or measurement), and each position across the components forms a row (an observation). The vectors we pass should therefore have the same length, but don’t forget that they can contain different types of data.

In other words, we can think of a data frame as being like a two-dimensional array (a matrix). Unlike a matrix, the columns of a data frame can hold different types of data. That means one column might be a numeric variable, another might be a factor, and a third might be a character variable. The vectors stored as a list inside a data frame are all of equal length.
For example, the following data frame has 3 variables: int_vec, char_vec and bool_vec.
#Author Programiz
int_vec <- c(1,2,3) 
char_vec <- c("a", "b", "c")
bool_vec <- c(TRUE, TRUE, FALSE)
data_frame <- data.frame(int_vec, char_vec, 
bool_vec)
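To confirm that each column keeps its own type, a quick check along these lines can help. It is only a sketch that reuses the vectors above; sapply() and class() are base R functions, and col_classes is a name made up for this illustration.

```r
# Build the data frame from three equal-length vectors of different types
int_vec  <- c(1, 2, 3)
char_vec <- c("a", "b", "c")
bool_vec <- c(TRUE, TRUE, FALSE)
data_frame <- data.frame(int_vec, char_vec, bool_vec)

# Report the class of every column.
# Note: char_vec shows as "character" on R >= 4.0.0 and as "factor" on older versions.
col_classes <- sapply(data_frame, class)
print(col_classes)
```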

Characteristics of R Data Frame

Let’s discuss the characteristics of a data frame in the R programming language.

  • Column names have to be non-empty.
  • Row names have to be unique.
  • A data frame can hold numeric, character, logical or factor data.
  • Each column has to contain the same number of data items.
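The equal-length rule can be seen in action with a small experiment. This is a minimal sketch: the column names a and b are invented for the example, and tryCatch() is used only so the error message can be inspected instead of stopping the script. Keep in mind that data.frame() will recycle a shorter vector when its length divides the longer one evenly, so the error appears only when the lengths cannot be reconciled.

```r
# Columns of length 3 and 2 cannot be recycled to a common length,
# so data.frame() signals an error; tryCatch() captures its message.
result <- tryCatch(
  data.frame(a = 1:3, b = c("x", "y")),
  error = function(e) conditionMessage(e)
)
print(result)
```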
Check if a variable is a data frame or not


To check whether a variable is a data frame or not, we can use the class() function.

> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> typeof(x)    # data frame is a special case of  list
[1] "list"
> class(x)
[1] "data.frame"

In the above example, x is a list of three components, and each component is a two-element vector. There are some useful functions for data frames that you should know about, shown below.

Functions of data frame
> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x)    # returns length of the list, same as ncol()
[1] 3
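Two more base R helpers fit well with the functions above: is.data.frame() answers the "is it a data frame?" question directly, and dim() returns the row and column counts in one call. The sketch below rebuilds a small x like the one shown earlier so that it runs on its own.

```r
# Recreate the small example data frame
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"))

is.data.frame(x)           # TRUE for data frames
is.data.frame(list(1, 2))  # FALSE: a plain list is not a data frame
dim(x)                     # rows and columns together: 2 3
```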
How to create a Data Frame in R?


To create a data frame in the R programming language, we use the data.frame() function. We are going to create the data frame shown above as follows:
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x)    # structure of x
'data.frame':   2 obs. of  3 variables:
$ SN  : int  1 2
$ Age : num  21 15
$ Name: Factor w/ 2 levels "Dora","John": 2 1

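Here, the 3rd column, Name, is of type factor instead of a character vector. In R versions before 4.0.0, the data.frame() function converts character vectors into factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE, so Name would already be a character vector. On older versions, we can pass the argument stringsAsFactors = FALSE to suppress this behavior: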

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
> str(x)    # now the third column is a character vector
'data.frame':   2 obs. of  3 variables:
$ SN  : int  1 2
$ Age : num  21 15
$ Name: chr  "John" "Dora"

The R programming language has many data input functions, such as read.table(), read.csv(), read.delim() and read.fwf(). These functions also read data into a data frame.
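As a rough illustration of reading data into a data frame, read.csv() can parse CSV text passed through its text argument, which avoids needing a file on disk. The CSV contents and the names csv_text and people are made up for this sketch.

```r
# Inline CSV text stands in for a file; the contents are made up for illustration
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

# stringsAsFactors = FALSE keeps Name as character on every R version
people <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(people)
```

With a real file, the first argument would be the file path instead of text.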

How to access Components of a Data Frame?


We can access the components of a data frame either like a list or like a matrix.


Accessing like a list

If we want to access data frame columns, we can use the [, [[ or $ operator. [[ and $ behave similarly when accessing a data frame. Indexing with [ returns a data frame, whereas [[ and $ reduce the result to a vector.

> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"
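One way to see the difference between the operators is to ask for the class of each result. This sketch rebuilds x with stringsAsFactors = FALSE so the outcome is the same on old and new R versions.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

class(x["Name"])    # "data.frame": single brackets keep the data frame
class(x[["Name"]])  # "character": double brackets extract the vector
class(x$Name)       # "character": $ behaves like [[ here
```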

Accessing like a matrix

By providing an index for the row and column, we can access data frames like a matrix. To check the available datasets we can use the command library(help = "datasets"). Below, we use the trees dataset of Black Cherry Trees, which contains Girth, Height & Volume. We can use the str() and head() functions to examine a data frame.

> str(trees)
'data.frame':   31 obs. of 3 variables:
$ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num  70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> head(trees,n=3)
    Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2


Above, we can see that the trees data frame has 31 rows & 3 columns, and we also display the first 3 rows of the data frame. Now, we are going to access the data frame like a matrix.
> trees[2:3,]    # select 2nd and 3rd row
    Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
> trees[trees$Height > 82,]    # selects rows with Height greater than 82
    Girth Height Volume
6   10.8     83   19.7
17  12.9     85   33.8
18  13.3     86   27.4
31  20.6     87   77.0
> trees[10:12,2]
[1] 75 79 76



Since we extracted data from a single column, the returned type in the last case above is a vector. This behavior can be avoided by passing the argument drop = FALSE, as follows.

> trees[10:12,2, drop = FALSE]
     Height
10     75
11     79
12     76
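Row conditions and column selections can also be combined in a single matrix-style index. The sketch below uses the built-in trees dataset; the variable names tall and tall_and_wide, and the Girth threshold of 13, are just picked for this example.

```r
# Rows where Height exceeds 82, restricted to two named columns
tall <- trees[trees$Height > 82, c("Girth", "Volume")]
print(tall)

# Two conditions joined with & (element-wise AND)
tall_and_wide <- trees[trees$Height > 82 & trees$Girth > 13, ]
print(tall_and_wide)
```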




How to modify a Data Frame in R?


Data frames in the R programming language can be modified through reassignment, just as we modify matrices.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x
  SN Age Name
1  1  20 John
2  2  15 Dora
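Reassignment also works with a logical condition instead of a fixed row number, which is handy when we don’t know the position of the row. This is a small sketch; the new age of 16 is an arbitrary value chosen for illustration.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

# Update Age only in the rows where Name is "Dora"
x$Age[x$Name == "Dora"] <- 16
print(x)
```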



Adding Components
We can use the rbind() function to add rows to a data frame in R.
> rbind(x,list(1,16,"Paul"))
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
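Keep in mind that rbind() returns a new data frame and leaves the original untouched, so the result has to be assigned if we want to keep it. The sketch below shows this; it passes the new row as a one-row data frame rather than a list (a swapped-in variation), and the name y is chosen only for the example.

```r
x <- data.frame(SN = 1:2, Age = c(20, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

# rbind() returns a new data frame; x itself is unchanged
y <- rbind(x, data.frame(SN = 3, Age = 16, Name = "Paul",
                         stringsAsFactors = FALSE))
nrow(x)  # still 2
nrow(y)  # 3
```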




Similarly, we can use the cbind() function to add columns to a data frame.
> cbind(x,State=c("NY","FL"))
   SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL




Since data frames are implemented as lists, we can also add new columns through simple list-like assignments.
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x$State <- c("NY","FL"); x
   SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
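A new column does not have to be typed in by hand; it can be computed from existing columns. In this sketch, Adult is a hypothetical column name and 18 is an assumed cut-off, both used purely for illustration.

```r
x <- data.frame(SN = 1:2, Age = c(20, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

# Adult is a hypothetical column derived from Age
x$Adult <- x$Age >= 18
print(x)
```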





Deleting Components
By assigning NULL, we can delete columns from a data frame in the R programming language.
> x$State <- NULL
> x
   SN Age Name
1  1  20 John
2  2  15 Dora






Similarly, through reassignment, we can delete rows from a data frame in R.

> x <- x[-1,]
> x
   SN Age Name
2  2  15 Dora
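Rows can also be dropped by condition rather than by position: we keep the rows that pass a test and reassign the result. This is a sketch with a three-row x made up for the example and an arbitrary age threshold of 16.

```r
x <- data.frame(SN = 1:3, Age = c(20, 15, 16),
                Name = c("John", "Dora", "Paul"),
                stringsAsFactors = FALSE)

# Keep only rows where Age is at least 16, dropping the rest
x <- x[x$Age >= 16, ]
print(x)
```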





Slice Data Frame

Assume our data frame looks like this:

  ID       items store price
1 10        book  TRUE   2.5
2 20         pen FALSE   8.0
3 30    textbook  TRUE  10.0
4 40 pencil_case FALSE   7.0
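The tutorial does not show how df was built, so here is one plausible way to construct it. Treat this as an assumption: stringsAsFactors = TRUE is chosen so that items becomes a factor, which matches the Levels line printed in this section’s output.

```r
# One possible construction of df (an assumption, not shown in the original)
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7),
  stringsAsFactors = TRUE  # items becomes a factor on every R version
)
print(df)
```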

We can slice a data frame by specifying the rows and/or columns. To slice a data frame, the command is df[A, B], where A represents the rows & B represents the columns.

## Select row 1 in column 2
df[1,2]

Output:
--------------

[1] book
## Levels: book pen pencil_case textbook

## Select Rows 1 to 2
df[1:2,]

Output:
-------------

  ID items store price
1 10  book  TRUE   2.5
2 20   pen FALSE   8.0

## Select column 1
df[,1]

Output:
-------------

[1] 10 20 30 40

## Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]

Output:
-------------

   store  price
 1  TRUE   2.5
 2 FALSE   8.0
 3  TRUE  10.0

We can also select columns by their names. The code below extracts two columns: ID & store.

# Slice with columns name
df[, c('ID', 'store')]

After executing this, our output looks like this:

   ID store
 1 10  TRUE
 2 20 FALSE
 3 30  TRUE
 4 40 FALSE
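Slicing is not limited to positions and names. A logical column can pick rows, and a negative index can drop a column. The sketch below rebuilds a df like the one in this section so that it runs on its own; in_store and no_id are names invented for the example.

```r
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# Rows where store is TRUE, keeping only items and price
in_store <- df[df$store, c("items", "price")]
print(in_store)

# A negative index drops a column: everything except column 1 (ID)
no_id <- df[, -1]
print(no_id)
```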

Today we have learned a lot about data frames in the R programming language. I hope you guys understood everything clearly. So, guys, that’s all for today. Later we will discuss another topic of the R programming language. Till then, take care. Happy coding!