R Code Snippets

Download Code

Data Understanding in R


Book reference: Chapter 4.3.1, page 40

Histograms are generated by the function hist. The simplest way to create a histogram is to just use the corresponding attribute as an argument of the function hist, and R will automatically determine the number of bins for the histogram based on Sturge’s rule. In order to generate the histogram for the petal length of the Iris data set, the following command is sufficient:

Histogram of iris$Petal.Length

The partition into bins can also be specified directly. One of the parameters of hist is breaks. If the bins should cover the intervals \([a_{0},a_{1}),[a_{1},a_{2}),...,[a_{k−1},a_{k}]\) , then one can simply create a vector in R containing the values \(a_{i}\) and assign it to breaks. Note that \(a_{0}\) and \(a_{k}\) should be the minimum and maximum values of the corresponding attribute. If we want the boundaries for the bins at 1.0, 3.0, 4.5, 4.0, 6.1, then we would use

Histogram of iris$Petal.Length

to generate the histogram. Note that in the case of bins with different length, the heights of the boxes in the histogram do not show the relative frequencies. The areas of the boxes are chosen in such a way that they are proportional to the relative frequencies.


Book reference: Chapter 4.3.1, page 43

A boxplot for a single attribute is generated by

Boxplot of iris$Petal.Length

yielding the boxplot for the petal length of the Iris data set. Instead of a single attribute, we can hand over more than one attribute

Boxplot of iris$Petal.Length

to show the boxplots in the same plot. We can even use the whole data set as an argument to see the boxplots of all attributes in one plot:

Boxplot of iris

In this case, categorical attributes will be turned into numerical attributes by coding the values of the categorical attribute as 1, 2, . . . , so that these boxplots are also shown but do not really make sense.

In order to include the notches in the boxplots, we need to set the parameter notch to true:


If one is interested in the precise values of the boxplot like the median, etc., one can use the print-command:

[1,]  2.2
[2,]  2.8
[3,]  3.0
[4,]  3.3
[5,]  4.0

[1] 150

[1,] 2.935497
[2,] 3.064503

[1] 4.4 4.1 4.2 2.0

[1] 1 1 1 1

[1] "1"

The first five values are the minimum, the first quartile, the median, the third quartile, and the maximum value of the attribute, respectively. \(n\) is the number of data. Then come the boundaries for the confidence interval for the notch, followed by the list of outliers. The last values group and names only make sense when more than one boxplot is included in the same plot. Then group is needed to identify to which attribute the outliers in the list of outliers belong. names just lists the names of the attributes.

Scatter Plots

Book reference: Chapter 4.3.1, page 45

A scatter plot of the petal width against petal length of the Iris data is obtained by


All scatter plots of each attribute against each other in one diagram are created with


If symbols representing the values for some categorical attribute should be included in a scatter plot, this can be achieved by


where in this example the three types of Iris are plotted with different symbols.

If there are some interesting or suspicious points in a scatter plot and one wants to find out which data records these are, one can do this by


and then clicking on the points. The index of the corresponding records will be added to the scatter plot. To finish selecting points, press the ESCAPE-key.

Jitter can be added to a scatter plot in the following way:


Intensity plots and density plots with hexagonal binning, as they are shown Fig. 4.9, can be generated by




respectively, where the library hexbin does not come along with the standard version of R and needs to be installed as described in the appendix on R. Note that such plots are not very useful for such a small data sets like the Iris data set.

For three-dimensional scatter plots, the library scatterplots3d is needed and has to be installed first:


Principal Component Analysis

Book reference: Chapter, page 48

PCA can be carried out with R in the following way:

species <- which(colnames(iris)=="Species")
iris.pca <- prcomp(iris[,-species],center=T,scale=T)
Standard deviations (1, .., p=4):
[1] 1.7083611 0.9560494 0.3830886 0.1439265

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.7084 0.9560 0.38309 0.14393
Proportion of Variance 0.7296 0.2285 0.03669 0.00518
Cumulative Proportion  0.7296 0.9581 0.99482 1.00000

For the Iris data set, it is necessary to exclude the categorical attribute Species from PCA. This is achieved by the first line of the code and calling prcomp with iris[,-species] instead of iris.

The parameter settings center=T, scale=T, where T is just a short form of TRUE, mean that z-score standardization is carried out for each attribute before applying PCA.

The function predict can be applied in the above-described way to obtain the transformed data from which the PCA was computed. If the computed PCA transformation should be applied to another data set x, this can be achieved by


where x must have the same number of columns as the data set from which the PCA has been computed. In this case, x must have four columns which must be numerical. predict will compute the full transformation, so that the above command will also yield transformed data with four columns.

Multidimensional Scaling

Book reference: Chapter, page 53

MDS requires the library MASS which is not included in the standard version of R and needs installing. First, a distance matrix is needed for MDS. Identical objects leading to zero distances are not admitted. Therefore, if there are identical objects in a data set, all copies of the same object except one must be removed. In the Iris data set, there is only one pair of identical objects, so that one of them needs to be removed. The Species is not a numerical attribute and will be ignored for the distance.

x <- iris[-102,]
species <- which(colnames(x)=="Species")
x.dist <- dist(x[,-species])
x.sammon <- sammon(x.dist,k=2)
Initial stress        : 0.00678
stress after  10 iters: 0.00404, magic = 0.500
stress after  12 iters: 0.00402

k = 2 means that MDS should reduce the original data set to two dimensions.

Note that in the above example code no normalization or z-score standardization is carried out.

Parallel Coordinates, Radar, and Star Plots

Book reference: Chapter, page 59

Parallel coordinates need the library MASS. All attributes must be numerical. If the attribute Species should be included in the parallel coordinates, one can achieve this in the following way:

x <- iris
x$Species <- as.numeric(iris$Species)

Star and radar plots are obtained by the following two commands:


Correlation Coefficients

Book reference: Chapter 4.4, page 62

Pearson’s, Spearman’s, and Kendall’s correlation coefficients are obtained by the following three commands:


Grubbs Test for Outlier Detection

Book reference: Chapter 4.5, page 65

Grubbs test for outlier detection needs the installation of the library outliers:

	Grubbs test for one outlier

data:  iris$Petal.Width
G = 1.70638, U = 0.98033, p-value = 1
alternative hypothesis: highest value 2.5 is an outlier
Download Code

Errors and Validation in R

Book reference: Chapter 5.5.1, page 112

Training and Testing Data

In order to apply the idea of using separate parts of a data set for training and testing, one needs to select random subsets of the data set. As a very simple example, we consider the Iris data set that we want to split into training and test sets. The size of the training set should contain 2/3 of the original data, and the test set 1/3. It would not be a good idea to take the first 100 records in the Iris data set for training purposes and the remaining 50 as a test set, since the records in the Iris data set are ordered with respect to the species.With such a split, all examples of Iris setosa and Iris versicolor would end up in the training set, but none of Iris versicolor, which would form the test set. Therefore, we need random sample from the Iris data set. If the records in the Iris data set were not systematically orderer, but in a random order, we could just take the first 100 records for training purposes and the remaining 50 as a test set.

Sampling and orderings in R provide a simple way to shuffle a data set, i.e., to generate a random order of the records.

First, we need to know the number n of records in our data set. Then we generate a permutation of the numbers 1, . . . , n by sampling from the vector containing the numbers 1, . . . , n, generated by the R-command c(1:n). We sample n numbers without replacement from this vector:

n <- length(iris$Species)
permut <- sample(c(1:n),n,replace=F)

Then we define this permutation as an ordering in which the records of our data set should be ordered and store the shuffled data set in the object iris.shuffled:

ord <- order(permut)
iris.shuffled <- iris[ord,]

Now define how large the fraction for the training set should be—here 2/3—and take the first two thirds of the data set as a training set and the last third as a test set:

prop.train <- 2/3  # training data consists of 2/3 of observations
k <- round(prop.train*n)
iris.training <- iris.shuffled[1:k,]
iris.test <- iris.shuffled[(k+1):n,]

The R-command sample can also be used to generate bootstrap samples by setting the parameter replace to TRUE instead of F (FALSE).

Download Code

Data Preparation in R

Missing Values

Book reference: Chapter 6.2.2, page 137

The logical constant NA (not available) is used to represent missing values in R. There are various methods in R that can handle missing values directly.

As a very simple example, we consider the mean value.We create a data set with one attribute with four missing values and try to compute the mean:

x <- c(3,2,NA,4,NA,1,NA,NA,5)

The mean value is in this case also a missing value, since R has no information about the missing values and how to handle them. But if we explicitly say that missing values should simply be ignored for the computation of the mean value (na.rm=T), then R returns the mean value of all nonmissing values:


Note that this computation of the mean value implicitly assumes that the values are missing completely at random (MCAR).

Normalization and Scaling

Book reference: Chapter 6.6.3, page 153

Normalization and standardization of numeric attributes can be achieved in the following way. The function is.factor returns true if the corresponding attribute is categorical (or ordinal), so that we can ensure with this function that normalization is only applied to all numerical attributes, but not to the categorical ones. With the following R-code, z-score standardization is applied to all numerical attributes:

iris.norm <- iris

# for loop over each coloumn
for (i in c(1:length(iris.norm))){
    if (!is.factor(iris.norm[,i])){
        attr.mean <- mean(iris.norm[,i])
        attr.sd <- sd(iris.norm[,i])
        iris.norm[,i] <- (iris.norm[,i]-attr.mean)/attr.sd

Other normalization and standardization techniques can carried out in a similar manner. Of course, instead of the functions mean (for the mean value) and sd (for the standard deviation), other functions like min (for the minimum), max (for the maximum), median (for the median), or IQR (for the interquartile range) have to be used.

Download Code

Finding Patterns in R

Hierarchical Clustering

Book reference: Chapter 7.1, page 159

As an example, we apply hierarchical clustering to the Iris data set, ignoring the categorical attribute Species. We use the normalized Iris data set iris.norm that is constructed in Sect. We can apply hierarchical clustering after removing the categorical attribute and can plot the dendrogram afterwards:


# using the iris data as an example
iris.norm <- iris

# for loop over each coloumn
for (i in c(1:length(iris.norm))){
	if (!is.factor(iris.norm[,i])){
		attr.mean <- mean(iris.norm[,i])
		attr.sd <- sd(iris.norm[,i])
		iris.norm[,i] <- (iris.norm[,i]-attr.mean)/attr.sd
iris.num <- iris.norm[1:4]
iris.cl <- hclust(dist(iris.num), method="ward.D")
Cluster Dendrogram

Here, the Ward method for the cluster distance aggregation function as described in Table 7.3 was chosen. For the other cluster distance aggregation functions in the table, one simply has to replace ward.D by single (for single linkage), by complete (for complete linkage), by average (for average linkage), or by centroid.

For heatmaps, the library gplots is required that needs installing first:

rowv <- as.dendrogram(hclust(dist(iris.num), method="ward.D"))
colv <- as.dendrogram(hclust(dist(t(iris.num)), method="ward.D"))
heatmap.2(as.matrix(iris.num), Rowv=rowv,Colv=colv, trace="none")
Cluster Dendrogram

Prototype-Based Clustering

Book reference: Chapter 7.3, page 173

The R-function kmeans carries out k-means clustering.

iris.km <- kmeans(iris.num,centers=3)

The desired numbers of clusters is specified by the parameter centers. The location of the cluster centers and the assignment of the data to the clusters is obtained by the print function:

K-means clustering with 3 clusters of sizes 50, 53, 47

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1  -1.01119138  0.85041372   -1.3006301  -1.2507035
2  -0.05005221 -0.88042696    0.3465767   0.2805873
3   1.13217737  0.08812645    0.9928284   1.0141287

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 2 2 2 3 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 2 2
 [75] 2 3 3 3 2 2 2 2 2 2 2 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 3 3 3 2 3 3 3 3
[112] 3 3 2 2 3 3 3 3 2 3 2 3 2 3 3 2 3 3 3 3 3 3 2 2 3 3 3 2 3 3 3 2 3 3 3 2 3
[149] 3 2

Within cluster sum of squares by cluster:
[1] 47.35062 44.08754 47.45019
 (between_SS / total_SS =  76.7 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      

For fuzzy c-means clustering, the library cluster is required. The clustering is carried out by the method fanny similar to kmeans:

iris.fcm <- fanny(iris.num,3)

The last line provides the necessary information on the clustering results, especially the membership degrees to the clusters.

Gaussian mixture decomposition automatically determining the number of clusters requires the library mclust to be installed first:

iris.em <- mclustBIC(iris.num[,1:4])
iris.mod <- mclustModel(iris.num[,1:4],iris.em)

The last line lists the assignment of the data to the clusters.

Density-based clustering with DBSCAN is implemented in the library fpc which needs installation first:

iris.dbscan <- dbscan(iris.num[,1:4],1.0,showplot=T)

The last line will print out the assignment of the data to the clusters. Singletons or outliers are marked by the number zero. The second argument in dbscan (in the above example 1:0) is the parameter " for DBSCAN. showplot=T will generate a plot of the clustering result projected to the first two dimensions of the data set.

Self Organizing Maps

Book reference: Chapter 7.5, page 187

The library som provides methods for self organizing maps. The library som needs to be installed:

iris.som <- som(iris.num,xdim=5,ydim=5)
Self-organizing Map

xdim and ydim define the number of nodes in the mesh in x- and y-directions, respectively. plot will show, for each node in the mesh, a representation of the values in the form of parallel coordinates.

Association Rules

Book reference: Chapter 7.9, page 189

For association rule mining, the library arules is required in which the function apriori is defined. This library does not come along with R directly and needs to be installed first.

Here we use an artificial data set basket that we enter manually. The data set is a list of vectors where each vector contains the items that were bought:

baskets <- list(c("a","b","c"), c("a","d","e"),
                c("b","c","d"), c("a","b","c","d"),
                c("b","c"), c("a","b","d"),
                c("d","e"), c("a","b","c","d"),
                c("c","d","e"), c("a","b","c"))
rules <- apriori(baskets,parameter = list(supp=0.1,conf=0.8,target="rules"))
Loading required package: Matrix

Attaching package: ‘arules’

The following objects are masked from ‘package:base’:

    abbreviate, write

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [5 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [9 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
    lhs        rhs support confidence coverage lift     count
[1] {e}     => {d} 0.3     1.0000000  0.3      1.428571 3
[2] {a}     => {b} 0.5     0.8333333  0.6      1.190476 5
[3] {b}     => {c} 0.6     0.8571429  0.7      1.224490 6
[4] {c}     => {b} 0.6     0.8571429  0.7      1.224490 6
[5] {a,e}   => {d} 0.1     1.0000000  0.1      1.428571 1
[6] {c,e}   => {d} 0.1     1.0000000  0.1      1.428571 1
[7] {a,b}   => {c} 0.4     0.8000000  0.5      1.142857 4
[8] {a,c}   => {b} 0.4     1.0000000  0.4      1.428571 4
[9] {a,c,d} => {b} 0.2     1.0000000  0.2      1.428571 2    

The last command lists the rules with their support, confidence, and lift.

Download Code

Finding Explanations

Decision tree

Book reference: Chapter 8.1, page 220

R also allows for much finer control of the decision tree construction. The script below demonstrates how to create a simple tree for the Iris data set using a training set of 100 records. Then the tree is displayed, and a confusion matrix for the test set—the remaining 50 records of the Iris data set—is printed. The libraries rpart, which comes along with the standard installation of R, and rattle, that needs to be installed, are required:

iris.train <- c(sample(1:150,50))
iris.dtree <- rpart(Species~.,data=iris,subset=iris.train)
Loading required package: tibble

Loading required package: bitops

Rattle: A free graphical interface for data science with R.
Version 5.4.0 Copyright (c) 2006-2020 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
             setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         27         1
  virginica       0          6        31
Decision tree

In addition to many options related to tree construction, R also offers many ways to beautify the graphical representation. We refer to R manuals for more details.

Naive Bayes Classifiers

Book reference: Chapter 8.2, page 230

Naive Bayes classifiers use normal distributions by default for numerical attributes. The package e1071 must be installed first:

iris.train <- c(sample(1:150,75))
iris.nbayes <- naiveBayes(Species~.,data=iris,subset=iris.train)
             setosa versicolor virginica
  setosa         28          0         0
  versicolor      0         22         1
  virginica       0          2        22

As in the example of the decision tree, the Iris data set is split into a training and a test data set, and the confusion matrix is printed. The parameters for the normal distributions of the classes can be obtained in the following way:

Naive Bayes Classifier for Discrete Predictors

naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
    setosa versicolor  virginica
 0.2933333  0.3466667  0.3600000

Conditional probabilities:
Y                [,1]      [,2]
  setosa     4.972727 0.3781935
  versicolor 6.034615 0.5059188
  virginica  6.555556 0.5401804

Y                [,1]      [,2]
  setosa     3.422727 0.4264268
  versicolor 2.807692 0.2682135
  virginica  2.937037 0.3236191

Y                [,1]      [,2]
  setosa     1.413636 0.1457181
  versicolor 4.315385 0.4333057
  virginica  5.514815 0.4888792

Y                 [,1]       [,2]
  setosa     0.2409091 0.09081164
  versicolor 1.3307692 0.18279875
  virginica  2.0185185 0.30386869


If Laplace correction should be applied for categorical attribute, this can be achieved by setting the parameter laplace to the desired value when calling the function naiveBayes.


Book reference: Chapter 8.3, page 241

Least squares linear regression is implemented by the function lm (linear model). As an example, we construct a linear regression function to predict the petal width of the Iris data set based on the other numerical attributes:

iris.lm <- lm(iris$Petal.Width ~ iris$Sepal.Length
              + iris$Sepal.Width + iris$Petal.Length)
lm(formula = iris$Petal.Width ~ iris$Sepal.Length + iris$Sepal.Width +

     Min       1Q   Median       3Q      Max
-0.60959 -0.10134 -0.01089  0.09825  0.60685

                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.24031    0.17837  -1.347     0.18
iris$Sepal.Length -0.20727    0.04751  -4.363 2.41e-05 ***
iris$Sepal.Width   0.22283    0.04894   4.553 1.10e-05 ***
iris$Petal.Length  0.52408    0.02449  21.399  < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.192 on 146 degrees of freedom
Multiple R-squared:  0.9379,	Adjusted R-squared:  0.9366
F-statistic: 734.4 on 3 and 146 DF,  p-value: < 2.2e-16

The summary provides the necessary information about the regression result, including the coefficient of the regression function.

If we want to use a polynomial as the regression function, we need to protect the evaluation of the corresponding power by the function I inhibiting interpretation. As an example, we compute a regression function to predict the petal width based on a quadratic function in the petal length:

iris.lm <- lm(iris$Petal.Width ~ iris$Petal.Length +
lm(formula = iris$Petal.Width ~ iris$Petal.Length + I(iris$Petal.Length^2))

     Min       1Q   Median       3Q      Max
-0.56213 -0.12392 -0.01555  0.13547  0.64105

                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.386781   0.079883  -4.842 3.22e-06 ***
iris$Petal.Length       0.433833   0.053652   8.086 2.10e-13 ***
I(iris$Petal.Length^2) -0.002569   0.007501  -0.342    0.732
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2071 on 147 degrees of freedom
Multiple R-squared:  0.9272,	Adjusted R-squared:  0.9262
F-statistic: 935.7 on 2 and 147 DF,  p-value: < 2.2e-16

Robust regression requires the library MASS, which needs installation. Otherwise it is handled in the same way as least squares regression, using the function rlm instead of lm:

iris.rlm <- rlm(iris$Petal.Width ~ iris$Sepal.Length
                + iris$Sepal.Width + iris$Petal.Length)
Call: rlm(formula = iris$Petal.Width ~ iris$Sepal.Length + iris$Sepal.Width +
      Min        1Q    Median        3Q       Max
-0.606532 -0.099003 -0.009158  0.101636  0.609940

                  Value   Std. Error t value
(Intercept)       -0.2375  0.1771    -1.3412
iris$Sepal.Length -0.2062  0.0472    -4.3713
iris$Sepal.Width   0.2201  0.0486     4.5297
iris$Petal.Length  0.5231  0.0243    21.5126

Residual standard error: 0.1492 on 146 degrees of freedom

The default method is based on Huber’s error function. If Tukey’s biweight should be used, the parameter method should be changed in the following way:

# ridge regression with Tukey's biweight
iris.rlm <- rlm(iris$Petal.Width ~ iris$Sepal.Length
                + iris$Sepal.Width + iris$Petal.Length,
Call: rlm(formula = iris$Petal.Width ~ iris$Sepal.Length + iris$Sepal.Width +
    iris$Petal.Length, method = "MM")
     Min       1Q   Median       3Q      Max
-0.60608 -0.09975 -0.01030  0.10375  0.61963

                  Value   Std. Error t value
(Intercept)       -0.1896  0.1718    -1.1036
iris$Sepal.Length -0.2152  0.0457    -4.7036
iris$Sepal.Width   0.2181  0.0471     4.6276
iris$Petal.Length  0.5252  0.0236    22.2689

Residual standard error: 0.1672 on 146 degrees of freedom

A plot of the computed weights can be obtained by the following command:

Download Code


Nearest Neighbor Classifiers

Book reference: Chapter 9.1, page 275

A nearest-neighbor classifier based on the Euclidean distance is implemented in the package class in R. To show how to use the nearest-neighbor classifier in R, we use splitting of the Iris data set into a training set iris.training and test set iris.test as it was demonstrated in Sect. 5.6.2. The function knn requires a training and a set with only numerical attributes and a vector containing the classifications for the training set. The parameter k determines how many nearest neighbors are considered for the classification decision.

# generating indices for shuffling
n <- length(iris$Species)
permut <- sample(c(1:n),n,replace=F)

# shuffling the observations
ord <- order(permut)
iris.shuffled <- iris[ord,]

# splitting into training and testing data
prop.train <- 2/3  # training data consists of 2/3 of observations
k <- round(prop.train*n)
iris.training <- iris.shuffled[1:k,]
iris.test <- iris.shuffled[(k+1):n,]
iris.knn <- knn(iris.training[,1:4],iris.test[,1:4],iris.training[,5],k=3)
iris.knn     setosa versicolor virginica
  setosa         18          0         0
  versicolor      0         18         0
  virginica       0          1        13

The last line prints the confusion matrix.

Neural Networks

Book reference: Chapter 9.2, page 282

For the example of multilayer perceptrons in R, we use the same training and test data as for the nearest-neighbor classifier above. The multilayer perceptron can only process numerical values. Therefore, we first have to transform the categorical attribute Species into a numerical attribute:

x <- iris.training
x$Species <- as.numeric(x$Species)

The multilayer perceptron is constructed and trained in the following way, where the library neuralnet needs to be installed first:

iris.nn <- neuralnet(Species + Sepal.Length ~
                     Sepal.Width + Petal.Length + Petal.Width, x,

The first argument of neuralnet defines that the attributes Species and sepal length correspond to the output neurons. The other three attributes correspond to the input neurons. x specifies the training data set. The parameter hidden defines how many hidden layers the multilayer perceptron should have and how many neurons in each hidden layer should be. In the above example, there is only one hidden layer with three neurons. When we replace c(3) by c(4,2), there would be two hidden layers, one with four and one with two neurons.

The training of the multilayer perceptron can take some time, especially for larger data sets.

When the training is finished, the multilayer perceptron can be visualized:


The visualization includes also dummy neurons as shown in Fig. 9.4.

The output of the multilayer perceptron for the test set can be calculated in the following way. Note that we first have to remove the output attributes from the test set:

y <- iris.test
y <- y[-5]
y <- y[-1]
y.out <- compute(iris.nn,y)

We can then compare the target outputs for the training set with the outputs from the multilayer perceptron. If we want to compute the squared errors for the second output neuron -— the sepal length —- we can do this in the following way:

y.sqerr <- (y[1] - y.out$net.result[,2])^2

Support Vector Machines

Book reference: Chapter 9.4, page 297

For support vector machine, we use the same training and test data as already for the nearest-neighbor classifier and for the neural networks. A support vector machine to predict the species in the Iris data set based on the other attributes can be constructed in the following way. The package e1071 is needed and should be installed first if it has not been installed before:

iris.svm <- svm(Species ~ ., data = iris.training)
             setosa versicolor virginica
  setosa         18          0         0
  versicolor      0         19         1
  virginica       0          0        12

The last line prints the confusion matrix for the test data set.

The function svm works also for support vector regression. We could, for instance, use

iris.svm <- svm(Petal.Width ~ ., data = iris.training)
sqerr <- (predict(iris.svm,iris.test[-4])-iris.test[4])^2

to predict the numerical attribute petal width based on the other attributes and to compute the squared errors for the test set.

Ensemble Methods

Book reference: Chapter 9.5, page 304

As an example for ensemble methods, we consider random forest with the training and test data of the Iris data set as before. The package randomForest needs to be installed first:

iris.rf <- randomForest(Species ~., iris.training)
             setosa versicolor virginica
  setosa         18          0         0
  versicolor      0         19         1
  virginica       0          0        12

In this way, a random forest is constructed to predict the species in the Iris data set based on the other attributes. The last line of the code prints the confusion matrix for the test data set.