R also allows for much finer control of the decision tree construction. The script
below demonstrates how to create a simple tree for the Iris data set using a training
set of 100 records. Then the tree is displayed, and a confusion matrix for the test
set—the remaining 50 records of the Iris data set—is printed. The libraries rpart, which comes along with the standard installation of R, and rattle, which needs to be installed, are required:
library(rpart)
iris.train <- c(sample(1:150,100))
iris.dtree <- rpart(Species~.,data=iris,subset=iris.train)
library(rattle)
drawTreeNodes(iris.dtree)
table(predict(iris.dtree,iris[-iris.train,],type="class"),
iris[-iris.train,"Species"])
In addition to many options related to tree construction, R also offers many ways to beautify the graphical representation. We refer to R manuals for more details.
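As an illustration of such options, the tree construction can be tuned via rpart.control, for instance by requiring a minimum number of records before a node may be split (minsplit) or a stricter complexity penalty (cp). The parameter values below are chosen arbitrarily for illustration:

```r
# sketch: stricter tree construction via rpart.control
# (minsplit=20 and cp=0.05 are arbitrary illustrative values)
iris.dtree2 <- rpart(Species~., data=iris, subset=iris.train,
                     control=rpart.control(minsplit=20, cp=0.05))
print(iris.dtree2)
```

Larger values of cp lead to smaller trees, since a split is only accepted if it improves the overall fit by at least this factor.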
Naive Bayes classifiers use normal distributions by default for numerical attributes. The package e1071 must be installed first:
library(e1071)
iris.train <- c(sample(1:150,75))
iris.nbayes <- naiveBayes(Species~.,data=iris,subset=iris.train)
table(predict(iris.nbayes,iris[-iris.train,],type="class"),
iris[-iris.train,"Species"])
As in the example of the decision tree, the Iris data set is split into a training and a test data set, and the confusion matrix is printed. The parameters for the normal distributions of the classes can be obtained in the following way:
print(iris.nbayes)
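The same parameters are also accessible programmatically: the fitted object stores, for each attribute, a table with the per-class mean and standard deviation. For example, for the petal width:

```r
# per-class mean (first column) and standard deviation (second column)
# of the attribute Petal.Width
iris.nbayes$tables$Petal.Width
```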
If a Laplace correction should be applied for categorical attributes, this can be achieved by setting the parameter laplace to the desired value when calling the function naiveBayes.
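For example, a Laplace correction with one pseudo-count per category could be requested as follows. Note that the Iris data set has only numerical attributes, so the correction has no effect here; the call merely illustrates the syntax:

```r
# laplace=1 adds one pseudo-count to each category of a
# categorical (factor) attribute; numerical attributes are unaffected
iris.nbayes <- naiveBayes(Species~., data=iris,
                          subset=iris.train, laplace=1)
```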
Least squares linear regression is implemented by the function lm (linear model).
As an example, we construct a linear regression function to predict the petal width
of the Iris data set based on the other numerical attributes:
iris.lm <- lm(iris$Petal.Width ~ iris$Sepal.Length
+ iris$Sepal.Width + iris$Petal.Length)
summary(iris.lm)
The summary provides the necessary information about the regression result, including the coefficients of the regression function.
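The coefficients can also be extracted directly, for instance to write down the fitted regression equation or to obtain the predicted petal widths:

```r
coef(iris.lm)          # intercept and one coefficient per predictor
head(fitted(iris.lm))  # predicted petal widths for the first records
```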
If we want to use a polynomial as the regression function, we need to protect the evaluation of the corresponding power by the function I, which inhibits interpretation of its argument by the formula parser.
As an example, we compute a regression function to predict the petal width based
on a quadratic function in the petal length:
iris.lm <- lm(iris$Petal.Width ~ iris$Petal.Length +
I(iris$Petal.Length^2))
summary(iris.lm)
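An equivalent way to specify polynomial terms, which avoids writing each power explicitly, is the function poly with raw=TRUE:

```r
# same quadratic model as above, written with poly();
# raw=TRUE yields plain powers instead of orthogonal polynomials
iris.lm2 <- lm(iris$Petal.Width ~ poly(iris$Petal.Length, 2, raw=TRUE))
summary(iris.lm2)
```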
Robust regression requires the library MASS, which needs to be installed. Otherwise it is handled in the same way as least squares regression, using the function rlm instead of lm:
library(MASS)
iris.rlm <- rlm(iris$Petal.Width ~ iris$Sepal.Length
+ iris$Sepal.Width + iris$Petal.Length)
summary(iris.rlm)
The default method is based on Huber's error function. If Tukey's biweight should be used, the parameter method should be changed in the following way:
# robust regression with Tukey's biweight
iris.rlm <- rlm(iris$Petal.Width ~ iris$Sepal.Length
+ iris$Sepal.Width + iris$Petal.Length,
method="MM")
summary(iris.rlm)
A plot of the computed weights can be obtained by the following command:
plot(iris.rlm$w)
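Since rlm assigns small weights to records that fit the model badly, the weights can also be used to list potential outliers directly. The threshold 0.5 below is an arbitrary illustrative choice:

```r
# indices of records that were strongly downweighted by rlm
# (0.5 is an arbitrary threshold for this sketch)
which(iris.rlm$w < 0.5)
```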