Implementation of a Decision Tree Algorithm in C

The decision tree algorithm is a classification algorithm that builds a tree-like model of decisions and their possible consequences. The tree consists of nodes that represent decisions, and branches that represent possible outcomes of those decisions. Each node has an associated feature or attribute, and each branch corresponds to one of the possible values of that feature or attribute. At the leaf nodes of the tree, a classification label is assigned based on the feature values that led to that leaf node.

The decision tree algorithm is often used in machine learning(ML) and in artificial intelligence(AI) for classification problems, as it is a simple and intuitive model that can handle both categorical and numerical data. The algorithm works by recursively splitting the data into subsets based on the values of the features, until the subsets are homogeneous with respect to the classification label. The splitting criterion can be chosen based on various measures of impurity, such as entropy or Gini impurity, which aim to maximize the information gain at each split.

Here it is illustrated how to implement of a decision tree basic algorithm in C. The decision tree algorithm in C is as follows.

#include <stdio.h>
#include <stdlib.h>

// Define the structure of a node in the decision tree
struct Node {
int attribute;
int result;
struct Node* trueBranch;
struct Node* falseBranch;
};

// Define a function to create a new node in the decision tree
struct Node* createNode(int attribute, int result, struct Node* trueBranch, struct Node* falseBranch) {
struct Node* node = (struct Node*)malloc(sizeof(struct Node));
node->attribute = attribute;
node->result = result;
node->trueBranch = trueBranch;
node->falseBranch = falseBranch;
return node;
}

// Define the function to traverse the decision tree
int traverseTree(struct Node* root, int* instance) {
if (root->trueBranch == NULL && root->falseBranch == NULL) {
return root->result;
}
if (instance[root->attribute] == 1) {
return traverseTree(root->trueBranch, instance);
}
else {
return traverseTree(root->falseBranch, instance);
}
}

// Define a function to test the decision tree with a set of instances
void testTree(struct Node* root, int instances[][7], int numInstances) {
int numCorrect = 0;
for (int i = 0; i < numInstances; i++) {
int predictedResult = traverseTree(root, instances[i]);
int actualResult = instances[i][6];
if (predictedResult == actualResult) {
numCorrect++;
}
}
printf("Accuracy: %.2f%%\n", (float)numCorrect / (float)numInstances * 100.0);
}

// Define the main function
int main() {
// Create the training instances
int instances[10][7] = {
{1, 0, 1, 0, 1, 0, 1},
{1, 0, 1, 1, 0, 0, 1},
{0, 1, 0, 1, 1, 1, 0},
{0, 1, 1, 1, 0, 0, 1},
{1, 0, 0, 1, 1, 1, 0},
{1, 1, 0, 0, 0, 1, 1},
{0, 0, 1, 0, 0, 0, 0},
{1, 0, 0, 0, 1, 0, 1},
{0, 1, 0, 0, 0, 1, 1},
{1, 1, 1, 1, 1, 0, 0}
};

// Create the decision tree
struct Node* root = createNode(4, 0, createNode(1, 1, createNode(0, 1, NULL, NULL), createNode(2, 0, NULL, NULL)), createNode(5, 1, createNode(2, 1, NULL, NULL), createNode(3, 0, NULL, NULL)));

// Test the decision tree with the training instances
testTree(root, instances, 10);

return 0;
}



The code first defines the structure of a node in the decision tree. Each node has an attribute, a result (which is the classification value for that node), and pointers to two child nodes (one for when the attribute is true, and one for when it is false).

Then, the code defines a function to create a new node in the decision tree. This function takes the attribute, result, and pointers to the true and false child nodes as arguments, and allocates memory for a new node.

Next, the code defines a function to traverse the decision tree. This function takes a pointer to the root node of the tree and an instance (an array of attribute values) as arguments, and recursively traverses the tree to determine the classification result for that instance.

The code then defines a function to test the decision tree with a set of instances. This function takes a pointer to the root node of the tree, a 2D array of instances (each row is an instance and each column is an attribute), and the number of instances as arguments. It then loops through the instances, calling the traverseTree function for each instance, and compares the predicted classification with the actual classification. It keeps track of the number of correctly classified instances and prints out the accuracy percentage.

Finally, the main function creates a set of training instances, creates the decision tree, and tests the decision tree with the training instances. In this example, the decision tree is hardcoded to split on attributes 4, 1, 0, 2, and 3, with classification values 0 or 1.

Note that this is just a simple example implementation, and a real-world implementation would likely involve more complex logic for building and pruning the decision tree, as well as additional functions for handling data preprocessing and model evaluation.

One of the advantages of the decision tree algorithm is that it can handle non-linear relationships between the features and the classification label, and it can capture interactions between multiple features. Additionally, the resulting tree can be easily interpreted and visualized, which can be useful for understanding the model and its decision-making process.

However, one of the main challenges with decision trees is overfitting, which occurs when the tree is too complex and fits the training data too closely, resulting in poor performance on new data. To address this, various techniques such as pruning or ensemble methods like random forests or gradient boosting can be used to improve the generalization of the model.