Support Vector Machine is among the most popular Machine Learning algorithms. It is efficient and can be trained even on limited datasets. But what is it?

What is a Support Vector Machine (SVM)?

A support vector machine is a supervised machine learning algorithm that learns a model for binary classification. That is a mouthful. This article explains SVMs and how they relate to natural language processing. But first, let us look at how a support vector machine works.

How Does SVM Work?

Consider a simple classification problem where we have data that has two features, x and y, and one output – a classification that is either red or blue. We can plot an imaginary dataset that looks like this:

[Image: scatter plot of red and blue data points]

Given data like this, the task would be to create a decision boundary. A decision boundary is a line that separates the two classes of our data points. This is the same dataset but with a decision boundary:

[Image: the same dataset with a decision boundary separating the two classes]

With this decision boundary, we can predict which class a data point belongs to based on where it lies relative to the boundary. The Support Vector Machine algorithm finds the best decision boundary for classifying points.

But what do we mean by best decision boundary?

The best decision boundary can be argued to be the one that maximizes the margin – its distance from the closest data points of each class. These closest points are called support vectors. They pose the greatest risk of misclassification because of their proximity to the other class.

[Image: the decision boundary with its margin touching the support vectors of both classes]

Training a support vector machine, therefore, involves finding the line that maximizes the margin between the support vectors of the two classes.

It is also important to note that because the decision boundary is positioned relative to the support vectors, they alone determine its position. The other data points could be removed without changing the learned boundary, which is why the trained model only needs to store the support vectors.

In this example, the decision boundary is a straight line because the dataset has just two features. With three features, the decision boundary is a plane rather than a line. With four or more features, it is known as a hyperplane – the general term for a flat boundary in any number of dimensions.
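The ideas above can be sketched in a few lines of scikit-learn. The data here is hypothetical – a handful of made-up points standing in for the red and blue classes – but it shows that after fitting, the model exposes exactly which points became support vectors:

```python
# A minimal sketch: fit a linear SVM on tiny, hypothetical two-feature data
# and inspect which points the model kept as support vectors.
from sklearn.svm import SVC

# Two features (x, y); class 0 stands in for "red", class 1 for "blue"
X = [[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

model = SVC(kernel="linear")
model.fit(X, y)

# Only the points closest to the decision boundary are stored
print(model.support_vectors_)  # the support vectors themselves
print(model.n_support_)        # how many support vectors per class
```

Notice that `support_vectors_` holds only a subset of the training points – the rest played no part in positioning the boundary.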

Non-Linearly Separable Data

The example above considered very simple data that, when plotted, can be separated by a linear decision boundary. Consider a different case where data is plotted as follows:

[Image: dataset where one class surrounds the other, impossible to separate with a straight line]

In this case, separating the data using a line is impossible. But we can create another feature, z, defined by the equation z = x^2 + y^2. We can add z as a third axis to the plane to make it three-dimensional.

When we look at the 3D plot from an angle where the x-axis is horizontal and the z-axis is vertical, we get a view that looks like this:

[Image: the data plotted against the new z-axis, now separable by a linear boundary]

The z-value represents how far a point is from the origin in the original xy-plane. The blue points, being closer to the origin, have low z-values, while the red points, further from the origin, have higher z-values. Plotting the points against their z-values gives a clear separation that can be demarcated by a linear decision boundary, as illustrated.
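The mapping described above can be checked numerically. This sketch uses hypothetical ring-shaped data (blue points at radius 1, red points at radius 3) and computes z = x^2 + y^2 for each point, showing that a single threshold on z separates the classes:

```python
import numpy as np

# Hypothetical ring-shaped data: blue points near the origin, red points further out
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 20)
blue = np.c_[np.cos(angles[:10]), np.sin(angles[:10])]         # radius 1
red = np.c_[3 * np.cos(angles[10:]), 3 * np.sin(angles[10:])]  # radius 3

# The new feature z = x^2 + y^2 is the squared distance from the origin
z_blue = blue[:, 0] ** 2 + blue[:, 1] ** 2  # equals 1 for every blue point
z_red = red[:, 0] ** 2 + red[:, 1] ** 2     # equals 9 for every red point

# In the z dimension, a simple threshold (e.g. z = 5) separates the classes
assert z_blue.max() < 5 < z_red.min()
```

In two dimensions no line could split these rings, but in the z dimension a single cut-off works – which is exactly the effect of the feature mapping.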

This is a powerful idea that is used in Support Vector Machines. More generally, it is the idea of mapping the data into a higher-dimensional space so that the data points can be separated by a linear boundary. The functions responsible for this mapping are kernel functions. Common kernel functions include the linear, polynomial, sigmoid, and RBF (radial basis function) kernels.

To make this mapping efficient, SVM uses the kernel trick: instead of explicitly computing each point's coordinates in the higher-dimensional space, the kernel function computes the dot products between points as if they were already in that space.
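In practice, the kernel trick means you never write the z feature yourself. A sketch using scikit-learn's `make_circles` (which generates exactly the kind of concentric, non-linearly separable data discussed above): passing `kernel='rbf'` lets the SVM separate the rings without any manual feature engineering.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles cannot be split by a straight line in two dimensions
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space,
# so a separating boundary is found without ever computing z explicitly
model = SVC(kernel="rbf")
model.fit(X, y)
print(model.score(X, y))
```

A linear kernel would perform poorly here; swapping in the RBF kernel is all it takes to handle the non-linear structure.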

SVM in Machine Learning

Support Vector Machine is one of the many algorithms used in machine learning, alongside popular ones like Decision Trees and Neural Networks. It is favored because it works well with less data than many other algorithms. It is commonly used to do the following:

  • Text Classification: Classifying text data such as comments and reviews into one or more categories
  • Face Detection: Analysing images to detect faces to do things such as add filters for augmented reality
  • Image Classification: Support vector machines can classify images efficiently compared to other approaches.

The Text Classification Problem

The internet is filled with lots and lots of textual data. However, much of this data is unstructured and unlabelled. To better use this text data and understand it more, there is a need for classification. Examples of times when text is classified include:

  • When tweets are categorized into topics so people can follow topics they want
  • When an email is categorized as either Social, Promotions, or Spam
  • When comments are classified as being hateful or obscene in public forums

How SVM Works With Natural Language Classification

A Support Vector Machine can classify text into text that belongs to a particular topic and text that does not. This is achieved by first converting the text data into a dataset with numeric features.

One way to do this is by creating a feature for every unique word in the dataset. Then, for every text data point, you record the number of times each word occurs. So if there are n unique words in the dataset, each data point will have n features.

Additionally, you will provide classifications for these data points. While these labels are usually text, most SVM implementations expect numeric labels.

Therefore, you will have to convert these labels to numbers before training. Once the dataset has been prepared, using these features as coordinates, you can then use an SVM model to classify the text.
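The preparation steps above – one feature per word, word counts as values, numeric labels – can be sketched with scikit-learn's `CountVectorizer`. The comments and labels here are hypothetical stand-ins for a real labelled dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical labelled comments: 1 = positive, 0 = negative
texts = [
    "great product, loved it",
    "terrible, a waste of money",
    "loved the quality",
    "terrible quality, money wasted",
]
labels = [1, 0, 1, 0]  # text labels already converted to numbers

# One feature per unique word; each value is that word's count in the text
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel="linear")
model.fit(X, labels)

# New text must be transformed with the same fitted vocabulary
new = vectorizer.transform(["loved this great product"])
print(model.predict(new))
```

The key detail is reusing the fitted vectorizer on new text, so unseen documents are mapped onto the same word-count features the model was trained on.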

Creating an SVM in Python

To create a support vector machine (SVM) in Python, you can use the SVC class from the sklearn.svm library. Here is an example of how you can use the SVC class to build an SVM model in Python:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC 

# Load the dataset 
X = ... 
y = ... 

# Split the data into training and test sets 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=19) 

# Create an SVM model 
model = SVC(kernel='linear') 

# Train the model on the training data 
model.fit(X_train, y_train) 

# Evaluate the model on the test data 
accuracy = model.score(X_test, y_test) 

print("Accuracy: ", accuracy) 

In this example, we first import the SVC class from the sklearn.svm library. Then, we load the dataset and split it into training and test sets.

Next, we create an SVM model by instantiating an SVC object and specifying the kernel parameter as ‘linear’. We then train the model on the training data using the fit method and evaluate the model on the test data using the score method. The score method returns the model’s accuracy, which we print to the console.

You can also specify other parameters for the SVC object, such as the C parameter, which controls the trade-off between a wide margin and training errors (a larger C means less regularization), and the gamma parameter, which controls the kernel coefficient for the RBF, polynomial, and sigmoid kernels.
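A short sketch of how C and gamma change a model's behavior, using scikit-learn's `make_moons` toy dataset (the specific parameter values are illustrative, not recommendations):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Low C tolerates more training errors (stronger regularization, wider margin);
# gamma sets how far a single training point's influence reaches with the RBF kernel
loose = SVC(kernel="rbf", C=0.01, gamma=0.1).fit(X_train, y_train)
tight = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X_train, y_train)

print(loose.score(X_test, y_test), tight.score(X_test, y_test))
```

In practice, C and gamma are usually chosen with a grid search over a validation set rather than by hand.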

Benefits of SVM

Here is a list of some benefits of using support vector machines (SVMs):

  • Efficient: SVMs are generally efficient to train, especially on small to medium-sized datasets.
  • Robust to Noise: SVMs are relatively robust to noise in the training data as they try to find the maximum margin classifier, which is less sensitive to noise than other classifiers.
  • Memory Efficient: A trained SVM stores only the support vectors, a subset of the training data, making the model more memory efficient than many other algorithms.
  • Effective in High-Dimensional Spaces: SVMs can still perform well even when the number of features exceeds the number of samples.
  • Versatility: SVMs can be used for classification and regression tasks and can handle various types of data, including linear and non-linear data.

Now, let’s explore some of the best resources to learn Support Vector Machine (SVM).

Learning Resources

An Introduction to Support Vector Machines

This book on Introduction to Support Vector Machines comprehensively and gradually introduces you to Kernel-based Learning methods.

It gives you a firm foundation in Support Vector Machine theory.

Support Vector Machines Applications

While the first book focused on the theory of Support Vector Machines, this book on Support Vector Machines Applications focuses on their practical applications.

It looks at how SVMs are used in image processing, pattern detection, and computer vision.

Support Vector Machines (Information Science and Statistics)

The purpose of this book on Support Vector Machines (Information Science and Statistics) is to provide an overview of the principles behind the effectiveness of support vector machines (SVMs) in various applications.

The authors highlight several factors that contribute to the success of SVMs, including their ability to perform well with a limited number of adjustable parameters, their resistance to various types of errors and anomalies, and their efficient computational performance compared to other methods.

Learning with Kernels

“Learning with Kernels” is a book that introduces readers to support vector machines (SVMs) and related kernel techniques.

It is designed to give readers a basic understanding of mathematics and the knowledge they need to start using kernel algorithms in machine learning. The book aims to provide a thorough yet accessible introduction to SVMs and kernel methods.

Support Vector Machines with Sci-kit Learn

This online Support Vector Machines with Sci-kit Learn course by the Coursera project network teaches how to implement an SVM model using the popular machine learning library, Sci-Kit Learn.


Additionally, you will learn the theory behind SVMs and determine their strengths and limitations. The course is beginner-level and requires about 2.5 hours.

Support Vector Machines in Python: Concepts and Code

This paid online course on Support Vector Machines in Python by Udemy has up to 6 hours of video-based instruction and comes with a certification.


It covers SVMs and how to implement them solidly in Python. Furthermore, it covers business applications of Support Vector Machines.

Machine Learning and AI: Support Vector Machines in Python

In this course on Machine Learning and AI, you will learn how to use support vector machines (SVMs) for various practical applications, including image recognition, spam detection, medical diagnosis, and regression analysis.


You will use the Python programming language to implement ML models for these applications.

Final Words

In this article, we learned briefly about the theory behind Support Vector Machines. We learned about their application in Machine Learning and Natural Language Processing.

We also saw what its implementation using scikit-learn looks like. Furthermore, we spoke about the practical applications and benefits of Support Vector Machines.

While this article was just an introduction, the resources recommended above go into more detail about Support Vector Machines. Given how versatile and efficient they are, SVMs are worth understanding as you grow as a data scientist or ML engineer.

Next, you can check out top machine learning models.