A Brief Introduction to SVM and KNN

Madhuri MK
5 min read · Aug 6, 2023

Machine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate in predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.

Why Support Vector Machine?

Support vector machines (SVMs) are a type of machine learning algorithm that can be used for classification and regression tasks. SVMs are particularly well-suited for tasks where the data is linearly separable, meaning that the classes can be divided by a straight line (or, in more than two dimensions, by a flat hyperplane). With kernel functions, SVMs can also handle data that is not linearly separable.

What is a Support Vector Machine?

A support vector machine is a supervised learning algorithm that finds the hyperplane that separates the classes with the widest possible margin. In two dimensions the hyperplane is a line, and in three dimensions it is a plane. It divides the feature space into regions, with each region representing a different class.
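To make the idea of a separating hyperplane concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The toy data points and the helper name side_of_hyperplane are assumptions made for illustration. With a linear kernel, the fitted model exposes the hyperplane as a weight vector w (coef_) and an offset b (intercept_), and the sign of w · x + b tells us on which side a point falls.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated groups (illustrative values)
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a hyperplane of the form w . x + b = 0
model = SVC(kernel="linear")
model.fit(X, y)

w = model.coef_[0]       # weight vector, perpendicular to the hyperplane
b = model.intercept_[0]  # offset of the hyperplane


def side_of_hyperplane(point):
    """Return the signed score w . x + b; its sign picks the class."""
    return float(np.dot(w, point) + b)


print("w =", w, "b =", b)
print(side_of_hyperplane([1.5, 1.5]))  # negative: class 0 side
print(side_of_hyperplane([6.5, 6.5]))  # positive: class 1 side
```

Points with a negative score land in the class 0 region and points with a positive score land in the class 1 region, which is exactly what model.predict reports.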

Advantages of Support Vector Machine:

SVMs have several advantages over other machine learning algorithms, including:

  • They can handle high-dimensional input space
  • They can deal with sparse data
  • They are relatively insensitive to noise
  • Maximizing the margin, together with the regularization parameter, helps limit overfitting (though it does not remove the risk entirely)

Use Case in Python:

Here is a simple example of how to implement an SVM in Python using the scikit-learn library:

Python

import numpy as np
from sklearn.svm import SVC
# Create the data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([0, 0, 1, 1, 1])
# Create the SVM model
model = SVC()
# Train the model
model.fit(X, y)
# Predict the class of a new data point
# (predict expects a 2D array: one row per sample)
new_data = np.array([[11, 12]])
prediction = model.predict(new_data)
print(prediction)

Key Takeaways:

  • SVMs are a powerful machine learning algorithm that can be used for classification and regression tasks.
  • SVMs are particularly well-suited for tasks where the data is linearly separable.
  • SVMs have several advantages over other machine learning algorithms, including their ability to handle high-dimensional input space, sparse data, and noise.
  • SVMs can be implemented in Python using the scikit-learn library.

Here is a pictorial presentation of the concept of SVM:

SVM

Ref: www.researchgate.net

The hyperplane is the line that separates the two classes of data. The support vectors are the data points that are closest to the hyperplane. These points are important because they determine the position of the hyperplane.
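The role of the support vectors can be checked with a small experiment. The sketch below is only an illustration under assumed toy data: it fits a linear SVC, prints the support vectors that scikit-learn stores in support_vectors_, and then refits a second model on those points alone to see whether the predictions change.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (illustrative): two square-shaped groups of points
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [4, 4], [5, 4], [4, 5], [5, 5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = SVC(kernel="linear")
model.fit(X, y)

# The points closest to the hyperplane, and their row indices in X
print("Support vectors:\n", model.support_vectors_)
print("Their row indices in X:", model.support_)

# Refit using only the support vectors and compare predictions
sv_idx = model.support_
reduced = SVC(kernel="linear").fit(X[sv_idx], y[sv_idx])
same = np.array_equal(model.predict(X), reduced.predict(X))
print("Same predictions from support vectors alone:", same)
```

Only a few of the training points end up as support vectors. The remaining points could be removed without moving the hyperplane.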

KNN

The K-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm used for classification tasks. It is considered a fundamental starting point in machine learning and is easy to understand and implement.

Why do we need KNN?

Machine learning models make predictions based on past data. KNN is used to classify new data points based on their similarity to existing data points. For example, if we have a dataset of patients with diabetes, we can use KNN to predict whether a new patient is likely to have diabetes based on their blood sugar levels, blood pressure, and other features.

What is KNN?

K-nearest neighbors is a classification algorithm that stores all available cases and classifies new cases based on a similarity measure, often using the Euclidean distance. The Euclidean distance is the straight-line distance between two points in n-dimensional space.
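As a quick illustration, the Euclidean distance is the square root of the sum of squared differences between the coordinates of two points. The function name euclidean_distance below is just a name chosen for this sketch.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two points of the same dimension."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


print(euclidean_distance([0, 0], [3, 4]))        # 5.0
print(euclidean_distance([1, 2, 3], [1, 2, 3]))  # 0.0
```

The same function works for any number of features, as long as both points have the same length.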

How to choose the factor ‘K’:

The parameter K in KNN refers to the number of nearest neighbors to include in the voting process. Choosing K is a form of parameter tuning. A common rule of thumb is to start with the square root of the total number of data points, and then compare a few candidate values to see which one performs best on held-out data.
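Here is a rough sketch of both ideas: the square-root rule of thumb, and a simple comparison of candidate K values with cross-validation. The helper sqrt_rule_k, the nudge to an odd number (to reduce tied votes), the candidate list, and the use of the iris dataset are all assumptions made for this example.

```python
import math

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def sqrt_rule_k(n_samples):
    """Rule-of-thumb K: round sqrt(n), nudged to an odd number to reduce ties."""
    k = max(1, round(math.sqrt(n_samples)))
    if k % 2 == 0:
        k += 1
    return k


print(sqrt_rule_k(100))  # 11
print(sqrt_rule_k(442))  # 21

# Tuning K: score several candidates with 5-fold cross-validation
X, y = load_iris(return_X_y=True)
scores = {}
for k in (1, 3, 5, 7, 9, 11, 13):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Mean accuracy per K:", scores)
print("Best K among the candidates:", best_k)
```

The rule of thumb gives a starting point, while the cross-validation loop checks which K actually works best for the data at hand.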

When to use KNN:

KNN is suitable for labeled data, smaller datasets with minimal noise, and situations where feature similarity is essential for classification.

How KNN works:

KNN calculates the Euclidean distance between a new data point and all existing data points in the dataset. The algorithm then classifies the new data point based on the majority class of its K nearest neighbors.
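The two steps above (measure distances, then take a majority vote) can be sketched from scratch in a few lines. This is a minimal illustration, not how scikit-learn implements it; the function name knn_predict and the toy data are assumptions for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Step 1: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 2: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote over the labels of those neighbors
    votes = Counter(y_train[int(i)] for i in nearest)
    return votes.most_common(1)[0][0]


X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
y_train = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X_train, y_train, [2, 2], k=3))  # A
print(knn_predict(X_train, y_train, [6, 5], k=3))  # B
```

Note that there is no real training step here: the algorithm simply stores the data and does all of its work at prediction time.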

Model training and testing:

The data is split into training and testing sets to build and evaluate the KNN classifier. The classifier is trained on the training data and tested on the testing data to assess its accuracy.

Evaluation metrics:

This tutorial uses a confusion matrix, the F1 score, and accuracy to evaluate the model’s performance. The confusion matrix counts true positives, false positives, true negatives, and false negatives, while the F1 score combines precision and recall into a single number (their harmonic mean). Accuracy is the fraction of all predictions that are correct.
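To see how these three metrics relate, here is a small worked example using scikit-learn's metrics functions. The true and predicted labels are made up for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)

accuracy = accuracy_score(y_true, y_pred)  # (TP + TN) / total
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall
print("Accuracy:", accuracy)
print("F1 score:", f1)
```

Here 6 of the 8 predictions are correct, so the accuracy is 0.75. Precision and recall are both 3/4, which makes the F1 score 0.75 as well.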

Pictorial presentation:

KNN algorithm working

Ref: www.javatpoint.com

The image shows how the KNN algorithm works. The new data point is represented by the red star. The algorithm calculates the Euclidean distance between the new data point and all existing data points in the dataset. The blue circles represent the K nearest neighbors of the new data point. The algorithm then classifies the new data point based on the majority class of its K nearest neighbors.

Use case code:

Python

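import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load the diabetes dataset
diabetes = load_diabetes()
# load_diabetes has a continuous target (a disease progression score), so for
# classification we turn it into two classes by splitting at the median.
# This threshold is an assumption made for illustration only.
y = (diabetes.target > np.median(diabetes.target)).astype(int)
# Split the data into training and testing sets (random_state makes the split repeatable)
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, y, test_size=0.25, random_state=42)
# Create a KNN classifier with K=5
knn = KNeighborsClassifier(n_neighbors=5)
# Train the classifier on the training data
knn.fit(X_train, y_train)
# Predict the labels of the test data
y_pred = knn.predict(X_test)
# Evaluate the model's performance
print("Accuracy:", knn.score(X_test, y_test))
# Plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.matshow(cm, cmap=plt.cm.Blues)
plt.title("Confusion matrix")
plt.show()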

This code loads the diabetes dataset, splits it into training and testing sets, trains a KNN classifier with K=5, prints its accuracy on the test set, and plots the confusion matrix.

I hope this helps! Let me know if you have any other questions.


#Quantum30 #Day6_QML Learning #Quantum Computing India #Quantum Machine Learning
