K-Means Clustering Use Case on Security Domain

Mohammed Abdul Basith
6 min read · Aug 19, 2021

🔰 What do you mean by Unsupervised Learning?

🔰 What is K-Means Clustering?

🔰 How Does K-Means Clustering Work?

🔰 K-Means Clustering Use Cases on Security Domain

🎇 What do you mean by Unsupervised Learning?

🗼 Unsupervised learning can be thought of as self-learning: the algorithm finds previously unknown patterns in datasets that have no labels.

🗼 It helps in modelling probability density functions, finding anomalies in the data, and much more.

🗼 To give you a simple example, think of a student who has textbooks and all the required study material but no teacher to guide them. Ultimately, the student has to learn on their own to pass the exams.

🗼 Here we don’t have X (inputs) paired with Y (labels). There is only unlabeled data to train the machine, and one common way to learn from it is clustering.

🎆 Types of Unsupervised Learning

🔹 Unsupervised learning is broadly split into 2 types:

  • Clustering
  • Association

🎇 What is K-Means Clustering?

🗼 K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms.

🗼 One of the most important applications of K-means is dividing a data set into clusters.

🗼 Clustering is the type of unsupervised learning where you find patterns in the data you are working on. The pattern may be shape, size, colour, etc., which can be used to group data items into clusters (groups).

🗼 The diagram below explains the working of the K-means clustering algorithm:

🗼 It lets us cluster the data into different groups and is a convenient way to discover the categories in an unlabeled dataset on its own, without any labeled training data.

🗼 It is a centroid-based algorithm, where each cluster is associated with a centroid.

“It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.”

🗼 The algorithm takes the unlabeled dataset as input, divides it into k clusters, and repeats the process until the clusters stop changing. The value of k must be chosen in advance.
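The loop described above (assign points to centroids, move the centroids, repeat) can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, the random choice of starting centroids, and the six sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points into k groups with the basic K-means loop."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Made-up 2-D data: three points near (1, 1.5) and three near (8.5, 8.5).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

The cluster numbers themselves are arbitrary. What matters is that the first three points share one label and the last three share the other.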

🎇 How Does K-Means Clustering Work?

🎯 First, let’s say we have trained a model and created 3 clusters (groups) from the historical dataset we have.

🎯 Now suppose 3 data points A, B, and C come in as input, and the model needs to predict which cluster each one belongs to.

🎯 The 3 data points A, B, and C are provided to the model. It first calculates which cluster each data point is nearest to. Data point A is nearest to the cluster (set of data points) shown at the bottom of the graph (21, 22, 24, 23, 25, etc.).

🎯 In the same way, the model calculates the distance from every data point to each cluster and adds the data point to the nearest cluster.
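The prediction step for new points can be sketched as a nearest-centroid lookup. The centroid coordinates, the cluster names, and the positions of A, B, and C below are hypothetical values chosen only for illustration. They are not taken from the graph mentioned above.

```python
import math

# Hypothetical centroids of three clusters learned earlier (illustrative only).
centroids = {
    "cluster_1": (2.0, 2.0),
    "cluster_2": (8.0, 3.0),
    "cluster_3": (5.0, 9.0),
}


def predict(point, centroids):
    """Return the name of the centroid closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


# Hypothetical new data points A, B, and C.
new_points = {"A": (7.5, 2.0), "B": (1.0, 3.0), "C": (5.5, 8.0)}
assignments = {name: predict(p, centroids) for name, p in new_points.items()}
print(assignments)
# {'A': 'cluster_2', 'B': 'cluster_1', 'C': 'cluster_3'}
```

Note that the distance is measured to each cluster's centroid rather than to every point in the cluster, which keeps prediction cheap.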

🎆 Advantages of the K-Means Clustering Algorithm

✔ It is fast, robust, easy to understand, and comparatively efficient.

✔ It gives the best results when the clusters in the data are distinct and well separated.

✔ It is flexible and easy to interpret.

✔ It has a low computational cost and can enhance accuracy.

🎆 Disadvantages of the K-Means Clustering Algorithm

✔ If two groups of data overlap heavily, it cannot distinguish them or tell that there are two clusters.

✔ It cannot handle outliers and noisy data well.

✔ It does not work well for non-linearly separable data sets, lacks consistency (results can change with the initial centroids), and is sensitive to the scale of the features.
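To see why sensitivity to scale matters, here is a small sketch under made-up assumptions: three hypothetical records with an age and an income column, and a helper `standardize` (a name chosen for this example) that rescales each column to z-scores. On the raw data the income column dominates the Euclidean distance. After rescaling, both columns contribute comparably.

```python
import math
import statistics

# Hypothetical records: (age in years, annual income in dollars).
records = [(25, 50_000), (60, 51_000), (26, 90_000)]


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores).

    Assumes no column is constant; a constant column would divide by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Records 0 and 1 differ mostly in age; records 0 and 2 differ mostly in income.
raw_d01 = math.dist(records[0], records[1])
raw_d02 = math.dist(records[0], records[2])

scaled = standardize(records)
scaled_d01 = math.dist(scaled[0], scaled[1])
scaled_d02 = math.dist(scaled[0], scaled[2])

print(raw_d01, raw_d02)        # the income gap dwarfs the age gap
print(scaled_d01, scaled_d02)  # the two gaps are now of similar size
```

Rescaling features like this before running K-means is a common precaution, since the algorithm relies entirely on distances.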

🎆 Applications of K-Means Clustering

🔹 Market Segmentation

🔹 Document Clustering

🔹 Image Segmentation

🔹 Image Compression, Vector Quantization, Cluster Analysis

🔹 Identifying Crime-Prone Areas

🔹 Drug Activity Prediction, etc.

🎇 K-Means Clustering Use Cases on Security Domain

🎆 An augmented K-means clustering approach for the detection of distributed denial-of-service attacks

🔅 The problem of distributed denial-of-service (DDoS) attack detection remains challenging due to new and innovative methods developed by attackers to evade the deployed security systems.

🔅 In this work, we devise an unsupervised machine learning (ML)-based approach for the detection of different types of DDoS attacks by augmenting the performance of the K-means clustering algorithm with the aid of a hybrid method for feature selection and extraction.

🔅 By sequentially combining an integrated feature selection (IFS) algorithm and a deep autoencoder (DAE), we develop the hybrid method for extracting encoded features, which can better separate the clusters of benign and malicious network flows.

🔅 We formulate the problem of DDoS attack detection as a binary clustering of network flows. Although K-means clustering is a simple and widely used algorithm, we investigate its performance for DDoS attack detection before and after applying the proposed hybrid method for feature selection and extraction.

🔅 Our results show that after employing the proposed hybrid method, the performance of the K-means clustering model improves, and it is comparable to the state-of-the-art supervised ML and deep learning (DL)-based methods developed for DDoS attack detection.
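When detection is framed as a binary clustering of flows, one simple way to score the result against known labels is sketched below. This is an assumption for illustration, not the evaluation procedure of the work summarized above. The function name `binary_clustering_accuracy` and the eight sample values are made up. Because cluster ids carry no meaning by themselves, the sketch scores both possible mappings and keeps the better one.

```python
def binary_clustering_accuracy(cluster_ids, true_labels):
    """Accuracy of a 2-cluster result against 0/1 ground-truth labels.

    Both mappings are scored (cluster 0 = benign, or cluster 0 = malicious)
    and the better one is returned. Expects cluster ids of 0 or 1.
    """
    matches = sum(c == t for c, t in zip(cluster_ids, true_labels))
    direct = matches / len(true_labels)
    return max(direct, 1.0 - direct)


# Hypothetical result for eight network flows:
# cluster ids from K-means, and labels where 0 = benign, 1 = malicious.
cluster_ids = [1, 1, 1, 0, 0, 0, 0, 1]
true_labels = [0, 0, 0, 1, 1, 1, 1, 1]

accuracy = binary_clustering_accuracy(cluster_ids, true_labels)
print(accuracy)  # 0.875: seven of the eight flows agree once the ids are flipped
```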

🎆 Cyber Profiling

🔅 The idea of cyber profiling is derived from criminal profiling, which provides information to the investigation division to classify the types of criminals who were at the crime scene.

🔅 Profiling is based more specifically on what is known and not known about the criminal. A profile is information about an individual or group of individuals that is accumulated, stored, and used for various purposes, for example by monitoring their behavior through their internet activity.

🔅 The difficulties in implementing cyber profiling lie in the diversity of user data and in the fact that online behavior sometimes differs from actual behavior. Given how individual personal behavior is, inductive generalizations can be very reliable but can also lead to a misreading of the behavior being analyzed. Therefore the cyber-profiling process uses a combination of deductive and inductive methods.

🔅 For investigations, the cyber-profiling process makes a good contribution to the field of forensic computer science.

🔅 Cyber profiling is one of the efforts an investigator makes to identify alleged offenders through the analysis of data patterns that cover aspects of technology, investigation, psychology, and sociology.

🔅 The process of profiling criminals is often also known as cyber-criminal profiling or criminal investigative analysis.

🔅 Criminal profiles are generated in the form of data on the personal traits, tendencies, habits, and geographic-demographic characteristics of the offender (for example, age, gender, socioeconomic status, education, and place of origin or residence).

🔅 Preparing a criminal profile involves analysing the physical evidence found at the crime scene, building an understanding of the victim (victimology), looking for a modus operandi (whether the crime was planned or unplanned), and tracing what the perpetrator deliberately left behind (signature).

Thank you for reading.
