Task 05: Cyber Crimes & Confusion Matrix

Mohammed Abdul Basith
Jun 16, 2021


A confusion matrix is a table that summarizes how many predictions of a classification model were correct and how many were incorrect, broken down by class. It is used to evaluate the model's performance through metrics such as accuracy, precision, and recall.

The following four terms are the basic building blocks for the metrics derived from a confusion matrix:

  • True positives (TP): the actual value is positive and the prediction is also positive.
  • True negatives (TN): the actual value is negative and the prediction is also negative.
  • False positives (FP): the actual value is negative but the prediction is positive. Also known as a Type I error.
  • False negatives (FN): the actual value is positive but the prediction is negative. Also known as a Type II error.
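To make the four terms concrete, here is a minimal sketch that counts them for a binary problem. The function name `confusion_counts` and the sample labels are made up for illustration and are not taken from any dataset discussed in this post.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem.

    `positive` is the label treated as the positive class (assumed to be 1 here).
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a != positive and p != positive:
            tn += 1  # actual negative, predicted negative
        elif a != positive and p == positive:
            fp += 1  # actual negative, predicted positive (Type I error)
        else:
            fn += 1  # actual positive, predicted negative (Type II error)
    return tp, tn, fp, fn


# Illustrative labels only: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
```

Every prediction falls into exactly one of the four buckets, so the counts always add up to the number of samples.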

These four counts give the following important metrics:

  1. Precision: the portion of the model's positive predictions that are actually positive. Its formula is TP / (TP + FP).
  2. Recall: the portion of the actual positives that the model correctly identifies as positive. It is also called the True Positive Rate or Sensitivity. Its formula is TP / (TP + FN).
  3. F1 Score: the harmonic mean of Precision and Recall. Because the harmonic mean is pulled toward the smaller of the two values, a model scores well only if both are reasonably high, so the metric accounts for False Positives and False Negatives at the same time. Its formula is 2 * Precision * Recall / (Precision + Recall).
  4. Accuracy: the portion of all predictions that are correct, irrespective of whether they are positives or negatives. It therefore includes both True Positives and True Negatives. Its formula is (TP + TN) / (TP + TN + FP + FN).
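The four formulas above can be sketched in a few lines of Python. The function name `classification_metrics` and the example counts are assumptions for illustration. Returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1 and accuracy from the four counts."""
    # Guard against division by zero; returning 0.0 is a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is 40/50 and accuracy is 85/100, while recall (40/45) is higher than precision because there are fewer false negatives than false positives.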

Confusion Matrix & Cyber Crimes

An archive containing all court cases from 1913 until 2018 was downloaded from Rechtspraak.nl. The data was grouped per year, and each year was divided into 12 folders, one per month. In total, 7 classes remained, including the ‘other’ class. The classes and number of files are shown below. The classes are imbalanced because they do not all contain the same number of files.

The confusion matrix obtained from the classifier is depicted below. It is shown in normalized form because the classes are imbalanced. The darker the blue, the better the classifier is at predicting files of that class. The per-class accuracy can also be read from the diagonal of the matrix. It appears that ‘child pornography’ can be determined with high accuracy.
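One common way to normalize a multi-class confusion matrix is to divide each row (one actual class) by its total, so that the diagonal holds the fraction of each class's files that were predicted correctly. The sketch below assumes that row-wise normalization. The labels `class_a` and `class_b` and all the data are made up and do not reproduce the classes or results of the court-case study.

```python
from collections import Counter


def normalized_confusion_matrix(actual, predicted, labels):
    """Return a row-normalized confusion matrix as a list of lists.

    Rows are actual classes, columns are predicted classes, both in the
    order given by `labels`. Each non-empty row sums to 1.
    """
    counts = Counter(zip(actual, predicted))
    matrix = []
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        row = [counts[(a, p)] / row_total if row_total else 0.0 for p in labels]
        matrix.append(row)
    return matrix


# Hypothetical, deliberately imbalanced data (4, 2 and 4 files per class).
labels = ["class_a", "class_b", "other"]
actual = ["class_a"] * 4 + ["class_b"] * 2 + ["other"] * 4
predicted = [
    "class_a", "class_a", "class_a", "other",  # actual: class_a
    "class_b", "class_a",                      # actual: class_b
    "other", "other", "class_b", "other",      # actual: other
]

matrix = normalized_confusion_matrix(actual, predicted, labels)
diagonal = [matrix[i][i] for i in range(len(labels))]

for label, row in zip(labels, matrix):
    print(label, [round(v, 2) for v in row])
```

Because each row is scaled by its own total, a class with few files can be compared directly with a class with many files, which is why normalization helps with imbalanced classes.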

Conclusion

Cybercrime offenses are happening at an alarming rate. As the use of the Internet increases, many offenders use it as a means of communication in order to commit crimes.

The framework developed in our work is essential to the creation of a model that can support analytics regarding the identification, detection and classification of the integrated cybercrime offenses (structured and unstructured).

The main focus of our work is to find the attacks that take advantage of the security vulnerabilities and analyze these attacks by making use of machine learning techniques.

Thank you for reading …

#worldrecordholder #training #internship #makingindiafutureready #summer #summertraining #python #machinelearning #docker #rightmentor #deepknowledge #linuxworld #vimaldaga #righteducation
