A Cybercrime Case Study and the Confusion Matrix 🔒

Harshal Thakare
4 min read · Jun 3, 2021

What is Cybercrime

Cybercrime, also called computer crime, is the use of a computer as an instrument to further illegal ends, such as committing fraud, trafficking in child pornography and intellectual property, stealing identities, or violating privacy. Cybercrime, especially through the Internet, has grown in importance as the computer has become central to commerce, entertainment, and government.

Because of the early and widespread adoption of computers and the Internet in the United States, most of the earliest victims and villains of cybercrime were Americans. By the 21st century, though, hardly a hamlet remained anywhere in the world that had not been touched by cybercrime of one sort or another.

Machine Learning

With machine learning, criminal court cases can be classified automatically based on certain features of ICT involvement, which are identified in this research. From the number of cybercrimes that took place in 2016 and how many of those were reported to the police, it can be concluded that 0.88% of all Dutch residents filed a police report for cybercrime. Domenie et al. support this number: they conclude that the share of cybercrime in filed police reports is less than 1%. Not all cases go to court, so the share of cybercrime in criminal court cases will be even smaller.

A large dataset is desirable for training a classifier. Since the size of the dataset had not yet been determined, and research indicated that the cybercrime rate in police reports is at most 1%, a provisional choice was made for Naïve Bayes as the learning algorithm. Naïve Bayes is effective and efficient for data mining and performs well with little data.

From reading criminal court cases, classes were defined into which a case involving ICT could be classified. These categories can be found in Appendix A. Some categories have been removed. For example, a category with too little data was removed, because more data would be needed to classify files in that category correctly. The remaining categories are ‘child pornography’, ‘cyberattack’, ‘identity theft’, ‘other’, ‘phishing’, ‘platform fraud’ and ‘online threat’, where ‘other’ is the category for a criminal court case that does not fit into any of the defined categories.
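To make the choice of Naïve Bayes a little more concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the implementation used in the research: the function names `train_naive_bayes` and `predict`, the two-class setup, and the four toy "case summaries" are all assumptions invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (multinomial model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents that carry this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/class pairs above zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: NOT taken from the court cases in the research.
train_docs = [
    "fake bank email asked for login password".split(),
    "email link to fake login page".split(),
    "seller took payment on marketplace never shipped".split(),
    "marketplace advert payment made goods never delivered".split(),
]
train_labels = ["phishing", "phishing", "platform fraud", "platform fraud"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "victim received email with fake login link".split())
print(prediction)
```

Because the model only needs word counts per class, it can be trained on very few documents, which fits the small-dataset motivation given above.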

Cyber Attack on Cosmos Bank

In August 2018, the Pune branch of Cosmos Bank was drained of Rs 94 crores in an extremely bold cyber attack. By hacking into the main server, the thieves were able to transfer the money to a bank in Hong Kong. The hackers also made their way into the ATM server to obtain the details of various VISA and RuPay debit cards.
The switching system, i.e. the link between the centralized system and the payment gateway, was attacked, so neither the bank nor the account holders caught wind of the money being transferred.
According to the cybercrime case study, a total of 14,000 transactions were carried out internationally, spanning 28 countries and using 450 cards. Nationally, 2,800 transactions were carried out using 400 cards.
The attack was one of a kind: it was in fact the first malware attack that stopped all communication between the bank and the payment gateway.

Confusion matrix and accuracy

The confusion matrix obtained from the classifier is depicted in the figure. It is shown in normalized form because the classes are imbalanced. The darker the blue of a cell on the diagonal, the better the classifier is at predicting files for that class. The matrix makes it clear where the classifier gets ‘confused’. The ‘identity theft’ class does not do well, and there is a good explanation for this. Reading the court cases revealed that ‘platform fraud’ is linked to ‘identity theft’, since stolen identities are often used to commit platform fraud. Accordingly, the confusion matrix shows that ‘identity theft’ is often predicted as ‘platform fraud’.
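As an illustration of what "normalized form" means, the sketch below builds a confusion matrix from true and predicted labels and then divides each row by its total, so that classes of different sizes become comparable. The labels and counts are hypothetical and were chosen only to mimic the ‘identity theft’ versus ‘platform fraud’ mix-up described above; they are not the figures from the research.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def normalize_rows(matrix):
    """Divide each row by its sum so every true class adds up to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Hypothetical labels and predictions, invented for this example.
labels = ["identity theft", "platform fraud", "phishing"]
y_true = ["identity theft"] * 4 + ["platform fraud"] * 10 + ["phishing"] * 6
y_pred = (
    ["identity theft"] * 1 + ["platform fraud"] * 3    # true: identity theft
    + ["platform fraud"] * 9 + ["identity theft"] * 1  # true: platform fraud
    + ["phishing"] * 5 + ["platform fraud"] * 1        # true: phishing
)

cm = confusion_matrix(y_true, y_pred, labels)
cm_norm = normalize_rows(cm)

for label, row in zip(labels, cm_norm):
    print(f"{label:15s}", [round(v, 2) for v in row])
```

In this toy data the ‘identity theft’ row puts most of its weight in the ‘platform fraud’ column, which is the kind of pattern the darker off-diagonal cells in a normalized matrix reveal.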

Calculating the f1_score gave a value of 0.76, which is read here as meaning that the label of a criminal court case can be predicted with an accuracy of roughly 76%, so that roughly 24% of all criminal court cases get misclassified as another class. Strictly speaking, a weighted F1 score combines precision and recall and is not identical to accuracy, so these percentages are an approximation. Since the overall figure is the weighted average of the f1_score of each class, it is more informative to look at the scores per class, as some classes perform better than others. The f1_score per class is shown in Table 6. The confusion matrix in the figure clearly indicates which classes the labels are misclassified as, along with the percentage per class. The per-class hit rates can also be read from the diagonal of the normalized confusion matrix. It appears that ‘child pornography’ can be determined with high accuracy.
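The relationship between per-class F1 scores, their weighted average, and plain accuracy can be sketched as follows. This is a minimal plain-Python example under assumed data: the helper names (`per_class_f1`, `weighted_f1`, `accuracy`) and the labels are hypothetical, and the resulting numbers have nothing to do with the 0.76 reported above.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """F1 = harmonic mean of precision and recall, computed per class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def weighted_f1(y_true, scores):
    """Average the per-class F1 scores, weighted by class size (support)."""
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


def accuracy(y_true, y_pred):
    """Share of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions, invented for this example.
labels = ["identity theft", "platform fraud", "phishing"]
y_true = ["identity theft"] * 4 + ["platform fraud"] * 10 + ["phishing"] * 6
y_pred = (
    ["identity theft"] * 1 + ["platform fraud"] * 3
    + ["platform fraud"] * 9 + ["identity theft"] * 1
    + ["phishing"] * 5 + ["platform fraud"] * 1
)

scores = per_class_f1(y_true, y_pred, labels)
overall_f1 = weighted_f1(y_true, scores)
overall_accuracy = accuracy(y_true, y_pred)

for label in labels:
    print(f"{label:15s} f1 = {scores[label]:.2f}")
print(f"weighted f1 = {overall_f1:.2f}, accuracy = {overall_accuracy:.2f}")
```

On this toy data the weighted F1 and the accuracy come out close but not equal, which is why the per-class scores give a clearer picture than a single overall number.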

🔰Keep Learning❗❗ 🔰Keep Sharing❗❗