Development of Threat Information De-identification Technology Using Big Data Tools


Seung-Yeon Hwang, Jeong-Joon Kim

Abstract

In cybersecurity, the development of new technologies such as big data tools has been accompanied by enormous damage to companies and public institutions. This damage comes not only from the data these tools generate but also from major cyberattacks, such as ransomware, that abuse them. When threat information such as personal information is leaked by malicious hackers, it is exploited in schemes that extort money, such as voice phishing, and in spam emails and illegal telemarketing that flood users with advertisements they do not want. The psychological and material damage caused by personal information leakage continues to grow, and because leaked personal information is virtually impossible to recover, it is regarded as important threat information. Therefore, this paper develops a de-identification technology that can safely store the personal information contained in threat information. De-identification provides strong security: because part or all of the information is deleted or replaced with other characters, the sensitive parts remain concealed even if the data is leaked. The technology operates on, and stores its results in, a sharded MongoDB environment, and it uses in-memory Spark because de-identifying large amounts of data requires distributed parallel processing. For testing, any desired volume of personal information data in the form of integers, strings, and patterns can be generated, and de-identification techniques such as suppression and masking can be applied through a convenient UI.
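To make the two named techniques concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes a hypothetical record layout with an integer field (`age`), a string field (`name`), and a pattern field (`phone`, using a hypothetical `010-XXXX-XXXX` format). Suppression drops whole attributes, and masking replaces characters of a value with `*` while keeping a chosen prefix, a chosen suffix, and any separators. The helper names `generate_records`, `suppress`, `mask`, and `de_identify` are illustrative. In the system described above, such a per-record function would presumably be applied in parallel by Spark over data stored in sharded MongoDB; that wiring is not shown here.

```python
import random
import string


def generate_records(n, seed=0):
    """Generate n synthetic records (hypothetical schema) with an integer,
    a string, and a pattern-shaped field, mirroring the test data types."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        records.append({
            "name": "".join(rng.choice(string.ascii_uppercase) for _ in range(6)),
            "age": rng.randint(18, 90),
            # Hypothetical phone-number pattern used only for illustration.
            "phone": "010-{:04d}-{:04d}".format(rng.randint(0, 9999), rng.randint(0, 9999)),
        })
    return records


def suppress(record, fields):
    """Suppression: return a copy of the record without the listed attributes."""
    return {k: v for k, v in record.items() if k not in fields}


def mask(value, keep_prefix=1, keep_suffix=0, mask_char="*"):
    """Masking: replace alphanumeric characters outside the kept prefix and
    suffix with mask_char; separators such as '-' are left in place."""
    s = str(value)
    n = len(s)
    return "".join(
        ch if i < keep_prefix or i >= n - keep_suffix or not ch.isalnum() else mask_char
        for i, ch in enumerate(s)
    )


def de_identify(records, suppress_fields, mask_rules):
    """Apply suppression, then per-field masking, to every record.
    The input records are not modified."""
    out = []
    for rec in records:
        r = suppress(rec, suppress_fields)
        for field, kwargs in mask_rules.items():
            if field in r:
                r[field] = mask(r[field], **kwargs)
        out.append(r)
    return out
```

For example, `mask("010-1234-5678", keep_prefix=3, keep_suffix=4)` yields `"010-****-5678"`. Calling `de_identify(records, {"age"}, {"name": {"keep_prefix": 1}})` removes the age attribute and leaves only the first letter of each name visible. Keeping masking as a pure per-value function is what would let it be distributed record by record.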
