When OpenAI released ChatGPT in November, programmers were astounded to discover that the artificial intelligence-powered chatbot could not only mimic a huge variety of human speech but could also write code. In the days following its release, programmers posted wild-eyed examples of ChatGPT churning out fairly competent code. From stringing together cloud services to porting Python to Rust, ChatGPT showed remarkable competence in at least some basic programming tasks.
But separating hype from reality when it comes to ChatGPT is no small feat. Its coding abilities have inspired a series of overheated headlines — “ChatGPT is a bigger threat to cybersecurity than most realize,” for example — about its ability to write malware and have left seasoned hackers wondering about the extent to which large language models can really be used for malicious hacking.
Marcus Hutchins, the black-hat-turned-white-hat hacker who made headlines in 2017 for stopping the spread of the WannaCry ransomware, counted himself among the curious about ChatGPT's abilities, given his experience writing banking trojans in a past life. He wondered: Could the chatbot be used to write malware?
The results were disappointing. “I was legitimately a malware developer for a decade and it took me three hours to get a functional piece of code — and this was in Python,” Hutchins, widely known by his online moniker MalwareTech, told CyberScoop in an interview.
After hours of tinkering, Hutchins was able to generate components of a ransomware program, such as a file encryption routine. But when he tried to combine those components with the other features necessary to build a fully fledged piece of malware, ChatGPT generally failed in sophomoric ways, such as requesting to open a file it had just attempted to open.
These types of rudimentary ordering problems illustrate the shortcomings of generative AI systems such as ChatGPT. While they are able to create content that closely resembles the data they are trained on, large language models often lack the error correction tools and contextual knowledge that make up actual expertise. And amid the astonished reactions to ChatGPT, the limitations of the tool are often lost.
If you believe the hype, there are few things ChatGPT won’t disrupt. From white-collar work, to the college essay, to professional exams, and, yes, even malware development at the hands of ordinary hackers, all are poised for obsolescence. But that hype obscures the ways tools such as ChatGPT are likely to be deployed — not as replacements for human expertise, but as assistants to it.
In the weeks after ChatGPT’s release, cybersecurity companies have released a flurry of reports demonstrating how the bot might be used to write malicious code, spawning catchy headlines about ChatGPT’s ability, for example, to write “polymorphic malware.” But these reports tend to obscure the role of expert authors in prompting the model to write and, crucially, correct the code it generates.
In December, researchers at Check Point demonstrated how ChatGPT could potentially build a malware campaign from start to finish — from crafting a phishing email to writing malicious code. But generating fully featured code required prompting the model to consider things that only an expert programmer would think of, like adding features to detect a sandbox and checking whether a feature is open to SQL injection.
“The attacker has to know what exactly he wants and to be able to specify the functionality,” said Sergey Shykevich, a researcher at Check Point. “Just writing ‘write a code for malware’ won’t produce anything really useful.”
For hackers such as Hutchins, knowing what questions to ask is half the battle in trying to write software, and much of the press around ChatGPT as a programming tool can miss how much expertise researchers are bringing to the conversation when they ask ChatGPT to aid in software development, or “dev.”
“People who understand dev are showing it doing dev, and then they’re not realizing how much they’re contributing,” Hutchins says. “Someone with no programming experience won’t even know what prompts to give it.”
For now, ChatGPT remains one tool among many in the malware development kit. In a report published last week, the threat intelligence firm Recorded Future found more than 1,500 references on the dark web and in closed forums to the use of ChatGPT in malware development and the creation of proof-of-concept code. But the report notes that much of that code is publicly available and that the company expects ChatGPT to be most useful for “script kiddies, hacktivists, scammers and spammers, payment card fraudsters, and threat actors engaging in other lowly and disreputable forms of cybercrime.”
For newcomers in the field, ChatGPT might provide help on the margins, the report concludes: “ChatGPT is lowering the barrier to entry for malware development by providing real-time examples, tutorials, and resources for threat actors that might not know where to start.”
In aggregate, the benefits for malicious hackers will be marginal — the introductory hacking tips provided by ChatGPT come in a more accessible form but can just as easily be Googled. As ChatGPT and other large language models mature, their ability to write original code — both nefarious and not — will likely improve, as CyberScoop reported in December. Until then, rather than generating malware out of whole cloth, tools such as ChatGPT are more likely to play a supporting role.
ChatGPT offers a compelling way to craft more effective phishing emails, for example. For Russian-speaking hackers who might struggle to write clickable messages in native-sounding English (or another target language), ChatGPT can sharpen their writing skills. “The vast majority of attacks originate in email, and the vast majority of email attacks are not malware attacks,” says Asaf Cidon, an assistant computer science professor at Columbia University and an advisor to the cybersecurity firm Barracuda. “They’re trying to trick the user into giving their credentials or transfer money.” This, Cidon predicts, will now be much easier: “ChatGPT is going to be extremely good at doing that.”
But this represents a step change rather than a revolution in hacking. High-quality phishing emails are already easy to craft — either by the attacker or with the help of, say, a translator hired on a gig-work platform — but ChatGPT can produce them at scale. ChatGPT, Cidon argues, “reduces the investment required.”
In a more exotic approach, an attacker with access to an email archive might use it to fine-tune a large language model to replicate a CEO’s writing style. Training an LLM to write like the boss makes it easier to dupe employees, Cidon said.
But when assessing the impact of ChatGPT on cybersecurity more generally, experts say it’s important to remain focused on the big picture. The use of an LLM in a targeted attack would represent an interesting use-case. For most targets, however, ChatGPT probably won’t improve the chances of success. After all, as Drew Lohn, a researcher at Georgetown’s Center for Security and Emerging Technology observes, “Phishing is already so successful that it might not make a huge difference.”
In aggregate, tools such as ChatGPT are likely to increase the number of capable threat actors. ChatGPT, Lohn argues, could be used “to guide hackers through the process of an intrusion that maybe doesn’t even involve any new malware. … There are tons of open source tools and bits of malware that are just floating around or that are just prepackaged,” he said. “I’m worried that ChatGPT will show more people how to use that.”
Then again, he says, given how quickly the field is advancing, “give it a week and maybe this will all change.”