LLM Security: Understanding AI as an Attack Surface, A TryHackMe Writeup

Press enter or click to view image in full size

Link — https://tryhackme.com/room/llmsecurity Lets do this together…..

Disclaimer: This write up is based on a Capture The Flag (CTF) challenge hosted on TryHackMe and is intended strictly for educational purposes only.

Introduction

When most people hear the term Large Language Model (LLM), they immediately think of productivity, automation, or chatbots that can write code and answer questions in seconds. And honestly, that excitement makes sense. LLMs have changed how people interact with technology.

Today organizations are integrating AI into almost everything:

Customer support
Security operations
Coding assistants
Search engines
Document analysis
Internal knowledge bases

Tasks that previously took hours can now happen in minutes.

But while learning this TryHackMe room, I realized something important: Most people are using AI without understanding that AI itself has become an attack surface.

That sentence completely changed my perspective on AI security. In traditional cybersecurity, we usually focus on servers, APIs, authentication systems, or vulnerable code. But with LLMs, the attack surface becomes much more abstract. Sometimes the vulnerability is not even in the code itself — it is in the model’s behavior, memory, or the way humans interact with it.

This room gave a beginner-friendly but eye-opening introduction to the security threats surrounding LLMs. Instead of only thinking about “hacking systems,” it teaches you to think about how attackers can manipulate or extract information from AI models themselves.

Task 1 Understanding LLMs Beyond the Hype

An LLM is essentially a machine learning model trained on massive amounts of text data. These models learn patterns, relationships, sentence structures, and contextual meaning from huge datasets.

Examples include:

GPT models
Gemini
Claude
Llama
Mistral

The problem is that these models sometimes remember more than they should. And that is where security concerns begin. Unlike normal software programs with predictable outputs, LLMs generate responses probabilistically. This means the same input can produce different outputs depending on context, prompting, and model behavior. This unpredictability introduces entirely new types of security risks.

Task 2 Data-Based Threats

The first section focused on one of the most interesting concepts in AI security: data-based threats. This is where attackers attempt to discover information about the data used to train the model. Initially, this sounded strange to me. I wondered: “How can someone attack training data indirectly through a model?” But after understanding the concepts, it started making sense.

Press enter or click to view image in full size

Which sample is a member?
MI_SAMPLE_ALPHA

Membership Inference Attack

One of the questions asked:

Which attack determines whether a known data sample was part of an LLM’s training set?
The answer: Membership inference

This attack tries to determine whether a specific piece of data was included during training. Imagine a healthcare AI trained on sensitive patient records. An attacker might ask carefully crafted questions to determine whether a specific patient’s data was used during training.

This becomes extremely dangerous in environments involving:

Medical records
Financial information
Legal documents
Corporate secrets

Even if the model never directly reveals the data, subtle behavioral differences may expose whether certain information existed in the training dataset. That realization genuinely surprised me because it shows how privacy risks can exist even when no obvious data leak happens.

Training Data Extraction

Another threat discussed was:

Training data extraction

This attack involves making the model reproduce memorized parts of its training data.

Some early AI systems accidentally leaked:

API keys
Passwords
Internal code snippets
Email addresses

because those values appeared in training datasets. This is a huge shift from traditional cybersecurity. Normally, if a database leaks, we investigate the server. But with AI, the model itself may unintentionally become the leakage point. That is a completely different security mindset.

Which data-based threat involves the model reproducing memorised snippets of its training data?
Training data extraction

Press enter or click to view image in full size

Task 3 Model-Based Threats

This section became even more interesting because it focused on attacks against the model itself rather than the data.

One question asked for the employee ID, which turned out to be:
7814

Press enter or click to view image in full size

Chatbot Answering

But the deeper concept here was understanding Model Inversion.

Model Inversion Attack

Which model-based threat attempts to reconstruct sensitive information encoded within a model’s internal representations?
The answer: Model inversion

Model inversion attacks attempt to reconstruct sensitive information hidden inside the model’s internal representations.

This is difficult to visualize at first. Think of it this way:

If an AI model is trained heavily on facial images or sensitive records, attackers may try to reverse-engineer or infer information from the model’s learned patterns. It is almost like interrogating the model until fragments of hidden information emerge. This was one of the moments where I realized AI security is deeply connected to behavior rather than only infrastructure.

Traditional Security vs AI Security

While doing this room, I kept comparing AI security to traditional cybersecurity.

In normal applications:

Vulnerabilities exist in code
Exploits target systems
Input validation solves many problems

But in AI systems:

Vulnerabilities can exist in behavior
Attackers manipulate prompts
Models may reveal information unintentionally

The attack surface becomes psychological and contextual. That is a massive shift.

Task 4 System-Based Threats

This section introduced one of the most important concepts in modern LLM systems:
The Context Window

The context window is essentially the memory space the model uses while generating responses.

Get Gajanan Tayde’s stories in your inbox

Join Medium for free to get updates from this writer.

Remember me for faster sign in

It combines:

System instructions
User prompts
Retrieved external data
Conversation history

into one giant sequence processed by the model. At first this sounds harmless.

But then I realized: If attackers can manipulate what enters the context window, they may manipulate the model itself. That becomes extremely dangerous.

Memory Poisoning

One challenge in this section involved convincing the model and retrieving the flag:

Press enter or click to view image in full size

Grandma Attack

Press enter or click to view image in full size

Did you convince the model? Whats the flag?
THM{MEMORY_POISONED}

The term itself explains the attack beautifully. Memory poisoning happens when malicious or manipulated information enters the AI system’s memory or context flow. Imagine a corporate AI assistant connected to internal documents. If an attacker injects misleading instructions into retrieved documents, the model may start behaving incorrectly or leaking information. This is similar to social engineering — except now the victim is the AI itself. That concept genuinely fascinated me.

Task 5 User-Based Threats

This section focused on perhaps the most dangerous component in cybersecurity: Human

One question asked:

Which package should you NOT download?
Answer:robbco-llm-audit

Press enter or click to view image in full size

This simulated a malicious package disguised as a legitimate AI-related tool. This reflects real-world attacks happening today. As AI becomes popular, attackers are creating:

Fake Python packages
Malicious AI tools
Trojanized models
Fake extensions

to target developers and researchers.

AI-Powered Phishing

Another important concept: Phishing

The room explained that LLM-powered social engineering amplifies??
Answer: phishing attacks.

This is extremely true in the real world. Earlier phishing emails were often easy to detect because of:

Bad grammar
Awkward wording
Poor formatting

Now AI can generate:

Perfect grammar
Personalized messages
Context-aware communication
Convincing fake conversations

Press enter or click to view image in full size

Attackers can scale phishing campaigns dramatically using LLMs. This is one of the clearest examples of AI amplifying existing threats rather than inventing entirely new ones.

My Biggest Learning From This Room

Before studying AI security, I used to think:

“Secure the API, secure the server, secure the code.”

But this room taught me that AI systems require a different mindset. You are no longer securing only software.

You are securing:

Model behavior
Training data
User interaction
Context flow
Memory handling
Prompt integrity

That is fundamentally different from traditional application security.

Why LLM Security Matters So Much

As organizations integrate AI into:

SOC platforms
Chatbots
Healthcare systems
Financial applications
Development pipelines

the risks grow rapidly. An insecure AI system can:

Leak sensitive data
Generate insecure code
Amplify phishing attacks
Spread misinformation
Manipulate users
Behave unpredictably

And the scariest part? Sometimes no “hack” is required. Just conversation…………

Conclusion

This TryHackMe room was an excellent introduction to thinking about AI systems from a security perspective.

It helped me understand that AI security is not just an extension of traditional cybersecurity — it is a new domain with entirely different attack surfaces.

The biggest takeaway for me was this:

In traditional cybersecurity, attackers exploit systems. In AI security, attackers often manipulate behavior.

And that difference changes everything. As AI continues becoming part of everyday infrastructure, understanding these concepts will become essential not just for AI engineers, but for every cybersecurity professional. Because the future of security is no longer only about protecting machines. It is also about understanding how machines think.