0xkato walks through the mechanisms inside modern transformer-based LLMs: tokenization, embeddings, Rotary Position Embeddings, attention with Q/K/V and causal masking, multi-head attention and the move to Grouped-Query Attention, the feed-forward network as the stored-knowledge layer, Mixture of Experts, the residual stream with RMSNorm in a pre-norm stack, and the next-token prediction loop with speculative decoding. By the end, you can read a modern LLM model card and recognise which piece of the architecture each section is talking about.
From API Key to Server Takeover: How LiteLLM 1.83.14 Chained Secret Leakage and Jinja2 SSTI into RCE
A LiteLLM 1.83.14 exploit chain leaks the master key through callback metadata, then abuses non-sandboxed Jinja2 GitLab prompts to achieve server-side RCE.
Jenny was a Friend of Mine – MCPs and Friends
The article shows how Claude Code plus MCP can automate vulnerability hunting by combining reverse engineering, fuzzing, RAG, and bounty scoring with strict validation gates that reduce LLM hallucinations and confirm real bugs.
Leveling Up Secure Code Reviews with Claude Code
Claude Code can speed up secure code reviews by mapping code paths, sources, sinks, and risky patterns, but it works best with strong prompts, human validation, and private handling of sensitive code.
When Local AI Becomes an Attack Vector: A Deep Dive into LLM Infrastructure Security
The article analyzes a real deployment of a low-privileged on-prem LLM server and shows that even restricted models can expose internal systems through APIs, RAG pipelines, and data access, creating new enterprise attack surfaces.