Hi! I am Wenbo Pan

I’m a Ph.D. student at City University of Hong Kong, expecting to graduate in 2028. With a background in Statistics, Computer Science, and Philosophy, I focus on understanding the capability boundaries and safety concerns of advanced AI systems from a mechanistic perspective. My research blends analytical theory with empirical practice, which helps me develop well-founded insights. My long-term mission is to reduce p(doom), even if only by a tiny bit.

Curious about my academic and professional journey? Explore my full bio to learn more.

What I’m Passionate About

  • Research. I enjoy exploring the unknown in science. Currently, I focus on leveraging mathematical and analytical tools to gain deeper insights into Large Language Models (LLMs). I’m also interested in data-driven approaches to fine-tune LLMs for better alignment.
  • Reading & Thinking. I spend a significant portion of my leisure time immersed in books, obsessing over both academic non-fiction and sci-fi.
  • Productivity Tools. I have open-sourced several projects on GitHub, mainly focusing on productivity tools that aid my research and workflow.
  • Writing for Learning. Publishing articles on this site and Rednote is a great way to express my thoughts and refine my ideas through feedback.

Works I’m Proud Of

Faro Chat Models (2024)

A series of fine-tuned LLMs designed for long-context modeling. I curated a large, diverse set of instruction-tuning samples and optimized existing base models using LoRA training and model merging. As a result, these models exhibit significantly improved stability and accuracy in long-text understanding.
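For a rough sketch of the LoRA-then-merge recipe, here is a minimal example using the Hugging Face peft library; the model name and hyperparameters are placeholders for illustration, not Faro's actual settings:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model and hyperparameters, for illustration only.
base = AutoModelForCausalLM.from_pretrained("some-base-model")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)

# ... fine-tune `model` on the curated instruction-tuning samples ...

# Fold the learned low-rank updates back into the base weights, producing a
# standalone checkpoint that can then be merged with other fine-tunes.
merged = model.merge_and_unload()
merged.save_pretrained("faro-merged")
```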

Safety Residual Space (2025)

An analytical tool for understanding what LLMs learn from safety fine-tuning. It measures activation shifts during training and extracts the most meaningful feature directions—those predictive of LLM behaviors like refusal. Check out the paper for details.
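For intuition, here is a toy PyTorch sketch of the core idea: collect hidden states for the same prompts before and after safety fine-tuning, then take the top singular directions of the shift. The names are placeholders, and the paper's actual pipeline is more involved:

```python
import torch

def shift_directions(acts_base: torch.Tensor,
                     acts_tuned: torch.Tensor,
                     k: int = 5) -> torch.Tensor:
    """acts_*: (n_prompts, d_model) hidden states for the same prompts,
    from the base model and the safety-fine-tuned model respectively."""
    delta = acts_tuned - acts_base      # per-prompt activation shift
    delta = delta - delta.mean(dim=0)   # center before the SVD
    # The top-k right singular vectors span the most prominent shift directions.
    _, _, Vh = torch.linalg.svd(delta, full_matrices=False)
    return Vh[:k]                       # (k, d_model)
```

Directions predictive of behaviors like refusal can then be singled out by probing, e.g. correlating each prompt's projection onto a direction with refusal labels.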

CPT Scaling Law (2024)

Continual Pre-Training (CPT) refers to pre-training a model further from an existing checkpoint. Because CPT data distributions often differ from the original pre-training data (e.g., a different language), it warrants a dedicated study. We propose a scaling law that estimates the optimal compute (FLOPs) required to reach a given loss under CPT.
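To give a sense of what such a law looks like, below is a generic Chinchilla-style power law in compute; this illustrates the general shape only, not the paper's fitted equation or constants:

```latex
% Generic form for illustration; the paper's functional form differs.
L(C) = E + \frac{A}{C^{\alpha}}
\qquad\Longrightarrow\qquad
C^{*}(L_{\text{target}}) = \left(\frac{A}{L_{\text{target}} - E}\right)^{1/\alpha}
```

Here $C$ is training compute in FLOPs, $E$ the irreducible loss, and $A$, $\alpha$ fitted constants; inverting the law gives the compute needed to hit a target loss.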

moffee (2024)

An open-source slide maker that transforms Markdown documents into clean, professional slide decks. moffee handles the styling and layout, allowing you to focus entirely on the content.