Boxuan Shan
Cupertino, California • boxuan.shan@gmail.com • 857 413 1628 • github.com/bxshan
Sophomore student at the Harker School with a focus on machine learning, competitive programming, and mathematics. Possesses significant research experience applying machine learning to real-world problems. Proficient in Python, C++, and Java, complemented by professional experience in investment research.
EDUCATION
The Harker School
Sep. 2024 - present
Freshman GPA 4.23 (weighted)
Sophomore GPA 4.38 (weighted) as of Jan 4, 2026
Advanced curriculum:
AP Calculus BC (5), AP Physics C: Mechanics (5)
2024
AP Computer Science A (5), AP Chinese (5)
2025
AP Chemistry, AP Microeconomics, AP Macroeconomics, AP European History
2026 (expected)
Stanford Pre-Collegiate University-Level Online Math Courses
XM521 Multivariable Differential Calculus
Fall 2025
99.94% (A+), Final Assessment Grade Pending
XM511 Linear Algebra
Spring 2026
USACO Courses w/ X-Camp (602H)
Topics include DP, graph and tree manipulation, DSU, MST, combinatorics, etc.
Jan. 2025 - present
PROFESSIONAL EXPERIENCE
Next Capital — Operations / Investment Intern ➤
Jul. – Aug. 2025
Conducted research and wrote reports on targeted areas of potential investment
Research was used as a reference for 2 promotional videos and 3 articles to expand Next Capital’s visibility
Informed decisions on emergent investment opportunities
Supported incubated projects through market research and field analysis
RESEARCH
Tracing Institutional Bias Transfer from Wikipedia to Large Language Models ➤
Oct. 2025 - present
Independent Research
Abstract
Large Language Models (LLMs) rely heavily on Wikipedia as part of their training data, yet Wikipedia contains socioeconomic biases in how it describes institutions such as high schools. This project investigates whether these biases—in article length, descriptive language, and geographic associations—transfer to LLMs even when entity names and locations are removed (“blinded”).
By training three GPT-2 models from scratch on controlled datasets (original Wikipedia, entity-blinded Wikipedia, and entity-blinded Wikipedia with high school articles added), this research evaluates whether entity blinding reduces institutional bias and how bias is reintroduced, with implications for safer AI development.
Developing a Machine Learning Algorithm for Wikipedia Vandalism Detection ➤
Summer 2024
Conducted with mentorship from Dr. Françeska Xhakaj of Carnegie Mellon University, through Pioneer Academics
Abstract
Wikipedia is the largest online encyclopedia and relies on volunteer contributors, making it vulnerable to malicious or biased edits (“vandalism”). This project applies machine learning to improve vandalism detection at scale.
Using the English Wikipedia PAN-WVC-10 dataset (annotated via Amazon Mechanical Turk), a linear regression model achieved 80% accuracy, 76% precision, 89% recall, and a 13% false positive rate, demonstrating the feasibility of scalable automated detection methods.
ACHIEVEMENTS
USACO Gold Division
2024 – 2025 Season
SSAT Perfect Score
November 2022
SKILLS & INTERESTS
Python (USACO + AI/ML), C++ (USACO), Java (AP CS A w/ Data Structures)
Native speaker of Chinese and English
Water Polo Junior Varsity Starter