
Enhancing RAG-Based MCQ Generation for Java Programming Education: A Modular Evaluation of Chunking, Retrieval and LLM Performance

  • Erick Olibo

Student thesis: Bachelor

Abstract

Retrieval-Augmented Generation (RAG) systems offer a promising approach for automating educational content creation, but challenges remain in optimizing retrieval configurations and ensuring instructional quality in outputs such as multiple-choice questions (MCQs). This thesis investigates how retrieval-stage design choices and large language model (LLM) selection affect the semantic precision and pedagogical alignment of MCQ generation in Java programming education. A modular two-phase RAG architecture was developed and evaluated. Phase 1 tested 432 retrieval configurations, varying chunking strategies, query processing, retrieval backends, and reranking methods. Configurations were scored using four reference-free metrics and ranked via a weighted composite scoring framework. Phase 2 used the best-performing retrieval setup to generate and evaluate 240 MCQs with five instruction-tuned LLMs under standardized conditions. Generated MCQs were scored using a rubric-based framework implemented via GPT-4o, assessing clarity, relevance, distractor quality, and cognitive alignment based on Bloom's taxonomy. Results show that sliding-window chunking and hybrid retrieval yield superior semantic performance. Claude 3 Opus and GPT-4o achieved the highest rubric scores, with 81% and 75% of their MCQs rated as Excellent, respectively. Qwen 2.5-7B followed at 67%, outperforming Mistral 7B (40%) and LLaMA 3.1-8B (31%) and emerging as a competitive open-source alternative. The use of a standardized prompt schema and grounded retrieval contributed to cognitive diversity in the generated questions, including coverage of higher-order Bloom levels. The system offers a reproducible framework for scalable MCQ generation and retrieval evaluation, with implications for AI-assisted assessment in structured learning domains.
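Two Phase 1 ideas from the abstract, sliding-window chunking and ranking configurations by a weighted composite score, can be illustrated with a short sketch. The abstract does not give the window size, overlap, the names of the four reference-free metrics, or their weights, so everything below is a hypothetical illustration: the functions `sliding_window_chunks`, `composite_score`, and `rank_configs`, the metric keys `m1` to `m4`, and the example weights are assumptions, not the thesis's implementation.

```python
from typing import Dict, List, Tuple


def sliding_window_chunks(tokens: List[str], window: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping chunks of `window` tokens, advancing by `stride`.

    Consecutive chunks share (window - stride) tokens when stride < window.
    Window and stride values are hypothetical; the thesis's settings are not stated here.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # last window reaches the end of the text
            break
        start += stride
    return chunks


def composite_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-configuration metric scores (hypothetical metric names)."""
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


def rank_configs(
    results: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (config_name, composite_score) pairs, best first."""
    scored = [(cfg, composite_score(m, weights)) for cfg, m in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(sliding_window_chunks("a b c d e f g".split(), window=4, stride=2))
    weights = {"m1": 0.4, "m2": 0.3, "m3": 0.2, "m4": 0.1}  # hypothetical weights
    results = {
        "A": {"m1": 0.8, "m2": 0.8, "m3": 0.8, "m4": 0.8},
        "B": {"m1": 0.9, "m2": 0.5, "m3": 0.6, "m4": 0.7},
    }
    print(rank_configs(results, weights))
```

Normalizing by the sum of the weights keeps the composite score on the same scale as the individual metrics even if the chosen weights do not add up to exactly 1.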
Date of Award: June 2025
Original language: English
Supervisors: Charlotte Sennersten & Craig Lindley

Educational program

  • Bachelor programme in Computer Software Development

University credits

  • 15 HE credits

Swedish Standard Keywords

  • Computer Systems (20206)
