Muhammad Adnan Rizqullah
King Abdulaziz University
Faculty of Computing and Information Technology
Advisor: Dr. Emad Yosif Albassam
Test-Driven Prompting (TDP) shows promise but has critical limitations:
Limitations: Single language (Python only), outdated LLMs, low test coverage
Limitations: Single language focus, limited model diversity, no difficulty analysis
Mathews et al. (Waterloo): First to show that including tests in the prompt improves accuracy (+7 to 18%)
Piya et al. (Texas): TDD workflow with LLMs
Critical Gaps:
Tests guide implementation and ensure specification adherence
Test cases as executable specifications in prompts
Does specification clarity translate to performance gains?
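A minimal sketch of test cases used as executable specifications. It assumes assert-style Python tests: the same tests are embedded in the prompt and later used to check the generated code. The names `build_tdp_prompt` and `passes_tests` and the prompt wording are hypothetical. They are not the exact template used in this work. A real evaluation harness should also sandbox and time-limit untrusted code rather than calling `exec` directly.

```python
from typing import List


def build_tdp_prompt(description: str, tests: List[str]) -> str:
    """Pair a problem description with its test cases (hypothetical template)."""
    test_block = "\n".join(tests)
    return (
        f"{description}\n\n"
        "Write a Python function that passes all of the following tests:\n"
        f"{test_block}\n"
    )


def passes_tests(code: str, tests: List[str]) -> bool:
    """Run candidate code, then each assert; any exception counts as a failure."""
    namespace: dict = {}
    try:
        exec(code, namespace)      # define the generated function(s)
        for test in tests:
            exec(test, namespace)  # AssertionError propagates on a failed check
    except Exception:
        return False
    return True


if __name__ == "__main__":
    tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
    prompt = build_tdp_prompt("Implement add(a, b) that returns the sum.", tests)
    candidate = "def add(a, b):\n    return a + b"  # stand-in for an LLM response
    print(passes_tests(candidate, tests))
```

Reusing one test list for both the prompt and the check keeps the specification given to the model identical to the one it is graded against.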
What is the performance of test-driven code generation across programming languages that differ in popularity and typing discipline?
Scope: Python, JavaScript, C++, TypeScript, PHP, Ruby, Go, C#
What is the performance of test-driven code generation across LLMs with differing characteristics?
Scope: Closed/open-source, varying sizes, general vs specialized
What is the relationship between programming problem difficulty and LLM performance?
What is the relationship between test suite completeness and LLM performance?
How can a decision framework guide developers in selecting appropriate LLMs for platform-specific development?