MVP Proposal

Automated Short-Answer Grading Service

A production-ready ML service that grades student responses to science questions by comparing them against reference answers and returns the grade over an HTTP API.

Core Output

3-Way Classification

Correct, Partially Correct, Incorrect

Throughput Target

10K/day

Scalable to 100K+ submissions per day

Latency Goal

<1s P95

Real-time processing target
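
To make the throughput and latency targets above checkable, here is a minimal sketch that converts a daily volume to an average request rate and tests a batch of measured latencies against the P95 goal. The function names are illustrative assumptions, and the nearest-rank percentile is one of several common definitions; a monitoring system may compute P95 slightly differently.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(submissions_per_day: int) -> float:
    """Average requests per second implied by a daily submission volume."""
    return submissions_per_day / SECONDS_PER_DAY


def p95(latencies_s: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_goal(latencies_s: list, limit_s: float = 1.0) -> bool:
    """True when the P95 latency is strictly below the limit (default 1 s)."""
    return p95(latencies_s) < limit_s
```

At 10K submissions per day the average rate is roughly 0.12 requests per second, and about 1.2 at 100K. Real traffic is bursty, so peak load around assignment deadlines would need to be sized separately.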

Technology Stack
Production-grade Python ecosystem on AWS infrastructure

API Layer

FastAPI, SQLAlchemy, Pydantic, Pytest

ML Stack

PyTorch, Transformers, scikit-learn, Pandas

Infrastructure

AWS ECS, RDS PostgreSQL, ElastiCache, CloudWatch

SciEntsBank Dataset
Scientific Entailment Bank for short-answer assessment (Dzikovska et al., 2013)

Primary Focus

Unseen Answers (UA)

Novel student phrasings for known question and reference-answer pairs

Input Schema

Question + Reference + Student

Three-text comparison task

Classification

3-Way Labels

Correct / Partially Correct / Incorrect
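
The three-text input described above could be modelled roughly as follows. This sketch uses a standard-library dataclass so that it runs without dependencies; in the proposed stack the same shape would more likely be a Pydantic model. The field names are assumptions, not a finalized schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingRequest:
    """One grading request: the three texts the model compares."""

    question: str
    reference_answer: str
    student_answer: str

    def __post_init__(self) -> None:
        # Reject missing or whitespace-only fields before they reach the model.
        for name in ("question", "reference_answer", "student_answer"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
```

Validating at construction time keeps malformed submissions out of both the model and the persisted request log.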

Label Definitions (3-way classification)

Correct: Semantically equivalent to the reference answer
Partially Correct: Contains some correct elements but is incomplete
Incorrect: Wrong, irrelevant, or contradictory response
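
The three labels defined above can be pinned down in code as a closed set, so that the API, the model wrapper, and the database all agree on spelling. This is a sketch only: the string values and the `parse_grade` helper are assumptions for illustration.

```python
from enum import Enum


class Grade(str, Enum):
    """The three grades returned by the service."""

    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially_correct"
    INCORRECT = "incorrect"


def parse_grade(raw: str) -> Grade:
    """Normalize a label such as 'Partially Correct' to a Grade member.

    Raises ValueError for anything outside the three known labels.
    """
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    return Grade(key)
```

Subclassing `str` lets the enum members serialize to JSON as plain strings.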

Licensing & Attribution

SciEntsBank is available via Hugging Face (nkazi/SciEntsBank) under academic use terms. Original dataset: Dzikovska et al. (2013). For commercial deployment, a licensing agreement with the dataset authors or Cambridge University may be required.

Future Enhancements (Post-MVP)
Architecture designed to accommodate these features

Confidence Scores

Model prediction probability for each grade
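
One plausible way to produce per-grade probabilities is to apply a softmax to the classifier's three output logits. The sketch below assumes the model emits one raw score per label in the order shown; the label order and function names are illustrative. Softmax outputs are not necessarily well calibrated, so they may need calibration before being presented as confidence.

```python
import math

# Assumed output order of the classifier head (illustrative).
LABELS = ("correct", "partially_correct", "incorrect")


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grade_with_confidence(logits):
    """Return (predicted_label, {label: probability}) for three logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    scores = dict(zip(LABELS, softmax(logits)))
    best = max(scores, key=scores.get)
    return best, scores
```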

Justification

Explanation of why an answer was marked incorrect

Formative Feedback

Guidance on how to improve the answer

Multi-Domain Support

Expand beyond science to other subjects

Data-First Design

Every request and response is persisted for audit trails, model retraining, and analytics. Schema supports future metadata expansion.
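
As a rough illustration of the persistence described above, the sketch below logs each request and response pair together with a free-form metadata column. It uses the standard-library `sqlite3` module as a stand-in for RDS PostgreSQL and SQLAlchemy; the table and column names are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the log store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS grading_log (
            id            INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at    TEXT NOT NULL,
            request_json  TEXT NOT NULL,
            response_json TEXT NOT NULL,
            metadata_json TEXT NOT NULL DEFAULT '{}'
        )
        """
    )
    return conn


def log_exchange(conn: sqlite3.Connection, request: dict, response: dict,
                 metadata: Optional[dict] = None) -> int:
    """Persist one request/response pair and return its row id."""
    cur = conn.execute(
        "INSERT INTO grading_log "
        "(created_at, request_json, response_json, metadata_json) "
        "VALUES (?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            json.dumps(request),
            json.dumps(response),
            json.dumps(metadata or {}),
        ),
    )
    conn.commit()
    return cur.lastrowid
```

Keeping metadata in a JSON column is one way to leave room for new fields without a schema migration; in PostgreSQL a `JSONB` column could play the same role.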

Production-Ready Security

API key authentication with rate limiting, input validation, and comprehensive logging for compliance requirements.
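
A minimal sketch of the API key check and rate limiting mentioned above, using only the standard library. The fixed-window algorithm, the class and function names, and the in-memory storage are all assumptions for illustration; a deployed service would more likely keep counters in a shared store such as ElastiCache so that limits hold across instances.

```python
import hmac
import time


def is_valid_key(presented: str, known_keys) -> bool:
    """Check a presented API key against the known keys.

    hmac.compare_digest avoids leaking how many leading characters matched.
    """
    presented_b = presented.encode()
    return any(hmac.compare_digest(presented_b, k.encode()) for k in known_keys)


class FixedWindowLimiter:
    """Allow at most max_requests per key in each window of window_s seconds."""

    def __init__(self, max_requests: int, window_s: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._windows = {}  # api_key -> (window_start, request_count)

    def allow(self, api_key: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired, start a new one
        if count >= self.max_requests:
            self._windows[api_key] = (start, count)
            return False
        self._windows[api_key] = (start, count + 1)
        return True
```

The injectable `clock` parameter exists so the limiter can be tested without sleeping.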