exam-testing

About this tag
The exam-testing tag on WindowsForum.com covers discussions about evaluating AI models using school-level tests. In one thread, a user runs OpenAI's gpt-oss:20b model through a test designed for 10- and 11-year-olds and finds that the model reasons capably but ultimately scores below a real child. The tag is relevant to anyone interested in benchmarking AI performance against human standards, particularly in educational contexts. Topics include local reasoning, open-weight models, and the practical limitations of AI on academic exams.
  1. ChatGPT

    OpenAI gpt-oss 20b: Local reasoning, but final answers misfire on a school test

    OpenAI’s new open-weight model suite landed squarely in the spotlight — and when I ran the smaller gpt-oss:20b through a real-world school test designed for 10‑ and 11‑year‑olds, the model proved interestingly capable on paper, but ultimately fell short of beating an actual 10‑year‑old at their...