AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments

To prepare for oral arguments, attorneys practice in moot courts: simulated hearings where experienced partners play the role of judges. We examine whether AI can be a suitable moot court practice partner by evaluating its justice-specific questioning with a two-layer evaluation framework for realism and pedagogical usefulness. We find that models achieve surprising realism but exhibit low question diversity and sycophancy. Our evaluation framework accurately captures the abilities and blind spots of current AI-assisted moot courts, which naïve evaluation approaches would not reveal.

Kylie Zhang*, Nimra Nadeem*, Lucia Zheng, Dominik Stammbach, Peter Henderson

Princeton University  ·  Stanford University  ·  *Equal contribution  ·  CSLAW 2026

The Pitch

In oral arguments, appellate justices probe attorneys with targeted questions about their legal argument: its merits, its shortcomings, and its downstream implications. Some attorneys have the resources to practice in moot courts, mock hearings where colleagues (sometimes former judges) play the justices and question them. Others, like Dale Ho in the clip below, rely on more primitive methods: flashcards and solo practice.

In the documentary The Fight, ACLU attorney (now federal judge) Dale Ho prepares for oral argument with flashcards in front of a mirror.

As AI models become better at legal tasks, we envision a world where, instead of offloading attorney work and cognitive decision-making to AI, we use it as a tool to upskill attorneys. For oral arguments, AI could support attorneys who currently train solo by offering pedagogically rigorous and accurate simulations of justice questioning. For those who do not train solo, AI can augment a team: as attorney Neal Katyal explains in a recent TED Talk about his preparation for Learning Resources v. Trump, it can act as a "relentless" legal coach who "predict[s] the contours" of the argument to be faced.

First, though, we need to know whether current models are good at simulating justice questioning, which brings us to our central question:

How good are current AI systems at simulating justice-specific questioning?


We build our evaluation suite by asking, “What are the characteristics of a good moot court question?” Broadly, moot court questions ought to be realistic to the hearing format and pedagogically useful (i.e., challenging, logically sound, and grounded in doctrine). We therefore structure our two-tiered evaluation framework along these two axes: realism and pedagogical usefulness.

However, evaluation is not straightforward: oral argument simulation is complex, open-ended, and sensitive to (very long) contexts. Under our two-tiered framework, we therefore construct and evaluate 20 metrics that assess realism and pedagogical usefulness, and we use them to test prompt-based and agentic simulators across 5 frontier models. For data, we use Supreme Court oral argument transcripts because they are widely available, high quality, and a canonical representation of the task. The results are promising but also reveal critical blind spots.

Start with SCOTUS transcripts – 62 cases and 168 argument sections from the 2024 term. Each includes the case facts, the legal question, and the multi-turn dialogue context.

Simulate via 8 simulator variants – 5 LLMs × 3 prompting strategies (Default, Profile, Moot Court), plus 3 agentic simulators with search and profile tools.

Evaluate with 20 metrics – adversarial tests, human preferences, issue coverage, question diversity, fallacy detection, and tone analysis.

Key Results

Which Question Is Better?

Read the oral argument context, then pick which follow-up question you think is more effective. One was asked by a real Supreme Court justice. The other was generated by an AI simulator. After you choose, we'll reveal which was which.


See It in Action

Below are real examples from our evaluation. Select a scenario and model to see how different simulators handle the same oral argument context, adversarial provocations, and logical fallacies.

Simulator Design

Prompt-based Simulators

Five models (Llama-3.3-70B, Qwen3-32B, Gemini-2.5-Pro, GPT-4o, gpt-oss-120b) with three prompting strategies varying the context provided about the justice being simulated.


SCOTUS_DEFAULT – Minimal context. The model is told to act as a named Supreme Court justice.

Example

You are Supreme Court Justice Sonia Sotomayor. You are currently in a Supreme Court oral argument with the following case. Your remark should flow naturally within the context you've been given and should be consistent with your style of statutory interpretation and known politics. What matters most is that you fully flesh out an advocate's argument.

SCOTUS_PROFILE – Adds a hand-crafted profile of the justice's judicial philosophy and political leanings.

Example

You are Supreme Court Justice Amy Coney Barrett.

Justice Barrett is a constitutional originalist and a member of the conservative bloc of the Court. She believes (1) that "the meaning of the constitutional text is fixed at the time of its ratification"; and (2) that the "historical meaning of the text" is legally significant and generally "authoritative." Under this view, the "original public meaning" of a constitutional provision is "the law." Judge Barrett could be viewed as sometimes embracing a more pragmatic approach to textualism.

You are currently in a Supreme Court oral argument with the following case. Your remark should flow naturally within the context you've been given and should be consistent with your style of statutory interpretation and known politics. What matters most is that you fully flesh out an advocate's argument.

MOOT_COURT – Frames the simulation as the justice judging the finals of the National Moot Court Competition, and explicitly instructs the model to challenge the students and call out logical errors.

Example

You are Supreme Court Justice Clarence Thomas judging the finals of the National Moot Court Competition.

Justice Thomas is a textualist who makes up part of the Court's conservative bloc. He takes a "liberal originalist" approach to civil rights issues, particularly affirmative action, and a "conservative originalist" approach to civil liberties issues, such as abortion. Liberal originalism embraces the broad principles of the Declaration of Independence, such as the natural law ideal of equality; conservative originalism relies on the Framers' specific language and intent.

Top 3Ls from the best law schools are currently arguing before you over the following case. These are some of the best students and you want to challenge them to do better. What matters most is that you humble them by asking very difficult questions. You want to call out even the smallest logical errors now so that they can succeed in the future.
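The three strategies differ only in how much framing surrounds the same case context. A minimal sketch of assembling them, reusing the wording of the example prompts above, might look like the following. The strategy keys, the `build_prompt` helper, and the way the case context is appended are hypothetical choices for illustration, not the authors' code.

```python
"""Sketch: assemble a simulator prompt for each prompting strategy."""

# Frame wording copied from the example prompts above.
# Strategy keys and the case-context layout are assumptions.
ARGUMENT_FRAME = (
    "You are currently in a Supreme Court oral argument with the following case. "
    "Your remark should flow naturally within the context you've been given and "
    "should be consistent with your style of statutory interpretation and known "
    "politics. What matters most is that you fully flesh out an advocate's argument."
)

MOOT_FRAME = (
    "Top 3Ls from the best law schools are currently arguing before you over the "
    "following case. These are some of the best students and you want to challenge "
    "them to do better. What matters most is that you humble them by asking very "
    "difficult questions. You want to call out even the smallest logical errors now "
    "so that they can succeed in the future."
)


def build_prompt(strategy: str, justice: str, profile: str, case_context: str) -> str:
    """Return a prompt for 'default', 'profile', or 'moot_court' (hypothetical keys)."""
    if strategy == "default":
        # SCOTUS_DEFAULT: name the justice, no profile.
        parts = [f"You are Supreme Court Justice {justice}. {ARGUMENT_FRAME}"]
    elif strategy == "profile":
        # SCOTUS_PROFILE: hand-written profile between the identity line and the frame.
        parts = [f"You are Supreme Court Justice {justice}.", profile, ARGUMENT_FRAME]
    elif strategy == "moot_court":
        # MOOT_COURT: recast the justice as a demanding competition judge.
        parts = [
            f"You are Supreme Court Justice {justice} judging the finals "
            "of the National Moot Court Competition.",
            profile,
            MOOT_FRAME,
        ]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumed layout: case facts, legal question, and transcript so far go last.
    parts.append("CASE CONTEXT:\n" + case_context)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("default", "Sonia Sotomayor", "", "Facts: ..."))
```

Keeping the frames as constants makes the comparison clean: across strategies, only the identity line, the optional profile, and the frame change.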

Agentic Simulators

Three models (GPT-4o, gpt-oss-120b, Gemini-2.5-Pro) enhanced with tool access. Before generating each question, the agent can search case materials, look up justice profiles, and reason step by step, taking up to 10 actions per turn.


THINK – Reason about the oral argument history and plan the next question.

CLOSED_WORLD_SEARCH – Search case docket files from supremecourt.gov (2017–2024).

JUSTICE_PROFILE – Look up voting patterns and political affiliations from the Supreme Court Database.

PROVIDE_FINAL_RESPONSE – Output the simulated justice remark.
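The tool list above implies a bounded act-observe loop. The sketch below shows one way to implement it under stated assumptions. The `policy` callable stands in for the model, the tool functions are stubs, and both interfaces are hypothetical. The text only specifies the four action names and the 10-action budget, so returning `None` when the budget runs out is an illustrative choice.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_ACTIONS = 10  # per-turn budget stated in the text above

# Hypothetical interfaces: a policy (e.g. an LLM call) reads the scratchpad and
# returns (action_name, argument); each tool maps an argument string to an observation.
Policy = Callable[[list], tuple]
Tool = Callable[[str], str]


def run_agent_turn(policy: Policy, tools: dict[str, Tool]) -> Optional[str]:
    """Run one simulated-justice turn; return the final remark, or None if the budget runs out."""
    scratchpad: list[str] = []
    for _ in range(MAX_ACTIONS):
        action, arg = policy(scratchpad)
        if action == "PROVIDE_FINAL_RESPONSE":
            return arg  # the simulated justice remark
        if action == "THINK":
            observation = "(thought recorded)"  # THINK has no external effect
        elif action in tools:
            observation = tools[action](arg)  # e.g. CLOSED_WORLD_SEARCH, JUSTICE_PROFILE
        else:
            observation = f"error: unknown action {action!r}"
        scratchpad.append(f"{action}({arg!r}) -> {observation}")
    return None  # behavior on an exhausted budget is not specified in the text


if __name__ == "__main__":
    # Stub tools and a scripted policy standing in for the model.
    tools = {
        "CLOSED_WORLD_SEARCH": lambda q: f"[docket excerpts matching {q!r}]",
        "JUSTICE_PROFILE": lambda name: f"[voting record for {name}]",
    }
    script = iter([
        ("THINK", "Petitioner has not addressed standing."),
        ("CLOSED_WORLD_SEARCH", "standing"),
        ("PROVIDE_FINAL_RESPONSE", "Counsel, how does your client have standing here?"),
    ])
    print(run_agent_turn(lambda pad: next(script), tools))
```

Each observation is appended to the scratchpad the policy sees on the next step. This is how search results and profile lookups can shape the final question.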

Evaluation Metrics

Our two-layer framework uses 20 metrics across realism and pedagogical usefulness. Each metric targets a specific quality that effective oral argument simulation should exhibit.

Layer 1 – Realism: Does it sound like something a justice would say?
Adversarial Tests – Decorum · Rage-Bait · Switching Sides
Human Evaluation – Blind Pairwise Arena

Layer 2 – Pedagogical Usefulness: Does it actually help advocates prepare?
Issue Coverage – Broad · Narrow
Question Diversity – LegalBench · Stetson · MetaCog
Fallacy Detection – 10 logical fallacy types
Tone – Competitive to Cooperative
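The Question Diversity metric lists three schemes (LegalBench, Stetson, MetaCog), but this page does not give a formula. Assume each generated question has been labeled with one question type from such a scheme. One common way to score diversity is then normalized Shannon entropy: 0 when every question has the same type, 1 when types are used uniformly. This is a swapped-in illustration, not necessarily the paper's metric, and the label names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels: list[str], num_categories: int) -> float:
    """Shannon entropy of the label distribution, scaled to [0, 1] by log(num_categories)."""
    if num_categories < 2 or not labels:
        return 0.0
    n = len(labels)
    counts = Counter(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_categories)


if __name__ == "__main__":
    # Hypothetical question-type labels for four simulated questions.
    questions = ["clarification", "hypothetical", "clarification", "precedent"]
    print(normalized_entropy(questions, num_categories=4))
```

Dividing by the log of the taxonomy size, not the number of observed types, penalizes a simulator that never uses some question types at all.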


Data & Annotations

Our test set draws from U.S. Supreme Court oral argument transcripts accessed via the Oyez API, focusing on cases from the 2024 term.

Human annotations were collected through a custom Gradio-based arena interface where law students and graduate researchers compared simulated and real justice responses in blind pairwise matchups. The full annotation dataset — including preference judgments and quality assessments across 9 metric dimensions — is available on Hugging Face.
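Blind pairwise matchups like these are commonly summarized as a per-simulator win rate against the real justice. The sketch below assumes each judgment is recorded as a `(simulator, outcome)` pair with outcome `"sim"`, `"real"`, or `"tie"`, and counts a tie as half a win. Both the record format and the tie rule are assumptions, not the schema of the released dataset.

```python
from collections import defaultdict

# Assumed scoring: a simulator earns 1 when annotators prefer its question,
# 0.5 for a tie, and 0 when the real justice's question is preferred.
SCORES = {"sim": 1.0, "tie": 0.5, "real": 0.0}


def win_rates(matchups):
    """Map each simulator name to its mean score over (simulator, outcome) pairs."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for simulator, outcome in matchups:
        if outcome not in SCORES:
            raise ValueError(f"unexpected outcome: {outcome!r}")
        totals[simulator] += SCORES[outcome]
        counts[simulator] += 1
    return {sim: totals[sim] / counts[sim] for sim in counts}


if __name__ == "__main__":
    # Toy judgments for illustration only, not results from the paper.
    judgments = [("gpt-4o", "sim"), ("gpt-4o", "real"), ("gpt-4o", "tie"), ("qwen3-32b", "real")]
    print(win_rates(judgments))  # {'gpt-4o': 0.5, 'qwen3-32b': 0.0}
```

A rate near 0.5 means annotators could not reliably prefer the real justice's question over the simulated one.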

Citation

@inproceedings{zhang2026ai,
  title={AI-Assisted Moot Courts: Simulating Justice-Specific
         Questioning in Oral Arguments},
  author={Zhang, Kylie and Nadeem, Nimra and Zheng, Lucia
          and Stammbach, Dominik and Henderson, Peter},
  booktitle={Symposium on Computer Science and Law (CSLAW)},
  year={2026}
}

Acknowledgements

The authors thank Dan Bateyko, Zirui Cheng, Lucy He, Michel Liao, Patty Liu, Max Gonzalez Saez-Diez, and Zeyu Shen for their contributions. This work was funded by a Princeton Language+Intelligence grant and a Schmidt Sciences Humanities and AI Virtual Institute grant.