τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan
In submission, NeurIPS 2024
paper |
code
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Anderson, Arjun Guha
Conference on Language Modeling (COLM) 2024
paper |
code
Type Prediction With Program Decomposition and Fill-in-the-Type Training
Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen
arXiv
paper |
code
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao
Neural Information Processing Systems (NeurIPS) 2023
paper |
code