Measuring Progress on Scalable Oversight for Large Language Models

Paper report