Instructor: Harnessing LLMs for Structured Data

Instructor is a Python library for extracting structured data, such as JSON, from Large Language Models (LLMs). It works with OpenAI models such as GPT-3.5 and GPT-4 as well as open-source models such as Mistral, including models served through providers like Anyscale.

Instructor uses Pydantic for data validation: you declare the output you want as a Pydantic model, and the library returns a validated instance of that model, so there is less parsing and checking code to write by hand.

The library supports several structuring modes, from Function Calling to JSON Schema, and ports of Instructor are available for other programming languages, including TypeScript, Elixir, and PHP.

This lets Instructor fit into existing development environments with little change to the surrounding code.
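To make the structuring modes more concrete, the sketch below shows the JSON Schema that Pydantic derives from a model's type hints. It uses only Pydantic (v2 is assumed) and a hypothetical UserInfo model; how Instructor passes such a schema to a given provider depends on the mode and is not shown here.

```python
from pydantic import BaseModel


# Hypothetical output structure, used only for illustration
class UserInfo(BaseModel):
    name: str
    age: int


# Pydantic builds a JSON Schema from the type hints; a schema of this kind
# is what a function-calling or JSON-schema style request describes.
schema = UserInfo.model_json_schema()

print(schema["required"])
#> ['name', 'age']
print(schema["properties"]["age"]["type"])
#> integer
```

Because the schema is generated from the model, the type hints stay the single source of truth for both the request and the validation of the response.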

With Pydantic at its core, Instructor lets you define custom validation rules and error messages, so malformed or implausible model output is caught before it reaches the rest of the application. It is particularly useful for developers who already use Pydantic’s type hints and want to maintain high standards of data integrity and application reliability.
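As a minimal sketch of a custom validation rule with its own error message, the snippet below uses plain Pydantic (v2 is assumed) and does not call an LLM. The age range, the validator name age_must_be_plausible, and the helper validation_messages are hypothetical choices made for this illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class UserInfo(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        # Hypothetical rule for illustration: reject implausible ages
        if not 0 <= value <= 150:
            raise ValueError("age must be between 0 and 150")
        return value


def validation_messages(raw_json: str) -> list[str]:
    """Return the validation error messages for a JSON payload, or [] if it is valid."""
    try:
        UserInfo.model_validate_json(raw_json)
    except ValidationError as exc:
        return [error["msg"] for error in exc.errors()]
    return []


# A well-formed payload passes; an implausible age yields the custom message
print(validation_messages('{"name": "John Doe", "age": 30}'))
print(validation_messages('{"name": "John Doe", "age": 300}'))
```

The same model can then be passed as a response model, so a rule like this would apply to whatever the LLM returns.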

A simple example shows how easily Instructor can be used in place of the standard OpenAI Python client:

import instructor
from pydantic import BaseModel
from openai import OpenAI


# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)
#> John Doe
print(user_info.age)
#> 30