AI Power: Natural Language to Structured Data Query
Tuesday, February 10, 2026

The Grind
Manually translating complex natural language requests into structured data models (like Pydantic objects for API calls or database queries) is a tiresome dance. It's easy to miss edge cases, introduce typos, and spend hours debugging serialization errors instead of focusing on the actual data insights.
The AI Workflow
Unlock the real power of prompt engineering by leveraging structured-output support in libraries like `litellm` to get responses that validate against a Pydantic model. Your high-level intent transforms directly into machine-ready, structured data, skipping the messy parsing.
```python
from enum import Enum

import litellm  # LiteLLM accepts a Pydantic model as response_format for structured output
from pydantic import BaseModel, Field


# Define your desired structured data shape with Pydantic
class AggregateFunction(str, Enum):
    SUM = "sum"
    MEAN = "mean"
    COUNT = "count"
    MAX = "max"
    MIN = "min"


class QueryParameter(BaseModel):
    column: str = Field(description="The column name to apply the operation on.")
    value: str | int | float | None = Field(default=None, description="Value for filtering or comparison.")
    operation: str | None = Field(default=None, description="Operation like 'equals', 'greater_than', 'contains'.")
    aggregate: AggregateFunction | None = Field(default=None, description="Aggregation function to apply.")


class DataQuery(BaseModel):
    select_columns: list[str] = Field(default_factory=list, description="List of columns to select.")
    filters: list[QueryParameter] = Field(default_factory=list, description="List of filter conditions.")
    aggregations: list[QueryParameter] = Field(default_factory=list, description="List of aggregation operations.")
    group_by_columns: list[str] = Field(default_factory=list, description="Columns to group by for aggregations.")


# The human's high-level request
user_request = (
    "I need the average 'sales' and total 'profit' for 'region'='East' and "
    "'product_category' containing 'Electronics', grouped by 'region' and 'month'."
)

# 🧠 The AI-augmented workflow: direct Pydantic model generation
response = litellm.completion(
    model="ollama/llama3.1",  # A capable local model; any provider with structured-output support works
    messages=[
        {"role": "system", "content": "You are an expert data query generator. Based on the user's request, produce a structured data query adhering to the provided Pydantic schema for analysis with Polars/DuckDB. Ensure all fields are correctly populated and types match perfectly. Be concise."},
        {"role": "user", "content": f"Generate a DataQuery object for: '{user_request}'"},
    ],
    response_format=DataQuery,  # LiteLLM converts the Pydantic model into a JSON schema for the provider
    temperature=0.1,  # Keep it low for structured tasks
)

# The model returns JSON conforming to the schema; validate it into a typed object
query = DataQuery.model_validate_json(response.choices[0].message.content)
print(query.model_dump_json(indent=2))
```
The Lever
This shifts hours of manual parsing and debugging into seconds. You get a validated, ready-to-use Pydantic object, dramatically cutting development time and virtually eliminating common serialization errors, freeing you to focus on insightful analysis instead of tedious data wrangling.