When most developers think about Python, they think about code being executed.
We write:
price = 100
and Python runs it.
We write:
class TradeSchema:
    symbol: str
    price: float
and Python creates a class.
Simple.
But here’s an interesting question.
What if we wanted to understand what the code is doing without actually running it?
What if we wanted to answer questions like:
- Which fields exist in a schema?
- Did a developer rename a field?
- Was a new attribute added?
- Is a schema missing a required column?
All without executing a single line of code.
This is exactly the problem Python’s Abstract Syntax Tree, or AST, helps solve.
The Problem: Reading Code Like a Human
Imagine you are reviewing a pull request.
Yesterday the schema looked like this:
class TradeSchema:
    symbol: str
    price: float
    volume: int
Today it looks like this:
class TradeSchema:
    symbol: str
    close_price: float
    volume: int
As humans, we immediately notice:
price
↓
close_price
We recognize that the field was renamed.
But computers don’t naturally “read” code like humans do.
To a computer, source code is just text.
class TradeSchema:
    symbol: str
    price: float
    volume: int
is nothing more than characters stored inside a file.
Before Python can execute that code, it must first understand its structure.
That’s where AST comes in.
The Translator Inside Python
Imagine you’re reading a book written in a language you don’t understand. Before you can understand the story, you need a translator. Python faces a similar challenge.
Before Python can execute code, it must translate it into a structure it can understand. That structure is called an Abstract Syntax Tree.
Think of AST as Python’s internal blueprint of your code.
Instead of seeing:
price = 100
Python sees something closer to:
Assignment
├── Variable: price
└── Value: 100
The code becomes a structured representation rather than raw text.
Why Is It Called a Tree?
The word “tree” comes from how information is organized.
Consider:
price = quantity * close
Python breaks this into smaller pieces.
Assignment
├── Variable: price
└── Multiplication
    ├── quantity
    └── close
Each piece becomes a node. Nodes connect to other nodes. Eventually the entire program becomes a tree-like structure. This is the AST.
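The same tree can be inspected with the standard-library ast module. The following is a minimal sketch, not part of the original walkthrough; the variable names assign and expr are only illustrative.

```python
import ast

tree = ast.parse("price = quantity * close")

assign = tree.body[0]               # the single statement in the module
print(type(assign).__name__)        # Assign
print(assign.targets[0].id)         # price

expr = assign.value                 # right-hand side of the assignment
print(type(expr).__name__)          # BinOp
print(type(expr.op).__name__)       # Mult
print(expr.left.id, expr.right.id)  # quantity close
```

The "Multiplication" node in the diagram corresponds to a BinOp node whose op is Mult, and the two leaves are Name nodes.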
The important thing to remember is that Python does not execute source text directly. CPython first parses the text into an AST, then compiles that AST into bytecode, and only then does execution happen.
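In CPython, these stages can be driven by hand with ast.parse and the built-ins compile and exec. This is a small sketch to make the order of operations visible; the filename string and the namespace dictionary are illustrative choices.

```python
import ast

source = "price = 100"

tree = ast.parse(source)                                 # source text -> AST
code = compile(tree, filename="<example>", mode="exec")  # AST -> bytecode (a code object)

namespace = {}
exec(code, namespace)                                    # bytecode -> execution
print(namespace["price"])  # 100
```

Nothing runs until the exec call. Everything before it is analysis of the code's structure.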
Seeing an AST for the First Time
Python exposes this functionality through the built-in ast module.
Suppose we have:
source = """
price = 100
"""
We can parse it and print the resulting tree (the indent argument requires Python 3.9 or later):
import ast
tree = ast.parse(source)
print(ast.dump(tree, indent=4))
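Output (the exact text varies by Python version; versions before 3.13 also print an empty type_ignores field):
Module(
    body=[
        Assign(
            targets=[
                Name(id='price', ctx=Store())],
            value=Constant(value=100))])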
At first glance this looks intimidating.
But it is simply Python describing what it sees.
Instead of:
price = 100
Python sees:
Assignment
Variable -> price
Value -> 100
The AST is just a more detailed version of that idea.
Why AST Matters for Data Engineering
Let’s return to our schema problem.
Suppose a developer writes:
class TradeSchema:
    symbol: str
    price: float
    volume: int
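If we parse this code with the ast module, we can inspect its structure without importing or executing anything.
This matters because importing a module runs its top-level code, which may be slow, may have side effects, or may fail outright in a CI environment.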
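To illustrate that parsing really does not run anything, here is a sketch with a made-up source string. The module name module_that_does_not_exist is hypothetical, and importing this file would fail. Parsing it still works, and nothing is printed by the parsed code.

```python
import ast

# Hypothetical file contents: importing this module would raise
# ModuleNotFoundError, and running it would print a message.
source = '''
import module_that_does_not_exist

print("side effect!")

class TradeSchema:
    symbol: str
    price: float
    volume: int
'''

tree = ast.parse(source)  # builds the tree only; nothing in `source` runs

class_names = [n.name for n in tree.body if isinstance(n, ast.ClassDef)]
print(class_names)  # ['TradeSchema']
```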
Imagine a CI/CD pipeline checking a pull request.
The pipeline can analyze the schema definition before deployment.
It can ask:
- Were fields removed?
- Were fields added?
- Were data types changed?
- Does this break existing consumers?
Because the AST exposes the structure of the code rather than its raw text, these checks become possible.
Building a Simple Schema Extractor
Suppose we parse:
class TradeSchema:
    symbol: str
    price: float
    volume: int
The AST represents the class as a ClassDef node.
Each annotated field in the class body is an AnnAssign node (an annotated assignment).
We can walk through them:
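import ast
source = """
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""
tree = ast.parse(source)
# Visit every node in the tree and pick out class definitions
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        print(node.name)
        for item in node.body:
            # Annotated fields such as `symbol: str` are AnnAssign nodes;
            # the Name check skips targets like `self.x: int`
            if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
                print(item.target.id)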
Output:
TradeSchema
symbol
price
volume
Without executing the file, we have extracted the schema fields.
This is the foundation of static code analysis.
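The same walk can also capture each field's type, which a check for changed data types would need. The following is a sketch under a few assumptions: extract_fields is a hypothetical helper name, it handles only simple `name: type` fields, and it relies on ast.unparse, which requires Python 3.9 or later.

```python
import ast

def extract_fields(source: str, class_name: str) -> dict[str, str]:
    """Return {field name: annotation text} for one class, without executing the source."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.target.id: ast.unparse(item.annotation)
                for item in node.body
                if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name)
            }
    raise ValueError(f"class {class_name!r} not found")

source = """
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

fields = extract_fields(source, "TradeSchema")
print(fields)  # {'symbol': 'str', 'price': 'float', 'volume': 'int'}
```

Storing the annotation as text keeps the comparison simple: two versions of a field have the same type if their annotation strings match.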
How Our Data Contract Framework Uses AST
In our project, developers define schemas like this:
@contract
class TradeSchema:
    symbol: str
    price: float
    volume: int
When a new version appears:
@contract
class TradeSchema:
    symbol: str
    close_price: float
    volume: int
we want to know what changed.
Using AST, we can extract both schema definitions.
Version 1:
symbol
price
volume
Version 2:
symbol
close_price
volume
Comparing them reveals:
BREAKING CHANGE
Removed:
price
Added:
close_price
No execution required.
No imports required.
Just source code analysis.
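One way such a comparison could look is sketched below. This is an illustration, not the framework's actual implementation. The helper contract_fields is a hypothetical name, only a bare `@contract` decorator is recognized, and treating any removed field as a breaking change is an assumption taken from the example above.

```python
import ast

def contract_fields(source: str) -> dict[str, set[str]]:
    """Map each @contract-decorated class name to its annotated field names."""
    schemas = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Matches a bare `@contract` only; `@pkg.contract` or
        # `@contract(...)` would need extra handling.
        is_contract = any(
            isinstance(d, ast.Name) and d.id == "contract"
            for d in node.decorator_list
        )
        if is_contract:
            schemas[node.name] = {
                item.target.id
                for item in node.body
                if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name)
            }
    return schemas

v1 = """
@contract
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

v2 = """
@contract
class TradeSchema:
    symbol: str
    close_price: float
    volume: int
"""

old = contract_fields(v1)["TradeSchema"]
new = contract_fields(v2)["TradeSchema"]
removed = sorted(old - new)
added = sorted(new - old)

if removed:
    print("BREAKING CHANGE")
print("Removed:", removed)  # Removed: ['price']
print("Added:", added)      # Added: ['close_price']
```

The name contract is never defined in either source string. That is fine here, because the code is only parsed and the decorator is never evaluated.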
AST Is How Python Sees Your Code
One of the biggest misconceptions about AST is that it is some advanced feature used only by compiler engineers.
In reality, AST is how Python itself understands your code.
- Every script you run.
- Every class you create.
- Every function you define.
Python first transforms it into an Abstract Syntax Tree.
The ast module simply gives us access to that internal representation.
Once we have access to it, we can build tools that understand code instead of merely executing it.
Final Thoughts
Most developers use Python to run code. AST allows us to inspect code. That distinction is incredibly powerful.
For our data contract framework, AST will become the engine that detects schema changes, identifies breaking modifications, and generates migration reports before data pipelines fail.
Instead of waiting for production systems to break, we can analyze source code and catch problems early.
And it all starts with understanding how Python reads your code.