How Python Reads Your Code: AST Explained Using a Data Contract Framework

When most developers think about Python, they think about code being executed.

We write:

price = 100

and Python runs it.

We write:

class TradeSchema:
    symbol: str
    price: float

and Python creates a class.

Simple.

But here’s an interesting question.

What if we wanted to understand what the code is doing without actually running it?

What if we wanted to answer questions like:

Which fields exist in a schema?
Did a developer rename a field?
Was a new attribute added?
Is a schema missing a required column?

All without executing a single line of code.

This is exactly the problem Python’s Abstract Syntax Tree, or AST, helps solve.

The Problem: Reading Code Like a Human

Imagine you are reviewing a pull request.

Yesterday the schema looked like this:

class TradeSchema:
    symbol: str
    price: float
    volume: int

Today it looks like this:

class TradeSchema:
    symbol: str
    close_price: float
    volume: int

As humans, we immediately notice:

price
↓
close_price

We recognize that the field was renamed.

But computers don’t naturally “read” code like humans do.

To a computer, source code is just text.

class TradeSchema:
    symbol: str
    price: float
    volume: int

is nothing more than characters stored inside a file.

Before Python can execute that code, it must first understand its structure.

That’s where AST comes in.

The Translator Inside Python

Imagine you’re reading a book written in a language you don’t understand. Before you can understand the story, you need a translator. Python faces a similar challenge.

Before Python can execute code, it must translate it into a structure it can understand. That structure is called an Abstract Syntax Tree.

Think of AST as Python’s internal blueprint of your code.

Instead of seeing:

price = 100

Python sees something closer to:

Assignment
├── Variable: price
└── Value: 100

The code becomes a structured representation rather than raw text.

Why Is It Called a Tree?

The word “tree” comes from how information is organized.

Consider:

price = quantity * close

Python breaks this into smaller pieces.

Assignment
├── Variable: price
└── Multiplication
    ├── quantity
    └── close

Each piece becomes a node. Nodes connect to other nodes. Eventually the entire program becomes a tree-like structure. This is the AST.
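The diagram above is simplified. As a minimal sketch, the snippet below parses the same line with the standard ast module and prints the node types Python actually uses (Assign, BinOp, Mult, Name). The variable names assign and product are chosen here only for illustration.

```python
import ast

# Parse the one-line assignment from the text; nothing is executed here.
tree = ast.parse("price = quantity * close")

assign = tree.body[0]     # the only statement in the module
product = assign.value    # the right-hand side of the assignment

print(type(assign).__name__)       # node type of the statement
print(assign.targets[0].id)        # name being assigned to
print(type(product).__name__)      # node type of the expression
print(type(product.op).__name__)   # the operator node
print(product.left.id, product.right.id)  # the two operands
```

The multiplication is a child of the assignment, and the two names are children of the multiplication, which is the nesting that gives the tree its shape.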

The important thing to remember is that Python does not execute source code directly. CPython first parses the source into an AST, then compiles that AST into bytecode, and only then executes the bytecode.
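In CPython these stages can be made visible with built-ins: ast.parse builds the tree, compile turns the tree into a code object, and exec runs it. The following is a sketch for illustration only; the file name "<example>" and the namespace dictionary are arbitrary choices, not part of any framework.

```python
import ast

source = "price = 100"

tree = ast.parse(source)                    # step 1: text -> AST
code = compile(tree, "<example>", "exec")   # step 2: AST -> code object

namespace = {}
exec(code, namespace)                       # step 3: only now does anything run
print(namespace["price"])
```

Static analysis tools stop after step 1, which is why they never trigger side effects in the code they inspect.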

Seeing an AST for the First Time

Python exposes this functionality through the built-in ast module.

Suppose we have:

source = """
price = 100
"""

We can parse it:

import ast

tree = ast.parse(source)

print(ast.dump(tree, indent=4))

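Output (simplified; the real dump also shows details such as ctx=Store(), and the exact text varies by Python version):

Module(
    body=[
        Assign(
            targets=[
                Name(id='price')
            ],
            value=Constant(value=100)
        )
    ]
)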

At first glance this looks intimidating.

But it is simply Python describing what it sees.

Instead of:

price = 100

Python sees:

Assignment
Variable -> price
Value -> 100

The AST is just a more detailed version of that idea.
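Each field shown in the dump is also an ordinary attribute on the node object, so the same information can be read programmatically. A small sketch, with statement and target as illustrative variable names:

```python
import ast

tree = ast.parse("price = 100")

statement = tree.body[0]        # Module.body is a list of statements
target = statement.targets[0]   # Assign.targets is a list of assignment targets

print(target.id)                # the variable name from the dump
print(statement.value.value)    # the constant from the dump
print(statement.lineno)         # nodes also record where they appear in the source
```

The line number is what lets a tool point a reviewer at the exact place a field was defined.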

Why AST Matters for Data Engineering

Let’s return to our schema problem.

Suppose a developer writes:

class TradeSchema:
    symbol: str
    price: float
    volume: int

If we parse this code using AST, we can inspect the structure without importing or executing anything.

This is incredibly useful.
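To illustrate that parsing really does not run anything, the sketch below parses a module whose import would fail and whose print call would fire if the file were imported. The module name some_missing_dependency is made up for this example.

```python
import ast

# Hypothetical module: importing it would raise an error and print a message,
# but parsing triggers neither.
source = '''
import some_missing_dependency

print("side effect at import time")

class TradeSchema:
    symbol: str
    price: float
    volume: int
'''

tree = ast.parse(source)

# Collect the names of classes defined at the top level of the module.
class_names = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
print(class_names)
```

The parse succeeds and the class is found even though the code could never be imported in this environment.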

Imagine a CI/CD pipeline checking a pull request.

The pipeline can analyze the schema definition before deployment.

It can ask:

Were fields removed?
Were fields added?
Were data types changed?
Does this break existing consumers?

Because the AST exposes the structure of the code, these checks become possible.
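As one example, the "were data types changed?" check could look roughly like the sketch below. The helper field_types and the second schema version (volume changing from int to float) are assumptions made for illustration, and ast.unparse requires Python 3.9 or later.

```python
import ast


def field_types(source: str, class_name: str) -> dict[str, str]:
    """Map each annotated field of one class to its annotation text."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.target.id: ast.unparse(item.annotation)
                for item in node.body
                if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name)
            }
    return {}


old_source = """
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

# Hypothetical new version: volume changes from int to float.
new_source = """
class TradeSchema:
    symbol: str
    price: float
    volume: float
"""

old_fields = field_types(old_source, "TradeSchema")
new_fields = field_types(new_source, "TradeSchema")

# Fields present in both versions whose annotation text differs.
changed = {
    name: (old_fields[name], new_fields[name])
    for name in old_fields.keys() & new_fields.keys()
    if old_fields[name] != new_fields[name]
}
print(changed)
```

Comparing annotation text is a deliberately simple choice: it treats float and builtins.float as different types, which a real checker might want to normalize.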

Building a Simple Schema Extractor

Suppose we parse:

class TradeSchema:
    symbol: str
    price: float
    volume: int

The AST represents the class as a ClassDef node.

Inside that node's body, each annotated field (such as symbol: str) is an AnnAssign node.

We can walk through them:

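import ast

source = """
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

# Parse the source text into a tree; nothing is imported or executed.
tree = ast.parse(source)

for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        print(node.name)
        for item in node.body:
            # Annotated fields such as "symbol: str" are AnnAssign nodes.
            # Only plain names have an .id, so skip any other kind of target.
            if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
                print(item.target.id)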

Output:

TradeSchema
symbol
price
volume

Without executing the file, we have extracted the schema fields.

This is the foundation of static code analysis.
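The same extraction can answer one of the opening questions: is a schema missing a required column? The sketch below assumes a made-up rule set, REQUIRED_COLUMNS, including a hypothetical trade_date column that the schema does not declare.

```python
import ast

# Hypothetical contract rule: every trade schema must declare these columns.
REQUIRED_COLUMNS = {"symbol", "price", "volume", "trade_date"}

source = """
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

# Collect the annotated field names declared on TradeSchema.
declared = set()
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.ClassDef) and node.name == "TradeSchema":
        for item in node.body:
            if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
                declared.add(item.target.id)

# Anything required but not declared is a violation.
missing = sorted(REQUIRED_COLUMNS - declared)
print(missing)
```

A CI job could fail the build whenever this list is not empty.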

How Our Data Contract Framework Uses AST

In our project, developers define schemas like this:

@contract
class TradeSchema:
    symbol: str
    price: float
    volume: int

When a new version appears:

@contract
class TradeSchema:
    symbol: str
    close_price: float
    volume: int

we want to know what changed.

Using AST, we can extract both schema definitions.
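One way to pick out only the decorated schemas is to inspect each class's decorator_list. This is a sketch rather than the framework's actual implementation: the helper is_contract and the undecorated InternalHelper class are invented for the example, and the check only matches a bare @contract, not forms such as @contract(...) or @module.contract.

```python
import ast

source = """
@contract
class TradeSchema:
    symbol: str
    price: float
    volume: int

class InternalHelper:
    cache_size: int
"""


def is_contract(class_node: ast.ClassDef) -> bool:
    """Return True if the class carries a bare @contract decorator."""
    return any(
        isinstance(dec, ast.Name) and dec.id == "contract"
        for dec in class_node.decorator_list
    )


# Map each contract class name to its annotated field names.
contracts = {}
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.ClassDef) and is_contract(node):
        contracts[node.name] = [
            item.target.id
            for item in node.body
            if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name)
        ]

print(contracts)
```

Because the source is only parsed, the contract decorator does not need to be defined or importable for this to work.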

Version 1:

symbol
price
volume

Version 2:

symbol
close_price
volume

Comparing them reveals:

BREAKING CHANGE

Removed:
    price

Added:
    close_price

No execution required.

No imports required.

Just source code analysis.
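A comparison like the one above can be sketched with plain set operations. The helper schema_fields and the rule that any removed field counts as breaking are assumptions for this example, not a description of the framework's real logic; the set[str] annotation needs Python 3.9 or later.

```python
import ast


def schema_fields(source: str, class_name: str) -> set[str]:
    """Return the annotated field names declared on one class."""
    fields = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
                    fields.add(item.target.id)
    return fields


version_1 = """
@contract
class TradeSchema:
    symbol: str
    price: float
    volume: int
"""

version_2 = """
@contract
class TradeSchema:
    symbol: str
    close_price: float
    volume: int
"""

old = schema_fields(version_1, "TradeSchema")
new = schema_fields(version_2, "TradeSchema")

removed = sorted(old - new)   # fields consumers may still depend on
added = sorted(new - old)     # fields that did not exist before

if removed:
    print("BREAKING CHANGE")
print("Removed:", removed)
print("Added:", added)
```

Note that comparing names alone cannot tell a rename from an unrelated removal plus addition, so a rename shows up as one field removed and one added, just as in the report above.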

AST Is How Python Sees Your Code

One of the biggest misconceptions about AST is that it is some advanced feature used only by compiler engineers.

In reality, AST is how Python itself understands your code.

Every script you run.
Every class you create.
Every function you define.

Python first transforms it into an Abstract Syntax Tree.

The ast module simply gives us access to that internal representation.

Once we have access to it, we can build tools that understand code instead of merely executing it.

Final Thoughts

Most developers use Python to run code. AST allows us to inspect code. That distinction is incredibly powerful.

For our data contract framework, AST will become the engine that detects schema changes, identifies breaking modifications, and generates migration reports before data pipelines fail.

Instead of waiting for production systems to break, we can analyze source code and catch problems early.

And it all starts with understanding how Python reads your code.