Skip to content

Python script to extract LLM ready schema information from a Postgres database (local or remote)

License

Notifications You must be signed in to change notification settings

davehague/db-schema-extractor

Repository files navigation

DB Schema Extractor

A Python utility that extracts complete database schema definitions from PostgreSQL and CockroachDB databases and saves them to a file. This is especially useful for providing up-to-date context about your database structure to Large Language Models (LLMs) like Claude or ChatGPT.

Features

  • Extracts comprehensive database schema information:
    • Tables, columns, and data types
    • Primary and foreign keys
    • Indexes and constraints
    • Views and their definitions
    • Triggers and stored procedures
    • Sequences
  • Configurable via a simple configuration file
  • Multiple output formats (text or JSON)
  • Self-contained output suitable for LLM context
  • CockroachDB compatible with graceful fallbacks

Project Structure

db-schema-extractor/
├── __init__.py                  # Package initialization
├── db_schema_extractor.py       # Main entry point
├── config_manager.py            # Configuration handling
├── db_queries.py                # Database query functions
├── schema_generator.py          # Schema processing functions
├── configs/                     # Configuration files directory
│   └── config.ini.sample        # Example configuration template
├── output/                      # Schema output files directory
│   └── sample_schema_output.txt # Example schema output
├── LICENSE                      # License file
└── README.md                    # This documentation

Installation

Prerequisites

  • Python 3.6 or higher
  • PostgreSQL or CockroachDB database access

Setup with Virtual Environment

  1. Clone the repository:

    git clone https://github.com/yourusername/db-schema-extractor.git
    cd db-schema-extractor
  2. Create and activate a virtual environment:

    # Create virtual environment
    python -m venv venv
    
    # Activate on Windows
    venv\Scripts\activate
    
    # Activate on macOS/Linux
    source venv/bin/activate
  3. Install required dependencies:

    pip install psycopg2-binary

Configuration

Create a configuration file in the configs/ directory based on the provided template:

# Copy the sample config
cp configs/config.ini.sample configs/my-database.ini

# Edit with your database details
nano configs/my-database.ini

The configuration file format:

[database]
host = localhost
port = 5432
database = your_database_name
user = your_username
password = your_password

# Optional: CockroachDB specific settings
[cockroach]
# Uncomment and set these values for secure CockroachDB connections
# sslmode = verify-full
# sslrootcert = /path/to/ca.crt
# sslcert = /path/to/client.crt
# sslkey = /path/to/client.key

Note: Configuration files contain sensitive information. The configs/ directory is included in .gitignore by default to prevent accidental commits of database credentials.

Usage

Basic Usage

# Extract schema and save to output directory
python db_schema_extractor.py -c configs/my-database.ini -o output/my-schema.txt

Command Line Options

Option Short Description
--config -c Path to the configuration file (required)
--output -o Path to the output file (required)
--schema -s Database schema to extract (default: public)
--format -f Output format: text or json (default: text)
--db-type Database type: postgres or cockroachdb (auto-detected by default)

Examples

Extract the public schema in text format:

python db_schema_extractor.py -c configs/postgres-dev.ini -o output/postgres-dev-schema.txt

Extract a specific schema in JSON format:

python db_schema_extractor.py -c configs/postgres-prod.ini -o output/accounting-schema.json -s accounting -f json

Use with CockroachDB:

python db_schema_extractor.py -c configs/cockroach-prod.ini -o output/cockroach-schema.txt --db-type cockroachdb

Extract multiple schemas (save to separate files):

python db_schema_extractor.py -c configs/postgres-dev.ini -o output/public-schema.txt -s public
python db_schema_extractor.py -c configs/postgres-dev.ini -o output/hr-schema.txt -s hr

Output Format

Text Format (Default)

The text format produces a human-readable document with sections for tables, views, and sequences. This format is optimized for providing context to LLMs. See output/sample_schema_output.txt for an example.

JSON Format

The JSON format provides a structured representation of the schema that can be parsed programmatically if needed.

Integrating with LLMs

To use the extracted schema with an LLM:

  1. Run the schema extractor to generate an up-to-date schema file.
  2. When asking the LLM questions about your database, include the content of the schema file as context.

Example prompt:

I have the following database schema:

[PASTE SCHEMA HERE]

Based on this schema, how would I write a query to...

License

MIT License

About

Python script to extract LLM ready schema information from a Postgres database (local or remote)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages