Data & Vectors

The platform combines relational data with high-dimensional vector embeddings of source code to produce context-aware code reviews.

Database Schema (ER Diagram)

We use Prisma as our ORM and PostgreSQL as the primary data store.

┌──────────────┐
│     User     │
└──────┬───────┘
       │
       ├───────────────────┬─────────────────────┐
       ▼                   ▼                     ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│   Session    │    │   Account    │    │    Repository    │
├──────────────┤    ├──────────────┤    ├──────────────────┤
│ id           │    │ id           │    │ id               │
│ userId       │    │ userId       │    │ userId           │
│ token        │    │ provider     │    │ githubId         │
│ expiresAt    │    │ tokens...    │    │ owner            │
└──────────────┘    └──────────────┘    │ name             │
                                        │ fullName         │
                                        └────────┬─────────┘
                                                 │
                             ┌───────────────────┴───┐
                             ▼                       ▼
                    ┌──────────────────┐    ┌──────────────────┐
                    │      Review      │    │    CodeChunk     │
                    ├──────────────────┤    ├──────────────────┤
                    │ id               │    │ id               │
                    │ repositoryId     │    │ repositoryId     │
                    │ prNumber         │    │ filePath         │
                    │ status           │    │ startLine        │
                    │ review           │    │ endLine          │
                    └──────────────────┘    │ language         │
                                            │ signature        │
                                            │ embedding        │
                                            └──────────────────┘

┌──────┐
│ User │
└──┬───┘
   │
   ├─────────────────┬─────────────────┐
   │                 │                 │
   ▼                 ▼                 ▼
┌─────────┐    ┌─────────┐    ┌─────────────────┐
│ Session │    │ Account │    │   Repository    │
└─────────┘    └─────────┘    └────────┬────────┘
                                       │
                     ┌─────────────────┴─────────────────┐
                     │                                   │
                     ▼                                   ▼
              ┌─────────────┐                   ┌─────────────┐
              │   Review    │                   │  CodeChunk  │
              └─────────────┘                   └─────────────┘
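
As a sketch of how these models might look from application code, the TypeScript types below mirror the fields in the diagram. The field types are assumptions (the diagram shows names only), and toVectorLiteral is a hypothetical helper. It assumes the CodeChunk embedding is a pgvector column, as the RAG pipeline suggests; because Prisma has no native vector type, such a column is usually declared as Unsupported("vector(N)") in the Prisma schema and written through raw SQL using pgvector's text format, [x,y,z].

```typescript
// Hypothetical TypeScript shapes for the models in the diagram.
// Field types are assumptions: the diagram lists field names only.

interface User {
  id: string;
}

interface Session {
  id: string;
  userId: string; // references User.id
  token: string;
  expiresAt: Date;
}

interface Account {
  id: string;
  userId: string; // references User.id
  provider: string;
  // the provider token fields ("tokens..." in the diagram) are omitted
}

interface Repository {
  id: string;
  userId: string; // references User.id
  githubId: number;
  owner: string;
  name: string;
  fullName: string;
}

interface Review {
  id: string;
  repositoryId: string; // references Repository.id
  prNumber: number;
  status: string; // the set of status values is not documented
  review: string; // generated review text (assumed)
}

interface CodeChunk {
  id: string;
  repositoryId: string; // references Repository.id
  filePath: string;
  startLine: number;
  endLine: number;
  language: string;
  signature: string;
  embedding: number[]; // assumed to be stored as a pgvector column
}

// Hypothetical helper: serialize an embedding to pgvector's text input format,
// e.g. [0.1,0.2,0.3], for use as a parameter in a raw SQL INSERT or UPDATE.
function toVectorLiteral(embedding: number[]): string {
  // pgvector rejects NaN and infinite elements, so fail early here instead.
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

Keeping the serialization in one place means every raw query that writes an embedding produces the same literal format.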

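RAG Pipeline

Reviews are generated with retrieval-augmented generation (RAG): the PR diff is embedded, the embedding is used to look up related code chunks in pgvector, and the retrieved context is assembled into a prompt that Gemini uses to write the review.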

┌─────────┐    ┌──────────────┐    ┌──────────────┐
│ PR Diff │───►│  Embedding   │───►│  pgvector    │
└─────────┘    └──────────────┘    └──────────────┘
                                           │
                                           ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Gemini AI    │◄───│   Prompt     │◄───│   Context    │
│   Review     │    │ Construction │    │  Retrieval   │
└──────────────┘    └──────────────┘    └──────────────┘
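
The Context Retrieval and Prompt Construction steps can be sketched in memory as follows. This is an illustration under stated assumptions, not the platform's implementation: the PR diff is assumed to be embedded already, each chunk is assumed to carry its source text in a hypothetical content field (the schema above does not list one), and the prompt wording is placeholder text. Inside PostgreSQL the same nearest-neighbour search would be written with pgvector's cosine-distance operator, for example ORDER BY embedding <=> $1 LIMIT k.

```typescript
// Hypothetical chunk shape for this sketch; "content" is an assumed field.
interface RetrievedChunk {
  filePath: string;
  startLine: number;
  endLine: number;
  signature: string;
  content: string;
  embedding: number[];
}

// Cosine distance (1 - cosine similarity), the quantity pgvector's <=> returns.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as orthogonal (distance 1).
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Context Retrieval: the k chunks nearest to the embedded PR diff.
function retrieveContext(
  diffEmbedding: number[],
  chunks: RetrievedChunk[],
  k: number,
): RetrievedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, distance: cosineDistance(diffEmbedding, chunk.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map(({ chunk }) => chunk);
}

// Prompt Construction: retrieved context first, then the diff under review.
// The instruction text is placeholder wording, not the platform's prompt.
function buildPrompt(diff: string, context: RetrievedChunk[]): string {
  const contextText = context
    .map((c) => `File ${c.filePath} (lines ${c.startLine}-${c.endLine}): ${c.signature}\n${c.content}`)
    .join("\n\n");
  return [
    "Review the following pull request diff.",
    "Related code from the repository:",
    contextText,
    "Diff:",
    diff,
  ].join("\n\n");
}
```

Labelling each retrieved chunk with its file path and line range lets the model refer back to the exact location it is drawing context from.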

Chunking Strategy

We use Tree-sitter to ensure chunks are semantically meaningful.

Chunking has two modes: AST Chunking, the primary mode, and Module Fallback.

With AST chunking, chunks are created at function and class boundaries, so the AI receives complete logical units rather than arbitrary snippets of text.
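
The boundary logic can be sketched without the Tree-sitter bindings by working on a simplified list of top-level syntax nodes. The SyntaxNodeLike shape is hypothetical (real Tree-sitter nodes expose type, startPosition and endPosition, with 0-based rows), the node-type names come from the tree-sitter-javascript and tree-sitter-python grammars, and the fallback rule (emit one module-level chunk when a file has no function or class) is an assumed reading of Module Fallback.

```typescript
// Hypothetical, simplified view of a top-level Tree-sitter node. Real nodes
// expose type, startPosition.row and endPosition.row (0-based) instead.
interface SyntaxNodeLike {
  type: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  name?: string;
}

interface SourceChunk {
  kind: "function" | "class" | "module";
  startLine: number;
  endLine: number;
  signature: string;
}

// Boundary node types from the tree-sitter-javascript and tree-sitter-python
// grammars; each language grammar uses its own names.
const FUNCTION_TYPES = new Set(["function_declaration", "function_definition"]);
const CLASS_TYPES = new Set(["class_declaration", "class_definition"]);

function chunkFile(topLevelNodes: SyntaxNodeLike[], lineCount: number): SourceChunk[] {
  const chunks: SourceChunk[] = [];
  for (const node of topLevelNodes) {
    const kind = FUNCTION_TYPES.has(node.type)
      ? "function"
      : CLASS_TYPES.has(node.type)
        ? "class"
        : null;
    if (kind === null) continue; // imports, statements, etc. are not boundaries
    // A class is kept whole, methods included, as one logical unit.
    chunks.push({
      kind,
      startLine: node.startLine,
      endLine: node.endLine,
      signature: `${kind} ${node.name ?? "<anonymous>"}`,
    });
  }
  // Module Fallback (assumed rule): with no function or class boundaries,
  // the whole file becomes a single module-level chunk.
  if (chunks.length === 0 && lineCount > 0) {
    chunks.push({ kind: "module", startLine: 1, endLine: lineCount, signature: "module" });
  }
  return chunks;
}
```

Adapting real Tree-sitter nodes would mean adding 1 to their row numbers and reading the name from the node's name field (childForFieldName("name") in the Node bindings). Keeping classes whole is one design choice; a finer-grained variant could recurse into class bodies and emit one chunk per method.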