Skip to content

Inference Engine

The inference engine analyzes IR records to discover API structure, path parameters, and schemas.

Overview

The inference engine processes IR records and produces:

  • Endpoint patterns: Discovered API endpoints with path templates
  • Path parameters: Dynamic URL segments (UUIDs, IDs, slugs)
  • Request schemas: JSON Schema for request bodies
  • Response schemas: JSON Schema for response bodies by status code
  • Query parameters: Discovered query string parameters

Basic Usage

import "github.com/grokify/traffic2openapi/pkg/inference"

// Create engine with default options
engine := inference.NewEngine(inference.DefaultEngineOptions())

// Process records
engine.ProcessRecords(records)

// Get results
result := engine.Finalize()

// Result contains discovered endpoints and schemas
for path, endpoint := range result.Endpoints {
    fmt.Printf("Endpoint: %s\n", path)
    for method, operation := range endpoint.Operations {
        fmt.Printf("  %s: %d requests\n", method, operation.RequestCount)
    }
}

Engine Options

options := inference.EngineOptions{
    // Path parameter detection
    DetectPathParams: true,

    // Minimum occurrences to consider a pattern
    MinOccurrences: 2,

    // Include 4xx/5xx responses in schema inference
    IncludeErrorResponses: true,

    // Maximum depth for schema inference
    MaxSchemaDepth: 10,

    // Merge similar schemas
    MergeSchemas: true,
}

engine := inference.NewEngine(options)

Path Parameter Detection

The engine automatically detects dynamic path segments:

Pattern Detected As Example
UUID {id} /users/550e8400-e29b-41d4-a716-446655440000
Numeric ID {id} /users/12345
Short hash {hash} /commits/a1b2c3d
Date {date} /reports/2024-01-15
Slug {slug} /posts/hello-world

Context-aware naming:

/users/123        → /users/{userId}
/posts/456        → /posts/{postId}
/orders/789/items → /orders/{orderId}/items

Schema Inference

Type Detection

JSON Type Inferred Type
"hello" string
123 integer
12.5 number
true boolean
[] array
{} object
null nullable

Format Detection

Pattern Format
user@example.com email
550e8400-e29b-... uuid
2024-01-15T10:30:00Z date-time
2024-01-15 date
https://example.com uri
192.168.1.1 ipv4
::1 ipv6

Required vs Optional

Fields are tracked across multiple requests:

// Request 1: {"name": "Alice", "email": "alice@example.com"}
// Request 2: {"name": "Bob"}
// Request 3: {"name": "Charlie", "email": "charlie@example.com"}

// Result:
// - "name" is required (present in all requests)
// - "email" is optional (present in 2/3 requests)

Result Structure

type InferenceResult struct {
    // Discovered endpoints keyed by path template
    Endpoints map[string]*Endpoint

    // Global schemas that can be reused
    Schemas map[string]*Schema
}

type Endpoint struct {
    // Path template (e.g., "/users/{userId}")
    PathTemplate string

    // Path parameters
    PathParams []PathParam

    // Operations keyed by HTTP method
    Operations map[string]*Operation
}

type Operation struct {
    // HTTP method
    Method string

    // Number of requests observed
    RequestCount int

    // Query parameters
    QueryParams []QueryParam

    // Request body schema
    RequestSchema *Schema

    // Response schemas keyed by status code
    ResponseSchemas map[int]*Schema
}

Processing Modes

Batch Processing

// Process all records at once
engine := inference.NewEngine(options)
engine.ProcessRecords(records)
result := engine.Finalize()

Streaming Processing

// Process records one at a time
engine := inference.NewEngine(options)

reader, _ := provider.NewReader(ctx, "traffic.ndjson")
for {
    record, err := reader.Read()
    if err == io.EOF {
        break
    }
    engine.ProcessRecord(record)
}

result := engine.Finalize()

Incremental Processing

// Add more records to existing engine
engine.ProcessRecords(batch1)
// ... later ...
engine.ProcessRecords(batch2)
// Only finalize when done
result := engine.Finalize()

Convenience Functions

// Infer from directory of IR files
result, err := inference.InferFromDir("./traffic/")

// Infer from single file
result, err := inference.InferFromFile("traffic.ndjson")

Integration with OpenAPI Generator

import (
    "github.com/grokify/traffic2openapi/pkg/inference"
    "github.com/grokify/traffic2openapi/pkg/openapi"
)

// Infer API structure
engine := inference.NewEngine(inference.DefaultEngineOptions())
engine.ProcessRecords(records)
result := engine.Finalize()

// Generate OpenAPI spec
options := openapi.DefaultGeneratorOptions()
options.Title = "My API"
options.Version = openapi.Version31

spec := openapi.GenerateFromInference(result, options)
openapi.WriteFile("openapi.yaml", spec)

Best Practices

Sufficient Sample Size

More requests lead to better inference:

  • Path parameters: Need multiple values to detect patterns
  • Required fields: Need multiple requests to distinguish required/optional
  • Response schemas: Need examples of each status code

Representative Traffic

Capture diverse traffic for best results:

  • All API endpoints
  • Various query parameter combinations
  • Different request body shapes
  • Success and error responses

Pre-filtering

Filter traffic before inference:

// Only process successful responses
var filtered []*ir.IRRecord
for _, record := range records {
    if record.Response.Status >= 200 && record.Response.Status < 300 {
        filtered = append(filtered, record)
    }
}
engine.ProcessRecords(filtered)