Getting Started
Learn how to use the AI Infrastructure Planner to size your workloads and calculate total cost of ownership.
GPU Sizing Concepts
Memory-First Sizing Approach
This tool uses a memory-first approach to GPU sizing, recognizing that for most GenAI inference workloads, memory capacity is the primary constraint, not compute performance. The sizing calculation considers the following factors (a worked sketch follows the list):
- Model Weights: The base memory footprint determined by parameter count and precision (4-bit, 8-bit, 16-bit, or 32-bit)
- KV Cache: Memory required to store key-value attention states for all concurrent requests, calculated from actual model architecture (layers, hidden size, attention heads)
- Concurrent Requests: Each active request requires KV cache memory for its full sequence length (input + output tokens)
- Advanced Architectures: Automatically handles Grouped Query Attention (GQA), Mixture of Experts (MoE), and Multi-Head Latent Attention (MLA) optimizations
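To make the arithmetic concrete, here is a minimal sketch of that calculation; the helper names and the Llama-3-8B-like example architecture are illustrative, not the tool's internal code:

```python
def model_weights_gb(params_billion: float, bits: int = 16) -> float:
    """Weights footprint: parameter count x bytes per parameter."""
    return params_billion * 1e9 * (bits / 8) / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, concurrent: int, bits: int = 16) -> float:
    """KV cache: 2 tensors (K and V) per layer per token, sized by the
    KV head count, so GQA's reduced head count shrinks this directly."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * (bits / 8)
    return bytes_per_token * seq_len * concurrent / 1e9

# Llama-3-8B-like architecture: 32 layers, GQA with 8 KV heads, head_dim 128
weights = model_weights_gb(8)                               # ~16 GB at 16-bit
kv = kv_cache_gb(32, 8, 128, seq_len=8192, concurrent=32)   # ~34 GB
print(f"weights {weights:.0f} GB + KV cache {kv:.0f} GB")
```

At 32 concurrent 8K-token requests the KV cache already outweighs the 16-bit weights roughly two to one, which is exactly why memory, not compute, binds first.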
Model Instance Deployment
The tool determines how to deploy your model across GPUs and servers (see the sketch after this list):
- Small Models: Models using <80% of a single GPU's memory run 1 instance per GPU for maximum throughput
- Medium Models: Models using >80% of a single GPU's memory span 2 GPUs per instance to ensure stability
- Large Models: Models exceeding server capacity (default 8 GPUs) can span 2-4 servers per instance using parallelism
- Multiple Instances: When workload demands exceed single instance capacity, the tool calculates optimal instance count and distribution
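A hypothetical decision rule mirroring these tiers; the 80 GB GPU, 80% cutoff, and 8-GPU server defaults are assumptions for illustration:

```python
import math

def gpus_per_instance(model_mem_gb: float, gpu_mem_gb: float = 80.0,
                      gpus_per_server: int = 8) -> int:
    if model_mem_gb < 0.8 * gpu_mem_gb:
        return 1                                  # small: one instance per GPU
    if model_mem_gb <= gpus_per_server * gpu_mem_gb:
        return max(2, math.ceil(model_mem_gb / gpu_mem_gb))  # medium: multi-GPU
    # large: span 2-4 whole servers via parallelism
    servers = min(4, math.ceil(model_mem_gb / (gpus_per_server * gpu_mem_gb)))
    return servers * gpus_per_server

print(gpus_per_instance(50))    # 1  (fits comfortably on one GPU)
print(gpus_per_instance(70))    # 2  (>80% of one GPU, pair for stability)
print(gpus_per_instance(900))   # 16 (spans two 8-GPU servers)
```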
Performance Estimation
The tool provides three performance estimate levels for time-per-output-token:
- Conservative: Uses the end-of-sequence KV cache size (the maximum) in the memory bandwidth calculation - best for capacity planning
- Fair: Uses the average KV cache size over the sequence (50% of the maximum) - typical real-world performance
- Optimistic: Uses the average KV cache size and assumes 95% GPU efficiency - best-case scenario
Throughput estimates are memory bandwidth-limited, accounting for KV cache access patterns during autoregressive decode. For time to first token (TTFT), the tool uses active parameters rather than total parameters, which keeps estimates accurate for MoE architectures.
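A sketch of that bandwidth-bound model: it assumes each decode step streams the active weights plus the KV cache through HBM once; the H100-class 3,350 GB/s peak and the 80% baseline efficiency are assumptions, not the tool's calibrated values:

```python
HBM_BW_GB_S = 3350      # assumed peak memory bandwidth (H100 SXM class)
BASELINE_EFF = 0.80     # assumed achievable fraction of peak

def tpot_ms(weights_gb: float, kv_max_gb: float, level: str = "fair") -> float:
    kv, eff = {
        "conservative": (kv_max_gb, BASELINE_EFF),        # end-of-sequence KV
        "fair":         (0.5 * kv_max_gb, BASELINE_EFF),  # average KV (50% of max)
        "optimistic":   (0.5 * kv_max_gb, 0.95),          # avg KV, 95% efficiency
    }[level]
    return (weights_gb + kv) / (HBM_BW_GB_S * eff) * 1000

for level in ("conservative", "fair", "optimistic"):
    print(f"{level}: {tpot_ms(16, 34, level):.1f} ms/token")
# conservative: 18.7, fair: 12.3, optimistic: 10.4
```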
Using the GenAI Sizing Calculator
The GenAI Sizing Calculator can pull model configurations directly from Hugging Face or accept manual input (a fetch example follows the list):
- Automated: Select from predefined models or enter any Hugging Face model ID to automatically load parameters, architecture details, and calculate accurate memory requirements
- Manual: Enter parameter count and optionally paste config.json for custom or private models
- Precision Control: Configure model weights and KV cache precision independently for optimization
- Push to TCO: Send your sizing results directly to the TCO Calculator to see infrastructure costs
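For the automated path, fetching a config looks roughly like this; the resolve URL is Hugging Face's standard raw-file pattern, and gated models additionally require an auth token:

```python
import requests

def fetch_hf_config(model_id: str) -> dict:
    """Download config.json for a public Hugging Face model."""
    url = f"https://huggingface.co/{model_id}/resolve/main/config.json"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()

cfg = fetch_hf_config("Qwen/Qwen2.5-7B-Instruct")
layers = cfg["num_hidden_layers"]
kv_heads = cfg.get("num_key_value_heads", cfg["num_attention_heads"])  # GQA-aware
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
print(layers, kv_heads, head_dim)   # inputs for the KV cache formula above
```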
Additional Calculators
Vector Database Sizing:
- Calculate storage from document corpus size or direct vector count
- Support for HNSW, IVF, Flat, LSH, and DiskANN indexes
- Configurable memory modes: all-in-RAM, hybrid, or disk-based
- Production features: replication, backup retention, metadata overhead (reflected in the sketch below)
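A back-of-envelope version of that sizing; the HNSW graph term (~2×M 8-byte neighbor links per vector) is a common approximation, not necessarily the tool's exact formula:

```python
def vector_db_gb(num_vectors: int, dims: int, hnsw_m: int = 16,
                 replicas: int = 2, metadata_bytes: int = 256) -> float:
    raw = num_vectors * dims * 4                 # fp32 vectors
    graph = num_vectors * hnsw_m * 2 * 8         # ~2*M neighbor links x 8 B each
    meta = num_vectors * metadata_bytes          # IDs, payload, filter fields
    return (raw + graph + meta) * replicas / 1e9

# 10M document chunks embedded at 1,024 dimensions, 2 replicas
print(f"{vector_db_gb(10_000_000, 1024):.0f} GB")   # ~92 GB
```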
Facility Design:
- Visual rack layout designer with drag-and-drop equipment placement
- Import hardware configuration from TCO calculator
- Real-time power and space utilization tracking (see the fit check after this list)
- Configurable rack power limits (15-120kW per rack)
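The power tracking reduces to a fit check like the following; the ~10.2 kW per 8-GPU server and the 30 kW rack limit are assumed defaults:

```python
import math

def racks_needed(servers: int, server_kw: float = 10.2,
                 rack_limit_kw: float = 30.0) -> tuple[int, int]:
    """Racks required when power, not space, is the binding limit."""
    per_rack = max(1, int(rack_limit_kw // server_kw))
    return math.ceil(servers / per_rack), per_rack

racks, per_rack = racks_needed(16)
print(f"{racks} racks at {per_rack} servers each")   # 8 racks at 2 servers each
```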
TCO Methodology
What is Total Cost of Ownership?
Total Cost of Ownership (TCO) provides a comprehensive financial comparison between on-premises and cloud AI infrastructure over a specified time period (1-5 years). The tool calculates all costs including hardware, energy, personnel, facilities, and operations to help you make informed infrastructure decisions.
On-Premises Cost Components
The tool calculates on-premises costs from granular hardware specifications:
- Hardware (CAPEX): GPU servers, control servers, storage servers, network switches, and optical transceivers - each with customizable unit costs
- Networking: Calculated based on cluster size using a spine-leaf architecture (see the sketch at the end of this list):
  - Small clusters (≤16 GPU servers): 2 spine switches
  - Larger clusters: Leaf switches (1 per 8 GPU servers) + spine switches (half the number of leaf switches, minimum 2)
  - Very large clusters (>512 servers): Additional super-spine layer
  - Optics calculated from switch ports, assuming 400G connections with splitters
- Energy Costs: Power consumption calculated from GPU type and server count, with configurable PUE (Power Usage Effectiveness) for cooling overhead
- Facilities: Rack space costs calculated from power and space requirements, accounting for rack power limits (default 30kW per rack)
- Personnel: Platform engineers and network engineers with fully loaded costs (includes 30% overhead for benefits)
- Maintenance: Annual hardware maintenance as percentage of hardware cost (default 10%)
- Software: Annual costs for data platform, monitoring, and management tools
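The networking rules above translate into a few lines of arithmetic, shown here together with a PUE-adjusted energy estimate; the super-spine sizing, server wattage, and prices are assumptions:

```python
import math

def switch_counts(gpu_servers: int) -> dict:
    if gpu_servers <= 16:
        return {"leaf": 0, "spine": 2, "super_spine": 0}   # small cluster
    leaf = math.ceil(gpu_servers / 8)             # 1 leaf per 8 GPU servers
    spine = max(2, math.ceil(leaf / 2))           # half the leaves, minimum 2
    super_spine = math.ceil(spine / 2) if gpu_servers > 512 else 0
    return {"leaf": leaf, "spine": spine, "super_spine": super_spine}

def annual_energy_usd(gpu_servers: int, server_kw: float = 10.2,
                      pue: float = 1.4, usd_per_kwh: float = 0.12) -> float:
    return gpu_servers * server_kw * pue * usd_per_kwh * 24 * 365

print(switch_counts(64))                       # {'leaf': 8, 'spine': 4, 'super_spine': 0}
print(f"${annual_energy_usd(64):,.0f}/year")   # ~$961,000 at these assumptions
```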
Cloud Cost Components
The calculator supports two cloud deployment models (each illustrated with a cost sketch below):
IaaS Mode (Self-Hosted on Cloud):
- Compute: EC2 GPU instances priced via a live pricing API (p5.48xlarge for H100, p6-b200 for B200, etc.)
- Storage: EBS (block), S3 (object), FSx for Lustre (HPC), EFS (network file system)
- Networking: Inter-AZ and internet egress costs based on monthly transfer volumes
- Personnel: Cloud engineers (typically fewer than on-prem)
- Discounts: Configurable % applied to all services (via reserved instances or enterprise agreements)
- Optional: SageMaker (15% overhead), Enterprise Support (% of total)
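A rough composition of the IaaS monthly bill; the S3, egress, and hourly rates below are placeholders, whereas the tool itself pulls live pricing:

```python
def iaas_monthly_usd(instances: int, hourly_usd: float, s3_gb: float,
                     egress_gb: float, discount: float = 0.20,
                     sagemaker: bool = False) -> float:
    compute = instances * hourly_usd * 730           # ~hours per month
    storage = s3_gb * 0.023                          # assumed S3 $/GB-month
    egress = egress_gb * 0.09                        # assumed egress $/GB
    total = (compute + storage + egress) * (1 - discount)
    return total * 1.15 if sagemaker else total      # optional 15% overhead

# 2 H100-class instances at an assumed ~$98/hr on-demand, 20% discount
print(f"${iaas_monthly_usd(2, 98.0, 50_000, 10_000):,.0f}/month")
```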
Token API Mode (Fully Managed SaaS):
- Input Tokens: Per-million-token pricing with optional cached pricing (10% of standard rate)
- Output Tokens: Per-million-token pricing for generated text
- Providers: OpenAI (GPT-5 family), Anthropic (Claude), Gemini, AWS Bedrock, Azure OpenAI, or custom
- Zero Infrastructure: No hardware, facilities, energy, or personnel costs
- Throughput-Based: Costs calculated from GenAI sizing results (effective tokens/sec × time period)
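And the Token API side, where the input:output ratio and cache hit rate are assumptions you would replace with your own traffic profile:

```python
def token_api_monthly_usd(out_tokens_per_sec: float, in_per_m: float,
                          out_per_m: float, in_out_ratio: float = 3.0,
                          cache_hit: float = 0.5) -> float:
    out_m = out_tokens_per_sec * 30 * 24 * 3600 / 1e6   # M output tokens/month
    in_m = out_m * in_out_ratio                         # assumed input volume
    # cached input tokens billed at 10% of the standard rate
    in_cost = in_m * in_per_m * ((1 - cache_hit) + cache_hit * 0.10)
    return in_cost + out_m * out_per_m

# 1,000 output tok/s sustained at $3 / $15 per 1M in/out (placeholder prices)
print(f"${token_api_monthly_usd(1000, 3.0, 15.0):,.0f}/month")   # ~$51,700
```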
Key Decision Factors
- Analysis Period: Longer periods (3-5 years) allow on-prem hardware costs to amortize. The tool shows the break-even point and cumulative costs over time (see the sketch below).
- Utilization: The tool assumes 24/7 operation by default. Cloud costs scale linearly with usage, while on-prem costs are fixed regardless of utilization.
- Hardware Pricing: All hardware unit costs are customizable. Default values reflect enterprise pricing but can be adjusted based on your vendor negotiations.
- Facility Assumptions: Rack costs, power pricing ($/kWh), and PUE are all configurable to match your specific data center environment.
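The break-even mechanics in miniature, with illustrative numbers:

```python
def break_even_month(capex_usd: float, onprem_monthly_usd: float,
                     cloud_monthly_usd: float) -> float | None:
    """Month at which cumulative on-prem cost drops below cloud's."""
    delta = cloud_monthly_usd - onprem_monthly_usd
    return capex_usd / delta if delta > 0 else None   # None: on-prem never breaks even

m = break_even_month(3_000_000, 60_000, 180_000)
print(f"break-even after ~{m:.0f} months")            # ~25 months
```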
Using the TCO Calculator
The TCO Calculator supports two cloud pricing modes:
IaaS Mode (Infrastructure-as-a-Service):
- Complete GenAI Sizing or Select Preset: Use GenAI Sizing to calculate requirements from your workload, or choose a preset cluster size (2-32 GPU nodes)
- Configure Hardware: Set GPU server count, GPU type, and specifications. Networking auto-calculated.
- Set On-Prem Parameters: Configure facilities, personnel, and maintenance
- Set Cloud Parameters: Choose region, storage (S3/FSx/EFS), discounts, optional services
- Choose Analysis Period: Select 1-5 years for break-even analysis
Token API Mode (SaaS):
- Complete GenAI Sizing: Required to calculate token throughput
- Select Provider: Choose OpenAI, Anthropic, Gemini, Bedrock, Azure, or custom
- Choose Model: Select specific model for accurate per-token pricing
- Enable Cached Pricing: Toggle 10% cached input pricing (enabled by default)
- Compare: See effective cost per 1M tokens vs. self-hosted infrastructure (sketched below)
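The self-hosted side of that comparison is just monthly TCO divided by monthly token output, as in this illustrative sketch:

```python
def self_hosted_per_m_tokens(monthly_tco_usd: float,
                             out_tokens_per_sec: float) -> float:
    monthly_m_tokens = out_tokens_per_sec * 30 * 24 * 3600 / 1e6
    return monthly_tco_usd / monthly_m_tokens

# $250k/month all-in TCO serving 5,000 output tok/s (illustrative numbers)
print(f"${self_hosted_per_m_tokens(250_000, 5000):.2f} per 1M output tokens")
# ~$19.29 per 1M output tokens
```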
Workload Presets
Presets provide starting points for common AI cluster configurations with pre-calculated networking and support infrastructure:
- Small Cluster (2 nodes): 16 GPUs, minimal control plane, 2 switches
- Medium Cluster (8 nodes): 64 GPUs, production control plane (5 nodes), storage servers
- Large Cluster (16 nodes): 128 GPUs, full spine-leaf networking
- XL Cluster (32 nodes): 256 GPUs, enterprise-scale infrastructure
Key Features
Transparent Pricing: All cloud pricing is pulled from live AWS APIs and displayed in the detailed breakdown, so you can verify calculations and understand cost drivers.
Effective Token Cost: The calculator shows cost per 1M output tokens for both IaaS and Token API modes, making it easy to compare self-hosted vs. SaaS approaches on an apples-to-apples basis.
Production-Ready: All estimates include real-world considerations like cooling overhead (PUE), benefits costs (30%), and GPU efficiency modifiers.
Additional Resources
Results & Export
The Results page consolidates all analyses (TCO, GenAI sizing, Vector DB, Facility Design) into a comprehensive view. Export professional PDF reports with your organization details, complete with detailed breakdowns and a disclaimer section.
Commercial Use
The AI Infrastructure Planner is free for personal and educational use. If you want to use it as part of a commercial workflow or product, you can review licensing options on the Commercial Use & Licensing page.
Note: This tool provides planning estimates based on public specifications and current market pricing. Actual performance and costs will vary based on specific workload characteristics, vendor negotiations, and deployment details. Always conduct proof-of-concept testing and validate with your vendors before making infrastructure decisions.