Metrics: Throughput, TTFT, and TPS

Overview

Throughput metrics measure how quickly an AI model processes input and generates output. Two primary indicators are monitored: Time To First Token (TTFT) and Tokens Per Second (TPS).


Time To First Token (TTFT)

TTFT is the time elapsed between the model receiving input and emitting its first output token. It measures the initial responsiveness of the system.
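As an illustration, TTFT can be measured on the client by timing a streaming response from the moment the request starts until the first token arrives. The sketch below assumes the model's output is exposed as a Python iterable of tokens and that the request is issued when iteration begins (as with a lazily evaluated generator). `measure_ttft` and `simulated_stream` are hypothetical names for illustration, not part of any product API.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], List[str]]:
    """Consume a token stream and return (TTFT in seconds, all tokens).

    Assumes the request is sent when iteration begins, as with a lazily
    evaluated generator. TTFT is None if the stream yields no tokens.
    """
    start = clock()  # moment the request is considered sent
    ttft: Optional[float] = None
    tokens: List[str] = []
    for token in stream:
        if ttft is None:
            # First token observed: record the elapsed time exactly once.
            ttft = clock() - start
        tokens.append(token)
    return ttft, tokens


def simulated_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    time.sleep(delay_s)  # simulated time spent before the first token
    for token in ["Hello", ",", " world"]:
        yield token


if __name__ == "__main__":
    ttft, tokens = measure_ttft(simulated_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms for {len(tokens)} tokens")
```

The `clock` parameter is injectable so tests can pass a fake clock and get a deterministic TTFT instead of depending on real sleep timing.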

Why TTFT Matters:

  • Determines how quickly the application begins responding
  • Shapes users' perception of system performance
  • Indicates the efficiency of model loading, initialization, and input processing
  • Lower TTFT gives a more responsive user experience

Tokens Per Second (TPS)

TPS measures the number of tokens the model generates per second after the first token. It indicates sustained generation speed.
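Because TPS counts only the tokens after the first, it can be computed from per-token arrival timestamps as the number of tokens after the first divided by the time from the first token to the last. A minimal sketch, assuming the caller has collected those timestamps in seconds; `tokens_per_second` is a hypothetical helper.

```python
from typing import Sequence


def tokens_per_second(token_times: Sequence[float]) -> float:
    """Sustained generation speed after the first token.

    token_times holds each token's arrival time in seconds, in order.
    The first token is excluded so that TTFT does not affect the result.
    """
    if len(token_times) < 2:
        raise ValueError("at least two tokens are needed to measure TPS")
    elapsed = token_times[-1] - token_times[0]  # time from first to last token
    if elapsed <= 0:
        raise ValueError("the last token must arrive after the first")
    return (len(token_times) - 1) / elapsed  # tokens generated after the first


if __name__ == "__main__":
    # First token at 0.5 s, then one more token every 1/32 s (32 tokens in 1 s).
    times = [0.5 + i / 32 for i in range(33)]
    print(tokens_per_second(times))  # 32.0
```

Starting the clock at the first token rather than at the request keeps TPS and TTFT independent: a slow start raises TTFT without lowering TPS.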

Why TPS Matters:

  • Determines how long it takes to generate the complete output
  • Affects overall system throughput
  • Reflects GPU utilization and how well the model is optimized
  • Higher TPS allows the system to serve more concurrent requests

Throughput Chart

The chart plots TTFT and TPS over time. This visualization helps you:

  • Identify model performance patterns
  • Detect anomalies or performance degradation
  • Optimize system configuration based on historical data
  • Compare performance across usage sessions

Throughput data is updated each time a model inference completes, so the chart accurately reflects current system performance.
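Because a new data point is recorded after each completed inference, degradation can be spotted by comparing the latest sample against recent history. The sketch below is one simple approach, not the chart's actual implementation: the hypothetical `ThroughputLog` keeps a rolling window of samples and flags the newest one when its TTFT is well above, or its TPS well below, the median of the earlier samples. The window size and the 1.5× threshold are arbitrary assumptions.

```python
from collections import deque
from dataclasses import dataclass
from statistics import median
from typing import Deque


@dataclass(frozen=True)
class InferenceSample:
    ttft_s: float  # time to first token, in seconds
    tps: float     # tokens per second after the first token


class ThroughputLog:
    """Hypothetical rolling window of per-inference throughput samples."""

    def __init__(self, window: int = 100) -> None:
        # deque(maxlen=...) discards the oldest sample once the window is full.
        self.samples: Deque[InferenceSample] = deque(maxlen=window)

    def record(self, ttft_s: float, tps: float) -> None:
        """Append one sample; call this once per completed inference."""
        self.samples.append(InferenceSample(ttft_s, tps))

    def is_degraded(self, factor: float = 1.5) -> bool:
        """True if the latest TTFT exceeds `factor` times the median TTFT of
        earlier samples, or the latest TPS falls below their median TPS
        divided by `factor`."""
        if len(self.samples) < 2:
            return False  # no history to compare against yet
        *history, latest = self.samples
        median_ttft = median(s.ttft_s for s in history)
        median_tps = median(s.tps for s in history)
        return latest.ttft_s > factor * median_ttft or latest.tps < median_tps / factor


if __name__ == "__main__":
    log = ThroughputLog(window=10)
    for _ in range(5):
        log.record(ttft_s=0.2, tps=40.0)
    print(log.is_degraded())  # False
    log.record(ttft_s=0.5, tps=38.0)  # TTFT spike
    print(log.is_degraded())  # True
```

The median serves as the baseline because a single earlier outlier shifts it far less than it would shift a mean, and excluding the latest sample keeps a spike from masking itself.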