FLOPs, Parameters, and Tokens
What every product manager should know about AI’s building blocks
Artificial intelligence (AI) is reshaping industries, and nowhere is this more evident than in fast-moving sectors like payments, customer support, and personalization. You’ll often hear AI folks talk about FLOPs, parameters, and tokens. But what do these terms actually mean, and why should product and tech leaders care?
As AI models get larger and more capable, these three concepts are shaping how we build, scale, and evaluate modern AI systems. And if you’re a product manager (PM), understanding these foundations isn’t just helpful. It’s strategic.
Let’s break down what these terms mean, why they matter, and how they impact your business.
FLOPs (Floating Point Operations)
The computational effort behind AI. FLOPs measure the number of mathematical calculations a model must perform to train or to make predictions.
More FLOPs = more compute power required = higher infrastructure cost.
Think of FLOPs as the physical effort it takes to train the model.
Example: Training OpenAI’s GPT-3 required roughly 3.14 × 10²³ FLOPs (hundreds of petaflop/s-days of compute), costing millions of dollars in cloud resources. Newer models like GPT-4 and Gemini Ultra are even more demanding, with training compute costs estimated to reach into the tens or hundreds of millions of dollars.
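To make that number less abstract, here’s a rough back-of-envelope sketch in Python. It uses the commonly cited approximation that training compute is about 6 FLOPs per parameter per training token; the 300-billion-token figure comes from the GPT-3 paper, and the result lines up with the estimate above.

```python
# Rough estimate of training compute using the common rule of thumb:
# training FLOPs ≈ 6 × (number of parameters) × (number of training tokens).
params = 175e9        # GPT-3: 175 billion parameters
train_tokens = 300e9  # GPT-3 was trained on roughly 300 billion tokens

flops = 6 * params * train_tokens
print(f"Estimated training compute: {flops:.2e} FLOPs")  # ≈ 3.15e+23
```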
Why should PMs care?
If your product relies on AI, the FLOPs required for training or inference directly impact your budget and infrastructure choices. High FLOPs can mean slower deployment, higher costs, and more complex scaling challenges.
Tokens
The building blocks of AI understanding. A token is the smallest unit of data processed by an AI model. In natural language processing (NLP), tokens are usually smaller than words, often sub-word units, characters, or even parts of images or audio. Tokenization is the process of breaking down input data into these digestible chunks. Punctuation, capitalisation, and unusual formatting also affect how many tokens are created.
More tokens = more learning data.
More tokens during training = a broader knowledge base.
More tokens at inference = more room for nuanced understanding or longer responses.
Example: "The cat sat" = 3 tokens.
The sentence “ChatGPT is amazing!” may look like just three words, but the model breaks it down into five tokens: ["Chat", "G", "PT", " is", " amazing!"].
Notice how punctuation and formatting influence tokenization.
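If you want to see tokenization for yourself, here’s a minimal sketch using OpenAI’s open-source tiktoken library (assuming it is installed). Exact token counts and splits vary from tokenizer to tokenizer, so treat the output as illustrative rather than definitive.

```python
# Minimal tokenization demo with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-style byte-pair encoding

for text in ["The cat sat", "ChatGPT is amazing!"]:
    token_ids = enc.encode(text)                   # text -> list of token IDs
    pieces = [enc.decode([t]) for t in token_ids]  # decode each ID back to text
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```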
Why do tokens matter?
More tokens during training: the model learns from a broader, more diverse dataset.
More tokens at inference: the model can handle longer, more nuanced prompts or responses.
Context window: models like Gemini 1.5 can process up to 10 million tokens in a single prompt, enabling reasoning over long documents, videos, or code.
Real-world impact:
If your AI product needs to process large transaction histories, customer support chats, or complex documents, understanding tokenization helps you scope data requirements and anticipate performance limits.
Parameters
The model’s capacity for learning. Parameters are the internal weights the model learns and adjusts during training. They define how the model interprets data, stores knowledge, and generates outputs.
Example:
GPT-3: 175 billion parameters
GPT-4 & Gemini 1.5: estimated to have trillions of parameters, using a mixture-of-experts (MoE) architecture for efficiency and scale.
Why does parameter count matter?
Higher accuracy: more parameters allow a model to understand and generate more sophisticated outputs.
Trade-offs: larger models require more compute, are slower at inference, and cost more to run.
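For a concrete feel of what “parameters” means, here’s a minimal PyTorch sketch that counts the learnable weights in a tiny two-layer network (assuming torch is installed). Frontier models are built from the same kinds of weight matrices, just vastly more of them.

```python
# Count the learnable parameters (weights and biases) of a tiny network.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 2048),  # weight matrix 512×2048 plus 2048 biases
    nn.ReLU(),
    nn.Linear(2048, 512),  # weight matrix 2048×512 plus 512 biases
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # ≈ 2.1 million
```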
Inference
The model at work. Inference is the process of using a trained model to generate predictions or answers from new data. It’s what happens when you ask ChatGPT a question or when a fraud detection system scores a transaction in real time.
Why does inference matter for product teams?
Speed: inference latency directly affects user experience. Slow response times can lead to abandoned checkouts or frustrated customers.
Cost: every inference uses compute resources, and costs scale with usage.
Model choice: larger models may be more accurate but slower and more expensive at inference.
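A simple way to keep latency honest is to measure it around every model call. The sketch below assumes a hypothetical score_transaction function standing in for your real model or API endpoint; the timing pattern is what matters.

```python
# Measure per-request inference latency around a (hypothetical) model call.
import time

def score_transaction(payload: dict) -> float:
    # Placeholder for a real model call (local model or hosted API).
    return 0.02  # hypothetical fraud score

start = time.perf_counter()
score = score_transaction({"amount": 42.50, "merchant": "Coffee Shop"})
latency_ms = (time.perf_counter() - start) * 1000

print(f"fraud score = {score:.2f}, latency = {latency_ms:.2f} ms")
```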
An Analogy: teaching a student
Tokens are like the books a student reads.
Parameters are the neural connections they form.
FLOPs are the effort it takes to study and process the material.
The more books they read (tokens), the more they can learn.
But how deeply they understand depends on their brain’s capacity (parameters) and how hard they train (FLOPs).
Use Case:
Let’s say you’re a PM in the payments space building a fraud detection feature using AI. Here’s how these concepts apply:
Tokens: you feed the model structured data like transaction descriptions, merchant names, timestamps, and geolocation. These are your learning materials. More tokens = richer context for learning.
Parameters: if your model needs to detect subtle fraud behavior across millions of users, you might choose a larger model with more parameters. If speed and cost matter more, say for real-time approvals at checkout, you might go with a smaller, faster model.
FLOPs: a high-accuracy model might be expensive to train. You’ll need to weigh whether that investment makes sense for your business or whether you can use a pre-trained model with fine-tuning.
Inference: fraud scoring needs to happen in milliseconds. Even a high-performing model is useless if it slows down the transaction. Understanding inference latency helps you make the right trade-offs.
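To tie the trade-offs together, here’s a rough cost sketch. The request volume, tokens per request, and price per million tokens below are placeholder assumptions, not real pricing; swap in figures from your own provider and traffic data.

```python
# Back-of-envelope inference cost estimate (all numbers are placeholders).
requests_per_day = 500_000
tokens_per_request = 400          # prompt + response, assumed
price_per_million_tokens = 0.50   # USD, assumed

daily_cost = requests_per_day * tokens_per_request / 1e6 * price_per_million_tokens
print(f"≈ ${daily_cost:,.0f} per day, ≈ ${daily_cost * 30:,.0f} per month")
```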
🎯 What This Means for You
As a PM, understanding these tradeoffs helps you:
✅ Scope the right-sized model for your use case
✅ Make informed build vs. buy decisions
✅ Collaborate better with engineering and ML teams
✅ Balance cost, latency, and accuracy in production
Whether you’re working on fraud detection, personalization, customer support, or payments infrastructure, grasping how tokens, parameters, FLOPs, and inference work gives you a seat at the AI table.
TL;DR
Tokens = what the model learns from
Parameters = how much the model can understand
FLOPs = how much effort it takes to train
Inference = how fast and accurately it performs in the real world
Together, they define the cost, performance, and possibility of every AI-powered product you ship.
Here’s a related post: AI resources for technical & non-technical audiences.