
NVIDIA: How the AI Chip Leader Makes Money

NVIDIA’s data center revenue and gross margin, decoded: ~75% margins, 90% of revenue from one segment, and the customer concentration that makes it fragile.

By Colson · Founder & Tech Business Analyst

June 27, 2026 · 15 min read


NVIDIA makes money the way a toll road does. It sits at the one chokepoint in AI infrastructure that every hyperscaler has to pay to cross, and it charges margins that look more like software than hardware. The clearest expression of that is a single number: Data Center gross margin now runs at roughly 75%, on a segment that is about 90% of the whole company’s revenue.

Per NVIDIA’s FY2026 Form 10-K (fiscal year ended January 25, 2026), Data Center revenue was $193.7 billion of $215.9 billion total, with company-wide gross margin at 71.1% GAAP. By Q1 FY2027 (Form 8-K, quarter ended April 26, 2026), Data Center revenue reached $75.2 billion of $81.6 billion total, at roughly a 75% gross margin.

A 75% gross margin on hardware is not a product cycle. It is a bottleneck made visible. When demand for accelerators outruns supply and the buyers are racing each other, the seller sets the price and the buyers do not negotiate hard on the one input they cannot substitute.

This piece traces how that money is actually made, through the filings rather than the keynotes. Every company figure ties to a specific filing and fiscal period. The framing is analytical: how the business works and where it is fragile, not what to do about any stock.

Key takeaways

One segment is the whole story. Data Center was about 90% of FY2026 revenue ($193.7B of $215.9B) and stepped to roughly 92% in Q1 FY2027 ($75.2B of $81.6B), per NVIDIA Forms 10-K (FY2026) and 8-K (ended April 26, 2026).
The margin is software-like, not hardware-like. Company GAAP gross margin was 71.1% in FY2026 (Form 10-K), with Data Center near 75% in Q1 FY2027 (Form 8-K). Typical server hardware sits closer to 35% to 50%.
The toll is paid by a few payers. Reported 10-Q disclosures put NVIDIA’s top two direct customers at 39% of Q2 FY2026 revenue (CNBC, August 2025), and the hyperscalers behind that concentration are the same companies funding in-house silicon programs.
The durable risk is not demand, it is who controls inference. Custom chips from the largest cloud builders target inference workloads, where pricing is more elastic, while NVIDIA still owns large-scale training.
Margin is a clock, not a constant. A 75% margin funds the competition aimed straight at it. The bear case below weighs how fast that clock might run.

Why does NVIDIA control the gate to AI infrastructure?

NVIDIA controls the gate because every other layer of AI is built on accelerators it sells, the supply of those accelerators is bottlenecked, and the software and networking around them raise the cost of switching. That combination lets one vendor price the scarcest input in the entire stack.

Think of AI infrastructure as the same vertical stack laid out in the AI infrastructure market map: chips at the bottom, hyperscale capacity above them, model labs above that, applications on top. Margin pools at whichever layer is hardest to substitute. In 2026 that is the chip layer, and NVIDIA is the chip layer.

The toll-road analogy is precise on one point. A toll road does not make money by being the only way to travel. It makes money by being the cheapest, fastest, lowest-risk way to cross a gap everyone needs to cross right now. NVIDIA’s position is the same: a hyperscaler could route around it, but during a supply-constrained race, routing around the leader is slower and riskier than paying the toll.

That is why the spend keeps flowing toward it. The capital intensity behind the demand, and why the buyers accept it, is the subject of the AI capex arms race. The short version: when four companies are each spending tens of billions a quarter to win, the one input they cannot yet make themselves gets paid first.

The NVIDIA Margin Stack: where the toll actually comes from

Here is the central framework, which I will call the NVIDIA Margin Stack. It is the analytical asset this piece is built around, and it is meant to be citable on its own: a one-screen read of what NVIDIA actually sells, what margin character each layer carries, and what defends that margin.

NVIDIA is often described as a chip company. The filings describe something wider: a vendor that sells silicon, complete systems, networking, and a software platform as one bundle, where each layer makes the others harder to replace.

Layer	What NVIDIA charges for	Margin character	What defends it
Chips (GPUs/accelerators)	Compute per accelerator	Highest; priced into a bottleneck	Supply scarcity + performance lead
Systems (servers/racks)	Integrated compute platforms	High; bundles chips with engineering	Integration and time-to-deploy
Networking (interconnect)	High-speed fabric between chips	High; sold with the system	Scale-up performance hard to match
Software (CUDA + libraries)	Mostly bundled, near-zero marginal cost	Software-like	Ecosystem lock-in and developer base

The NVIDIA Margin Stack. The framework is original analysis; the segment figures referenced throughout are sourced to the NVIDIA filings cited inline.

Read the right two columns together. The chip layer carries the headline margin because supply is scarce, but it would not hold that margin alone. The software layer is what makes a customer’s accumulated code, models, and tooling expensive to move, so the buyer keeps paying the toll even when a cheaper road appears. The networking layer is what makes the largest training clusters work at all, which is exactly the workload where alternatives are weakest.

The lesson generalizes. A bundle where each layer raises the switching cost of the others is the same structural advantage that turns a single owned input into durable economics, the pattern at the center of why gross margin is destiny in SaaS. NVIDIA runs that pattern at the hardware layer, which is why its margin reads like software.

The numbers that matter: revenue, margin, and data center dominance

The cleanest read on how NVIDIA makes money is the segment table. One line dominates everything.

Metric	FY2026 (10-K)	Q1 FY2027 (8-K)	Source
Total revenue	$215.9B	$81.6B	NVIDIA Form 10-K (FY2026); Form 8-K (ended Apr 26, 2026)
Data Center revenue	$193.7B	$75.2B	NVIDIA Form 10-K (FY2026); Form 8-K (ended Apr 26, 2026)
Data Center share of total	~90%	~92%	Derived from filings above
Gross margin	71.1% (company-wide, GAAP)	~75% (Data Center segment)	NVIDIA Form 10-K (FY2026); Form 8-K (ended Apr 26, 2026)

Sources: NVIDIA Corporation Form 10-K, FY2026 (ended January 25, 2026); NVIDIA Corporation Form 8-K, Q1 FY2027 (ended April 26, 2026).
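
The derived share row can be reproduced from the two filing figures in the table. The following is a minimal Python sketch, not anything NVIDIA publishes: the function name `share` is illustrative, and the annualized run rate is a simple times-four extrapolation (an assumption that ignores seasonality and growth, not a figure from any filing).

```python
# Revenue figures in billions of USD, as cited in the table above from the
# Form 10-K (FY2026) and the Form 8-K (Q1 FY2027).
FY2026 = {"total": 215.9, "data_center": 193.7}
Q1_FY2027 = {"total": 81.6, "data_center": 75.2}


def share(period: dict) -> float:
    """Data Center revenue as a fraction of total revenue for one period."""
    return period["data_center"] / period["total"]


fy2026_share = share(FY2026)
q1_share = share(Q1_FY2027)

# One quarter of Data Center revenue relative to the whole of FY2026.
q1_vs_fy2026 = Q1_FY2027["data_center"] / FY2026["data_center"]

# Assumption: annualize a single quarter by multiplying by four.
# This is a rough scale check only, not a forecast.
q1_run_rate = Q1_FY2027["data_center"] * 4

print(f"FY2026 Data Center share: {fy2026_share:.1%}")
print(f"Q1 FY2027 Data Center share: {q1_share:.1%}")
print(f"Q1 FY2027 vs. full-year FY2026 Data Center: {q1_vs_fy2026:.1%}")
print(f"Annualized Q1 FY2027 run rate: ${q1_run_rate:.1f}B")
```

Keeping each period as a small dictionary makes it easy to append later quarters and rerun the same ratio without touching the logic.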

Three facts fall straight out of the table. First, Data Center is not a segment, it is the company. At about 90% of revenue, every other product line is a rounding adjustment to the income statement, useful for optionality and not much else.

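Second, the margin is climbing, not compressing. Company-wide GAAP gross margin was 71.1% in FY2026, and the Data Center segment ran near 75% by Q1 FY2027. The two figures are not strictly like-for-like (one is company-wide, the other is one segment), but with Data Center at roughly 92% of revenue, the segment figure dominates the company-wide one. Hardware margins are supposed to erode under competition and component cost. NVIDIA’s went the other way during the same window.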

Third, the growth is steep enough that the base period barely matters. Q1 FY2027 Data Center revenue of $75.2 billion in a single quarter is about 39% of what the segment did in all of FY2026 ($193.7 billion), an annualized pace of roughly $300 billion. That is the shape of a supply-constrained market clearing at the seller’s price.

The reason a 75% margin is the tell, and not the revenue line, is the same reason it is the tell across the whole stack. The layer that can refuse to discount is the layer that owns the economics. Everything above it is a price taker until the bottleneck clears.

Who actually pays the toll: customer concentration and the big-four problem

NVIDIA’s revenue is paid by a very small number of very large buyers. That is the strength and the fault line in one number.

Per reported 10-Q disclosures cited by CNBC in August 2025, NVIDIA’s top two direct customers accounted for 39% of revenue in Q2 FY2026, with one customer near 23% and another near 16%. Subsequent quarters showed a similar pattern, with a handful of large customers accounting for a majority of revenue. (Direct customers in NVIDIA’s filings can be intermediaries; the end demand traces to the hyperscalers building AI capacity.)

This concentration has three consequences worth stating plainly:

Pricing power sits with a few buyers. When a small set of customers drives most of the revenue, each one carries unusual weight in pricing and allocation conversations.
Every large customer is also a competitor. The same hyperscalers funding NVIDIA’s margin are funding their own custom silicon. A 75% margin is precisely the incentive to design your way out of it.
Policy and macro risk concentrate too. Export-control changes or a single buyer pausing its build would land on a narrow revenue base rather than a diversified one.

This is the same analytical move an operator runs on any filing with a lopsided revenue base, the discipline laid out in customer concentration risk in SaaS filings. The question is identical whether the customer list is four hyperscalers or four enterprise accounts: how much of the business walks if one of them walks?

The honest counterweight is that the demand is currently booked, not hoped for. The buyers are committing capital years ahead because the workload they cannot yet run on their own chips, large-scale model training, still needs NVIDIA. Concentration is a vulnerability that has not triggered, not an absence of demand.
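
The "how much walks if one of them walks" question reduces to simple arithmetic on the reported shares. Here is a minimal Python sketch under stated assumptions: the two shares come from the CNBC-reported 10-Q disclosure cited above, while the function `revenue_at_risk` and every cut fraction are hypothetical scenario inputs, not reported figures.

```python
# Shares of Q2 FY2026 revenue for the top two direct customers, as reported
# by CNBC (August 2025) citing 10-Q disclosures.
TOP_CUSTOMER_SHARES = [0.23, 0.16]


def revenue_at_risk(customer_share: float, cut_fraction: float) -> float:
    """Fraction of total revenue lost if one customer reduces its purchases.

    cut_fraction is a hypothetical scenario input: 1.0 means the customer
    leaves entirely, 0.25 means it moves a quarter of its spend elsewhere.
    """
    if not 0.0 <= customer_share <= 1.0:
        raise ValueError("customer_share must be between 0 and 1")
    if not 0.0 <= cut_fraction <= 1.0:
        raise ValueError("cut_fraction must be between 0 and 1")
    return customer_share * cut_fraction


top_two_combined = sum(TOP_CUSTOMER_SHARES)
print(f"Top two customers combined: {top_two_combined:.0%} of revenue")

# Hypothetical scenarios for the largest customer only.
for cut in (0.25, 0.50, 1.00):
    lost = revenue_at_risk(TOP_CUSTOMER_SHARES[0], cut)
    print(f"Largest customer cuts {cut:.0%} of spend: {lost:.1%} of revenue lost")
```

The sketch deliberately works in shares rather than dollars, because the same function then applies to any filing with a lopsided customer list.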

The durability question: will 75% margins survive custom silicon?

The most valuable thing about a 75% hardware margin is also the most dangerous: it funds everyone trying to take it. That is the bear case in one sentence, and it deserves the full version.

The largest cloud builders have every reason to design their own accelerators. They are the ones writing the largest checks, and the line item they most want to shrink is the one they do not control. Reported programs span Google’s TPU, AWS’s Trainium and Inferentia, Microsoft’s Maia, and Meta’s MTIA. These are not science projects; they are deployed against real internal workloads.

But the threat is uneven by workload. Custom silicon has made the most progress on inference, where the job is to serve a trained model cheaply at high volume. Inference is repetitive, more price-elastic, and easier to target with a purpose-built chip. The reported economics favor custom silicon here: meaningful per-query cost reductions on workloads a company runs constantly at scale.

Training is the harder nut. Training the largest frontier models depends on tying thousands of accelerators together with high-bandwidth interconnect, and on a mature software stack, both of which sit in the NVIDIA Margin Stack above. That is where NVIDIA’s lead is widest and where in-house silicon has the least traction. Even the companies pursuing open-weight strategies still train on large GPU clusters before they give the models away.

So the durability question resolves into a split screen. Inference margin is exposed and will face real custom-silicon pressure over time. Training margin is defended for now by physics and software, not just by supply scarcity. NVIDIA’s task is to keep the second from following the first.

NVIDIA’s response: control allocation, bundle the stack, widen the base

NVIDIA’s defense is not one move, it is a stack of them, and they line up with the layers in the Margin Stack.

Own training outright. Concentrate the performance lead where alternatives are weakest. As long as the largest training runs need NVIDIA’s interconnect and software, the highest-value workload stays captive.
Bundle networking and software. Selling the fabric and the CUDA platform with the chips raises the cost of switching any single layer. A customer evaluating a cheaper accelerator has to also replace the software and networking that made the cluster work.
Control allocation of scarce chips. When the newest generation is supply-constrained, the seller decides who gets it and when. Allocation is pricing power expressed as a queue.
Diversify the customer base. Favoring specialized cloud providers alongside the largest hyperscalers hedges against the exact concentration risk the filings disclose. A broader base of buyers dilutes any single customer’s pricing power.
Move up into systems. Selling integrated platforms rather than loose chips captures more of the deployment and makes NVIDIA the default architecture, not a component.

This is the same playbook a platform owner runs when it wants to convert a temporary lead into a structural one, the dynamic dissected in Microsoft Copilot and enterprise lock-in: bundle enough that leaving costs more than staying. It buys a cycle of durability. It does not repeal the long-run incentive every large customer has to build its own inference silicon.

The bear case

A model this confident about a 75% margin deserves the strongest argument against it. The bear case is that the current margin is a snapshot of a bottleneck, not a durable structure, and that betting on today’s scarcity is the classic way to be wrong about a chip company.

Start with the margin itself. A 75% gross margin on hardware is not a fortress so much as a bounty. It funds every competing design house, every customer’s in-house silicon team, and every foundry expansion aimed at the bottleneck. The buyers concentrated enough to pay 39% of revenue between two of them (CNBC, August 2025, citing 10-Q disclosures) are exactly the ones with the capital and the motive to route around the toll. If accelerator supply normalizes even partially, the margin reverts toward ordinary hardware economics.

Now the concentration. The same demand strength that makes the business look unassailable is what makes it brittle. A diversified hardware vendor can lose a customer and barely notice. A vendor where a handful of buyers drive the majority of revenue cannot. One large customer shifting inference inward, or a policy change removing a buyer, lands on a narrow base.

And the workload split cuts both ways. Today inference is the smaller, more contested half and training is the defended half. But inference volume grows as deployed AI scales, and the cheaper a model is to serve, the more it gets served. If the high-volume, price-elastic half of the market is the half custom silicon wins, the long-run revenue mix could tilt toward exactly the workloads NVIDIA defends least well.

The honest weighing: each of these is a statement about timing, not about the mechanism. The Margin Stack describes 2026, where the bottleneck is real and visible in the filings. The bear case is probably right that the margin will compress as supply normalizes and inference moves to custom silicon. What it does not overturn is the rule: margin pools at whichever layer is scarcest, and right now that is still the chip layer. The bear case is a forecast that the scarce layer changes. The Margin Stack is a method for finding the scarce layer in any year. Both can hold.

Where NVIDIA is vulnerable: margin-compression scenarios

A credible read names where it breaks. The first crack is the inference shift, and the second is concentration.

The inference crack is structural. As more AI moves from training to serving, the revenue mix tilts toward the workload where custom silicon is most credible. NVIDIA defends training with interconnect and software; it defends inference mainly with supply scarcity and ecosystem inertia, which are softer moats. If hyperscalers move a meaningful share of their inference load to in-house chips over the next several years, Data Center gross margin faces real downward pressure, because the displaced demand is the more price-elastic kind.

The concentration crack is acute rather than gradual. A single large customer is reportedly capable of shifting a material share of revenue if it favors its own silicon for a flagship workload. The filings show how few buyers carry the business. That is a fragility a more diversified vendor does not have, and it is the reason NVIDIA’s own playbook puts customer diversification on the list.

The distribution dynamic underneath all of this is worth flagging. The hyperscalers most able to build their way off NVIDIA are the ones that also own the distribution surfaces AI runs on, the advantage examined in Google’s AI strategy as a distribution war. A buyer that owns both the demand and a credible chip program is the most dangerous kind of customer, because it can move workloads inward without losing the end user.

Methodology: how to read the margin-durability link

Inputs: NVIDIA FY2026 revenue $215.9B and Data Center $193.7B at 71.1% GAAP gross margin (Form 10-K, FY2026); Q1 FY2027 total $81.6B and Data Center $75.2B at roughly 75% segment margin (Form 8-K, ended April 26, 2026); top-two customer concentration at 39% of Q2 FY2026 revenue (CNBC, August 2025, citing 10-Q disclosures).
Assumption: the current margin reflects a supply bottleneck plus software and networking lock-in, not a permanent cost structure, so it is sensitive to both supply normalization and workload mix.
Sensitivity: if a meaningful share of inference demand shifts to custom silicon while training stays captive, the blended Data Center margin compresses toward normal hardware economics over a multi-year horizon, with the speed set by how fast inference outgrows training.
What this misses: NVIDIA’s filings disclose direct-customer concentration, which can route through intermediaries, and they do not break Data Center margin into training versus inference. The exact per-workload economics are therefore not attributable from public filings alone; the reported custom-silicon figures are analyst estimates, not filings.
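
The sensitivity line can be made concrete with a toy model. The following is a minimal Python sketch, and everything in it beyond the roughly 75% starting margin is an assumption: the filings do not split Data Center into training and inference, so `inference_share` and `price_cut` are hypothetical scenario inputs. The sketch models one specific channel, a price concession on the inference portion of revenue with unit cost and volume held fixed. A pure loss of volume at an unchanged margin would shrink revenue without changing the margin percentage.

```python
def blended_margin(base_margin: float, inference_share: float, price_cut: float) -> float:
    """Blended gross margin after discounting the inference part of revenue.

    Starting revenue is normalized to 1.0. Unit cost and volume are held
    fixed, so only the inference price changes. inference_share and
    price_cut are hypothetical scenario inputs, not filing data.
    """
    for name, value in (("base_margin", base_margin),
                        ("inference_share", inference_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 <= price_cut < 1.0:
        raise ValueError("price_cut must be in [0, 1)")

    cost = 1.0 - base_margin                     # cost of revenue, unchanged
    revenue = 1.0 - inference_share * price_cut  # revenue after the discount
    return 1.0 - cost / revenue


BASE = 0.75             # approximate Data Center margin cited above
INFERENCE_SHARE = 0.40  # hypothetical: not disclosed in any filing

for cut in (0.0, 0.15, 0.30):  # hypothetical inference price concessions
    margin = blended_margin(BASE, INFERENCE_SHARE, cut)
    print(f"Inference price cut {cut:.0%}: blended margin {margin:.1%}")
```

Under these assumptions the blended margin falls as the discount deepens, and the fall steepens as the inference share grows, which is the multi-year compression path the sensitivity line describes.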

What operators should take from this

The map does not tell you which chip wins. It tells you that the layer keeping the money is the one nobody else can reproduce yet, and that the moment they can, the money moves. Here is how to act on that if you build software rather than silicon.

Find your bottleneck layer and own it. NVIDIA’s margin is not a reward for making good chips. It is a reward for being the scarce input in a stack everyone else rents. In your own business, locate the layer that is hardest to substitute (proprietary data, a workflow lock-in, a distribution surface) and build your defensibility there, not in the commodity layer.
Bundle to raise switching cost. The reason NVIDIA’s 75% holds is the software and networking around the chip, not the chip alone. Pair your differentiated layer with the adjacent ones that make leaving expensive. A single owned input is fragile; a bundle where each layer defends the others is durable.
Watch concentration like a credit risk. If a few customers drive most of your revenue, model the case where the biggest one builds or buys its way out. NVIDIA’s own response, diversifying the buyer base, is the move. Do it before the concentration is forced on you.
Treat a fat margin as a clock. Any margin far above the cost of substitutes funds the substitutes. If you are pricing a multi-year plan against today’s input scarcity, model the year the bottleneck clears and your pricing power fades.
Separate timing risk from structure risk. When you read a thesis like this, sort the claims into mechanism (durable: scarcity owns the margin) and snapshot (perishable: the chip layer is scarce in 2026). Build on the mechanism, hedge the snapshot.

That is the operator-scale version of the toll-road question. As a small, illustrative analog from running an AI feature inside a product like PDF9to5: the difference between routing every request to a premium rented model and caching plus routing the easy majority of calls to a cheaper path is the difference between a thin margin and a healthy one, on identical revenue. The buyer pays the toll until it builds the bypass. The same logic governs a $193.7 billion segment and a single AI feature.
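
The routing analog is easy to put in numbers. Here is a minimal Python sketch in which every figure is hypothetical: the revenue per request, both per-request costs, and the 70% routing fraction are illustrative placeholders, not measurements from any product.

```python
def feature_gross_margin(revenue_per_request: float,
                         premium_cost: float,
                         cheap_cost: float,
                         cheap_fraction: float) -> float:
    """Gross margin of an AI feature when some requests take a cheaper path.

    cheap_fraction is the share of requests served by the cache or the
    cheaper model; the rest go to the premium rented model.
    """
    if revenue_per_request <= 0:
        raise ValueError("revenue_per_request must be positive")
    if not 0.0 <= cheap_fraction <= 1.0:
        raise ValueError("cheap_fraction must be between 0 and 1")
    blended_cost = (1.0 - cheap_fraction) * premium_cost + cheap_fraction * cheap_cost
    return (revenue_per_request - blended_cost) / revenue_per_request


# Hypothetical unit economics, in arbitrary currency units per request.
REVENUE = 1.00
PREMIUM_COST = 0.80
CHEAP_COST = 0.10

all_premium = feature_gross_margin(REVENUE, PREMIUM_COST, CHEAP_COST, 0.0)
mostly_cheap = feature_gross_margin(REVENUE, PREMIUM_COST, CHEAP_COST, 0.7)

print(f"Every request on the premium model: {all_premium:.0%} margin")
print(f"70% of requests on the cheaper path: {mostly_cheap:.0%} margin")
```

Revenue is identical in both cases; only the share of requests that pay the premium toll changes.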

Analysis, not investment advice. Figures are drawn from public SEC filings cited inline by fiscal period (NVIDIA Form 10-K, FY2026; NVIDIA Forms 8-K for Q1 FY2027 and Q4 FY2026; NVIDIA Forms 10-Q for Q2 and Q3 FY2026), with customer-concentration figures as reported by CNBC citing those 10-Q disclosures. Custom-silicon figures are reported analyst estimates, labeled as such, not filings. Frameworks here, including the NVIDIA Margin Stack, are for understanding business structure and tradeoffs, not for making buy or sell decisions.



Sources

NVIDIA Corporation Form 10-K, FY2026 (fiscal year ended January 25, 2026)
NVIDIA Corporation Form 8-K, Q1 FY2027 (quarter ended April 26, 2026)
NVIDIA Corporation Form 8-K, Q4 FY2026 (quarter ended January 25, 2026)
NVIDIA Corporation Form 10-Q, Q2 FY2026 (customer concentration disclosure)
NVIDIA Corporation Form 10-Q, Q3 FY2026 (customer concentration disclosure)
CNBC, August 2025: Nvidia's top two customers made up 39% of Q2 revenue
Industry analyst reports on hyperscaler custom-silicon programs (reported estimates)


Frequently asked questions

What percentage of NVIDIA's revenue comes from the Data Center segment?

Data Center represented roughly 90% of NVIDIA's FY2026 revenue: $193.7B of $215.9B total (NVIDIA Form 10-K, FY2026), and it stepped up to about 92% in Q1 FY2027 at $75.2B of $81.6B (NVIDIA Form 8-K, ended April 26, 2026). The segment is essentially the entire business now. The other segments matter for optionality, not for the income statement.

How does NVIDIA's data center gross margin compare to normal hardware?

NVIDIA's Data Center gross margin was about 75% in Q1 FY2027 (NVIDIA Form 8-K, ended April 26, 2026), against company-wide 71.1% GAAP for FY2026 (NVIDIA Form 10-K, FY2026). Typical server and semiconductor hardware runs closer to 35% to 50%. The gap is the fingerprint of a supply bottleneck plus software lock-in, not a normal hardware cost curve.

Is NVIDIA's customer concentration a material risk?

Yes. Reported 10-Q disclosures put NVIDIA's top two direct customers at 39% of Q2 FY2026 revenue (CNBC, August 2025), and a handful of large buyers account for a majority. That concentrates pricing power in a few hyperscalers, each of which is funding its own custom silicon. A single large customer shifting workloads inward would hit revenue and margin directly.

Which custom silicon chips threaten NVIDIA's margin first?

The most operational threats are inference-focused custom chips from the largest cloud builders, deployed on their own internal workloads first. Reported programs include Google's TPU, AWS's Trainium and Inferentia, Microsoft's Maia, and Meta's MTIA. These target inference, where the work is more price-elastic. NVIDIA still dominates large-scale training, where its interconnect and software remain hard to match.

What is NVIDIA's strategic response to in-house silicon from its customers?

NVIDIA's playbook is to own training (where alternatives are weakest), bundle networking and the CUDA software stack to raise switching costs, control allocation of scarce newest-generation chips, and diversify its customer base toward specialized cloud providers. That widens the moat for a cycle. It does not eliminate the long-run pressure on inference margins as custom silicon scales.

What metrics should an analyst track to gauge NVIDIA's durability?

Track Data Center gross margin (watch for sustained drift below ~72%), the customer-concentration percentage in each 10-K and 10-Q, Data Center as a share of total revenue, forward backlog and supply-commitment commentary, and competitor ASIC ramp announcements. The first two are the cleanest early reads on whether pricing power is holding or eroding.
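
The first metric can be turned into a simple check. The following is a minimal Python sketch under stated assumptions: the ~72% threshold comes from the answer above, but defining "sustained" as two consecutive quarters is my assumption, and the margin series is hypothetical illustration data, not reported results.

```python
from typing import Sequence


def sustained_below(margins: Sequence[float],
                    threshold: float = 0.72,
                    quarters: int = 2) -> bool:
    """True if the most recent `quarters` readings are all below threshold.

    margins is ordered oldest to newest. Treating two consecutive quarters
    as "sustained" is an assumption; adjust `quarters` to taste.
    """
    if quarters <= 0:
        raise ValueError("quarters must be positive")
    if len(margins) < quarters:
        return False  # not enough history to call anything sustained
    return all(m < threshold for m in margins[-quarters:])


# Hypothetical quarterly gross margin series, oldest first.
holding = [0.75, 0.74, 0.71, 0.73]   # one dip, then a recovery
eroding = [0.75, 0.73, 0.715, 0.71]  # two consecutive readings below 72%

print("holding series flags drift:", sustained_below(holding))
print("eroding series flags drift:", sustained_below(eroding))
```

Requiring consecutive readings keeps a single soft quarter from triggering the flag, which matches the distinction between a dip and a drift.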