Interview Handbook
Optimization Deep Dive — Staff-Level Performance Handbook
Every frontend performance technique organized by use case — from measurement and bundle size to rendering, React internals, network, and memory. Each with its trade-offs and when NOT to use it.
Measurement & Profiling
Before you touch a single line of code, you must identify the real bottleneck — not the one you think exists. At scale, optimizing the wrong thing wastes engineering time and can degrade user experience. This section teaches you to measure precisely, distinguish lab from field data, and apply a disciplined measure → optimize → verify loop.
Core Web Vitals: The North Star Metrics
Google's Core Web Vitals define user-centric thresholds: LCP (<2.5s) for loading, INP (<200ms) for interactivity, CLS (<0.1) for visual stability. Also track TTFB (<800ms) and FCP (<1.8s). These are the metrics that impact SEO and real user experience. A staff engineer knows that meeting thresholds in the lab does not guarantee field performance.
- LCP (Largest Contentful Paint) measures perceived load speed. Optimize by preloading critical resources, reducing render-blocking CSS/JS, and using a CDN.
- INP (Interaction to Next Paint) measures responsiveness. Optimize by breaking long tasks into chunks shorter than 50ms, moving heavy computation to web workers, and debouncing input handlers.
- CLS (Cumulative Layout Shift) measures visual stability. Optimize by setting explicit dimensions on images/ads, using font-display: optional or a metric-matched fallback font (font-display: swap on its own can shift layout when the web font arrives), and avoiding dynamic content injection above the fold.
- TTFB (Time to First Byte) measures server response. Optimize with edge caching, server-side rendering, and reducing backend latency.
- FCP (First Contentful Paint) measures when the first content appears. Optimize by inlining critical CSS and deferring non-critical scripts.
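To make the thresholds concrete, here is a minimal sketch of a rating helper. The "good" boundaries are the ones quoted above; the "poor" boundaries are taken from Google's published guidance rather than from this handbook, and the names `THRESHOLDS` and `rateMetric` are illustrative only.

```javascript
// Units are milliseconds, except CLS, which is unitless.
// "good" values match the thresholds quoted in this section;
// "poor" values are an assumption based on Google's public guidance.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
  TTFB: { good: 800, poor: 1800 },
  FCP: { good: 1800, poor: 3000 },
};

// Classify a single metric value as good, needs-improvement, or poor.
function rateMetric(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error(`Unknown metric: ${name}`);
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}

console.log(rateMetric('LCP', 3200));
```

In practice the value passed in would be a field percentile (typically p75), not a single lab run.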
Lab Data vs Field Data: Why They Disagree
Lab data (Lighthouse, WebPageTest) runs in a controlled environment with a consistent device and network. Field data (RUM, Chrome UX Report) captures real user conditions — slow networks, low-end devices, ad blockers. They disagree because lab data measures potential; field data measures reality. A staff engineer uses both: lab for debugging and regression detection, field for prioritization and impact assessment.
| Aspect | Lab Data | Field Data |
|---|---|---|
| Environment | Controlled (e.g., Moto G4, Slow 3G) | Real user devices, networks, locations |
| Metrics | Synthetic scores (Lighthouse performance) | Real user percentiles (p75, p95) |
| Use Case | Debugging, CI gates, regression detection | Prioritization, user impact, long-term trends |
| Limitation | May not reflect real-world variability | Noisy, requires sufficient sample size |
A junior engineer says 'I think the bottleneck is the hero image.' A staff engineer says 'Let me check the field data: if p75 LCP is above 2.5s, meaning more than a quarter of page loads miss the threshold, then we profile to see whether the time goes to network or rendering.' Never guess; always measure with real data.
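Since the p75 figure drives that decision, a small sketch of how it can be computed from raw RUM samples may help. This uses the nearest-rank method; the sample values and the `percentile` helper are hypothetical, and a real RUM pipeline may use a different estimator.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('No samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical LCP samples in milliseconds.
const lcpSamples = [1200, 1900, 2100, 2400, 2600, 3100, 3400, 5200];
const p75 = percentile(lcpSamples, 75);
console.log(`p75 LCP: ${p75}ms`, p75 > 2500 ? '(over budget)' : '(within budget)');
```

Note that the median of this sample set is under 2.5s while the p75 is not, which is why the percentile matters more than the average.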
Chrome DevTools Performance Panel: Flame Charts & Long Tasks
The Performance panel records a timeline of main thread activity. Look for long tasks (>50ms) that block the main thread, since these cause INP issues. Flame charts show function call stacks; use them to identify expensive scripts, layout thrashing, or excessive garbage collection. In the Bottom-Up tab, sort by self time to find the functions where the time is actually spent.
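Once a trace points at a long task, one common remedy is to split the work and yield back to the event loop between slices. The sketch below assumes a synchronous per-item handler; `yieldToMain`, `processInChunks`, and the 40ms default budget are illustrative choices, not a standard API.

```javascript
// Resolves on a later macrotask so pending input and rendering can run first.
function yieldToMain() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Run `handle` over `items`, yielding whenever the current slice has used
// at least `budgetMs` of time. Returns how many times it yielded.
async function processInChunks(items, handle, budgetMs = 40) {
  let sliceStart = performance.now();
  let yields = 0;
  for (const item of items) {
    handle(item);
    if (performance.now() - sliceStart >= budgetMs) {
      await yieldToMain();
      yields += 1;
      sliceStart = performance.now();
    }
  }
  return yields;
}
```

A slice can still overrun the budget if a single item is slow, so the budget is best kept comfortably below the 50ms long-task threshold.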
// Simulate a long task that blocks the main thread
function blockMainThread(ms) {
  const start = performance.now();
  while (performance.now() - start < ms) {
    // Busy wait: never do this in production
  }
}
// Use requestIdleCallback to defer non-critical work
requestIdleCallback(() => {
  // Process analytics or logging here
}, { timeout: 2000 });
// Measure long tasks programmatically with PerformanceObserver
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.duration > 50) {
      console.warn(`Long task detected: ${entry.duration}ms`, entry);
    }
  }
});
observer.observe({ type: 'longtask', buffered: true });
Performance Budgets & CI Gates
Performance budgets enforce thresholds in CI. Common budgets: bundle size (<300kB gzipped for JS), Lighthouse score (>90), LCP (<2.5s in lab). Use tools like bundlesize or Lighthouse CI to fail builds that exceed budgets. A staff engineer sets budgets based on field data percentiles, not arbitrary numbers.
// Example: bundlesize configuration (bundlesize.json)
{
  "files": [
    {
      "path": "./dist/main.js",
      "maxSize": "200 kB"
    },
    {
      "path": "./dist/vendor.js",
      "maxSize": "100 kB"
    }
  ]
}
// Example: Lighthouse CI configuration (lighthouserc.js)
module.exports = {
  ci: {
    collect: {
      url: ['https://example.com'],
      numberOfRuns: 3
    },
    assert: {
      assertions: {
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        // INP needs real user interaction, so lab runs assert Total Blocking Time as a proxy
        'total-blocking-time': ['error', { maxNumericValue: 200 }]
      }
    },
    upload: {
      target: 'temporary-public-storage'
    }
  }
};
The Measure → Optimize → Verify Loop
This is the core discipline: measure the current state (lab + field), optimize one bottleneck at a time, then verify the improvement with the same metrics. Never skip verification — an optimization that doesn't move the needle is wasted effort. Use A/B testing or feature flags to isolate changes.
Optimizing before measuring is the #1 performance mistake. A junior engineer might preload every image or inline all CSS, bloating the page. A staff engineer measures first, finds the real bottleneck (e.g., a 300ms server response), and optimizes that. Premature optimization adds complexity without guaranteed benefit.
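The verify step can be reduced to a simple comparison of the same percentile before and after a change. The following is a sketch under stated assumptions: the sample arrays stand in for field data collected behind a feature flag, and `p75Of`, `verifyImprovement`, and the minimum-gain threshold are hypothetical names and choices.

```javascript
// Nearest-rank p75 of a list of metric samples (milliseconds).
function p75Of(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

// Compare control and treatment samples; keep the change only if the
// p75 improved by at least `minGainMs`.
function verifyImprovement(before, after, minGainMs) {
  const baseline = p75Of(before);
  const candidate = p75Of(after);
  const gain = baseline - candidate;
  return { baseline, candidate, gain, keep: gain >= minGainMs };
}

const result = verifyImprovement(
  [2000, 2600, 3000, 3400], // hypothetical LCP samples, flag off
  [1800, 2200, 2500, 3300], // hypothetical LCP samples, flag on
  100
);
console.log(result);
```

With real traffic the sample sizes need to be large enough that the difference is not noise; this sketch deliberately ignores statistical significance.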
User-Centric Metrics vs Vanity Metrics
Vanity metrics like DOMContentLoaded or total page weight don't correlate with user experience. User-centric metrics like LCP, INP, and CLS measure what users actually perceive. A staff engineer ignores vanity metrics and focuses on Core Web Vitals and custom metrics that reflect user tasks (e.g., time to first interaction).
Your team's Lighthouse score is 95, but field data shows LCP is 3.2s at p75. What should you do?
Bundle Size & Code Splitting
When your JavaScript bundle exceeds ~200kB (uncompressed), you start to see real-world regressions in LCP (download time) and INP (parse and evaluation blocking the main thread). At scale, a single monolithic bundle can cost you 1–2 seconds of LCP and push your Total Blocking Time over 300ms. This section covers the techniques and trade-offs to shrink and split your bundle without breaking your architecture.
Bundle Analysis: Know What You Ship
Before optimizing, measure. Use webpack-bundle-analyzer (or source-map-explorer for production source maps) to visualize module sizes. A common pattern: a single utility library (e.g., lodash) imported as import _ from 'lodash' pulls in the entire library (roughly 70kB minified) even if you call one function. The analyzer reveals these bloat sources immediately.
// webpack.config.js
const BundleAnalyzerPlugin = require('webpack-bundle-analyzer').BundleAnalyzerPlugin;
module.exports = {
  plugins: [
    new BundleAnalyzerPlugin({
      analyzerMode: 'static',
      reportFilename: 'bundle-report.html',
      openAnalyzer: false
    })
  ]
};
Tree Shaking and the sideEffects Flag
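Tree shaking relies on ES module static analysis. Webpack and Rollup can drop unused exports only when a package ships ES modules and you import the specific bindings you use; importing a default export that aggregates the whole library (as lodash does) keeps everything. A package can also declare "sideEffects": false in its package.json, which tells the bundler it may skip any module whose exports are unused. Without this flag, bundlers must assume that merely importing a module may have side effects, so they keep it. Check whether your dependencies set this flag; if not, consider alternatives like lodash-es over lodash.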
// Good: tree-shakeable
import { debounce } from 'lodash-es';
// Bad: pulls in entire lodash
import _ from 'lodash';
Tree shaking is ineffective for CommonJS modules (require()) or packages without sideEffects: false. Also, if your code uses a dynamic import with a computed path (e.g., import(`./${name}`)), the bundler cannot tell which module is needed and must include every file that could match. Don't waste time micro-optimizing tree shaking on a 10kB utility; focus on the 200kB+ libraries first.
Dynamic import() and Route-Based Splitting
Use import() to split your bundle at route boundaries. This is the highest-impact split because each route's code is only loaded when the user navigates to it. For a typical SPA, this can reduce initial bundle size by 40–60%. The trade-off: you add a network round-trip per route transition, which can hurt INP if the chunk is large (>100kB). Prefetch critical routes using <link rel="prefetch"> or the webpackPrefetch magic comment.
// Route-based splitting with React Router
import React, { Suspense } from 'react';
import { Routes, Route } from 'react-router-dom';
// Spinner: any lightweight placeholder component defined elsewhere
const Dashboard = React.lazy(() => import('./pages/Dashboard'));
const Settings = React.lazy(() => import('./pages/Settings'));
function App() {
  return (
    <Suspense fallback={<Spinner />}>
      <Routes>
        <Route path="/dashboard" element={<Dashboard />} />
        <Route path="/settings" element={<Settings />} />
      </Routes>
    </Suspense>
  );
}
React.lazy + Suspense for Component Splitting
For components that are not route-level (e.g., a heavy chart or a rich text editor), use React.lazy with Suspense to split them out. This is component-based splitting. The cost: you introduce a loading state and a potential layout shift if the fallback is not sized correctly. Only split components that are >50kB or that are rarely used (e.g., modals, admin panels).
// Component-based splitting for a heavy chart
import React, { Suspense, useState } from 'react';
const HeavyChart = React.lazy(() => import('./HeavyChart'));
function ReportPage() {
  const [showChart, setShowChart] = useState(false);
  return (
    <div>
      <button onClick={() => setShowChart(true)}>Show Chart</button>
      {showChart && (
        <Suspense fallback={<div style={{ height: 400 }}>Loading chart...</div>}>
          <HeavyChart />
        </Suspense>
      )}
    </div>
  );
}
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Route-based splitting | Large SPAs with many pages | Adds network round-trip per route | Routes are <10kB each |
| Component-based splitting | Heavy, rarely-used components | Loading state + potential CLS | Component is <50kB or used on every page |
| Vendor chunking | Stable third-party deps | Cache invalidation complexity | Vendor is <100kB or changes as often as app code |
| Shared chunks | Common modules across routes | Increased initial bundle size | Shared code is <5kB per route |
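The round-trip cost listed for route-based splitting can often be hidden by starting the import before the user commits to the navigation, for example on hover. Below is a minimal sketch of such a prefetcher; `createRoutePrefetcher` and the shape of `loaders` are assumptions for illustration, not part of React or webpack.

```javascript
// `loaders` maps a route name to a function that returns a dynamic import(),
// e.g. { settings: () => import('./pages/Settings') } (path hypothetical).
function createRoutePrefetcher(loaders) {
  const inFlight = new Map();
  return function prefetch(route) {
    if (!loaders[route]) {
      return Promise.reject(new Error(`Unknown route: ${route}`));
    }
    // Start each chunk request at most once and reuse the same promise.
    if (!inFlight.has(route)) {
      inFlight.set(route, loaders[route]());
    }
    return inFlight.get(route);
  };
}
```

A link could call `prefetch('settings')` from an onMouseEnter handler; when the route later renders through React.lazy with the same import path, the bundler's module cache means the chunk is not requested again.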
Barrel-File Pitfalls (index.ts Re-exports)
Barrel files (index.ts that re-export everything) are a common pattern that breaks tree shaking. When you import from a barrel, the bundler often includes all re-exported modules because it cannot statically determine which are used. This can inflate your bundle by 20–50%. Prefer direct imports from the module file, or use export * only when the barrel is small and all exports are used.
// Bad: barrel file that breaks tree shaking
// utils/index.ts
export * from './format';
export * from './validate';
export * from './api';
// Good: direct import
import { formatDate } from './utils/format';
A junior engineer will say 'let's split everything into lazy chunks.' A staff engineer says: 'Let's run a bundle analysis, identify the top 3 bloat sources, and then decide if splitting or tree shaking gives the best ROI. For example, if a 200kB chart library is only used on one route, route-based splitting saves 200kB. But if it's used on 5 routes, a shared chunk might be better. Always measure the actual impact on LCP and INP before and after.'
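The "top 3 bloat sources" step is easy to script once module sizes are available. The sketch below assumes a simplified input of `{ name, bytes }` records; real analyzer output (for example a webpack stats file) has a richer shape, and the module names here are hypothetical.

```javascript
// Return the n largest modules with their share of the total size.
function topBloat(modules, n = 3) {
  const total = modules.reduce((sum, m) => sum + m.bytes, 0);
  return [...modules]
    .sort((a, b) => b.bytes - a.bytes)
    .slice(0, n)
    .map((m) => ({
      name: m.name,
      bytes: m.bytes,
      sharePercent: Math.round((m.bytes / total) * 100),
    }));
}

const report = topBloat([
  { name: 'app-code', bytes: 50000 },
  { name: 'chart-lib', bytes: 200000 },
  { name: 'date-lib', bytes: 30000 },
  { name: 'lodash', bytes: 70000 },
]);
console.log(report);
```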
Vendor Chunking and Shared Chunks
Vendor chunking extracts third-party libraries (e.g., React, Lodash) into a separate chunk. This leverages browser caching: if the vendor chunk changes infrequently, users don't re-download it on every deploy. The trade-off: you add an extra HTTP request. For shared chunks (common modules used by multiple routes), the trade-off is that you increase the initial bundle size to avoid duplication. Use splitChunks in webpack with minSize: 20000 (20kB) to avoid creating too many tiny chunks.
// webpack.config.js
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          chunks: 'all',
          priority: 10
        },
        common: {
          minChunks: 2,
          minSize: 20000,
          name: 'common',
          priority: 5
        }
      }
    }
  }
};
You are optimizing a large SPA with 50 routes. Your bundle analysis shows that a 300kB chart library is used on 3 routes. What is the best approach?
Build Tooling & Chunking
Build tooling and chunking decisions directly control how fast your app loads, how efficiently it caches, and how quickly developers iterate. At scale, a poorly configured bundler can inflate bundle size by 40%+, waste cache with every deploy, and make micro-frontend integration a nightmare. This section covers the trade-offs between modern bundlers, chunk splitting strategies, and caching techniques that separate staff-level engineers from those who just know the APIs.
Bundler Speed vs Output Quality: webpack, Vite, esbuild, Rollup
Each bundler optimizes for different constraints. webpack is the most configurable but slowest for cold builds (10-60s for medium apps). Vite serves native ES modules in dev, using esbuild for dependency pre-bundling and transpilation (sub-second HMR), and uses Rollup for production (better tree-shaking). esbuild is 10-100x faster than webpack but lacks plugin maturity and produces slightly larger bundles (5-10% more bytes). Rollup excels at library bundling with superior tree-shaking and smaller output, but its dev server is less polished. For production, Vite (Rollup) often gives the best balance: <2s cold start, <50ms HMR, and output within 5% of Rollup's optimal size.
| Bundler | Dev Speed | Production Output Size | Plugin Ecosystem | Best For |
|---|---|---|---|---|
| webpack 5 | Slow (10-60s cold) | Good (baseline) | Excellent | Complex SPAs, legacy migration |
| Vite | Fast (<2s cold, <50ms HMR) | Excellent (Rollup) | Good (growing) | New SPAs, SSR, modern stacks |
| esbuild | Very fast (<1s cold) | Good (5-10% larger) | Limited | Dev builds, simple apps, transpilation |
| Rollup | Moderate (no HMR) | Excellent (smallest) | Good | Libraries, component packages |
A junior says 'Vite is faster' and switches. A staff engineer says: 'I measured our cold build at 45s and HMR at 200ms. The bottleneck was our custom webpack plugin, not webpack itself. We profiled, found the plugin was doing O(n²) lookups, fixed it, and got cold builds to 12s without changing bundlers. Only then did we evaluate Vite for the next project.' Always profile your actual build pipeline before blaming the tool.
Chunk Splitting Strategy: splitChunks vs manualChunks
Chunk splitting balances parallelism (more small files) against per-request overhead, which HTTP/2 multiplexing reduces but does not eliminate. The goal: keep each chunk under 100-200kB for fast parsing, but avoid creating more than 20-30 chunks for a typical app. splitChunks (webpack) or manualChunks (Vite/Rollup) let you group dependencies by stability: vendor (React, lodash), shared UI (antd, material), and async routes. A common mistake is splitting too aggressively: 50+ chunks at 50ms each add up to 2.5s if they load one after another in a dependency waterfall, and even in parallel each request carries its own overhead. Instead, aim for 3-6 core chunks (app, vendor, shared) plus route-level async chunks.
// webpack.config.js - production-grade splitChunks
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/](react|react-dom|redux)[\\/]/,
          name: 'vendor-core',
          chunks: 'all',
          priority: 20
        },
        shared: {
          test: /[\\/]node_modules[\\/](antd|@ant-design)[\\/]/,
          name: 'vendor-ui',
          chunks: 'all',
          priority: 10
        },
        common: {
          minChunks: 2,
          minSize: 30000,
          name: 'common',
          priority: 5
        }
      }
    }
  }
};
// vite.config.js - manualChunks for predictable caching
import { defineConfig } from 'vite';
export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          'vendor-react': ['react', 'react-dom'],
          'vendor-utils': ['lodash', 'axios'],
          'vendor-ui': ['antd', '@ant-design/icons']
        }
      }
    }
  }
});
Long-Term Caching with Content-Hash Filenames
Content-hash filenames (e.g., main.a1b2c3.js) ensure that unchanged chunks keep the same URL, so browsers reuse cached versions. This is critical: with a single build-wide hash, one deploy renames every file and invalidates the entire cache. With content hashes, only the changed chunk gets a new URL. The key metric: aim for a 90%+ cache hit rate on vendor chunks. A common pitfall is choosing the wrong hash placeholder. Vite hashes by content by default; webpack needs [contenthash] in output filenames. Use [contenthash:8] for long-term caching, not the build-wide [fullhash] (formerly [hash]), which changes on every build.
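To see why a content hash keeps URLs stable, here is a small sketch that derives a filename from file contents. It uses SHA-256 from Node's crypto module as a stand-in; bundlers use their own hash functions and inputs, so this illustrates the idea only, and `hashedFilename` is a hypothetical helper.

```javascript
const { createHash } = require('crypto');

// Insert a short hash of `content` before the file extension.
function hashedFilename(name, content, length = 8) {
  const hash = createHash('sha256').update(content).digest('hex').slice(0, length);
  const dot = name.lastIndexOf('.');
  return dot === -1
    ? `${name}.${hash}`
    : `${name.slice(0, dot)}.${hash}${name.slice(dot)}`;
}

const v1 = hashedFilename('main.js', 'console.log("v1")');
const v1Again = hashedFilename('main.js', 'console.log("v1")');
const v2 = hashedFilename('main.js', 'console.log("v2")');
console.log(v1, v1 === v1Again, v1 === v2);
```

Identical content always maps to the same name, so an unchanged vendor chunk keeps its URL across deploys while an edited app chunk gets a new one.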
// webpack.config.js - contenthash for cache busting
module.exports = {
  output: {
    filename: '[name].[contenthash:8].js',
    chunkFilename: '[name].[contenthash:8].chunk.js'
  }
};
If your app has fewer than 10 routes and total bundle size is under 200kB, splitting into 10+ chunks adds HTTP overhead and complexity without benefit. For small apps, a single bundle with code-splitting only for heavy third-party libraries (e.g., charting) is better. Measure: if your total JS is <100kB, don't split at all, because the 16ms parsing time is negligible. Only split when you see route-level bundles exceeding 50kB or when vendor code is >30% of total.
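The rules of thumb in the paragraph above can be written down as a small decision helper. This is only a sketch of those stated heuristics; the function name, the input shape, and the order in which the rules are checked are assumptions.

```javascript
// Sizes are in kB. Returns a coarse recommendation based on the
// thresholds quoted above (100kB total, 50kB per route, 30% vendor share).
function splittingAdvice({ totalKb, largestRouteKb, vendorKb }) {
  if (totalKb < 100) return 'single-bundle';
  if (largestRouteKb > 50) return 'split-routes';
  if (vendorKb / totalKb > 0.3) return 'split-vendor';
  return 'single-bundle';
}

console.log(splittingAdvice({ totalKb: 300, largestRouteKb: 70, vendorKb: 50 }));
```

The output is a starting point for measurement, not a verdict; the thresholds should be revisited against your own field data.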
Module Federation for Micro-Frontends
Module Federation (webpack 5) allows independent teams to deploy separate bundles that share dependencies at runtime. This solves the 'one monolith deploy' problem but introduces new performance costs: shared dependencies must be negotiated (version conflicts), and each micro-frontend adds a network request. The trade-off: you gain deployment independence but lose some caching efficiency. For example, if two micro-frontends share React 18, you can configure a shared singleton—but if one team upgrades to React 19, the other must either upgrade or use a separate copy, doubling the React bundle. Staff engineers enforce a shared dependency policy (e.g., 'all teams must use React 18.x until a coordinated upgrade') and use eager: true for critical shared modules to avoid async loading delays.
// webpack.config.js - Module Federation with shared singleton
const { ModuleFederationPlugin } = require('webpack').container;
module.exports = {
  plugins: [
    new ModuleFederationPlugin({
      name: 'host',
      remotes: {
        app1: 'app1@http://localhost:3001/remoteEntry.js',
        app2: 'app2@http://localhost:3002/remoteEntry.js'
      },
      shared: {
        react: { singleton: true, requiredVersion: '^18.0.0', eager: true },
        'react-dom': { singleton: true, requiredVersion: '^18.0.0', eager: true }
      }
    })
  ]
};
Persistent / Incremental Build Caching
Incremental caching reduces rebuild times from minutes to seconds. webpack 5's cache.type: 'filesystem' stores intermediate build artifacts, cutting subsequent builds by 60-80%. Vite caches its esbuild-pre-bundled dependencies on disk by default, so dev server restarts stay fast. The catch: cache invalidation is tricky. If you change a shared module, all dependent chunks must be rebuilt. A staff engineer configures cache with buildDependencies (webpack) to invalidate when config changes, and monitors cache hit rates. For CI, use a persistent cache volume (e.g., GitHub Actions cache) to avoid cold builds on every PR. Target: cold build <30s, incremental build <5s for a medium app.
// webpack.config.js - filesystem cache with build dependencies
const path = require('path');
module.exports = {
  cache: {
    type: 'filesystem',
    buildDependencies: {
      config: [__filename]
    },
    cacheDirectory: path.resolve(__dirname, '.webpack-cache'),
    maxAge: 604800000 // 7 days
  }
};
Minification: Terser vs esbuild vs SWC
Minification reduces bundle size but at a time cost. Terser (webpack default) produces the smallest output (5-10% smaller than esbuild) but is slow: 2-5s for a 500kB bundle. esbuild minifies 10-20x faster but outputs slightly larger bundles (e.g., 180kB vs 165kB). SWC is similar to esbuild in speed; in webpack it is typically plugged in as the minifier through terser-webpack-plugin, while swc-loader handles transpilation rather than minification. The trade-off: for production builds, use Terser if bundle size is critical (e.g., LCP budget <2.5s and JS is the bottleneck). For dev builds or CI where speed matters, use esbuild. A staff engineer measures: 'Our vendor chunk was 300kB. Terser reduced it to 270kB (10% savings) but added 4s to build time. Since our LCP was already 1.8s, we switched to esbuild minification and saved 3s per build.'
| Minifier | Speed (500kB bundle) | Output Size Reduction | Best For |
|---|---|---|---|
| Terser | 2-5s | 10-15% | Production builds where size is critical |
| esbuild | 0.1-0.3s | 5-10% | Dev builds, CI, fast iterations |
| SWC | 0.2-0.5s | 5-10% | Webpack projects needing speed |
Your team's React app has a 400kB vendor chunk (React, Redux, lodash) and a 200kB app chunk. LCP is 3.2s, and the bottleneck is JS parsing (150ms on a mid-range device). You have two options: (A) Split the vendor chunk into two 200kB chunks to reduce parse time, or (B) Use Terser to minify the vendor chunk to 340kB. Which approach is better, and why?
Image & Font Optimization
Images and fonts are the heaviest resources on the web, often accounting for 60-80% of page weight and causing Layout Shift (CLS) when dimensions aren't reserved. At scale, every 100KB of image weight adds ~1s to LCP on 3G, and font swaps can push INP > 200ms on low-end devices. This section covers the trade-offs between modern formats, responsive loading, and font strategies that keep your LCP < 2.5s and CLS < 0.1.
Modern Image Formats: AVIF, WebP, and Fallbacks
AVIF offers roughly 50% smaller files than JPEG at equivalent quality, but decode time can be 2-3x slower on older CPUs. WebP is a safe middle ground with ~30% savings and broad support. Always serve AVIF with WebP and JPEG fallbacks using the <picture> element. Never rely on <img srcset> alone for format switching: it chooses between resolutions (width and pixel-density descriptors), not between codecs.
<!-- Hero images are usually the LCP element, so this one is not lazy-loaded -->
<picture>
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <img src="hero.jpg" alt="Hero" width="1200" height="600">
</picture>
Avoid AVIF for images that are critical to LCP (e.g., hero banners) on low-end devices. Decoding a 2MB AVIF can take 80ms on a Moto G4, blowing the 50ms long-task threshold. Use WebP for LCP images and reserve AVIF for non-critical, large background images.
Responsive Images: srcset, sizes, and the picture element
srcset with sizes lets the browser pick the best resolution based on viewport width and device pixel ratio. The sizes attribute is critical — without it, the browser assumes 100vw, often downloading a desktop-sized image on mobile. Use the picture element for art direction (different crops) or format switching, but prefer srcset for simple resolution switching to reduce HTML complexity.
<img src="photo-800.jpg"
     srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1200.jpg 1200w"
     sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
     alt="Photo" width="800" height="600" loading="lazy">
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| srcset + sizes | Resolution switching | Low: one img tag | Art direction needed |
| <picture> | Format switching, art direction | Medium: multiple source tags | Simple resolution only |
| Client Hints (DPR, Viewport-Width) | Automatic selection | Low: HTTP header | Privacy restrictions, legacy browsers |
| CDN image transformation | Dynamic resizing | Variable: per-request cost | Static assets, low traffic |
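When image variants follow a naming convention, the srcset string can be generated rather than hand-written. The sketch below assumes files named `<base>-<width>.<ext>`, matching the example above; `buildSrcset` is a hypothetical helper.

```javascript
// Build a width-descriptor srcset string from a list of available widths.
function buildSrcset(basePath, ext, widths) {
  return [...widths]
    .sort((a, b) => a - b)
    .map((w) => `${basePath}-${w}.${ext} ${w}w`)
    .join(', ');
}

console.log(buildSrcset('photo', 'jpg', [1200, 400, 800]));
// prints "photo-400.jpg 400w, photo-800.jpg 800w, photo-1200.jpg 1200w"
```

The sizes attribute still has to be written by hand, because it depends on the layout rather than on the files.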
Lazy Loading: loading="lazy" and IntersectionObserver
Native loading="lazy" defers offscreen images until they're near the viewport (typically 1250px away). It's zero-cost and works in all modern browsers. For complex scenarios (e.g., loading placeholders, triggering analytics), use IntersectionObserver with a rootMargin of 200px to preload before the user scrolls. Never lazy-load LCP images — they must be eager to hit LCP < 2.5s.
const observer = new IntersectionObserver((entries) => {
  entries.forEach(entry => {
    if (entry.isIntersecting) {
      const img = entry.target;
      img.src = img.dataset.src;
      img.onload = () => img.classList.add('loaded');
      observer.unobserve(img);
    }
  });
}, { rootMargin: '200px' });
document.querySelectorAll('img[data-src]').forEach(img => observer.observe(img));
A junior engineer adds lazy loading to every image. A staff engineer measures: 'Our LCP image is 1.8s on 4G, and lazy loading it would push LCP to 3.2s. Instead, I'll eager-load the hero and lazy-load the 12 gallery images below the fold.' Always profile with Lighthouse and real-user monitoring (RUM) before applying optimizations.
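The eager-versus-lazy rule can be captured in one place so that templates do not decide it ad hoc. This is a minimal sketch; the `imageLoadingAttrs` name and its `isLcp` and `belowFold` flags are assumptions, and how a page determines those flags is left open.

```javascript
// Choose loading and fetchpriority attribute values for an <img>.
function imageLoadingAttrs({ isLcp = false, belowFold = false } = {}) {
  // The LCP image must never be lazy-loaded.
  if (isLcp) return { loading: 'eager', fetchpriority: 'high' };
  // Offscreen images can wait until the user scrolls near them.
  if (belowFold) return { loading: 'lazy', fetchpriority: 'low' };
  // Other above-the-fold images load normally at default priority.
  return { loading: 'eager' };
}

console.log(imageLoadingAttrs({ isLcp: true }));
console.log(imageLoadingAttrs({ belowFold: true }));
```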
CLS Prevention: width/height and aspect-ratio
Layout shift occurs when images load without reserved space. Always set explicit width and height on <img> tags, or use aspect-ratio in CSS for responsive containers. The browser calculates the aspect ratio from dimensions and reserves space before the image loads. For fluid layouts, combine aspect-ratio with max-width: 100% and height: auto.
/* CSS: prevents CLS for responsive images */
img {
  max-width: 100%;
  height: auto;
  /* No aspect-ratio rule needed here: browsers derive it from the width and height attributes */
}
/* Or explicitly */
.hero-image {
  aspect-ratio: 16 / 9;
  width: 100%;
  height: auto;
}
Font Optimization: Subsetting, font-display, and Preloading
Font files can be 200-400KB each. Subsetting removes unused glyphs (e.g., only Latin characters), reducing size by 70-90%. Use font-display: swap for brand-critical fonts to show fallback text immediately, but beware of FOUT (Flash of Unstyled Text). For non-critical text, font-display: optional avoids layout shift entirely — the browser may skip the font download if it takes too long. Preload the primary font file (e.g., woff2) to reduce LCP font delay.
@font-face {
  font-family: 'Inter';
  src: url('/fonts/inter-latin.woff2') format('woff2');
  font-display: swap; /* or optional for non-critical */
  font-weight: 400 700;
  unicode-range: U+0000-00FF; /* Latin subset */
}
<!-- Preload in HTML -->
<link rel="preload" href="/fonts/inter-latin.woff2" as="font" type="font/woff2" crossorigin>
| Strategy | FOUT/FOIT | CLS risk | Best for |
|---|---|---|---|
| font-display: swap | FOUT (visible fallback) | Low if fallback metrics match | Brand fonts, headings |
| font-display: optional | FOIT (invisible text up to 100ms) | None if font fails | Body text, non-critical |
| font-display: block | FOIT (up to 3s) | High | Avoid in production |
| Variable fonts | Depends on the font-display value used | Low (one file covers every weight) | Sites with many font weights |
FOUT vs FOIT and Variable Fonts
FOUT (Flash of Unstyled Text) shows fallback font immediately, then swaps — better for perceived performance. FOIT (Flash of Invisible Text) hides text for up to 3 seconds — worse for LCP and user experience. Use font-display: swap with size-adjust to match fallback metrics and reduce CLS. Variable fonts bundle multiple weights into one file (e.g., 50KB vs 200KB for 4 weights), but increase decode time. Only use them if you need 3+ weights; otherwise, individual woff2 files are faster.
Subsetting to only ASCII characters breaks internationalization. If your site supports Japanese or Cyrillic, include those ranges. Always test with a full character set in staging. A subset that's too aggressive can cause missing glyphs (tofu boxes) that hurt UX more than the 50KB savings.
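One way to catch an over-aggressive subset before it ships is to check representative page text against the declared unicode-range. The sketch below handles plain code points and ranges such as U+0000-00FF but not wildcard ranges such as U+4??; `parseUnicodeRange` and `uncoveredChars` are hypothetical helpers.

```javascript
// Parse "U+0000-00FF, U+0131" into [start, end] code point pairs.
function parseUnicodeRange(value) {
  return value.split(',').map((part) => {
    const [start, end] = part.trim().replace(/^U\+/i, '').split('-');
    const lo = parseInt(start, 16);
    return [lo, end ? parseInt(end, 16) : lo];
  });
}

// List the distinct characters in `text` that the subset does not cover.
function uncoveredChars(text, rangeValue) {
  const ranges = parseUnicodeRange(rangeValue);
  const missing = new Set();
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (!ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) missing.add(ch);
  }
  return [...missing];
}

console.log(uncoveredChars('Привет, cafe', 'U+0000-00FF'));
```

Running this over translated strings in staging gives an early signal that a locale needs its own subset file.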
Your team's e-commerce site has a hero image that is the LCP element. The image is 1.2MB in JPEG. You've converted it to WebP (800KB) and AVIF (500KB). The site's LCP is currently 3.1s on 4G. Which approach best balances performance and risk?
Critical Rendering Path
The Critical Rendering Path (CRP) governs how the browser converts HTML, CSS, and JavaScript into pixels. When the CRP is blocked, the user stares at a blank white screen — directly harming LCP (target <2.5s) and INP (target <200ms). At scale, every millisecond of CRP delay costs revenue; a 100ms improvement can lift conversion by 1-2%. This section covers the techniques and trade-offs to unblock painting above-the-fold content.
Critical CSS Extraction and Inlining
Inline only the CSS required to render the above-the-fold viewport (typically <14kB compressed). Defer the full stylesheet with media="print" onload="this.media='all'". Tools like Critical or Penthouse automate extraction. The trade-off: inlining increases HTML size and cache invalidation cost — if the critical CSS changes, the HTML must be re-served. Measure the impact on First Contentful Paint (FCP) and LCP before committing.
<style>
  /* Critical CSS for above-the-fold content */
  .hero { display: flex; min-height: 80vh; }
  .cta-button { background: #007bff; color: white; }
</style>
<link rel="stylesheet" href="/styles.full.css" media="print" onload="this.media='all'">
<noscript><link rel="stylesheet" href="/styles.full.css"></noscript>
A junior engineer inlines all CSS. A staff engineer measures: if the full CSS is <14kB gzipped, inlining everything may be faster than splitting. Use Lighthouse and WebPageTest to compare FCP with and without extraction. The decision depends on your CSS size, cache hit rate, and HTML delivery latency.
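The 14kB check is straightforward to automate in a build script. The sketch below measures gzipped size with Node's zlib; the helper names and the use of default gzip settings are assumptions, and a server using Brotli will produce different numbers.

```javascript
const zlib = require('zlib');

// Size of `text` in bytes after gzip compression with default settings.
function gzippedBytes(text) {
  return zlib.gzipSync(Buffer.from(text, 'utf8')).length;
}

// True when the stylesheet fits the inline budget (14kB by default).
function fitsInlineBudget(css, budgetBytes = 14 * 1024) {
  return gzippedBytes(css) <= budgetBytes;
}

const sampleCss = '.hero { display: flex; min-height: 80vh; }\n'.repeat(200);
console.log(gzippedBytes(sampleCss), fitsInlineBudget(sampleCss));
```

A build could call `fitsInlineBudget` on the full stylesheet first and only run critical-CSS extraction when it returns false.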
Render-Blocking CSS and JS
CSS is render-blocking by default: the browser pauses painting until all CSS is downloaded and parsed. JavaScript is both parser-blocking and render-blocking if placed in the <head> without defer or async. The key metric: any render-blocking resource larger than ~14kB compressed (roughly what fits in the first network round trip) delays FCP. Eliminate render-blocking by deferring non-critical CSS and JS, or inlining critical resources.
- CSS: Use media queries to mark non-critical stylesheets (e.g., media="print").
- JS: Always use defer for scripts that need DOM order; use async for independent scripts.
- Audit with Lighthouse's 'Eliminate render-blocking resources' audit and target zero render-blocking requests above the fold.
Defer vs Async Script Loading
<!-- Defer: executes after HTML parsing, preserves order -->
<script defer src="app.js"></script>
<!-- Async: executes as soon as downloaded, no order guarantee -->
<script async src="analytics.js"></script>
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| defer | Scripts that depend on DOM or other deferred scripts | Delays execution until after parsing | Scripts that must run before first paint |
| async | Independent scripts (analytics, ads) | No order guarantee; execution can interrupt HTML parsing | Scripts with dependencies or DOM manipulation |
| sync (no attr) | Legacy or critical inline scripts | Blocks parsing and painting entirely | Any script >1kB that can be deferred |
Resource Hints: Preload, Prefetch, Preconnect, DNS-Prefetch, Modulepreload
Resource hints tell the browser to fetch or connect early, reducing latency. Preload is for critical resources needed in the current page (e.g., hero image, font). Prefetch is for likely-next-page resources — use sparingly to avoid bandwidth waste. Preconnect warms up DNS+TCP+TLS for cross-origin requests (saves ~100-300ms). DNS-prefetch is lighter but only resolves DNS. Modulepreload is for ES modules — it fetches and compiles the module early. The trade-off: overusing preload can starve other resources; prefetch can waste data on mobile.
<!-- Preload critical font -->
<link rel="preload" href="/fonts/inter-var.woff2" as="font" type="font/woff2" crossorigin>
<!-- Preconnect to analytics origin -->
<link rel="preconnect" href="https://analytics.example.com">
<!-- Modulepreload for ES module -->
<link rel="modulepreload" href="/app.js">
Preloading too many resources (e.g., all images) can delay the LCP resource by competing for bandwidth. Preloads are requested early, so each extra one takes bandwidth away from the resources that actually matter. Only preload resources that are discovered late in the HTML or are critical for above-the-fold rendering. Validate the result in the Priority column of the DevTools Network panel and with Lighthouse.
Priority Hints (fetchpriority) and Above-the-Fold Prioritization
The fetchpriority attribute lets you hint the browser's resource priority: high, low, or auto. Use fetchpriority="high" on the LCP image or critical fetch request. Use fetchpriority="low" on below-the-fold images or non-critical scripts. The browser still makes the final decision, but this hint can improve LCP by 5-10% in practice. The trade-off: overusing high can cause priority inversion — the browser may deprioritize other critical resources.
<!-- LCP image: high priority -->
<img src="hero.webp" fetchpriority="high" alt="Hero">
<!-- Below-the-fold image: low priority -->
<img src="footer-bg.webp" fetchpriority="low" alt="Footer" loading="lazy">
Eliminating Render-Blocking Requests
The goal is to have zero render-blocking requests above the fold. Techniques: inline critical CSS (<14kB), defer all non-critical CSS, defer all JS (or async), and use preload for critical resources. Measure with Lighthouse — the 'Eliminate render-blocking resources' audit should show 0 requests. In practice, a single render-blocking request can delay FCP by 200-500ms on 3G. The cost: inlining increases HTML size and reduces cacheability. For large CSS (>50kB), consider code-splitting by route.
Your team has a 200kB CSS file. The above-the-fold CSS is 12kB. You inline the critical CSS and defer the rest. However, the full CSS changes weekly. What is the primary trade-off you must evaluate?
Third-Party Scripts
Third-party scripts—analytics, tag managers, chat widgets—are the single largest source of uncontrollable performance debt at scale. A single tag manager container can inject 200+ kB of JavaScript, block the main thread for 300+ ms, and delay LCP by multiple seconds. At staff level, you don't just add async; you measure the real cost, apply facade patterns, and decide when to offload or eliminate entirely.
The Real Cost of Analytics and Tag Managers
Every third-party script competes for the 16ms frame budget and the 50ms long-task threshold. A tag manager like Google Tag Manager (GTM) loads a container script that then fetches and executes multiple tags—each one a separate network request, parse, and eval. The hidden cost: network contention (blocking critical resources) and main-thread churn (delaying INP). At scale, a single chat widget can add 1.2s to LCP on slow connections.
- Measure Total Blocking Time (TBT) before and after loading third-party scripts. A 300ms increase is common.
- Use Chrome DevTools > Performance > Bottom-Up to attribute long tasks to specific script origins.
- Track LCP regression: a 500ms delay from a third-party script is unacceptable for <2.5s target.
A junior says 'we should async all scripts.' A staff engineer says 'let's measure the real cost with DevTools Performance tab, then decide which scripts are critical and which can be deferred or removed.' The key is data-driven triage.
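One way to make that triage concrete is to total blocking time per script origin. The sketch below assumes you have already exported long-task records as `{ scriptUrl, duration }` objects (a hypothetical shape, for example distilled by hand from a DevTools trace). The function name and the sample data are illustrative, not real measurements. Only the portion of each task beyond 50ms is counted, which mirrors how Total Blocking Time is defined.

```javascript
// Only the part of a task beyond 50 ms counts as blocking time.
const LONG_TASK_THRESHOLD_MS = 50;

// tasks: array of { scriptUrl, duration } records (hypothetical export format).
function blockingTimeByOrigin(tasks, threshold = LONG_TASK_THRESHOLD_MS) {
  const totals = new Map();
  for (const { scriptUrl, duration } of tasks) {
    if (duration <= threshold) continue; // not a long task
    const origin = new URL(scriptUrl).origin;
    totals.set(origin, (totals.get(origin) ?? 0) + (duration - threshold));
  }
  // Sort descending so the most expensive origin comes first.
  return [...totals.entries()]
    .map(([origin, blockingMs]) => ({ origin, blockingMs }))
    .sort((a, b) => b.blockingMs - a.blockingMs);
}

// Illustrative sample data, not measurements.
const sampleTasks = [
  { scriptUrl: 'https://www.googletagmanager.com/gtm.js', duration: 180 },
  { scriptUrl: 'https://widget.example.com/chat.js', duration: 90 },
  { scriptUrl: 'https://www.googletagmanager.com/gtag/js', duration: 70 },
  { scriptUrl: 'https://app.example.com/main.js', duration: 40 },
];
const report = blockingTimeByOrigin(sampleTasks);
console.log(report);
```

Sorting by blocking time rather than by transfer size keeps the discussion focused on main-thread cost, which is what INP and TBT respond to.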
Facade Pattern: Load Heavy Embeds Only on Interaction
The facade pattern replaces a heavy third-party widget (e.g., YouTube embed, chat widget, map) with a lightweight placeholder that loads the real script only when the user interacts. This can save 200-500 kB of JavaScript and improve LCP by 1-2 seconds. The trade-off: the widget is not immediately interactive, which may affect user experience for power users.
// Facade for a chat widget
class ChatFacade {
constructor(containerId, widgetUrl) {
this.container = document.getElementById(containerId);
this.widgetUrl = widgetUrl;
this.renderPlaceholder();
}
renderPlaceholder() {
this.container.innerHTML = `
<div class="chat-placeholder" style="cursor:pointer;background:#eee;padding:20px;text-align:center;">
💬 Click to chat
</div>
`;
this.container.querySelector('.chat-placeholder').addEventListener('click', () => this.loadWidget());
}
loadWidget() {
const script = document.createElement('script');
script.src = this.widgetUrl;
script.async = true;
this.container.appendChild(script);
}
}
new ChatFacade('chat-container', 'https://widget.example.com/chat.js');
Async/Defer for Third-Party Scripts
Use async for scripts that don't depend on DOM or other scripts (e.g., analytics). Use defer for scripts that need DOM but not immediate execution (e.g., tag managers). Never use synchronous scripts—they block parsing and delay LCP. The trade-off: async scripts execute in unpredictable order, which can cause race conditions if they depend on each other.
<!-- Good: async for independent analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=GA_MEASUREMENT_ID"></script>
<!-- Better: defer for tag manager that needs DOM but not immediate -->
<script defer src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>
<!-- Bad: synchronous blocks everything -->
<script src="https://widget.example.com/heavy.js"></script>
Offloading to a Web Worker (Partytown)
Partytown moves third-party scripts to a web worker, freeing the main thread for user interactions. This can reduce TBT by 200-400ms and improve INP. However, it requires careful setup: the worker has no direct DOM, so Partytown proxies DOM and window access back to the main thread, and every proxied call pays a cross-thread round trip. Best for analytics and tag managers; not for DOM-manipulating widgets.
<!-- Partytown configuration example (in HTML head) -->
<script>
  window.partytown = {
    lib: '/~partytown/',
    forward: ['dataLayer.push'],
  };
</script>
<script type="text/partytown" src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>
Partytown is not a silver bullet. Avoid it for scripts that manipulate the DOM heavily (e.g., chat widgets, A/B testing tools), because each DOM read and write is proxied across threads and becomes slow. The cross-thread overhead can add 10-50ms per call, which may negate benefits for high-frequency interactions. Always measure TBT and INP before and after.
Script Prioritization and Self-Hosting vs CDN
Self-hosting third-party scripts eliminates the extra DNS lookup and connection setup to the vendor's origin, reducing network latency by 100-300ms on first load, but you lose the vendor's CDN caching and automatic updates. Use preconnect and dns-prefetch for CDN-hosted scripts to mitigate. Prioritize critical scripts (e.g., analytics) over non-critical ones (e.g., social widgets): load the critical ones early, optionally with fetchpriority="high", and mark the rest async, defer, or fetchpriority="low".
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Self-hosting | Critical scripts (e.g., analytics, auth) | Maintenance overhead, no CDN caching | Scripts that update frequently (e.g., tag managers) |
| CDN with preconnect | Non-critical scripts (e.g., fonts, widgets) | Extra DNS lookup (mitigated by preconnect) | Scripts that need low latency on first load |
| Facade pattern | Heavy embeds (chat, video, maps) | Delayed interactivity | Scripts needed for initial UX (e.g., login) |
| Partytown | Analytics, tag managers | Setup complexity, postMessage latency | DOM-manipulating scripts |
Measuring Third-Party Impact in DevTools
Use Chrome DevTools Performance tab to record a page load. Look for long tasks (>50ms) attributed to third-party origins in the Bottom-Up or Call Tree views. The Network tab shows blocking time and transfer size. For LCP, check the Timings panel to see if a third-party script delayed the largest element. At scale, set up Lighthouse CI to track TBT and LCP regressions per commit.
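A per-commit budget of the kind described above could be expressed as a Lighthouse CI configuration. This is a minimal sketch: the URL, run count, and thresholds are placeholders chosen to match the numbers used in this section, and the object would normally be exported from a `lighthouserc.js` file.

```javascript
// Sketch of a Lighthouse CI config; URL and thresholds are placeholders.
const lighthouseCiConfig = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'], // hypothetical page under test
      numberOfRuns: 3, // several runs smooth out lab variance
    },
    assert: {
      assertions: {
        // Fail the build when lab TBT exceeds 300 ms.
        'total-blocking-time': ['error', { maxNumericValue: 300 }],
        // Fail the build when lab LCP exceeds 2.5 s.
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
      },
    },
  },
};
// In a real lighthouserc.js you would write: module.exports = lighthouseCiConfig;
```

Because these are lab numbers, treat the thresholds as regression tripwires rather than as proof of field performance.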
Your team uses a tag manager that loads 15 analytics tags. The page has a 2.8s LCP (target <2.5s) and 350ms TBT. Which optimization should you prioritize first?
Rendering & Paint Performance
When your UI janks, drops frames, or stutters during interaction, you are likely exceeding the 16ms frame budget (60fps) or hitting 50ms long-task thresholds that degrade INP beyond 200ms. At scale, even a single layout-triggering animation can cascade into 0.1+ CLS or push LCP past 2.5s. This section dissects the pixel pipeline, the compositor thread, and the trade-offs of every optimization so you can diagnose and fix jank with precision.
The Pixel Pipeline: Reflow vs Repaint vs Composite
Every frame goes through JavaScript → Style → Layout → Paint → Composite. Reflow (Layout) recalculates geometry and is the most expensive — it invalidates the entire subtree. Repaint (Paint) redraws pixels but skips layout. Composite only recombines existing layers on the GPU. The goal is to stay in Composite-only territory. Changing width or top triggers layout; changing transform or opacity only composites.
/* ❌ Triggers layout + paint + composite */
.element {
left: 100px;
width: 200px;
}
/* ✅ Only composite (GPU-accelerated) */
.element {
transform: translateX(100px);
opacity: 0.5;
}
Layout Thrashing & Read-Then-Write Batching (FastDOM Pattern)
Layout thrashing occurs when you interleave layout reads (properties like offsetHeight) with style writes (such as setting style.left). A write invalidates layout, so the next read forces a synchronous layout flush. Batch all reads first, then all writes. The FastDOM pattern formalizes this as two queues, one for reads and one for writes, flushed in that order once per frame. At scale, use requestAnimationFrame to schedule the writes after all reads are done.
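The queueing idea can be sketched in a few lines. This is an illustration in the spirit of FastDOM, not its actual API: `createBatcher`, `measure`, and `mutate` are names chosen here, and the scheduler is injectable so the ordering can be exercised outside a browser.

```javascript
// Minimal read/write batching queue (illustrative; not the FastDOM library API).
// schedule defaults to requestAnimationFrame and can be replaced for testing.
function createBatcher(schedule = (flush) => requestAnimationFrame(flush)) {
  const reads = [];
  const writes = [];
  let scheduled = false;

  function flush() {
    // Run every queued read before any write, so layout is computed at most once.
    const pendingReads = reads.splice(0);
    for (const read of pendingReads) read();
    // Taken after the reads ran, so writes queued by those reads are included.
    const pendingWrites = writes.splice(0);
    for (const write of pendingWrites) write();
    scheduled = false;
    // Work queued while the writes ran waits for the next flush.
    if (reads.length > 0 || writes.length > 0) {
      scheduled = true;
      schedule(flush);
    }
  }

  function enqueue(queue, task) {
    queue.push(task);
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  }

  return {
    measure: (task) => enqueue(reads, task),
    mutate: (task) => enqueue(writes, task),
  };
}

// Browser usage (items is a hypothetical list of elements):
// const batcher = createBatcher();
// items.forEach((el) => batcher.measure(() => {
//   const h = el.offsetHeight;
//   batcher.mutate(() => { el.style.height = `${h + 10}px`; });
// }));
```

The loop-based example that follows shows the same reads-then-writes ordering without a queue, which is enough when all the work lives in one function.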
// ❌ Layout thrashing: read → write → read → write
for (let i = 0; i < items.length; i++) {
const h = items[i].offsetHeight; // forces layout
items[i].style.height = (h + 10) + 'px'; // invalidates layout
}
// ✅ Batched: all reads first, then writes
const heights = [];
for (let i = 0; i < items.length; i++) {
heights.push(items[i].offsetHeight);
}
for (let i = 0; i < items.length; i++) {
items[i].style.height = (heights[i] + 10) + 'px';
}
The Compositor Thread & GPU-Accelerated Properties
The compositor thread is a separate CPU thread that hands finished layers to the GPU. It handles scrolling and transform/opacity animations independently of the main thread, so they keep running even while JavaScript is busy. Only transform and opacity are reliably animated on the compositor without triggering layout or paint. filter and clip-path may promote an element to a layer but can still repaint. Always verify in DevTools > Layers panel that your animated element is on its own compositor layer.
Animating Transform/Opacity Instead of Layout Properties
Animating width, height, left, top, margin, or padding triggers layout on every frame — impossible to stay under 16ms for complex pages. Use transform: scale() for size changes and transform: translate() for position. opacity is the only other property that composites without paint. This is non-negotiable for smooth 60fps animations.
| Technique | Best For | Cost | Avoid When |
|---|---|---|---|
| transform/opacity animation | Position, size, visibility | Zero layout/paint; ~0.1ms composite | Need to change actual layout (e.g., reflow siblings) |
| will-change: transform | Pre-promote to compositor layer | Memory: ~1-2MB per layer on mobile | More than 10 elements; causes layer explosion |
| content-visibility: auto | Off-screen content in long lists | Initial layout cost; ~0.5ms per element | Above-the-fold content; can delay LCP |
| CSS containment | Isolate subtrees from layout | Minimal; ~0.01ms per container | Small components with no layout impact |
Will-Change and Its Cost (Layer Explosion)
will-change: transform hints the browser to promote an element to its own compositor layer. Overuse creates layer explosion — hundreds of layers consuming GPU memory (1-2MB each on mobile) and increasing paint times. Use it sparingly on elements you actually animate, and remove it after the animation ends. A staff engineer measures layer count in DevTools before and after.
Do not apply will-change to every animated element. On mobile, 20+ layers can cause GPU memory pressure and actually increase jank. Only promote elements that are actively animating and where you've measured a benefit. For simple transitions, the browser's own heuristics are often sufficient.
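The per-layer memory figures quoted here are rough. A common back-of-the-envelope estimate is 4 bytes (RGBA) per device pixel, which makes the cost depend on element size and device pixel ratio. The helper names below are illustrative, and real browsers tile and manage layers in ways this sketch ignores.

```javascript
// Rough GPU memory for one compositor layer: 4 bytes (RGBA) per device pixel.
// This ignores tiling and any browser-specific optimizations.
function estimateLayerBytes(cssWidth, cssHeight, devicePixelRatio = 1) {
  const deviceWidth = Math.ceil(cssWidth * devicePixelRatio);
  const deviceHeight = Math.ceil(cssHeight * devicePixelRatio);
  return deviceWidth * deviceHeight * 4;
}

// layers: array of { width, height } in CSS pixels.
function estimateTotalLayerMegabytes(layers, devicePixelRatio = 1) {
  const bytes = layers.reduce(
    (sum, { width, height }) => sum + estimateLayerBytes(width, height, devicePixelRatio),
    0,
  );
  return bytes / (1024 * 1024);
}

// Hypothetical case: twenty 360x200 cards promoted on a 2x display.
const cards = Array.from({ length: 20 }, () => ({ width: 360, height: 200 }));
const oneCardBytes = estimateLayerBytes(360, 200, 2);
const allCardsMb = estimateTotalLayerMegabytes(cards, 2);
console.log(oneCardBytes, allCardsMb.toFixed(1));
```

Under these assumptions a single card is roughly 1.1MB, so twenty promoted cards already cost on the order of 20MB before any content changes.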
Content-Visibility and CSS Containment
content-visibility: auto skips rendering for off-screen elements, reducing initial paint time by up to 50% for long pages. It implicitly applies contain: layout style paint. However, it can delay LCP if applied to above-the-fold content. Use it on list items, comments, or sections below the fold. contain: layout isolates a subtree so changes don't affect the rest of the page — critical for component libraries.
/* ✅ Safe for off-screen sections */
.long-list-item {
content-visibility: auto;
contain-intrinsic-size: 200px; /* reserve space to prevent CLS */
}
/* ❌ Never on above-the-fold hero */
.hero {
content-visibility: auto; /* delays LCP */
}
The 16ms Frame Budget
At 60fps, each frame has 16.67ms to complete the entire pipeline. Budget breakdown: ~5ms for JavaScript, ~3ms for style/layout, ~5ms for paint, ~3ms for composite. If any step exceeds its slice, the frame is dropped. Use performance.now() and DevTools Performance panel to measure. A long task >50ms blocks the main thread and degrades INP. Aim for idle time > 50ms between interactions.
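To turn the budget into a number you can track, one option is to collect requestAnimationFrame timestamps during an interaction and count the intervals that overran. The sketch below is a simplified estimate; the function name and the rounding rule for counting missed frames are choices made here, not a standard metric.

```javascript
const FRAME_BUDGET_MS = 1000 / 60; // about 16.67 ms at 60 fps

// timestamps: ascending frame times in ms, e.g. collected from rAF callbacks.
function summarizeFrames(timestamps, budgetMs = FRAME_BUDGET_MS) {
  let dropped = 0;
  let worstMs = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const delta = timestamps[i] - timestamps[i - 1];
    worstMs = Math.max(worstMs, delta);
    // An interval spanning roughly N budgets means N - 1 frames were missed.
    dropped += Math.max(0, Math.round(delta / budgetMs) - 1);
  }
  return { frames: Math.max(0, timestamps.length - 1), dropped, worstMs };
}

// Illustrative timestamps: one 50 ms gap among otherwise on-time frames.
const summary = summarizeFrames([0, 17, 33, 83, 100]);
console.log(summary);

// Browser collection sketch:
// const stamps = [];
// function tick(t) { stamps.push(t); if (stamps.length < 300) requestAnimationFrame(tick); }
// requestAnimationFrame(tick);
```

This only tells you that frames were missed; the Performance panel is still needed to see which pipeline stage consumed the time.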
A junior says 'I'll add will-change to fix jank.' A staff engineer says 'Let me profile the frame budget first — if the bottleneck is layout, I'll batch reads; if it's paint, I'll promote to compositor; if it's JavaScript, I'll defer work.' Never optimize without a DevTools Performance recording. The 16ms budget is a target, not a guarantee — measure the actual cost of your change.
You have a scrollable list of 1000 items. Each item has a hover animation that changes its background color and slightly scales it up. The page janks during scroll. What is the most effective optimization with the least risk?
CSS Performance
At scale, CSS performance is not about micro-optimizing selectors but about managing the browser's layout, paint, and compositing pipelines within the 16ms frame budget. A single expensive style recalculation can push a long task past 50ms, directly harming INP and CLS. This section covers the real costs of CSS — from selector matching and runtime-in-JS to containment and layer promotion — and when each technique is worth the trade-off.
Selector Matching Cost and Complexity
Modern browsers match selectors right-to-left, so the key selector (the rightmost part) determines matching cost. A universal selector like * or a tag selector like div forces the browser to check every element in the subtree. Class and ID selectors are O(1) hash lookups. The real cost is not parsing but style recalculation when a class changes on a deep DOM — the browser must re-match all affected selectors. For most apps with < 10k elements, this is negligible; above that, prefer flat, class-based selectors.
/* Slow: descendant selector with tag key */
ul li a { color: blue; }
/* Fast: class-based key selector */
.nav-link { color: blue; }
// JavaScript: measuring style recalculation time
const start = performance.now();
element.classList.add('active');
// Reading a computed style forces a synchronous style recalculation,
// so the elapsed time covers the recalc instead of the wait for the next frame.
getComputedStyle(element).color;
const elapsed = performance.now() - start;
if (elapsed > 5) console.warn('Style recalc took', elapsed, 'ms');
Junior engineers rewrite all selectors to BEM. Staff engineers profile with the Performance panel first. If style recalculation is under 1ms for a 5000-element tree, selector optimization is noise. Focus on layout thrashing and paint complexity instead.
CSS-in-JS Runtime Cost vs Zero-Runtime
Runtime CSS-in-JS (e.g., styled-components) generates styles during component mount, adding 0.5–2ms per component on initial render and serializing props to class names. For a page with 200 components, that's 100–400ms of JavaScript execution — easily a long task. Zero-runtime solutions (vanilla-extract, Tailwind) extract static CSS at build time, eliminating runtime overhead. Tailwind's JIT mode generates only used utilities, keeping CSS bundles under 10kB gzipped for most apps. The trade-off: runtime CSS-in-JS enables dynamic theming without rebuilds; zero-runtime requires build-time token generation.
// Runtime CSS-in-JS (styled-components)
const Button = styled.button`
background: ${props => props.$primary ? 'blue' : 'gray'};
padding: 8px 16px;
`;
// Zero-runtime (vanilla-extract)
import { style } from '@vanilla-extract/css';
export const button = style({
padding: '8px 16px',
});
// Dynamic variants via data attributes or class composition
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Runtime CSS-in-JS | Highly dynamic theming, server-rendered apps with small component trees | 0.5–2ms per mount, 10–50kB JS bundle overhead | Pages with >100 components or strict INP budgets (<200ms) |
| Zero-runtime (vanilla-extract) | Static or token-based theming, large component libraries | Build-time only, <1kB runtime JS | Need runtime color swapping without rebuild |
| Tailwind CSS | Utility-first design, rapid prototyping, small bundles | JIT scanning adds ~200ms to dev build | Custom design systems with complex component APIs |
CSS Containment for Isolating Layout and Paint
The contain property tells the browser that an element's subtree is independent, enabling layout and paint isolation. contain: layout prevents internal layout changes from affecting the outside; contain: paint clips overflow and skips painting outside the box; contain: strict adds size containment on top of layout, paint, and style, so it only suits elements with an explicit size. This is critical for widgets, modals, or virtualized lists where a single item's style change should not trigger a full-page layout. Without containment, a class toggle on one card can recalculate layout for 10,000 siblings, easily exceeding the 16ms budget.
/* Isolate each card's layout and paint */
.card {
  contain: layout paint style; /* same as contain: content */
  /* contain: strict also adds size containment; use it only when the card has explicit dimensions */
}
Overusing contain: strict on every element can increase memory usage because the browser creates separate rendering contexts. Only apply to components that are truly independent (e.g., widgets, list items, modals). For static text, it's unnecessary overhead. Profile with Chrome DevTools 'Rendering' > 'Layer borders' to see if you're creating too many layers.
GPU Layer Promotion and Avoiding Layer Explosion
Promoting an element to its own GPU compositor layer (via will-change: transform or translateZ(0)) can reduce paint cost for animations, but each layer consumes GPU memory in proportion to its pixel area (~100–500kB for small elements, more for large ones or high-density screens). A page with 1000 promoted layers of that size can use 100–500MB of GPU memory, causing jank on mobile. Only promote elements that animate frequently (e.g., carousel items, parallax). Use will-change sparingly and remove it when the animation ends. The browser's layerization heuristic is usually sufficient for static content.
/* Good: promote only the animating element */
.carousel-item {
will-change: transform;
/* or: transform: translateZ(0); */
}
/* Bad: promote everything */
* {
transform: translateZ(0);
}
Unused CSS Removal (PurgeCSS, Content-Aware Tailwind)
A typical Bootstrap or unpurged Tailwind build can ship 200–500kB of unused CSS. PurgeCSS (which older Tailwind versions relied on) scans your templates and removes unused classes, reducing bundles to 5–15kB gzipped. Tailwind's JIT engine reaches the same result from the other direction: it scans your content files and generates only the utilities it finds, at a cost of ~200ms on dev builds. Both approaches match literal strings, so dynamic class construction (e.g., className={`btn-${variant}`}) produces class names they cannot see; safelist those classes or write the full names out. For large codebases, this is a must: unused CSS is pure download cost with zero benefit.
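An alternative to safelisting is to keep every class name as a complete string literal and select among them at runtime. The sketch below assumes three hypothetical `btn-*` classes; the map and function names are illustrative.

```javascript
// Full class names appear as literals, so a content scanner can find them.
const BUTTON_VARIANT_CLASSES = {
  primary: 'btn-primary',
  secondary: 'btn-secondary',
  danger: 'btn-danger',
};

function buttonClass(variant) {
  // Use an own-property check so inherited keys like "constructor" do not match.
  const known = Object.prototype.hasOwnProperty.call(BUTTON_VARIANT_CLASSES, variant);
  // Fall back to a known class instead of interpolating an unknown string.
  return known ? BUTTON_VARIANT_CLASSES[variant] : BUTTON_VARIANT_CLASSES.primary;
}

console.log(buttonClass('danger'));
console.log(buttonClass('unknown'));
```

The lookup costs a few extra lines per component but removes the need to keep a safelist in sync with the variants in use.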
// tailwind.config.js — safelist dynamic classes
module.exports = {
safelist: [
{ pattern: /^btn-(primary|secondary|danger)$/ },
],
}
Container Queries Cost vs Media Queries
Container queries (@container) allow components to respond to their parent's size rather than the viewport. They are slightly more expensive than media queries because the browser must track each container's size and re-evaluate queries on resize. For a page with 50 containers, the cost is ~0.1–0.3ms per resize event — negligible. However, nesting container queries (a container inside another container) can cause cascading recalculations. Use container queries for reusable components (cards, sidebars) where viewport-relative sizing is wrong. For page-level layouts, media queries are cheaper and simpler.
/* Container query: component responds to its parent */
.card-container {
container-type: inline-size;
}
@container (min-width: 400px) {
.card {
flex-direction: row;
}
}
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Container queries | Reusable components, widget libraries | ~0.1ms per container per resize | Nested containers or >100 containers on a page |
| Media queries | Page-level layouts, responsive breakpoints | ~0.01ms per query, no per-element tracking | Components that need to respond to parent size |
Your team uses styled-components for a dashboard with 300 components. The INP is 350ms on mobile. Which optimization should you prioritize first?
Large Lists & Heavy DOM
Rendering thousands of DOM nodes synchronously blocks the main thread for hundreds of milliseconds, causing frame drops, janky scrolling, and an INP > 200ms. At staff level, you must decide when to virtualize, how to size your DOM budget, and which technique (windowing, pagination, infinite scroll) fits the user's mental model — not just throw a library at it.
Virtualization / Windowing: How It Works
Virtualization renders only the visible rows plus a small overscan buffer (typically 2-5 rows). The scroll container stays at the full height via a spacer element, so the browser's scrollbar behaves naturally. react-window and TanStack Virtual both use this pattern, but differ in API and flexibility: TanStack Virtual supports dynamic measurements and is framework-agnostic.
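For fixed-height rows, the windowing arithmetic described above fits in one small function. This is a framework-free sketch; the function and field names are chosen here for illustration and do not correspond to any library's API.

```javascript
// Computes which rows to render for a fixed-height virtual list.
function getVisibleRange({ scrollTop, viewportHeight, rowHeight, count, overscan = 3 }) {
  if (count === 0) {
    return { start: 0, end: -1, offsetTop: 0, totalHeight: 0 };
  }
  // First and last rows that intersect the viewport.
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  // Extend by the overscan buffer and clamp to the list bounds.
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(count - 1, lastVisible + overscan);
  return {
    start,
    end, // inclusive
    offsetTop: start * rowHeight, // where the first rendered row sits
    totalHeight: count * rowHeight, // spacer height that keeps the scrollbar honest
  };
}

const range = getVisibleRange({
  scrollTop: 1000,
  viewportHeight: 600,
  rowHeight: 50,
  count: 10000,
  overscan: 5,
});
console.log(range);
```

With these inputs only 22 of the 10,000 rows are rendered. The library example that follows handles the same bookkeeping for you, along with scroll events and measurement.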
import { useRef } from 'react';
import { useVirtualizer } from '@tanstack/react-virtual';
function VirtualList({ items }) {
const parentRef = useRef(null);
const virtualizer = useVirtualizer({
count: items.length,
getScrollElement: () => parentRef.current,
estimateSize: () => 50, // fixed row height
overscan: 5,
});
return (
<div ref={parentRef} style={{ height: '600px', overflow: 'auto' }}>
<div style={{ height: virtualizer.getTotalSize(), position: 'relative' }}>
{virtualizer.getVirtualItems().map((virtualItem) => (
<div
key={virtualItem.key}
style={{
position: 'absolute',
top: 0,
left: 0,
width: '100%',
height: virtualItem.size,
transform: `translateY(${virtualItem.start}px)`,
}}
>
{items[virtualItem.index].name}
</div>
))}
</div>
</div>
);
}
Fixed vs Variable Row Heights
Fixed row heights (e.g., 50px) allow O(1) scroll-to-index and minimal re-measurement. Variable heights require measuring each row after render, which adds layout thrashing if not batched. TanStack Virtual's measureElement callback lets you measure dynamically, but each measurement triggers a re-render. Trade-off: fixed heights are faster but break with dynamic content; variable heights are flexible but cost ~1-2ms per measurement. For lists under 10,000 rows, variable is fine; above that, prefer fixed or estimate with a fallback.
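The difference in lookup cost can be made explicit. With fixed heights the row at a scroll offset is a single division; with variable heights one common approach is to keep cumulative offsets and binary-search them. The sketch below illustrates that approach with names chosen here; it assumes a non-empty list and is not how any particular library is implemented.

```javascript
// offsets[i] is the top of row i; offsets[heights.length] is the total height.
function buildOffsets(heights) {
  const offsets = new Array(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Binary search for the last row whose top is at or above scrollTop: O(log n).
// Assumes at least one row.
function findRowAtOffset(offsets, scrollTop) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last row
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (offsets[mid] <= scrollTop) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Fixed heights need no table at all: O(1).
function findRowFixed(rowHeight, scrollTop) {
  return Math.floor(scrollTop / rowHeight);
}

const offsets = buildOffsets([50, 120, 30, 80]); // hypothetical measured heights
console.log(offsets, findRowAtOffset(offsets, 60), findRowFixed(50, 1000));
```

The offsets table must be rebuilt from the changed row onward whenever a measurement updates, which is part of the per-measurement cost mentioned above.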
// Variable height example with TanStack Virtual
const virtualizer = useVirtualizer({
count: items.length,
getScrollElement: () => parentRef.current,
estimateSize: () => 50,
  // Each rendered row also needs data-index and ref={virtualizer.measureElement} to be measured
  measureElement: (el) => el.getBoundingClientRect().height,
});
Decision Matrix: Windowing vs Pagination vs Infinite Scroll
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Windowing (virtualized) | Real-time scroll, 1000+ items, same view | ~2-5ms per scroll frame, DOM < 100 nodes | Items need full height for print/SEO |
| Pagination (page-based) | Search results, table UIs, data export | Server round-trip per page, ~200ms INP | Continuous browsing, mobile swipe |
| Infinite scroll (append) | Social feeds, activity logs | DOM grows unbounded, memory leak risk | User needs to find old items quickly |
A junior reaches for virtualization at 100 rows. A staff engineer measures: if the list takes < 50ms to render (under the long-task threshold), virtualization adds complexity without benefit. Profile with performance.measure() or React DevTools profiler. Only virtualize when the raw DOM render exceeds 16ms per frame or INP > 200ms.
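A small timing wrapper is often enough to make that call. The sketch below measures a synchronous function and also records a User Timing entry so the span appears in a Performance recording; `measureSync` and the `isLongTask` field are names invented for this example.

```javascript
const LONG_TASK_MS = 50;

// Times a synchronous function and records a User Timing measure under `label`.
function measureSync(label, fn) {
  const startMark = `${label}:start`;
  const endMark = `${label}:end`;
  performance.mark(startMark);
  const t0 = performance.now();
  const result = fn();
  const durationMs = performance.now() - t0;
  performance.mark(endMark);
  performance.measure(label, startMark, endMark);
  return { result, durationMs, isLongTask: durationMs > LONG_TASK_MS };
}

// Stand-in workload: build 1,000 row labels (replace with your real render call).
const outcome = measureSync('build-rows', () =>
  Array.from({ length: 1000 }, (_, i) => `Row ${i}`),
);
console.log(outcome.durationMs, outcome.isLongTask);
```

This captures only the synchronous script cost; layout and paint triggered afterwards still need a Performance panel recording to attribute.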
DOM Node Count Budget
The browser's layout engine slows down non-linearly past ~1,500 DOM nodes. A virtualized list should keep visible nodes under 100 (including overscan). Each node adds ~1-2kB of memory and ~0.1ms of layout cost. For a 10,000-row list, virtualization reduces active nodes from 10,000 to ~30, cutting layout time from ~1s to ~3ms. Budget rule: keep total DOM nodes under 2,000 for smooth 60fps scrolling; virtualize anything above that.
Virtualization breaks Ctrl+F find-in-page, printing, and accessibility tree navigation because non-visible rows are not in the DOM. If your users rely on browser search or need to select all items, use pagination or a static list with content-visibility: auto instead. Also avoid virtualization for lists under 500 items — the overhead of scroll listeners and measurements outweighs the benefit.
content-visibility: auto for Long Static Pages
For static content (e.g., documentation, blog posts), content-visibility: auto defers rendering of off-screen sections until they are near the viewport. This reduces initial layout cost by skipping off-screen elements. Trade-off: it adds a 0.1 CLS risk if the deferred element's height is unknown — always set contain-intrinsic-size to reserve space. Use this over virtualization when the content is not scrollable via a custom container.
/* CSS for long static list */
.long-list > li {
content-visibility: auto;
contain-intrinsic-size: 50px; /* reserve height */
}
Recycling DOM Nodes
Instead of creating/destroying DOM nodes on scroll, recycling reuses a fixed pool of nodes and updates their content. This avoids garbage collection pauses (which can exceed 50ms) and reduces memory churn. Virtualization libraries such as react-window keep the mounted set small, although React-based ones generally mount and unmount rows rather than reuse the same DOM nodes. In vanilla JS, implement a pool of 30-50 nodes and swap textContent or data-index on scroll (textContent avoids the layout work that innerText can trigger). Measure: recycling reduces GC time from ~30ms to < 1ms per scroll event.
Your team is building a real-time stock ticker that updates 10,000 rows every second. The rows have variable heights due to dynamic content. Which approach minimizes INP while maintaining accuracy?
Main Thread & Concurrency
Heavy computation on the main thread blocks user interaction, causing jank, frozen UI, and poor INP (Interaction to Next Paint). At scale — think real-time collaboration, image processing, or large data visualizations — a single synchronous task can degrade the experience for thousands of users. This section covers concurrency patterns that keep the main thread responsive while still doing the work.
Web Workers: When and How to Offload
Web Workers run scripts on a separate OS thread, but they have no DOM access and communicate via postMessage. Use them for CPU-bound tasks like parsing, compression, or image processing. The cost is serialization overhead: postMessage copies its payload with the structured clone algorithm, so large object graphs are expensive, whereas an ArrayBuffer listed as a transferable moves to the worker without a copy. For tasks under ~10ms, the overhead often outweighs the benefit.
// main.js
const worker = new Worker('worker.js');
worker.postMessage({ data: largeArray });
worker.onmessage = (e) => {
console.log('Result:', e.data);
};
// worker.js
self.onmessage = (e) => {
  const result = e.data.data.map(x => expensiveTransform(x));
  self.postMessage(result);
};
Avoid workers for trivial tasks like formatting a date or updating a single DOM element. The cost of spawning a worker (1–5ms) and serializing data (especially large strings) can exceed the computation time. Also, workers cannot access window, document, or localStorage; if your task needs those, you must restructure.
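The copy-versus-transfer distinction behind that serialization cost can be observed without spawning a worker, because `structuredClone` uses the same algorithm as `postMessage` and accepts the same transfer list. The buffer size below is arbitrary.

```javascript
const source = new ArrayBuffer(8 * 1024 * 1024); // 8 MB payload (arbitrary size)

// Structured clone: the bytes are copied and the original stays usable.
const copied = structuredClone(source);
const sizeAfterCopy = source.byteLength;

// Transfer: ownership moves without a copy and the original is detached.
const moved = structuredClone(source, { transfer: [source] });
const sizeAfterTransfer = source.byteLength; // a detached buffer reports 0

console.log(copied.byteLength, moved.byteLength, sizeAfterCopy, sizeAfterTransfer);

// The worker equivalent passes the transfer list as the second argument:
// worker.postMessage({ buffer }, [buffer]);
```

After a transfer the sending side can no longer read the buffer, so this fits pipelines where data flows one way; send a copy when both sides still need it.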
Debounce vs Throttle: Implementation and When to Use Each
Both limit execution frequency, but they solve different problems. Debounce waits for a pause — ideal for autocomplete or resize handlers. Throttle ensures a maximum rate — ideal for scroll or mousemove. The key trade-off: debounce can delay feedback indefinitely if events keep firing; throttle guarantees periodic updates but may miss the last event.
function debounce(fn, delay) {
let timer;
return (...args) => {
clearTimeout(timer);
timer = setTimeout(() => fn(...args), delay);
};
}
function throttle(fn, limit) {
let inThrottle = false;
return (...args) => {
if (!inThrottle) {
fn(...args);
inThrottle = true;
setTimeout(() => inThrottle = false, limit);
}
};
}
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Debounce | Autocomplete, resize, save-on-stop | Delayed response; may never fire if events keep coming | Real-time feedback (e.g., drawing, animation) |
| Throttle | Scroll, mousemove, progress updates | May skip trailing events; fixed rate can feel choppy | One-shot actions (e.g., button click) |
| requestAnimationFrame | Visual updates tied to paint cycle | Only fires ~60fps; not for non-visual work | Non-visual computation or background tasks |
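The "may skip trailing events" cost in the table can be removed with a trailing-edge variant that remembers the most recent skipped call and delivers it when the window closes. This is one possible sketch; `throttleTrailing` is a name chosen here, and the timer is injectable so the behavior can be checked without waiting.

```javascript
// Throttle that also delivers the last skipped call when the window ends.
// schedule defaults to setTimeout and can be replaced for testing.
function throttleTrailing(fn, limit, schedule = (cb, ms) => setTimeout(cb, ms)) {
  let waiting = false;
  let pendingArgs = null;

  function startCooldown() {
    waiting = true;
    schedule(() => {
      waiting = false;
      if (pendingArgs !== null) {
        const args = pendingArgs;
        pendingArgs = null;
        fn(...args); // deliver the most recent skipped call
        startCooldown(); // and open a new window after it
      }
    }, limit);
  }

  return (...args) => {
    if (waiting) {
      pendingArgs = args; // keep only the latest arguments
      return;
    }
    fn(...args);
    startCooldown();
  };
}

// Usage sketch in a browser (updateProgressBar is a hypothetical function):
// window.addEventListener('scroll', throttleTrailing(() => updateProgressBar(), 100));
```

The trailing call guarantees the handler eventually sees the final state, for example the last scroll position, at the cost of one extra invocation per burst.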
requestIdleCallback for Low-Priority Work
requestIdleCallback schedules a callback during idle periods, after the browser has processed pending input and rendered. Use it for non-urgent tasks like analytics, prefetching, or log flushing. The callback receives a deadline object with timeRemaining(), which never reports more than 50ms; check it inside your loop and stop when it reaches zero. Note: not supported in Safari; polyfill with a setTimeout fallback.
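The setTimeout fallback mentioned above might look like the following. It is an approximation, not a real idle detector: it simply runs the callback soon and grants a fixed time budget. `createIdleScheduler` and the 50ms budget are choices made for this sketch, and the `target` parameter exists only so the fallback path can be exercised in isolation.

```javascript
// Returns a function with the same call shape as requestIdleCallback.
function createIdleScheduler(target = globalThis) {
  if (typeof target.requestIdleCallback === 'function') {
    return (callback) => target.requestIdleCallback(callback);
  }
  // Fallback: no true idle detection, just a short delay and a fixed budget.
  const BUDGET_MS = 50;
  return (callback) =>
    target.setTimeout(() => {
      const start = Date.now();
      callback({
        didTimeout: false,
        timeRemaining: () => Math.max(0, BUDGET_MS - (Date.now() - start)),
      });
    }, 1);
}

const scheduleIdle = createIdleScheduler();
// scheduleIdle((deadline) => { /* do work while deadline.timeRemaining() > 0 */ });
```

Code written against the deadline object, such as the chunked loop that follows, then runs unchanged whichever path was taken.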
function processChunks(data) {
let index = 0;
function doWork(deadline) {
while (index < data.length && deadline.timeRemaining() > 0) {
// process data[index]
index++;
}
if (index < data.length) {
requestIdleCallback(doWork);
}
}
requestIdleCallback(doWork);
}
A junior engineer reaches for Web Workers or requestIdleCallback at the first sign of slowness. A staff engineer profiles with Chrome DevTools Performance panel, identifies tasks exceeding the 50ms long-task threshold, and measures the actual impact on INP (target <200ms). If the task takes 30ms and runs once per page load, the optimization is premature. Only invest when the bottleneck is confirmed.
Time-Slicing Long Tasks and the 50ms Threshold
The browser considers any task >50ms a long task, which blocks the main thread and delays user input. Break long tasks into chunks using setTimeout or scheduler.postTask with priority: 'user-visible'. The goal: yield to the event loop every ~50ms so the browser can process pending input and paint. This directly improves INP.
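// budgetMs is a time budget per chunk in milliseconds, not an item count.
// Pass a smaller value to leave headroom under the 50ms long-task threshold.
function timeSlice(array, processItem, budgetMs = 50) {
  let i = 0;
  function nextChunk() {
    const start = performance.now();
    while (i < array.length && performance.now() - start < budgetMs) {
      processItem(array[i]);
      i++;
    }
    if (i < array.length) {
      setTimeout(nextChunk, 0); // yield so pending input and paint can run
    }
  }
  nextChunk();
}
scheduler.postTask and Yielding to the Main Thread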
The scheduler.postTask API (available in Chromium) allows explicit priority control: 'user-blocking', 'user-visible', or 'background'. Use it to defer non-critical work without setTimeout hacks. It also supports signal for cancellation. For yielding, scheduler.yield() is the purpose-built way to voluntarily yield the main thread; it has shipped in recent Chromium releases but is not universal, so feature-detect it and fall back to setTimeout or postMessage.
// Defer low-priority analytics
scheduler.postTask(() => sendAnalytics(), { priority: 'background' });
// Yield to main thread (polyfill)
function yieldToMain() {
return new Promise(resolve => setTimeout(resolve, 0));
}
OffscreenCanvas and SharedArrayBuffer
OffscreenCanvas moves canvas rendering off the main thread via a Web Worker — critical for heavy drawing (e.g., data viz, games). SharedArrayBuffer enables shared memory between workers without serialization, but requires Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers. Use it for high-throughput data pipelines (e.g., audio processing, real-time collaboration). The security setup is non-trivial and blocks cross-origin iframes.
// main.js
const canvas = document.getElementById('myCanvas');
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker('canvas-worker.js');
worker.postMessage({ canvas: offscreen }, [offscreen]);
// canvas-worker.js
self.onmessage = (e) => {
const canvas = e.data.canvas;
const ctx = canvas.getContext('2d');
// draw heavy frames here
};
Using SharedArrayBuffer requires your site to be cross-origin isolated. This blocks loading cross-origin resources (e.g., CDN scripts, iframes) unless they opt in via Cross-Origin-Resource-Policy. For many apps, the isolation cost outweighs the performance gain. Only reach for it when you need sub-millisecond shared state between workers; otherwise, stick with postMessage and Transferable objects.
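For a sense of what shared memory looks like once isolation is in place, here is a minimal sketch of a counter that several threads could increment without message passing. The two views below live on one thread and merely stand in for a main-thread view and a worker view; the helper names are illustrative.

```javascript
// Response headers that enable cross-origin isolation in browsers:
//   Cross-Origin-Opener-Policy: same-origin
//   Cross-Origin-Embedder-Policy: require-corp

// In browsers, SharedArrayBuffer is only usable when crossOriginIsolated is true.
function sharedMemoryAvailable() {
  return typeof SharedArrayBuffer !== 'undefined' && (globalThis.crossOriginIsolated ?? true);
}

function createSharedCounter() {
  const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  return { buffer, view: new Int32Array(buffer) };
}

// Safe to call from any thread holding a view over the same buffer.
function increment(view) {
  return Atomics.add(view, 0, 1) + 1; // Atomics.add returns the previous value
}

const counter = createSharedCounter();
const secondView = new Int32Array(counter.buffer); // stands in for a worker's view
const first = increment(counter.view);
const second = increment(secondView);
const total = Atomics.load(counter.view, 0);
console.log(sharedMemoryAvailable(), first, second, total);
```

In a real setup the buffer would be sent to the worker once via postMessage, after which both sides read and write it directly through Atomics.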
You're optimizing a real-time collaborative text editor. Users report lag when typing, and profiling shows a 120ms task on every keystroke for syntax highlighting. Which approach should you take?
React Re-render Optimization
At scale, excessive React re-renders are the #1 cause of janky interactions and sluggish INP. A single unnecessary re-render of a deep component tree can blow past the 16ms frame budget and create long tasks >50ms. This section covers the real trade-offs: when memoization helps, when it hurts, and how to measure before you optimize.
React.memo and When Memoization Actually Hurts
React.memo performs a shallow comparison of props. It helps when a component re-renders with the same props due to a parent re-render. But it hurts when: (1) the component is cheap to render (a simple div), because the comparison cost outweighs the render cost; (2) props are always new objects/functions, making the comparison useless; (3) the component has many children that also need memoization, creating a cascade.
import React, { useCallback } from 'react';
// Bad: memoizing a cheap component that always gets new props
const ExpensiveButton = React.memo(({ onClick, label }) => {
return <button onClick={onClick}>{label}</button>;
});
// Parent creates new onClick every render — memoization is wasted
function Parent() {
return <ExpensiveButton onClick={() => alert('hi')} label="Click" />;
}
// Good: memoizing a heavy list item with stable props
const ListItem = React.memo(({ item, onSelect }) => {
return <div onClick={() => onSelect(item.id)}>{item.name}</div>;
});
// Parent passes stable onSelect via useCallback
function Parent({ items }) {
const onSelect = useCallback((id) => { /* ... */ }, []);
return items.map(item => <ListItem key={item.id} item={item} onSelect={onSelect} />);
}
Do not wrap every component in React.memo. The shallow comparison itself has a cost (budget roughly ~0.01ms per prop as a conservative estimate). For a component that renders in <0.1ms, memoization is often a net loss. Only memoize if the component re-renders frequently with the same props AND its render cost is >1ms (e.g., a chart, a large list, a complex form).
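The effect of unstable props on React.memo can be seen without React at all. The sketch below is a simplified stand-in for the default shallow comparison, not React's actual source; `shallowEqual` and the two `renderWith...` helpers are hypothetical names.

```javascript
// Simplified shallow comparison, similar in spirit to React.memo's default check.
function shallowEqual(prevProps, nextProps) {
  const prevKeys = Object.keys(prevProps);
  const nextKeys = Object.keys(nextProps);
  if (prevKeys.length !== nextKeys.length) return false;
  return prevKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(nextProps, key) &&
      Object.is(prevProps[key], nextProps[key])
  );
}

// Simulates a parent that creates a new inline handler on every render.
const renderWithInlineHandler = () => ({ label: 'Click', onClick: () => {} });

// Simulates a parent that reuses one handler (what useCallback provides).
const stableHandler = () => {};
const renderWithStableHandler = () => ({ label: 'Click', onClick: stableHandler });

const inlineSkips = shallowEqual(renderWithInlineHandler(), renderWithInlineHandler());
const stableSkips = shallowEqual(renderWithStableHandler(), renderWithStableHandler());

console.log(inlineSkips); // false: a memoized child would re-render every time
console.log(stableSkips); // true: a memoized child could skip the render
```

With the inline handler, the comparison runs on every render and never pays off, which is the wasted-memoization case described above.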
useMemo / useCallback: Cost-Benefit and Referential Equality
useMemo and useCallback are not free. They add memory overhead (storing the cached value) and a dependency comparison on every render. Their primary benefit is referential equality — preventing child re-renders when passed as props to a memoized component. Use them only when: (1) the computation is expensive (>1ms); (2) the value is passed to a memoized child; (3) the value is a dependency of another hook.
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| React.memo | Expensive leaf components with stable props | Shallow prop compare (~0.01ms/prop) | Cheap renders (<0.1ms), always-new props |
| useMemo | Expensive computations (>1ms), referential stability | Memory for cached value, dep array compare | Trivial calculations (<0.1ms) |
| useCallback | Stable function references for memoized children | Memory for cached function, dep array compare | Functions not passed to memoized children |
| State colocation | Local UI state (toggles, inputs) | None (it's the default) | Global state that needs sharing |
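The "dep array compare" cost in the table can be sketched as a single-slot cache. This is a hypothetical model of the useMemo contract (recompute only when a dependency changes under Object.is), not React's implementation; `createMemo` is an illustrative name.

```javascript
// Hypothetical single-slot cache that mimics the useMemo contract.
function createMemo() {
  let lastDeps = null;
  let lastValue;
  return function memo(compute, deps) {
    const unchanged =
      lastDeps !== null &&
      lastDeps.length === deps.length &&
      deps.every((dep, i) => Object.is(dep, lastDeps[i]));
    if (!unchanged) {
      lastValue = compute(); // the expensive part runs only here
      lastDeps = deps;
    }
    return lastValue;
  };
}

let computeCount = 0;
const memo = createMemo();
const items = [3, 1, 2];
const sortItems = () => {
  computeCount += 1;
  return [...items].sort((a, b) => a - b);
};

const first = memo(sortItems, [items]);     // computes
const second = memo(sortItems, [items]);    // same deps: cached, same reference
const third = memo(sortItems, [[3, 1, 2]]); // new array identity: recomputes

console.log(first === second); // true
console.log(computeCount);     // 2
```

The third call shows why a dependency that is recreated on every render makes the cache useless: the comparison is by identity, not by content.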
State Colocation and Pushing State Down
The simplest and most effective optimization: keep state as close to where it's used as possible. If only a child needs a piece of state, don't lift it to the parent. This prevents unnecessary re-renders of sibling subtrees. For example, a form input's value should live in the input component, not in a page-level store.
// Bad: state in parent causes entire list to re-render on every keystroke
function Page() {
const [search, setSearch] = useState('');
return (
<>
<SearchInput value={search} onChange={setSearch} />
<ExpensiveList /> {/* re-renders on every keystroke! */}
</>
);
}
// Good: push state down into SearchInput
function SearchInput() {
const [search, setSearch] = useState('');
return <input value={search} onChange={e => setSearch(e.target.value)} />;
}
function Page() {
return (
<>
<SearchInput />
<ExpensiveList /> {/* stable, no re-render */}
</>
);
}
Context Splitting and Selector Patterns
React Context re-renders all consumers when the context value changes. To limit this: (1) split context into separate providers for unrelated concerns (e.g., ThemeContext vs UserContext); (2) use useMemo on the context value to avoid unnecessary updates; (3) for frequent updates, use a selector pattern (like use-context-selector or Zustand) to only re-render components that subscribe to a specific slice.
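The selector idea in point (3) can be sketched without any library. This is a hypothetical mini-store, loosely in the spirit of selector-based stores; `createStore` and its method names are assumptions for illustration, not the API of Zustand or use-context-selector.

```javascript
// Hypothetical mini-store: listeners subscribe to a slice via a selector
// and are only notified when that slice changes.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    getState: () => state,
    setState(partial) {
      state = { ...state, ...partial };
      listeners.forEach((listener) => listener(state));
    },
    subscribe(selector, onChange) {
      let selected = selector(state);
      const listener = (nextState) => {
        const nextSelected = selector(nextState);
        if (!Object.is(nextSelected, selected)) {
          selected = nextSelected;
          onChange(nextSelected);
        }
      };
      listeners.add(listener);
      return () => listeners.delete(listener); // unsubscribe
    },
  };
}

const store = createStore({ user: null, theme: 'light' });
const themeUpdates = [];
const userUpdates = [];
store.subscribe((s) => s.theme, (theme) => themeUpdates.push(theme));
store.subscribe((s) => s.user, (user) => userUpdates.push(user));

store.setState({ theme: 'dark' });         // only the theme subscriber fires
store.setState({ user: { name: 'Ada' } }); // only the user subscriber fires

console.log(themeUpdates.length, userUpdates.length); // 1 1
```

A plain context would have notified both consumers on both updates; the selector comparison is what narrows the re-render to the affected slice.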
// Bad: single context with all state — any change re-renders all consumers
const AppContext = createContext();
function AppProvider({ children }) {
const [user, setUser] = useState(null);
const [theme, setTheme] = useState('light');
return <AppContext.Provider value={{ user, theme }}>{children}</AppContext.Provider>;
}
// Good: split contexts
const UserContext = createContext();
const ThemeContext = createContext();
function AppProvider({ children }) {
const [user, setUser] = useState(null);
const [theme, setTheme] = useState('light');
return (
<UserContext.Provider value={user}>
<ThemeContext.Provider value={theme}>
{children}
</ThemeContext.Provider>
</UserContext.Provider>
);
}
Key Stability and Reconciliation
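Stable key props are critical for efficient reconciliation. With array indices as keys, inserting, removing, or reordering items makes React match each component instance to a different item: every shifted row receives new props and re-renders, and local state (input values, focus, animations) sticks to the position instead of the item. Always use a unique, stable identifier (like a database ID). For static lists that are never reordered, filtered, or inserted into, index keys are acceptable but fragile.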
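A toy model makes the cost of index keys visible. The sketch below is a simplified stand-in for keyed reconciliation, not React's algorithm; `reconcile` and its counters are hypothetical.

```javascript
// Simplified model: an "instance" (component state + DOM node) is reused when
// its key matches a previous key, and mounted fresh otherwise.
function reconcile(prevItems, nextItems, keyOf) {
  const prevByKey = new Map(prevItems.map((item, i) => [keyOf(item, i), item]));
  let propsChanged = 0; // instances reused but now showing a different item
  let mounted = 0;      // brand-new instances
  nextItems.forEach((item, i) => {
    const key = keyOf(item, i);
    if (!prevByKey.has(key)) mounted += 1;
    else if (prevByKey.get(key).id !== item.id) propsChanged += 1;
  });
  return { propsChanged, mounted };
}

const before = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const after = [{ id: 'new' }, ...before]; // prepend one row

const byIndex = reconcile(before, after, (_item, index) => index);
const byId = reconcile(before, after, (item) => item.id);

console.log(byIndex); // { propsChanged: 3, mounted: 1 }
console.log(byId);    // { propsChanged: 0, mounted: 1 }
```

Prepending one row with index keys touches every existing row, while ID keys leave the three existing instances alone and mount only the new one.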
A staff engineer never optimizes without profiling. Use the React DevTools Profiler to identify actual re-render hotspots. Look for components that re-render more than expected or take >1ms to render. The 80/20 rule applies: 80% of performance gains come from fixing 20% of re-renders. Don't guess: measure with the Profiler's flamegraph and ranked chart.
Why You Usually DON'T Need to Memoize
React is fast for most UIs. A typical component renders in <0.5ms. Memoization adds complexity and can mask design issues (like lifting state too high). Only memoize when the Profiler shows a clear bottleneck. For apps with <1000 components, premature memoization often does more harm than good. The React team recommends starting without memoization and adding it only when needed.
The React Compiler
The React Compiler (formerly React Forget) automatically memoizes components and hooks at build time. It analyzes your code and emits memoization equivalent to what you would write by hand with React.memo, useMemo, and useCallback, wherever it can verify that the code follows the Rules of React. This eliminates manual memoization for most cases. However, it's not a silver bullet: it can't fix architectural issues such as state lifted too high or an oversized context. As of React 19, it's opt-in via a Babel plugin (babel-plugin-react-compiler). Expect it to remove most, though not all, manual memoization.
DevTools Profiler
The React DevTools Profiler is your primary tool. Record a session, then inspect the flamegraph: colored bars are components that rendered in the selected commit (warmer colors took longer), while gray bars did not render at all. The ranked chart sorts the components in a commit by render time. A component taking >16ms is a red flag. To see why something rendered, enable 'Record why each component rendered while profiling' in the Profiler settings, or use the why-did-you-render library to log re-renders caused by unstable props.
You profile a React app and find a <ListItem> component re-renders 200 times on a single user interaction. The component renders a simple <div> with text. What should you do?
React Concurrent & Loading
At scale, expensive renders and data fetches can freeze the UI, pushing Interaction to Next Paint (INP) past the 200ms threshold. React's concurrent features let you mark updates as interruptible, defer non-critical work, and stream in code and data without blocking the main thread. The goal is to keep the frame budget under 16ms even during large state transitions.
Suspense for Code & Data Splitting
React.lazy enables code splitting at the component level, loading chunks only when rendered. Combined with Suspense, you can show a fallback while the chunk loads. For data, Suspense integrates with frameworks like Relay or Next.js to suspend rendering until async data is ready. This avoids waterfall loading and keeps the initial bundle lean (< 100kB for critical path).
import React, { Suspense, lazy } from 'react';
const HeavyChart = lazy(() => import('./HeavyChart'));
function Dashboard() {
return (
<Suspense fallback={<div>Loading chart...</div>}>
<HeavyChart />
</Suspense>
);
}
useTransition for Non-Urgent Updates
useTransition marks a state update as low priority, allowing React to interrupt it if a higher-priority update (like a user input) arrives. This prevents long tasks (>50ms) from blocking the UI. Use it for filtering large lists or navigating between views where the result can be slightly delayed.
import { useTransition, useState } from 'react';
function SearchPage() {
const [query, setQuery] = useState('');
const [isPending, startTransition] = useTransition();
function handleChange(e) {
startTransition(() => {
setQuery(e.target.value);
});
}
return (
<div>
<input onChange={handleChange} />
{isPending && <Spinner />}
<SearchResults query={query} />
</div>
);
}
useDeferredValue for Expensive Derived UI
useDeferredValue lets you keep the old value on screen while a new, expensive value is computed in the background. Unlike useTransition, it doesn't wrap a state update—it defers the value itself. Ideal for memoized computations (e.g., filtering a 10,000-item list) where you want the input to stay responsive.
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| useTransition | Marking state updates as low priority | Extra re-render for pending state | Updates that must be synchronous (e.g., form validation) |
| useDeferredValue | Deferring derived values from props/state | Memory overhead for keeping old value | Simple computations (< 1ms) |
| React.lazy + Suspense | Code splitting large components | Network latency for chunk load | Tiny components (< 5kB) or SSR without streaming |
| startTransition | Non-urgent updates outside hooks | No built-in pending indicator | Inside event handlers that need immediate feedback |
Concurrent Rendering, Interruption, and startTransition
Concurrent rendering allows React to work on multiple versions of the UI at once. When a higher-priority update arrives, React interrupts the current render, discards partial work, and starts the new one. startTransition is the imperative API to wrap updates outside hooks. This reduces INP by ensuring user interactions are never blocked by a long render (>50ms).
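The interrupt-and-discard behavior can be modeled in plain JavaScript. This is a conceptual sketch only: React's scheduler yields to the browser between units of work, whereas this model just checks a flag. The names `filterInChunks` and `runInterruptible` are hypothetical.

```javascript
// Hypothetical time-sliced filter: the generator yields between chunks so a
// scheduler can check for more urgent work and abandon the stale pass.
function* filterInChunks(items, predicate, chunkSize) {
  const result = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      if (predicate(item)) result.push(item);
    }
    yield; // interruption point
  }
  return result;
}

// Runs the work unless `isStale()` reports that a newer update superseded it.
function runInterruptible(work, isStale) {
  let step = work.next();
  while (!step.done) {
    if (isStale()) return { interrupted: true, value: null }; // discard partial work
    step = work.next();
  }
  return { interrupted: false, value: step.value };
}

const numbers = Array.from({ length: 10 }, (_, i) => i);
const isEven = (n) => n % 2 === 0;

// No competing update: the pass completes.
const done = runInterruptible(filterInChunks(numbers, isEven, 3), () => false);

// A newer update arrives on the third check: partial work is thrown away.
let checks = 0;
const dropped = runInterruptible(filterInChunks(numbers, isEven, 3), () => ++checks > 2);

console.log(done.value);          // [ 0, 2, 4, 6, 8 ]
console.log(dropped.interrupted); // true
```

The key property is that the stale pass never commits a result, which is why an interrupted transition does not show half-filtered output.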
Don't reach for useTransition or useDeferredValue until you've measured a long task (>50ms) or INP >200ms. Premature concurrency adds complexity and can cause visual jank if the deferred value lags too much. Profile with React DevTools or Chrome Performance panel before optimizing.
The Tearing Problem and useSyncExternalStore
Tearing occurs when concurrent renders see inconsistent external state (e.g., from Redux or Zustand). useSyncExternalStore ensures that React always reads a consistent snapshot, preventing visual tearing. It's required for any external store used with concurrent features. Without it, users might see half-updated UI during a transition.
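On the store side, the contract is a `subscribe` function plus a `getSnapshot` function that returns the same reference until the data changes. The sketch below is a hypothetical store that satisfies that contract; `createTodoStore` and `addTodo` are illustrative names.

```javascript
// Hypothetical store exposing the subscribe/getSnapshot pair that
// useSyncExternalStore expects. Snapshots are immutable and cached.
function createTodoStore() {
  let snapshot = Object.freeze([]);
  const listeners = new Set();
  return {
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener); // unsubscribe
    },
    // Must return the same reference until the data actually changes.
    getSnapshot: () => snapshot,
    addTodo(text) {
      snapshot = Object.freeze([...snapshot, { text }]); // replace, never mutate
      listeners.forEach((listener) => listener());
    },
  };
}

const todoStore = createTodoStore();
let notifications = 0;
const unsubscribe = todoStore.subscribe(() => { notifications += 1; });

const beforeAdd = todoStore.getSnapshot();
const stableRead = beforeAdd === todoStore.getSnapshot(); // cached reference
todoStore.addTodo('write docs');
const changed = beforeAdd !== todoStore.getSnapshot();    // new reference
unsubscribe();
todoStore.addTodo('not seen by the removed listener');

console.log(stableRead, changed, notifications); // true true 1
```

In a component this store would be read with `useSyncExternalStore(todoStore.subscribe, todoStore.getSnapshot)`. Returning a freshly built array from `getSnapshot` on every call would break the contract and cause repeated re-renders.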
import { useSyncExternalStore } from 'react';
function subscribe(callback) {
window.addEventListener('online', callback);
return () => window.removeEventListener('online', callback);
}
function getSnapshot() {
return navigator.onLine;
}
function OnlineStatus() {
const isOnline = useSyncExternalStore(subscribe, getSnapshot);
return <div>{isOnline ? 'Online' : 'Offline'}</div>;
}
Don't wrap every state update in useTransition. If the update is small (< 1ms of work), the overhead of the transition mechanism (extra re-render, pending state) can actually increase latency. With useDeferredValue, remember that the deferred value lags while its source keeps changing (e.g., on every keystroke) and only catches up once a background render completes, so signal staleness in the UI (for example, by dimming the old results) rather than presenting stale output as current.
You have a search input that filters a 50,000-item list. The filter function takes 80ms. The input feels sluggish. Which approach should you take?
SSR & Hydration Cost
Server-side rendering (SSR) improves FCP (and usually LCP) by sending pre-rendered HTML, though server render time can add to TTFB. It also introduces a hydration tax: the client must download, parse, and execute the JavaScript bundle to hydrate the DOM, delaying TTI and hurting INP. At scale, this tax can push interactivity beyond 5 seconds on slow devices, eroding the initial gain. The staff-level challenge is not choosing SSR vs. CSR, but how to minimize the cost of making the page interactive.
The Hydration Tax: Why TTI Lags Behind FCP
Hydration is the process of attaching event listeners and re-running component logic on the client. The cost is proportional to the size of the JavaScript bundle and the number of DOM nodes. A typical Next.js page with 50 components can require 200-400ms of main-thread work on a mid-range phone, blocking user input. This is the hydration tax: the browser must replay the entire component tree, even if only a button needs interactivity.
// Example: observing first-input delay and long tasks during hydration
// 'first-input' reports the first discrete interaction (click, tap, key press)
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Delay between the interaction and the start of its event handler
    const inputDelay = entry.processingStart - entry.startTime;
    console.log('First input delay (ms):', inputDelay);
  }
}).observe({ type: 'first-input', buffered: true });
// Long tasks are main-thread tasks of 50ms or more
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.warn('Long task blocking hydration:', entry.duration);
  }
}).observe({ type: 'longtask', buffered: true });
Streaming SSR and Progressive Hydration
Streaming SSR sends HTML in chunks, allowing the browser to paint content before the full response arrives. Progressive hydration extends this by hydrating components in order of priority: visible, interactive elements first. This reduces the time to first interaction from the full hydration time to the time for the first chunk. However, it requires careful orchestration to avoid hydration mismatches when streaming.
// React 18: Streaming SSR with Suspense boundaries
import { Suspense } from 'react';
import { renderToPipeableStream } from 'react-dom/server';
function App() {
return (
<html>
<body>
<Header /> {/* Hydrated immediately */}
<Suspense fallback={<Spinner />}>
<SlowComponent /> {/* Streamed and hydrated later */}
</Suspense>
</body>
</html>
);
}
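// Server-side streaming: `response` is the Node.js HTTP response for this request
const { pipe } = renderToPipeableStream(<App />, {
  onShellReady() {
    // The shell (everything outside Suspense boundaries) is ready to send
    response.setHeader('content-type', 'text/html');
    pipe(response);
  },
  onShellError(error) {
    // The shell itself failed to render: fall back to an error page
    response.statusCode = 500;
    response.end('<h1>Something went wrong</h1>');
  }
});
Selective Hydration and Islands Architecture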
Islands architecture (used by Astro, Marko) treats interactive components as isolated 'islands' in a sea of static HTML. Only the islands are hydrated, reducing the total hydration cost to the sum of their individual bundles. Selective hydration (React 18) hydrates each Suspense boundary independently and prioritizes the boundary the user interacts with first. The trade-off: islands require explicit boundaries, increasing developer complexity and potentially fragmenting state management.
| Technique | Best For | Cost | Avoid When |
|---|---|---|---|
| Full SSR + Hydration | Simple pages, small bundles (<50kB) | High TTI, blocks main thread | Complex apps with large JS bundles |
| Streaming SSR + Progressive Hydration | Content-heavy pages, news sites | Moderate complexity, risk of mismatches | Real-time apps needing instant interactivity |
| Islands Architecture | Marketing sites, dashboards with few interactive widgets | Lower hydration cost, but state management overhead | Highly interactive apps (e.g., Figma, Google Docs) |
| React Server Components | Data-heavy pages, e-commerce product lists | Zero client JS for server components, but RSC payload size | Apps requiring client-side interactivity on every component |
React Server Components: A Hydration-Reduction Strategy
React Server Components (RSC) run entirely on the server, sending a serialized RSC payload (not HTML) to the client. This payload is used to render the component tree without downloading any JavaScript for server components. Only client components (marked with 'use client') are hydrated. This can reduce the bundle size by 40-60% for data-heavy pages, directly lowering the hydration tax. The trade-off: RSC payloads can be large (e.g., 10-20kB for a product list) and require a server runtime.
A junior might immediately reach for islands or streaming. A staff engineer first instruments Long Tasks and INP (which replaced First Input Delay as a Core Web Vital in March 2024) to quantify the actual hydration cost. If the total hydration time is under 200ms on a Moto G4, the optimization may be premature. Use performance.measure around the hydration root to get concrete numbers.
Partial Hydration and Resumability (Qwik)
Partial hydration (e.g., Marko) hydrates only the components that need interactivity, skipping static ones. Resumability (Qwik) takes this further: instead of replaying component logic, it serializes the application state in HTML and resumes execution on interaction. This eliminates hydration entirely, achieving near-zero JavaScript on page load. The cost: a larger initial HTML payload (state serialization) and a steeper learning curve. Qwik claims sub-50ms TTI even on slow devices, but the trade-off is compatibility with existing React ecosystems.
// Qwik: Resumable component (no hydration)
import { component$, useSignal } from '@builder.io/qwik';
export const Counter = component$(() => {
const count = useSignal(0);
return (
<button onClick$={() => count.value++}>
{count.value}
</button>
);
});
// The HTML includes serialized state, not JavaScript
// <button on:click="q://...">0</button>
Avoid islands or resumability if your app has fewer than 5 interactive components or a total bundle size under 30kB. The overhead of splitting into islands (build tooling, state management) can outweigh the benefits. Similarly, don't use streaming SSR if your server response time is already under 200ms: the complexity of handling streaming errors and mismatches isn't worth it.
Measuring Hydration Cost
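To measure hydration cost, track long tasks during startup and the responsiveness of early interactions (INP; First Input Delay and Lighthouse's TTI are legacy metrics). Use the Long Tasks API to identify tasks over 50ms that block the main thread. A rough proxy: hydration time ≈ TTI - FCP. For React, place performance.mark calls around ReactDOM.hydrateRoot, and record the end mark from an effect in the root component, because hydrateRoot can return before the hydration work has finished. Aim for hydration time < 200ms on a mid-range device (Moto G4). If it exceeds 500ms, consider islands or RSC.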
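The marking step can be wrapped in a small helper built on the User Timing API. This is a sketch under assumptions: `createHydrationTimer` and the mark names are hypothetical, and `simulateHydration` is a stand-in so the example runs outside a browser.

```javascript
// Hypothetical helper around performance.mark / performance.measure.
// Call start() right before hydration begins and end() once the app reports
// that hydration finished (for example, from an effect in the root component).
function createHydrationTimer(perf = performance) {
  return {
    start() {
      perf.mark('hydration-start');
    },
    end() {
      perf.mark('hydration-end');
      // Some older runtimes return undefined from measure(); fall back to a lookup.
      const measure =
        perf.measure('hydration', 'hydration-start', 'hydration-end') ??
        perf.getEntriesByName('hydration', 'measure').pop();
      return measure.duration; // milliseconds
    },
  };
}

// Stand-in for real hydration work.
function simulateHydration() {
  let total = 0;
  for (let i = 0; i < 1e5; i += 1) total += i;
  return total;
}

const timer = createHydrationTimer();
timer.start();
simulateHydration();
const hydrationMs = timer.end();
console.log(`hydration took ${hydrationMs.toFixed(2)}ms`);
```

The resulting 'hydration' measure also shows up in the Performance panel's Timings track, and the duration can be sent to a RUM endpoint to compare against the 200ms and 500ms guide values.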
Your team is building a high-traffic e-commerce product listing page. The page has 50 product cards, each with an 'Add to Cart' button. Current metrics: FCP = 1.2s, TTI = 4.8s, bundle size = 180kB. Which optimization strategy should you prioritize?
Network & Caching
When requests are slow or there are too many round-trips, the bottleneck is rarely the server's processing time — it's the network. At scale, every millisecond of latency compounds across millions of users, directly impacting LCP, INP, and bounce rates. This section covers the caching and protocol strategies that eliminate redundant trips, compress payloads, and keep the critical path under 2.5s LCP even on 3G.
HTTP Caching: Cache-Control Directives
The Cache-Control header is your primary lever for avoiding round-trips. max-age sets the freshness lifetime in seconds. immutable tells the browser never to revalidate during that lifetime — critical for versioned assets like app.abc123.js. stale-while-revalidate allows serving stale content while fetching a fresh copy in the background, hiding latency from the user.
// Server response for a versioned JS bundle
Cache-Control: public, max-age=31536000, immutable
// Server response for an API endpoint that can serve stale data
Cache-Control: public, max-age=60, stale-while-revalidate=3600
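One way to keep these rules consistent is a single policy function on the server. The sketch below is illustrative: `cacheControlFor`, the hashed-filename pattern, and the `/api/` prefix are assumptions about a project layout, not a standard.

```javascript
// Hypothetical policy function: maps a request path to a Cache-Control value.
// Assumes hashed assets look like name.<6+ hex chars>.ext and APIs live under /api/.
const HASHED_ASSET = /\.[0-9a-f]{6,}\.(js|css|woff2|png|svg)$/;

function cacheControlFor(path) {
  if (HASHED_ASSET.test(path)) {
    // The URL changes on every deploy, so the response can be cached for a year.
    return 'public, max-age=31536000, immutable';
  }
  if (path.startsWith('/api/')) {
    // Fresh for 60s, then served stale for up to an hour while revalidating.
    return 'public, max-age=60, stale-while-revalidate=3600';
  }
  // HTML and other unversioned files: always revalidate before use.
  return 'no-cache';
}

console.log(cacheControlFor('/static/app.abc123.js')); // public, max-age=31536000, immutable
console.log(cacheControlFor('/api/products'));         // public, max-age=60, stale-while-revalidate=3600
console.log(cacheControlFor('/index.html'));           // no-cache
```

Keeping the HTML entry point on `no-cache` is what lets a deploy take effect immediately: the document is revalidated, and it then points at the new hashed asset URLs.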
ETag and Conditional Requests
ETags enable lightweight validation: the browser sends If-None-Match with the stored ETag, and the server returns 304 Not Modified with an empty body. This saves bandwidth but still costs one round-trip. Use ETags for dynamic resources that change infrequently but can't have a long max-age. For static assets, prefer immutable with a content-hash filename — that eliminates the round-trip entirely.
// Request with ETag validation
GET /api/user/profile HTTP/1.1
If-None-Match: "686897696a7c876b7e"
// Response when unchanged
HTTP/1.1 304 Not Modified
ETag: "686897696a7c876b7e"
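The server side of that exchange can be sketched as a pure function. Everything here is illustrative: `respondWithETag` is a hypothetical handler, and the small FNV-1a hash stands in for whatever digest your server or framework actually uses.

```javascript
// Illustrative only: a tiny FNV-1a hash stands in for a real content digest.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Hypothetical handler: 304 with no body when the client's If-None-Match
// matches the current ETag, otherwise 200 with the full body.
function respondWithETag(requestHeaders, body) {
  const etag = `"${body.length}-${fnv1a(body)}"`; // length + hash of the content
  if (requestHeaders['if-none-match'] === etag) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body };
}

const profile = JSON.stringify({ id: 1, name: 'Ada' });
const firstResponse = respondWithETag({}, profile);
const revalidation = respondWithETag({ 'if-none-match': firstResponse.headers.ETag }, profile);
const afterChange = respondWithETag(
  { 'if-none-match': firstResponse.headers.ETag },
  JSON.stringify({ id: 1, name: 'Grace' })
);

console.log(firstResponse.status, revalidation.status, afterChange.status); // 200 304 200
```

The 304 path still computes the ETag, so the saving is bandwidth, not server work, unless the ETag is stored alongside the resource.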
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| max-age + immutable | Versioned static assets (JS, CSS, fonts) | Low; no revalidation | Unversioned or frequently changing resources |
| ETag + 304 | Dynamic API responses, user-specific data | One round-trip per validation | High-frequency polling; use stale-while-revalidate instead |
| stale-while-revalidate | News feeds, product listings | Background fetch; no user wait | Real-time data (chat, stock prices) |
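| no-cache | Content that must be revalidated on every use (e.g., HTML entry documents) | One validation round-trip each time (304 if unchanged) | Static assets (wastes round-trips); secrets such as CSRF tokens (use no-store) |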
A junior engineer sets max-age to 3600 on everything. A staff engineer measures the cache hit rate, the cost of a miss (e.g., 200ms vs 50ms), and the frequency of content changes. They know that immutable is only safe when the URL changes on every deploy — otherwise you break updates. Always validate with Lighthouse or WebPageTest before deciding on a caching strategy.
CDN and Edge Caching
CDNs cache at the edge, reducing latency from 200ms (cross-continent) to <20ms (local PoP). Use a CDN for static assets and cacheable API responses. Set s-maxage for CDN-specific TTLs, stale-while-revalidate to refresh in the background, and stale-if-error to keep serving stale content during origin failures. For dynamic content, consider edge workers (e.g., Cloudflare Workers, Lambda@Edge) to cache personalized responses with short TTLs.
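Avoid caching authenticated or user-specific data at the edge unless the cache key includes the user's identity (for example, a custom cache key on the session token, or Vary on the relevant request header); otherwise send Cache-Control: private so only the browser stores it. Caching a user's private dashboard for all visitors is a data leak. Also, don't cache POST responses: POST is not idempotent, and shared caches generally won't store them. Stick to GET and HEAD.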
HTTP/2 Multiplexing and Server Push Deprecation
HTTP/2 multiplexes multiple streams over a single TCP connection, eliminating HTTP-level head-of-line blocking (a lost TCP packet still stalls every stream on that connection). N assets no longer need N connections or queued requests. However, HTTP/2 server push was deprecated in Chrome (2022) because it often pushed resources the browser already cached, wasting bandwidth. Instead, use 103 Early Hints or preload hints in the HTML.
<!-- Use preload instead of server push -->
<link rel="preload" href="/styles/main.css" as="style">
<link rel="preload" href="/scripts/app.js" as="script">
<!-- 103 Early Hints (server response before full HTML) -->
HTTP/1.1 103 Early Hints
Link: </styles/main.css>; rel=preload; as=style
HTTP/3 and QUIC Benefits
HTTP/3 uses QUIC over UDP, eliminating TCP head-of-line blocking and reducing connection setup from 2-3 round-trips (TCP plus TLS 1.3 or TLS 1.2) to 1, or 0 for resumed connections. On mobile networks with packet loss, QUIC can improve load times by 15-30% because it doesn't stall all streams when one packet is lost. Enable HTTP/3 on your CDN; most major CDNs support it. Browsers typically discover HTTP/3 through the Alt-Svc response header, so make sure your HTTP/2 responses send it.
Compression: Brotli vs Gzip
Brotli compresses text assets 20-30% better than gzip at level 4-6, with similar decompression speed. Use Brotli for HTML, CSS, JS, and SVG. Keep gzip as a fallback for clients that don't list br in Accept-Encoding. Don't recompress formats that are already compressed, such as WOFF/WOFF2 fonts, JPEG, or PNG. The server picks the encoding from the request's Accept-Encoding header and labels the response with Content-Encoding: br. For dynamic content, use Brotli level 4-5 to balance compression speed and ratio; for static assets compressed ahead of time, level 11 is fine.
# Nginx configuration for Brotli (requires the ngx_brotli module)
brotli on;
brotli_comp_level 6;
# text/html is always compressed, so it is not listed here
brotli_types text/css application/javascript image/svg+xml;
# Apache configuration (mod_brotli)
AddOutputFilterByType BROTLI_COMPRESS text/html text/css application/javascript
Connection Reuse and Cache Hierarchy
Connection reuse reduces DNS lookups, TCP handshakes, and TLS negotiations. Use keep-alive (default in HTTP/1.1) and HTTP/2 multiplexing. The cache hierarchy, in lookup order, is: browser memory cache (<10ms) → service worker cache (0-100ms) → browser disk (HTTP) cache (10-50ms) → CDN edge cache (10-100ms) → origin server (100-500ms). Optimize for the fastest layer: responses with Cache-Control: max-age=... immutable are served from the browser cache with no network request, and a service worker adds offline-first strategies on top.
Target: LCP < 2.5s, INP < 200ms, CLS < 0.1. Each round-trip adds ~50ms on 4G, ~300ms on 3G. A 100kB Brotli-compressed JS file (vs 130kB gzip) saves ~30ms on 4G. Use Server-Timing headers to measure cache hit/miss at each layer.
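The ~30ms figure follows from simple arithmetic once a throughput is fixed. The sketch below assumes an effective 4G throughput of about 8 Mbps and ignores TCP slow start; both are assumptions, and `transferTimeMs` and `roundTripCostMs` are hypothetical helper names.

```javascript
// Assumption: steady-state throughput, no slow start, no packet loss.
function transferTimeMs(bytes, megabitsPerSecond) {
  const bits = bytes * 8;
  return (bits * 1000) / (megabitsPerSecond * 1_000_000);
}

// Cost of sequential round-trips at a given RTT.
function roundTripCostMs(roundTrips, rttMs) {
  return roundTrips * rttMs;
}

const gzipBytes = 130_000;
const brotliBytes = 100_000;

const savedMs = transferTimeMs(gzipBytes - brotliBytes, 8);
const threeHopsOn3G = roundTripCostMs(3, 300);

console.log(savedMs);       // 30
console.log(threeHopsOn3G); // 900
```

The comparison is the useful part: at these assumed numbers, removing one 3G round-trip is worth ten times the Brotli saving, so eliminate round-trips before tuning compression levels.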
You have a versioned CSS file (style.a1b2c3.css) that changes once a month. Which caching strategy minimizes round-trips while ensuring users get updates promptly?
Data Fetching Optimization
When your data layer over-fetches, waterfalls, or refetches needlessly, you waste bandwidth, block interactivity, and degrade Core Web Vitals. At scale, each unnecessary request adds latency and server load, pushing LCP beyond 2.5s and INP above 200ms. This section covers the patterns and trade-offs to eliminate these inefficiencies with judgement, not just tools.
Client Cache with React Query / SWR (Stale-While-Revalidate)
Both React Query and SWR implement stale-while-revalidate: serve cached data instantly (even if stale), then refetch in the background. This eliminates synchronous loading states and reduces perceived latency. The key trade-off is stale time — too short refetches on every mount, too long shows outdated data. Set staleTime based on data volatility: user profile (5-30 min), feed (30s-2min), real-time (0).
import { useQuery } from '@tanstack/react-query';
function Profile({ userId }) {
const { data, isLoading } = useQuery({
queryKey: ['user', userId],
queryFn: () => fetch(`/api/user/${userId}`).then(r => r.json()),
staleTime: 5 * 60 * 1000, // 5 min — user data rarely changes
gcTime: 30 * 60 * 1000, // keep in cache 30 min
});
// ...
}
A junior adds caching blindly. A staff engineer instruments cache hit/miss ratios and measures time-to-interactive before and after. If your cache hit rate is below 80%, your staleTime or query key design is wrong. Use browser DevTools or RUM to confirm.
Prefetching and Preloading Data on Intent
Prefetch data before the user navigates — on hover, focus, or intersection. This hides network latency behind user thinking time. Use queryClient.prefetchQuery in React Query or preload in Next.js. The cost: extra bandwidth if the user doesn't follow through. Only prefetch when probability > 50% (e.g., hover on a link, not on page load).
import { useQueryClient } from '@tanstack/react-query';
function ProductCard({ productId }) {
const queryClient = useQueryClient();
const prefetch = () => {
queryClient.prefetchQuery({
queryKey: ['product', productId],
queryFn: () => fetch(`/api/product/${productId}`).then(r => r.json()),
staleTime: 60 * 1000,
});
};
return (
<a href={`/product/${productId}`} onMouseEnter={prefetch}>
{productId}
</a>
);
}
Request Deduplication
When multiple components mount simultaneously and request the same data, deduplication merges them into one network call. React Query and SWR do this automatically by queryKey. Without it, you waste bandwidth and risk race conditions. The cost: a tiny CPU overhead for key comparison (< 0.1ms per key). Always use stable, serializable keys.
If your API returns user-specific data (e.g., /api/orders?userId=X), deduplication across users is dangerous. Ensure query keys include all parameters that affect the response. Also, avoid deduplication for real-time streams (WebSocket) — use a different cache layer.
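The mechanism behind deduplication is an in-flight map keyed by the serialized request. This is a minimal sketch of the idea, not how React Query or SWR implement it; `dedupedFetch`, `ordersKey`, and `fakeFetchOrders` are hypothetical names.

```javascript
// Hypothetical in-flight cache: concurrent callers with the same key share
// one promise; the entry is cleared once the request settles.
const inFlight = new Map();

function dedupedFetch(key, fetcher) {
  if (inFlight.has(key)) return inFlight.get(key);
  const promise = Promise.resolve()
    .then(fetcher)
    .finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// The key includes every parameter that affects the response (here: userId).
const ordersKey = (userId) => JSON.stringify(['orders', { userId }]);

let networkCalls = 0;
const fakeFetchOrders = () => {
  networkCalls += 1;
  return Promise.resolve([{ id: 1 }]);
};

const firstCaller = dedupedFetch(ordersKey(42), fakeFetchOrders);
const secondCaller = dedupedFetch(ordersKey(42), fakeFetchOrders); // same key: shared promise
const otherUser = dedupedFetch(ordersKey(7), fakeFetchOrders);     // different user: new request

Promise.all([firstCaller, secondCaller, otherUser]).then(() => {
  console.log(networkCalls); // 2
});
```

Leaving `userId` out of the key would make the second user receive the first user's orders, which is the failure mode the warning above describes.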
Eliminating Over-Fetching: GraphQL Field Selection and REST Sparse Fieldsets
Over-fetching sends more data than needed, increasing payload size and parse time. GraphQL solves this natively with field selection. For REST, implement ?fields=id,name,email (sparse fieldsets). The trade-off: more complex API contracts and potential N+1 queries if fields require joins. Measure payload size — if you save < 1kB per request, the complexity isn't worth it.
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| GraphQL field selection | Complex UIs with varying data needs | Schema maintenance, resolver complexity | Simple CRUD with fixed views |
| REST sparse fieldsets | Legacy APIs, simple clients | Backend parsing overhead, caching granularity | Payloads < 2kB already |
| Over-fetching (no optimization) | Prototypes, low-traffic pages | Bandwidth waste, slower parse time | Any page with > 10 requests or > 50kB total |
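On the server, a sparse fieldset is a projection applied before serialization. The sketch below assumes a `?fields=` query parameter with comma-separated names and silently ignores unknown fields; `pickFields` is a hypothetical helper, and a real API would also need to decide how to handle nested fields.

```javascript
// Hypothetical server-side helper for ?fields=id,name,email.
function pickFields(resource, fieldsParam) {
  if (!fieldsParam) return resource; // no filter requested: full payload
  const requested = fieldsParam
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  const result = {};
  for (const name of requested) {
    if (Object.prototype.hasOwnProperty.call(resource, name)) {
      result[name] = resource[name];
    }
  }
  return result;
}

const user = {
  id: 7,
  name: 'Ada',
  email: 'ada@example.com',
  bio: 'x'.repeat(2000), // large field the list view never shows
};

const query = new URLSearchParams('fields=id,name,email');
const sparse = pickFields(user, query.get('fields'));
const bytesSaved = JSON.stringify(user).length - JSON.stringify(sparse).length;

console.log(Object.keys(sparse)); // [ 'id', 'name', 'email' ]
console.log(bytesSaved > 1000);   // true
```

Measuring `bytesSaved` per endpoint is how to apply the 1kB rule of thumb: if the projection saves less than that, the extra API surface is probably not worth it.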
Cursor vs Offset Pagination
Offset pagination (?page=2&limit=20) is simple but breaks when items are inserted/deleted: users see duplicates or gaps. Cursor pagination (?cursor=abc123&limit=20) is stable and faster for large datasets because the database can seek directly to the cursor through an index instead of scanning and discarding the skipped rows. The trade-off: cursor pagination requires a unique, sortable cursor (often a monotonically increasing ID, or a timestamp combined with an ID as a tie-breaker) and is harder to implement for 'jump to page N' UX. For infinite scroll, always use cursor. For paginated tables with page numbers, offset is acceptable if data is static.
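The duplicate problem is easy to reproduce with an in-memory feed. The sketch below is a simplified model: `pageByOffset` and `pageByCursor` are hypothetical stand-ins for the corresponding database queries, and the numeric `id` is assumed to increase with creation time.

```javascript
// In-memory feed, newest first. `id` acts as a monotonically increasing cursor.
let feed = [{ id: 5 }, { id: 4 }, { id: 3 }, { id: 2 }, { id: 1 }];

const pageByOffset = (offset, limit) => feed.slice(offset, offset + limit);

// Cursor = id of the last item the client saw; return items older than it.
const pageByCursor = (cursor, limit) =>
  feed.filter((post) => cursor === null || post.id < cursor).slice(0, limit);

const ids = (posts) => posts.map((post) => post.id);

const firstByOffset = pageByOffset(0, 2);    // ids 5, 4
const firstByCursor = pageByCursor(null, 2); // ids 5, 4

feed = [{ id: 6 }, ...feed]; // a new post arrives while the user is reading

const secondByOffset = pageByOffset(2, 2);
const secondByCursor = pageByCursor(firstByCursor[1].id, 2);

console.log(ids(secondByOffset)); // [ 4, 3 ]  (4 was already shown: duplicate)
console.log(ids(secondByCursor)); // [ 3, 2 ]  (continues where the user left off)
```

In SQL terms the cursor query corresponds to something like `WHERE id < :cursor ORDER BY id DESC LIMIT :limit`, which an index on `id` can serve without reading the skipped rows.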
// Cursor-based pagination with React Query
async function fetchPosts({ pageParam = null }) {
const params = new URLSearchParams({ limit: '20' });
if (pageParam) params.set('cursor', pageParam);
const res = await fetch(`/api/posts?${params}`);
const data = await res.json();
return { posts: data.posts, nextCursor: data.next_cursor };
}
useInfiniteQuery({
queryKey: ['posts'],
queryFn: fetchPosts,
initialPageParam: null,
getNextPageParam: (lastPage) => lastPage.nextCursor,
});
Optimistic Updates and Normalized Caching
Optimistic updates apply UI changes immediately before the server confirms, reducing perceived latency to < 50ms. Normalized caching (e.g., normalizr, or the normalized caches in Apollo Client and Relay) stores each entity once and references it by ID, preventing stale data across views. React Query does not normalize; its structural sharing only keeps unchanged parts of a result referentially stable. The cost: rollback logic on failure, and increased cache complexity. Only use optimistic updates for actions with high success probability (> 95%) and fast rollback (< 200ms).
// Optimistic update with React Query
// Assumes `queryClient` comes from useQueryClient() in the enclosing component
const mutation = useMutation({
mutationFn: (newTodo) => fetch('/api/todos', { method: 'POST', body: JSON.stringify(newTodo) }),
onMutate: async (newTodo) => {
await queryClient.cancelQueries({ queryKey: ['todos'] });
const previous = queryClient.getQueryData(['todos']);
queryClient.setQueryData(['todos'], (old = []) => [...old, { ...newTodo, id: 'temp' }]);
return { previous };
},
onError: (err, newTodo, context) => {
queryClient.setQueryData(['todos'], context.previous);
},
onSettled: () => queryClient.invalidateQueries({ queryKey: ['todos'] }),
});
Waterfall Elimination
Waterfalls occur when requests depend on previous responses (e.g., fetch user, then fetch their orders). Each hop adds RTT (often 50-200ms). Eliminate by: (1) parallelizing independent requests with Promise.all, (2) using server-side data aggregation (BFF pattern), or (3) prefetching dependent data. The goal: reduce waterfall depth to 1-2 levels. Measure with Chrome DevTools Network tab — any chain longer than 3 requests is a red flag.
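Option (1) can be sketched with fake endpoints that record when they start and finish. The endpoint names and the 20ms delays are made up for illustration; only `Promise.all` and `setTimeout` are real APIs here.

```javascript
// Fake endpoints with hypothetical names; each records when it starts and ends.
const events = [];
const fakeRequest = (name, ms, value) =>
  new Promise((resolve) => {
    events.push(`start:${name}`);
    setTimeout(() => {
      events.push(`end:${name}`);
      resolve(value);
    }, ms);
  });

const fetchUser = () => fakeRequest('user', 20, { id: 1 });
const fetchSettings = () => fakeRequest('settings', 20, { theme: 'dark' });
const fetchOrders = (userId) => fakeRequest('orders', 20, [{ userId }]);

async function loadDashboard() {
  // Independent requests start together: one round-trip instead of two.
  const [user, settings] = await Promise.all([fetchUser(), fetchSettings()]);
  // Orders genuinely depend on the user id, so one extra hop remains.
  const orders = await fetchOrders(user.id);
  return { user, settings, orders };
}

const dashboard = loadDashboard();
dashboard.then(() => console.log(events.slice(0, 2))); // [ 'start:user', 'start:settings' ]
```

The waterfall depth here is two. Getting it to one would take a server-side aggregate (the BFF option), since the orders request cannot start until the user id is known.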
Your team is building an infinite-scroll feed. The API currently uses offset pagination. Users report seeing duplicate items after scrolling. Which approach should you recommend, and what's the primary trade-off?
Perceived Performance
Perceived performance is the gap between what the browser reports and what the user feels. At scale, a 200ms improvement in occupied time can increase conversion by 2-5%, even when Lighthouse scores are perfect. This section covers techniques that manipulate user perception — not just metrics — to make interactions feel instant.
Skeleton Screens vs Spinners
A spinner says 'wait, I'm loading.' A skeleton screen says 'here's the structure, content is coming.' The key difference is occupied time — the user's brain is occupied processing layout, so the wait feels shorter. Skeletons reduce perceived latency by 30-50% in controlled studies, but they increase CLS if not sized correctly. Use skeletons for content with predictable layout (e.g., list items, cards); use spinners for unpredictable or short loads (<1s).
// React skeleton component with stable dimensions
function SkeletonCard() {
return (
<div className="skeleton-card" style={{ height: 120, width: '100%' }}>
<div className="skeleton-line" style={{ width: '60%', height: 16, marginBottom: 8 }} />
<div className="skeleton-line" style={{ width: '80%', height: 14 }} />
</div>
);
}
/* CSS: animate with shimmer, no layout shift */
.skeleton-card {
background: #f0f0f0;
border-radius: 8px;
padding: 16px;
animation: shimmer 1.5s infinite;
}
@keyframes shimmer {
0% { opacity: 0.6; }
50% { opacity: 1; }
100% { opacity: 0.6; }
}
| Technique | Best for | Cost | Avoid when |
|---|---|---|---|
| Skeleton screens | Predictable layout, >1s load | CLS risk if no fixed dimensions | Dynamic content (e.g., search results) |
| Spinners | Unpredictable layout, <1s load | No layout shift, but feels slower | Long loads (>3s) without progress |
| Optimistic UI | Idempotent actions (like, save) | State rollback complexity | Non-idempotent or error-prone actions |
Optimistic UI for Instant Feedback
Optimistic UI updates the view before the server confirms the action. This keeps interactions under the 100ms threshold for perceived instantaneity. The trade-off: you must handle rollback on failure. Use it for idempotent actions (likes, toggles) where failure is rare. Never use it for financial transactions or destructive operations without confirmation.
// Optimistic like toggle with rollback
async function handleLike(postId) {
  const previousLiked = posts[postId].liked;
  // Optimistic update
  setPosts(prev => ({
    ...prev,
    [postId]: { ...prev[postId], liked: !previousLiked, likes: prev[postId].likes + (previousLiked ? -1 : 1) }
  }));
  try {
    await api.likePost(postId);
  } catch (error) {
    // Rollback on failure
    setPosts(prev => ({
      ...prev,
      [postId]: { ...prev[postId], liked: previousLiked, likes: prev[postId].likes + (previousLiked ? 1 : -1) }
    }));
    showError('Failed to update like');
  }
}

A junior says 'we use skeletons.' A staff engineer says 'we measured that skeletons reduced perceived wait by 40% in our A/B test, but we also tracked that they increased CLS by 0.02. We accepted that trade-off because the conversion lift was 3%.' Always quantify the perception gap with real user monitoring (RUM) data.
Progressive and Streaming Loading
Streaming HTML (via APIs like React 18's renderToPipeableStream) sends content as it's ready, not all at once. Because the browser can start parsing and painting the shell while slower sections are still rendering on the server, this can improve First Paint and LCP, by as much as 30-50% for some server-rendered pages. Progressive loading (e.g., lazy-loading images below the fold) reduces initial payload. The cost: more complex error handling and potential layout shifts if placeholders aren't sized.
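// React 18 streaming with Suspense boundaries
// Assumes an Express server; Header, SlowNav, SlowContent, SkeletonNav, and
// SkeletonContent are app components defined elsewhere.
import express from 'express';
import { Suspense } from 'react';
import { renderToPipeableStream } from 'react-dom/server';

const app = express();

function App() {
  return (
    <html>
      <body>
        <Header />
        <Suspense fallback={<SkeletonNav />}>
          <SlowNav />
        </Suspense>
        <Suspense fallback={<SkeletonContent />}>
          <SlowContent />
        </Suspense>
      </body>
    </html>
  );
}

// Server-side streaming
app.get('/', (req, res) => {
  const { pipe } = renderToPipeableStream(<App />, {
    onShellReady() {
      // Everything outside the Suspense boundaries is ready: start streaming
      res.setHeader('Content-Type', 'text/html');
      pipe(res);
    },
    onShellError(error) {
      // The shell itself failed to render: send a fallback response
      res.statusCode = 500;
      res.setHeader('Content-Type', 'text/html');
      res.send('<h1>Something went wrong</h1>');
    },
    onError(error) {
      console.error(error);
    }
  });
});

Instant Feedback Under 100ms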
The 100ms threshold comes from Jakob Nielsen's response-time research: under 100ms, users perceive the system as reacting instantly. Core Web Vitals rates an INP (Interaction to Next Paint) below 200ms as good, but that is the budget for the whole interaction; the first visual feedback (e.g., a button press state) should appear within 50ms. Use a short transition such as transition: transform 0.05s for press effects, and avoid long tasks (>50ms) on the main thread during interaction.
/* Instant button press feedback under 50ms */
button:active {
  transform: scale(0.95);
  transition: transform 0.05s; /* 50ms */
}
/* For async actions, show spinner immediately */
button.loading {
  pointer-events: none;
  opacity: 0.7;
}

Optimistic UI is dangerous for non-idempotent actions (e.g., charging a credit card, sending an email). If the server fails, you've shown a success state that's false. Always pair optimistic updates with a clear error state and rollback. Also avoid it when the action has side effects that are hard to reverse (e.g., deleting a user).
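For the advice in this section about avoiding long tasks during interaction, one common approach is to split heavy work into slices and yield between them so pending input can be handled. The sketch below is a minimal version under stated assumptions: processInChunks, yieldToMain, and the 16ms default budget are illustrative choices, and setTimeout is used for yielding only because it is available everywhere.

```javascript
// Yield control back to the event loop so queued input events can run.
function yieldToMain() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process items synchronously until the time budget is spent, then yield.
// The default budget (16ms) is an assumption chosen to stay well under
// the 50ms long-task limit.
async function processInChunks(items, processItem, budgetMs = 16) {
  const results = [];
  let sliceStart = Date.now();
  for (const item of items) {
    results.push(processItem(item));
    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToMain();
      sliceStart = Date.now();
    }
  }
  return results;
}
```

A typical call would be `await processInChunks(rows, formatRow)` inside an event handler, after the immediate visual feedback has already been applied.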
The Loading Hierarchy
A loading hierarchy prioritizes what the user sees first. For a page: 1. Shell (header, nav) → 2. Skeleton layout → 3. Above-the-fold content → 4. Below-the-fold content. Each step should take <1s. The goal is to make the page feel interactive as fast as possible, even if not all data is loaded. This ordering applies the same idea as the PRPL pattern (Push, Render, Pre-cache, Lazy-load): deliver the critical path first and defer everything else.
- Shell: Render header, nav, footer immediately (static HTML with inlined critical CSS).
- Skeleton: Show placeholder shapes for content areas.
- Above-the-fold: Load critical data first (e.g., hero image, main text).
- Below-the-fold: Lazy-load images, comments, related content.
Perceived vs Actual Performance
Actual performance is measured by LCP, INP (which replaced FID as a Core Web Vital in March 2024), and CLS. Perceived performance is about occupied time, the time the user's brain is busy processing visual changes. A 2s load with a skeleton can feel faster than a 1.5s load with a blank screen. Use Speed Index to approximate perceived speed; First Meaningful Paint (FMP) is deprecated in Lighthouse and should not be used for new measurements. The gap between actual and perceived is where UX wins happen.
Predictive Prefetch on Hover/Intent
Prefetching on hover (or even on touchstart) can reduce perceived latency to near-zero for navigation. Use <link rel="prefetch"> or fetch() with priority: 'low' when the user hovers over a link. The cost: extra bandwidth and CPU. Only prefetch if the user is likely to click (e.g., hover >200ms). For SPAs, preload the route's JavaScript bundle.
// Predictive prefetch on hover with debounce
let prefetchTimer;
const prefetchedUrls = new Set(); // avoid appending duplicate <link> tags
function onLinkHover(url) {
  clearTimeout(prefetchTimer);
  prefetchTimer = setTimeout(() => {
    if (prefetchedUrls.has(url)) return;
    prefetchedUrls.add(url);
    const link = document.createElement('link');
    link.rel = 'prefetch';
    link.href = url;
    document.head.appendChild(link);
  }, 200); // Only prefetch after 200ms hover
}
// Cancel the pending prefetch if the pointer leaves before 200ms
function onLinkLeave() {
  clearTimeout(prefetchTimer);
}
// For SPA route prefetch
function onRouteHover(route) {
  // Dynamic import() downloads and evaluates the chunk now;
  // the module is cached for the later navigation
  import(`./pages/${route}.js`);
}

Prefetching every link on hover can waste bandwidth and CPU, especially on mobile. A 100ms hover threshold is too aggressive; use 200ms minimum. Also avoid prefetching for links that are likely to be cancelled (e.g., dropdown menus). Measure the cache hit rate: if <20% of prefetched resources are used, you're wasting resources.
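To make the cache hit rate above measurable, a small tracker can record which prefetched URLs were later visited. This is only a sketch: PrefetchTracker and its method names are hypothetical, and a real setup would report these counts to whatever RUM pipeline is already in place.

```javascript
// Tracks which prefetched URLs were later navigated to.
// PrefetchTracker and its methods are hypothetical names for illustration.
class PrefetchTracker {
  constructor() {
    this.prefetched = new Set();
    this.used = new Set();
  }

  // Call when a prefetch is issued for a URL.
  recordPrefetch(url) {
    this.prefetched.add(url);
  }

  // Call on every navigation; only prefetched URLs count as hits.
  recordNavigation(url) {
    if (this.prefetched.has(url)) this.used.add(url);
  }

  // Fraction of prefetched URLs that were actually used (0 when none were prefetched).
  hitRate() {
    if (this.prefetched.size === 0) return 0;
    return this.used.size / this.prefetched.size;
  }
}
```

If hitRate() stays below the 20% mark mentioned above, raising the hover threshold or restricting prefetch to high-intent links is probably the cheaper fix.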
You're optimizing a social media feed. Users report it 'feels slow' when liking a post, but the API response time is 80ms. What's the most likely cause and best fix?
Memory Optimization
In long-lived SPAs, memory growth is a silent killer. A leak that looks negligible in a five-minute test compounds over a session that stays open for 8 hours or more, which is common in dashboards, chat apps, and editors. At scale, this degrades INP (long GC pauses >50ms) and increases OOM crashes. Staff engineers don't just fix leaks; they design systems that stay flat under load.
Leak Sources: The Usual Suspects
- Event listeners: addEventListener on a long-lived target (window, document, a shared store) whose handler references removed DOM nodes, without removeEventListener.
- Timers: setInterval that references large objects and never clears.
- Closures: Callbacks that capture entire scope chains, preventing GC of large arrays or DOM subtrees.
- Detached DOM: Nodes removed from the tree but still referenced by JS (e.g., in a cache or closure).
- Global caches: Unbounded Map or Array that grows with user actions.
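For the last source in the list, unbounded global caches, one common mitigation is to cap the cache size and evict the least recently used entry. The sketch below is a minimal LRU built on Map insertion order; LruCache and maxEntries are illustrative names, not an existing library API.

```javascript
// A size-bounded cache with least-recently-used eviction.
// Relies on Map iterating keys in insertion order.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

With a cap in place, memory used by the cache is bounded by maxEntries times the largest entry, regardless of how long the session runs.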
// Leak: a listener on a long-lived target (window) captures the button
function setupButton() {
  const button = document.createElement('button');
  document.body.appendChild(button);
  window.addEventListener('resize', () => {
    button.textContent = `${window.innerWidth}px`;
  });
  // Later: button is removed from the DOM, but window still holds the handler,
  // whose closure references button -> detached DOM node that cannot be collected
  document.body.removeChild(button);
}

// Fixed: use AbortController or explicit cleanup
function setupButton() {
  const button = document.createElement('button');
  const controller = new AbortController();
  document.body.appendChild(button);
  window.addEventListener('resize', () => {
    button.textContent = `${window.innerWidth}px`;
  }, { signal: controller.signal });
  // Cleanup: abort() removes every listener registered with this signal
  return function teardown() {
    controller.abort();
    document.body.removeChild(button);
  };
}

Garbage Collection: Mark-and-Sweep & Generational
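V8 uses a generational GC: the young generation (new space, typically a few megabytes) is collected frequently via scavenge; the old generation (old space) is collected via mark-and-sweep, with compaction when fragmentation is high. Incremental and concurrent marking keep most pauses short, but a full GC on a heap of 100MB or more can still pause the main thread for 100-200ms. performance.measureUserAgentSpecificMemory() (Chrome 89+, cross-origin isolated pages only) reports memory usage, not GC pauses; to see pauses, record a trace in the DevTools Performance panel or watch for long tasks. Aim to keep every task under 50ms.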
Before optimizing, run a heap snapshot in DevTools. If heap size is stable after 5 minutes of use, don't touch it. Premature memory optimization adds complexity. A staff engineer says: 'Show me the allocation timeline, then we talk.'
WeakMap / WeakSet / WeakRef for Collectable References
Use WeakMap when you need to associate metadata with a DOM node or object without preventing its GC. WeakRef is for advanced cases like caching large objects that should be collected under memory pressure. WeakSet is rare but useful for marking objects without preventing collection.
// WeakMap for DOM node metadata
const nodeData = new WeakMap();
function attachData(node, data) {
  nodeData.set(node, data);
}
// When node is removed from DOM and all references gone, the WeakMap entry is GC'd automatically.
// No manual cleanup needed.

| Technique | Best For | Cost | Avoid When |
|---|---|---|---|
| WeakMap | DOM node metadata, private data | Slightly slower lookup than Map | Need to iterate keys (not possible) |
| WeakSet | Marking objects without preventing GC | Minimal | Need to store primitives |
| WeakRef | Large caches that can be recreated | GC may clear ref at any time | Data must always be available |
| Map/Set | General purpose, iterable | Prevents GC of keys | Memory-sensitive contexts |
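The WeakRef row can be illustrated with a cache whose values the GC is free to reclaim. This is a sketch under stated assumptions: WeakValueCache is a hypothetical name, and load stands for any function that can recreate a value (which must be an object) for a given key.

```javascript
// A cache that holds its values weakly, so the GC may reclaim them.
// `load` is a hypothetical function that recreates the object for a key.
class WeakValueCache {
  constructor(load) {
    this.load = load;
    this.refs = new Map(); // key -> WeakRef(value)
    // Remove the stale map entry after its value has been collected.
    this.registry = new FinalizationRegistry((key) => {
      const ref = this.refs.get(key);
      if (ref && ref.deref() === undefined) this.refs.delete(key);
    });
  }

  get(key) {
    const cached = this.refs.get(key)?.deref();
    if (cached !== undefined) return cached;
    // Cache miss, or the value was collected: recreate it.
    const value = this.load(key);
    this.refs.set(key, new WeakRef(value));
    this.registry.register(value, key);
    return value;
  }
}
```

Callers must treat every get() as potentially expensive, since whether a value survives between calls depends entirely on the garbage collector.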
Heap Snapshots and Allocation Profiling
In DevTools, take a heap snapshot before and after a user action. Filter by 'detached' to find orphaned DOM nodes. Use the Allocation Profiler to record allocations over time — look for sawtooth patterns (GC cycles) that don't return to baseline. A 10MB sawtooth that grows to 15MB over 10 cycles indicates a leak.
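The sawtooth check described above can also be applied to sampled heap sizes in an automated test. The sketch below assumes the samples are heap sizes in MB collected at regular intervals by some other means (for example process.memoryUsage().heapUsed in a Node-based test); findTroughs, detectBaselineGrowth, and the 2MB threshold are illustrative choices.

```javascript
// Return the local minima of the series, i.e. the heap size right after each GC.
function findTroughs(samples) {
  const troughs = [];
  for (let i = 1; i < samples.length - 1; i++) {
    if (samples[i] < samples[i - 1] && samples[i] <= samples[i + 1]) {
      troughs.push(samples[i]);
    }
  }
  return troughs;
}

// Compare the first and last troughs. A baseline that keeps rising across
// GC cycles suggests retained memory. thresholdMb is an assumed default.
function detectBaselineGrowth(samplesMb, thresholdMb = 2) {
  const troughs = findTroughs(samplesMb);
  if (troughs.length < 2) return { leaking: false, growthMb: 0 };
  const growthMb = troughs[troughs.length - 1] - troughs[0];
  return { leaking: growthMb > thresholdMb, growthMb };
}
```

A result flagged as leaking is a prompt to take heap snapshots and compare them, not proof of a leak on its own.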
WeakRef is non-deterministic — the GC may clear it at any time. Never use it for critical data that must be available synchronously. Also, creating many WeakRefs can itself increase GC overhead. Stick to WeakMap for 99% of cases.
Detecting Detached DOM Nodes
A detached DOM node is one removed from the document tree but still referenced in JS. In DevTools heap snapshot, search for 'Detached' in the class filter. Common culprits: closures in event listeners, React components not cleaned up, or cached DOM fragments. Aim for zero detached nodes in a production heap snapshot.
Object Pooling and Memory in SPAs
Object pooling reuses objects to reduce GC pressure. Useful for high-frequency operations like particle systems or virtual scrolling. But pooling adds complexity and can cause stale data bugs. Only pool when allocation rate exceeds 10MB/s or GC pauses exceed 50ms.
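// Simple object pool
class VectorPool {
  constructor(size) {
    this.pool = Array.from({ length: size }, () => ({ x: 0, y: 0 }));
    this.index = 0;
  }
  acquire() {
    if (this.index >= this.pool.length) {
      // Pool exhausted: grow instead of failing
      this.pool.push({ x: 0, y: 0 });
    }
    const vector = this.pool[this.index++];
    // Reset fields so callers never see stale data from a previous use
    vector.x = 0;
    vector.y = 0;
    return vector;
  }
  // Releases every vector at once (e.g., at the end of a frame);
  // previously acquired vectors must not be used afterwards
  release() {
    this.index = 0;
  }
}

Your SPA dashboard shows a 5MB heap growth after 1 hour of use. You take a heap snapshot and find 200 detached DOM nodes. What is the BEST first step?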