Making HTML and 3D Work Together

Three.js renders to a canvas element. Everything else in a web application is HTML: buttons, forms, tooltips, panels, data tables, navigation menus. Making these two worlds work together is one of the most persistent problems in Three.js development.

The question appears repeatedly across the Three.js forum, Stack Overflow, and Reddit. There is no single dominant answer because the right approach depends on what you are building: labels that track 3D objects, floating panels with detailed information, toolbar controls for the 3D scene, or full application UI that coexists with a 3D viewport.

Annotated Satellite Explorer

Seven annotation hotspots sit on different faces of a 3D satellite. Labels fade in when their surface faces the camera and fade out when it turns away, so you cannot see all seven at once. The visibility check is a dot product between each annotation's surface normal and the direction from the annotation to the camera: positive means facing, negative means hidden.

Each frame, Vector3.project() gives the screen position and dot(surfaceNormal, toCamera) gives the facing opacity. The overlay is a pointer-events: none parent containing pointer-events: auto cards.
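The facing check described above can be sketched without any library code. This is a minimal illustration, not the demo's actual source: the `Vec3` type stands in for THREE.Vector3, and `facingOpacity` and its `fadeBand` parameter are hypothetical names. It assumes the normal and both positions are already in world space and that `fadeBand` is greater than zero.

```typescript
// Stand-in for THREE.Vector3 so the sketch has no dependencies.
type Vec3 = { x: number; y: number; z: number };

function sub(a: Vec3, b: Vec3): Vec3 {
  return { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
}

function dot(a: Vec3, b: Vec3): number {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v.x, v.y, v.z);
  return len === 0 ? { x: 0, y: 0, z: 0 } : { x: v.x / len, y: v.y / len, z: v.z / len };
}

// Returns 0 when the surface faces away from the camera, 1 when it clearly
// faces it, and a linear ramp in between. fadeBand (hypothetical, > 0) is the
// width of that ramp in dot-product units.
function facingOpacity(
  anchor: Vec3,
  surfaceNormal: Vec3,
  cameraPosition: Vec3,
  fadeBand = 0.2
): number {
  const toCamera = normalize(sub(cameraPosition, anchor));
  const facing = dot(normalize(surfaceNormal), toCamera);
  return Math.min(1, Math.max(0, facing / fadeBand));
}
```

The returned value could be written straight to a label's CSS opacity. The ramp avoids labels popping in and out at the exact moment the surface turns edge-on.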

The Constraint: Two Rendering Pipelines

The WebGL canvas and the HTML DOM are fundamentally different rendering pipelines. The canvas paints pixels directly to a bitmap. The DOM is a tree of styled elements managed by the browser's layout engine. They do not naturally interact. Layering HTML on top of a canvas is trivial (CSS z-index). The hard parts are not.

Position Synchronisation

An HTML element has to follow a 3D object as the camera moves. Its CSS position must be recalculated every frame by projecting the 3D world position to 2D screen coordinates.
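Vector3.project() returns normalised device coordinates, where x and y run from -1 to 1 and y points up. Turning those into CSS pixels is a small mapping, sketched below. The function name and the `ScreenPoint` type are illustrative, and the viewport size is assumed to be the canvas size in CSS pixels.

```typescript
type ScreenPoint = { x: number; y: number; visible: boolean };

// Maps normalised device coordinates (the output of Vector3.project())
// to CSS pixels measured from the top-left corner of the canvas.
function ndcToCssPixels(
  ndcX: number,
  ndcY: number,
  ndcZ: number,
  viewportWidth: number,
  viewportHeight: number
): ScreenPoint {
  const x = (ndcX * 0.5 + 0.5) * viewportWidth;
  // NDC y points up, CSS y points down, so the sign flips.
  const y = (-ndcY * 0.5 + 0.5) * viewportHeight;
  // Points outside the [-1, 1] cube are off-screen or beyond the clip planes.
  const visible =
    Math.abs(ndcX) <= 1 && Math.abs(ndcY) <= 1 && ndcZ >= -1 && ndcZ <= 1;
  return { x, y, visible };
}
```

The `visible` flag matters as much as the coordinates: a point behind the camera can still project to plausible-looking x and y values, so the z range needs checking before a label is shown.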

Pointer Event Conflicts

When HTML elements overlap the canvas, clicks either hit the HTML element or pass through to the 3D scene. Getting this right for overlapping tooltips, clickable labels, and transparent overlay regions requires careful event management.

Depth Ordering

In 3D, an object behind another should be occluded. HTML elements layered on a canvas do not participate in the 3D depth buffer. A label for a hidden object still appears on top unless you manually hide it.
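A manual occlusion check can be as simple as testing whether something sits between the camera and the label's anchor point. The sketch below uses bounding spheres as a cheap approximation. That choice is an assumption made for illustration, and all names are hypothetical. In a real scene you might instead cast a ray with Raycaster and compare the nearest hit distance to the anchor distance.

```typescript
type Vec3 = { x: number; y: number; z: number };
type Sphere = { center: Vec3; radius: number };

function sub(a: Vec3, b: Vec3): Vec3 {
  return { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
}

function dot(a: Vec3, b: Vec3): number {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True when the straight segment from the camera to the label anchor passes
// through the occluder's bounding sphere. Assumes the occluder is not the
// object the label belongs to.
function isOccludedBySphere(camera: Vec3, anchor: Vec3, occluder: Sphere): boolean {
  const segment = sub(anchor, camera);
  const lengthSq = dot(segment, segment);
  if (lengthSq === 0) return false;

  // Parameter of the closest point on the segment to the sphere centre,
  // clamped so that spheres behind the camera or beyond the anchor are ignored.
  const t = Math.min(1, Math.max(0, dot(sub(occluder.center, camera), segment) / lengthSq));
  const closest: Vec3 = {
    x: camera.x + segment.x * t,
    y: camera.y + segment.y * t,
    z: camera.z + segment.z * t,
  };
  const offset = sub(occluder.center, closest);
  return dot(offset, offset) < occluder.radius * occluder.radius;
}
```

A label whose anchor is occluded can then be hidden or faded. Bounding spheres over-estimate most shapes, so this errs towards hiding labels slightly too early.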

Performance at Scale

Updating 500 HTML element positions every frame, inside a frame budget of roughly 16.7 ms at 60 fps, adds significant DOM manipulation overhead. CSS transforms are fast, but element count matters.

These four constraints shape every integration decision. The approach that solves one often makes another worse.

The Naive Approach

The common first attempt works with 5 to 10 labels on a simple scene. It breaks with 50+ labels, complex layering, or any scene where occlusion matters.

✗

style.left/top per frame. Positioning HTML elements in pixels calculated from Vector3.project() every frame. Triggers layout recalculation for each element.

✗

Create/remove on hover. Show tooltips by creating a new DOM element on hover and removing it on mouse leave. Creating and removing elements on every interaction forces repeated style and layout work.

✗

Single event listener. One click handler that tries to determine whether the user clicked the HTML overlay or the 3D scene underneath. Edge cases multiply with every new interactive element.

✗

Ignore depth. Labels for objects behind other objects remain visible, overlapping and creating visual clutter that makes the interface unreadable.
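One way around the first pitfall in the list above is to position labels with a CSS transform instead of left/top, and to skip the write when nothing changed. The sketch below assumes each label is absolutely positioned at left: 0; top: 0. The `Transformable` type is a stand-in for the part of HTMLElement it touches, and both function names are hypothetical.

```typescript
// Only the slice of HTMLElement this sketch needs, so it runs without a browser.
type Transformable = { style: { transform: string } };

// Builds a transform that centres the element on (x, y) in CSS pixels.
// Rounding to whole pixels keeps sub-pixel camera jitter from producing
// a different string every frame.
function labelTransform(x: number, y: number): string {
  return `translate(-50%, -50%) translate(${Math.round(x)}px, ${Math.round(y)}px)`;
}

// Remembers the last transform written to each element.
const lastTransform = new WeakMap<Transformable, string>();

// Writes the transform only when it differs from the previous frame.
// Returns true when a write actually happened.
function applyLabelTransform(el: Transformable, x: number, y: number): boolean {
  const next = labelTransform(x, y);
  if (lastTransform.get(el) === next) return false;
  el.style.transform = next;
  lastTransform.set(el, next);
  return true;
}
```

When the camera is still, this performs no style writes at all, which is often the common case in an inspection-style application.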

CSS2DRenderer: Labels and Annotations

Three.js includes CSS2DRenderer, which positions HTML elements to match 3D object positions. Elements are regular HTML (style with CSS, populate with any content) but their position tracks a point in 3D space. CSS2DObject wraps an HTML element and stores a 3D position. CSS2DRenderer projects that position to screen coordinates every frame and applies a CSS transform.

✓

Searchable, selectable labels. CSS2D elements are real DOM nodes. Users can select text, screen readers can read them, and the browser's find-in-page can search them.

✓

HTML tooltip content. Links, formatted text, images, embedded charts. Any HTML inside a tracking label.

✓

Screen-space collision avoidance. Because elements are in the DOM, you can detect overlaps and adjust positions to prevent label pile-up.

✓

Accessibility. Screen readers can navigate CSS2D elements. ARIA attributes work. Focus management is possible.
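The collision avoidance point above can be sketched with plain rectangles. Note that this version hides lower-priority labels instead of nudging them to new positions, which is a simpler technique than full label placement. The `LabelBox` shape and the `priority` field are assumptions. In a browser the rectangles could come from getBoundingClientRect().

```typescript
type Rect = { left: number; top: number; right: number; bottom: number };
type LabelBox = { id: string; rect: Rect; priority: number };

// Axis-aligned overlap test; rectangles that only touch at an edge do not count.
function overlaps(a: Rect, b: Rect): boolean {
  return a.left < b.right && a.right > b.left && a.top < b.bottom && a.bottom > b.top;
}

// Greedy pass: visit labels from highest to lowest priority and keep each one
// only if it does not overlap a label that was already kept.
function pickNonOverlapping(labels: LabelBox[]): Set<string> {
  const kept: LabelBox[] = [];
  const sorted = [...labels].sort((a, b) => b.priority - a.priority);
  for (const label of sorted) {
    if (!kept.some((k) => overlaps(k.rect, label.rect))) {
      kept.push(label);
    }
  }
  return new Set(kept.map((k) => k.id));
}
```

Priority could be distance to the camera, selection state, or importance in the data. The greedy pass compares every label against every kept label, which is fine for tens of labels but worth replacing with a grid or spatial index for hundreds.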

Performance note: CSS2DRenderer updates every element every frame. At 200+ elements, the DOM update cost becomes significant. Hide off-screen elements (check projected position against viewport bounds), use intersection observer patterns, or limit visible labels to the nearest N objects.
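Two of the mitigations in the note above, dropping off-screen labels and keeping only the nearest N, combine into one selection pass. The sketch below is illustrative: the `Candidate` shape is an assumption, with the projected coordinates and camera distance expected to be computed elsewhere each frame.

```typescript
type Candidate = {
  id: string;
  ndcX: number; // projected position in normalised device coordinates
  ndcY: number;
  ndcZ: number;
  distance: number; // distance from the camera in world units
};

// Returns the ids of at most maxLabels on-screen labels, nearest first.
function selectVisibleLabels(candidates: Candidate[], maxLabels: number): string[] {
  return candidates
    .filter(
      (c) =>
        Math.abs(c.ndcX) <= 1 &&
        Math.abs(c.ndcY) <= 1 &&
        c.ndcZ >= -1 &&
        c.ndcZ <= 1
    )
    .sort((a, b) => a.distance - b.distance)
    .slice(0, maxLabels)
    .map((c) => c.id);
}
```

The resulting ids could drive each label object's visible flag, so that labels outside the selection are hidden instead of being repositioned every frame.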

CSS3DRenderer: Panels in 3D Space

CSS3DRenderer places HTML elements into the 3D scene as if they were 3D objects. Elements have perspective, rotate with the scene, and respect 3D transforms. Useful for in-scene panels, floating screens, and information displays that feel like part of the 3D world.

When to use CSS3DRenderer

Panels that should feel like physical objects in the scene: information kiosks, floating dashboards, annotation planes. Content that needs to rotate and scale with the 3D perspective. Interfaces where the boundary between UI and 3D should blur.

The trade-off: CSS3D elements do not interact with the WebGL depth buffer. They render in their own DOM layer, which is normally stacked on top of the canvas, so WebGL geometry cannot occlude them. For proper depth integration, you either need careful masking or must accept the limitation.

Canvas-Based UI: Text as Texture

For labels that must participate in 3D depth (visible or hidden based on position relative to other objects), render text to a 2D canvas and use it as a texture on a sprite or plane mesh. The result is a regular 3D object: it participates in depth testing, can be raycast, and is occluded by objects in front of it.

Create an offscreen canvas element. Set dimensions based on expected text length and font size.

Draw text using the Canvas 2D API. Set font, colour, alignment, background.

Create a Three.js Texture from the canvas. Set filtering and wrapping.

Apply the texture to a SpriteMaterial. Sprites always face the camera.

Add the Sprite to the scene. It renders as a 3D object with full depth integration.
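The sizing decision in the first step drives how sharp the label looks, so it is worth sketching on its own. The function below is a hypothetical helper, not part of Three.js. It assumes the text width was measured beforehand (for example with the Canvas 2D measureText() call) and that the label is a single line.

```typescript
type LabelCanvasLayout = {
  canvasWidth: number; // backing-store size in device pixels
  canvasHeight: number;
  scaleX: number; // sprite scale in world units
  scaleY: number;
};

// Computes a canvas size for one line of text and a sprite scale with the
// same aspect ratio, so the text is not stretched when drawn in the scene.
function layoutLabelCanvas(
  textWidthPx: number,
  fontSizePx: number,
  paddingPx: number,
  pixelRatio: number,
  worldHeight: number,
  lineHeight = 1.25
): LabelCanvasLayout {
  const cssWidth = textWidthPx + 2 * paddingPx;
  const cssHeight = fontSizePx * lineHeight + 2 * paddingPx;
  // Multiply by the pixel ratio so the texture holds enough texels
  // on high-density displays.
  const canvasWidth = Math.ceil(cssWidth * pixelRatio);
  const canvasHeight = Math.ceil(cssHeight * pixelRatio);
  return {
    canvasWidth,
    canvasHeight,
    scaleX: worldHeight * (canvasWidth / canvasHeight),
    scaleY: worldHeight,
  };
}
```

The canvas dimensions would be assigned before drawing the text, and the two scale values would be applied to the sprite's scale. A higher pixel ratio delays the blur when the camera zooms in, at the cost of texture memory.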

The trade-offs: canvas-rendered text is rasterised at a fixed resolution (blurry when zoomed), updating text content requires redrawing the canvas and reuploading the texture, and you lose CSS styling and DOM interactivity entirely.

In-Scene UI Libraries

For applications that need buttons, sliders, and input fields inside the 3D scene (VR/AR interfaces, immersive applications), dedicated libraries provide layout and interaction inside WebGL.

three-mesh-ui

UI panels as 3D meshes. MSDF fonts, block layout. Designed for WebXR.

pmndrs/uikit

For R3F. Flexbox in 3D space. Buttons, inputs, scrolling.

Troika 3D UI

CSS-inspired positioning and sizing. Flexbox-style layout.

These are appropriate when the entire experience is 3D (VR, immersive installations). For standard web applications with a 3D viewport, HTML overlays are simpler, more accessible, and offer the full power of CSS styling.

Approach	Positioning	Depth	Performance	Accessibility
CSS2DRenderer	Tracks 3D, screen-space	No depth integration	Good to 200 elements	Full DOM access
CSS3DRenderer	Full 3D perspective	No depth buffer	Good to 50 elements	Full DOM access
Canvas texture	Full 3D positioning	Participates in depth	Scales to thousands	No DOM access
In-scene UI	Full 3D positioning	Participates in depth	Moderate	No DOM access

Pointer Event Management

The most common interaction bug: clicking a tooltip closes it and simultaneously selects a 3D object behind it. Or clicking a 3D object does nothing because an invisible overlay absorbs the event. Correct pointer handling requires explicit layering.

HTML layer receives first

HTML elements sit above the canvas (z-index). They receive pointer events first. When a click hits an HTML element, stop propagation so that listeners on a shared ancestor do not also treat it as a scene click. Use pointer-events: none on transparent overlays and pointer-events: auto on the interactive elements within them.

Canvas layer receives passthrough

The canvas receives pointer events only when they pass through the HTML layer. Use Three.js raycasting on canvas events to determine which 3D object was hit.
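Raycasting needs the pointer position in normalised device coordinates relative to the canvas, not the window. A minimal sketch of that conversion follows. The function name is illustrative, and `RectLike` stands in for the rectangle returned by the canvas's getBoundingClientRect().

```typescript
type RectLike = { left: number; top: number; width: number; height: number };

// Converts a pointer event's clientX/clientY into normalised device
// coordinates for the canvas: x and y in [-1, 1], with y pointing up.
function pointerToNdc(
  clientX: number,
  clientY: number,
  rect: RectLike
): { x: number; y: number } {
  return {
    x: ((clientX - rect.left) / rect.width) * 2 - 1,
    y: -((clientY - rect.top) / rect.height) * 2 + 1,
  };
}
```

The result is the form Raycaster.setFromCamera() expects. Using the canvas rectangle instead of window.innerWidth and window.innerHeight keeps picking accurate when the canvas does not fill the page, as in a split view.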

Coordination flag

Set an isOverUI flag when the pointer enters an HTML element. Clear it on leave. In the canvas event handler, check this flag and skip raycasting if the pointer is over UI.

Touch devices: Touch events do not have persistent hover state. There is no mouseenter equivalent that fires before a click. Use pointerdown/pointerup rather than click for both layers. On pointerdown, check if the target is an HTML interactive element. If yes, handle in DOM. If no, forward to raycasting.
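The pointerdown routing described above reduces to one decision per event. The sketch below assumes interactive overlay elements carry a data-ui attribute, which is a hypothetical convention, and uses a structural `TargetLike` type in place of a DOM Element so it can run outside a browser.

```typescript
// The one Element method this sketch relies on.
type TargetLike = { closest(selector: string): unknown } | null;

type Route = "ui" | "scene";

// Hypothetical marker attribute placed on interactive overlay elements.
const UI_SELECTOR = "[data-ui]";

// Decides which layer should handle a pointerdown. In a real handler the
// argument would be event.target, after checking that it is an Element.
function routePointerDown(target: TargetLike): Route {
  if (target !== null && target.closest(UI_SELECTOR) != null) {
    return "ui";
  }
  return "scene";
}
```

Because the decision is made from the event target at pointerdown time, it does not depend on hover state and behaves the same for mouse, pen, and touch input.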

Layout Patterns

The arrangement of 3D viewport and HTML controls follows three common patterns, each suited to different application types.

Split View

3D viewport occupies 60-70% of the screen. Side panel shows details, controls, and data tables. Draggable splitter for user control.

Use ResizeObserver on the canvas container (not window resize) to catch both window and splitter-driven size changes.
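Inside the ResizeObserver callback, the container's new size has to become a renderer size and a camera aspect ratio. The helper below sketches that calculation only. Its name and the `maxPixelRatio` cap are assumptions, and the width and height are expected to come from the observer entry's contentRect.

```typescript
type ViewportSize = {
  width: number; // CSS pixels
  height: number;
  pixelRatio: number;
  aspect: number;
};

// Derives renderer and camera settings from the observed container size.
// The pixel ratio is capped (hypothetical default of 2) so very dense
// displays do not multiply the fragment workload.
function viewportFromContainer(
  cssWidth: number,
  cssHeight: number,
  devicePixelRatio: number,
  maxPixelRatio = 2
): ViewportSize {
  // Clamp to at least 1 so a collapsed panel never yields a zero-sized
  // canvas or a divide-by-zero aspect ratio.
  const width = Math.max(1, Math.floor(cssWidth));
  const height = Math.max(1, Math.floor(cssHeight));
  return {
    width,
    height,
    pixelRatio: Math.min(devicePixelRatio, maxPixelRatio),
    aspect: width / height,
  };
}
```

The result would typically feed the renderer's setPixelRatio() and setSize() calls, followed by setting camera.aspect and calling updateProjectionMatrix().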

Floating Panels

Information panels floating above the 3D scene, positioned absolutely. Property inspectors, sensor details, configuration forms.

Position relative to the viewport, not the 3D scene. A panel tracking a hidden object creates cognitive dissonance.

HUD Overlay

Persistent status information: current selection, coordinates, tool mode, camera position. Fixed relative to the viewport.

Keep minimal. Every pixel of overlay is a pixel of 3D scene the user cannot see or interact with.

The Business Link

HTML-WebGL integration determines how professional and usable a 3D application feels. The difference between a finished product and a frustrating prototype comes down to these details.

The economics: The patterns here are straightforward engineering, not novel research. The cost of getting them right is a few days of careful implementation. The cost of getting them wrong is an application that frustrates users on every interaction. Crisp, responsive option panels next to a smooth 3D viewer feel polished. Laggy labels, phantom clicks, and overlapping tooltips feel broken.

Build Polished 3D Interfaces

We build Three.js interfaces where the HTML and 3D layers work together without friction. Labels that track, panels that respond, events that route correctly, and performance that holds up at scale.
