FrostKiwi's Secrets

Video Game Blurs (and how the best one works)

Created: 2025.09.03

Blurs are the basic building block for many video game post-processing effects and essential for sleek and modern GUIs. Video game Depth of Field and Bloom, or frosted panels in modern user interfaces - used subtly or obviously, they’re everywhere. Even your browser can do it: just tap this sentence!

Effect of "Bloom", one of many use-cases for blur algorithms

Conceptually, “Make thing go blurry” is easy, boiling down to some form of “average colors in radius”. Doing so in realtime, however, took graphics programmers through decades upon decades of research and experimentation, across computer science and maths. In this article, we’ll follow in their footsteps.
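Stripped of all GPU machinery, “average colors in radius” can be sketched in a few lines of JavaScript, here over a 1D row of brightness values (blur1D is purely illustrative and not part of this article's code):

```javascript
/* Naive 1D "blur": every output value is the average of its neighbors
   within `radius`, with samples outside the row simply skipped */
function blur1D(values, radius) {
	return values.map((_, i) => {
		let sum = 0, count = 0;
		for (let o = -radius; o <= radius; ++o) {
			const j = i + o;
			if (j < 0 || j >= values.length) continue;
			sum += values[j];
			count++;
		}
		return sum / count;
	});
}

/* A hard edge between dark (0) and bright (1) softens into a ramp */
console.log(blur1D([0, 0, 0, 1, 1, 1], 1));
```

Everything that follows is, at heart, this averaging - done in 2D, on colors, fast enough to run many times a second.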

A graphics programming time travel, if you will.

Using the GPU in the device you are reading this article on, and the WebGL capability of your browser, we’ll implement realtime blurring techniques and retrace the trade-offs graphics programmers had to make in order to marry two, sometimes opposing, worlds: Mathematical theory and Technological reality.

This is my submission to this year's Summer of Math Exposition

With many interactive visualizations to guide us, we’ll journey through a bunch of blurs, make a detour through frequency space manipulations, torture your graphics processor to measure performance, before finally arriving at an algorithm with years worth of cumulative graphics programmer sweat - The ✨ Dual Kawase Blur 🌟

Setup - No blur yet #

In the context of video game post-processing, a 3D scene is drawn, also called rendering, and saved to an intermediary image - a framebuffer. In turn, this framebuffer is processed to achieve various effects. Since this processing happens after a 3D scene is rendered, it’s called post-processing. All that, many times a second.

Depending on technique, framebuffers can hold non-image data and post-processing effects like Color-correction or Tone-mapping don't even require intermediate framebuffers: There's more than one way (@35:20)

This is where we jump in: with a framebuffer in hand, after the 3D scene was drawn. We’ll use a scene from a mod called NEOTOKYO°. Each time we implement a blur, there will be a box: a canvas driven by WebGL 1.0, rendering at the native resolution of your device. Each box has controls and the relevant parts of its code below.

No coding or graphics programming knowledge is required to follow along. But also no curtains! You can always see how we talk with your GPU. Terms and meanings will be explained once they become relevant.
Blur Fragment Shader: noBlurYet.fs
/* This is the blur "fragment shader", a program that runs on the GPU.
   In *this* article, the blur fragment shader runs once per output pixel of the
   canvas */

/* Required in WebGL 1 Shaders and depending on platform may have no effect.
   For Later: Strong blurs may have a lot of minute color contributions, so we
   set it "highp" here, the maximum. */
precision highp float;

/* UV coordinates, passed in from the Vertex Shader "simpleQuad.vs".
   This tells our current output pixel where to read our texture from. */
varying vec2 uv;

/* lightBrightness input. The reason light brightness is applied here in the
   blur fragment shader, and not in a separate step before the blur, is
   due to color precision limits. */
uniform float lightBrightness;

/* Our texture input */
uniform sampler2D texture;

/* The "main" function, where which is executed by our GPU */
void main() {
	/* gl_FragColor is the output of our shader. texture2D is the texture read,
	   performed with the current 'uv' coordinate. Then multiplied by
	   our lightBrightness value (a multiplier with eg. 1.0 at 100%, 0.5 at 50%)
	   In "scene" mode, this value is locked to 1.0 so it has no effect */
	gl_FragColor = texture2D(texture, uv) * lightBrightness;
}
WebGL JavaScript: simple.js
import * as util from '../utility.js'

export async function setupSimple() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-Simple');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		/* Framebuffers */
		fb: { scene: null, final: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, lightBrightness: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const noBlurYetFrag = await util.fetchShader("shader/noBlurYet.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	/* Helper for recompilation */
	function reCompileBlurShader() {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, noBlurYetFrag, ["lightBrightness"]);
	}

	/* Blur Shader */
	reCompileBlurShader()

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* Box blur at native resolution */
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.lightBrightness, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

	/* Ask for CPU-GPU Sync to prevent overloading the GPU during compositing.
	   In reality this is more likely to be a flush, but it still seems to
	   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

We don’t have a blur implemented yet, so there's not much happening. Above the box you have an Animate button, which will move the scene around to tease out problems in the upcoming algorithms. Movement happens before our blur is applied, akin to the player character moving. To see our blur in different use-cases, there are 3 modes:

Different blur algorithms behave differently based on use-case. Some are very performance efficient, but break under movement. Some reveal their flaws with small, high contrast regions like far-away lights.
Adding the blurred emission pass as we do in this article, or thresholding the scene and blurring that, is not actually how modern video games do bloom. We'll get into that a bit later.

Finally, you see the Resolution of the canvas and the Frames per Second / time taken per frame, aka “frametime”. A very important piece of the puzzle is performance, which will become more and more important as the article continues, and is the mother of invention behind our story.

Frame-rate will be capped at your screen's refresh rate, most likely 60 fps / 16.6 ms. We'll get into proper benchmarking as this article descends into blurry madness.
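FPS and frametime are reciprocal views of the same measurement. A one-line helper (illustrative, not part of this article's source) makes the conversion explicit:

```javascript
/* Frametime budget: how many milliseconds one frame may take at a given FPS.
   All per-frame work (scene render, blur, composition) must fit inside it */
const msPerFrame = (fps) => 1000 / fps;

console.log(msPerFrame(60));  /* 16.66... ms */
console.log(msPerFrame(144)); /* on a 144 Hz display, under 7 ms */
```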

Technical breakdown #

Understanding the GPU code is not necessary to follow this article, but if you do choose to peek behind the curtain, here is what you need to know.

We’ll implement our blurs as fragment shaders written in GLSL. In a nutshell, a fragment shader is code that runs on the GPU for every output pixel, in parallel. Image inputs in shaders are called Textures. These textures have coordinates, often called UV coordinates - these are the numbers we care about.

Technically, fragment shaders run per fragment, which aren't necessarily pixel sized, and there are other ways to read framebuffers, but none of that matters in the context of this article.
Texture coordinates, also called "UV" Coordinates or "UVs" for short
Note the squished appearance of the image

UV coordinates specify the position we read in the image, with bottom left being 0,0 and the top right being 1,1. Neither UV coordinates, nor shaders themselves have any concept of image resolution, screen resolution or aspect ratio. If we want to address individual pixels, it’s on us to express that in terms of UV coordinates.
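As a concrete example of expressing pixels in terms of UV coordinates, here is one common convention, sampling at pixel centers (pixelToUV is a hypothetical helper, not from this article's code):

```javascript
/* One common convention: sample at pixel centers. Pixel (x, y) of a
   width × height image maps to UV ((x + 0.5) / width, (y + 0.5) / height),
   with 0,0 at the bottom-left and 1,1 at the top-right */
function pixelToUV(x, y, width, height) {
	return [(x + 0.5) / width, (y + 0.5) / height];
}

/* The bottom-left pixel of a 4×4 texture is read at UV (0.125, 0.125),
   its center, rather than at (0, 0), which is that pixel's corner */
console.log(pixelToUV(0, 0, 4, 4));
```

Note that the same UV means a different pixel at a different resolution; the shader never finds out which.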

Although there are ways to find out, we don't know in which order output pixels are processed; and although the graphics pipeline could tell us, the shader doesn't even know which output pixel it is currently processing.

The framebuffer is passed into the fragment shader as a texture, in line uniform sampler2D texture. Using the blur shader, we draw a “Full Screen Quad”, a rectangle covering the entire canvas, whose UV coordinates (varying vec2 uv) match: 0,0 in the bottom-left and 1,1 in the top-right, used to read from the texture.

Due to automatic interpolation, every output pixel receives its own UV coordinate across the quad.

The texture’s aspect-ratio and resolution are the same as the output canvas’s aspect-ratio and resolution, thus there is a 1:1 pixel mapping between the texture we will process and our output canvas. The graphics pipeline steps and the vertex shader responsible for this are not important for this article.

The blur fragment shader accesses the color of the texture with texture2D(texture, uv), at the matching output pixel’s position. In the following examples, we’ll read from neighboring pixels, for which we’ll need to calculate a UV coordinate offset: a decimal fraction corresponding to one pixel step, calculated with 1 / canvasResolution.
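Mirroring that shader-side math in JavaScript (neighborUV is a hypothetical helper, just to make the arithmetic concrete): one pixel step in UV space is the resolution reciprocal, so a neighbor (dx, dy) pixels away sits at uv + (dx / width, dy / height):

```javascript
/* One pixel step in UV space is the resolution reciprocal: 1 / width
   horizontally, 1 / height vertically. A neighbor (dx, dy) pixels away
   from `uv` is therefore read at uv + (dx / width, dy / height) */
function neighborUV(uv, dx, dy, width, height) {
	return [uv[0] + dx / width, uv[1] + dy / height];
}

/* One pixel to the right on a 1920×1080 canvas moves UV.x by 1/1920 ≈ 0.00052 */
console.log(neighborUV([0.5, 0.5], 1, 0, 1920, 1080));
```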

One way to think of fragment shader code is "What are the instructions to construct this output pixel?"

Graphics programming is uniquely challenging in the beginning, because of how many rules and limitations the hardware, graphics APIs and the rendering pipeline impose. But it also unlocks incredible potential, as other limitations dissolve. Let’s find out how graphics programmers have leveraged that potential.

Box Blur #

From a programmer’s perspective, the most straightforward way is to average the neighbors of a pixel using a for-loop. What the fragment shader expresses is: “look Y pixels up & down, X pixels left & right and average the colors”. The more we want to blur, the more we have to increase kernelSize, the bounds of our for-loop.

/* Read from the texture y amount of pixels above and below */
for (int y = -kernel_size; y <= kernel_size; ++y) {
/* Read from the texture x amount of pixels to the left and the right */
	for (int x = -kernel_size; x <= kernel_size; ++x) {
		/* Offset from current pixel, indicating which pixel to read */
		vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
		/* Read and sum up the color contribution of that pixel */
		sum += texture2D(texture, uv + offset);
	}
}
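To verify what the shader's double loop does, the same logic can be run on the CPU. A sketch in JavaScript over a grayscale image stored as a flat array (boxBlur and clamp are my own illustrative choices; the border clamping roughly mimics WebGL's CLAMP_TO_EDGE texture addressing):

```javascript
/* CPU reference of the shader's double loop: for every output pixel, sum the
   (2k + 1) × (2k + 1) square of neighbors and normalize by the sample count.
   Grayscale image as a flat array; out-of-bounds reads clamp to the border */
function boxBlur(pixels, width, height, k) {
	const clamp = (v, max) => Math.min(Math.max(v, 0), max - 1);
	const out = new Float32Array(width * height);
	const totalSamples = (2 * k + 1) * (2 * k + 1);
	for (let py = 0; py < height; ++py) {
		for (let px = 0; px < width; ++px) {
			let sum = 0;
			for (let y = -k; y <= k; ++y)
				for (let x = -k; x <= k; ++x)
					sum += pixels[clamp(py + y, height) * width + clamp(px + x, width)];
			out[py * width + px] = sum / totalSamples;
		}
	}
	return out;
}

/* A single bright pixel in a 3×3 image spreads over all nine outputs */
console.log(boxBlur([0, 0, 0, 0, 9, 0, 0, 0, 0], 3, 3, 1));
```

In this tiny example every 3×3 window happens to contain the bright center pixel exactly once, so the energy spreads perfectly evenly; on a real scene you get the familiar soft smearing instead.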

The bigger the for-loop, the more texture reads we perform per output pixel. Each texture read is often called a “texture tap”, and the total amount of those “taps” per frame will now also be displayed. New controls, new samplePosMultiplier, new terms - play around with them and get a feel for them, with a constant eye on the FPS.
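The displayed tap count follows directly from the loop bounds: width × height output pixels, each reading a (2 · kernelSize + 1)² square. A back-of-the-envelope helper (boxBlurTaps is a hypothetical name, not from this article's source):

```javascript
/* Texture taps per frame for the naive box blur: every output pixel reads
   a full (2 · kernelSize + 1)² square of its neighborhood */
function boxBlurTaps(width, height, kernelSize) {
	const side = 2 * kernelSize + 1;
	return width * height * side * side;
}

/* A modest 7×7 kernel (kernelSize 3) at 1080p already costs over
   100 million texture reads, every single frame */
console.log(boxBlurTaps(1920, 1080, 3)); /* 101606400 */
```

The quadratic growth in kernelSize is exactly the scaling problem the rest of this article fights against.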

Blur Fragment Shader: boxBlur.fs
/* Float precision to highp, if supported. Large kernel sizes result in many
   color contributions and thus require the highest precision to avoid clipping.
   Required in WebGL 1 Shaders and depending on platform may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

/* Resolution Reciprocal. Getting to the next pixel requires us to calculate
   `UV Coordinate / frameSize`. On hardware, doing a division is slightly slower
   than doing a multiplication. Since the shader is run per pixel, we avoid the
   per-pixel division, by calculating the reciprocal 1 / frameSize and pass it
   into the shader. A very popular micro-optimization across graphics programming */
uniform vec2 frameSizeRCP;
uniform float samplePosMult; /* Multiply to push blur strength past the kernel size */

uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;
/* `KERNEL_SIZE` added during compilation */
const int kernel_size = KERNEL_SIZE;

void main() {
	/* Variable to hold our final color for the current pixel */
	vec4 sum = vec4(0.0);
	/* How big one side of the sampled square is */
	const int size = 2 * kernel_size + 1;
	/* Total number of samples we are going to read */
	const float totalSamples = float(size * size);

	/* Read from the texture y amount of pixels above and below */
	for (int y = -kernel_size; y <= kernel_size; ++y) {
	/* Read from the texture x amount of pixels to the left and the right */
		for (int x = -kernel_size; x <= kernel_size; ++x) {
			/* Offset from the current pixel, indicating which pixel to read */
			vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
			/* Read and sum up the contribution of that pixel */
			sum += texture2D(texture, uv + offset);
		}
	}

	/* Return the sum, divided by the number of samples (normalization) */
	gl_FragColor = (sum / totalSamples) * bloomStrength;
}
WebGL JavaScript: boxBlur.js
import * as util from '../utility.js'

export async function setupBoxBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-BoxBlur');
	const WebGLBoxDetail = document.getElementById('WebGLBox-BoxBlurDetail');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		/* Framebuffers */
		fb: { scene: null, final: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: WebGLBoxDetail.querySelector('#renderer'),
			iterTime: WebGLBoxDetail.querySelector('#iterTime'),
			tapsCount: WebGLBoxDetail.querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const boxBlurFrag = await util.fetchShader("shader/boxBlur.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/boxBlurBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: boxBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	/* Helper for recompilation */
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, boxBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	/* Blur Shader */
	reCompileBlurShader(ui.blur.kernelSize.value)

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const tapsNewText = (canvas.width * canvas.height * KernelSizeSide * KernelSizeSide / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* Box blur at native resolution */
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		/* Ask for CPU-GPU Sync to prevent overloading the GPU during compositing.
		   In reality this is more likely to be a flush, but still, it seems to
		   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Visually, the result doesn’t look very pleasing. The stronger the blur, the more “boxy” the features of the image become. This is due to us reading and averaging the texture in a square shape. Especially in bloom mode, with strong lightBrightness and big kernelSize, lights literally become squares.

Performance is also really bad. With bigger kernelSizes, our Texture Taps count skyrockets and performance drops. Mobile devices will slow to a crawl. Even the world’s fastest PC graphics cards will fall below screen refresh-rate if you crank kernelSize and zoom the article on PC, thus raising canvas resolution.

We kinda failed on all fronts. It looks bad and runs bad.

Then, there’s this samplePosMultiplier. It seems to increase blur strength as well, without increasing textureTaps or lowering performance (or lowering it just a little on certain devices). But if we crank it too much, we get artifacts in the form of repeating patterns. Let’s play with a schematic example:


One can say that an image is a “continuous 2D signal”. When we texture tap at a specific coordinate, we are sampling the “image signal” at that coordinate. As previously mentioned, we use UV coordinates and are not bound by concepts like “pixel positions”. Where we place our samples is completely up to us.

A fundamental blur algorithm option is to increase the sample distance from the center, thus increasing the amount of image we cover with our samples - more bang for your sample buck. This works by multiplying the offset distance. That is what samplePosMult does, and it is something you will have access to going forward.
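In code terms, the multiplier simply scales each tap’s UV offset before the texture read. A small standalone sketch of that offset computation (mirroring the shaders’ `vec2(x, y) * samplePosMult * frameSizeRCP` line; not part of the demo code):

```javascript
/* Compute the UV offsets of a (2r+1)×(2r+1) kernel, scaled by samplePosMult.
   frameSizeRCP is [1/width, 1/height], converting pixel steps to UV steps. */
function kernelOffsets(r, samplePosMult, frameSizeRCP) {
	const offsets = [];
	for (let y = -r; y <= r; ++y)
		for (let x = -r; x <= r; ++x)
			offsets.push([
				x * samplePosMult * frameSizeRCP[0],
				y * samplePosMult * frameSizeRCP[1]
			]);
	return offsets;
}

/* 3×3 kernel on a 100×100 framebuffer: 9 taps, one pixel apart */
const taps = kernelOffsets(1, 1.0, [1 / 100, 1 / 100]);
/* At 200% the same 9 taps cover twice the distance from the center */
const spread = kernelOffsets(1, 2.0, [1 / 100, 1 / 100]);
```

The tap count stays identical; only the area the taps cover grows.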

Doing it too much brings ugly repeating patterns. This of course leaves some fundamental questions, like where these artifacts come from and what it even means to read between two pixels. And on top of that, we have to address the performance and boxiness of our blur! But first…

What even is a kernel? #

What we have created with our for-loop is a convolution. Very simplified, in the context of image processing, it’s usually a square of numbers constructing an output pixel by gathering and weighting the pixels that the square covers. The square is called a kernel and is the thing we visualized previously.
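To make that concrete outside of any shader, here is a minimal CPU-side convolution in JavaScript - a sketch, not part of the article’s demo code, with out-of-bounds reads clamped to the nearest edge pixel:

```javascript
/* Convolve a grayscale image (2D array) with a square kernel (2D array of
   weights). Out-of-bounds reads clamp to the nearest edge pixel. */
function convolve(image, kernel) {
	const h = image.length, w = image[0].length;
	const r = (kernel.length - 1) / 2; /* kernel radius */
	const out = [];
	for (let y = 0; y < h; ++y) {
		out.push([]);
		for (let x = 0; x < w; ++x) {
			let sum = 0;
			for (let ky = -r; ky <= r; ++ky)
				for (let kx = -r; kx <= r; ++kx) {
					const sy = Math.min(h - 1, Math.max(0, y + ky));
					const sx = Math.min(w - 1, Math.max(0, x + kx));
					sum += image[sy][sx] * kernel[ky + r][kx + r];
				}
			out[y].push(sum);
		}
	}
	return out;
}

/* A 3×3 box kernel: all 9 weights equal, summing to 1 */
const box3 = Array.from({ length: 3 }, () => [1 / 9, 1 / 9, 1 / 9]);
/* A single bright pixel gets spread evenly across all 9 outputs */
const blurred = convolve([[0, 0, 0], [0, 9, 0], [0, 0, 0]], box3);
```

Each output pixel is the weighted sum of everything the kernel square covers at that position.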

For blurs, the kernel weights must sum up to 1. If that were not the case, we would either brighten or darken the image. Ensuring this is the normalization step. In the box blur above, it happens by dividing the summed pixel color by totalSamples, the total number of samples taken. A basic “calculate the average” expression.

The same can be expressed as the weights of a kernel, a number multiplied with each sample at that position. Since the box blur weighs all samples the same regardless of position, all weights are identical. This is visualized next. The bigger the kernel size, the smaller the weights.
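Both formulations are algebraically the same thing, which a two-line sketch makes obvious:

```javascript
const samples = [0.2, 0.5, 0.8, 0.1, 0.4];

/* Formulation 1: sum everything, then divide by the sample count */
const avg1 = samples.reduce((a, b) => a + b, 0) / samples.length;

/* Formulation 2: weight each sample by 1/totalSamples, then sum */
const w = 1 / samples.length;
const avg2 = samples.reduce((a, b) => a + b * w, 0);
```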

kernelSize
kernelSize3×3
samplePosMult
samplePosMult100%

Kernels applied at the edges of our image will read from areas “outside” the image, with UV coordinates smaller than (0, 0) and bigger than (1, 1). Luckily, the GPU handles this for us and we are free to decide what happens to those outside samples by setting the Texture Wrapping mode.

Texture Wrapping Modes and results on blurring (Note the color black bleeding-in)
Top: Framebuffer, zoomed out. Bottom: Framebuffer normal, with strong blur applied

Among others, we can define a solid color to be used, or “clamp” to the nearest edge’s color. If we choose a solid color, we will get color bleeding at the edges. Thus for almost all post-processing use-cases, edge color clamping is used, as it prevents weird things happening at the edges. This article uses it too.
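Conceptually, the wrap modes are just functions mapping an out-of-range UV coordinate back into the [0, 1] range. A sketch of the two most common ones (the GPU does this in hardware, this just illustrates the mapping):

```javascript
/* CLAMP_TO_EDGE: out-of-range reads keep returning the edge pixel's color */
const clampToEdge = (u) => Math.min(1, Math.max(0, u));

/* REPEAT: out-of-range reads wrap around to the opposite side */
const repeat = (u) => u - Math.floor(u);

clampToEdge(1.25); /* 1.0  - stuck at the right edge */
repeat(1.25);      /* 0.25 - wrapped around to the left side */
```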

You may have noticed a black "blob" streaking with stronger blur levels along the bottom. Specifically here, it happens because the lines between the floor tiles align with the bottom edge, extending black color to infinity

Convolution as a mathematical concept is surprisingly deep and 3blue1brown has an excellent video on it that even covers the image processing topic. Theoretically, we won’t depart from convolutions: we can dissect our code and express it as weights and kernels. With the for-loop box blur, that was quite easy!

But what is a convolution?
YouTube Video by 3Blue1Brown

On a practical level though, understanding where the convolution is, how many there are and what kernels are at play will become more and more difficult, once we leave the realm of classical blurs and consider the wider implications of reading between pixel bounds. But for now, we stay with the classics:

Gaussian Blur #

The most famous of blur algorithms is the Gaussian Blur. It uses the normal distribution, also known as the bell curve, to weight the samples inside the kernel, with a new variable sigma σ controlling the flatness of the curve. Other than the way the kernel weights are generated, the algorithm is identical to the box blur.

Gaussian blur weights formula for point (x,y) (Source)

To calculate the weight for point (x,y), the above formula is used. The gaussian formula has a normalization factor 1/(2πσ²) in front. In the code, there is no such thing though. The formula expresses the gaussian curve as a continuous function going to infinity. But our code and its for-loop are different - discrete and finite.

float gaussianWeight(float x, float y, float sigma)
{
	/* e^( -(x² + y²) / 2σ² ) */
	return exp(-(x * x + y * y) / (2.0 * sigma * sigma));
}
For clarity, the kernel is generated in the fragment shader. Normally, that should be avoided: fragment shaders run per-output-pixel, but the kernel weights stay the same, making this inefficient.

Just like with the box blur, weights are summed up and divided out at the end, instead of precalculating the weights with the 1/(2πσ²) factor. sigma controls the sharpness of the curve and thus the blur strength - but wasn’t that the job of kernelSize? Play around with all the values below and get a feel for how they behave.
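To see the precalculation spelled out: the same kernel can be generated once on the CPU, mirroring the shader’s gaussianWeight. A JavaScript sketch (hypothetical helper, not part of the demo code):

```javascript
/* Build a normalized (2r+1)×(2r+1) gaussian kernel once on the CPU,
   instead of per-pixel in the fragment shader. */
function gaussianKernel(r, sigma) {
	const weights = [];
	let weightSum = 0;
	for (let y = -r; y <= r; ++y)
		for (let x = -r; x <= r; ++x) {
			const w = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
			weights.push(w);
			weightSum += w;
		}
	/* Normalization: dividing by the summed weights replaces the continuous
	   normalization factor and guarantees the discrete weights sum to 1 */
	return weights.map(w => w / weightSum);
}

/* 7×7 kernel: the center weight is the largest, all 49 weights sum to 1 */
const kernel = gaussianKernel(3, 2.0);
```

The resulting weights could then be uploaded as a uniform array, so the per-pixel work is just the multiply-and-sum.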

Blur Fragment Shader gaussianBlur.fs
/* Float precision to highp, if supported. Large Kernel Sizes result in many
   color contributions and thus require the highest precision to avoid clipping.
   Required in WebGL 1 Shaders and depending on platform may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float samplePosMult; /* Multiply to push blur strength past the kernel size */
uniform float sigma;

uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;
/* `KERNEL_SIZE` added during compilation */
const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float y, float sigma)
{
	/* e^( -(x² + y²) / 2σ² ) */
	return exp(-(x * x + y * y) / (2.0 * sigma * sigma));
}

void main() {
	/* Variable to hold our final color for the current pixel */
	vec4 sum = vec4(0.0);
	/* Sum of all weights, used for normalization */
	float weightSum = 0.0;

	/* Read from the texture y amount of pixels above and below */
	for (int y = -kernel_size; y <= kernel_size; ++y) {
	/* Read from the texture x amount of pixels to the left and the right */
		for (int x = -kernel_size; x <= kernel_size; ++x) {

			/* Calculate the required weight */
			float w = gaussianWeight(float(x), float(y), sigma);
			/* Offset from the current pixel, indicating which pixel to read */
			vec2 offset  = vec2(x, y) * samplePosMult * frameSizeRCP;

			/* Read and sum up the contribution of that pixel, weighted */
			sum += texture2D(texture, uv + offset) * w;
			weightSum += w;
		}
	}

	/* Return the sum, divided by the number of samples (normalization) */
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript gaussianBlur.js
import * as util from '../utility.js'

export async function setupGaussianBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-GaussianBlur');
	const WebGLBoxDetail = document.getElementById('WebGLBox-GaussianBlurDetail');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		/* Framebuffers */
		fb: { scene: null, final: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: WebGLBoxDetail.querySelector('#renderer'),
			iterTime: WebGLBoxDetail.querySelector('#iterTime'),
			tapsCount: WebGLBoxDetail.querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("shader/gaussianBlur.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/gaussianBlurBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);


	/* Helper for recompilation */
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	/* Blur Shader */
	reCompileBlurShader(ui.blur.kernelSize.value)

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);


		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const tapsNewText = (canvas.width * canvas.height * KernelSizeSide * KernelSizeSide / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = ((performance.now() / 10000) % Math.PI) * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* Gaussian blur at native resolution */
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		/* Ask for CPU-GPU Sync to prevent overloading the GPU during compositing.
		   In reality this is more likely to be a flush, but still, it seems to
		   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

The blur looks way smoother than our previous box blur, with things generally taking on a “rounder” appearance due to the bell curve’s smooth signal response. That is, unless you move the sigma slider down: if you move sigma too low, you will get our previous box-blur-like artifacts again.

Let’s clear up what the values actually represent and how they interact. The following visualization shows the kernel with its weights expressed as height in a dimetric projection. There are two different interaction modes of sigma when changing kernelSize, and two ways to express sigma.


sigma describes the flatness of our mathematical curve, a curve going to infinity. But our algorithm has a limited kernelSize. Where the kernel stops, no more pixel contributions occur, leading to box-blur-like artifacts due to the cut-off. In the context of image processing, there are two ways to set up a gaussian blur…

A small sigma, and thus a flat bell curve across the kernel window, paired with a small kernel size is effectively a box blur, with the weights making the kernel box-shaped.

… way 1: Absolute Sigma - sigma is an absolute value in pixels, independent of kernelSize, with kernelSize acting as a “window into the curve”. Or way 2: Relative Sigma - sigma is expressed relative to the current kernelSize. For practical reasons (finicky sliders), the relative-to-kernelSize mode is used everywhere.

Either way, the infinite gaussian curve will have a cut-off somewhere. sigma too small? - We get box blur like artifacts. sigma too big? - We waste blur efficiency, as the same perceived blur strength requires bigger kernels, thus bigger for-loops and lower performance. An artistic trade-off every piece of software has to make.

An optimal kernel would be one, where the outer weights are almost zero. Thus, if we increased kernelSize in Absolute Sigma mode by one, it would make close to no more visual difference.
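We can check numerically how quickly the gaussian tail dies off. A sketch (hypothetical helper name, not demo code) of the weight of a sample at distance x relative to the center weight:

```javascript
/* Relative weight of a 1D gaussian sample at distance x from the center.
   The center weight is exp(0) = 1, so this is already the ratio. */
const relativeWeight = (x, sigma) =>
	Math.exp(-(x * x) / (2 * sigma * sigma));

relativeWeight(1, 1); /* ≈ 0.607 - cutting off at ±1σ discards a lot */
relativeWeight(3, 1); /* ≈ 0.011 - at ±3σ the tail is nearly gone */
```

Cutting off at ±1σ discards samples still carrying over 60% of the center’s weight; at ±3σ the outermost samples contribute around 1%, so an extra ring of taps would barely change the image.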

There are other ways of creating blur kernels, with other properties. One way is to follow Pascal’s triangle to get a set of predefined kernel sizes and weights. These are called Binomial Filters and lock us into specific “kernel presets”, but solve the infinity-vs-cut-off dilemma by having the weights reach zero within the sampling window.
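As a sketch of how such a preset is generated (not from the article’s demo code): row n of Pascal’s triangle sums to 2ⁿ, so dividing the row by 2ⁿ yields a normalized 1D binomial kernel:

```javascript
/* Generate row n of Pascal's triangle and normalize it by 2^n,
   yielding a 1D binomial blur kernel */
function binomialKernel(n) {
	let row = [1];
	for (let i = 0; i < n; ++i)
		row = row.map((v, j) => v + (row[j - 1] || 0)).concat([1]);
	return row.map(v => v / 2 ** n);
}

binomialKernel(4); /* [1, 4, 6, 4, 1] / 16 - a 5-tap preset */
```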

Binomial Kernels are also Gaussian-like in their frequency response. We won’t expand on these further; just know that we can choose kernels by different mathematical criteria, chasing different signal response characteristics. But speaking of which, what even is “Gaussian-like”? And why do we care?

What is Gaussian-like? #

In Post-Processing Blur algorithms you generally find two categories. Bokeh Blurs and Gaussian-Like Blurs. The gaussian is chosen for its natural appearance, its ability to smooth colors without “standout features”. Gaussian Blurs are generally used as an ingredient in an overarching visual effect, be it frosted glass Interfaces or Bloom.

Bokeh Blur and Gaussian Blur compared.

In contrast to that stands the “Bokeh Blur” - also known as “Lens Blur” or “Cinematic Blur” - used when emulating lenses or creating Depth of Field. This type of blur is the target visual effect itself. The challenges and approaches are very much related, but the algorithms used differ.

Algorithms get really creative in this space, all with different trade-offs and visuals. Some sample using a Poisson disk distribution and some show cool out-of-the-box thinking: Computerphile covered a complex-numbers-based approach to creating Bokeh Blurs, a fascinating number theory cross-over.

Video Game & Complex Bokeh Blurs
YouTube Video by Computerphile

This article though doesn’t care about these stylistic approaches. We are here to chase a basic building block of graphics programming and realtime visual effects: a “Gaussian-like” with good performance. Speaking of which!

Performance #

The main motivator of our journey here is the chase of realtime performance. Everything we do must happen within a few milliseconds. The expected performance of an algorithm and its practical cost once placed in the graphics pipeline are sometimes surprisingly different numbers though. Gotta measure!

This chapter is about a very technical motivation. If you don't care about how fast a GPU does what it does, feel free to skip this section.

With performance being such a driving motivator, it would be a shame if we couldn’t measure it in this article. Each WebGL Box has a benchmark function, which blurs random noise at a fixed resolution of 1600x1200 with the respective blur settings you chose and a fixed iteration count workload, a feature hidden so far.

Realtime graphics programming is sometimes more about measuring than programming.

Benchmarking is best done by measuring shader execution time. This can be done reliably in the browser, but only on some platforms; no way exists to do so across all of them. Luckily, there is the classic method of “stalling the graphics pipeline”: forcing a wait until all commands finish, a moment in time we can measure.

Across all platforms, a stall is guaranteed to occur on the command gl.readPixels(). Interestingly, the standards-conformant command for this, gl.finish(), is simply ignored by mobile Apple devices.

Below is a button, that unlocks this benchmarking feature, unhiding a benchmark button and Detailed Benchmark Results section under each blur. This allows you to start a benchmark with a preset workload, on a separate Browser Worker. There is only one issue: Browsers get very angry if you full-load the GPU this way.

If the graphics pipeline does work without reporting back to the browser (called “yielding”) for too long, browsers will simply kill all GPU access for the whole page until tab reload. If we yield back, the measured results are useless, and from inside WebGL we can’t stop the GPU once its commands are issued.

⚠️ Especially on mobile: please increase kernelSize and iterations slowly. The previous algorithms have bad kernelSize performance scaling on purpose, be especially careful with them.

Stay below 2 seconds of execution time, or the browser will lock GPU access for the page, disabling all blur examples, until a browser restart is performed. On iOS Safari this requires a trip to the App Switcher, a page reload won't be enough.
iOS and iPadOS are especially strict and will keep GPU access disabled even on tab reload. You will have to go to the App Switcher (double tap the Home Button), swipe Safari up to close it and relaunch it from scratch.

What are we optimizing for? #

With both the Box Blur and the Gaussian Blur above, you will measure performance scaling very badly with kernelSize. Expressed in Big O notation, the scaling is O(pixelCount * kernelSize²): quadratic growth of required texture taps in terms of kernelSize. We need to tackle this going forward.
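To get a feel for that scaling, a quick back-of-the-envelope tap count for a single full-resolution pass at the benchmark’s 1600x1200 (a sketch, not part of the demo code):

```javascript
/* Texture taps for one full-screen pass of a square (2r+1)×(2r+1) kernel */
const tapCount = (width, height, kernelRadius) =>
	width * height * (2 * kernelRadius + 1) ** 2;

tapCount(1600, 1200, 3);  /* 7×7 kernel:   94,080,000 taps */
tapCount(1600, 1200, 15); /* 31×31 kernel: 1,845,120,000 taps */
```

Multiplying the kernel radius by five here multiplies the work by roughly twenty, all for one frame of one effect.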

Especially dedicated Laptop GPUs are slow to get out of their lower power states. Pressing the benchmark button multiple times in a row may result in the performance numbers getting better.

Despite the gaussian blur calculating the kernel completely from scratch for every single pixel in our implementation, the performance of the box blur and the gaussian blur are very close to each other at higher iteration counts. In fact, by precalculating those kernels, we could match their performance.

But isn't the gaussian blur a more complicated algorithm?

As opposed to chips from decades ago, modern graphics cards have very fast arithmetic, but comparatively slow memory access times. With workloads like these, the slowest thing becomes the memory access, in our case the texture taps. The more taps, the slower the algorithm.

Our blurs perform a dependent texture read, a graphics programming sin. This is when texture coordinates are determined during shader execution, which opts out of many automated shader optimizations.

Especially on personal computers, you may also have noticed that increasing samplePosMultiplier will negatively impact performance (up to a point), even though the required texture taps stay the same.

This is due to hardware texture caches accelerating texture reads which are spatially close together - something they cannot do effectively if the reads are too far apart. Platform dependent tools like Nvidia NSight can measure GPU cache utilization. The browser cannot.

These are the key numbers graphics programmers chase when writing fragment shaders: Texture Taps and Cache Utilization. There is another one we will get into in a moment. Clearly, our blurs are slow. Time for a speed up!

Separable Gaussian Blur #

We have not yet left the classics of blur algorithms. One fundamental concept is still left on the table: “convolution separability”. Certain convolutions, like our Box Blur, our Gaussian Blur and the Binomial filtering mentioned in passing previously, can all be performed in two separate passes, by two separate 1D kernels.

Gaussian blur weights formula, separated

Not all convolutions are separable. In the context of graphics programming: if you can express the kernel weights as a formula in axes X and Y, and factor out both X and Y into two separate formulas, then you have gained separability of the 2D kernel and can perform the convolution in two passes, massively saving on texture taps.
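For the Gaussian specifically, that factorization falls straight out of the exponential (normalization constants omitted, since our shader divides by the summed weights anyway):

```latex
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}
        = e^{-\frac{x^2}{2\sigma^2}} \cdot e^{-\frac{y^2}{2\sigma^2}}
        = G(x)\,G(y)
```

A horizontal pass with weights $G(x)$ followed by a vertical pass with weights $G(y)$ thus multiplies out to the full 2D kernel.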

Some big budget video games have used effects with kernels that are not separable, but performed them in two passes with 1D kernels anyway for the performance gain, with the resulting artifacts deemed not too bad.

Computerphile covered the concept of separability in the context of 2D image processing really well, if you are interested in a more formal explanation.

Separable Filters and a Bauble
YouTube Video by Computerphile

Here is our Gaussian Blur, but expressed as a separable version. You can view Pass 1 and Pass 2 in isolation or see the final result. Same visual quality as our Gaussian Blur, same dials, but massively faster, with no more quadratic scaling of required texture taps.

❌ The browser killed this WebGL Context, please reload the page. If this happened as the result of a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser App completely, as the browser will temporarily refuse to allow this site to run WebGL again.
FPS:?/?ms Resolution:?x?Texture Taps:?
kernelSize
kernelSize7x7 px
samplePosMultiplier
samplePosMultiplier100 %
lightBrightness
lightBrightness100 %
sigma
sigma±2.00σ
Blur Fragment Shader gaussianSeparableBlur.fs
/* Float precision to highp, if supported. Large kernel sizes result in many
   color contributions and thus require the highest precision to avoid clipping.
   Required in WebGL 1 shaders; depending on platform it may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float samplePosMult; /* Multiply to push blur strength past the kernel size */
uniform float sigma;
uniform vec2 direction; /* Direction vector: (1,0) for horizontal, (0,1) for vertical */

uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;
/* `KERNEL_SIZE` added during compilation */
const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float sigma)
{
	/* e^( -x² / (2σ²) ) */
	return exp(-(x * x) / (2.0 * sigma * sigma));
}

void main() {
	/* Variable to hold our final color for the current pixel */
	vec4 sum = vec4(0.0);
	/* Sum of all weights */
	float weightSum = 0.0;
	
	/* Total length of the sampled 1D line */
	const int size = 2 * kernel_size + 1;

	/* Sample along the direction vector (horizontal or vertical) */
	for (int i = -kernel_size; i <= kernel_size; ++i) {
		/* Calculate the required weight for this 1D sample */
		float w = gaussianWeight(float(i), sigma);
		
		/* Offset from the current pixel along the specified direction */
		vec2 offset = vec2(i) * direction * samplePosMult * frameSizeRCP;

		/* Read and sum up the contribution of that pixel, weighted */
		sum += texture2D(texture, uv + offset) * w;
		weightSum += w;
	}

	/* Return the sum, divided by the total weight (normalization) */
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript gaussianSeparableBlur.js
import * as util from '../utility.js'

export async function setupGaussianSeparableBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-GaussianSeparableBlur');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State for of the Rendering */
		mode: "scene",
		passMode: "pass1",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameIntermediate: null, frameFinal: null },
		/* Framebuffers */
		fb: { scene: null, intermediate: null, final: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null, direction: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
			contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[name="modeGaussSep"]'),
			passModes: WebGLBox.querySelectorAll('input[name="passMode"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#renderer'),
			passMode: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#passMode'),
			iterTime: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("shader/gaussianBlurSeparable.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});
	
	/* Pass Mode */
	ui.rendering.passModes.forEach(radio => {
		/* Force set to pass1 to fix a reload bug in Firefox Android */
		if (radio.value === "pass1")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.passMode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/gaussianSeparableBlurBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value,
			passMode: ctx.passMode
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.passMode.textContent = event.data.passMode;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);


	/* Helper for recompilation */
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma", "direction"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	/* Blur Shader */
	reCompileBlurShader(ui.blur.kernelSize.value)

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.intermediate);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate, ctx.tex.frameIntermediate] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		// Clear intermediate texture to prevent lazy initialization warnings
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);


		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		/* Separable blur: pass1/pass2 = 1 pass, combined = 2 passes */
		const samplesPerPixel = ctx.passMode == "combined" ? KernelSizeSide * 2 : KernelSizeSide;
		const tapsNewText = (canvas.width * canvas.height * samplesPerPixel / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* Separable Gaussian blur implementation */
		gl.useProgram(ctx.shd.blur.handle);
		
		if (ctx.passMode == "pass1") {
			/* Pass 1 only: Horizontal blur directly to screen */
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); // Horizontal direction
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else if (ctx.passMode == "pass2") {
			/* Pass 2 only: Vertical blur directly to screen */
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); // Vertical direction
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else {
			/* Combined: Two-pass separable blur */
			/* Pass 1: Horizontal blur to intermediate buffer */
			gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); // Horizontal direction
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			/* Pass 2: Vertical blur to final destination */
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); // Vertical direction
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameIntermediate);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

	/* Ask for a CPU-GPU sync to prevent overloading the GPU during compositing.
	   In reality this is more likely to be a flush, but it still seems to
	   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameIntermediate); ctx.tex.frameIntermediate = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.intermediate); ctx.fb.intermediate = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

If you benchmark the performance, you will see a massive performance uplift compared to our Gaussian Blur! But there is a trade-off made that’s not quite obvious. In order to have two passes, we are writing out a new framebuffer. Remember the “modern chips are fast, but memory access in relation is not” thing?

In a video game on a modern high-res 4K screen, multi-pass anything implies writing out 8.3 million pixels to memory, just to read them back in. With smaller kernels on high-res displays, a separable kernel may not always be faster. But with bigger kernels, it almost always is. With a massive speed-up gained, how much faster can we go?
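A back-of-the-envelope sketch of that extra memory traffic, assuming an RGBA8 framebuffer (the numbers are illustrative, real formats and compression vary per platform):

```javascript
/* Extra memory traffic of one intermediate pass at 4K */
const pixels = 3840 * 2160;   // 8294400, ≈ 8.3 Million pixels
const bytesPerPixel = 4;      // RGBA8
const writeThenRead = 2;      // written by pass 1, read back by pass 2
const megabytes = pixels * bytesPerPixel * writeThenRead / (1024 * 1024);
console.log(megabytes.toFixed(1) + " MB of traffic, every frame"); // 63.3 MB
```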

The magic of frequency space #

…how about blurs that happen so fast, they are considered free? For that, we are taking a bit of a detour into Frequency Space image manipulation.

Any 2D image can be converted and edited in frequency space, which unlocks a whole new sort of image manipulation. To blur an image in this paradigm, we perform an image Fast Fourier Transform, then mask high frequency areas to perform the blur and finally do the inverse transformation.

A Fourier Transform decomposes a signal into its underlying Sine Frequencies. The output of an image Fast Fourier Transform are “Magnitude” and “Phase” component images. These images can be combined back together with the inverse image FFT to produce the original image again…

FFT Viz Input image
Input image for the following interactive FFT example
The green stripes are not an error, they are baked into the image on purpose.

…but before doing so, we can manipulate the frequency representation of the image in various ways. Less reading, more interaction! In the following interactive visualization you have the magnitude image, brightness boosted into a human visible representation on the left and the reconstructed image on the right.

For now, play around with removing energy. You can paint on the magnitude image with your fingers or with the mouse. The output image will be reconstructed accordingly. Also, play around with the circular mask and the feathering sliders. Try to build intuition for what’s happening.

frequencyCutRadius
frequencyCutRadiusoff
feather
feather0

The magnitude image represents the frequency make-up of the image, with the lowest frequencies in the middle and higher at the edges. Horizontal frequencies (vertical features in the image) follow the X Axis and vertical frequencies (Horizontal features in the image) follow the Y Axis, with in-betweens being the diagonals.

Repeating patterns in the image light up as bright points in the magnitude representation. Or rather, their frequencies have high energy: e.g. the green grid I added. Removing it in Photoshop wouldn’t be easy! In frequency space however, it is: just paint over the 3 blueish diagonal streaks.

Removing repeating features by finger-painting black over frequencies still blows me away.

As you may have noticed, the Magnitude representation holds mirrored information. This is due to the FFT being a complex number analysis and our image having only “real” component pixels, leaving redundant information. The underlying number theory was covered in great detail by 3Blue1Brown:

But what is the Fourier Transform? A visual introduction.
YouTube Video by 3Blue1Brown

The underlying code this time is not written by me, but comes from @turbomaze’s repo JS-Fourier-Image-Analysis. There is no standard on how you are supposed to plot the magnitude information and how the quadrants are laid out. I changed the implementation by @turbomaze to follow the convention used by ImageMagick.
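To build intuition for what such a transform computes, here is the decomposition into magnitude and phase (and back), stripped down to one dimension. This is my own naive O(n²) sketch, purely illustrative - a real FFT computes the same result in O(n log n):

```javascript
/* Naive 1D DFT: decompose a real signal into magnitude + phase */
function dft(signal) {
	const N = signal.length, mag = [], phase = [];
	for (let k = 0; k < N; ++k) {
		let re = 0, im = 0;
		for (let n = 0; n < N; ++n) {
			const angle = -2 * Math.PI * k * n / N;
			re += signal[n] * Math.cos(angle);
			im += signal[n] * Math.sin(angle);
		}
		mag.push(Math.hypot(re, im));
		phase.push(Math.atan2(im, re));
	}
	return { mag, phase };
}

/* Inverse: reconstruct the original real signal */
function inverseDft(mag, phase) {
	const N = mag.length, out = [];
	for (let n = 0; n < N; ++n) {
		let sum = 0;
		for (let k = 0; k < N; ++k)
			sum += mag[k] * Math.cos(2 * Math.PI * k * n / N + phase[k]);
		out.push(sum / N);
	}
	return out;
}
```

Any edit to `mag` between the two calls - painting energy black, say - changes the reconstructed signal, which is exactly what the interactive example above does in 2D.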

We can blur the image by painting the frequency energy black outside a radius around the center, thus eliminating the higher frequencies. If we do so with a pixel perfect circle, then we get ringing artifacts - The Gibbs phenomenon. By feathering the circle, we lessen this ringing and the blur cleans up.
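The feathered circle boils down to a smoothstep falloff. A sketch in JavaScript, mirroring GLSL's built-in `smoothstep` - the function names are mine, and `feather` must be greater than zero to avoid a division by zero:

```javascript
/* Hermite ramp from 0 to 1 between edge0 and edge1, like GLSL smoothstep */
function smoothstep(edge0, edge1, x) {
	const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
	return t * t * (3 - 2 * t);
}

/* Feathered low-pass mask: 1 inside the cut radius, 0 outside,
   with a smooth ramp in between. A hard cutoff causes Gibbs ringing;
   a wider ramp suppresses it */
function frequencyMask(distFromCenter, cutRadius, feather) {
	return 1 - smoothstep(cutRadius - feather, cutRadius + feather, distFromCenter);
}
```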

Drawing a circle like this? That's essentially free on the GPU! We get the equivalent of super big kernels for free!

But not all that glitters is gold. First of all, performance. Yes, the “blur” in frequency space is essentially free, but the trip to frequency space is everything but. The main issue comes down to the FFT requiring many full-resolution passes, each reading and writing every pixel - a performance killer.

And then there's still the inverse conversion!

But our shaders work the other way around, expressing the “instructions to construct an output pixel”. There are fragment shader based GPU implementations, but they rely on many passes for the calculation - a lot of memory access back and forth. Furthermore, non-power-of-two images require a slower algorithm.

This article is in the realm of fragment shaders and the graphics pipeline a GPU is part of, but there are also GPGPU and compute shader implementations with no fragment shader specific limitations. Unfortunately the situation remains: Conversion of high-res images to frequency space is too costly in the context of realtime graphics.

Deleting the frequencies of that grid is magical, but leaves artifacts. In reality it's worse, as my example is idealized. Click Upload Image, take a photo of a repeating pattern and see how cleanly you can get rid of it.

Then there are the artifacts I have glossed over. The FFT considers the image an infinitely repeating 2D signal. By blurring, we are bleeding in color from the neighboring copies. And that’s not to mention the various ringing artifacts that happen. None of this is unsolvable! But there is a more fundamental issue…

What is a Low-Pass filter? #

It's a filter that removes high frequencies and leaves the low ones, easy!

Try the FFT Example again and decrease the frequencyCutRadius to blur. At some point the green lines disappear, right? It is a low pass filter, one where high frequencies are literally annihilated. Small bright lights in the distance? Also annihilated…

frequencyCutRadius
frequencyCutRadiusoff
feather
feather0

If we were to use this to build an effect like bloom, it would remove small lights that are meant to bloom as well! Our gaussian blur on the other hand, also a low-pass filter, samples and weights every pixel. In a way it “takes the high frequency energy and spreads it into low frequency energy”.

So Low-Pass Filter ≠ Low-Pass Filter - what is meant by the term depends on context, which is why this article didn’t use it until now. Frequency space energy attenuation is simply not the correct tool for our goal of a “basic graphics programming building block” for visual effects.

This is a deep misunderstanding I held for years: why didn't video games use such a powerful tool?

There are other frequency space image representations, not just FFT Magnitude + Phase. Another famous one is the Discrete Cosine Transform. Again, Computerphile covered it in great detail in a video. As for realtime high-res images: no. DCT conversion is multiple orders of magnitude slower. Feel free to dive deeper into frequency space…

JPEG DCT, Discrete Cosine Transform (JPEG Pt2)
YouTube Video by Computerphile

…as for this article, it’s the end of our frequency space detour. We talked so much about what’s slow on the GPU. Let’s talk about something that’s not just fast, but free:

Bilinear Interpolation #

Reading from textures comes with a freebie. When reading between pixels, the closest four pixels are interpolated bilinearly to create the final read, unless you switch to Nearest Neighbor mode. Below you can drag the color sample with a finger touch or the mouse. Take note of how and when the color changes in the respective modes.
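As a sketch, this is the mix the texture unit performs for free on every LINEAR read - function names are mine, purely illustrative:

```javascript
/* Linear interpolation between two values */
function lerp(a, b, t) { return a + (b - a) * t; }

/* Bilinear mix of the four closest texels:
   c00..c11 are the neighboring texel values,
   fx / fy the fractional sample position between them (0..1) */
function bilinear(c00, c10, c01, c11, fx, fy) {
	const top = lerp(c00, c10, fx);     // blend the upper pair horizontally
	const bottom = lerp(c01, c11, fx);  // blend the lower pair horizontally
	return lerp(top, bottom, fy);       // blend the two results vertically
}
```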

Since reading between pixels gets us a linear mix of neighboring pixels, we can linearly interpolate part of our gaussian kernel, sometimes called a Linear Gaussian. By tweaking gaussian weights and reducing the amount of samples, we can capture a 7 × 7 gaussian kernel's worth of information with only a 4 × 4 kernel, as shown in the linked article.

Though mathematically not the same, visually the result is very close. There are a lot of hand-crafted variations on this, different mixes of kernel sizes and interpolation amounts.
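The heart of the trick, sketched out: two adjacent taps collapse into one bilinear tap. `mergeTaps` is a hypothetical helper of mine, not the linked article's exact code:

```javascript
/* Two neighboring taps with weights w1, w2 at integer offsets o1, o2
   collapse into a single bilinear tap */
function mergeTaps(o1, w1, o2, w2) {
	return {
		/* Combined weight: both contributions in one texture read */
		weight: w1 + w2,
		/* Weighted position between the two texels - the hardware's
		   bilinear mix then reproduces the ratio w1 : w2 */
		offset: (o1 * w1 + o2 * w2) / (w1 + w2),
	};
}
```

Sampling at the merged offset with the merged weight yields the same contribution as the two original taps, at half the memory cost.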

Bilinear interpolation allows us to resize an image by reading from it at lower resolution. In a way, it’s a free bilinear resize built into every graphics chip, zero performance impact. But there is a limit - the bilinear interpolation is limited to a 2 × 2 sample square. Try to resize the kiwi below in different modes.

To make this more obvious, the following canvas renders at 25% of native resolution.
kiwiSize
kiwiSize100 %
WebGL Vertex Shader circleAnimationSize.vs
/* Our Vertex data for the Quad */
attribute vec2 vtx;
varying vec2 uv;

/* Position offset for the animation */
uniform vec2 offset;
uniform float kiwiSize;

void main()
{
	/* Make the texture Coordinates read in the fragment shader coordinates */
	uv = vtx * vec2(0.5, -0.5) + 0.5;
	
	/* Animate Quad in a circle */
	gl_Position = vec4(vtx * kiwiSize + offset, 0.0, 1.0);
}
WebGL Fragment Shader simpleTexture.fs
precision highp float;
varying vec2 uv;

uniform sampler2D texture;

void main() {
	gl_FragColor = texture2D(texture, uv);
}
WebGL Javascript bilinear.js
import * as util from './utility.js'

export async function setupBilinear() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-Bilinear');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;
	
	/* Resolution divider for framebuffer rendering */
	const resDiv = 4; // Hardcoded quarter resolution
	let renderFramebuffer, renderTexture;
	let buffersInitialized = false;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: true,
	});

	/* State and Objects */
	const ctx = {
		mode: "nearest",
		flags: { isRendering: false, initComplete: false },
		/* Textures */
		tex: { sdr: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			kiwi: { handle: null, uniforms: { offset: null, kiwiSize: null } },
			blit: { handle: null, uniforms: { texture: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
		},
		rendering: {
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			animate: WebGLBox.querySelector('#animateCheck'),
			kiwiSize: WebGLBox.querySelector('#kiwiSize'),
		}
	};

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to nearest to fix a reload bug in Firefox Android */
		if (radio.value === "nearest")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});


	/* Shaders */
	const circleAnimationSize = await util.fetchShader("shader/circleAnimationSize.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.rendering.kiwiSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	/* Draw Texture Shader */
	ctx.shd.kiwi = util.compileAndLinkShader(gl, circleAnimationSize, simpleTexture, ["offset", "kiwiSize"]);
	
	/* Blit Shader for upscaling */
	ctx.shd.blit = util.compileAndLinkShader(gl, simpleQuad, simpleTexture, ["texture"]);
	
	/* Set initial shader state */
	gl.useProgram(ctx.shd.kiwi.handle);

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	/* This genius workaround is based on @Kaiido's: https://stackoverflow.com/a/69385604/6240779 */
	function loadSVGAsImage(blob) {
		return new Promise((resolve) => {
			const img = new Image();
			const url = URL.createObjectURL(blob);
			
			img.onload = () => {
				URL.revokeObjectURL(url);
				resolve(img);
			};
			
			img.src = url;
		});
	}

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		buffersInitialized = true;
		ctx.flags.initComplete = false;

		/* Create framebuffer for quarter resolution rendering */
		gl.deleteFramebuffer(renderFramebuffer);
		renderFramebuffer = gl.createFramebuffer();
		gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);

		/* Create RGBA framebuffer texture manually to preserve alpha */
		gl.deleteTexture(renderTexture);
		renderTexture = gl.createTexture();
		gl.bindTexture(gl.TEXTURE_2D, renderTexture);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
		gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, canvas.width / resDiv, canvas.height / resDiv, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
		gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, renderTexture, 0);
		buffersInitialized = true;

		/* Load kiwi texture */
		let base = await fetch("img/kiwi4by3.svg");
		let baseBlob = await base.blob();
		let baseImage = await loadSVGAsImage(baseBlob);
		let baseBitmap = await createImageBitmap(baseImage, { resizeWidth: canvas.width / resDiv, resizeHeight: canvas.height / resDiv, colorSpaceConversion: 'none', resizeQuality: "high" });

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.NEAREST, baseBitmap, 4);

		baseBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	async function redraw() {
		if (!buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* Pass 1: Render to framebuffer at reduced resolution */
		gl.viewport(0, 0, canvas.width / resDiv, canvas.height / resDiv);
		if (!renderFramebuffer) return;
		gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);
		gl.clear(gl.COLOR_BUFFER_BIT);
		
		/* Use kiwi shader */
		gl.useProgram(ctx.shd.kiwi.handle);
		
		/* Bind kiwi texture and set filtering mode */
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		
		gl.uniform2fv(ctx.shd.kiwi.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.kiwi.uniforms.kiwiSize, ui.rendering.kiwiSize.value);

		/* Draw kiwi to framebuffer */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
		
		/* Use blit shader */
		gl.useProgram(ctx.shd.blit.handle);
		
		/* Bind framebuffer texture with nearest neighbor for pixelated upscaling */
		if (!renderTexture) return;
		gl.bindTexture(gl.TEXTURE_2D, renderTexture);
		
		/* Draw full-screen quad to upscale */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;
			gl.viewport(0, 0, canvas.width, canvas.height);

			stopRendering();
			startRendering();
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(renderTexture); renderTexture = null;
		gl.deleteFramebuffer(renderFramebuffer); renderFramebuffer = null;
		buffersInitialized = false;
		ctx.flags.initComplete = false;
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Nearest Neighbor looks pixelated whenever the size is not at 100%, the setting equivalent to 1:1 pixel mapping. Even at 100% it moves “jittery”, as it “snaps” to the nearest neighbor. Bilinear keeps things smooth, but going below 50%, and especially below 25%, we get exactly the same kind of aliasing as we would get from nearest neighbor!

You may have noticed similar aliasing when playing YouTube Videos at a very high manually selected video resolution, but in a small window. Same thing!

With 2 × 2 samples, we start skipping over color information once the source pixels become smaller than half an output pixel. Below 50% size, our bilinear interpolation starts to act like nearest neighbor interpolation. As a result, we can shrink an image in steps of 50% without “skipping over information” and creating aliasing. Let’s use that!
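The 50% threshold follows directly from the 2 × 2 sample square. A minimal sketch, with an illustrative function name not taken from the demo code: each output pixel reads at most 2 source texels per axis, so any per-axis scale below 0.5 leaves source texels that no bilinear tap ever reads.

```javascript
/* Sketch: how many source texels per axis does a bilinear downscale skip?
   Each of the dstSize output pixels reaches at most 2 source texels,
   so anything beyond dstSize * 2 source texels goes unread. */
function texelsSkipped(srcSize, dstSize) {
	return Math.max(0, srcSize - dstSize * 2);
}
```

At exactly 50% every source texel still contributes; at 25% half of them per axis are skipped, which is where the aliasing comes from.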

Downsampling #

One fundamental thing you can do in post-processing is to first shrink (“downsample”) the image, perform the processing at a lower resolution and upsample again, the idea being that you won’t notice the lowered resolution. Below is the Separable Gaussian Blur again, with a variable downsample / upsample chain.

Each increase of downSample adds a 50% scale step. Let’s visualize the framebuffers in play, as it gets quite complex. Here is an example of a square 1024 px² image, a downSample of 2 and our two pass separable Gaussian blur.

Downsample and Blur Framebuffers
Framebuffers and their sizes, as used during the downsample + blur chain
One unused optimization: the blur could read straight from the 512 px² framebuffer and output the 256 px² version directly, skipping one downsample step.
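The chain of framebuffer sizes is easy to compute. A minimal sketch mirroring the `Math.max(1, w >> 1)` halving used in the demo code, with an illustrative function name:

```javascript
/* Sketch: framebuffer sizes in a downsample chain. Each step halves
   the size, clamped so a framebuffer never shrinks below 1 px. */
function downsampleChain(size, levels) {
	const sizes = [size];
	for (let i = 0; i < levels; ++i)
		sizes.push(Math.max(1, sizes[sizes.length - 1] >> 1));
	return sizes;
}
```

For a 1024 px² image and a downSample of 2 this yields 1024 → 512 → 256, matching the figure above.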

Below you have the option to skip part of the downsample or part of the upsample chain, if you have downSample set to higher than 1. What may not be quite obvious is why we also upsample in steps. Play around with all the dials and modes, to get a feel for what’s happening.

Blur Fragment Shader gaussianBlurSeparable.fs
/* Float precision to highp, if supported. Large kernel sizes result in many color
   contributions and thus require the highest precision to avoid clipping.
   Required in WebGL 1 shaders; depending on platform it may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float samplePosMult; /* Multiply to push blur strength past the kernel size */
uniform float sigma;
uniform vec2 direction; /* Direction vector: (1,0) for horizontal, (0,1) for vertical */

uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;
/* `KERNEL_SIZE` added during compilation */
const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float sigma)
{
	/* exp(-x² / (2σ²)) */
	return exp(-(x * x) / (2.0 * sigma * sigma));
}

void main() {
	/* Variable to hold our final color for the current pixel */
	vec4 sum = vec4(0.0);
	/* Sum of all weights */
	float weightSum = 0.0;
	
	/* Total length of the sampled line */
	const int size = 2 * kernel_size + 1;

	/* Sample along the direction vector (horizontal or vertical) */
	for (int i = -kernel_size; i <= kernel_size; ++i) {
		/* Calculate the required weight for this 1D sample */
		float w = gaussianWeight(float(i), sigma);
		
		/* Offset from the current pixel along the specified direction */
		vec2 offset = vec2(i) * direction * samplePosMult * frameSizeRCP;

		/* Read and sum up the contribution of that pixel, weighted */
		sum += texture2D(texture, uv + offset) * w;
		weightSum += w;
	}

	/* Return the sum, divided by the total weight (normalization) */
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript downsample.js
import * as util from '../utility.js'

/* Quick note about the implementation. Here I use one framebuffer for each size step.
   In actuality, this is *not* how it should be done. Graphics pipelines have "Mip Maps"
   for textures, so we could use one texture and render each downsample step into the
   corresponding mip level. Then we'd also get a nice blending slider between the sizes,
   provided by the hardware as well!
   
   That is unfortunately not possible in WebGL 1.0 ( ; __ ;   )
   I went with WebGL 1.0 for maximum compatibility across devices. */

export async function setupGaussianDownsampleBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-GaussianDownsampleBlur');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		skipMode: "normal",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null, down: [], intermediate: [], nativeIntermediate: null },
		/* Framebuffers */
		fb: { scene: null, final: null, down: [], intermediate: [], nativeIntermediate: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			passthrough: { handle: null },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
			downSample: WebGLBox.querySelector('#downSampleRange'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			skipModes: WebGLBox.querySelectorAll('input[name="skipMode"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#renderer'),
			skipMode: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#skipMode'),
			iterTime: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("shader/gaussianBlurSeparable.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.downSample.addEventListener('input', () => { 
		updateSkipModeControls();
		if (!ui.rendering.animate.checked) redraw() 
	});

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Skip skipMode radio buttons */
		if (radio.name === "skipMode") return;
		
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	/* Skip Mode */
	ui.rendering.skipModes.forEach(radio => {
		/* Force set to normal to fix a reload bug in Firefox Android */
		if (radio.value === "normal")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.skipMode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	/* Helper function to update skip mode controls */
	function updateSkipModeControls() {
		const hasIntermediarySteps = ui.blur.downSample.value > 1;
		ui.rendering.skipModes.forEach(radio => {
			radio.disabled = !hasIntermediarySteps;
		});
		/* Reset to normal if disabled */
		if (!hasIntermediarySteps && ctx.skipMode !== "normal") {
			ctx.skipMode = "normal";
		}
		/* Always sync UI radio buttons with current ctx.skipMode */
		ui.rendering.skipModes.forEach(radio => {
			radio.checked = (radio.value === ctx.skipMode);
		});
	}

	/* Initialize skip mode controls */
	updateSkipModeControls();

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/downsampleBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value,
			downSample: ui.blur.downSample.value,
			skipMode: ctx.skipMode
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.skipMode.textContent = event.data.skipMode;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	/* Simple Passthrough */
	ctx.shd.passthrough = util.compileAndLinkShader(gl, simpleQuad, simpleTexture);

	/* Helper for recompilation */
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma", "direction"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	/* Blur Shader */
	reCompileBlurShader(ui.blur.kernelSize.value)

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		
		/* Add native resolution intermediate buffer for separable blur */
		gl.deleteFramebuffer(ctx.fb.nativeIntermediate);
		gl.deleteTexture(ctx.tex.nativeIntermediate);
		[ctx.fb.nativeIntermediate, ctx.tex.nativeIntermediate] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		const maxDown = ui.blur.downSample.max;
		for (let i = 0; i < ui.blur.downSample.max; ++i) {
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.intermediate[i]);
			gl.deleteTexture(ctx.tex.intermediate[i]);
		}
		ctx.fb.down = [];
		ctx.tex.down = [];
		ctx.fb.intermediate = [];
		ctx.tex.intermediate = [];

		let w = canvas.width, h = canvas.height;
		for (let i = 0; i < maxDown; ++i) {
			w = Math.max(1, w >> 1);
			h = Math.max(1, h >> 1);
			const [fb, tex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.down.push(fb);
			ctx.tex.down.push(tex);
			const [intermediateFb, intermediateTex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.intermediate.push(intermediateFb);
			ctx.tex.intermediate.push(intermediateTex);
		}

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	/* Perform separable blur: horizontal pass followed by vertical pass */
	function performSeparableBlur(srcTexture, targetFB, width, height, intermediateFB, intermediateTex, bloomStrength) {
		gl.useProgram(ctx.shd.blur.handle);
		
		/* Set common uniforms */
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / width, 1.0 / height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, bloomStrength);
		
		/* Horizontal pass */
		gl.bindFramebuffer(gl.FRAMEBUFFER, intermediateFB);
		gl.viewport(0, 0, width, height);
		gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); // Horizontal
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, srcTexture);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		
		/* Vertical pass */
		gl.bindFramebuffer(gl.FRAMEBUFFER, targetFB);
		gl.viewport(0, 0, width, height);
		gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); // Vertical
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, intermediateTex);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const effectiveRes = [Math.max(1, canvas.width >> +ui.blur.downSample.value), Math.max(1, canvas.height >> +ui.blur.downSample.value)];
		const tapsNewText = (effectiveRes[0] * effectiveRes[1] * KernelSizeSide * 2 / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* down-sample chain */
		const levels = ui.blur.downSample.value;
		let srcTex = ctx.tex.frame;
		let w = canvas.width, h = canvas.height;

		if (levels > 0) {
			if (ctx.skipMode === "skipDown") {
				/* Skip downsample steps: jump directly to target level and blur */
				const lastDownsampleFB = ctx.fb.down[levels - 1];
				const lastIntermediateFB = ctx.fb.intermediate[levels - 1];
				const lastIntermediateTex = ctx.tex.intermediate[levels - 1];
				/* Calculate target resolution directly */
				w = Math.max(1, canvas.width >> levels);
				h = Math.max(1, canvas.height >> levels);
				const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				
				performSeparableBlur(srcTex, lastDownsampleFB, w, h, lastIntermediateFB, lastIntermediateTex, bloomStrength);
				srcTex = ctx.tex.down[levels - 1];
			} else {
				/* Normal mode: Downsample up to the second to last level */
				gl.useProgram(ctx.shd.passthrough.handle);
				for (let i = 0; i < levels - 1; ++i) {
					const fb = ctx.fb.down[i];
					w = Math.max(1, w >> 1);
					h = Math.max(1, h >> 1);

					gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
					gl.viewport(0, 0, w, h);

					gl.activeTexture(gl.TEXTURE0);
					gl.bindTexture(gl.TEXTURE_2D, srcTex);
					gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
					gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
					srcTex = ctx.tex.down[i];
				}

				/* Blur into the last downsample buffer */
				const lastDownsampleFB = ctx.fb.down[levels - 1];
				const lastIntermediateFB = ctx.fb.intermediate[levels - 1];
				const lastIntermediateTex = ctx.tex.intermediate[levels - 1];
				w = Math.max(1, w >> 1);
				h = Math.max(1, h >> 1);
				const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				
				performSeparableBlur(srcTex, lastDownsampleFB, w, h, lastIntermediateFB, lastIntermediateTex, bloomStrength);
				srcTex = ctx.tex.down[levels - 1];
			}
		} else {
			/* Run Gaussian blur at native resolution when no downsample */
			const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			
			performSeparableBlur(srcTex, ctx.fb.final, canvas.width, canvas.height, ctx.fb.nativeIntermediate, ctx.tex.nativeIntermediate, bloomStrength);
			srcTex = ctx.tex.frameFinal;
		}

		/* Upsample chain */
		if (levels > 0) {
			if (ctx.skipMode === "skipUp") {
				/* Skip upsample steps: srcTex stays at the lowest resolution */
				/* Final pass will handle upscaling to full resolution */
			} else {
				/* Normal mode: Upsample through the mip levels */
				gl.useProgram(ctx.shd.passthrough.handle);
				for (let i = levels - 2; i >= 0; i--) {
					const fb = ctx.fb.down[i];
					let upsampleW = Math.max(1, canvas.width >> (i + 1));
					let upsampleH = Math.max(1, canvas.height >> (i + 1));
					gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
					gl.viewport(0, 0, upsampleW, upsampleH);
					gl.activeTexture(gl.TEXTURE0);
					gl.bindTexture(gl.TEXTURE_2D, srcTex);
					gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
					gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
					srcTex = ctx.tex.down[i];
				}
			}
		}

		/* Final pass to present to screen (with upscaling if needed) */
		/* Skip final pass in bloom mode with no downsampling to avoid feedback loop */
		if (!(ctx.mode == "bloom" && levels == 0)) {
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.useProgram(ctx.shd.passthrough.handle);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

	/* Ask for a CPU-GPU sync to prevent overloading the GPU during compositing.
	   In reality this is more likely to be a flush, but it still seems to
	   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		gl.deleteFramebuffer(ctx.fb.nativeIntermediate); ctx.fb.nativeIntermediate = null;
		gl.deleteTexture(ctx.tex.nativeIntermediate); ctx.tex.nativeIntermediate = null;
		for (let i = 0; i < ui.blur.downSample.max; ++i) {
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.intermediate[i]);
			gl.deleteFramebuffer(ctx.fb.intermediate[i]);
		}
		ctx.tex.down = [];
		ctx.fb.down = [];
		ctx.tex.intermediate = [];
		ctx.fb.intermediate = [];
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

With each downsample step, our kernel covers more and more area, thus increasing the blur radius. Performance takes another massive leap, as each step quadratically reduces the number of pixels to blur. We get bigger blurs with the same kernelSize, and with stronger blurs in Scene mode, the resolution drop is not visible.
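To put numbers on that quadratic saving: summing the pixels of a downsample chain forms a geometric series that converges to just a third of the full-resolution pixel count. A quick sketch (the helper name is mine, not from the demo code):

```javascript
// Pixel budget of a downsample chain: each step holds a quarter of the
// previous step's pixels, so the whole chain stays under 1/3 of the
// full-resolution pixel count, no matter how many steps we add.
function chainPixels(width, height, steps) {
	let total = 0;
	for (let i = 1; i <= steps; i++) {
		/* Each downsample step halves width and height */
		total += (width >> i) * (height >> i);
	}
	return total;
}

const full = 1920 * 1080;
console.log(chainPixels(1920, 1080, 4) / full); /* ≈ 0.332, just under 1/3 */
```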

With smaller blurs you will get “shimmering” as aliasing artifacts set in, even with our bilinear filtering in place. Small blurs and low resolutions don’t mix. This is especially painful in bloom mode with strong lightBrightness, as lights will start to “turn on and off” when they are not resolved correctly at lower resolutions.

There must be some kind of sweet spot: a resolution low enough to be cheap, with a blur strong enough to hide that low resolution.

Skipping downsample steps obviously brings horrible aliasing. As for upsampling, there is a deep misunderstanding I held for years, until I read the SIGGRAPH 2014 presentation Next Generation Post Processing in Call of Duty: Advanced Warfare by graphics magician Jorge Jimenez. One page stuck out to me:

Page 159 from presentation Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez
Page 159 from presentation Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

With upsampling, even when going from low res to high res in one jump, we aren’t “skipping” any information, right? Nothing is missed. But if you look closely at the above demo with larger downSample chains in Skip Upsample Steps mode, you will see a vague grid-like artifact appearing, especially with strong blurs.

Nearest Neighbor Interpolation Bilinear Interpolation
Visualization of the bilinear interpolation (Source)

How to keep things smooth when upsampling is the field of “reconstruction filters”. By skipping intermediate upsampling steps we are performing a 2 × 2 sample bilinear reconstruction of very small pixels. As a result we get the pyramid-shaped hot spots characteristic of bilinear filtering. How we upscale matters.
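To make the hardware's role concrete, here is a minimal JavaScript sketch of bilinear sampling on a 2D grid of grayscale values (the grid layout and edge clamping are my assumptions, not taken from the article's shaders):

```javascript
// Bilinear interpolation: sampling at a fractional position blends the
// 4 surrounding texels, weighted by distance. Moving the sample point
// traces out exactly the tent/pyramid shape discussed above.
function sampleBilinear(grid, x, y) {
	const x0 = Math.floor(x), y0 = Math.floor(y);
	const tx = x - x0, ty = y - y0; /* fractional parts */
	/* Clamp neighbor indices at the grid edge */
	const x1 = Math.min(x0 + 1, grid[0].length - 1);
	const y1 = Math.min(y0 + 1, grid.length - 1);
	/* Lerp horizontally on both rows, then vertically between them */
	const top    = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx;
	const bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx;
	return top * (1 - ty) + bottom * ty;
}

const grid = [
	[0, 0],
	[0, 1],
];
/* Exactly between all 4 texels, each contributes a quarter */
console.log(sampleBilinear(grid, 0.5, 0.5)); /* 0.25 */
```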

Smooth Blur Animation #

One fundamental challenge with advanced blur algorithms is getting smooth blur sliders and smooth blur strength animations. E.g. with our separable Gaussian blur, you could set kernelSize to the maximum required and adjust samplePosMultiplier smoothly between 0% and 100%.

With downsampling in the picture, this becomes more difficult and solutions are very context-dependent, so we won’t dive into it. One approach you see from time to time is to simply give up on animating blur strength and blend between a blurred and an unblurred version of the scene, as shown below. Visually, not very pleasing.

[Interactive demo - blurMix slider (80%)]
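That fallback boils down to a per-pixel linear interpolation - in a shader it would be a single mix() call. A JavaScript sketch with made-up color values:

```javascript
// Blend fallback: crossfade a sharp and a blurred color per channel,
// equivalent to GLSL's mix(sharp, blurred, blurMix).
function blendBlur(sharp, blurred, blurMix) {
	return sharp.map((c, i) => c * (1 - blurMix) + blurred[i] * blurMix);
}

/* At blurMix = 0.8, the result sits 80% of the way toward the blurred color */
console.log(blendBlur([1.0, 0.0, 0.0], [0.5, 0.5, 0.5], 0.8));
```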

Kawase Blur #

Now we step away from the classical blur approaches. It’s the early 2000s and graphics programmer Masaki Kawase, today a senior graphics programmer at the Tokyo-based company Silicon Studio, is programming the Xbox video game DOUBLE-S.T.E.A.L, a game with vibrant post-processing effects.

During the creation of those visual effects, Masaki Kawase used a new blurring technique that he presented in the 2003 Game Developers Conference talk Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless). This technique later became known as the “Kawase Blur”. Let’s take a look at it:

kawase blur filter pattern kawase blur filter pattern
Sample placement in what later became known as the "Kawase Blur"
Excerpt from GDC presentation Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (2003)

This technique does not have a kernelSize parameter anymore. It works in passes of 4 equally weighted samples, placed diagonally from the center output pixel, right where the corners of 4 pixels touch. Due to bilinear filtering, each sample receives equal color contributions from the 4 pixels it touches.

This is new: there is no center pixel sample and, except for the required normalization, no explicit weights! The weighting happens as a result of bilinear filtering.
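We can make those implicit weights explicit by expanding the 4 bilinear corner taps of the first pass into per-texel contributions. A small sketch (the 3×3 layout and helper name are mine, not from Kawase's slides):

```javascript
// Expand one Kawase pass (first iteration: taps half a pixel from the
// center) into explicit per-texel weights. Each tap sits where 4 texels
// meet, so bilinear filtering averages those 4 texels at 0.25 each;
// the pass then averages the 4 taps.
function kawaseKernel() {
	/* 3x3 weight grid centered on the output pixel */
	const w = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
	for (const [dx, dy] of [[-1, -1], [1, -1], [-1, 1], [1, 1]]) {
		/* The tap toward corner (dx, dy) touches 4 texels */
		for (const [tx, ty] of [[0, 0], [dx, 0], [0, dy], [dx, dy]]) {
			w[ty + 1][tx + 1] += 0.25 / 4; /* bilinear 0.25, tap weight 1/4 */
		}
	}
	return w;
}

console.log(kawaseKernel());
/* Center 0.25, edges 0.125, corners 0.0625 - the [1 2 1; 2 4 2; 1 2 1]/16 tent kernel */
```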

After a pass is complete, that pass is used as the input to the next pass, where the outer 4 diagonal samples increase in distance by one pixel length. With each pass, this distance grows. Two framebuffers are required for this, which switch between being input and output between passes. This setup is often called “ping-ponging”.

[Interactive WebGL demo - controls: iterations (3), samplePosMultiplier (100%), lightBrightness (100%); readouts: FPS, ms, resolution, texture taps. If the browser killed the WebGL context, reload the page; if this happened after a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser app completely, as the browser will temporarily refuse to let this site run WebGL again.]
WebGL Fragment Shader kawase.fs
/* Float precision to highp, if supported. Large iteration counts result in
   many color contributions and thus require the highest precision to avoid
   clipping. Required in WebGL 1 shaders; depending on platform this may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float samplePosMult; /* Multiply to push blur strength past the pixel offset */
uniform float pixelOffset; /* Pixel offset for this Kawase iteration */
uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;

void main() {
	/* Kawase blur samples 4 corners in a diamond pattern */
	vec2 o = vec2(pixelOffset + 0.5) * samplePosMult * frameSizeRCP;
	
	/* Sample the 4 diagonal corners with equal weight */
	vec4 color = vec4(0.0);
	color += texture2D(texture, uv + vec2( o.x,  o.y)); /* top-right */
	color += texture2D(texture, uv + vec2(-o.x,  o.y)); /* top-left   */
	color += texture2D(texture, uv + vec2(-o.x, -o.y)); /* bottom-left */
	color += texture2D(texture, uv + vec2( o.x, -o.y)); /* bottom-right */
	color /= 4.0;
	
	/* Apply bloom strength and output */
	gl_FragColor = color * bloomStrength;
}
WebGL Javascript kawase.js
import * as util from '../utility.js'

export async function setupKawaseBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-KawaseBlur');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameIntermediate1: null, frameIntermediate2: null, frameFinal: null },
		/* Framebuffers */
		fb: { scene: null, intermediate1: null, intermediate2: null, final: null },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			kawase: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, bloomStrength: null, pixelOffset: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			iterations: WebGLBox.querySelector('#iterationsRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[name="modeKawase"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#renderer'),
			kawaseIterations: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#kawaseIterations'),
			iterTime: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const kawaseFrag = await util.fetchShader("shader/kawase.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.iterations.addEventListener('input', () => { 
		// Lock/unlock samplePos based on iterations
		const iterations = parseInt(ui.blur.iterations.value);
		ui.blur.samplePos.disabled = iterations === 0;
		ui.blur.samplePosReset.disabled = iterations === 0;
		if (!ui.rendering.animate.checked) redraw() 
	});
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/kawaseBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			kawaseShaderSrc: kawaseFrag,
			kawaseIterations: ui.blur.iterations.value,
			samplePos: ui.blur.samplePos.value
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.kawaseIterations.textContent = event.data.kawaseIterations;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	/* Kawase Blur Shader */
	ctx.shd.kawase = util.compileAndLinkShader(gl, simpleQuad, kawaseFrag, ["frameSizeRCP", "samplePosMult", "pixelOffset", "bloomStrength"]);

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.intermediate1);
		gl.deleteFramebuffer(ctx.fb.intermediate2);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate1, ctx.tex.frameIntermediate1] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate2, ctx.tex.frameIntermediate2] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		// Clear intermediate textures to prevent lazy initialization warnings
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate1);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);
		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate2);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const iterations = parseInt(ui.blur.iterations.value);
		/* Kawase blur: 4 samples per iteration, 0 iterations = no blur (1 sample) */
		const samplesPerPixel = iterations === 0 ? 1 : iterations * 4;
		const tapsNewText = (canvas.width * canvas.height * samplesPerPixel / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		/* Handle 0 iterations case - direct copy to output */
		if (iterations === 0) {
			/* Direct copy from scene to final destination using Kawase shader with no offset */
			const finalFB = ctx.mode === "bloom" ? ctx.fb.final : null; // null = screen
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			
			/* Use Kawase shader with pixelOffset=0 and samplePosMult=0 for simple copy */
			gl.useProgram(ctx.shd.kawase.handle);
			gl.uniform2f(ctx.shd.kawase.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.kawase.uniforms.samplePosMult, 0.0); // No offset
			gl.uniform1f(ctx.shd.kawase.uniforms.pixelOffset, 0.0); // No offset
			gl.uniform1f(ctx.shd.kawase.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else {
			/* Kawase Blur implementation - iterative ping-pong between framebuffers */
			gl.useProgram(ctx.shd.kawase.handle);
			gl.uniform2f(ctx.shd.kawase.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.kawase.uniforms.samplePosMult, ui.blur.samplePos.value);
			
			/* Brightness is distributed across all iterations, see the loop below */

			let currentInputTex = ctx.tex.frame;
			let currentInputFB = ctx.fb.scene;
			
			for (let i = 0; i < iterations; i++) {
				/* Determine output framebuffer */
				let outputFB;
				if (i === iterations - 1) {
					/* Last iteration - output to final destination */
					outputFB = ctx.mode === "bloom" ? ctx.fb.final : null; /* null = screen */
				} else {
					/* Intermediate iterations - ping-pong between buffers */
					outputFB = i % 2 === 0 ? ctx.fb.intermediate1 : ctx.fb.intermediate2;
				}

				/* Setup output framebuffer */
				gl.bindFramebuffer(gl.FRAMEBUFFER, outputFB);
				gl.viewport(0, 0, canvas.width, canvas.height);

				/* Bind input texture */
				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, currentInputTex);

				/* Set pixel offset for this iteration */
				gl.uniform1f(ctx.shd.kawase.uniforms.pixelOffset, i);

				/* Apply distributed brightness, due to color precision limitations
				   and the multi-pass nature of this blur algorithm */
				const finalBrightness = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				const distributedBrightness = Math.pow(finalBrightness, 1.0 / iterations);
				gl.uniform1f(ctx.shd.kawase.uniforms.bloomStrength, distributedBrightness);

				/* Draw */
				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

				/* Setup for next iteration */
				if (i < iterations - 1)
					currentInputTex = i % 2 === 0 ? ctx.tex.frameIntermediate1 : ctx.tex.frameIntermediate2;
			}
		}

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		/* Ask for CPU-GPU Sync to prevent overloading the GPU during compositing.
		   In reality this is more likely to act as a flush, but it still seems to
		   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if ((width && canvas.width !== width) || (height && canvas.height !== height)) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameIntermediate1); ctx.tex.frameIntermediate1 = null;
		gl.deleteTexture(ctx.tex.frameIntermediate2); ctx.tex.frameIntermediate2 = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.intermediate1); ctx.fb.intermediate1 = null;
		gl.deleteFramebuffer(ctx.fb.intermediate2); ctx.fb.intermediate2 = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Initialize UI state */
	const initialIterations = parseInt(ui.blur.iterations.value);
	ui.blur.samplePos.disabled = initialIterations === 0;
	ui.blur.samplePosReset.disabled = initialIterations === 0;

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Akin to the Central Limit Theorem making repeated box-blur passes approach a Gaussian blur, our Kawase blur produces smooth, Gaussian-like results, due to the iterative convolution at play. Technically, there are two convolutions happening at the same time - bilinear filtering and the diagonal samples with increasing distance.
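A quick numeric sketch of that limit: repeatedly convolving a 1D box kernel with itself quickly produces a symmetric, bell-shaped kernel (helper names are mine):

```javascript
// Discrete 1D convolution of two kernels
function convolve(a, b) {
	const out = new Array(a.length + b.length - 1).fill(0);
	for (let i = 0; i < a.length; i++)
		for (let j = 0; j < b.length; j++)
			out[i + j] += a[i] * b[j];
	return out;
}

/* Start with a 3-tap box blur, then chain 3 more box passes */
const box = [1 / 3, 1 / 3, 1 / 3];
let kernel = box;
for (let pass = 0; pass < 3; pass++)
	kernel = convolve(kernel, box);

console.log(kernel); /* 9 taps, symmetric, peaked in the middle */
```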

Two different origins: The Gaussian blur came from a mathematical concept entering graphics programming. The Kawase blur was born to get the most out of what hardware provides for free.

It is not a separable convolution, due to its diagonal sampling pattern. Since no downsampling is used, we write all pixels out to memory each pass. Even if we could separate it, the cost of writing out twice as many passes to memory would outweigh the benefit of going from 4 samples per pass to 2.

With so few samples, you cannot increase samplePosMultiplier without instantly getting artifacts, as doing so breaks the sample pattern.

Take note of the texture taps: they grow linearly with increasing blur radius. In DOUBLE-S.T.E.A.L, Masaki Kawase used it to create the bloom effect, calculated at a lower resolution. But there is one more evolution coming up - we have blur, we have downsampling. Two separate concepts. What if we “fused” them?
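That linear growth is the whole point. A sketch comparing per-pixel texture taps as blur strength grows (equating a Gaussian radius with a Kawase pass count is a rough approximation on my part, not an exact equivalence):

```javascript
// Per-pixel texture taps for growing blur strength: a naive 2D Gaussian
// grows quadratically, a separable Gaussian linearly with a larger
// factor, the Kawase blur linearly with just 4 taps per pass.
function tapsNaiveGaussian(radius)     { return (2 * radius + 1) ** 2; }
function tapsSeparableGaussian(radius) { return 2 * (2 * radius + 1); }
function tapsKawase(passes)            { return 4 * passes; }

for (const r of [2, 4, 8, 16])
	console.log(r, tapsNaiveGaussian(r), tapsSeparableGaussian(r), tapsKawase(r));
```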

Dual Kawase Blur #

Marius Bjørge, principal graphics architect at ARM, took this thought to its logical conclusion while optimizing mobile rendering on ARM graphics chips. In a SIGGRAPH 2015 talk he presented an algorithm that would later become known as the ✨ Dual Kawase Blur 🌟, this article’s final destination.

Dual Kawase sampling patterns
Dual Kawase sampling patterns
Excerpt from Bandwidth-Efficient Rendering, talk by Marius Bjørge

This blur builds on Masaki Kawase’s idea of “placing diagonal samples at increasing distance”, but does so in conjunction with downsampling, which effectively performs this “increase in distance”. There is also a dedicated upsample filter. I’ll let Marius Bjørge explain this one, in an excerpt from the talk mentioned above:

Marius Bjørge: For lack of a better name, dual filter is something I came up with when playing with different downsampling and upsampling patterns. It's sort of a derivative of the Kawase filter, but instead of ping-ponging between two equally sized textures, this filter works by having one filter for downsampling and another filter for upsampling.

The downsample filter works by sampling four pixels covering the target pixel, plus four pixels on the corners to smudge in some information from all the neighboring pixels. The upsample filter then works by reconstructing information from the downsample pass. This pattern was chosen to get a nice smooth circular shape.

Let’s try it. This time there are two blur shaders, as there is a downsample and an upsample stage. Again, there is no kernelSize. Instead there is downsampleLevels, which performs the blur in conjunction with the downsampling. Play around with all the sliders and get a feel for it.

[Interactive WebGL demo - controls: downsampleLevels (2), samplePosMultiplier (100%), lightBrightness (100%); readouts: FPS, ms, resolution, texture taps. If the browser killed the WebGL context, reload the page; if this happened after a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser app completely, as the browser will temporarily refuse to let this site run WebGL again.]
WebGL Fragment Shader dual-kawase-down.fs
/* Float precision to highp, if supported. Large iteration counts result in
   many color contributions and thus require the highest precision to avoid
   clipping. Required in WebGL 1 shaders; depending on platform this may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float offset; /* Offset multiplier for blur strength */
uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;

void main() {
	/* Dual Kawase downsample: sample center + 4 diagonal corners */
	vec2 halfpixel = frameSizeRCP * 0.5;
	vec2 o = halfpixel * offset;
	
	/* Sample center with 4x weight */
	vec4 color = texture2D(texture, uv) * 4.0;
	
	/* Sample 4 diagonal corners with 1x weight each */
	color += texture2D(texture, uv + vec2(-o.x, -o.y)); /* bottom-left */
	color += texture2D(texture, uv + vec2( o.x, -o.y)); /* bottom-right   */
	color += texture2D(texture, uv + vec2(-o.x,  o.y)); /* top-left */
	color += texture2D(texture, uv + vec2( o.x,  o.y)); /* top-right */
	
	/* Apply bloom strength and normalize by total weight (8) */
	gl_FragColor = (color / 8.0) * bloomStrength;
}
WebGL Fragment Shader dual-kawase-up.fs
/* Float precision to highp, if supported. Large iteration counts result in
   many color contributions and thus require the highest precision to avoid
   clipping. Required in WebGL 1 shaders; depending on platform this may have no effect */
precision highp float;
/* UV coordinates, passed in from the Vertex Shader */
varying vec2 uv;

uniform vec2 frameSizeRCP; /* Resolution Reciprocal */
uniform float offset; /* Offset multiplier for blur strength */
uniform float bloomStrength; /* bloom strength */

uniform sampler2D texture;

void main() {
	/* Dual Kawase upsample: sample 4 edge centers + 4 diagonal corners */
	vec2 halfpixel = frameSizeRCP * 0.5;
	vec2 o = halfpixel * offset;
	
	vec4 color = vec4(0.0);
	
	/* Sample 4 edge centers with 1x weight each */
	color += texture2D(texture, uv + vec2(-o.x * 2.0, 0.0)); /* left */
	color += texture2D(texture, uv + vec2( o.x * 2.0, 0.0)); /* right */
	color += texture2D(texture, uv + vec2(0.0, -o.y * 2.0)); /* bottom */
	color += texture2D(texture, uv + vec2(0.0,  o.y * 2.0)); /* top */
	
	/* Sample 4 diagonal corners with 2x weight each */
	color += texture2D(texture, uv + vec2(-o.x,  o.y)) * 2.0; /* top-left */
	color += texture2D(texture, uv + vec2( o.x,  o.y)) * 2.0; /* top-right */
	color += texture2D(texture, uv + vec2(-o.x, -o.y)) * 2.0; /* bottom-left */
	color += texture2D(texture, uv + vec2( o.x, -o.y)) * 2.0; /* bottom-right */
	
	/* Apply bloom strength and normalize by total weight (12) */
	gl_FragColor = (color / 12.0) * bloomStrength;
}
WebGL Javascript dual-kawase.js
import * as util from '../utility.js'

export async function setupDualKawaseBlur() {
	/* Init */
	const WebGLBox = document.getElementById('WebGLBox-DualKawaseBlur');
	const canvas = WebGLBox.querySelector('canvas');

	/* Circle Rotation size */
	const radius = 0.12;

	/* Main WebGL 1.0 Context */
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	/* State and Objects */
	const ctx = {
		/* State of the Rendering */
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		/* Textures */
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null, down: [] },
		/* Framebuffers */
		fb: { scene: null, final: null, down: [] },
		/* Shaders and their respective Resource Locations */
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			passthrough: { handle: null },
			downsample: { handle: null, uniforms: { frameSizeRCP: null, offset: null, bloomStrength: null } },
			upsample: { handle: null, uniforms: { frameSizeRCP: null, offset: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	/* UI Elements */
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			downsample: WebGLBox.querySelector('#downsampleRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#renderer'),
			downsampleLevels: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#downsampleLevels'),
			iterTime: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	/* Shaders */
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const dualKawaseDown = await util.fetchShader("shader/dual-kawase-down.fs");
	const dualKawaseUp = await util.fetchShader("shader/dual-kawase-up.fs");

	/* Elements that cause a redraw in the non-animation mode */
	ui.blur.downsample.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	/* Events */
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.downsample.addEventListener('input', () => {
		ui.blur.samplePos.disabled = ui.blur.downsample.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.downsample.value == 0;
	});

	/* Render Mode */
	ui.rendering.modes.forEach(radio => {
		/* Force set to scene to fix a reload bug in Firefox Android */
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		/* spin up the Worker (ES-module) */
		const worker = new Worker("./js/benchmark/dualKawaseBenchmark.js", { type: "module" });

		/* pass all data the worker needs */
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			downShaderSrc: dualKawaseDown,
			upShaderSrc: dualKawaseUp,
			downsampleLevels: ui.blur.downsample.value,
			samplePos: ui.blur.samplePos.value
		});

		/* Benchmark */
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.downsampleLevels.textContent = event.data.downsampleLevels;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	/* Draw Texture Shader */
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	/* Draw bloom Shader */
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	/* Simple Passthrough */
	ctx.shd.passthrough = util.compileAndLinkShader(gl, simpleQuad, simpleTexture);

	/* Dual Kawase Shaders */
	ctx.shd.downsample = util.compileAndLinkShader(gl, simpleQuad, dualKawaseDown, ["frameSizeRCP", "offset", "bloomStrength"]);
	ctx.shd.upsample = util.compileAndLinkShader(gl, simpleQuad, dualKawaseUp, ["frameSizeRCP", "offset", "bloomStrength"]);

	/* Send Unit code verts to the GPU */
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		const maxDown = parseInt(ui.blur.downsample.max);
		for (let i = 0; i < maxDown; ++i) {
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.down[i]);
		}
		ctx.fb.down = [];
		ctx.tex.down = [];

		let w = canvas.width, h = canvas.height;
		for (let i = 0; i < maxDown; ++i) {
			w = Math.max(1, w >> 1);
			h = Math.max(1, h >> 1);
			const [fb, tex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.down.push(fb);
			ctx.tex.down.push(tex);
		}

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		/* UI Stats */
		const levels = parseInt(ui.blur.downsample.value);
		/* Calculate texture taps based on computational resolution at each level */
		let totalTaps = 0;
		for (let i = 0; i < levels; i++) {
			const levelW = Math.max(1, canvas.width >> (i + 1));
			const levelH = Math.max(1, canvas.height >> (i + 1));
			totalTaps += levelW * levelH * 5; // 5 samples per downsample pass
			if (i < levels - 1) totalTaps += levelW * levelH * 8; // 8 samples per upsample pass (except final)
		}
		if (levels > 0) totalTaps += canvas.width * canvas.height * 8; // Final upsample to full res
		const tapsNewText = (totalTaps / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		/* Display actual output resolution (always full canvas size) */
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		/* Circle Motion */
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		/* Setup PostProcess Framebuffer */
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		/* Draw Call */
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		const downsampleLevels = parseInt(ui.blur.downsample.value);
		let srcTex = ctx.tex.frame;

		if (downsampleLevels > 0) {
			/* Apply distributed brightness, due to color precision limitations and the multi-pass nature of this blur algorithm */
			const finalBrightness = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			const totalPasses = 2 * downsampleLevels;
			const distributedBrightness = Math.pow(finalBrightness, 1.0 / totalPasses);
			
			/* Downsample chain */
			gl.useProgram(ctx.shd.downsample.handle);
			gl.uniform1f(ctx.shd.downsample.uniforms.offset, ui.blur.samplePos.value);
			
			let w = canvas.width, h = canvas.height;
			for (let i = 0; i < downsampleLevels; ++i) {
				const fb = ctx.fb.down[i];
				w = Math.max(1, w >> 1);
				h = Math.max(1, h >> 1);

				gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
				gl.viewport(0, 0, w, h);

				const frameSizeRCP = [1.0 / w, 1.0 / h];
				gl.uniform2fv(ctx.shd.downsample.uniforms.frameSizeRCP, frameSizeRCP);
				gl.uniform1f(ctx.shd.downsample.uniforms.bloomStrength, distributedBrightness);

				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, srcTex);
				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
				srcTex = ctx.tex.down[i];
			}

			/* Upsample chain */
			gl.useProgram(ctx.shd.upsample.handle);
			gl.uniform1f(ctx.shd.upsample.uniforms.offset, ui.blur.samplePos.value);
			
			for (let i = downsampleLevels - 2; i >= 0; i--) {
				const fb = ctx.fb.down[i];
				w = Math.max(1, canvas.width >> (i + 1));
				h = Math.max(1, canvas.height >> (i + 1));

				gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
				gl.viewport(0, 0, w, h);

				const srcW = Math.max(1, canvas.width >> (i + 2));
				const srcH = Math.max(1, canvas.height >> (i + 2));
				const frameSizeRCP = [1.0 / srcW, 1.0 / srcH];
				gl.uniform2fv(ctx.shd.upsample.uniforms.frameSizeRCP, frameSizeRCP);
				gl.uniform1f(ctx.shd.upsample.uniforms.bloomStrength, distributedBrightness);

				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, srcTex);
				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
				srcTex = ctx.tex.down[i];
			}

			/* Final upsample to full resolution */
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);

			const srcW = Math.max(1, canvas.width >> 1);
			const srcH = Math.max(1, canvas.height >> 1);
			const frameSizeRCP = [1.0 / srcW, 1.0 / srcH];
			gl.uniform2fv(ctx.shd.upsample.uniforms.frameSizeRCP, frameSizeRCP);
			gl.uniform1f(ctx.shd.upsample.uniforms.bloomStrength, distributedBrightness);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			if (ctx.mode != "bloom") {
				srcTex = ctx.tex.frameFinal;
			}
		} else {
			/* No blur - direct passthrough */
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.useProgram(ctx.shd.passthrough.handle);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
			const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			gl.uniform1f(gl.getUniformLocation(ctx.shd.passthrough.handle, "bloomStrength"), bloomStrength);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			if (ctx.mode != "bloom") {
				srcTex = ctx.tex.frameFinal;
			}
		}

		if (ctx.mode == "bloom") {
			/* Now do the bloom composition to the screen */
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		/* Ask for a CPU-GPU sync to prevent overloading the GPU during compositing.
		   In reality this is more likely to act as a flush, but it still seems to
		   help on multiple devices during low FPS */
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	/* Render at Native Resolution */
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if ((width && canvas.width !== width) || (height && canvas.height !== height)) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	/* Resize Event */
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		/* Start rendering, when canvas visible */
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		/* Stop another redraw being called */
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		/* Force the rendering pipeline to sync with CPU before we mess with it */
		gl.finish();

		/* Delete the buffers to free up memory */
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		for (let i = 0; i < parseInt(ui.blur.downsample.max); ++i) {
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.down[i]);
		}
		ctx.tex.down = [];
		ctx.fb.down = [];
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	/* Only render when the canvas is actually on screen */
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

It’s also a gaussian-like blur. Remember our first gaussian blur? Its performance tanked quadratically as we increased the kernel radius. But now, with each downsample step, the required texture taps grow slower and slower. The stronger our blur, the fewer additional samples we require!
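The shrinking cost per extra level can be sketched with the same tap arithmetic the demo above uses (5 taps per downsample pass, 8 per upsample pass, half resolution per level). The 1920×1080 frame size is just an illustrative assumption:

```javascript
/* Sketch: total texture taps for the dual kawase chain at a given depth.
   Mirrors the tap counting of the demo; numbers are illustrative. */
function kawaseTaps(width, height, levels) {
	let taps = 0;
	let w = width, h = height;
	for (let i = 0; i < levels; i++) {
		w = Math.max(1, w >> 1);
		h = Math.max(1, h >> 1);
		taps += w * h * 5;                      // downsample pass at this level
		if (i < levels - 1) taps += w * h * 8;  // upsample pass back through this level
	}
	if (levels > 0) taps += width * height * 8; // final upsample to full resolution
	return taps;
}

for (let levels = 1; levels <= 5; levels++)
	console.log(levels, (kawaseTaps(1920, 1080, levels) / 1e6).toFixed(1) + " M taps");
```

Each additional level costs a quarter of the previous one, so the total converges instead of exploding with blur strength.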

This was of special interest to Marius Bjørge, as his goal was to reduce memory access, which is especially slow on mobile devices, while still producing a motion-stable, non-shimmering blur. Speaking of which, switch to bloom mode, crank up lightBrightness and compare it to our downsample example.

Even though the resolution is reduced to the same downSample level, there is no shimmering! That’s the Dual Kawase Blur for you: a gaussian-like post-processing blur with good performance, no heavy repeated memory writes and motion-stable output. This makes it ideal as a basic building block for visual effects like bloom.

What are the big boys doing? #

The Dual Kawase Blur has found its way into game engines and user interfaces alike. For instance, the Linux desktop environment KDE has used it for its frosted backdrop effect since 2018, where it remains the algorithm of choice to this day. I used KDE’s implementation as a guide when creating my demo above.

KDE Plasma's Blur with noise at max strength
KDE Plasma's Blur with noise at max strength (Source)

Of course, graphics programming didn’t stop in 2015 and there have been new developments. The previously mentioned talk Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez showcased an evolution of the “downsample while blurring” idea, to better handle far-away and very bright lights at high blur strengths.

Uneven interpolation of bright, small light sources (Left), Page 156 from presentation
Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

In turn, this technique got picked up two years later by graphics programmer Mikkel Gjoel, when working on the video game INSIDE by Studio Playdead. In the GDC 2016 talk Low Complexity, High Fidelity - INSIDE Rendering he showcased a further optimization, reducing the number of texture reads required.

Blur algorithm used for Bloom in video game Inside
Excerpt from talk Low Complexity, High Fidelity - INSIDE Rendering by Mikkel Gjoel & Mikkel Svendsen

I showcased the bloom use-case a lot. The technique used in my demos is rather primitive, akin to the era of video game bloom disasters, when many games had radioactive levels of bloom to show off a then-novel technique. In this older style, an extra lights pass, or the scene after thresholding, was blurred and added on top.
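The thresholding step of that older style can be sketched per pixel. The Rec. 709 luma weights and the 0.8 cutoff below are illustrative assumptions, not values from any specific game:

```javascript
/* Sketch of old-style bloom thresholding on one RGB pixel, values in [0, 1].
   Pixels below the cutoff contribute nothing to the blur pass. */
function thresholdPixel([r, g, b], cutoff = 0.8) {
	const luma = 0.2126 * r + 0.7152 * g + 0.0722 * b; // Rec. 709 luminance
	return luma < cutoff ? [0, 0, 0] : [r, g, b];
}
```

Only what survives this cutoff gets blurred and added back, which is exactly why badly tuned cutoffs produced those radioactive results.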

Bloom in Video game
Bloom in the video game The Elder Scrolls IV: Oblivion, from the article Bloom Disasters

These days, 3D engines follow a physically based shading model, with HDR framebuffers capturing pixels in an energy-conserving manner. Specular light reflections preserve the super-bright pixels of the lamp they originated from.

With such a wide range of energy values, light that should bloom no longer needs special selection. Instead of defining what to blur, everything is blurred and the bright parts naturally start glowing.
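Why that works can be sketched with a single averaged pixel. The numbers are purely illustrative: an HDR highlight keeps visible energy after blurring, while the same highlight clamped to SDR range fades into its neighbors:

```javascript
/* Sketch: averaging one highlight pixel with four dark neighbors,
   as a blur kernel would. Values are illustrative. */
const neighbors = [0.05, 0.05, 0.05, 0.05];
const blur = (center) => [center, ...neighbors].reduce((a, b) => a + b) / 5;

console.log(blur(50.0));            // HDR lamp pixel: 10.04, still glows after the blur
console.log(blur(Math.min(50, 1))); // clamped to SDR 1.0: 0.24, the glow is lost
```

With HDR data, the blur alone is enough; the “what should glow” decision falls out of the energy values themselves.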

Physically Based Blur
Multiple blurs stacked to create a natural light fall-off
Page 144 in Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

The result isn’t just blurred once; rather, multiple blur strengths are stacked on top of each other for a more natural light fall-off, as shown in the previously mentioned talk by Jorge Jimenez. This isn’t an article about bloom, but about the underlying building block: The Blur.
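The stacking idea can be sketched as a weighted sum of blur levels. The halving weights below are an illustrative assumption, not the weights from the talk:

```javascript
/* Sketch: combining several blur levels (equal-length pixel arrays) with
   decaying weights for a softer fall-off. Weights are illustrative. */
function combineBlurLevels(levels) {
	const weights = levels.map((_, i) => 1 / (1 << i)); // 1, 1/2, 1/4, ...
	const norm = weights.reduce((a, b) => a + b, 0);
	return levels[0].map((_, p) =>
		levels.reduce((sum, level, i) => sum + level[p] * weights[i], 0) / norm
	);
}
```

Sharper levels dominate near the light source, while the wide, heavily blurred levels contribute the faint outer halo.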

This was a journey through blurs and I hope you enjoyed the ride! If you are a new visitor from the Summer of Math Exposition and enjoyed this article, you’ll enjoy my other graphics programming deep-dives on this blog. Also during SoME 3, my submission was a WebApp + Video Adventure into Mirrorballs:

Mathematical Magic Mirrorball #SoME3
YouTube Video by FrostKiwi