2D Sprites into 3D with Depth Maps and Forward Warping

March 22, 2026

2D to 3D; how?

A few months ago, I came across a very interesting tool published to Steam called Smack Studio.

This unique tool and game combination featured tooling that converts flat, two-dimensional sprites into three-dimensional versions using a combination of sophisticated sprite projection techniques and depth estimation. Having no understanding of how this worked, I found it very magical, so I decided to dive deeper into what I believe is the technique they used to accomplish this extremely interesting effect. I've consolidated my research on the subject here, alongside some visualizations that should help illustrate the methods in an intuitive way.

End result. Demonstration of things we will cover!

Shamelessly, I'd like to keep the really cool looking stuff above the fold for this article. For that reason, I've attached an interactive demo below that combines several techniques, applied to two-dimensional sprite images, to enable a skeletal rigging set-up that mimics a three-dimensional pipeline. We won't cover all of these techniques here, but we will in follow-up articles.

The bone selector allows you to rotate specific bones. By default, the hip bone rotates along the y-axis (chosen as the default axis to keep things simple). Depth defaults to luminance, whose meaning we will cover later. Depth can also be configured to give the model more 'thickness', which will also be discussed.

[Interactive demo: requires a larger screen]

Forward Warping & Fake Depth; what?

Before understanding the techniques applied by Smack Studio to achieve skeletal rigging and animation akin to a three-dimensional art pipeline, we can build a good foundation by learning how to simply rotate a single image around a fixed axis. For our purposes, let's use this ball I drew up in Aseprite:

Pixel art ball study

Not my best work, but it's got enough detail to serve its purpose. Note that this ball has some 'implied' depth, visible through the shading. This is an artistic choice often given to flat images to inspire a three-dimensional feel. However, as engineers, we can leverage this artistic choice in another way: by fetching the luminance values.

The core idea behind it is very simple. Pixels closer to the camera are brighter, while pixels farther from the camera are darker. Pretty much exactly as we interpret light in real life. To capture this as code, we can define a fixed-size depth buffer that approximates the z-depth of the image, and then sort our pixels into the buffer by their relative luminance values.

| Pixel        | Color           | RGB             | Luminance | Depth |
|--------------|-----------------|-----------------|-----------|-------|
| Edge (left)  | Dark grey       | (60, 60, 60)    | 0.24      | 2.4   |
| Inner shadow | Mid grey        | (120, 125, 130) | 0.49      | 4.9   |
| Mid          | Light blue-grey | (170, 180, 190) | 0.71      | 7.1   |
| Highlight    | White           | (240, 245, 250) | 0.96      | 9.6   |
| Mid          | Light blue-grey | (170, 180, 190) | 0.71      | 7.1   |
Depth values here assume a depthScale of 10. The bright highlight pixels have the most depth (9.6), so they will move the farthest during rotation. The dark edge pixel barely moves (2.4).
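As a sketch, the table above can be reproduced with a simple channel-average luminance mapped through depthScale. (The exact weighting Smack Studio uses is an assumption on my part; a weighted Rec. 709 luminance works just as well.)

```python
def luminance(r: int, g: int, b: int) -> float:
    """Approximate relative luminance as a simple channel average in [0, 1]."""
    return (r + g + b) / 3 / 255

def depth_from_luminance(r: int, g: int, b: int, depth_scale: float = 10.0) -> float:
    """Brighter pixels are assumed closer to the camera, so they get more depth."""
    return luminance(r, g, b) * depth_scale

# Reproducing the table rows (depthScale = 10):
for name, rgb in [("Edge (left)", (60, 60, 60)), ("Inner shadow", (120, 125, 130)),
                  ("Mid", (170, 180, 190)), ("Highlight", (240, 245, 250))]:
    print(f"{name}: depth = {depth_from_luminance(*rgb):.1f}")
```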

At a full 90° rotation, the ball's edge faces the camera. At this angle, what you see is effectively only the depth profile. The bright highlight pixels push out the farthest, and the dark edges stay near the axis. The image at this angle essentially becomes a cross-section of its own luminance values:

        ┌── depth 9.6
        │
   ┌────┤
   │    │    At 90°, the ball's depth profile
   │    │    becomes its visible width.
   │    │    Everything between here is the 'ranking'
   │    │    of the luminance values.
───┤    │    Bright pixels are far from axis.
   │    │    Dark pixels are close to axis.
   │    │
   └────┤
        │
        └── depth 2.4

I've mentioned depth maps quite a bit, so a proper explanation would help: a depth map is an image in which each pixel's value encodes its distance from the camera rather than a color. They're a very well understood concept in computer graphics, and well worth reading more about.

Halfway there. Time for forward warping.

Anyways, back to the issue at hand. Now that we have an approximate depth mapping from our luminance values, we've solved for a few states:

  1. What our image looks like at 0 degrees (facing the camera)
  2. 90 degrees (edge-on: the depth profile)
  3. -90 degrees (edge-on, mirrored)
  4. 180 degrees (assuming we just copy the same image for the back)

The next question is a bit harder to answer: how do we actually interpolate pixels while rotating through, well, all of the other degrees? To answer that, we have to look to a technique called forward warping.

What exactly is forward warping?

Forward warping as a technique has been around for over three decades, dating back to pioneering work by McMillan & Bishop at UNC Chapel Hill in 1995 with the release of "Plenoptic Modeling". To summarize their work as well as the concept, a layman's definition of forward warping would be the following:

  1. Take a flat image, and assume that each pixel of this image has a height.
  2. Rotate the image. To project a given pixel p, rotate it around a designated center point (cx, cy) and re-project it using some trigonometry. It boils down to this assumption: taller pixels move more during the rotation, while shorter pixels move less or stay put. The displacement is relative to each pixel's depth. Essentially, depth becomes a fake Z coordinate.

The above is done for each pixel, every frame, to simulate rotation. The end result is a depth-based approximation of how the object would look rotated. For a bit more math context, here's the exact algorithmic approach:

The math behind it.

Assume each pixel has position (x, y), with a depth value of d. Assume a fixed rotation point of (cx, cy).

For each pixel in the image:

  1. Define local x and y as being the distance from the center point (cx, cy).

  2. For rotating an image along the y-axis:

    1. Define the target x as the local x multiplied by cos(theta) plus the depth multiplied by sin of theta. This foreshortens the pixel's horizontal position as it rotates away, and then applies depth as an additional horizontal displacement.

      newX = (x - cx) · cos(θ) + d · sin(θ)
      
    2. Define the target y as the local y.

      newY = y - cy
      
    3. Define the target z as negative local x multiplied by sin(theta) + depth multiplied by cos(theta). This moves the z value into/out of the screen.

      newZ = -(x - cx) · sin(θ) + d · cos(θ)
      
  3. Rotation along the x-axis works similarly, swapping the roles of local x and local y:

    newX = x - cx
    newY = (y - cy) · cos(θ) - d · sin(θ)
    newZ = (y - cy) · sin(θ) + d · cos(θ)
    

The end result is the ability to rotate a sprite around a fixed rotational point. The demo below shows this, with depth estimated purely from the luminance values:
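The y-axis steps above can be sketched per pixel like so. The function names and the painter-style z-sort for resolving collisions are my own assumptions, not necessarily what any particular tool does:

```python
import math

def warp_pixel_y(x, y, depth, cx, cy, theta):
    """Forward-warp one pixel around a vertical axis through (cx, cy).

    Deeper (brighter) pixels displace more horizontally; depth acts as
    a fake Z coordinate, exactly as described above."""
    lx = x - cx
    new_x = lx * math.cos(theta) + depth * math.sin(theta)
    new_y = y - cy
    new_z = -lx * math.sin(theta) + depth * math.cos(theta)
    return new_x, new_y, new_z

def forward_warp_y(pixels, cx, cy, theta):
    """Warp a sprite given as a list of (x, y, depth, color) tuples.

    Splats nearest-neighbor, drawing far-to-near (painter's algorithm)
    so that closer pixels win when two land on the same target."""
    warped = [(warp_pixel_y(x, y, d, cx, cy, theta), color)
              for x, y, d, color in pixels]
    warped.sort(key=lambda item: item[0][2])  # low z (far) first
    out = {}
    for (nx, ny, nz), color in warped:
        out[(round(nx), round(ny))] = color   # nearer writes overwrite farther
    return out
```

At θ = 0 this reduces to a plain recentering, and at θ = 90° a pixel's horizontal position is exactly its depth, matching the cross-section diagram from earlier.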

[Interactive demo: requires a larger screen]

However, we can see some very clear deficiencies with this approach. Why does it look so... weird?

It's got a lot to do with our approach to approximating depth. Presently, we are just purely ranking the luminance values to determine depth, nothing more. For that reason, some rather obvious yet interesting things can happen.

Luminance as depth; flaky at best

Here's the deal. I'm no great artist. I tried to make this circle look good, but I definitely missed out on the detail in the shading. Now our algorithm is paying the price: there's simply not enough information present in our colors to accurately fill our depth buffer. However, there's another way...

Shape Hints

Well, it's pretty obvious. If we know the shape to begin with, no need for guessing at depth at all! Toss luminance out the window.

For our sphere here, the depth at any point follows a hemisphere equation: depth = sqrt(r² - dist²), where dist is the distance from the center and r is the radius. At 90° rotation, the y-coordinates don't change; the height stays fixed at the original sprite's vertical extent. So depthScale must match the sprite's vertical half-extent to produce a circular profile when viewed edge-on. Easy enough!
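A minimal sketch of that shape hint, assuming a sphere centered at (cx, cy) with radius r (the function name and signature are illustrative, not from Smack Studio):

```python
import math

def hemisphere_depth(x, y, cx, cy, r):
    """Shape-hint depth for a sphere sprite.

    Inside the silhouette circle, depth follows sqrt(r^2 - dist^2);
    outside, the pixel has no depth at all."""
    dist_sq = (x - cx) ** 2 + (y - cy) ** 2
    if dist_sq > r * r:
        return 0.0  # outside the sphere silhouette: no depth
    return math.sqrt(r * r - dist_sq)
```

Feeding this into the same forward-warp pass replaces the luminance guess with an exact profile, so the edge-on view becomes a true semicircle instead of a ranked stack of luminance values.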

[Interactive demo: requires a larger screen]

Not bad, not bad! But not perfect. We can do much better. And we will, with part two of this series, where we will cover more sophisticated warp techniques, as well as applying our work to a skeleton to allow for some more interesting rigging applications. Sign up for email alerts in the header to be notified when that drops!

I'm a big believer in attributing credit. I didn't invent any of these techniques, I just compiled a little bit of stuff together to help my own understanding a bit. Definitely explore the references below to learn more, and take a look at Smack Studio on Steam!
