
Ray Tracer in F# – Part I

In this series of posts I’m going to demonstrate how to implement a ray tracing engine in F#. The purpose of these posts isn’t just to build a ray tracer – instead, they are put together as a tutorial on the F# language. This program will demonstrate most (if not all) of F#’s major features and show you how to use standard F# patterns and idioms.
[Image: output from my earlier university-assignment ray tracer]

I don’t know how long this series will be at this stage. I chose to use a ray tracer for a couple of reasons. Firstly, I already implemented a basic ray tracer in F# for a university assignment last year (see image on right), so I have the skeleton of the code available to me already (although the final product will be much more polished). Another reason is that ray tracing is an excellent application of functional programming. It’s a process that is conducive to immutable data structures, side-effect-free functions and stateless programming. It’s also simple enough to be able to blog about, but complex enough to not seem like a “toy example” to demonstrate F# in.

You should be warned before going into this that there will be some mathematics. For the purposes of learning F# it’s not imperative that you understand the maths, but if you want to understand the ray tracer itself then it will certainly help. I will explain the ray tracing background to everything that I do, but not in excruciating detail, because I want F# to be the focus here.

So let’s begin. First of all, what is a ray tracer? Ray casting (not tracing) is the process of casting a “ray” from some imaginary eye point, through a screen and into a scene. Whatever the ray strikes in the scene is what we will display at the point on the screen where that ray passed through it. If we repeat this process at every point on the screen that will correspond to a pixel, then we can build an image of the scene. This image might help illuminate things.

[Image: diagram of rays cast from an eye point, through a screen, into a scene]

These images come from Richard Lobb's ray caster demo, used with permission.


The eye point is the black sphere on the left. You can see a ray for each pixel on the “screen” passing from the eye point and through the center of each pixel. Where the ray intersects an object in the scene (everything behind the screen) the pixel that ray ran through becomes coloured appropriately. There will also be a default “background” colour for rays which intersect nothing in the scene.

I should note that some people refer to this as “reverse ray casting”, which actually makes a bit more sense. Obviously in real life rays travel from light sources into the eye; in our ray caster we do that backwards. Ray tracing is the logical extension of this idea once we take into account things like reflection and refraction. We will define it more rigorously when we get there, but for this iteration we will stick to plain ray casting.

So that’s the extremely brief introduction to the geometry of ray casting. Before we get into the code we need just one more very quick intro lesson, and that’s lighting. I encourage you to explore this further yourself because I’m going to be very brief here. We’re going to use a simple Phong illumination model (actually a slightly modified Phong model known as Blinn-Phong) to light our objects. This takes into account the ambient, diffuse and specular components of light, and we’ll (eventually) be applying it for each of multiple light sources in a scene. The Wikipedia articles on Phong shading and Blinn-Phong shading are actually a pretty good start to understanding this.
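As a preview (we’ll derive and implement this properly in a later part), the Blinn-Phong contribution of a single light is usually written along these lines:

I = k_a i_a + k_d (N \cdot L) i_d + k_s (N \cdot H)^{\alpha} i_s, \quad H = (L + V) / |L + V|

where N is the surface normal, L the direction to the light, V the direction to the viewer, the k terms are the material’s ambient, diffuse and specular coefficients, and \alpha is its shininess exponent.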

So let’s get started. The very first thing we are going to do is just render a flat sphere with no shading at all. It’s not very impressive but getting something onto the screen ensures we have a decent understanding of the ray tracing basics.

Let’s set up some simple data types for our ray tracer.

open System
open System.Windows.Media.Media3D
open System.Drawing
open System.Windows.Forms


type Sphere = { center:Point3D; radius: float }                        // a sphere in world coordinates
type Camera = { position:Point3D; lookAt:Point3D; lookUp:Vector3D }    // where the camera sits and which way it faces
type Scene  = { camera:Camera; sphere:Sphere }                         // for now a scene is just a camera and one sphere
type Ray    = { origin:Point3D; direction:Vector3D }                   // a ray fired from origin along direction

We’re going to start very basic, with a scene that has no lighting and just a simple sphere. These type declarations give us a jumping-off point. When I’m writing F# code, especially when I’m just getting started, I like to throw everything into F# Interactive, just to give myself a sanity check that my types all look right. You can do this quickly by pressing Ctrl-A to select all of your code and then Alt-Enter to execute it in F# Interactive.
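Once the types are loaded you can also construct a quick throwaway value in FSI to confirm the record syntax behaves the way you expect (the name testSphere and the value are just for illustration, they aren’t used anywhere in the program):

let testSphere = { center = Point3D(0.0, 0.0, 0.0); radius = 1.0 }

F# infers the type Sphere from the field names alone, which is one of the small conveniences of record types.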

One annoying thing about the System.Windows.Media.Media3D.Vector3D type is that there is no built-in way to normalize a vector without mutating it. Calling the Normalize method will actually mutate the vector you call it on, so let’s quickly define our own side-effect-free version of normalization.

let norm (v:Vector3D) = 
    let len = sqrt (v.X * v.X + v.Y * v.Y + v.Z * v.Z)   // magnitude of v
    v / len
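A quick sanity check in F# Interactive:

norm (Vector3D(3.0, 0.0, 4.0))   // evaluates to (0.6, 0, 0.8), a unit-length vector, and leaves the input untouched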

Unfortunately for this ray tracer there’s a fair bit of annoying plumbing code to make things work. I keep this block at the bottom of the main source file; remember that F# cares about declaration order, so it has to come after everything it uses, including the castRay function we’ll write shortly. Most of what is in it is not difficult to understand, it’s just necessary.

do
    let width = 480
    let height = 320

    // Vertical and horizontal field of view
    let hfov = System.Math.PI/3.5
    let vfov = hfov * float(height)/float(width)

    // Pixel width and height
    let pw = 2.0 * System.Math.Tan(hfov/2.0) / float width
    let ph = 2.0 * System.Math.Tan(vfov/2.0) / float height

    // picture box and bitmap we'll render into (the box isn't wired up to a form in this snippet)
    let box = new PictureBox(BackColor = Color.White, Dock = DockStyle.Fill, SizeMode = PictureBoxSizeMode.CenterImage)
    let bmp = new Bitmap(width, height)

    // sphere
    let sphere = { center=Point3D(1.0,1.0,1.0); radius=0.4 }

    // camera
    let camera = { position=Point3D(0.0,0.0,0.0); lookAt=Point3D(1.0,1.0,1.0); lookUp=Vector3D(0.0,1.0,0.0) }
    
    // scene
    let scene = { camera=camera; sphere=sphere }

    // set up the coordinate system
    let n = norm (camera.position - camera.lookAt)
    let u = norm (Vector3D.CrossProduct(camera.lookUp, n))
    let v = norm (Vector3D.CrossProduct(n, u))
    let vpc = camera.position - n

    for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            match castRay ray scene with
            | true -> bmp.SetPixel(x, y, Color.Red)
            | false -> bmp.SetPixel(x, y, Color.Gray)
            

    bmp.Save("output.jpg")

I’ll explain some of this, because it’s not all completely obvious, although there’s nothing particularly fancy here in F# terms.

hfov and vfov define our horizontal and vertical fields of view. The vertical field of view is defined in terms of the horizontal field of view and the aspect ratio (determined by the width and height of our screen). This ensures a correct aspect ratio in our images.

pw and ph are the pixel width and pixel height. Because we want to send our rays through the center of each pixel, we can’t think of pixels as infinitely small. Each pixel has some width and height, defined in terms of world coordinates. Hopefully you know enough trigonometry to see how we got these values from the field of view (an angular measurement).
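To make that concrete with the numbers used above: hfov = pi/3.5 is roughly 0.898 radians, so tan(hfov/2) is about 0.482, the viewport is 2 * 0.482 = 0.963 world units wide at a distance of one unit from the camera, and dividing that among 480 pixel columns gives pw of roughly 0.002 world units per pixel.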

The next few lines define some really basic things like where the camera is and where it’s pointing, where our sphere is, etc.

Next is the really tricky part (if you have never done 3D graphics before). If you don’t understand how points and vectors interact in terms of addition and subtraction you may have some trouble understanding this. Basically what we’re after here is a set of three vectors which are mutually perpendicular and a single (world) unit in length. These vectors will define our camera’s coordinate system. We start by defining a vector, n, as the vector from the point the camera is looking at to the point where the camera is located, normalized. This gives us a usable vector as long as the camera is not located exactly where it is looking.
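If the point/vector arithmetic is new to you, the key fact is that in Media3D subtracting one Point3D from another gives a Vector3D, and adding a Vector3D to a Point3D gives another Point3D. A quick illustration (throwaway values, not part of the program):

let p1 = Point3D(1.0, 1.0, 1.0)
let p2 = Point3D(0.0, 0.0, 0.0)
let d  = p2 - p1   // Vector3D (-1, -1, -1), pointing from p1 towards p2
let p3 = p1 + d    // Point3D (0, 0, 0), i.e. back at p2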

We then find a vector orthogonal to this one by taking the cross product of the camera’s look-up vector and n. Because a cross product gives a vector perpendicular to both of its inputs, this is guaranteed to be perpendicular to n. The constraint here is that the camera must not be looking straight up or straight down (i.e. parallel to the look-up vector), or u will be zero in magnitude and impossible to normalize.

To find a third orthogonal vector we simply take the cross product of the former two vectors and normalize. There we have it: three mutually orthogonal, normalized vectors! Note also that n points opposite to the direction the camera looks, which is why we subtract it to create vpc, the “view port center”, a point one unit in front of the camera along its viewing direction.
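As a concrete check with the values used above (camera at the origin, looking at (1,1,1), look-up (0,1,0)), these work out to approximately n = (-0.577, -0.577, -0.577), u = (-0.707, 0, 0.707), v = (-0.408, 0.816, -0.408) and vpc = (0.577, 0.577, 0.577). The dot product of any two of n, u and v is zero (up to rounding), which is easy to confirm in F# Interactive.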

The nested for-loop construct should be fairly straightforward (if not especially functional, but don’t worry, we’ll refactor this as we go). We’re filling in a bitmap grid, so we just loop over each pixel in the grid. For each pixel we calculate rayPoint, some arbitrary point on the ray, which in turn lets us calculate rayDir. Studying these two lines for a few minutes should let you see how this works; the maths isn’t terribly complicated. From this ray direction we can compose a ray out of the camera position and the ray direction. We then cast the ray, and if it hits we render the pixel red, otherwise grey.
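As a quick sanity check, take the center pixel (x = width/2, y = height/2): both offsets are zero, so rayPoint is simply vpc and rayDir comes out as -n, i.e. the ray fires straight along the camera’s viewing direction towards the look-at point, which also happens to be our sphere’s center, so that pixel is guaranteed to come out red.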

I’d like to take a small digression to explain that match syntax. This keyword allows for pattern matching, a very powerful technique in F#. We will get into more complex pattern matching scenarios later in this series. The basic idea is that you match some expression against a series of pipe-separated patterns; the code under the first pattern that matches is the branch that executes. This may sound somewhat basic (and with mine looking pretty much like an if-statement, it is in this case), but it actually facilitates some really cool stuff. If you just can’t wait for my next couple of posts, I highly recommend doing some Googling to find out what makes this so interesting.
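To hint at why, here’s a tiny throwaway example (not part of the ray tracer) that matches an option value against several patterns, including a guard clause:

let describeHit result =
    match result with
    | Some dist when dist < 1.0 -> sprintf "hit, only %.2f units away" dist
    | Some dist                 -> sprintf "hit at distance %.2f" dist
    | None                      -> "no hit"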

So pretty much all we have left to do is the castRay function. Again I will post the code and then explain.

let castRay ray scene =
    // vector from the sphere's center to the ray's origin
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s, rayDir)
    let ss = Vector3D.DotProduct(s, s)
    // the discriminant (divided by 4) of the ray/sphere intersection quadratic derived below
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then false
    else true

The gist of this is that if the ray intersects the sphere we will return true, otherwise false. For general ray-casting or ray-tracing this isn’t especially interesting, and we’ll change it later, but for now it allows us to at least render our sphere.

I’ll quickly go over the maths of this function because if you haven’t done a lot of geometry you’re probably wondering WTF is going on there. My crappy variable names probably aren’t helping.

We can define the ray parametrically as

r(t) = r_0 + {r_d}t

with r_0 = [x_0, y_0, z_0] and r_d = [x_d, y_d, z_d]. Further, we can define our sphere as the set of points [x_s, y_s, z_s] such that

(x_s - x_c)^2 + (y_s - y_c)^2 + (z_s - z_c)^2 = {S_r}^2

where [x_c, y_c, z_c] is the sphere’s center point and S_r is the sphere’s radius. Solve by substituting the ray equation into the sphere equation. Written out per component, the ray is:

x = x_0 + {x_d}t
y = y_0 + {y_d}t
z = z_0 + {z_d}t

Sub these into the sphere equation for x, y and z:

(x_0 + {x_d}t - x_c)^2 + (y_0 + {y_d}t - y_c)^2 + (z_0 + {z_d}t - z_c)^2 = {S_r}^2

If you expand this out you’ll find you get a quadratic in t,

At^2 + Bt + C = 0

where

A = {x_d}^2 + {y_d}^2 + {z_d}^2
B = 2(x_d (x_0 - x_c) + y_d (y_0 - y_c) + z_d (z_0 - z_c))
C = (x_0 - x_c)^2 + (y_0 - y_c)^2 + (z_0 - z_c)^2 - {S_r}^2
With a bit of inspection you should be able to see how this fits into our F# code above. If the ray direction is normalized then A will always be 1. B is twice the dot product of the ray’s direction with the vector from the sphere’s center to the ray’s origin (sv in the code), and C is that same vector dotted with itself (ss in the code) minus the squared radius. Google “sphere ray intersection” for more detail.

So now that we have an expression for the intersection points of a ray and a sphere, all we’re doing is seeing whether an intersection actually exists by solving the quadratic with the regular quadratic formula. In this case we only check the discriminant, B^2 - 4AC: if it’s negative there is no intersection, otherwise there are (up to) two. With A = 1 the discriminant works out to 4(sv^2 - ss + {S_r}^2), so the discr in the code is just the discriminant divided by 4, which has the same sign. For now we don’t care where the intersections happened, just that they did.
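When we do start to care (we’ll need the nearest intersection point for shading), the same quantities give it to us almost for free: with A = 1 the roots are -sv +/- sqrt(discr), and the smaller root is the nearer hit. As a rough sketch of where this is heading (the function name is mine and it isn’t used anywhere yet):

let intersectDistance ray scene =
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s, rayDir)
    let ss = Vector3D.DotProduct(s, s)
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then None
    else
        // nearer of the two roots; ignores the case where the ray starts inside the sphere
        let t = -sv - sqrt discr
        if t > 0.0 then Some t else None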

Now we put all of that together and we get this:
[Image: the rendered output, a flat red circle on a grey background]

Okay, so it’s not exactly thrilling, but it is a genuine ray caster. The next step is to add some lighting to make this look like a real sphere. The next article is now available. To whet your appetite, here is an image and a quick video of what the ray tracer will look like after just a few more articles.
[Image and video: a preview of the ray tracer’s output a few articles from now]

Source code so far

