2021-03-28 00:04:01

Quick category theory: Functors

Hi all! I want to talk about functors today!

This will not be a post about "how to use functors practically in Haskell". That is cool and all, but it's not what I want to talk about. Instead, I want to try to give an intuition for what a functor is in category theory, and how it relates to the way we use them in our programming languages. Is this knowledge going to be useful? Probably not. Most likely not. But I like talking about things that are cool.

First of all, I'll say that, at the time of writing, I am still a mathematics student. My knowledge of category theory is not very deep, I may make mistakes, but I have at least encountered functors before. Also, I try to assume very little about the reader, in order to be as accessible as possible, but this will probably make more sense if you've used Haskell before, specifically the Functor typeclass. This post is mostly aimed at people who know how to use functors but wonder what the relation to mathematics actually is. I'll give a brief introduction, though, and I hope that is enough for anyone to follow along. You can skip this section if you already know what I'm talking about.

Functors in Haskell

In Haskell, Functor is a typeclass. Object oriented programmers may know this as an "interface", a "trait" or a "protocol", but the idea is similar. This means that any type can be a Functor if we define the correct functions (methods) for it. Well, not any type. We're only concerned with type constructors, which you may know as "generics" in other languages. Let's take the simplest possible example from Haskell: the Maybe type.

data Maybe a = Nothing | Just a

The a in this code can be substituted for any type: we can have Maybe Int, or Maybe String, or whatever else we like. It has two constructor functions: Nothing, and Just. This data type represents a value that may or may not be there. It's sort of like the null value in some languages, but Haskell doesn't have an implicit null for all types, so we need to do it explicitly. For example, we could ask our user to optionally input their age, and store this with type Maybe Int: this value could be Just 20, Just 100, Just -1, or Nothing if they didn't input it.

Maybe is an instance of Functor. This means that it implements the fmap function. The way this is usually taught is by saying that fmap lets functions act on values that are wrapped. Say we have a function between integers, called addOne. Then we can't apply it to the value Just 20: it's a Maybe Int, not an Int! That's what fmap is for:

addOne 20 -- 21
addOne (Just 20) -- Error!
fmap addOne (Just 20) -- Just 21
fmap addOne Nothing -- Nothing

If our value is a Just, then we apply the function to whatever is inside. If it's a Nothing, then we do nothing. This is a simple example of a Functor.

Categories in mathematics

Functors belong in a branch of mathematics called "category theory", so if we want to understand what a functor is, we first need to understand what a category is.

Put simply, a category is a bunch of objects with a bunch of morphisms between them.

That's a very general phrase, and also, what the heck is a "morphism"? Well, if you imagine drawing all your objects as points on a sheet of paper, then morphisms would be drawn as arrows between them. If you're familiar with abstract math, there are tons of examples of categories: for example, the category where objects are vector spaces, and morphisms are linear maps between them. Or maybe your objects could be topological spaces, and then morphisms are continuous maps. If this doesn't make sense to you, don't worry, we'll give an example:

A -> B -> C

This is a category. I mean, it's a tiny one, but a category nonetheless. Our objects are called A, B and C, and we have a morphism connecting A and B (we say that there is a morphism from A to B), and another one from B to C. A key property of a category is that we can compose morphisms: by composing A -> B and B -> C, we get A -> C. So in this category, we also have a morphism from A to C.

Do we know of any examples of morphisms in programming? Well, yeah. Functions! Ordinary, everyday functions are an example of morphisms. For example, take the function we defined before: addOne. Looking at the function signature we see:

addOne :: Int -> Int

So, if Int is an object in our category, the set of all integers (well, not all of them, because our memory is finite, but close enough!), then addOne is a morphism from Int to Int. Since A and B are the same here, we call this an endomorphism, but that's not important.

Why are these morphisms? Because we can compose them! If we had a function from Int to Bool, and another from Bool to String, we would certainly have a function from Int to String just by composing them:

isEven :: Int -> Bool
show :: Bool -> String
show . isEven :: Int -> String
-- Alternatively:
show <<< isEven :: Int -> String

The . symbol is function composition. On the other hand, <<< is morphism composition, and it works for any category. It just so happens that functions are usually the only morphisms we care about.

Functors in mathematics

Now that we understand categories, we can ask ourselves: What about functors? Well, this is where it gets a little tricky: in the same way that functions are maps between objects, functors are maps between categories. A regular function would take you from object A to object B, for example, isEven. But a functor will take you from category X to category Y.

I know, I know, this is weird. Don't worry, I have more examples.

  A ->   B ->   C

F A -> F B -> F C

Here, F is a functor between the top row and the bottom row, which are both categories. It seems similar to a map, but there is one crucial difference: we can't just map objects, we have to map morphisms too. That's actually the key property of a functor. It takes objects to objects, and morphisms to morphisms. In the diagram above, we have taken the object A to F A, and B to F B, but not only that: we have also taken the top arrow between them to the bottom arrow, which is now a different arrow. This is important! If you know some math, a very famous example of a functor is the fundamental group: it takes topological spaces to groups, and continuous maps to group homomorphisms.

And how does all this complicated stuff relate to Haskell? It's actually very simple. Let's return to our example, Maybe. Remember that a functor has two parts: taking objects to objects, and morphisms to morphisms. Let's start with the first: if our object is Int, which is the space of (pretty much) all integers, our functor Maybe will take that to the object Maybe Int, which is the space of all integers plus the special value Nothing. Simple enough, right?

Now, when we get to morphisms is when this gets interesting. Let's think about a mini category:

    addOne
Int -----> Int

Here, the arrow represents the morphism addOne. What happens when we apply the functor Maybe to this category? We get a new diagram:

Maybe Int -> Maybe Int

This is a different arrow now! We know how addOne works on integers: it just adds one. How should this new morphism work on Maybe Ints? Well, there's a reasonable way:

  • If the argument is an integer, just apply addOne to it normally. What you get out the other side is an integer, so in particular it's also a Maybe Int.
  • If the argument is the special value Nothing, we can take that to the special value Nothing on the right as well.

Seems reasonable, right? Well, surprise surprise, this is exactly the way Maybe implements fmap in Haskell. When specialized to our function, it would look something like this, although of course it would work just the same for any other function:

fmap addOne (Just x) = Just (addOne x)
fmap addOne Nothing = Nothing

Here's the key point: taking objects to objects is easy, but what makes something a functor is whether it can also take morphisms to morphisms, which is harder. In order to tell Haskell how our functor does this, we simply implement fmap. So, as it turns out, the role of fmap is to give us the bottom arrow in the diagram when we know the top one, that's all there is to it.

So yeah. I don't really have any conclusion here. I just think math is neat.


Posted by Emi Socks | Permanent link

2020-10-12 12:27:24

OpenGL and imperative Haskell

Hi all! At the end of my last post I mentioned the fact that you can, in fact, use a programming language in a style different from the most common one, and gave the example of using monads to program imperatively in Haskell. Today I wanted to share a complete example of how one would do this.

About a year ago, while I was starting out with Haskell, I decided to follow a couple of tutorials about graphics rendering with OpenGL (an extremely common standard for computer graphics, which your computer probably has an implementation of) using Haskell. This seems like a particularly interesting challenge, because graphics programming is famously all about state, and Haskell is a language that is usually best suited for stateless programs. But it can be done, of course! And here I plan to show you how.

The two resources that I followed (only up to a certain point) are Panitz's tutorial and the tutorial on the Haskell wiki. I'm not going to be giving a step-by-step guide to get you up and running with the library, so if you want to try this out yourself, feel free to check them out. My objective is only to show how Haskell can be used for imperative programming, so I will be skipping a lot of steps along the way.

Imperative programming with monads

I've already talked about why I think monads are great, so I'll try to keep it short. The summary is that, while Haskell is usually a stateless, purely functional language, we can use an abstraction called Monad to work within a context that acts like a temporary state. You can mix this with purely functional code no problem, so you can use a state only where you need to, and enjoy the benefits of stateless programming elsewhere. In many other languages, an implicit shared state is inevitable.

In Haskell, a monadic value represents an action that possibly (but not always) returns a value. The way to combine these actions in Haskell is by using do notation, like so:

echoLine :: IO ()
echoLine = do
  line <- getLine
  putStrLn line

This is a program in the IO monad that reads input from the user and echoes it back to them. The type is IO (), and the parentheses mean that this doesn't return any values. An action that returns a value could be an IO Int or an IO String, for example. We use a do block to create a new monadic value (an action) that runs a bunch of other actions. In this case, what it does is run the action getLine which reads a line of input, store its contents in the line variable, and then run the action putStrLn line which prints a line. This is the structure of every do block. The actions get executed sequentially, one after the other, and if an action returns a value, you can store it in a variable to use it for future actions. This is exactly what an imperative program is.

Okay, but how does it work?

If you don't care about what is going on under the hood, you can skip this part. Basically, do notation isn't magic, it's syntactic sugar. In order for a type to be a Monad it has to implement a function called (>>=) (read as "bind"). What this function does is take a monadic value with return type a, a function that takes a pure a and returns a monad with type b, and combines those two into a monad that returns a b. That's a bit of a mouthful, so it might be clearer with the type signature:

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

Conceptually, we're taking an action and a function, and saying "okay, run the action, and then pass its return value through the function and run the result of that too". You may already see how you can implement do blocks with this. The example from before could be rewritten as:

echoLine = getLine >>= (\line -> putStrLn line)
-- Or, more concisely:
echoLine = getLine >>= putStrLn

You might need to stare at it for a bit to convince yourself that the type signatures match. So, under the hood, what every do block is doing is: take the first line as an action, and bind it to a lambda function that takes one argument, with whatever name we gave the bind variable, and returns the rest of the block. Do this recursively for as many lines as your do block has. You can probably see that, while we could do this without do, it would get unwieldy very very fast, so it's a very convenient tool.

OpenGL in practice (code)

The objective that I had with this little exercise was to be able to render a simple 3D shape made of polygons (someone kindly provided me with a .obj model of a donut for testing). There was some setup involved, like parsing the .obj file. I won't get much into that code here, you can read the code yourself if you want to. What's interesting is to look at the structure of the program. Though keep in mind that this is not ideal Haskell code, and if I repeated this project today there are a lot of things that I would do differently.

main :: IO ()
main = do
  (_progName, args) <- getArgsAndInitialize
  let filename = args !! 0
  objModel <- readObjFile filename
  initialDisplayMode $= [DoubleBuffered, WithDepthBuffer]
  _window <- createWindow "OpenGL tutorial"
  depthFunc $= Just Less
  pos <- newIORef (0.0, 0.0)
  rot <- newIORef 0.0
  sc <- newIORef 1.0
  rotDelta <- newIORef 0.1
  reshapeCallback $= Just reshape
  keyboardMouseCallback $= Just (keyboardMouse pos sc rotDelta)
  displayCallback $= display pos rot sc objModel
  idleCallback $= Just (idle rot rotDelta)
  mainLoop

A lot of these lines are just initialization. The first interesting line is this:

initialDisplayMode $= [DoubleBuffered, WithDepthBuffer]

The ($=) operator is a bit of a strange one. It looks like regular assignment, but remember, there are no assignments in Haskell because there is no state. Instead, this operation, which comes from the StateVar package, simulates an assignment. It takes two arguments, a StateVar and a value, and returns an action that assigns that value to that StateVar. A lot of state manipulation in the OpenGL package is done with StateVars, so we can use the get and ($=) operators to work with them.

StateVar is actually a typeclass, not a type, so many types can implement it (think "interface" if you're used to OOP programming). The two that we care about are OpenGL values and IORefs. An IORef is a mutable variable that we can create and access in the IO monad. It's normally not very useful for purely functional programs, since most of our code will not be happening in IO, but they work for a case like this. IORefs will be our way of passing and storing values of our own for the program. This is what this line does:

pos <- newIORef (0.0, 0.0)

This is an IO action that returns an IORef. We will use this reference in the future to read and modify the x and y position of our model. We do similar things for rotation and scale.

Callbacks

In order to run code that is not just the initialization code, we declare "callbacks", which are monadic actions that will get run by the engine at certain points. For example:

keyboardMouseCallback $= Just (keyboardMouse pos sc rotDelta)

This action goes to the StateVar that holds the keyboard and mouse callback and sets a callback of our own. (The Just part is because it's not a KeyboardMouseCallback, but a Maybe KeyboardMouseCallback. This simply means that it could theoretically be Nothing. Remember that Haskell doesn't have null values, so if we want an optional value we have to use Maybe!) So, let's take a look at the callback we have defined:

keyboardMouse :: IORef (GLfloat, GLfloat) -> IORef GLfloat -> IORef GLfloat -> KeyboardMouseCallback
keyboardMouse pos sc rotDelta key Down _ _ = case key of
  Char ' ' -> rotDelta $~! negate
  Char '+' -> rotDelta $~! (*2)
  Char '-' -> rotDelta $~! (/2)
  Char 'p' -> sc $~! (*2)
  Char 'o' -> sc $~! (/2)
  SpecialKey KeyLeft  -> pos $~! first (subtract 0.1)
  SpecialKey KeyRight -> pos $~! first (+ 0.1)
  SpecialKey KeyUp    -> pos $~! second (+ 0.1)
  SpecialKey KeyDown  -> pos $~! second (subtract 0.1)
  _ -> return ()
keyboardMouse _ _ _ _ _ _ _ = return ()

There's not actually all that much going on here, though it may look complicated. This is a function that takes three IORefs and returns a KeyboardMouseCallback, that is, an action to be run every time the user presses a key or does something with their mouse. In this case we're only contemplating key events. For example:

Char ' ' -> rotDelta $~! negate

The operator ($~!) is just for modifying a StateVar with a function. (The exclamation just means it's strict. I may write a post about strictness and laziness in the future. For now, don't worry about it.) So, rotDelta $~! negate takes whatever is inside the rotDelta StateVar, applies the function negate to it, and stores it. In this context, rotDelta is how much the model will spin, so it means reversing the direction of spinning. The rest of the cases are similar, they just do other actions for other keys.

That was the keyboard and mouse callback. The display callback is where the interesting stuff happens:

display :: IORef (GLfloat, GLfloat) -> IORef GLfloat -> IORef GLfloat -> ObjModel -> DisplayCallback
display pos rot sc objModel = do
  clear [ColorBuffer, DepthBuffer]
  loadIdentity
  -- Global translation
  (xp, yp) <- get pos
  translate $ Vector3 xp yp 0
  -- Global rotation
  a <- get rot
  rotate a $ Vector3 0 1 1
  -- Global scale
  s <- get sc
  scale s s s
  -- Drawing
  color $ Color3 1 1 (1 :: GLfloat)
  object objModel
  color $ Color3 0 0 (1 :: GLfloat)
  objectBorder objModel
  swapBuffers

This is the core of the program, in a way. The display callback is the action that gets run whenever the graphics on screen have to be redrawn. Let's analyze it:

clear [ColorBuffer, DepthBuffer]

This line clears whatever was on screen before so we can start drawing on a blank canvas. If we didn't do this, all of the stuff we draw would get superimposed on itself, and it would look weird. Not much more to it.

loadIdentity

Here's where the real stateful nature of OpenGL comes into play. When we draw something in OpenGL, we don't say "draw this with this color at this position and with this rotation". Instead, we simply say "draw this", and OpenGL looks at a bunch of implicit variables (that we have control over) to figure out where and how it should be drawn. The values for position, rotation and scale in particular are called the transformation matrix. What loadIdentity does is essentially reset it. The matrix stored in the state was probably altered during the previous cycle, so we want to start clean.

(xp, yp) <- get pos

This is just getting the value that we stored in the IORef earlier. It's useful to be able to store our own values for stuff like this on top of all the state that OpenGL has.

translate $ Vector3 xp yp 0

Here, we move the transformation matrix using the xp and yp values that we got from the IORef. The result of this is that, whenever we want to change the position of our model, we simply have to write the value we want into the pos IORef, and the display callback will pick that up and use it. A similar thing happens for rotation and scale, so I won't go into it.

color $ Color3 1 1 (1 :: GLfloat)

Similarly to the translate, this writes to the OpenGL state, essentially saying "hey, when you draw something from now on, do it with this color". In this case, we give it three values that represent the R, G and B components of a color in the range from 0 to 1, so (1, 1, 1) represents full red, full green and full blue; that is, white.

object objModel

The object function is an action that I defined myself in order to draw an object from a list of triangles that I parsed beforehand. This post is long enough as it is, so I won't go into it, but the code is actually very simple:

object :: ObjModel -> IO ()
object model = renderPrimitive Triangles $ traverse_ vertex4f $ objPoints model

You can take a look at the code yourself if you want to.

color $ Color3 0 0 (1 :: GLfloat)
objectBorder objModel

This is similar to the previous two lines, only the color this time is blue, and objectBorder draws only the borders of the triangles instead of the whole thing.

swapBuffers

I'll be honest, I have no idea what this does. I think double buffering is a technique used to not modify stuff while you're writing to it at the same time? Don't quote me on this. All I know is that it makes the program look good without flickering.

The result

That should be all we need! After all that drawing, this is what the result looks like:

A donut! Made up of white triangles and a blue mesh. And it rotates, but that's not visible.

Hopefully you can see why being able to program imperatively in Haskell is so important for some tasks! There's simply no way to work with OpenGL without a state. It sounds antithetical to what Haskell is all about, but that's the beauty of monads. We can make even the most stateful programs work in a purely functional language.

I know there's a lot of stuff I didn't cover. If you're curious about this, I highly encourage you to go read these tutorials and write some graphics code yourself!


Posted by Emi Socks | Permanent link

2020-09-30 22:31:51

The functional programming mindset

Hey all! Today I wanted to talk about my experience with functional programming, and in particular about the different ways in which I approach a problem depending on the language that I'm using. This post is mostly for people with little to no experience using a functional language, so if you've ever thought about giving the functional style a go, I hope this will be useful to you.

We're going to be solving a completely made-up and extremely simplified problem with two different languages: JavaScript, in an imperative way, and Haskell, in a functional way. As for the problem, this is the idea: We're given a list of names, and we want to remove all instances of the letter "o" (lowercase) in them. I mean, words without an "o" are just more trustworthy, don't you think? What is with that letter, anyways? Is it a zero with aspirations? Clearly we need to get rid of it right away.

(In all seriousness, I usually prefer to use examples of problems one could encounter in the wild, but I wanted a simple problem that I could break down as much as possible. Plus, you could use a similar method for, say, stripping invalid characters from a list of usernames, so it's not completely useless.)

We won't be concerned with how we get our input, since that's not important right now. We don't care about what to do with the output when we're done, either. We only need to write a function in both cases that takes the input and returns the output.

JavaScript side

Here in JS Land we do things imperatively. That means we think of our programs as a list of instructions to be followed. This is, if nothing else, the most popular way of thinking about a computer program, and for good reasons. It's somewhat closer to what the computer is doing "down at the metal", and arguably the whole point of computers is to automate repetitive tasks and do them much quicker than a human ever could, so it makes sense to think of a task given to a computer as a list of instructions that it must follow one by one. Anything you do on top of it is an abstraction, and those weird, nonsensical, head-in-the-clouds people over at Haskell Land love their abstractions, even if they hide what's really going on.

If you've ever programmed anything before, this should be pretty familiar to you, but I will explain it anyways. In JavaScript, lists are represented by arrays, so our input will be an array containing the data that we want to change:

["Sarah McMonad", "Ethan Semigroupsen", "Lorraine Applicatif"]

Let's start thinking about what we need to do in order to solve our little problem. Well, first of all, we have an array, but working with it directly is going to be difficult. What we actually want is to do something to each of its elements, not to the array itself. This is what for loops were made for:

for (int i = 0; i < names.length; i++) {
  var name = names[i];
  // Whatever we want to do goes here
}

Now, what do we want to do to each element? We want to remove all of its "o"s. Of course, JavaScript has a method that does exactly this, if used with a regular expression:

names[i] = name.replace(/o/g, "");

(If you don't understand regular expressions, don't worry about it, it's not important for this; the g here just means "replace all of the instances, not just the first one you find").

And we're done, right? Putting it all together, our code will run the replace operation on each of the names:

function removeBadLetters(names) {
  for (var i = 0; i < names.length; i++) {
    var name = names[i];
    names[i] = name.replace(/o/g, "");
  }
}

We run this, and voilĂ , we get a set of names that is objectively better than before:

["Sarah McMnad", "Ethan Semigrupsen", "Lrraine Applicatif"]

If you've encountered this a million times before, you may be wondering what the big deal is. Let's hop over to the other side to answer that.

Haskell side

Here in Haskell Land (Glasgow, probably?) we do things functionally. That means we think of our programs as a series of transformations chained together in various ways. For a functional programmer, every program is basically just a function: it takes inputs, it gives outputs, and that's all it does. Similarly, every function is just its own miniature program that takes inputs and gives outputs.

It's important to know that when we talk about functions here in Haskell Land, we're talking about pure functions. This means a function only communicates with the rest of the program it's a part of by means of its inputs and outputs. This is not true over in that dirty, rotten JavaScript Land, where you can declare a variable that is shared by the entire program, and then any function can change it. We don't do things like that here. Remember, each function is a mini-program of its own, and will always give the same output given the same input.

So, let's start thinking about how to accomplish our goal, but this time, functionally. Again, we have a list as an input, and lists in Haskell are represented by... well, a list:

["Sarah McMonad", "Ethan Semigroupsen", "Lorraine Applicatif"]

Again, we have many values that we want to work with. No problem. We simply write a function that passes all the elements of the list through a smaller function. We can do that with map:

removeBadLetters names = map removeOs names

But the function removeOs doesn't exist yet, does it? No worries, we'll declare it now. This time, we'll look at our name as a list of characters (since that's what strings are in Haskell), and our program can simply take only those elements from the list that we want, with filter:

removeOs name = filter (not . isO) name

Again, we don't have a function isO yet, but suppose we did. It would be a mini-program that takes a letter and returns a boolean (True or False). Conveniently enough, we have another mini-program called not that takes a boolean and returns another boolean, its opposite. So we can chain those two together (that's what the . operator does) to form a bigger function, that takes a letter and returns the opposite of whether or not it is an "o". We now pass that to filter, which takes a mini-program as its input, runs it on each of the elements, and keeps only those that return True.

Anyways, the last step is saying what isO actually is:

isO letter = letter == 'o'

Putting it all together:

removeBadLetters names = map removeOs names
removeOs name = filter (not . isO) name
isO letter = letter == 'o'

(Yes, normally we would write type signatures, but I'm skipping that step).

Run our program on the input, and presto, we have a new list of beautiful, "o"-less names:

["Sarah McMnad", "Ethan Semigrupsen", "Lrraine Applicatif"]

Comparing the two

The process of solving the problem was very similar in both cases, but there are a few key differences. When writing functional code, we focus on what our program will do, not how it will do that. We knew that we wanted our program to run the smaller removeOs program on each element, and left the details of that for later. In an imperative language, we have no choice but to explicitly say what instructions the computer will follow, because that's what being imperative means.

Another easy to miss but very important difference is that in JavaScript we were actively altering the state of our program. We have an array loaded into memory, we're running through a series of steps, and at each step we have a very clear concept of what the "current" element is. This is not the case in Haskell. Our function map removeOs names is not an instruction, it's a declaration. Is it stepping through the elements of the list one by one under the hood? Probably, but we don't care. We're telling our program what to do, not how to do it.

Here are some of the reasons why I usually prefer a functional style of programming:

  • Functional programming tends to result in clearer, more readable code. Looking back at both examples above, I think the second one has much less clutter and cuts right to what I want to do with this program, even though we didn't have the built-in replace function.
  • The lack of a shared state means reasoning about the program is much easier. We don't need to mentally keep track of what the list looks like at any step, since there are no steps.
  • Without even needing to, we came up with two extra functions, isO and removeOs that we could use for another purpose in the future. Composability means reusability.

Still, there are just as many reasons why one would choose to program in an imperative style instead:

  • The lack of a shared state means that you have to pass to a function all the information that it needs with its inputs. In more complex programs, this can get very long very quickly. Sometimes we want a shared state.
  • Less abstraction usually means the code is closer to the metal, and thus more efficient. I realize JavaScript wasn't the best example for this, but it is something to consider. Even though Haskell can be a very performant language, all abstractions come at a cost.
  • Some people just find it simpler to think of a list of instructions. I don't, but different mentalities are more natural to different people.

One important note is that you're not locked into a style because of your language of choice. JavaScript is very dynamic and adaptable, and you can also use tools such as map and filter with JavaScript arrays. Plus, there are libraries that let you treat your data as if you couldn't change it, which works really well with a functional style (look up "immutability" if you're interested). On the other hand, in Haskell, there is an entire mechanism built to let you program in an imperative style whenever that's more convenient: monads.

In the end, functional programming is a tool, suited to some tasks better than others. Still, to me it has become an unexpectedly useful tool, and I think it can be worth it for many programmers out there to at least give it a shot.


Posted by Emi Socks | Permanent link

2020-02-22 17:47:52

A use for monads

In my last post, I commented on the fact that the random package, for some reason, does not use a monadic interface. I thought this would be an excellent opportunity to discuss monads, since it seems to be one of the concepts that scare new Haskell programmers the most. Specifically, I'm going to try to give you an intuition for why monads are useful in the first place and why you'd want to use them in your real-life programs, and we're going to build a (very simple) monadic interface for random.

Our motivation

To start with, here's the code for generating a random colour in our program:

randomColour :: (RandomGen g) => g -> (RGB Word8, g)
randomColour gen =
  let (w1, gen1) = random gen
      (w2, gen2) = random gen1
      (w3, gen3) = random gen2
   in (RGB w1 w2 w3, gen3)

It's understandable enough, but the main problem with it is that we have to be constantly juggling the generators. We need to come up with a unique name for each of them, use them in the correct line, and if we wanted to add another value in the future (say, an alpha channel) then we'd need to be careful to return the value of that last generator in our function. It's doable, but clunky. What's the solution to this? Well, ideally, what we would want to do is something like this:

randomColour gen =
  let w1 = random
      w2 = random
      w3 = random
   in RGB w1 w2 w3

Or, if we're feeling ambitious, even something like:

randomColour gen = RGB random random random

That would be so much clearer, wouldn't it? Unfortunately, that's not possible in Haskell. Functions (in particular, the random function) are pure: if we run them with the same arguments, we'll get the same result. That's one of Haskell's biggest strengths, and what makes it so easy to test and debug, but it's holding us back here. But what if there was a way around this? What we need... is a context.

The core idea is this: we could have a datatype, let's call it M, that represents an action in a context. This would be a type constructor; that means we can't just have an M, but we could have an M Int, M String, M Word8, or whatever else. (For OOP programmers, this is similar to generic types.) So, instead of having our function return a Word8, we have it return an M Word8, which represents an action that, when run in a context, will give us a Word8. Sort of like... a promise that "I will give you a Word8 in the future when you provide me with a context". In our case, the context is the RandomGen generator.

Well, it might seem like we haven't gained much, right? We've just pushed the problem back one step. Now we can delay actually getting the Word8, big deal. Well, here's the trick: we can work within that context. So, say I have three M Word8s. What if I could combine all of them into an M RGB Word8? Remember, we're still dealing with actions that will be run in the future here, so we don't need any context so far. And now, what if we want our action to return a list of colours instead of just one? We could turn our M RGB Word8 into an M [RGB Word8]. Then, once we have what we need, we finally give it a context (a RandomGen) and claim back our [RGB Word8]. We only need to provide one generator, and the program takes care of the rest. Wouldn't that be much more comfortable to work with?

Well, that's exactly what a monad is. Monad is not a single type, but a whole class of types (a typeclass, similar to interfaces in OOP); any type that implements Monad represents a context, and values of that type represent actions in that context.

Writing a monad (code)

In order to show how to work with monads, we're going to write our own, based on an already existing monad from the mtl package: the State monad. State is a very useful monad that allows us to keep a read/write state in our context, and its actions are read/write operations. In our case, we'll want our state to be a RandomGen (we pick the particular type StdGen, since RandomGen is also a typeclass):

type RandomM = State StdGen

(Note: type is an awful choice of keyword. It should be either synonym or alias.)

Now, we need an action that returns any value that can be generated randomly. We'll call it random, which normally would overlap with the function of the same name in the random package, but we're assuming that won't be a problem, since we won't be using both our module and the one in random at the same time:

random :: (Random a) => RandomM a
random = state R.random

The implementation of this function might seem a bit confusing, but take a look at the documentations for the state and random functions and you'll see that they do exactly what we want: use the generator to create a value and a new generator, and store that new one in our state. We also wrote an equivalent randomR function, to get a random value in a range.

And finally, we need a way to run our actions given a context:

runRandom :: RandomM a -> StdGen -> a
runRandom = evalState

Once again, evalState does exactly what we want. (Yeah, the tools in these libraries are incredibly complete.) We could have made our functions more generic, to accept any RandomGen and not just an StdGen, but I didn't bother since we won't need it in this case.

Using our monad in practice (code)

Well, that's all the tools we need. As a refresher, here's what we (ideally) wanted our function to look like:

randomColour gen =
  let w1 = random
      w2 = random
      w3 = random
   in RGB w1 w2 w3

And here's what it ends up looking like:

randomColour = do
  w1 <- random
  w2 <- random
  w3 <- random
  return $ RGB w1 w2 w3

That's not half bad, is it? Like I said, the first form is impossible because of Haskell's purity, but the second form uses what we call a do block, which is a way of generating new monadic values (actions, remember?) that do what we want. Here's how to read that block:

randomColour = do         -- randomColour is an action that DOES this:
  w1 <- random            -- Run the "random" action and call its result "w1"
  w2 <- random            -- Run the "random" action and call its result "w2"
  w3 <- random            -- Run the "random" action and call its result "w3"
  return $ RGB w1 w2 w3   -- Run an action that does nothing and returns as its result "RGB w1 w2 w3"

And the action resulting from that block does all of these actions sequentially and returns whatever is returned by its last line, which in this case is the RGB that we want. No generator juggling required, since it all happens in the context. We'll give the randomFlag function a similar treatment:

randomFlag :: RandomM Flag
randomFlag = do
  n <- randomR (3, 7)
  cs <- randomColours n
  return $ Flag {stripes = n, colours = cs}

If you want an exercise, try to read this do block and figure out what it does. It's very similar to the last one.

Remember: up until this point, we don't actually have any flags, just actions that will return a flag when given the right context! So, the last step is to swap out this line in our previous code:

let image = rawFlag $ randomFlag (generator name)

For this one:

let image = rawFlag $ runRandom randomFlag (generator name)

And that's all! Our flag generator works just like before, but now, with more monads:

The monads pride flag!

Addendum: The monad laws and IO

I've mostly glossed over what monads actually are in favour of how they work, because I think the latter is more important for people trying to get a feel for functional programming. If you're left wondering about how this idea is formalized, let me explain that quickly. If not, you can skip this part; you won't miss much.

One big reason why monads are so powerful is that they are everywhere. Any type can be a monad as long as it follows a set of laws, which basically describe the utility that we need to be able to write do blocks. The laws define a few operations that we should be able to do on a monad, and a few rules stating that these operations do sensible things with each other. I won't go into details with the rules, but here's the operations. If a type is a (useful) monad, then we should be able to:

  1. Transform an action's return value with a unary function
  2. Create an action that does nothing but return a pure value
  3. Combine two (or n) actions into one with a binary (or n-ary) function
  4. Chain the result of one action into the next one

Remember, typeclasses are like interfaces: they define a few operations that we should be able to perform for a type, and, when we implement the typeclass for a type, we just say how each of these operations is actually done. So, in order to make a type into a monad, all we have to do is implement these four operators that correspond to the four points above, respectively: fmap, pure (also called return), <*> and >>=. Under the hood, do blocks aren't magic, they just use these definitions (particularly >>=). Feel free to check out the Monad documentation (or search them up on Hoogle) if you want to learn what each of them does!

As a last note, you might notice that one very useful operation is missing from that list: we can't extract an action's value. This tends to trip people up ("I have an M Int, how do I get an Int from it?"), but it makes perfect sense if you stop and think about it for a moment. A monad represents an action in a context. If all we have is an action and no context, then we can't actually get any value. It's up to each monad to define how we run it, and that operation is not part of the Monad typeclass. Why? Well, because of (arguably) the reason monads exist in the first place: the IO monad, which represents input and output with the world outside of our program.

An IO String represents an action that gathers a String from somewhere as input. It might be as simple as a user typing on a console, or as complex as a series of HTTP requests to a web service. The key part is, we can't know what that value will be ahead of time, because it will change every time. You can't rely on a user to always type the same things on a keyboard. This breaks Haskell's promise of purity... and it sounds awfully similar to what we went through with our RandomM monad, doesn't it? The difference is that, with RandomM, we can give it a generator to actually run the action; with IO, we can't give it a context. In a way, the outside world is the context. So how do we actually run an IO action? Simple: we call it main and make it the entry point of our program. That's it. There is no other way to run an IO action in Haskell. Hopefully you can see why all this heavy machinery that we have for monads, like do blocks, isn't just convenient, but crucial in the case of IO.

Well, that's all I have to say about monads for now. As always, feedback via email or fediverse is greatly appreciated, and I really hope this post helped a few ideas click in your head!


Posted by Emi Socks | Permanent link

2020-02-15 20:06:52

Pride flag generator

Hi all! I made a pride flag generator! It's dead simple. But I wanna talk about it anyways.

So I was thinking about small project ideas that I could realistically implement in a short time. All my ideas tend to get out of scope really fast, so I'm trying to limit myself to things I can actually finish and gain experience while I'm at it. I thought something like Identicon could be fun to make. The idea is that you create a unique image for every string (normally a username or email), so it can be used to identify a user who hasn't uploaded an avatar yet. We want a service like this to be deterministic: it should be fairly random, but the same string should always generate the same icon. That way, if some platform wants to use our service, they don't have to store the user's avatar; they can point to our service, and the same avatar will be generated every time. Plus, if the user ever makes an account on a different platform that uses the same service, they get to keep their visual identity. Pretty cool.

This seemed like a good opportunity to make something with Haskell, and randomly generated pride flags sounded like a fun idea, so I got to work. The repository is here, so you can follow along, but I'll be posting code snippets anyways. I created the project skeleton with Stack, which makes setting up Haskell projects super easy, and started thinking about the dependencies I'd need.

  • Haskell is purely functional, so it treats randomness a bit differently from other languages (functions with random inputs aren't pure!). We can use the random package for that.
  • Obviously we need some way to create images. I went with Rasterific.
  • I thought colour manipulation might come in handy, so colour could take care of that.
  • If we want to turn this into an actual service, a web server would be good. scotty is one I've used a couple times before, so I know it's really simple to use.

With all of that in place, we can get coding. I decided I'd split the logic into three parts: flag generation, converting flags into images and the server. Let's start with the first one:

Flag generation (code)

First of all, we need a data type to represent "a flag", whatever that means. My initial idea of a randomly generated flag is just a few evenly spaced horizontal stripes of random colours. Alright, that sounds simple enough. A flag, then, is just a number of stripes and a list of colours:

data Flag = Flag
  { stripes :: Int
  , colours :: [RGB Word8]
  }

Colours (in this case from the colours library) are represented as a triple of Word8s, that is, 8 bits. So we need a way to generate three random Word8s whenever we want. This is where the random library comes in, and I struggled with it a bit.

The way random works is: you have a RandomGen, which is a generator, and you use that to generate a value and another generator. (Side note, but it's absolutely wild to me that there is no monadic interface in random. It seems like such a perfect use case, and doing it manually like this gets kind of uncomfortable, but oh well...) We can use a RandomGen provided to us, or we can generate our own from a String, which is guaranteed to always be the same for the same input. Perfect, right? That's exactly what we want. So let's generate a colour:

randomColour :: (RandomGen g) => g -> (RGB Word8, g)
randomColour gen =
  let (w1, gen1) = random gen
      (w2, gen2) = random gen1
      (w3, gen3) = random gen2
   in (RGB w1 w2 w3, gen3)

We're basically copying the way in which the functions in random work: we take a generator, we spit out a value and another generator (it's exactly what the random function does). We get three Word8s like that, and make an RGB with them. Wonderful. There's another function called randomColours which does the same, but giving us n colours instead of one. So far so good.

With that, we can do something similar to generate a flag:

randomFlag :: (RandomGen g) => g -> Flag
randomFlag gen =
  let (n, gen1) = randomR (3, 7) gen
      (cs, _) = randomColours gen1 n
   in Flag {stripes = n, colours = cs}

Similar idea. Normally we'd return the generator along with the flag, but I couldn't be bothered in this case, since I know the flag is what I ultimately want and I won't generate anything more after it. We generate a random integer n between 3 and 7, and then n colours. If I re-did this code, I'd probably give the flag a list of infinite colours. Haskell is cool like that: you can give it infinite stuff, and it will only take what it needs and run in finite time. For this case, though, this is good enough. Just gotta be careful to not break it in the future by giving it less than n colours.

Painting the flag (code)

Once we know the shape of our data (number of stripes and list of colours), painting it is pretty straightforward. Rasterific does mercifully come with a monadic interface, but it took me a bit to figure out that it's based on JuicyPixels, so we need to use the JuicyPixels data types to represent our pixel colours and textures. Just have to add the correct imports, and write a helper function to convert our colour from the colours data types to the JuicyPixels ones:

texture :: RGB Word8 -> Texture PixelRGBA8
texture (RGB r g b) = uniformTexture $ PixelRGBA8 r g b 0xFF

Apparently we need to use PixelRGBA8 instead of PixelRGB8, or the compiler will complain. Makes sense, we'll be generating PNG files so we need an alpha channel. Other than that, translation is straightforward.

Anyways, all the fun happens here:

renderFlag :: Flag -> Image PixelRGBA8
renderFlag Flag {stripes = s, colours = cs} =
  renderDrawing (floor width) (floor height) backgroundColor $ forM_ (zip [0..s] cs) $ \(n, c) -> do
    withTexture (texture c) $ do
      let h = height / fromIntegral s
          y = h * fromIntegral n
       in fill $ rectangle (V2 0 y) width h
  where
    backgroundColor = PixelRGBA8 0xFF 0xFF 0xFF 0xFF

If you don't care about monads, you can either skip the next two paragraphs, or stare at the code until you convince yourself that it does a reasonable thing. If you do: the Drawing type from Rasterific is what we call a monad. A monadic value represents an action, or computation. That action can "return" pure values; in this case, they don't. Monads are incredibly useful in Haskell, because they come with lots of mechanisms to, say, transform the result of an action, or chain actions together. The Drawing monad represents the action of "painting something onto a canvas". So if we have a Drawing that means "paint a rectangle here" and another Drawing that says "paint a circle there", we can combine (chain) them together to get a Drawing that means "paint a rectangle here, then a circle there". Sounds obvious, but it's incredibly powerful.

In our case, what we want is to take our colours and paint a rectangle for each of them; that is, take our list, turn each element into a Drawing, then combine the results. That's exactly what the forM_ function does, as seen in the documentation. (Yeah, I meant it when I said Haskell has a lot of tools for working with monads.) As for zip, it just turns our list of colours [a, b, ...] into a list of pairs [(0, a), (1, b), ...]. We need that index to know where to draw each colour. The rest is just simple geometry.

Finally, the function renderDrawing just takes our Drawing and turns it into an Image that we can then convert into a PNG file. The encodePng function does exactly that, so we can create a convenience function, rawFlag, that just puts the whole process together.

rawFlag :: Flag -> BS.ByteString
rawFlag = encodePng . renderFlag

In Haskell, the . operator is function composition: take the output of renderFlag and pass it into encodePng. That way we don't have to actually name the Flag-type argument; instead of saying "this function when applied to flag does this", we can just say "this function is the result of piping these two functions together". Same result, less variable names to think of. It's super convenient and clean.

The server (code)

As I said, we'll be using scotty for the web server. The ScottyM type is also a monad! In this case, each action (each value of type ScottyM) represents attaching a route to our server. Here, we have two routes (both will handle GET requests), and each route is given by a path and an ActionM value... which is, you guessed it, a monad. We're using do blocks all throughout to make our life easier:

scotty port $ do
  get "/flag/:name" $ do
    name <- param "name"
    let image = rawFlag $ randomFlag (generator name)
    setHeader "Content-Type" "image/png"
    raw $ image
  get "/" $ do
    html $ mainPage

For those curious: A do block just takes a bunch of monadic values and chains them together. Their usefulness comes from binding: here, I bind the name name to the result of the ActionM param "name". As you can probably guess, param is an action that returns the value of a parameter (taken from either the URL or form data if it's a POST). So, it doesn't give us a String, but an ActionM String (that is, an "action that returns a string"). Binding just lets us treat it as a String in another action, which I do in the very next line. That's how actions get chained together. Don't worry if you didn't catch that last part, this is what most people (as far as I know) struggle with when first working with monads.

In any case, we've now defined a route that takes the name from the URL, passes it to our rawFlag function, and returns the result as a PNG image. There's also another route for the web frontend, but I won't get into that here.

Now we just have to test it out, and... it works... sort of. It creates pride flags, for sure, but it seems to generate the same flag for, say, "dvorak" and "dvorak_keyboard". After some experimenting, it turns out it's only using the first six characters of our string. (Thanks Holly Lotor for finding this!)

It turns out, the cause is our random generator: the docs fail to mention that, when converting a string into a generator, it only uses the first six characters. That's not ideal. We don't want users "socks@onedomain.com" and "socks@anotherdomain.com" to get the same avatar. So we have to find a way around it. Luckily, the solution is simple: use a hash function, like md5. If you don't know what that is, think of it as a function that always gives the same result from the same input, but it gives wildly different results for even very similar inputs. That's exactly what we need. So, we define generator name as follows:

generator :: String -> StdGen
generator = fst . head . reads . md5String

md5String :: String -> String
md5String = show . hashWith MD5 . C.pack

Again, we're using function composition to make things much cleaner. The whole process ends up being: "take the input string, interpret it as bytes, pass it through md5, convert the result back to a string (hopefully that doesn't destroy any information...), then make that string into a random generator". This does mean that we only have six bytes of possibilities, or 2^(6*8), which is... about 200 trillion possible pride flags, I think. Yeah, I'm pretty sure we'll be fine.

The result

The Emi Socks pride flag!

It works! And it will always generate the same flag for the same string. Of course, it's extremely simplistic. There's much more we could do with it: we could add symbols, a triangle to the side, make vertical stripes, or even randomly generate only a colour hue and pick the saturation and lightness ourselves; that's one of the reasons I wanted to use the colours library in any case, but I haven't gotten to it yet. In any case, suggestions are very welcome, either by email or fediverse!


Posted by Emi Socks | Permanent link | File under: haskell