Detailed Walkthrough of a Haskell Program for Beginners

박준규 @curry@hackers.pub
This article is a translation of the original text using ChatGPT.
- Author: Gabriella Gonzalez
- Link: https://www.haskellforall.com/2018/10/detailed-walkthrough-for-beginner.html
- License: Creative Commons Attribution 4.0 International License
This article provides a step-by-step guide to developing a small Haskell program that aligns equals signs (=) across multiple lines of text. It's aimed at beginner programmers, so some steps and concepts are explained in more detail than usual.
The article explains how to write, compile, and run a Haskell program as a single file to make experimentation and learning easier. For larger Haskell projects, you would typically use cabal or stack to create, run, and share your project structure with others. This single-file approach is used because it's the simplest way to get started and try out the language immediately.
Background
I care more about making code easy to read than easy to write, so I pay a lot of attention to visual tidiness. One way I do this is by aligning equals signs (=). For example, code might initially be written like this:
address = "192.168.0.44"
port = 22
hostname = "wind"
Later, I manually adjust the indentation to align all the equals signs at the same position, like this:
address  = "192.168.0.44"
port     = 22
hostname = "wind"
I use vim as my editor, and I could install the Tabular plugin for this purpose. However, I thought it would be educational to implement this from scratch as an example of how to write a program in a functional style.
One useful feature of vim is that you can use any command-line program to transform text within the editor. For example, you can select text in visual mode and then type:
:!some-command
Then vim will send the selected text as standard input to the command-line program some-command, and replace the selected text with whatever the program outputs to standard output.
So all we need to do is write a program that takes text to be aligned via standard input and outputs the aligned text to standard output. We'll call this program align-equals.
Development Environment
My "IDE" is the command line. I typically work with up to three terminal windows open:
- In one window, I edit text using vim.
- In one window, I use ghcid to check for type errors in real-time.
- In one window, I test my Haskell code in the REPL.
I also use Nix, particularly nix-shell, to set up my development tools. I prefer Nix because I want to avoid accumulating unnecessary programs system-wide: nix-shell lets me provision the tools and libraries I need temporarily, for the current shell only.
All the examples that follow will be run inside a Nix shell like this (run this each time you open a new terminal):
$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: [ pkgs.text pkgs.safe ])' haskellPackages.ghcid
This creates a temporary shell with ghc, ghci, ghcid, and hoogle available, including the Haskell packages text and safe. If you want to change the list of available Haskell packages, you can modify the command line and create a new shell.
To enable live type checking, run:
$ ghcid --command='ghci align-equals.hs'
This command will automatically reload the align-equals.hs file whenever it changes and display any errors or warnings found by the Haskell compiler.
In another terminal, I open the code I'm working on in the ghci REPL to interactively test the functions I'm writing:
$ ghci align-equals.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( align-equals.hs, interpreted )
Ok, one module loaded.
*Main>
And in a third terminal, I actually edit the file:
$ vi align-equals.hs
The Program
First, before writing any code, let's describe what we want to do in plain English[1].
We want to find the longest string before an equals sign (=), and then add spaces to the end of all other strings to match the length of the longest one.
Length of String Before Equals Sign
To implement this, we first need to define a function that calculates the number of characters before the equals sign (=) in a given line. This function should have the following type:
import Data.Text (Text)
prefixLength :: Text -> Int
This type declaration can be read as:
"prefixLength is a function that takes a value of type Text as input (i.e., a line of input) and returns an Int as output. The return value represents the number of characters before the first = symbol."
We could also add this explanation as a comment:
prefixLength
    :: Text
    -- ^ A line of input
    -> Int
    -- ^ Number of characters before the first @=@ symbol
I don't use the basic String type provided by Haskell's Prelude[2] because it's inefficient. Instead, I use the high-performance Text type and the utilities in the Data.Text module, both of which come from the text package.
The implementation of the prefixLength function is almost identical to the description:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
As the name suggests, prefixLength returns the length of the string before the equals sign. The only non-obvious part is knowing how to discover Data.Text.breakOn, a function that exists for exactly this purpose.
When looking for Haskell package documentation, I typically search Google for hackage ${package name} (e.g., hackage text). This is how I found the breakOn function.
Some people also use hoogle to search by function name or type. For example, if you want to find a function that splits a Text value into the part before the equals sign and the part after, you could run:
$ hoogle 'Text -> (Text, Text)'
Data.Text breakOn :: Text -> Text -> (Text, Text)
Data.Text breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOn :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text transpose :: [Text] -> [Text]
Data.Text.Lazy transpose :: [Text] -> [Text]
Data.Text intercalate :: Text -> [Text] -> Text
Data.Text.Lazy intercalate :: Text -> [Text] -> Text
Data.Text splitOn :: Text -> Text -> [Text]
Data.Text.Lazy splitOn :: Text -> Text -> [Text]
-- more results omitted, can be seen with --count=20 option
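The hoogle results give the type of breakOn but not its behaviour at the edges. The following is a small sketch, using only Data.Text.breakOn from the text package, of the two cases that matter for this program: a line with more than one equals sign and a line with none. The names firstEquals and noEquals are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- Only the first "=" ends the prefix; the suffix keeps that "=" and
-- everything after it, including later "=" characters.
firstEquals :: (Text, Text)
firstEquals = Data.Text.breakOn "=" "a = b = c"

-- With no "=" in the line, the whole line is the prefix and the suffix
-- is empty.
noEquals :: (Text, Text)
noEquals = Data.Text.breakOn "=" "no equals here"
```

Loaded in ghci, firstEquals evaluates to ("a ","= b = c") and noEquals to ("no equals here",""). The second case means prefixLength reports the full line length for a line without an equals sign.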
I can also test the function in my long-running REPL to verify it works as intended:
*Main> :reload
Ok, one module loaded.
*Main> :set -XOverloadedStrings
*Main> Data.Text.breakOn "=" "foo = 1"
("foo ","= 1")
*Main> prefixLength "foo = 1"
4
We need to enable the OverloadedStrings extension because we're not using the default String type from Prelude. With the extension on, a string literal can stand for a value of another string-like type (such as Text) instead of only String.
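As a rough illustration of what the extension saves, the sketch below builds the same Text value twice: once from a bare literal and once with an explicit Data.Text.pack conversion, which is what you would write without the extension. The names viaLiteral and viaPack are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- With OverloadedStrings, the literal is converted to Text for us.
viaLiteral :: Text
viaLiteral = "foo = 1"

-- Without the extension, a String has to be converted explicitly.
viaPack :: Text
viaPack = Data.Text.pack ("foo = 1" :: String)
```

Both definitions produce equal Text values, so the extension is a convenience rather than a change in behaviour.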
One nice thing about Haskell is that you're not very constrained by the order of code definitions. You can define things in any order, and the compiler won't complain. That is why the first line below can use prefix before the where clause defines it:
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
This order-independent coding style also works well with lazy evaluation. Haskell is a "lazy" language, meaning that values can be evaluated out of order, or not at all if they are never used. For example, the prefixLength function doesn't actually use suffix, so that value isn't computed.
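One way to see laziness at work is to add a binding that would crash if it were ever evaluated. The sketch below is a hypothetical variant of prefixLength (called lazyPrefixLength here) with such a binding, listed before the binding that is actually used.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- The _unused binding calls error, but nothing ever demands its value,
-- so the function returns normally.
lazyPrefixLength :: Text -> Int
lazyPrefixLength line = Data.Text.length prefix
  where
    _unused = error "this binding is never evaluated" :: Int
    (prefix, _) = Data.Text.breakOn "=" line
```

Evaluating lazyPrefixLength "foo = 1" in ghci returns 4 without raising the error.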
The more you program in Haskell, the more you start thinking of code not as a sequence of instructions, but as a graph of calculations that depend on each other.
Indenting a Single Line
Now we need to define a function that adds spaces to the end of the string before the equals sign to match a desired length.
adjustLine :: Int -> Text -> Text
With comments added, we could write:
adjustLine
    :: Int
    -- ^ Desired length of string before equals sign
    -> Text
    -- ^ A line to which spaces will be added
    -> Text
    -- ^ New line with spaces added
This function is a bit longer, but still intuitive:
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]
All the lines after where could be rearranged without affecting how the program works. However, for readability, we list them in an order that can be understood from top to bottom:
- Split a line into the part before the equals sign and the part after.
- Calculate the actual length of the string before the equals sign.
- Calculate how many spaces to add by subtracting the actual length from the desired length.
- Create padding by repeating spaces the specified number of times.
- Create a new line by inserting the padding between the part before the equals sign and the part after.
This code structure reads like a function defined in an imperative language. For example, similar Python code would be:
def adjustLine(desiredPrefixLength, oldLine):
    # partition splits at the first '=' only, like Data.Text.breakOn
    prefix, separator, suffix = oldLine.partition("=")
    actualPrefixLength = len(prefix)
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = " " * additionalSpaces
    # partition returns the '=' separately, so we need to add it back
    newLine = "".join([ prefix, spaces, separator, suffix ])
    return newLine
Generally, when a functional program uses simple types (strings, numbers, records, etc.), it can be translated to an imperative program like this. In such simple programs, functional code is essentially an imperative program restricted from reassigning values ("mutation"), which is a good practice for making programs easier to understand.
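Two edge cases of adjustLine are worth checking. The sketch below repeats the article's definition so the block loads on its own; the names tooShort and noEquals are made up for illustration. It relies on Data.Text.replicate returning empty Text when the count is zero or negative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- Same definition as in the article, repeated for self-containment.
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]

-- Desired length shorter than the actual prefix: the count passed to
-- replicate is negative, no padding is added, and the line is unchanged.
tooShort :: Text
tooShort = adjustLine 2 "foo = 1"

-- No "=" at all: the whole line is the prefix, so the padding lands at
-- the end of the line as trailing spaces.
noEquals :: Text
noEquals = adjustLine 10 "-- note"
```

The second case suggests that lines without an equals sign may pick up trailing whitespace when mixed into a block. Whether that matters depends on how you use the tool.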
To verify that the function works correctly, we can save the entire program we've written so far and reload it in the REPL:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]
Then, reload the program in the REPL:
*Main> :reload
Ok, one module loaded.
*Main> adjustLine 10 "foo = 1"
"foo       = 1"
Indenting Multiple Lines
Now we can define a function to indent multiple lines.
import qualified Safe
adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText
    prefixLengths = map prefixLength oldLines
    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines
    newText = Data.Text.unlines newLines
This function utilises two convenient tools: lines and unlines.
Data.Text.lines splits a block of Text into multiple lines and returns them as a list.
*Main> :type Data.Text.lines
Data.Text.lines :: Text -> [Text]
Conversely, Data.Text.unlines combines a list of multiple lines back into a single block of Text.
*Main> :type Data.Text.unlines
Data.Text.unlines :: [Text] -> Text
Using these two tools makes it simple to perform line-by-line Text transformations in Haskell:
- Split a block of Text into multiple lines.
- Process the list of lines to create a new list of lines.
- Combine the new list of lines back into a single block of Text.
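The three steps above can be captured in a small reusable helper. This is only a sketch: mapLines and stripTrailing are hypothetical names, not functions from the text package, and the helper covers the simple case where each line is transformed independently (unlike adjustText, which needs to look at all lines first).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- Split into lines, transform each line, and join the lines again.
mapLines :: (Text -> Text) -> Text -> Text
mapLines transformLine =
    Data.Text.unlines . map transformLine . Data.Text.lines

-- Example use: remove trailing whitespace from every line.
stripTrailing :: Text -> Text
stripTrailing = mapLines Data.Text.stripEnd
```

One detail to keep in mind is that unlines appends a newline after every line, so an input whose last line has no trailing newline comes back with one added.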
The interesting part of the adjustText function is how it processes the list of lines:
prefixLengths = map prefixLength oldLines
newLines =
    case Safe.maximumMay prefixLengths of
        Nothing ->
            []
        Just desiredPrefixLength ->
            map (adjustLine desiredPrefixLength) oldLines
This code can be read as:
- Apply (map) the prefixLength function to each line to create a list of lengths of strings before equals signs.
- Find the maximum length.
- If there is no maximum length, return an empty list of lines.
- If there is a maximum length, add spaces to each line to match that length.
You might wonder, "Why might there be no maximum length?"
The answer is the empty case: when the input has zero lines, the list of lengths is empty, and an empty list has no maximum value.
The maximumMay function doesn't throw an exception or return an incorrect value that could be confused with actual data. Instead, maximumMay returns an optional result.
data Maybe a = Just a | Nothing
maximumMay :: Ord a => [a] -> Maybe a
The a in the maximumMay type can be any type that can be compared (implements Ord), and in this code it's the Int type, so we can actually think of it as:
maximumMay :: [Int] -> Maybe Int
This means that given a list of Ints as input, maximumMay may or may not return an Int. The result will be either Nothing (no result) or an Int value wrapped in Just.
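To make the two possible results concrete, here is a minimal stand-in written with only the Prelude. It is named maximumMaybe to make clear that it is a hypothetical illustration and not the safe package's actual implementation of maximumMay.

```haskell
-- Returns Nothing for an empty list and Just the largest element otherwise.
maximumMaybe :: Ord a => [a] -> Maybe a
maximumMaybe [] = Nothing
maximumMaybe xs = Just (maximum xs)

-- No lines at all: there is no maximum to report.
emptyCase :: Maybe Int
emptyCase = maximumMaybe []

-- Prefix lengths of three lines: the maximum is wrapped in Just.
nonEmptyCase :: Maybe Int
nonEmptyCase = maximumMaybe [4, 2, 5]
```

In ghci, emptyCase evaluates to Nothing and nonEmptyCase to Just 5.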
The result of maximumMay is handled using pattern matching:
case Safe.maximumMay prefixLengths of
    Nothing ->
        ... -- First case
    Just desiredPrefixLength ->
        ... -- Second case
The first case is when the list is empty. Here, desiredPrefixLength is not in scope, so any attempt to use that value would be rejected by the compiler. This provides a safety mechanism to prevent accessing a result that doesn't exist. In other languages, you might get a runtime error like java.lang.NullPointerException or AttributeError: 'NoneType' object has no attribute 'x', but in Haskell, pattern matching allows these bugs to be caught at compile time.
The second case is when the list is not empty and has a reasonable maximum length. We use this length to adjust each line.
The advantage of pattern matching is that you must handle these cases. If you tried to use the result of maximumMay directly as an Int, you would get a type error. maximumMay wraps its result in a Maybe, forcing users to carefully consider the possibility that the list might be empty.
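As a small sketch of what handling both cases looks like outside this program, the hypothetical function describeWidest below turns a Maybe Int into a message. The second definition expresses the same logic with the Prelude's maybe function, which takes a default for Nothing and a function to apply inside Just.

```haskell
-- Pattern matching: each constructor of Maybe gets its own branch.
describeWidest :: Maybe Int -> String
describeWidest result =
    case result of
        Nothing ->
            "no lines"
        Just width ->
            "widest prefix: " ++ show width

-- The same logic using the Prelude's maybe function.
describeWidest' :: Maybe Int -> String
describeWidest' = maybe "no lines" (\width -> "widest prefix: " ++ show width)
```

Either style works. The explicit case expression tends to read better when the branches are long, as they are in adjustText.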
Putting It All Together
All the functions we've written so far are "pure" functions. That is, they transform inputs to outputs in a deterministic way, without modifying variables or producing side effects that we care about.
The key phrase here is "side effects that we care about." In reality, these functions do technically have side effects:
- Memory/register allocation
- Finite time taken for computation
In certain contexts, these side effects might matter. For example, in cryptography, security information could leak through side effects, and in embedded programming, time and memory must be carefully considered. But for simple programs, we can consider these functions essentially "pure."
Now, to use this program's functions from the command line, we need to write a main function that the program can execute:
import qualified Data.Text.IO
main :: IO ()
main = Data.Text.IO.interact adjustText
The interact function transforms a pure Text transformation into a program that can run that transformation from standard input to standard output:
*Main> :type Data.Text.IO.interact
Data.Text.IO.interact :: (Text -> Text) -> IO ()
This is an example of a "higher-order function" - a function that takes another function as input.
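To make the idea less abstract, the sketch below shows roughly what such a function has to do: read all of standard input, apply the pure function, and write the result. interactSketch, shout, and shoutProgram are hypothetical names, and this is an approximation for illustration rather than the library's source.

```haskell
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO

-- Read everything from standard input, transform it with the given pure
-- function, and write the result to standard output.
interactSketch :: (Text -> Text) -> IO ()
interactSketch transform = do
    input <- Data.Text.IO.getContents
    Data.Text.IO.putStr (transform input)

-- A pure transformation to plug in: upper-case the whole input.
shout :: Text -> Text
shout = Data.Text.toUpper

-- Passing the function as an argument yields a runnable IO action.
shoutProgram :: IO ()
shoutProgram = interactSketch shout
```

The pure function stays free of any IO concerns, which is why it can be tested directly in the REPL as the earlier sections do.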
The input to the interact function is a function of type Text -> Text. Fortunately, our adjustText function has exactly that type:
adjustText :: Text -> Text
Data.Text.IO.interact adjustText :: IO ()
Then, by assigning a value of type IO () to main, it becomes the action that the program will execute when run from the command line.
For example, save the following complete example as align-equals.hs:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO
import qualified Safe
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]
adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText
    prefixLengths = map prefixLength oldLines
    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines
    newText = Data.Text.unlines newLines
main :: IO ()
main = Data.Text.IO.interact adjustText
Then, you can compile it like this:
$ ghc -O2 align-equals.hs
Verify that the executable works correctly:
$ ./align-equals
foo = 1
a = 2
asdf = 3
<Ctrl-D>
foo  = 1
a    = 2
asdf = 3
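The same session can be turned into a check that runs in ghci without typing input by hand. The sketch below is a condensed variant of the program that, as an assumption made here to avoid the safe dependency, guards maximum with an explicit emptiness test instead of calling Safe.maximumMay. The names sessionMatches and idempotent are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

prefixLength :: Text -> Int
prefixLength = Data.Text.length . fst . Data.Text.breakOn "="

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine =
    Data.Text.concat [ prefix, spaces, suffix ]
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    additionalSpaces = desiredPrefixLength - Data.Text.length prefix
    spaces = Data.Text.replicate additionalSpaces " "

-- Variant of adjustText: maximum is only called on a non-empty list.
adjustText :: Text -> Text
adjustText oldText = Data.Text.unlines newLines
  where
    oldLines = Data.Text.lines oldText
    newLines
        | null oldLines = []
        | otherwise =
            map (adjustLine (maximum (map prefixLength oldLines))) oldLines

-- The terminal session above, expressed as an equality check.
sessionMatches :: Bool
sessionMatches =
    adjustText "foo = 1\na = 2\nasdf = 3\n"
        == "foo  = 1\na    = 2\nasdf = 3\n"

-- Aligning text that is already aligned should leave it unchanged.
idempotent :: Bool
idempotent =
    adjustText (adjustText "foo = 1\na = 2\n") == adjustText "foo = 1\na = 2\n"
```

Both sessionMatches and idempotent evaluate to True, and adjustText "" returns empty Text through the empty-list branch.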
Now you can use ./align-equals to align blocks of text. For example:
address = "192.168.0.44"
port = 22
hostname = "wind"
Selecting those lines in vim's visual mode and running :!./align-equals will align the block:
address  = "192.168.0.44"
port     = 22
hostname = "wind"
Now you don't need to manually align your code line by line.
Conclusion
This article has shown one way to learn the Haskell language by writing a small, practical program. Haskell has many interesting features and concepts, and this article has only covered a tiny fraction of them.