Detailed Walkthrough of a Haskell Program for Beginners

This article is a translation of the original text using ChatGPT.

Author: Gabriella Gonzalez
Link: https://www.haskellforall.com/2018/10/detailed-walkthrough-for-beginner.html
License: Creative Commons Attribution 4.0 International License

This article provides a step-by-step guide to developing a small Haskell program that aligns equals signs (=) across multiple lines of text. It's aimed at beginner programmers, so some steps and concepts are explained in more detail than usual.

The article explains how to write, compile, and run a Haskell program as a single file to make experimentation and learning easier. For larger Haskell projects, you would typically use cabal or stack to create, run, and share your project structure with others. This single-file approach is used because it's the simplest way to get started and try out the language immediately.

Background

I care more about making code easy to read than easy to write, so I pay a lot of attention to visual tidiness. One way I do this is by aligning equals signs (=). For example, I might initially write code like this:

address = "192.168.0.44"
port = 22
hostname = "wind"

Later, I manually adjust the indentation to align all the equals signs at the same position, like this:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

I use vim as my editor, and I could install the Tabular plugin for this purpose. However, I thought it would be educational to implement this from scratch as an example of how to write a program in a functional style.

One useful feature of vim is that you can use any command-line program to transform text within the editor. For example, you can select text in visual mode and then type:

:!some-command

Then vim will send the selected text as standard input to the command-line program some-command, and replace the selected text with whatever the program outputs to standard output.

So all we need to do is write a program that takes text to be aligned via standard input and outputs the aligned text to standard output. We'll call this program align-equals.

Development Environment

My "IDE" is the command line. I typically work with up to three terminal windows open:

In one window, I edit text using vim.
In one window, I use ghcid to check for type errors in real-time.
In one window, I test my Haskell code in the REPL.

I also use Nix, particularly nix-shell, to set up my development tools. I prefer Nix because it avoids accumulating unnecessary programs system-wide: nix-shell lets me provision the tools and libraries a project needs only for the duration of a shell session.

All the examples that follow will be run inside a Nix shell like this (run this each time you open a new terminal):

$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: [ pkgs.text pkgs.safe ])' haskellPackages.ghcid

This creates a temporary shell with ghc, ghci, ghcid, and hoogle available, including the Haskell packages text and safe. If you want to change the list of available Haskell packages, you can modify the command line and create a new shell.

To enable live type checking, run:

$ ghcid --command='ghci align-equals.hs'

This command will automatically reload the align-equals.hs file whenever it changes and display any errors or warnings found by the Haskell compiler.

In another terminal, I open the code I'm working on in the ghci REPL to interactively test the functions I'm writing:

$ ghci align-equals.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( align-equals.hs, interpreted )
Ok, one module loaded.
*Main>

And in a third terminal, I actually edit the file:

$ vi align-equals.hs

The Program

First, before writing any code, let's describe what we want to do in plain English [1].

We want to find the longest string before an equals sign (=), and then add spaces to the end of all other strings to match the length of the longest one.

Length of String Before Equals Sign

To implement this, we first need to define a function that calculates the number of characters before the equals sign (=) in a given line. This function should have the following type:

import Data.Text (Text)

prefixLength :: Text -> Int

This type declaration can be read as: "prefixLength is a function that takes a value of type Text as input (i.e., a line of input) and returns an Int as output. The return value represents the number of characters before the first = symbol."

We could also add this explanation as a comment:

prefixLength
    :: Text
    -- ^ A line of input
    -> Int
    -- ^ Number of characters before the first @=@ symbol

I don't use the default String type from Haskell's Prelude [2] because it is inefficient: a String is a linked list of characters. Instead, I use the more efficient Text type and the utilities in the Data.Text module, both provided by the text package.
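To make the relationship between the two types concrete, here is a minimal sketch of converting between String and Text. The names sampleString, sampleText, and roundTrip are hypothetical and exist only for this illustration; Data.Text.pack and Data.Text.unpack are the conversion functions from the text package.

```haskell
import Data.Text (Text)
import qualified Data.Text

-- String is a type synonym for [Char], i.e. a linked list of characters.
sampleString :: String
sampleString = "port = 22"

-- Data.Text.pack converts a String into a Text value.
sampleText :: Text
sampleText = Data.Text.pack sampleString

-- Data.Text.unpack converts back, so the round trip returns the original String.
roundTrip :: String
roundTrip = Data.Text.unpack sampleText
```

In practice you rarely call pack by hand for literals, because the OverloadedStrings extension used later in this article lets a string literal be a Text value directly.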

The implementation of the prefixLength function is almost identical to the description:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

As the name suggests, prefixLength returns the length of the prefix, that is, the text before the equals sign. The only non-obvious part is knowing that the Data.Text.breakOn function exists for this kind of split.

When looking for Haskell package documentation, I typically search Google for hackage ${package name} (e.g., hackage text). This is how I found the breakOn function.

Some people also use hoogle to search by function name or type. For example, if you want to find a function that splits a Text value into the part before the equals sign and the part after, you could run:

$ hoogle 'Text -> (Text, Text)'
Data.Text breakOn :: Text -> Text -> (Text, Text)
Data.Text breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOn :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text transpose :: [Text] -> [Text]
Data.Text.Lazy transpose :: [Text] -> [Text]
Data.Text intercalate :: Text -> [Text] -> Text
Data.Text.Lazy intercalate :: Text -> [Text] -> Text
Data.Text splitOn :: Text -> Text -> [Text]
Data.Text.Lazy splitOn :: Text -> Text -> [Text]
-- more results omitted, can be seen with --count=20 option

I can also test the function in my long-running REPL to verify it works as intended:

*Main> :reload
Ok, one module loaded.
*Main> :set -XOverloadedStrings
*Main> Data.Text.breakOn "=" "foo = 1"
("foo ","= 1")
*Main> prefixLength "foo = 1"
4

We need to enable the OverloadedStrings extension because we're not using the default String type from the Prelude. The extension lets string literals stand for other string-like types, such as Text.
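As a small sketch of what the extension changes, the two definitions below build the same Text value, once from a bare literal and once with an explicit conversion. The names viaLiteral and viaPack are hypothetical, chosen only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- With OverloadedStrings, the type annotation alone makes this literal a Text.
viaLiteral :: Text
viaLiteral = "foo = 1"

-- The explicit alternative: start from a String and convert it with pack.
viaPack :: Text
viaPack = Data.Text.pack ("foo = 1" :: String)
```

Both values compare equal, so the extension is purely a convenience that saves the explicit call to pack.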

One nice thing about Haskell is that the order of definitions rarely matters: you can use a name before the line that defines it, and the compiler won't complain. That is why the first line below can mention prefix even though prefix is only defined afterwards, in the where clause:

prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

This order-independent style fits well with lazy evaluation. Haskell is a "lazy" language: a value is computed only when something actually needs it, so evaluation does not have to follow the order in which definitions are written. For example, prefixLength never uses suffix, so that value is never computed.
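One way to observe this laziness is to put a value that would crash in the unused position. The sketch below is an illustration under that assumption; explosivePair and lengthOfFirst are hypothetical names, not part of the program.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- The second component raises an error if anything ever evaluates it.
explosivePair :: (Text, Text)
explosivePair = ("foo ", error "suffix was evaluated")

-- Only prefix is demanded here, so the error above is never triggered.
lengthOfFirst :: Int
lengthOfFirst = Data.Text.length prefix
  where
    (prefix, suffix) = explosivePair
```

Evaluating lengthOfFirst returns 4 without raising the error, because nothing ever asks for suffix.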

The more you program in Haskell, the more you start thinking of code not as a sequence of instructions, but as a graph of calculations that depend on each other.

Indenting a Single Line

Now we need to define a function that adds spaces to the end of the string before the equals sign to match a desired length.

adjustLine :: Int -> Text -> Text

With comments added, we could write:

adjustLine
    :: Int
    -- ^ Desired length of string before equals sign
    -> Text
    -- ^ A line to which spaces will be added
    -> Text
    -- ^ New line with spaces added

This function is a bit longer, but still intuitive:

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

All the lines after where could be rearranged without affecting how the program works. However, for readability, we list them in an order that can be understood from top to bottom:

Split a line into the part before the equals sign and the part after.
Calculate the actual length of the string before the equals sign.
Calculate how many spaces to add by subtracting the actual length from the desired length.
Create padding by repeating spaces the specified number of times.
Create a new line by inserting the padding between the part before the equals sign and the part after.
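The claim that the bindings after where can be rearranged can be checked with a sketch like the following, which lists the same five bindings in reverse order. The name adjustLineReordered is hypothetical and used only to keep it apart from the real function.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- The same bindings as adjustLine, deliberately written bottom to top.
adjustLineReordered :: Int -> Text -> Text
adjustLineReordered desiredPrefixLength oldLine = newLine
  where
    newLine = Data.Text.concat [ prefix, spaces, suffix ]

    spaces = Data.Text.replicate additionalSpaces " "

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    actualPrefixLength = Data.Text.length prefix

    (prefix, suffix) = Data.Text.breakOn "=" oldLine
```

This version compiles and behaves the same as the original; the top-to-bottom order is purely a courtesy to the reader.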

This code structure reads like a function defined in an imperative language. For example, similar Python code would be:

def adjustLine(desiredPrefixLength, oldLine):
    # partition splits at the first '=' only and also returns the separator
    (prefix, separator, suffix) = oldLine.partition("=")

    actualPrefixLength = len(prefix)

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = " " * additionalSpaces

    # Unlike breakOn, partition leaves '=' out of suffix, so we add it back
    newLine = "".join([ prefix, spaces, separator, suffix ])

    return newLine

Generally, when a functional program uses simple types (strings, numbers, records, etc.), it can be translated to an imperative program like this. In such simple programs, functional code is essentially an imperative program restricted from reassigning values ("mutation"), which is a good practice for making programs easier to understand.

To verify that the function works correctly, we can save the entire program we've written so far and reload it in the REPL:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

Then, reload the program in the REPL:

*Main> :reload
Ok, one module loaded.
*Main> adjustLine 10 "foo = 1"
"foo       = 1"
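It can also be worth probing two edge cases in the same way. The sketch below repeats adjustLine so that it is self-contained and adds two hypothetical example values, alreadyWide and noEquals, which are not part of the original program.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

-- The prefix is already longer than requested: replicate with a negative
-- count yields the empty Text, so the line comes back unchanged.
alreadyWide :: Text
alreadyWide = adjustLine 2 "hostname = \"wind\""

-- No "=" at all: breakOn returns the whole line as the prefix,
-- so the padding ends up as trailing spaces.
noEquals :: Text
noEquals = adjustLine 6 "foo"
```

The second case means that lines without an equals sign gain trailing whitespace. That is harmless for the use case in this article, but worth knowing about if you run the tool over mixed input.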

Indenting Multiple Lines

Now we can define a function to indent multiple lines.

import qualified Safe

adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText

    prefixLengths = map prefixLength oldLines

    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines

    newText = Data.Text.unlines newLines

This function utilises two convenient tools: lines and unlines.

Data.Text.lines splits a block of Text into multiple lines and returns them as a list.

*Main> :type Data.Text.lines
Data.Text.lines :: Text -> [Text]

Conversely, Data.Text.unlines combines a list of multiple lines back into a single block of Text.

*Main> :type Data.Text.unlines
Data.Text.unlines :: [Text] -> Text

Using these two tools makes it simple to perform line-by-line Text transformations in Haskell:

Split a block of Text into multiple lines.
Process the list of lines to create a new list of lines.
Combine the new list of lines back into a single block of Text.
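The three steps above can be captured once as a reusable helper. This is a sketch, not part of the original program: perLine and stripTrailing are hypothetical names, while Data.Text.stripEnd is the text package's function for removing trailing whitespace.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text

-- Split into lines, transform each line, and join the lines back together.
perLine :: (Text -> Text) -> Text -> Text
perLine transformLine =
    Data.Text.unlines . map transformLine . Data.Text.lines

-- Example use: remove trailing whitespace from every line.
stripTrailing :: Text -> Text
stripTrailing = perLine Data.Text.stripEnd
```

One detail to keep in mind is that Data.Text.unlines ends every line with a newline, so the output gains a final newline even if the input did not have one.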

The interesting part of the adjustText function is how it processes the list of lines:

prefixLengths = map prefixLength oldLines

newLines =
    case Safe.maximumMay prefixLengths of
        Nothing ->
            []
        Just desiredPrefixLength ->
            map (adjustLine desiredPrefixLength) oldLines

This code can be read as:

Apply (map) the prefixLength function to each line to create a list of lengths of strings before equals signs.
Find the maximum length.
If there is no maximum length, return an empty list of lines.
If there is a maximum length, add spaces to each line to match that length.

You might wonder, "Why might there be no maximum length?" Consider an input of zero lines: an empty list has no maximum value. In that situation, maximumMay neither throws an exception nor returns a placeholder value that could be mistaken for real data. Instead, it returns an optional result.

data Maybe a = Just a | Nothing

maximumMay :: Ord a => [a] -> Maybe a

The a in the maximumMay type can be any type that can be compared (implements Ord), and in this code it's the Int type, so we can actually think of it as:

maximumMay :: [Int] -> Maybe Int

This means that given a list of Ints as input, maximumMay may or may not return an Int. The result will be either Nothing (no result) or an Int value wrapped in Just.
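To see both outcomes without installing the safe package, here is a local stand-in with the same type. The name maximumMaySketch is hypothetical, and this is only an assumption about the behaviour described above, not the actual implementation in the safe package.

```haskell
-- A stand-in with the same type as Safe.maximumMay, written for illustration.
maximumMaySketch :: Ord a => [a] -> Maybe a
maximumMaySketch [] = Nothing              -- an empty list has no maximum
maximumMaySketch xs = Just (maximum xs)    -- otherwise wrap the maximum in Just

-- Example inputs: prefix lengths for three lines, and for no lines at all.
someLengths :: Maybe Int
someLengths = maximumMaySketch [ 4, 1, 8 ]

noLengths :: Maybe Int
noLengths = maximumMaySketch []
```

Here someLengths is Just 8 and noLengths is Nothing, which are exactly the two cases the program has to handle.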

The result of maximumMay is handled using pattern matching:

case Safe.maximumMay prefixLengths of
    Nothing ->
        ...  -- First case
    Just desiredPrefixLength ->
        ...  -- Second case

The first case is when the list is empty. Here, desiredPrefixLength is not in scope, so any attempt to use it is rejected by the compiler. This provides a safety mechanism that prevents accessing a result that doesn't exist. In other languages, you might get a runtime error like java.lang.NullPointerException or AttributeError: 'NoneType' object has no attribute 'x', but in Haskell these bugs are caught at compile time.

The second case is when the list is not empty and has a reasonable maximum length. We use this length to adjust each line.

The advantage of this design is that you cannot ignore the empty case. If you tried to use the result of maximumMay directly as an Int, you would get a type error. Because maximumMay wraps its result in a Maybe, callers have to consider the possibility that the list might be empty before they can reach the Int.
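As a sketch of what "handling both cases" can look like, the two functions below unwrap the Maybe in different ways. All names here are hypothetical: maximumMaySketch is a local stand-in for Safe.maximumMay, and the fallback of 0 is an arbitrary choice made by the caller for this example only.

```haskell
import Data.Maybe (fromMaybe)

-- Local stand-in for Safe.maximumMay, written for illustration.
maximumMaySketch :: Ord a => [a] -> Maybe a
maximumMaySketch [] = Nothing
maximumMaySketch xs = Just (maximum xs)

-- Explicit pattern matching: each case says what to do.
widestOrZero :: [Int] -> Int
widestOrZero lengths =
    case maximumMaySketch lengths of
        Nothing ->
            0
        Just widest ->
            widest

-- The same decision written with fromMaybe from Data.Maybe.
widestOrZero' :: [Int] -> Int
widestOrZero' lengths = fromMaybe 0 (maximumMaySketch lengths)
```

Either way, the fallback is something the caller writes down explicitly rather than something the library picks silently.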

Putting It All Together

All the functions we've written so far are "pure" functions. That is, they transform inputs to outputs in a deterministic way, without modifying variables or producing side effects that we care about.

The key phrase here is "side effects that we care about." In reality, these functions do technically have side effects:

Memory/register allocation
Finite time taken for computation

In certain contexts, these side effects might matter. For example, in cryptography, security information could leak through side effects, and in embedded programming, time and memory must be carefully considered. But for simple programs, we can consider these functions essentially "pure."

Now, to use this program's functions from the command line, we need to write a main function that the program can execute:

import qualified Data.Text.IO

main :: IO ()
main = Data.Text.IO.interact adjustText

The interact function transforms a pure Text transformation into a program that can run that transformation from standard input to standard output:

*Main> :type Data.Text.IO.interact
Data.Text.IO.interact :: (Text -> Text) -> IO ()

This is an example of a "higher-order function" - a function that takes another function as input. The input to the interact function is a function of type Text -> Text. Fortunately, our adjustText function has exactly that type:

adjustText :: Text -> Text
Data.Text.IO.interact adjustText :: IO ()

Then, by assigning a value of type IO () to main, it becomes the action that the program will execute when run from the command line.
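To make the idea of a function that takes a function less abstract, here is a rough sketch of how something like interact could be written by hand. The names interactSketch and shout are hypothetical, and the real Data.Text.IO.interact may differ in its details; Data.Text.IO.getContents and Data.Text.IO.putStr are the text package's functions for reading standard input and writing standard output.

```haskell
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO

-- Read all of standard input, apply the pure transformation, print the result.
interactSketch :: (Text -> Text) -> IO ()
interactSketch transform = do
    input <- Data.Text.IO.getContents
    Data.Text.IO.putStr (transform input)

-- Any function of type Text -> Text can be plugged in, for example:
shout :: Text -> Text
shout = Data.Text.toUpper
```

Writing main = interactSketch shout would give a program that upper-cases its input, which shows that the IO wiring stays the same while the pure transformation is swapped out.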

For example, save the following complete example as align-equals.hs:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO
import qualified Safe

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText

    prefixLengths = map prefixLength oldLines

    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines

    newText = Data.Text.unlines newLines

main :: IO ()
main = Data.Text.IO.interact adjustText

Then, you can compile it like this:

$ ghc -O2 align-equals.hs

Verify that the executable works correctly:

$ ./align-equals
foo = 1
a = 2
asdf = 3
<Ctrl-D>
foo  = 1
a    = 2
asdf = 3

Now you can use ./align-equals to format blocks of text from within vim. For example, select the following block in visual mode:

address = "192.168.0.44"
port = 22
hostname = "wind"

Then type :!./align-equals, and vim replaces the selection with the aligned block:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

Now you no longer need to align your code by hand, line by line.

Conclusion

This article has shown one way to learn the Haskell language by writing a small, practical program. Haskell has many interesting features and concepts, and this article has only covered a tiny fraction of them.

[1] Translator's note: In the original, this refers to Korean.
[2] Translator's note: Haskell's standard library.