I decided to make a text adventure from scratch for my second Haskell project. Text adventures, which are one kind of interactive fiction, are games played entirely in a command-line terminal, requiring only standard I/O.
I chose to build a text adventure because it has only a minimal dependency on I/O operations and because it involves some interesting symbolic computation. This means that if you’ve read up to Modeling Generalized Behaviors and Imprisoning Side Effects in my tutorial series on Haskell, you should be able to understand almost all of the code for this project.
To interact with the game, the player types sentences in a natural language, which means that I need to write a simple natural language parser.
A Natural Language Lexer
The first step in writing a simple natural language parser is to perform lexical analysis on the sentence which the user enters each time they want to interact with the game. A lexical analyzer (lexer) considers each word in a sentence and outputs a token representing the role of the word in the sentence. For example, “walk” would be tokenized as a verb and “cake” would be tokenized as a noun. This is made more difficult by the fact that some words in English are lexically ambiguous; for example, “pick” may be either a verb or a noun. Furthermore, some tokens need to contain more than one word; for example, “pick” and “pick up” are two different verbs.
I want to support two types of English language sentences, simple sentences and complex sentences. In order to represent the important parts of speech that I want to parse, I created several tokens:
data Token = TokenVerb [String]
           | TokenNoun String
           | TokenPreposition [String]
           deriving (Show, Eq)
With these tokens, it should be possible to create a semantic parser (which we’ll cover in my next post) to recognize both simple and complex English language sentences.
As you can see, TokenVerb and TokenPreposition have a list of Strings. This is because some verbs and prepositions have synonyms that I want to support. For example, “close” and “shut” both represent closing something. I want to support synonyms because any synonyms which aren’t supported in the vocabulary will result in a parse error and a message like “I don’t understand what you’re trying to say” being printed to the console, which will be very frustrating for the player.
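To make this concrete, here is a small hypothetical vocabulary built from these constructors. The entries are my own illustrative examples; the game’s actual word list lives in DummyAdventure.hs.

```haskell
-- The Token type from above, plus a few hypothetical vocabulary entries;
-- the game's actual word list lives in DummyAdventure.hs
data Token = TokenVerb [String]
           | TokenNoun String
           | TokenPreposition [String]
           deriving (Show, Eq)

sampleVerbs :: [Token]
sampleVerbs = [TokenVerb ["go", "walk"],
               TokenVerb ["close", "shut"],   -- synonyms: either word matches this verb
               TokenVerb ["pick"],
               TokenVerb ["pick up", "take"]] -- a multi-word verb, distinct from "pick"

sampleNouns :: [Token]
sampleNouns = [TokenNoun "cake", TokenNoun "Steve"]

samplePrepositions :: [Token]
samplePrepositions = [TokenPreposition ["with"], TokenPreposition ["in", "inside"]]

main :: IO ()
main = mapM_ print (sampleVerbs ++ sampleNouns ++ samplePrepositions)
```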
I also define a type which stores the tokens that matched a given string, so I can handle ambiguities where multiple tokens match a single word:
data TokenMatch = TokenMatch String [Token] deriving (Show, Eq)
You can join two TokenMatches using the `join` function:
join :: Maybe TokenMatch -> Maybe TokenMatch -> Maybe TokenMatch
join Nothing Nothing = Nothing
join (Just a) Nothing = Just a
join Nothing (Just b) = Just b
join (Just (TokenMatch wordA tokensA)) (Just (TokenMatch wordB tokensB))
    | wordA == wordB = Just (TokenMatch wordA (tokensA ++ tokensB))
    | otherwise = Nothing
`join` takes two Maybe TokenMatch values and returns a Maybe TokenMatch.
The Maybe type lets you represent an evaluation that may have failed. In the case of failure, the value is the constructor Nothing; in the case of success, it is the constructor Just applied to a value of the type parameter. For more details about Haskell’s types and typeclasses, read my post on Modeling Generalized Behaviors and Imprisoning Side Effects.
Maybe is defined as:
data Maybe a = Nothing | Just a deriving (Eq, Ord)
For example, Nothing is a Maybe Int and Just 5 is also a Maybe Int.
`join` handles the cases where either token match failed. If both inputs matched and the words are the same, then it concatenates the matched tokens.
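A quick sketch of `join`’s behavior, repeating the definitions from above so the snippet compiles on its own:

```haskell
-- Definitions repeated from the post so this sketch is self-contained
data Token = TokenVerb [String]
           | TokenNoun String
           | TokenPreposition [String]
           deriving (Show, Eq)

data TokenMatch = TokenMatch String [Token] deriving (Show, Eq)

join :: Maybe TokenMatch -> Maybe TokenMatch -> Maybe TokenMatch
join Nothing Nothing = Nothing
join (Just a) Nothing = Just a
join Nothing (Just b) = Just b
join (Just (TokenMatch wordA tokensA)) (Just (TokenMatch wordB tokensB))
    | wordA == wordB = Just (TokenMatch wordA (tokensA ++ tokensB))
    | otherwise = Nothing

main :: IO ()
main = do
    -- Same word on both sides: the token lists are concatenated
    print (join (Just (TokenMatch "pick" [TokenVerb ["pick"]]))
                (Just (TokenMatch "pick" [TokenNoun "pick"])))
    -- Different words: the join fails and evaluates to Nothing
    print (join (Just (TokenMatch "pick" [TokenVerb ["pick"]]))
                (Just (TokenMatch "cake" [TokenNoun "cake"])))
```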
The top level function used in the lexer is lexInput:
lexInput :: [Token] -> [String] -> [TokenMatch]
lexInput potentialTokens [] = []
lexInput potentialTokens (word1 : word2 : words)
    = lexTokens potentialTokens
                (word2 : words)
                [(foldl (\acc token -> (tokenize (word1 ++ ' ' : word2) token) `join` acc) Nothing potentialTokens, words), --Prioritize look-ahead by putting the look-ahead option first
                 (foldl (\acc token -> (tokenize word1 token) `join` acc) Nothing potentialTokens, word2 : words)]
lexInput potentialTokens (word : words)
    = lexTokens potentialTokens
                words
                [(foldl (\acc token -> (tokenize word token) `join` acc) Nothing potentialTokens, words)]
foldl runs the lambda on every token in potentialTokens, accumulating the result into acc, which is initialized with Nothing. The lambda attempts to tokenize the word, and evaluates to a Maybe TokenMatch. The Maybe TokenMatch is joined onto the accumulator Maybe TokenMatch so that at the end we have a Maybe TokenMatch with every token that matched the word inside it.
This function takes the sentence typed by the player as an input and returns a list of tokens matched in the order they appear in the sentence. I’m not going to match every word to a token, because words like “the” or even “want” are not useful for controlling the text adventure I’m going to write. For example, “I want to go to the house” will only match “go” and “house” in this simple lexer.
The lexer recursively processes each word in order. There are two non-terminal cases: if there is more than one word left to process, the lexer first tries looking ahead one word; otherwise it does not use look-ahead. No token requires more than one word of look-ahead.
In the look-ahead case, the function creates a list of two Maybe TokenMatches. The first is the result of calling tokenize on the next two words in the line, and the second is the result of calling tokenize on only the next word. This is how look-ahead is implemented in the lexer: if the look-ahead matches, the single-word case is ignored.
The reason I pass a tuple containing both a Maybe TokenMatch and a list of remaining strings is because if there is a look-ahead match, I want to pop off both of the matched words. If there is no look-ahead match, or no match at all, I only want to pop off one word.
We’ll look at tokenize soon, but first let’s look at lexTokens:
lexTokens :: [Token] -> [String] -> [(Maybe TokenMatch, [String])] -> [TokenMatch]
lexTokens potentialTokens words [] = lexInput potentialTokens words
lexTokens potentialTokens words ((Nothing, _) : tokens) = lexTokens potentialTokens words tokens
lexTokens potentialTokens words ((Just token, tokenWords) : tokens) = token : lexInput potentialTokens tokenWords
As you can see, lexTokens and lexInput are mutually recursive. lexTokens takes the list of words remaining to be lexed and a list of (Maybe TokenMatch, [String]) tuples. If a particular token didn’t match the current word, the algorithm proceeds to the next tuple. If there is a token match, the token is added to the list of lexed tokens and lexInput is called recursively on the remaining words.
When there is a match in lexTokens, the remaining tokens in the token list are discarded; so if there is a look-ahead match, the evaluation proceeds without processing the single word match at all.
lexInput and lexTokens recursively produce the token list.
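Putting the pieces together, the following self-contained sketch runs the whole lexer on one sentence. The vocabulary here is a small assumed one for demonstration, not the game’s real word list, and I rename the shadowed `words` binding to `ws` so the Prelude’s `words` can split the input sentence:

```haskell
import qualified Data.Char

-- Definitions repeated from the post so this sketch compiles on its own
data Token = TokenVerb [String] | TokenNoun String | TokenPreposition [String] deriving (Show, Eq)
data TokenMatch = TokenMatch String [Token] deriving (Show, Eq)

join :: Maybe TokenMatch -> Maybe TokenMatch -> Maybe TokenMatch
join Nothing Nothing = Nothing
join (Just a) Nothing = Just a
join Nothing (Just b) = Just b
join (Just (TokenMatch wordA tokensA)) (Just (TokenMatch wordB tokensB))
    | wordA == wordB = Just (TokenMatch wordA (tokensA ++ tokensB))
    | otherwise = Nothing

tokenize :: String -> Token -> Maybe TokenMatch
tokenize "" _ = Nothing
tokenize word token@(TokenVerb synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenNoun name)
    | word == name = Just (TokenMatch word [token])
    | lowerCaseWord == name = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenPreposition synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word

lexInput :: [Token] -> [String] -> [TokenMatch]
lexInput _ [] = []
lexInput potentialTokens (word1 : word2 : ws)
    = lexTokens potentialTokens (word2 : ws)
        [ (foldl (\acc token -> tokenize (word1 ++ ' ' : word2) token `join` acc) Nothing potentialTokens, ws)
        , (foldl (\acc token -> tokenize word1 token `join` acc) Nothing potentialTokens, word2 : ws) ]
lexInput potentialTokens (word : ws)
    = lexTokens potentialTokens ws
        [(foldl (\acc token -> tokenize word token `join` acc) Nothing potentialTokens, ws)]

lexTokens :: [Token] -> [String] -> [(Maybe TokenMatch, [String])] -> [TokenMatch]
lexTokens potentialTokens ws [] = lexInput potentialTokens ws
lexTokens potentialTokens ws ((Nothing, _) : rest) = lexTokens potentialTokens ws rest
lexTokens potentialTokens _ ((Just token, tokenWords) : _) = token : lexInput potentialTokens tokenWords

-- A small vocabulary assumed for this demo (not the game's real word list)
vocabulary :: [Token]
vocabulary = [TokenVerb ["go", "walk"], TokenVerb ["pick up", "take"],
              TokenNoun "cake", TokenPreposition ["with"]]

main :: IO ()
main = mapM_ print (lexInput vocabulary (words "Pick up the cake"))
```

Note how “Pick up” is consumed as a single two-word verb thanks to the look-ahead case, while “the” is silently skipped because no token matches it.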
Now let’s look at tokenize:
tokenize :: String -> Token -> Maybe TokenMatch
tokenize "" _ = Nothing --Empty string can't match tokens
tokenize word token@(TokenVerb synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = (Data.Char.toLower (head word)) : (tail word)
tokenize word token@(TokenNoun name)
    | word == name = Just (TokenMatch word [token])
    | lowerCaseWord == name = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = (Data.Char.toLower (head word)) : (tail word)
tokenize word token@(TokenPreposition synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = (Data.Char.toLower (head word)) : (tail word)
The token@ in front of each pattern is an as-pattern: it binds the name token to the entire matched Token so it can be reused on the right-hand side.
tokenize takes a word and a token and produces a Maybe TokenMatch.
tokenize has one case each for verbs, nouns and prepositions. In each of these cases, I use Data.Char.toLower on the first character of the word to ensure that the tokenizer can match words which are capitalized.
In the verb and preposition cases, I use the `elem` function to check whether the lower-cased word is in the token’s synonym list. If it is, I return a Just TokenMatch; otherwise I return Nothing.
In the case of nouns, tokenize first attempts to match the word without reducing it to lower case to match proper nouns. There are no synonyms for nouns, so matching only requires an == check.
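A few sample calls illustrate the capitalization handling. This sketch repeats the definitions from above so it compiles on its own:

```haskell
import qualified Data.Char

-- Definitions repeated from the post so this sketch is self-contained
data Token = TokenVerb [String] | TokenNoun String | TokenPreposition [String] deriving (Show, Eq)
data TokenMatch = TokenMatch String [Token] deriving (Show, Eq)

tokenize :: String -> Token -> Maybe TokenMatch
tokenize "" _ = Nothing
tokenize word token@(TokenVerb synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenNoun name)
    | word == name = Just (TokenMatch word [token])
    | lowerCaseWord == name = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenPreposition synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word

main :: IO ()
main = do
    print (tokenize "Go" (TokenVerb ["go", "walk"]))   -- capitalized verb still matches
    print (tokenize "Steve" (TokenNoun "Steve"))       -- proper noun matches without lowering
    print (tokenize "sing" (TokenVerb ["go", "walk"])) -- no match: Nothing
```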
All lexer functionality is in NaturalLanguageLexer.hs.
The full list of current tokens is in the DummyAdventure.hs file, but that will change based on the requirements of the story as I add it to the game.
The game executable code, which includes all I/O functionality and the main function is in TextAdventure.hs.
Up to this point, I’ve shown how to make a simple natural language lexer. When the user enters a sentence in the current version of this application, it just prints out the tokens which were matched in the sentence.
I don’t match the subject of the sentence because it’s always going to be the player. Also, ambiguous words produce a single TokenMatch containing multiple tokens.
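For instance, given a hypothetical vocabulary containing both a verb “pick” and a noun “pick”, the fold used in lexInput collects both tokens into one TokenMatch. The definitions are repeated so this sketch runs on its own:

```haskell
import qualified Data.Char

-- Definitions repeated from the post so this sketch is self-contained
data Token = TokenVerb [String] | TokenNoun String | TokenPreposition [String] deriving (Show, Eq)
data TokenMatch = TokenMatch String [Token] deriving (Show, Eq)

join :: Maybe TokenMatch -> Maybe TokenMatch -> Maybe TokenMatch
join Nothing Nothing = Nothing
join (Just a) Nothing = Just a
join Nothing (Just b) = Just b
join (Just (TokenMatch wordA tokensA)) (Just (TokenMatch wordB tokensB))
    | wordA == wordB = Just (TokenMatch wordA (tokensA ++ tokensB))
    | otherwise = Nothing

tokenize :: String -> Token -> Maybe TokenMatch
tokenize "" _ = Nothing
tokenize word token@(TokenVerb synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenNoun name)
    | word == name = Just (TokenMatch word [token])
    | lowerCaseWord == name = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word
tokenize word token@(TokenPreposition synonyms)
    | lowerCaseWord `elem` synonyms = Just (TokenMatch word [token])
    | otherwise = Nothing
    where lowerCaseWord = Data.Char.toLower (head word) : tail word

main :: IO ()
main = print (foldl (\acc token -> tokenize "pick" token `join` acc) Nothing
                    [TokenVerb ["pick"], TokenNoun "pick"])
-- Both tokens match "pick", so they end up in one TokenMatch
```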
Here’s an example of what happens when you use the current version of the text adventure:
$ ./text_adventure.exe
Commands:
Help - Print help text
Nouns - Print all available nouns
Verbs - Print all available verbs
Prepositions - Print all available prepositions
Quit - Exit the game
I want to go eat with Steve
== Verb go
== Verb eat
== Preposition with
== Noun Steve
quit
Thanks for playing!
Next time, I’ll be adding a parser to understand what kind of sentence the player has entered and a semantic analyzer to tie objects, events and actions in the game world to sentences.
The code for the text adventure is available at https://github.com/WhatTheFunctional/HaskellAdventure.
Continue reading Making a Text Adventure in Haskell (Part 2).
The Maybe monad: https://wiki.haskell.org/Maybe