A few days ago, a friend of mine sent me a fascinating problem. The problem goes like this:
The homophony group (of English) is the group with 26 generators
a
,b
,c
, and so on untilz
and one relation for every pair of English words which sound the same. Prove that the group is trivial!
For example, consider the group elements knight and night. By the cancellation laws, this implies that k must be the identity element. Recall that a trivial group is one which consists solely of its identity element, so our task is to show that each letter of the English alphabet is the identity element.
Skipping all of the algebraic jargon, we want to show that if we set all homophones "equal" to one another, and do left cancellation, right cancellation, and substitution, we can show that all the English letters equal one.
This is a fun exercise to do by hand, but I'd like to do it in Haskell.
Note: This work was done in IHaskell, and what you're reading is the IHaskell notebook exported to HTML for viewing in the browser. You can download the original notebook here.
I've started by compiling a list of homophones in American English, starting with this list and removing all single letters (such as j
being a homophone with jay
) and all words with apostrophes and periods, as well as some less commonly used words. You can download my list, or make your own.
The contents of the file look like this:
ad add
add ad
arc ark
ark arc
...
Each line is a space-delimited list of words. The first word in the list sounds identical to all the remaining words in the list. This is why you see repeats - ad
sounds like add
but also add
sounds like ad
. This repetition isn't necessary, as we could do it programmatically, but is convenient.
Let's go ahead and load this list:
import Control.Applicative ((<$>))
import Data.List.Utils (split)
removeEmpty = filter (not . null)
homophones <- removeEmpty . map words . lines <$> readFile "homophones.list"
Let's take a look at a few more of these homophones.
import Control.Monad (forM_)
import Data.List (intercalate)
-- Show ten of the homophone sets
forM_ (take 10 homophones) $ \ homs ->
putStrLn $ intercalate "\t" homs
Note that some of the sets have more than two elements, yet they are all on the same line.
Let's convert this into a more usable format. We'll define a new type WordPair
which represents a single pair of homophones, and convert this list into a list of WordPair
s.
data WordPair = WordPair String String
deriving Show
-- Convert a list of homophones into a list of word pairs.
-- Note that the wordpairs should only use the first of the
-- list as the first word, since there will be repeat sets.
-- For instance, the set ["a", "b", "c"] would only generate
-- word pairs [WordPair "a" "b", WordPair "a" "c"].
pairs :: [String] -> [WordPair]
pairs (str:strs) = map (WordPair str) strs
-- All pairs of words we consider homophones.
wordPairs = concatMap pairs homophones
Now that we have this data in a usable form, let's use it to derive relations.
The initial relations we have are simply the set of word pairs. However, we can use two operations in order to derive more relations:
reduce
: The reduction operation will be the application of left and right cancellation laws. If a relation has the same thing on the left of both sides, we can take it off; same for the right side. This generates a new, simpler relation.substitute
: The substitution operation will be substituting identity relations in. For instance, if we've derived thatd
is the identity element, then we can removed
from all known relations to get new, simpler relations.
In addition to each relation storing what strings it considers equal, we'd also like to be able to track what operations led to the creation of that word pair. So before defining a relation, let's define a history data type:
data History = Reduce String String
| Substitute Char
deriving Show
Now, we'd like a relation to store all the transformations that were used to generate it, and also the two things it relates:
data Relation = Relation [History] String String deriving Show
-- We'd like equality to only look at the strings, not the history.
instance Eq Relation where
Relation _ s1 s2 == Relation _ t1 t2 =
s1 == t1 && s2 == t2
Since Relation
and WordPair
are slightly different, let's convert all our WordPair
s to Relation
s. This gives us our initial set of relations, which we will use to derive all other relations.
toRelation :: WordPair -> Relation
toRelation (WordPair first second) = Relation [] first second
initRelations = map toRelation wordPairs
Eventually, we're going to iteratively improve these relations until we have proven that all letters equal the identity. First, though, let's define our two operators, starting with reduce
.
When we reduce
a relation, we apply the right and left cancellation laws. If we have the equation
$$ab = ac$$
we can use the left cancellation law to reduce it to $b = c$; similarly, using the right cancellation law, we can reduce the equation
$$xa = ya$$
to just $x = y$.
Our reduce
operator repeats these steps until it can no longer do so, and then the resulting strings are the reduced relation.
reduce :: Relation -> Relation
reduce rel@(Relation hist first second)
| canReduce first second = go (first, second)
-- Note that we also have to be careful with the history.
-- If the `reduce` does nothing, then we do not want to add
-- anything to the history of the relation.
| otherwise = rel
where
-- A reduction can happen if both strings are non-zero
-- and share a common first or last letter.
canReduce first second =
not (null first) &&
not (null second) &&
(head first == head second ||
last first == last second)
-- Modified history including this reduction.
hist' = Reduce first second : hist
-- Base case: if we've reduced a word pair to an empty string
-- and something else, we're done, as that something else
-- is equivalent to the identity element.
go ("", word) = Relation hist' word ""
go (word, "") = Relation hist' word ""
go (first, second)
-- Chop off the first element if they're equal.
| head first == head second
= go (tail first, tail second)
-- Chop off the last element if they're equal.
| last first == last second
= go (init first, init second)
-- If netiher first nor last element are equal,
-- we've simplified the relation down as much
-- as we can simplify it.
| otherwise =
Relation hist' first second
This looks pretty good. Next, let's define the substitute
operator.
The substitute
operator removes a character from a relation. For instance, if we know that d
is the identity, we can simplify the relation $$ad = dyd$$ to just $a = y$.
Just like the reduce
operator, we avoid modifying the Relation
's history if the substitute
does nothing.
import Data.List.Utils (replace)
-- Generate a new relation by removing characters we know to be
-- the identity. Make sure to update the history of the relation
-- with this substitution!
substitute :: Char -> Relation -> Relation
substitute char rel@(Relation hist first second)
| canSubstitute first second
= Relation (Substitute char : hist) (replaced first) (replaced second)
| otherwise = rel
where
canSubstitute first second = char `elem` first || char `elem` second
replaced = replace [char] ""
With substitute
implemented, we've finished all the machinery we're going to use for simplifying our relations. We're going to iteratively reduce and substitute until we've found that all the English letters are the identity element of the homophony group. We're still missing one thing, though - how do we know which letters we've proven to be the identity?
Let's define a quick helper datatype for every identity we find. We're going to store the character that we've proven is the identity, as well as the history; that way, when we want to examine the results, we can see exactly how each letter was reduced to the identity.
data FoundIdent = FoundIdent {
char :: Char,
hist :: [History]
}
Let's also define a function that extracts all the identity elements from a set of relations.
-- mapMaybe = map fromJust . filter isJust . map
import Data.Maybe (mapMaybe)
identities :: [Relation] -> [FoundIdent]
identities = mapMaybe go
where
go :: Relation -> Maybe FoundIdent
go (Relation hist [char] "") = Just $ FoundIdent char hist
go (Relation hist "" [char]) = Just $ FoundIdent char hist
go _ = Nothing
Let's finally put all of this together. We're going to start with our initial set of relations, initRelations
, and then we're going to iteratively simplify them. Initially, we have no known identity elements.
In each iteration, we
- Substitute into each relation each known identity (replacing it with the empty string).
- Reduce the resulting relations.
- Collect all known identity elements.
import Data.List (nubBy)
import Data.Function (on)
-- The iteration starts with a list of known identity elements
-- and the current set of relations. It outputs the updated
-- relations and all known identity elements.
iteration :: ([FoundIdent], [Relation]) -> ([FoundIdent], [Relation])
iteration (idents, relations) = (newIdents, newRelations)
where
-- Collect all the substitutions into a single function.
substitutions = foldl (.) id $ map (substitute . char) idents
-- Do all substitutions, then reduce (for each relation).
newRelations = map (reduce . substitutions) relations
-- We have to remove duplicate identity elements, because
-- in each iteration we find multiple ways to prove that some
-- letters are the identity element. We just want one.
removeDuplicateIdents =
nubBy ((==) `on` char)
-- Find all identities in the new relations.
newIdents = removeDuplicateIdents $ idents ++ identities newRelations
Let's iterate this process until we have all the identities we want. We want 26 of them, so we can just check the length. (If this operation never finishes, we're out of luck!)
-- Generate the infinite list of iterations and their results.
initIdents = []
iterations = iterate iteration (initIdents, initRelations)
-- Define a completion condition.
-- We're done when there are 26 known identity elements.
done (idents, _) = length idents == 26
-- Discard all iteration results until completion.
-- Take the next one - the first one where the condition is met.
result = head $ dropWhile (not . done) iterations
Woohoo! We're done! Let's take a look at the results!
import Data.List (sort)
idents = fst result
identChars = map char idents
putStrLn $ sort identChars
length identChars
Looks like we do indeed have every single letter mapped to the identity.
Let's see if we can deduce, for each letter, how it was mapped to the identity. Instead of doing it in alphabetical order, we'll look at them in the order they were deduced, so it follows some logical flow.
import Text.Printf (printf)
forM_ idents $ \(FoundIdent char hist) -> do
printf "Proving %c = 1:\n" char
forM_ (reverse hist) $ \op ->
putStrLn $ case op of
Reduce first second ->
printf "Reduce %s and %s" first second
Substitute ch ->
printf "Substitute %c for ''" ch
putStr "\n"
If you scan through the list above, there's a few weird cases, but for the most part, it seems legitimate. (I mildly question felt
and veldt
, but it depends on how you pronounce things. If you look at the British English list of homophones, it's totally different anyways!)
So that's that! We've found the ways to reduce every letter to the identity, and shown how to do it.
I wonder if other languages also have trivial homophony groups. It might be fun to try Spanish, French, Russian, and others, and see if the homophony groups tell us anything interesting about the language!