I just looked through the slides of Don's latest talk about concurrent and parallel programming in Haskell.
Shamelessly lifted from the Abstract:
Multicore computers are here: is your programming language ready for it? Haskell is: you can take an off-the-shelf copy of GHC and write high performance parallel programs right now.
If you want to program a parallel machine, a purely functional language such as Haskell is a good choice: purity ensures the language is by-default safe for parallel execution, (whilst traditional imperative languages are by-default unsafe). This foundation has enabled Haskell to become something of a melting pot for high level approaches to concurrent and parallel programming, all available with an industrial strength compiler and language toolchain, available now for mainstream multicore programming.
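To see what "safe by default" buys you, here is a tiny sketch (my own, not from the talk) using the parallel package: because the mapped function is pure, evaluating the list elements in parallel can only change performance, never the result.

import Control.Parallel.Strategies (parMap, rdeepseq)

-- a deliberately expensive pure function
fib :: Int -> Integer
fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

-- parMap evaluates the list elements in parallel; purity guarantees
-- there are no side effects for the threads to race on
main :: IO ()
main = print (sum (parMap rdeepseq fib [25 .. 32]))

Compile with ghc -threaded and run with +RTS -N to actually use multiple cores.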
I wish someone would record these talks and put them on YouTube! The talk outlines the various concurrency and parallelism strategies available in Haskell.
More details about concurrency can be found on the Haskellwiki page.
There is also a whole chapter in the fantastic Real World Haskell devoted to Concurrent and Multicore Programming, featuring explicit threads using forkIO and synchronized shared variables in the form of MVars. Moreover, there are examples of implicit parallelism with sparks, created via the par and pseq combinators.
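As a taste of both styles (my sketches, not code from the book): the first forks a worker thread and hands its result back through an MVar; the second uses par and pseq from the parallel package to spark a computation.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Parallel (par, pseq)

-- explicit concurrency: fork a thread and synchronize via an MVar
explicitSum :: IO Int
explicitSum = do
  box <- newEmptyMVar
  _ <- forkIO (putMVar box (sum [1 .. 1000000]))
  takeMVar box                        -- blocks until the worker is done

-- sparks: 'par' hints that a may be evaluated in parallel,
-- 'pseq' forces b before the final addition
sparkSum :: Int
sparkSum = a `par` (b `pseq` a + b)
  where
    a = sum [1 .. 1000000]
    b = sum [1000001 .. 2000000]

main :: IO ()
main = do
  explicitSum >>= print
  print sparkSum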
Oh btw, there's also a complete chapter about Software Transactional Memory, which is really intriguing. The approaches above rely on explicit locking of shared variables: MVars behave similarly to mutexes and semaphores in procedural programming and suffer from the exact same problems: deadlocks, race conditions, uncaught exceptions, and so on. STM provides a way to perform database-like (think ACID, minus the durability) transactions on shared variables, including retries and rollbacks, so a whole class of worries about shared data simply disappears. STM is not unique to Haskell, but for some reason it hasn't caught on in the mainstream yet.
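This is easier shown than told. A minimal sketch using the stm package (my example, not code from the book): moving money between two accounts either commits as a whole or retries automatically, with no locks in sight.

import Control.Concurrent.STM

-- one atomic transaction: either both balances change or neither does;
-- 'check' makes the transaction retry until there are sufficient funds
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 40)
  readTVarIO a >>= print  -- 60
  readTVarIO b >>= print  -- 40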
These two chapters cover pretty much everything mentioned in the slides and should give you a clear idea of concurrency in Haskell.
Tuesday, November 3, 2009
Thursday, October 29, 2009
Vigenère Cipher in Haskell for fun
> import Data.Char (chr, isLower, ord)
This is a little toy implementation of the Vigenère cipher, written primarily for autodidactic purposes. The code is based on the Caesar cipher code from Graham Hutton's book Programming in Haskell. This post is written in literate Haskell, so you should be able to copy-paste the whole thing into a *.lhs file and run it.
First the encryption and decryption functions:
> {- Vigenere Encode & Decode -}
> -- converts letter (ASCII encoded) to numeric value
> let2int :: Char -> Int
> let2int c = ord c - ord 'a'
>
> -- converts a numeric value to ASCII
> int2let :: Int -> Char
> int2let n = chr (ord 'a' + n)
>
> -- shift letter by n places modulo 26
> shift :: Int -> Char -> Char
> shift n c | isLower c = int2let ((let2int c + n) `mod` 26)
> | otherwise = c
>
The Caesar and Vigenère ciphers are closely related. In a Caesar cipher, each letter of the alphabet is shifted along some number of places. For example, in a Caesar cipher of shift 3, A would become D, B would become E and so on. This is what the shift function is used for. The Vigenère cipher consists of several Caesar ciphers in sequence with different shift values. I used a list comprehension to cycle through the provided key, shifting each letter as required.
> -- vigenere encodes the message
> encode :: String -> String -> [Char]
> encode key msg = [ shift (let2int k) c | (k,c) <- zip (cycle key) msg]
The decode function is just the inverse of the encode function. On a side note, this would be a nice property to test with QuickCheck; a sketch follows right after decode below.
> -- vigenere decodes the message
> decode :: String -> String -> [Char]
> decode key msg = [ shift (26 - let2int k) c | (k,c) <- zip (cycle key) msg]
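A quick check in GHCi: encode "key" "hello" yields "rijvs", and decode "key" "rijvs" gives back "hello". And since the post is literate Haskell anyway, here is a sketch of the round-trip property mentioned above (my addition, not from Hutton's book). It returns a plain Bool, so the file still compiles without QuickCheck; with the package installed, run quickCheck prop_encodeDecode in GHCi after an import Test.QuickCheck.

> -- encode and decode should be inverses on lowercase text,
> -- given a non-empty lowercase key
> prop_encodeDecode :: String -> String -> Bool
> prop_encodeDecode key msg =
>     null key' || decode key' (encode key' msg') == msg'
>   where
>     keep c = c >= 'a' && c <= 'z'
>     key'   = filter keep key
>     msg'   = filter keep msg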
This takes care of the cryptographic functions in the characteristically concise Haskell way. But the interesting part is the cryptanalysis: cracking the cipher. To crack Vigenère, you first have to figure out the length of the key used to encrypt the plaintext. There are two common methods for this: a statistical one based on the Index of Coincidence (IC), and a more mechanical approach called Kasiski's method. I've only implemented the first one here, mainly because I'm lazy. Kasiski's method is actually easier to understand and probably more accurate for longer keys than IC is...
Once you know the cipher key length, you can separate the underlying Caesar ciphers and perform individual frequency analysis, one for each position in the key. The last step is then to match the observed frequencies against a table of expected frequencies, which allows you to recover the key.
> {- Frequency analysis -}
> type FrequencyTable = [Float]
>
> -- frequency table for the English language (expected values, in percent)
> table :: FrequencyTable
> table = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0,
> 6.1, 7.0, 0.2, 0.8, 4.0, 2.4, 6.7,
> 7.5, 1.9, 0.1, 6.0, 6.3, 9.1, 2.8,
> 1.0, 2.4, 0.2, 2.0, 0.1]
>
> lowers :: String -> Int
> lowers xs = length [x | x <- xs, isLower x]
>
> count :: Char -> String -> Int
> count x xs = length [x' | x' <- xs, x == x']
>
> percent :: Int -> Int -> Float
> percent n m = (fromIntegral n / fromIntegral m) * 100
The freqs function calculates the observed frequency (in percent) of each letter in the ciphertext. Frequency analysis is a very common attack on many pen-and-paper ciphers.
> freqs :: String -> [Float]
> freqs xs = [percent (count x xs) n | x <- ['a'..'z']]
> where n = lowers xs
>
>
> indexOfCoincidence :: String -> Float
> indexOfCoincidence xs = fromIntegral (sum [f * (f-1) | f <- counts]) / fromIntegral (n * (n-1))
> where
> counts = [(count x xs) | x <- ['a'..'z']]
> n = sum counts
The function passwordLength deduces the likely password length for English texts based on the Index of Coincidence; this is sometimes also called the Friedman test. The IC is essentially the probability that two letters picked at random from the text are identical: English prose scores about 0.067, uniformly random letters about 0.038, and the longer the key, the closer the ciphertext drifts toward the random end. The thresholds below are roughly the expected IC values for key lengths one and up. As I mentioned earlier, IC suffers from some weaknesses, first and foremost that it gets inaccurate the longer the cipher key is.
> passwordLength :: String -> Int
> passwordLength xs | ic > 0.06552 = 1   -- monoalphabetic, i.e. a plain Caesar cipher
>                   | ic > 0.05185 = 2
>                   | ic > 0.04730 = 3
>                   | ic > 0.04502 = 4
>                   | ic > 0.04365 = 5
>                   | ic > 0.04274 = 6
>                   | ic > 0.04209 = 7
>                   | ic > 0.04160 = 8
>                   | ic > 0.04122 = 9
>                   | otherwise    = 10  -- ten or more; the guess is capped at ten
> where
> ic = indexOfCoincidence xs
Once you know the length of the key, you can proceed to calculate the letter frequencies for each cipher key character. To find the best match between the expected frequencies in the FrequencyTable and the observed frequencies in the cipher text, a statistical method called the chi-square test is used.
> -- Chi square for the observed frequencies
> chisqr :: [Float] -> [Float] -> Float
> chisqr os es = sum [((o - e) ^ 2) / e | (o,e) <- zip os es]
>
> rotate :: Int -> [a] -> [a]
> rotate n xs = drop n xs ++ take n xs
>
> positions :: Eq a => a -> [a] -> [Int]
> positions x xs = [i | (x',i) <- zip xs [0..], x == x']
>
The bestmatch function cycles through all possible shifts (zero to 25) and returns the most likely number of positions the alphabet was shifted, i.e. the one with the lowest chi-square value.
> bestmatch :: String -> Int
> bestmatch xs = head (positions (minimum chitab) chitab)
> where
> chitab = [ chisqr (rotate n table') table | n <- [0..25]]
> table' = freqs xs
>
>
> separateAlphabets :: String -> Int -> [String]
> separateAlphabets xs n = [[ c | (c,p) <- zip xs (cycle [0..n-1]), p `mod` n == i] | i <- [0..n-1]]
>
> findKey :: String -> String
> findKey msg = [ int2let (bestmatch s) | s <- alphas ]
> where
> alphas = separateAlphabets msg cnt
> cnt = passwordLength msg
The crack function combines all of the above into another Haskell one-liner:
> crack :: String -> String
> crack cipherText = decode (findKey cipherText) cipherText
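To try the whole thing (my sketch; the key "cipher" is an arbitrary placeholder, and plain should be a few hundred characters of lowercase English, since short texts give the frequency statistics too little to work with):

> -- round trip: encrypt with a throwaway key, then crack it
> demo :: String -> Bool
> demo plain = crack (encode "cipher" plain) == plain

With enough ordinary English text, demo should come back True; on short or atypical inputs both the IC estimate and the chi-square matching get noisy, so don't expect miracles from a one-liner plaintext.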
Monday, October 19, 2009
A Neighborhood of Infinity: You Could Have Invented Monads!
Another gem from 'A Neighborhood of Infinity', one of my favorite blogs. It is definitely the best, most straightforward explanation of the concept of monads in Haskell I have come across so far, illustrating that monads are not the incomprehensible, magical, mystical things they are so often depicted to be. On the contrary, monads are a naturally arising pattern when you're programming in a purely functional setting. Using simple examples, Dan goes on to show that seemingly unrelated problems can all be solved using monads. A truly brilliant article that every Haskell beginner trying to wrap their head around monads should read.
Tuesday, March 17, 2009
Don't fear the Monads
Dr. Brian Beckman, a Channel 9 celebrity, astrophysicist, and senior software engineer, thought it would be a very good idea to address the complexity of monads in an easy-to-understand way: a technical conversation at the whiteboard with yours truly for Channel 9.
Intro to FP and Monads in general and Haskell in particular. You'll have to remember high school algebra, tho :) He tries to explain this without going too much into the mathematical details:
Just think: a monoid satisfies all the axioms of a group, with the exception of having inverses.
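For instance (my example, not from the talk), Haskell strings form a monoid under concatenation: the operation is associative and the empty string is the identity, but there are no inverses.

-- the monoid laws for strings, checked concretely
monoidLaws :: Bool
monoidLaws = and
  [ ("ab" ++ "cd") ++ "ef" == "ab" ++ ("cd" ++ "ef")  -- associativity
  , "" ++ "xyz" == "xyz"                              -- left identity
  , "xyz" ++ "" == "xyz"                              -- right identity
  ]
-- no inverses: there is no string s with "ab" ++ s == ""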
Thursday, February 19, 2009
John Goerzen on Why You Should Learn Haskell
Where does laziness help real-world programs? The traditional explanations I've seen in beginner tutorials calculate Fibonacci numbers, which is not that interesting.
Goerzen: One of the first things that I thought was interesting was how laziness relates to IO in Haskell. In Haskell there's a function called getContents, and getContents returns a string. That string represents the entire contents of standard input or of a file or whatever, but it doesn't just read the whole thing into memory all at once. It also doesn't mmap. In other languages, if you want a string that represents an entire file, you're either going to mmap the thing or you're going to have to waste a lot of memory, or you read it in blocks or lines or whatever else you might do. If I write a little filter or parser program in Haskell, I don't have to really worry about the details of reading the input line-by-line or block-by-block because I can just use getContents and then I can string together a whole bunch of functions. I like writing Unix filters in Haskell.
That doesn't sound so extraordinary; after all, imperative languages also offer some pretty nice abstractions. The coolest aspect is not how Haskell abstracts the IO, but that lazy IO allows you to completely restructure your programs. You can (and should) separate the code that deals with the IO from the code that works on the contents of a file, or any kind of resource, actually. In an imperative setting you tend to have a control structure that loops through the contents of a file chunk by chunk, and in the middle of it a code block that processes each chunk. These two concerns are essentially unrelated, yet separating them had never even occurred to me until I saw the Haskell way of performing input and output. It's much cleaner and also much more composable: you can simply plug in another processing function should it become necessary at some point in the future, as long as the types match. And if they don't, the compiler will warn you, preventing nasty runtime exceptions.
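A minimal sketch of such a filter (my example, not Goerzen's): it upper-cases whatever arrives on standard input, without ever holding the whole stream in memory.

import Data.Char (toUpper)

-- getContents reads stdin lazily: each chunk is consumed, transformed
-- and written out on demand, so the program runs in constant space
main :: IO ()
main = do
  contents <- getContents
  putStr (map toUpper contents)

The Prelude's interact function packages exactly this pattern, so the body could equally be written as interact (map toUpper).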