Friday, December 25, 2009

Determining the status of Portupdate

I've been fiddling around with the new FreeBSD 8 recently. The surface of the OS hasn't changed that much over time. However, 8.0 introduced a couple of major changes under the hood. Anyway, part of the installation process involved a lot of compiling (and recompiling due to the neglect to consult UPDATING, silly me).
I manage my ports mainly through portupgrade. One of the things that are a bit annoying about it is that you don't know how many ports are affected by portupgrade -a.
To solve this, I've added another little gem to my toolbox. The tool checks the progress of either portinstall or portupdate.

The Idea

After a little goggling, I found that the tried and trusted UNIX ps displays all the information I need. I'm sure you used it a million times blissfully ignorant about its full capabilities, I know I have. Rather than the typical ps aux, I will use ps -ao 'lstart etime command'. This will spit out the date and time a process started, how long it's been running (mnemonic etime = elapsed time) and command which contains the current port and how many ports will be installed or updated respectively. So basically the Idea is call ps, parse the output, find the portinstall/portupdate process and format the output a bit more nicely, that's it.

The Tool

So I'm gonna write such a thing in Haskell. If you're absolutely against using Haskell, I also hacked up an equivalent Perl version available here.

> import System (getArgs)
> import System.Console.GetOpt
> import System.Process (createProcess, proc, std_out, StdStream(..))
> import System.Time (CalendarTime(..), Day(..), Month(..))
> import GHC.IO (hGetContents)
> import Text.Printf (printf)
>
> import Control.Applicative
> import Control.Monad (MonadPlus(..), ap)
> -- Hide a few names that are provided by Applicative.
> import Text.ParserCombinators.Parsec hiding (many, optional, (<|>))
> -- The Applicative instance for every Monad looks like this.
> instance Applicative (GenParser s a) where
>     pure  = return
>     (<*>) = ap
>
> -- The Alternative instance for every MonadPlus looks like this.
> instance Alternative (GenParser s a) where
>     empty = mzero
>     (<|>) = mplus

Calling System Processes

First of all, we'll have to call ps. I will use the System.Process module. Creating a subprocess to make a call to ps is simple.

For example, to execute a simple ls command:

   r <- createProcess (proc "ls" [])

To create a pipe from which to read the output of ls:

   (_, Just hout, _, _) <-
       createProcess (proc "ls" []){ std_out = CreatePipe }

Note: the program will blow up in your face if createProcess fails to create a handle to stdout for some reason. In that unfortunate case the pattern won't match ('fail' will be called and you certainly don't wanna go there).

Note further: std_out = CreatePipe redirects stdout to the handle called 'hout' if you don't do that, every output will be written to the console.

CallPsWith is a simple wrapper around the call that allows you to call ps with the specified formatting.

> callPsWith :: String -> IO String
> callPsWith formattingOptions = do
>    (_, Just hOut, _, _) <- createProcess (proc "ps" ["-ao " ++ formattingOptions]) { std_out = CreatePipe }
>    hGetContents hOut

Technically only one call to ps would be necessary (formatted as 'lstart etime command'). It would contain all the information I'm interested in. However, I will make two calls to ps because ps won't break the line if the formatted output is too long. Usually the command string is truncated because the terminal is not wide enough.
So to avoid this, I will make one call with the arguments '-ao pid command' to find the running portinstall process and its PID. The second call will be with the arguments '-ao pid lstart etime'. And I'll try to find the lstart and etime with the matching PID.

Parsing

Here are some instance declarations to make Parsec an instance of Applicative. I'm gonna use quite a bit of applicative Parsec. I used to have real problems wrapping my head around applicative functors. If you have the same trouble, I highly suggest reading the chapter about Applicative Functors on LearnYouAHaskell.com.

Eventually the goal is to parse the output of ps. This being Haskell (rather than Perl), I want to parse the output to handy datatypes that I can use in other parts of the program. So I defined suitable data types to hold the output of ps:

> data ElapsedTime = ElapsedTime {
>     hours :: Int,
>     minutes :: Int,
>     seconds :: Int
> } deriving (Show, Eq, Ord)
>
> data Command = Command {
>     portNum :: Int,
>     portCount :: Int,
>     portName :: String
> } deriving (Show, Eq, Ord)

You may wonder where the datatype to store the starting time went. I'm gonna use the CalendarTime from the System.Time module. This is a bit overkill for such a small program, but it'll demonstrate how to parse a little bit more complex type.

But first, here are two helper parsers that I will use heavily as part of other parsers. The first is skipSpaces and it does just that. The other one, p_Int, reads a sequence of digits and converts it to an Int. Note that Int will overflow if the sequence is too long. Eg. read "4000000000"::Int returns -294967296 on my machine. However, I know that I will only really use it to parse PIDs or dates and times. So there is no danger of an overflow.

> skipSpaces :: CharParser () String
> skipSpaces = many (oneOf " ")

Also note that the typesystem automatically coerces the call to read to the type declared in the function signature. You just love the type system!

> p_Int :: CharParser () Int
> p_Int   = read <$> many1 (oneOf ['0'..'9'])

>
> p_month :: CharParser () Month
> p_month =  January      <$ try (string "Jan")
>              <|> February   <$ string "Feb"
>              <|> March        <$ try (string "Mar")
>              <|> April          <$ try (string "Apr")
>              <|> May            <$ string "May"
>              <|> June           <$ try (string "Jun")
>              <|> July             <$ string "Jul"
>              <|> August       <$ string "Aug"
>              <|> September <$ string "Sep"
>              <|> October      <$ string "Oct"
>              <|> November  <$ string "Nov"
>              <|> December  <$ string "Dec"
>              <?> "Failed to parse month"
>
> p_day :: CharParser () Day
> p_day   =  Monday      <$ string "Mon"
>           <|> Tuesday      <$ try (string "Tue")
>           <|> Wednesday <$ string "Wed"
>           <|> Thursday    <$ string "Thu"
>           <|> Friday         <$ string "Fri"
>           <|> Saturday     <$ try (string "Sat")
>           <|> Sunday       <$ string "Sun"
>           <?> "Failed to parse day"

These are all building blocks required to parse the CalendarTime from the System.Time module. The thing to keep in mind is that each line with a '<$' has to be read from right to left. E.g. in the first line of p_month, January <$ try (string "Jan"). The data constructor January is returned if the string that is parsed contains "Jan". Also note the 'try' function. This is Parsec's look-ahead function. It modifies the behavior of the string matching. Parsec tries to match the string, but if it fails, it restores the original string that is being parsed. This is important if you have several patterns that start identically.
Let's say I omitted 'try', and I'd attempt parse the string "Jun". Parsec would see the letter "J" and it would attempt to match against the first matching string. That's 'string "Jan" in this case. So it would consume the letter "J" and continue. It then expects to see the next letter "a". But the string contains "un" (remember "J" has already been consumed). At that point it would fail with an exception saying something along the lines of "expected 'a' but found 'u'". So even though "Jun" is a valid option, Parsec couldn't properly parse it without 'try'. On the other hand, Parsec restores the string to "Jun" if 'string "Jan"' fails and the continues with the next viable option.
I suggest reading the chapter about Parsec on RealWorldHaskell, it explains these and more Parsec functions in detail.

By the way, you can run any of those parsers directly in GHCi if you like to experiment.
parse p_month "(error)" "Jan and some random string you want to parse"

We can use the parsers we've seen so far as building blocks to construct a bigger, more powerful parser that can parse the ps' date string. That date string looks like this "Mon Dec 20 17:40:06 2009". This time, you have to read each line from left to right. For example (p_day <* skipSpaces) parses the day and ignores any whitespaces. The parsed day will then be the first parameter of 'toCTime'. I defined toCTime as a wrapper around the CalendarTome constructor for two reasons. Firstly not all parameters are required. Second and more importantly, by reordering the in the parameters of the toElapsedTime wrapper, I can  make use the applicative functions (<$>) and (<*>) instead of the do-notation.

> p_time :: CharParser () CalendarTime
> p_time = toCTime <$> (p_day <* skipSpaces)
>                                <*> (p_month <* skipSpaces)
>                                <*> (p_Int <* skipSpaces)
>                                <*> (p_Int <* char ':')
>                                <*> (p_Int <* char ':')
>                                <*> (p_Int <* skipSpaces)
>                                <*> (p_Int <* skipSpaces)
>    where toCTime d m mday th tm ts y = CalendarTime y m mday th tm ts 0 d 0 "UTC" 0 False

The elapsed time looks like this: "01:13:13". etime is a bit tricky because it has a variable length. For example if the process has been running for 14 seconds, etime would be displayed as just "14". If the process has been running for five minutes, it would be displayed as "05:00".

> p_elapsed_time :: CharParser () ElapsedTime
> p_elapsed_time = toElapsedTime <$> sepBy p_Int (char ':')
>    where
>     toElapsedTime (h:m:s:_) = ElapsedTime h m s
>     toElapsedTime [m,s]     = ElapsedTime 0 m s
>     toElapsedTime [s]       = ElapsedTime 0 0 s
>     toElapsedTime []        = ElapsedTime 0 0 0

A command will look like this "ruby18: portinstall: [2/3] multimedia/win32-codecs (ruby18)". You can see that there are three ports in total that will be installed. The win32-codecs video codecs are being installed at the moment. The Ruby stuff is due to the fact that portinstall is written in ruby and is always shown for every port. We can ignore that.
The command parser looks like this:

> p_command :: String -> CharParser () Command
> p_command progName = Command <$> (string "ruby" *> p_Int *> string ": " *> string progName *> string ": [" *> p_Int)
>                              <*> (char '/' *> p_Int <* string "] ")
>                              <*> (many (noneOf " "))
>
> p_pid :: String -> CharParser () (Int, Command)
> p_pid progName = (,) <$> (skipSpaces *> p_Int) <*> (skipSpaces *> p_command progName)
>
> p_times :: Int -> CharParser () (CalendarTime, ElapsedTime)
> p_times pid = (,) <$> (skipSpaces *> (string (show pid)) *> skipSpaces *> p_time) <*> (skipSpaces *> p_elapsed_time)
>
>
> tryParse :: CharParser () a -> [String] -> Maybe a
> tryParse _ [] = Nothing
> tryParse p_line (l:ls) =
>    case parse p_line "Error While Parsing" l of
>       Left _      -> tryParse p_line ls
>       Right match -> Just match
>    
> lookupPidFor :: String -> [String] -> Maybe (Int, Command)
> lookupPidFor programName = tryParse (p_pid programName)
>
> lookupStatusOf :: [String] -> (Int, Command) -> Maybe (CalendarTime, ElapsedTime, Command)
> lookupStatusOf linesToParse (pid, command) = tryParse (p_times pid) linesToParse >>= \(ct, et) ->
>    Just (ct, et, command)
>

Formatting

I use printf from the Text.Printf package to format the final output. It behaves exactly like the C counterpart.

> formatStatus :: String -> CalendarTime -> ElapsedTime -> Command -> String
> formatStatus progName start elapsed command =
>    printf "%s status:\n\tstarted on %s %s %d at %02d:%02d:%02d (elapsed: %02d:%02d:%02d)\n\tworking on %s (%d of %d)"
>            progName wday month day sHour sMin sSec eHour eMin eSec port pNum pCount
>       where wday   = (show . ctWDay) start
>             month  = (show . ctMonth) start
>             day    = ctDay start
>             sHour  = ctHour start
>             sMin   = ctMin start
>             sSec   = ctSec start
>             eHour  = hours elapsed
>             eMin   = minutes elapsed
>             eSec   = seconds elapsed
>             port   = portName command
>             pNum   = portNum command
>             pCount = portCount command

Input Handling

Handling user input trough the GetOpt package is easy. I barely scratched the surface. Two features that would be handy even in this simple app, would be the declaration of required and/or mutually exclusive options. I want that the user picks either '-i' or '-u'. But I couldn't enforce it through GepOpt. It fails if you don't pick any option and if you pick more than one, it will silently go with the first one.

> data Opts = Opts { program :: String } deriving Show
>
> options :: [OptDescr Opts]
> options =  [Option ['u'] ["portupdate"] (NoArg (Opts {program = "portupgrade"})) "show status of portupdate",
>             Option ['i'] ["portinstall"] (NoArg (Opts {program = "portinstall"})) "show status of portinstall"]
>
> parseOpts :: [String] -> IO ([Opts])
> parseOpts args =
>     case getOpt RequireOrder options args of
>         ([], _, []) -> fail ("No option selected " ++ usageInfo header options)
>         (opts, _, []) -> return opts
>         (_, _, errs) -> fail (concat errs ++ usageInfo header options)
>     where
>         header = "Usage: portstatus [OPTION...]"

Putting it all together

> main :: IO ()
> main = do
>    optList <- parseOpts =<< getArgs
>    let programName = program $ head optList
>
>    psPids <- callPsWith "pid command"
>    psTimes <- callPsWith "pid lstart etime"
>   
>    case (lookupPidFor programName (lines psPids)) >>= (lookupStatusOf (lines psTimes)) of
>       Nothing -> putStrLn $ "Couldn't find a running instance of " ++ programName
>       Just (lstart, etime, command) -> putStrLn $ formatStatus programName lstart etime command