More Scheming with Haskell

Posted by sjs
on Thursday, June 14

It’s been a little while since I wrote about Haskell and the Scheme interpreter I’ve been using to learn and play with both Haskell and Scheme. I finished the tutorial and got myself a working Scheme interpreter and indeed it has been fun to use it for trying out little things now and then. (Normally I would use Emacs or Dr. Scheme for that sort of thing.) There certainly are interesting things to try floating around da intranet. And also things to read and learn from, such as misp (via Moonbase).

I’m going to describe two new features of my Scheme in this post. The second one is more interesting and was more fun to implement (cond).

Parsing Scheme integers

Last time I left off at parsing R5RS compliant numbers, which is exercise 3.3.4 if you’re following along the tutorial. Only integers in binary, octal, decimal, and hexadecimal are parsed right now. The syntaxes for those are #b101010, #o52, 42 (or #d42), and #x2a, respectively. To parse these we use the readOct, readDec, readHex, and readInt functions provided by the Numeric module, and import them thusly:

import Numeric (readOct, readDec, readHex, readInt)

In order to parse binary digits we need to write a few short functions to help us out. For some reason I couldn’t find binDigit, isBinDigit and readBin in their respective modules but luckily they’re trivial to implement. The first two are self-explanatory, as is the third if you look at the implementation of its relatives for larger bases. In a nutshell readBin says to: “read an integer in base 2, validating digits with isBinDigit.”

-- parse a binary digit, analagous to decDigit, octDigit, hexDigit
binDigit :: Parser Char
binDigit = oneOf "01" 

-- analogous to isDigit, isOctdigit, isHexDigit
isBinDigit :: Char -> Bool
isBinDigit c = (c == '0' || c == '1')

-- analogous to readDec, readOct, readHex
readBin :: (Integral a) => ReadS a
readBin = readInt 2 isBinDigit digitToInt

The next step is to augment parseNumber so that it can handle R5RS numbers in addition to regular decimal numbers. To refresh, the tutorial’s parseNumber function looks like this:

parseNumber :: Parser LispVal
parseNumber = liftM (Number . read) $ many1 digit

Three more lines in this function will give us a decent starting point:

parseNumber = do char '#'
                 base <- oneOf "bdox" 
                 parseDigits base
           <|> (many1 digit >>= return . Number . read)

Translation: First look for an R5RS style base, and if found call parseDigits with the given base to do the dirty work. If that fails then fall back to parsing a boring old string of decimal digits.

That brings us to actually parsing the numbers. parseDigits is simple, but there might be a more Haskell-y way of doing this.

-- Parse a string of digits in the given base.
parseDigits :: Char -> Parser LispVal
parseDigits base = do digits <- many1 d
                      return . Number . fst . head . f $ digits
                      where f = case base of
                                  'b' -> readBin
                                  'd' -> readDec
                                  'o' -> readOct
                                  'x' -> readHex
                            d = case base of
                                  'b' -> binDigit
                                  'd' -> digit
                                  'o' -> octDigit
                                  'x' -> hexDigit

The trickiest part of all this was figuring out how to use the various readFoo functions properly. They return a list of pairs so head grabs the first pair and fst grabs the first element of the pair. Once I had that straight it was smooth sailing. Having done this, parsing R5RS characters (#\a, #\Z, #\?, ...) is a breeze so I won’t bore you with that.

The cond function

It still takes me some time to knit together meaningful Haskell statements. Tonight I spent said time cobbling together an implementation of cond as a new special form. Have a look at the code. The explanation follows.

1
2
3
4
5
6
7
8
9
eval env (List (Atom "cond" : List (Atom "else" : exprs) : [])) =
    liftM last $ mapM (eval env) exprs
eval env (List (Atom "cond" : List (pred : conseq) : rest)) = 
    do result <- eval env $ pred
       case result of
         Bool False -> case rest of
                         [] -> return $ List []
                         _ -> eval env $ List (Atom "cond" : rest)
         _ -> liftM last $ mapM (eval env) conseq
  • Lines 1-2: Handle else clauses by evaluating the given expression(s), returning the last result. It must come first or it’s overlapped by the next pattern.
  • Line 3: Evaluate a cond by splitting the first condition into predicate and consequence, tuck the remaining conditions into rest for later.
  • Line 4: Evaluate pred
  • Line 5: and if the result is:
  • Line 6: #f then look at the rest of the conditions.
  • Line 7: If there are no more conditions return the empty list.
  • Line 8: Otherwise call ourselves recursively with the remaining conditions.
  • Line 9: Anything other than #f is considered true and causes conseq to be evaluated and returned. Like else, conseq can be a sequence of expressions.

So far my Scheme weighs in at 621 lines, 200 more than the tutorial’s final code listing. Hopefully I’ll keep adding things on my TODO list and it will grow a little bit more. Now that I have cond it will be more fun to expand my stdlib.scm as well.

inspirado 0

Posted by sjs
on Tuesday, May 22

spyderous is a Gentoo dev and I read his posts via the Gentoo planet (and again on the freedesktop.org planet).

He recently mentioned an idea to foster participation in Gentoo (or any other project) by aggregating personal project plans for people to browse. I thought it sounded cool so I started coding and came up with what I call inspirado.

It’s fairly basic so far but it’s only a week old. It is RESTful and you can get XML for most pages by appending .xml to the URL (or by using curl and setting the HTTP Accept header). Eventually Atom and/or RSS should be available as well.

Note that everything you see there is purely for testing purposes and any changes you make are most likely going to be blown away.

There are several features on my TODO list but I’d love to hear about any you might have. Write to sami.samhuri@gmail.com if you have suggestions or anything else to say.

(Inspirado is, of course, a Rails app.)

Enumerable#pluck and String#to_proc for Ruby 2

Posted by sjs
on Thursday, May 10

I wanted a method analogous to Prototype’s pluck and invoke in Rails for building lists for options_for_select. Yes, I know about options_from_collection_for_select.

I wanted something more general that I can use anywhere – not just in Rails – so I wrote one. In a second I’ll introduce Enumerable#pluck, but first we need some other methods to help implement it nicely.

First you need Symbol#to_proc, which shouldn’t need an introduction. If you’re using Rails you have this already.

1
2
3
4
5
6
7
8
9
10
11
class Symbol
  # Turns a symbol into a proc.
  #
  # Example:
  #   # The same as people.map { |p| p.birthdate }
  #   people.map(&:birthdate)
  #
  def to_proc
    Proc.new {|thing, *args| thing.send(self, *args)}
  end
end

Next we define String#to_proc, which is nearly identical to the Array#to_proc method I previously wrote about.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class String
  # Turns a string into a proc.
  #
  # Example:
  #   # The same as people.map { |p| p.birthdate.year }
  #   people.map(&'birthdate.year')
  #
  def to_proc
    Proc.new do |*args|
      split('.').inject(args.shift) do |thing, msg|
        thing = thing.send(msg.to_sym, *args)
      end
    end
  end
end

Finally there’s Enumerable#to_proc which returns a proc that passes its parameter through each of its members and collects their results. It’s easier to explain by example.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
module Enumerable
  # Effectively treats itself as a list of transformations, and returns a proc
  # which maps values to a list of the results of applying each transformation
  # in that list to the value.
  #
  # Example:
  #   # The same as people.map { |p| [p.birthdate, p.email] }
  #   people.map(&[:birthdate, :email])
  #
  def to_proc
    @procs ||= map(&:to_proc)
    Proc.new do |thing, *args|
      @procs.map do |proc|
        proc.call(thing, *args)
      end
    end
  end
end

Here’s the cool part, Enumerable#pluck for Ruby in all its glory.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
module Enumerable
  # Use this to pluck values from objects, especially useful for ActiveRecord models.
  # This is analogous to Prototype's Enumerable.pluck method but more powerful.
  #
  # You can pluck values simply, like so:
  #   >> people.pluck(:last_name)  #=> ['Samhuri', 'Jones', ...]
  #
  # But with Symbol#to_proc defined this is effectively the same as:
  #   >> people.map(&:last_name)   #=> ['Samhuri', 'Jones', ...]
  #
  # Where pluck's power becomes evident is when you want to do something like:
  #   >> people.pluck(:name, :address, :phone)
  #        #=> [['Johnny Canuck', '123 Maple Lane', '416-555-124'], ...]
  #
  # Instead of:
  #   >> people.map { |p| [p.name, p.address, p.phone] }
  #
  #   # map each person to: [person.country.code, person.id]
  #   >> people.pluck('country.code', :id)
  #        #=> [['US', 1], ['CA', 2], ...]
  #
  def pluck(*args)
    # Thanks to Symbol#to_proc, Enumerable#to_proc and String#to_proc this Just Works(tm)
    map(&args)
  end
end

I wrote another version without using the various #to_proc methods so as to work with a standard Ruby while only patching 1 module.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
module Enumerable
  # A version of pluck which doesn't require any to_proc methods.
  def pluck(*args)
    procs = args.map do |msgs|
      # always operate on lists of messages
      if String === msgs
        msgs = msgs.split('.').map {|a| a.to_sym} # allow 'country.code'
      elsif !(Enumerable === msgs)
        msgs = [msgs]
      end
      Proc.new do |orig|
        msgs.inject(orig) { |thing, msg| thing = thing.send(msg) }
      end
    end
    
    if procs.size == 1
      map(&procs.first)
    else
      map do |thing|
        procs.map { |proc| proc.call(thing) }
      end
    end
  end
end

It’s just icing on the cake considering Ruby’s convenient block syntax, but there it is. Do with it what you will. You can change or extend any of these to support drilling down into hashes quite easily too.

Update #1: Fixed a potential performance issue in Enumerable#to_proc by saving the results of to_proc in @procs.

Array#to_proc: A complement to Symbol#to_proc

Posted by sjs
on Thursday, May 03

By now many Ruby programmers are familiar with the magic that is Symbol#to_proc.

If you haven’t seen anything such as youngest = people.map(&:age).min yet here’s a quick explanation of the syntax.

  • Ruby calls to_proc on the block parameter if it is not already a Proc. (coercion)
  • Since a symbol is a message, Symbol#to_proc simply constructs a block that sends self (the symbol) to the first block parameter.
  • The code above is equivalent to youngest = people.map { |p| p.age }.min
  • The few characters saved are significant if you are sufficiently lazy, or if you frequently chain short blocks like that together.

This is a great way to clean up your code, and the brevity will have you groaning when you have to type out something such as: birth_years = people.map { |p| p.birthdate.year }

But there is a simple way to make that shorter using a trick similar to Symbol#to_proc. To me it seems logical that if map(&:sym) sends :sym to each element of the enumerable object, then an array of symbols should chain the calls together on each element of the enumerable.

If you’ve looked at the implementation of Symbol#to_proc in Rails then this should look familiar as it’s nearly the same code.

1
2
3
4
5
6
7
8
9
10
11
module ArrayExtensions
  # Turns an array of symbols into a simple proc, which is especially useful for enumerations. Examples:
  #
  #   # The same as people.map { |p| p.birthdate.year }
  #   people.map(&[:birthdate, :year])
  def to_proc
    Proc.new do |*args|
      self.inject(args.shift) { |obj, msg| obj = obj.__send__(msg, *args) }
    end
  end
end

This makes the following code valid, and equivalent to the previous example using birth_years.


birth_years = people.map(&[:birthdate, :year])

You can use strings, like so: birth_years = people.map(&%w[birthdate year]), but I don’t dig the &% part of it. To each their own.

Then all that needs to be done is load the extension. I’m doing all this in the context of Rails so I’m setting this up in my environment.rb.


Array.send(:include, ArrayExtensions)

A Scheme parser in Haskell: Part 1

Posted by sjs
on Thursday, May 03
From Write Yourself a Scheme in 48 hours:

Basically, a monad is a way of saying “there’s some extra information attached to this value, which most functions don’t need to worry about”. In this example, the “extra information” is the fact that this action performs IO, and the basic value is nothing, represented as “()”. Monadic values are often called “actions”, because the easiest way to think about the IO monad is a sequencing of actions that each might affect the outside world.

I really like this tutorial. I’m only on part 3.3 of 12, parsing, but I’m new to Haskell so I’m learning left, right & centre. The exercises are taking me hours of reading and experimenting, and it’s lots of fun! ghc’s errors are usually quite helpful and of course ghci is a big help as well.

I’m going to explain one of the exercises because converting between the various syntax for dealing with monads wasn’t plainly obvious to me. Perhaps I wasn’t paying enough attention to the docs I read. In any case if you’re interested in Haskell at all, I recommend the tutorial and if you’re stuck on exercise 3.3.1 like I was then come on back here. Whether you’re following the tutorial or not the point of this post should stand on its own with a basic knowledge of Haskell.

Last night I rewrote parseNumber using do and >>= (bind) notations (ex. 3.3.1). Here’s parseNumber using the liftM method given in the tutorial:

parseNumber :: Parser LispVal
parseNumber :: liftM (Number . read) $ many1 digit

Okay that’s pretty simple right? Let’s break it down, first looking at the right-hand side of the $ operator, then the left.

  • many1 digit reads as many decimal digits as it can.
  • Number . read is a function composition just like we’re used to using in math. It applies read to its argument, then applies Number to that result.
  • liftM is concisely and effectively defined elsewhere, and I’ll borrow their description:

liftM f m lets a non-monadic function f operate on the contents of monad m

liftM’s type is also quite telling: liftM :: (Monad m) => (a -> b) -> (m a -> m b)

In a nutshell liftM turns a function from a to b to a function from a monad containing a to a monad containing b.

That results in a function on the left-hand side of $, which operates on and outputs a monad. The content of the input monad is a String. The content of the output monad is a LispVal (defined earlier in the tutorial). Specifically it is a Number.

The $ acts similar to a pipe in $FAVOURITE_SHELL, and is right associative which means the expression on the right is passed to the expression (function) on the left. It’s exactly the same as (liftM (Number . read)) (many1 digit) except it looks cleaner. If you know LISP or Scheme (sadly I do not) then it’s analogous to the apply function.

So how does a Haskell newbie go about re-writing that using other notations which haven’t even been explained in the tutorial? Clearly one must search the web and read as much as they can until they understand enough to figure it out (which is one thing I like about the tutorial). If you’re lazy like me, here are 3 equivalent pieces of code for you to chew on. parseNumber’s type is Parser LispVal (Parser is a monad).

Familiar liftM method:
parseNumber -> liftM (Number . read) $ many1 digit
Using do notation:
parseNumber -> do digits <- many1 digit
                  return $ (Number . read) digits

If you’re thinking “Hey a return, I know that one!” then the devious masterminds behind Haskell are certainly laughing evilly right now. return simply wraps up it’s argument in a monad of some sort. In this case it’s the Parser monad. The return part may seem strange at first. Since many1 digit yields a monad why do we need to wrap anything? The answer is that using <- causes digits to contain a String, stripped out of the monad which resulted from many1 digit. Hence we no longer use liftM to make (Number . read) monads, and instead need to use return to properly wrap it back up in a monad.

In other words liftM eliminates the need to explicitly re-monadize the contents as is necessary using do.

Finally, using >>= (bind) notation:
parseNumber -> many1 digit >>= \digits ->
               return $ (Number . read) digits

At this point I don’t think this warrants much of an explanation. The syntactic sugar provided by do should be pretty obvious. Just in case it’s not, >>= passes the contents of its left argument (a monad) to the function on its right. Once again return is needed to wrap up the result and send it on its way.

When I first read about Haskell I was overwhelmed by not knowing anything, and not being able to apply my previous knowledge of programming to anything in Haskell. One piece of syntax at a time I am slowly able to understand more of the Haskell found in the wild.

I’m currently working on ex. 3.3.4, which is parsing R5RS compliant numbers (e.g. #o12345670, #xff, #d987). I’ll probably write something about that once I figure it out, but in the meantime if you have any hints I’m all ears.

Update #1: I should do more proof-reading if I’m going to try and explain things. I made some changes in wording.

ActiveRecord::Base.find_or_create and find_or_initialize 0

Posted by sjs
on Wednesday, April 11

If you work with Rails then you may be familiar with the dynamic attribute-based finder methods in ActiveRecord::Base. They are class methods used such as User.find_or_initialize_by_name(‘sjs’) or User.find_or_initialize_by_name_and_password(‘sjs’, ‘insanely secure password’). There is also a find_or_create_by_… version.

If you want to do this for many attributes it gets old very, very quickly. This should just accept a hash of attributes like the other ActiveRecord methods we’re used to. 3 methods later and the problem is solved, read on…

Automatic inclusion of useful modules in Ruby

Posted by sjs
on Friday, March 30

Last summer I did some reading about Haskell and read some intros on using it. I never got into it though as I was busy working and learning other things at the time. Today I was reminded about Haskell and decided to really put an effort into learning the basics and coding a few small programs.

Classes in Haskell are very interesting, and fairly different from classes I’m used to in Ruby or Java. One feature about them that struck me as particularly cool is that you can define methods in terms of each other, and if the minimum requirements are met then the others are automatically defined in terms of the provided methods. It’s genius I tell you, but my explanation is lacking so check the linked Wikibook at the beginning of this paragraph to grasp what I’m talking about. When you get back the rest of this should make sense…

Chronic time parsing in Ruby 0

Posted by sjs
on Friday, March 30

The date & time extensions provided by Rails are convenient but they don’t provide anything when it comes to relative dates using weekday names, for example. I’ve been frustrated writing weekday arithmetic so I went in search of something easier. What I found was Chronic!

Diggscuss 0.9

Posted by sjs
on Sunday, March 25

I rewrote the Reply to Reply user script for Digg and am releasing for the first time under the name Diggscuss. I hope Digg doesn’t send me a C&D letter. ;-)

It resides at the same place on userscripts.org.

Once again if you like this script then Digg it!

Digg v4: Reply to replies (Greasemonkey script) 0

Posted by sjs
on Friday, March 09

Update: This now works with regular comments and friend comments as well. It is now even more efficient (I reduced 3-4 main loops to one and reduced several globals to one).

Digg v4 has been out for months now and no one has updated the Digg reply-to-reply userscript for Greasemonkey yet … until now. :-)

Install the script, or right-click and choose “Save file as…” to save the file and view or edit the source.

Userscripts.org page

Class method? Instance method? It doesn't matter to PHP 1

Posted by sjs
on Friday, July 21

Update: This has been discussed for PHP6. A little late, but I guess better than never.

I made a mistake while I was coding, for shame! Anyway this particular mistake was that I invoked a class method on the wrong class. The funny part was that this method was an instance method in the class which I typed by mistake. In the error log I saw something like “Invalid use of $this in class function.”

I knew for a fact I hadn’t used $this in a class method, so it was kind of a confusing error. I went to the file in question and found out that it was calling an instance method as a class method. Now that is some crazy shit.

I would fully expect the PHP parser to give me an error like “No class method [foo] in class [blah]”, rather than try and execute it as a class method. The syntax is completely different; you use :: to call a class method and -> to call an instance method. And you use the name of a class when you call a class method.

This code:


class Foo {
  public static function static_fun()
  {
    return "This is a class method!\n";
  }

  public function not_static()
  {
    return "This is an instance method!\n";
  }
}

echo '<pre>';
echo "From Foo:\n";
echo Foo::static_fun();
echo Foo::not_static();
echo "\n";

echo "From \$foo = new Foo():\n";
$foo = new Foo();
echo $foo->static_fun();
echo $foo->not_static();
echo '</pre>';

Produces:

From Foo:
This is a class method!
This is an instance method!

From $foo = new Foo():
This is a class method!
This is an instance method!

What the fuck?! http://www.php.net/manual/en/language.oop5.static.php is lying to everyone.

Late static binding 0

Posted by sjs
on Wednesday, July 19

Update: This has been discussed and will be uh, sort of fixed, in PHP6. You’ll be able to use static::my_method() to get the real reference to self in class methods. Not optimal, but still a solution I guess.

As colder on ##php (freenode) told me today, class methods in PHP don’t have what they call late static binding. What’s that? It means that this code:


class Foo
{
  public static function my_method()
  {
    echo "I'm a " . get_class() . "!\n";
  }
}

class Bar extends Foo
{}

Bar::my_method();

outputs “I’m a Foo!”, instead of “I’m a Bar!”. That’s not fun.

Using __CLASS__ in place of get_class() makes zero difference. You end up with proxy methods in each subclass of Foo that pass in the real name of the calling class, which sucks.


class Bar extends Foo
{
  public static function my_method()
  {
    return parent::my_method( get_class() );
  }
}

I was told that they had a discussion about this on the internal PHP list, so at least they’re thinking about this stuff. Too bad PHP5 doesn’t have it. I guess I should just be glad I won’t be maintaining this code.

The resident PHP coder said “just make your code simpler”, which is what I was trying to do by removing duplication. Too bad that plan sort of backfired. I guess odd things like this are where PHP starts to show that OO was tacked on as an after-thought.

Ruby and Rails have spoiled me rotten 0

Posted by sjs
on Sunday, July 16

It’s true. I’m sitting here coding in PHP using the Zend Framework and all I can think about is how much nicer Rails is, or how much easier it is to do [x] in Ruby. It’s not that the Zend Framework is bad or anything, it’s quite nice, but you just can’t match Ruby’s expressiveness in a language like PHP. Add the amazing convenience Rails builds on top of Ruby and that’s a really hard combo to compete with.

Working with the Zend Framework 0

Posted by sjs
on Thursday, July 06

At Seekport I’m currently working on an app to handle the config of their business-to-business search engine. It’s web-based and I’m using PHP, since that’s what they’re re-doing the front-end in. Right now it’s a big mess of Perl, the main developer (for the front-end) is gone, and they’re having trouble managing it. I have read through it, and it’s pretty dismal. They have config mixed with logic and duplicated code all over the place. There’s an 1100 line sub in one of the perl modules. Agh!

WikipediaFS on Linux, in Python 0

Posted by sjs
on Sunday, May 07

Till now I’ve been using my own version of pywikipedia for scripting MediaWiki, and it works well. But I read about WikipediaFS and had to check it out. It’s a user space filesystem for Linux that’s built using the Python bindings for FUSE. What it does is mounts a filesystem that represents your wiki, with articles as text files. You can use them just like any other files with mv, cp, ls, vim, and so on.

There hasen’t been any action on that project for 13 months though, and it doesn’t work on my wiki (MediaWiki 1.4.15) so I’m going to try and make it work after I upgrade to MediaWiki 1.6.3 tonight. This will be pretty cool when it works. I haven’t looked at the code yet but it’s only 650 lines.