CLiki: cl-groupby

cl-groupby

Short implementation of Scalas groupBy functionality for Common Lisp

Homepage: https://github.com/wlbr/cl-groupby

Dependencies: none (except asdf)
License: MIT
Author: Michael Wolber

Motivation

Let's think of the task "find the first unique character" in a given string.

(defparameter str "lisp rulez")

The naive approach in Lisp is as simple as elegant:

(char (remove-if #'(lambda (x) (> (count x str) 1)) str) 0)

For simplicity let's ignore the point that it will throw an error when there is no uniqe char in str. Unfortunately the performance of the function is quite bad, quadratic runtime.

A quicker solution would look like the following lines. While this is fast, it is not as intuitive as the first solution.

(defun first-uniq-char (str)
  (let* ((size (length str))
         (cnt (make-hash-table :size size)))
    (do
        ((i 0 (incf i)) )
        ((<= size i))
      (incf (gethash (char str i) cnt 0)))
    (do 
        ((i 0 (incf i))
         (c nil))
        ((<= size i) 
         c)

Compare both funnctions with a long string:

(defparameter strlong 
  (coerce (loop for i to 10000 
            collect (code-char (+ (char-code #\a) (random 24)))) 
    'string))

(time (char (remove-if #'(lambda (x) (> (count x str) 1)) strlong) 0))
(time (first-uniq-char strlong))

A factor of almost 4000 on my machine.

Now we make use of our new groubby function:

(time (remove-if #'(lambda (x) (> (second x) 1)) 
         (mapcar #'(lambda (x) (list (first x) (length (second x)))) 
                 (gb:groupby #'(lambda (x) x) strlong))))

This seems to be only a little bit slower than the second, quick function above, while being a an intuitive and quick solution again. First group the data by its identity, then count each of the groups and remove the groups with more than one elements.

As you can see, with the groupby higher order function it is possible to write very comprehensive code, that takes the standard lisp solutions to the next level. Its drawback seems to be, that it is quite not simple to identify the good use of this feature to proper solve problems.

language extension convenience library