A Song for HList

Felix Bogdanski

since 30.05.2016


In this article I want to motivate the existence of heterogenous lists, often simply called HLists, in Scala. Never heard of them? In NeuroFlow, you use them, because the layout graph is implemented as a HList. Think of a linked list that knows the types of its objects, instead of generalizing all objects to a type all objects have in common. While being much like the TupleN class from a semantic perspective, it overcomes severe limitations of it. Together, we will derive a minimalistic HList implementation. We will show why HLists not only are a good alternative for tuples, but also pretty good in reducing boilerplate and abstracting over arity. You can check the code of this HList implementation on GitHub. If you are on the hunt for a full-fledged HList library ready for production use or some really good inspiration of what is possible at the type-level, I strongly recommend Shapeless.

The Challenge

Let's imagine a very simple telegram machine. The machine is a write-only machine, so all we can do is to send objects. Anyone can listen to this telegram machine if needed, but all we care for is sending objects. The machine offers a simple API we can call to send objects in sequential order, e. g. letters or numbers, so the telegram machine can pipe these objects over the real wire to some other telegram machine.

The protocol is also very simple: We send object A, if the transmission is okay, we simply get the same object back. If the transmission fails, we don't get it back. This pattern makes it possible for us to resend objects on failure. (Don't you worry about the response time in this example. :-))

The thing is, we can't just send plain objects to the telegram machine. To make sure that our transmission will not fail, the object needs to be wrapped in a sending envelope. This envelope provides a send function to enrich the object with required meta information for a frictionless transmission. Let's code it in Scala:

case class Σ[T](item: T) {
  def send: Option[T] = Random.nextInt(2) match {
    case x if x == 0 => None
    case _ => Some(item)
  }
}

Here, our case class Σ (sigma is for sending envelope) takes an item of type T and offers to send it to the telegram machine. The error probability is 0.5, expressed through this random number trick. When the transmission succeeds, we get Some(item) back, if it fails, we get None (both results being a refinement of Option[T]). No surprises here, the behavior is implemented just as discussed.

Furthermore, let's define some objects and types, so we can send something meaningful over to the telegram machine:

object Alphabet {

  case object A
  case object B
  case object C
  case object D
  // ...
  case object X
  case object Y
  case object Z
  case object __

}

It is important to notice that all objects are absolutely unique. There are no relations between the types. The only type all these objects have in common is Any. Using our Alphabet and Σ we can now build arbitrary combinations and send them to the machine:

val message = (H, E, L, L, O, __, W, O, R, L, D)
val prepared = message.productIterator.map(char => Σ(char))
val extracted = prepared.map(_.send)

The (H, E, L, L, O, __, W, O, R, L, D)-tuple is syntactic sugar applied by the compiler for a built-in Tuple11-class. After wrapping these into our sending envelope Σ, we can finally send the objects in sequential order. Remember, the success of the sending process is expressed as Option[T], depending on the original object type T, and to us, it is somewhat important to not lose the type T right after sending, because we need it for further processing.

val prepared: Iterator[Σ[Any]] = ...
val extracted: Iterator[Option[Any]] = ...

But if we examine the inferred types after working with the objects, it says Any everywhere! Darn, it looks like we lost precious type information all along the way! This is not what we want. All objects that we want to send over the wire are totally unique, they don't extend some common father class. If we operate on Any right after sending, we can in fact just move object pointers, but we can't call their properties, because types carry properties, and we just lost these. The reason for this is Scalas Product implementation for TupleN. To element-wise map over tuples, Scala offers the productIterator, but this iterator is simply Any-typed, so the type information about T is generalized to Any, thus lost. Even worse, the built-in TupleN implementation just goes up to Tuple22 - so this message wouldn't even compile:

(T, H, I, S, __, I, S, __, A, __, V, E, R, Y, __, L, O, N, G, __, T, E, X, T)

Conclusion: by using Scalas elegant, built-in functionality for iterating over tuples, we would lose precious type information. Also, using the tuple, we would be limited to messages of length 22. That doesn't feel right. What if we try it with a regular List instead of a tuple to overcome the length-issue at least:

T :: H :: I :: S :: __ :: I :: S :: __ :: A :: __ :: V :: E :: R :: Y :: __ :: L :: O :: N :: G :: __ :: T :: E :: X :: T :: Nil

Sure, using the list, the length is not the limit anymore, and we could map over this list building and sending envelopes, just as we did with the tuple before, but the loss of type information would be quite the same. The difference is: this time we lose the types right from the start, simply because of the nature of the List type constructor. Because all object types are absolutely different, the Scala compiler will infer the lowest bound it can find, e. g. something like List[Any]. Since we want to keep the type information, using a list will also not work for us.

Let's go back to the tuple. Maybe we shouldn't take ourselves all too serious and live with maximum tuple length 22 for the moment. (Well, messages must be short then - haha). What can we do to not lose type information when operating on items of a tuple? The tuple itself is generic, which is good and mandatory for keeping T. Let's look at the productIterator again:

val message = (H, E, L, L, O, __, W, O, R, L, D)
val prepared: Iterator[Σ[Any]] = message.productIterator.map(char => Σ(char)) // <--!

What do we want to do here? Simply speaking, we want to iterate over an arbitrary combination of objects, wrap these into a sending envelope Σ and send the objects to the machine. What if instead of using the iterator, we provide a generic function which directly lifts all objects of this particular Tuple11[_] to give a Tuple11[Σ[_]]:

def lift11[_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11]
  (m: (_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11)) = {
    def l[A](a: A) = Σ(a)
    (l(m._1), l(m._2), l(m._3), l(m._4), l(m._5), l(m._6), l(m._7),
      l(m._8), l(m._9), l(m._10), l(m._11))
  }

No more evil iterators! The function lift11 would take a Tuple11 and return a Tuple11 with all objects lifted into the sending envelope Σ. Now we could perfectly work with this enriched tuple, i. e. calling the send function to get the resulting Option[T]. So what's the catch?

What if the message length is not 11, but 5? Or 3, 2, 1337? The fact that we would have to implement all remaining liftN functions is obvious. Unfortunately, there comes a whole lot of boilerplate code with this approach. This just doesn't fit to our understanding of elegance. Plus, the limit of 22 objects for a message because of the fixed TupleN instances makes it hard to compose longer messages.

HList to the rescue

To combine the strength of the unbound List class with the type-safety of the TupleN classes, we need to create our own data structure which will suit our needs:

  • Keep track of the individual object types
  • Operate on items without losing their type information, e. g. lifting objects into a higher kinded type Σ
  • Allow a chain of distinct objects of arbitrary shape and length
  • Little boilerplate code, one generic implementation for all cases

It looks like we need a solid HList! After goofing around with the type system for a while and a fresh cup of sencha, we came to a minimal implementation which is just fine for this article:

trait HList {
  def ::[H](head: H): H :: this.type = hlist.::(head, tail = this)
}

case class ::[H, +T <: HList](head: H, tail: T) extends HList

case object HNil extends HList

Let's have a quick look at the details. First, we simply have a marker trait HList. Whenever we need to refer to any other HList on the type-level, we can declare the bounds to be of kind _ <: HList, thus leading the compiler into the right direction when inferring. The case class :: (yes, a valid name for a type) has two value and type parameters, namely the head item of type H as well as the tail list of type +T <: HList. Is it just like a regular list then? No, because both the case class :: and its second type parameter are bound to HList, we have a recursive type instead. The regular list is a recursive data structure on the value-level, but not on the type-level! In contrast, the heterogenous list is a recursive data structure on both the value- and type-level. The case object HNil, also an HList, is for stopping the recursion on both the type- and the value-level. Further, HNil is the starting value for inductive proofs through implicit type class resolution. Every HList comes with a function :: (yes, a valid name). This right-associative operator is for chaining objects in the usual Scala way, thus making the following syntax possible:

val list = H :: E :: L :: L :: O :: Nil
val hlist = H :: E :: L :: L :: O :: HNil

If we compare both, they share the same look and feel on the right hand side of the assignment, with the exception of the last item being HNil instead of Nil for the HList. Let's annotate their types explicitly:

val list: List[Any] = H :: E :: L :: L :: O :: Nil
val hlist: H :: E :: L :: L :: O :: HNil = H :: E :: L :: L :: O :: HNil

The list comes with type List[Any] - this is not even surprising. But, whoooops, when we look at the type of hlist, it says that the type of the list is the list itself. Now, how's that? Are we facing a situation with run-time values being available at compile-time? Almost! Here, the type coincides with the value, but if, for instance, we were using native integers, 1 :: HNil would have the type Int :: HNil, which is not the same.

Keep in mind, there is Scalas special syntax for higher kinded types F[A, B], which allows to write A F B instead, so ::[O, HNil] is equivalent to O :: HNil, and since ::[H, +T <: HList] is a recursive type, this syntax is valid for HLists of any length.

Now we have a rich and distinct type landscape to work with and this is exactly what we need to operate on heterogenous objects while respecting their individual types. The question is: how do we implement these generic operations? How can we lift all objects of arbitrary type T of an HList to give a new HList of objects with shape Σ[T], solely on the type-level?

A Type-Level Functor

Category theory says, an endofunctor is something that maps a function f between a category F:

trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B] 
}

Functor[List].map(List(A))(a => Σ(a))

If we try to translate this classical implementation of a Functor to a NaiveHFunctor taking HLists, we sooner or later notice that we would need to provide a dedicated function f for each possible shape of HList:

trait NaiveHFunctor[L <: HList] {
  def map[R <: HList](fa: L)(f: L => R): R = f(fa)
}

object NaiveHFunctor extends NaiveHFunctor[A :: B :: HNil]

NaiveHFunctor.map(A :: B :: HNil)(l => Σ(A) :: Σ(B) :: HNil)
NaiveHFunctor.map(C :: D :: HNil)(l => Σ(C) :: Σ(D) :: HNil) // doesn't compile

Basically, NaiveHFunctor maps from HList L to another HList R, but since we don't know anything about the concrete shapes of both L and R, all we can do is to provide a function between full lists f: L => R. Instead, we would like to use a more granular function g: T => Σ[T] applied element-wise to treat types individually. Our naive translation does not harness the recursive nature of HList, neither on the type nor the value-level. Thus, this implementation would be of little benefit when it comes to avoiding boilerplate.

What we need is comparable to the principle of a Functor, but in order to ultimately abstract over all possible shapes of HList, we need to do the math on the type-level! The idea is to use Scalas implicit lookup mechanism to traverse through the recursive HList. When the compiler finds an HFunctor for HNil - we will make sure that it always will - we can guide the compiler to find an HFunctor for A :: HNil for any type A, et cetera. Think of proof by induction, or peano arithmetics. In fact, Scalas typesystem is turing complete, and our HFunctor will be the connecting piece between all inductive steps:

trait HFunctor[L <: HList, F[_]] {
  type Res <: HList
}

object HFunctor {
  type Aux[L <: HList, R <: HList, F[_]] = HFunctor[L, F] { type Res = R }
}

This HFunctor maps from hlist L to resulting hlist Res, ensuring all individual object types Tn, which are the essence of L, will be liftet into F, solely at the type-level. We write HList(Tn) => HList(F[T]n) to refer to this operation. What is Aux for? When the compiler inductively searches (or constructs) implicit HFunctor instances for a desired result A :: HNil, he needs to find an HFunctor instance for HNil in advance. Therefore, he needs to search for HFunctors with Res=HNil being parameterized from the outside, but Res is a type member, not a type parameter, so the Aux type gives us the possiblity to parameterize this type member. One could ask: why do we use this type member instead of a regular type parameter at all?

def lift[L <: HList, Res <: HList](l: L)(implicit f: HFunctor[L, F]): Res // tedious and not stringent, f.Res is not connected to Res
def lift[L <: HList](l: L)(implicit f: HFunctor[L, F]): f.Res // this saves one type parameter to be inferred

To answer, let's come up with a first signature of lift. If we want to express this function, we need to give it a result type. This means, we need to refer to a result of a type-level computation as a result type. We could model this with a dedicated result type parameter in the signature, sure, but this means, that we would need to explicitly annotate the result type on the left hand side of assignments all the time! Also, the result type of f would not be connected to this result type parameter, thus the compiler would not be able to infer it. To have a formally safe and sound result type, we need to take the type member Res of the topmost instance f found by the compiler. This magic is safe, and in a minute we will know why.

But now is a good moment (wait, now is always a good moment! :-)) to introduce another type-level functor, which we will need to get from sending envelope Σ to the resulting Option after calling send:

trait HFunctor1[L <: HList, F[_]] {
  type Res <: HList
}

trait HFunctor2[L <: HList, F[_], G[_]] {
  type Res <: HList
}

object HFunctor {
  type Aux1[L <: HList, R <: HList, F[_]] = HFunctor1[L, F] { type Res = R }
  type Aux2[L <: HList, R <: HList, F[_], G[_]] = HFunctor2[L, F, G] { type Res = R }
}

Our original HFunctor is called HFunctor1. The new HFunctor2 is for transforming an HList of objects wrapped by higher kinded type F to an HList of objects wrapped by higher kinded type G. In category theory, this is called a natural transformation F ~> G, e. g. Σ ~> Option. We write HList(Fn) => HList(Gn) to refer to this operation in the context of an HList.

Fully equipped, let's finish our implementation of Lift:

trait Lift[F[_]] {

  import HList._
  import HFunctor._

  implicit def r[T]: T => F[T]

  /**
    * Lifts all items of given [[HList]] `l` into `F`.
    */
  def lift[L <: HList](l: L)(implicit f: HFunctor1[L, F]): f.Res = {
    def trav(l: HList): HList = l match {
      case head :: tail => :: (r(head), trav(tail))
      case HNil => HNil
    }
    trav(l).asInstanceOf[f.Res]
  }

  implicit def liftNil: Aux1[HNil, HNil, F] = new HFunctor1[HNil, F] {
    type Res = HNil
  }

  implicit def liftList[H, L <: HList, R <: HList]
    (implicit f: Aux1[L, R, F], g: H => F[H]): Aux1[H :: L, F[H] :: R, F] = new HFunctor1[H :: L, F] {
      type Res = F[H] :: R
    }

}

This is a rather big chunk of code, but it is important to study it as a whole. First, we define a trait Lift which is parameterized by a higher kinded type F. The signature of lift is unchanged; the implementation simply traverses through the given hlist l applying function r on the head to build the result. This result is safely cast to f.Res at run-time.

The actual magic of type-level computation is done through both the liftNil and the liftList implicit functions. The first one gives the compiler an HFunctor f0 for HNil, which is kind of a bastion of calm, because the compiler will always find it. The second one builds an HFunctor fn by merging the previously found HFunctor fn - 1 with itself. This is expressed through the compound result type Aux1[H :: L, F[H] :: R, F]. Even more, this result type is the motor for the compiler to find a valid inference. Using the Aux1 type, we make sure that we always have a version of both lists, the original list L and a possible inference of type R. Now the compiler will recursively glue together these Aux1 functors, beginning with HNil, until it can resolve an instance of Aux1 having the same L as the input of lift. We know that this instance will carry a result Res, and since this Res was built inductively using g: H =>F[H] and the same seed HNil L was built with, we can, with a clear conscience, let the compiler pick this as a valid inference or type-level transformation for the input type L of lift. This is the lifting function HList(Tn) => HList(F[T]n).

The implementation of the transformation function HList(Fn) => HList(Gn) is straightforward, using an additional higher kinded type G:

trait Trans[F[_], G[_]] {

  import HList._
  import HFunctor._

  implicit def r[T]: F[T] => G[T]

  /**
    * Applies the Natural Transformation F ~> G for all items of given [[HList]] `l`.
    */
  def trans[L <: HList](l: L)(implicit f: HFunctor2[L, F, G]): f.Res = {
    def trav(l: HList): HList = l match {
      case head :: tail => :: (r(head.asInstanceOf[F[Any]]), trav(tail))
      case HNil => HNil
    }
    trav(l).asInstanceOf[f.Res]
  }

  implicit def transHNil: Aux2[HNil, HNil, F, G] = new HFunctor2[HNil, F, G] {
    type Res = HNil
  }

  implicit def transHList[H, L <: HList, R <: HList]
    (implicit f: Aux2[L, R, F, G], g: F[H] => G[H]): Aux2[F[H] :: L, G[H] :: R, F, G] = new HFunctor2[F[H] :: L, F, G] {
      type Res = G[H] :: R
    }

}

Note that Aux2[F[H] :: L, G[H] :: R, F, G] actually denotes the natural transformation F[H] ~> G[H].

To finish, let's provide two concrete instance objects of our fresh Lift and Trans traits! Let's stick to our telegram example and go for Σ and Option as fixed higher kinded types:

object Liftable extends Lift[Σ] {
  implicit def r[A]: A => Σ[A] = a => Σ(a)
}

object Transformable extends Trans[Σ, Option] {
  implicit def r[A]: (Σ[A]) => Option[A] = a => a.send
}

Now, after all the hard work is done, we can lift and transform heterogenous objects of arbitrary length and shape in a type-safe manner:

val hi = H :: I :: HNil
val high = lift(hi)
val higher: Σ[Σ[H]] :: Σ[Σ[I]] :: HNil = lift(lift(hi))

val a = A :: 1 :: "2" :: 3.0 :: HNil
val b = lift(a)
val c: Σ[A] :: Σ[Int] :: Σ[String] :: Σ[Double] :: HNil = b
val d = trans(c)
val e: Option[A] :: Option[Int] :: Option[String] :: Option[Double] :: HNil = d

Building the Mapper

What have we built so far? We can lift objects of HList into higher kinded type F, HList(Tn) => HList(F[T]n). Further, we can apply a natural transformation for objects wrapped by higher kinded type F to higher kinded type G, HList(Fn) => HList(Gn). The attentive reader will have noticed, that both implementations Lift and Trans use a simple trick to inject the actual lifting and transformation functions respectively:

trait Lift[F[_]] {
  /* ... */
  implicit def r[T]: T => F[T]
  /* ... */

  implicit def liftList[H, L <: HList, R <: HList]
    (implicit f: Aux1[L, R, F], g: H => F[H]): Aux1[H :: L, F[H] :: R, F] = new HFunctor1[H :: L, F] {
      type Res = F[H] :: R
    }
}

If we examine liftList, we can see that the function does need an implicit function g: H => F[H] in order to compile. With the implicit function r[T]: T => F[T] being in scope, we know that this will always happen. This technique is not just very simple, it is even safe: because the higher kinded type F is fixed by the trait Lift, we know that, no matter the type T in r, we always have a general function Any => F[Any], which we can use at run-time. Since we never actually change the objects of the HList, the type T of r is only of interest at the type-level. This is okay as long as we want to work with objects in the context of higher kinded types, but what if we want to apply arbitrary functions on these objects? Clearly, this kind of type-level transformation needs special care, because it is a tough idea to simply apply a generalist function Any => Any on heterogenous objects at run-time. How can we make sure that the result of the type-level computation influences the run-time behavior of our code?

object Players {
  case class King(name: String)
  case class Queen(name: String)
  case class Pawn(name: String)
  case class Tower(name: String)
}

val players = King("Karl") :: Queen("Quintessa") :: Pawn("Power Paul") :: Tower("Theodore") :: HNil

We have a King, a Queen, a Pawn and a Tower. And again, because we are exploring heterogenous lists, these types share absolutely nothing. How can we map over players to have a new HList with the names of the players?

trait HMapper0[L <: HList] extends HFunctor0[L] {
  def fs: HList
}

object HMapper {
  type Aux0[L <: HList, R <: HList] = HMapper0[L] { type Res = R }
}

trait Func[X, Y] {
  val f: X => Y
}

Let's introduce HMapper0, which extends our previously defined HFunctor0 by a function fs exposing access to an HList. This will be the place to store all mapping functions the compiler resolves implicitly. Another important part of the puzzle is Func[X, Y], which is just a wrapper for functions X => Y. We need this to not confuse the compiler, because if we provide plain implicit functions, they will collide with certain implicit functions already defined in scala.Predef (ambiguous implicits).

object ShowablePlayers {
  import Players._
  implicit object showKing extends Func[King, String] { val f = (k: King) => "Almighty King " + k.name }
  implicit object showQueen extends Func[Queen, String] { val f = (q: Queen) => "Beautiful Queen " + q.name }
  implicit object showPawn extends Func[Pawn, String] { val f = (p: Pawn) => "Strong Pawn " + p.name }
  implicit object showTower extends Func[Tower, String] { val f = (t: Tower) => "Monstrous Tower " + t.name }
}

Now we can be sure that each player has a dedicated Func which maps to a String using the player's name. Let's come up with the implementation of Map, which is very similar to Lift and Trans:

object Map {

  import HList._
  import HMapper._

  /**
    * Applies [[Func]] instances in scope for respective items of given [[HList]] `l`.
    */
  def map[L <: HList](l: L)(implicit f: HMapper0[L]): f.Res = {
    def bitrav(l: HList, r: HList): HList = l match {
      case hd0 :: tl0 => r match {
        case (hd1: Func[Any, Any]) :: tl1 => :: (hd1.f(hd0), bitrav(tl0, tl1))
      }
      case HNil => HNil
    }
    bitrav(l, f.fs).asInstanceOf[f.Res]
  }

  implicit def mapHNil[A, B]
    (implicit g: Func[A, B]): Aux0[A :: HNil, B :: HNil] = new HMapper0[A :: HNil] {
      type Res = B :: HNil
      def fs = g :: HNil
  }

  implicit def mapHList[A, B, L <: HList, R <: HList]
    (implicit f: Aux0[L, R], g: Func[A, B]): Aux0[A :: L, B :: R] = new HMapper0[A :: L] {
      type Res = B :: R
      def fs = g :: f.fs
  }

}

The only difference is that the implicitly resolved Funcs will be stored in fs, so we can safely bi-traverse both the hlist of objects as well as their corresponding map functions. Note that at run-time, we operate on Func[Any, Any], and this is absolutely fine, since the respective Func instance applied in bitrav is guaranteed to be a type-safe match for the current object.

Let's use our new functionality:

val players = King("Karl") :: Queen("Quintessa") :: Tower("Theodore") :: Pawn("Power Paul") :: HNil
val explicit = map(players)
val prettyPlayers: String :: String :: String :: String :: HNil = explicit
println(prettyPlayers) // Almighty King Karl :: Beautiful Queen Quintessa :: Monstrous Tower Theodore :: Strong Pawn Power Paul :: HNil

This looks pretty neat! Finally, we can freely apply arbitrary functions on arbitrary HLists. Let's express Lift in terms of Map:

implicit def lift[T]: Func[T, Σ[T]] = new Func[T, Σ[T]] { val f: T => Σ[T] = a => Σ(a) }
val abc = A :: B :: C :: HNil
val lifted = map(abc)
println(lifted) // Σ(A) :: Σ(B) :: Σ(C) :: HNil

In addition, let's also express Trans in terms of Map:

implicit def trans[T]: Func[Σ[T], Option[T]] = new Func[Σ[T], Option[T]] { val f: Σ[T] => Option[T] = a => a.send }
val nums = Σ(1) :: Σ("2") :: Σ(3.0) :: HNil
val sent = map(nums)
println(sent) // None :: Some(2) :: None :: HNil

Note that None is due to the random error probability we defined at the very beginning.

Final Thoughts

The HList gives us the freedom of the List class as well as the type-safety of the TupleN classes. One could ask: is it worth the stress? I would say: it depends. In my opinion, heterogenous lists, or similar structures, are the real deal when it comes to building Domain Specific Languages or libraries that impose certain conditions regarding their usage.

To me, ultimately, the actual beauty of Scalas type system is the ability to write run-time code while inductively proofing its correctness at compile-time. If we can formally prove the correctness of code through type-level programming - who would need to write a unit test for functionality that will only compile if all components are wired properly? (Wait: isn't this nothing else but an implicit unit test? ;-))

Dotty, maybe the next Scala, may come with native HList support (no fixed TupleN-classes, but tuples implemented like HLists in "a more efficient way" (?)) - which is something I would love to have in a future version of Scala!

Thanks for reading.