14 September 2015

Background

I'm currently writing bindings to the sodium library for Haskell, which in turn is mainly a portable version of Dan Bernsteins NaCl library. I'm also the author of the sodiumoxide bindings library for Rust.

The Problem

NaCl defines a couple of different cryptographic primitives and those primitives can have different implementations. For example crypto_stream is an API for stream ciphers. NaCl exposes the implementations aes128ctr, salsa20, salsa208, salsa2012 and xsalsa20. That is, the API is the same, but the actual implementation can be chosen by the user by using a suffix, e.g. crypto_stream_xsalsa20_stream_xor is the encryption function when using the xsalsa20 primitive.

If we want to implement bindings for these API's in a naive way a lot of code will have to be duplicated. We will end up with something like:

module Crypto.Sodium.Stream.Xsalsa20 where

-- skipped a lot of imports

foreign import ccall unsafe "crypto_stream_xsalsa20_keybytes"
    c_crypto_stream_xsalsa20_keybytes :: Int

-- skipped a lot of foreign imports

foreign import ccall unsafe "crypto_stream_xsalsa20_xor"
    c_crypto_stream_xsalsa20_xor :: Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt

newtype Key = Key { unKey :: ByteString }
newtype None = Nonce { unNonce :: ByteString }

keyBytes :: Int
keyBytes = fromIntegral c_crypto_stream_xsalsa20_keybytes

randomKey :: IO Key
randomKey = Key <$> randomBytes keyBytes

streamXor :: Key -> Nonce -> ByteString -> ByteString
streamXor (Key k) (Nonce n) m =
    B.unsafeCreate mLen $ \pc ->
    B.unsafeUseAsCString m $ \pm ->
    B.unsafeUseAsCString n $ \pn ->
    B.unsafeUseAsCString k $ \pk ->
    void $ c_stream_xsalsa20_xor pc pm (fromIntegral mLen) pn pk

-- a lot more functions defined here

And then for AES-128-CTR we will need to define:

module Crypto.Sodium.Stream.Aes128Ctr where

-- skipped a lot of imports

foreign import ccall unsafe "crypto_stream_aes128ctr_xor"
    c_crypto_stream_aes128ctr_xor :: Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt

newtype Key = Key { unKey :: ByteString }
newtype None = Nonce { unNonce :: ByteString }

streamXor :: Key -> Nonce -> ByteString -> ByteString
streamXor (Key k) (Nonce n) m =
    B.unsafeCreate mLen $ \pc ->
    B.unsafeUseAsCString m $ \pm ->
    B.unsafeUseAsCString n $ \pn ->
    B.unsafeUseAsCString k $ \pk ->
    void $ c_stream_aes128ctr_xor pc pm (fromIntegral mLen) pn pk
    where mLen = B.length m

-- a lot more functions defined here

We don't really want to bother to write all of these instances manually. It's a lot of work, and the more code we're forced to write, the higher the risk that we introduce bugs somewhere.

In my rust library sodiumoxide I solved this by using macros. I defined a macro stream_module and then instantiated that macro for each cryptographic primitive.

Solution 1 (Type Classes)

The first thing that I thought of was to create some kind of type class for a stream cipher. Something like:

newtype Key s = Key { unKey :: ByteString }
newtype Nonce s = Nonce { unNonce :: ByteString }

class StreamCipher s where
    c_crypto_stream_keybytes :: Tagged s Int
    c_crypto_stream_noncebytes :: Tagged s Int
    c_crypto_stream_xor :: Tagged s (Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt)

streamXor :: StreamCipher s => Key s -> Nonce s -> ByteString -> ByteString
streamXor (Key k) (Nonce n) = 

streamXor :: forall s. StreamCipher s => Key s -> Nonce s -> ByteString -> ByteString
streamXor (Key k) (Nonce n) =
    B.unsafeCreate mLen $ \pc ->
    B.unsafeUseAsCString m $ \pm ->
    B.unsafeUseAsCString n $ \pn ->
    B.unsafeUseAsCString k $ \pk ->
    void $ c_stream_xor' pc pm (fromIntegral mLen) pn pk
    where
        mLen = B.length m
        c_stream_xor' = untag (c_stream_xor :: Tagged s (Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt))

And we can use it like

foreign import ccall unsafe "crypto_stream_xsalsa20_keybytes"
    c_crypto_stream_xsalsa20_keybytes :: CInt

foreign import ccall unsafe "crypto_stream_xsalsa20_noncebytes"
    c_crypto_stream_xsalsa20_noncebytes :: CInt

foreign import ccall unsafe "crypto_stream_xsalsa20_xor"
    c_crypto_stream_xsalsa20_xor :: Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt

data Xsalsa20
instance StreamCipher Xsalsa20 where
    c_crypto_stream_keybytes = Tagged c_crypto_stream_xsalsa20_keybytes
    c_crypto_stream_noncebytes :: Tagged c_crypto_stream_xsalsa20_noncebytes
    c_crypto_stream_xor = Tagged c_crypto_stream_xsalsa20_xor

This will work, but will introduce a lawless type-class. That might be the best solution, but I'm a bit wary of lawless type-classes. Also we're not really looking to write functions polymorphic in stream ciphers, we just want to reduce code duplication.

We will also need to create a dispatching type (e.g. Xsalsa20) for all of our different cryptographic primitives manually. Some cryptographic libraries dispatch on the Key type, but that would force us to define the Key types manually, and would also cause issues since we have a Nonce type as well. That would require us to do someting like:

class StreamCipher k n | k -> n, n -> k where
    ...

(or use TypeFamilies to achieve the same thing)

PROS

  • Might be the most idiomatic solution

CONS

  • Lawless type class
  • Need to define a dispatching type (or dispatch on the Key type which will lead to more boilerplate)
  • A user wanting to (for example) generate keys with a randomKey :: StreamCipher s => IO (Key s) function might need to annotate the type manually instead of just importing the correct module.

Solution 2 (Macro Instantiation)

The closest thing to using rust macros is to use some kind of macros in Haskell as well.

CPP

The easiest way to do that is probably to use a CPP macro to generate our code for us with textual substitutions.

#define STREAM_MODULE(name, primitive)

However I ran into some issues with parameter substitutions in the foreign import strings, probably nothing that can't be solved, but I didn't really want to use CPP (maybe for all the wrong reasons).

Template Haskell

Instead of CPP we can use Template Haskell to autogenerate the module bodies for us. I won't go into detail on the code, but we can quite easily create a template haskell function that imports all of the foreign functions and creates all function definitions for us. This comes with the drawback of making cross-compilation a bit hard. It will also increase binary size, since we will duplicate all definitions for all of our cryptographic primitives.

So if we define a template haskell function mkStream :: String -> Q [Dec] that takes a primitive name and does all the foreign imports, datatype definitions and function definitions for us we don't have to do the work manually.

I actually implemented this solution first, but tried to think of other solutions that didn't need Template Haskell.

PROS

  • No need to import foreign functions manually
  • No need for a dispatching datatype

CONS

  • Bad cross-compilation story
  • Code size growth
  • A lot of people seem to dislike Template Haskell

Solution 3 (plain old datatypes)

If we instead write something like this:

{-# LANGUAGE RecordWildCards #-}
module Stream.Internal where

type XorFn = Ptr Word8 -> Ptr CChar -> CULLong -> Ptr CChar -> Ptr CChar -> IO CInt

newtype Key s = Key { _unKey :: ByteString }
newtype Nonce s = Nonce { _unNonce :: ByteString }

data StreamCipher s = StreamCipher
    { keyBytes :: Int
    , nonceBytes :: Int
    , ...
    , streamXor :: Key s -> Nonce s -> ByteString -> ByteString
    }

mkStream :: CInt -> CInt -> XorFn -> StreamCipher s
mkStream c_keyBytes c_nonceBytes c_streamXor =
    StreamCipher {..}
    where
        keyBytes = fromIntegral c_keyBytes
        nonceBytes = fromIntegral c_nonceBytes
        streamXor (Key k) (Nonce n) m =
            B.unsafeCreate mLen $ \pc ->
            B.unsafeUseAsCString m $ \pm ->
            B.unsafeUseAsCString n $ \pn ->
            B.unsafeUseAsCString k $ \pk ->
            void $ c_streamXor pc pm (fromIntegral mLen) pn pk
        where mLen = B.length m

Then we can use it like

module Stream.Xsalsa20 where

foreign import ccall unsafe "crypto_stream_xsalsa20_keybytes"
    c_crypto_stream_xsalsa20_keybytes

foreign import ccall unsafe "crypto_stream_xsalsa20_noncebytes"
    c_crypto_stream_xsalsa20_noncebytes

foreign import ccall unsafe "crypto_stream_xsalsa20_xor"
    c_crypto_stream_xsalsa20_xor

data Xsalsa20

xsalsa20 :: StreamCipher Xsalsa20
xsalsa20 = mkStream c_crypto_stream_xsalsa20_keybytes c_crypto_stream_xsalsa20_noncebytes
                    c_crypto_stream_xsalsa20_xor 

StreamCipher {..} = xsalsa20

With this solution we still need to write all of our foreign imports manually, and also add a bit of boilerplate like the phantom type that we use to instantiate or functions. However using RecordWildCards we get all top-level definitions for free without using type classes. I added {-# OPTIONS_GHC -fno-warn-missing-signatures #-} to the source files as well to be able to compile with -Wall without the compiler complaining about missing signature.

PROS

  • No code growth
  • No need for template haskell
  • No lawless type class

CONS

  • Still some boilerplate remaining

Summary

This was an overview of different ways to emulate higher order modules in Haskell. I have chosen the last approach for my library. I wrote this up to see if anyone else finds it interesting and also to generate discussion on different ways to solve this problem.

Open Questions

  • Is there a better way to solve this problem?
  • Should I have gone with the type class approach instead?

Please join the reddit discussion for comments.