Monday, December 23, 2013

Gen_server in Ocaml

Note, this post is written against the 2.0.1 version of gen_server

Erlang comes with a rich set of small concurrency primitives that make handling and manipulating state easier. The most generic of these frameworks, and the most commonly used, is the gen_server. A gen_server provides a way to control state over multiple requests. It serializes operations and handles both synchronous and asynchronous communication with clients. Its strength is that an application can create many lightweight servers: operations inside each one run serially, while the gen_servers themselves run concurrently with one another.
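To make the serialization idea concrete, here is a minimal sketch using only the OCaml standard library. It is not the API of the gen_server library: the names msg, server, start, send and run are all hypothetical, and there is no real concurrency here, just a mailbox that is drained one message at a time.

```ocaml
(* A toy model of a gen_server: one piece of state, one mailbox, and
   messages applied strictly one at a time.  All names are hypothetical. *)
type msg =
  | Incr of int            (* asynchronous "cast": no reply expected *)
  | Get of (int -> unit)   (* synchronous "call": reply through a callback *)

type server = { mutable state : int; mailbox : msg Queue.t }

let start init = { state = init; mailbox = Queue.create () }

(* Clients only ever enqueue messages; they never touch the state. *)
let send server msg = Queue.add msg server.mailbox

(* Drain the mailbox in arrival order, updating the state per message. *)
let run server =
  while not (Queue.is_empty server.mailbox) do
    match Queue.take server.mailbox with
    | Incr n -> server.state <- server.state + n
    | Get reply -> reply server.state
  done

let demo () =
  let s = start 0 in
  let seen = ref (-1) in
  send s (Incr 2);
  send s (Incr 3);
  send s (Get (fun v -> seen := v));
  run s;
  !seen
```

Because only run ever reads or writes the state, the order in which messages were sent fully determines the result, which is the property a gen_server gives you per server.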

While it is not possible to provide all of the Erlang semantics in Ocaml, we can create something roughly analogous. We can also get some properties that Erlang cannot give us. In particular, the implementation of gen_server provided here:

Tuesday, July 9, 2013

Experimenting in API Design: Riakc

Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.

While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea is that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.

Siblings

In Riak, when you perform a GET you can get back multiple values associated with a single key. These are known as siblings. A PUT, however, can only associate one value with a key. It is still convenient to use the same object type for both GET and PUT. In the case of Riakc, that is a Riakc.Robj.t. But what should happen if you create a Robj.t with siblings and try to PUT it? In the Python client you will get a runtime error. Riakc solves this by using phantom types. A Robj.t isn't actually just that, it's a 'a Robj.t. The API requires that 'a be something specific at different parts of the code. Here is the simplified type for GET:

val get :
  t ->
  b:string ->
  string ->
  ([ `Maybe_siblings ] Robj.t, error) Deferred.Result.t

And here is the simplified type for PUT:

val put :
  t ->
  b:string ->
  ?k:string ->
  [ `No_siblings ] Robj.t ->
  (([ `Maybe_siblings ] Robj.t * key), error) Deferred.Result.t

The important part of the API is that GET returns a [ `Maybe_siblings ] Robj.t and PUT takes a [ `No_siblings ] Robj.t. How does one convert something that might have siblings to something that definitely doesn't? With Riakc.Robj.set_content:

val set_content  : Content.t -> 'a t -> [ `No_siblings ] t

set_content takes any kind of Robj.t and a single Content.t and produces a [ `No_siblings ] Robj.t, because if you set the contents to one value you cannot have siblings. Now the type system can ensure that any call to PUT must have a set_content prior to it.
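The following is a minimal, self-contained sketch of the phantom type technique. It is a toy, not Riakc's implementation: contents are plain strings, and of_siblings and put are hypothetical stand-ins for a GET result and for Riakc.Conn.put.

```ocaml
module Robj : sig
  (* ['a] is a phantom parameter: it never appears in the representation. *)
  type 'a t
  val of_siblings : string list -> [ `Maybe_siblings ] t
  val set_content : string -> 'a t -> [ `No_siblings ] t
  val contents : 'a t -> string list
end = struct
  type 'a t = string list
  let of_siblings cs = cs
  (* Choosing one value discards whatever siblings were there. *)
  let set_content c _ = [ c ]
  let contents t = t
end

(* Hypothetical stand-in for PUT: it only accepts resolved objects and
   returns how many values would be written. *)
let put (robj : [ `No_siblings ] Robj.t) = List.length (Robj.contents robj)

let fetched = Robj.of_siblings [ "v1"; "v2" ]
let resolved = Robj.set_content "v1" fetched
let stored = put resolved

(* Uncommenting the next line is a compile-time type error:
   let _ = put fetched *)
```

The signature is what does the work: because 'a t is abstract, the only way to obtain a [ `No_siblings ] t is to go through set_content.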

Setting 2i

If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.

Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:

obj.add_index('field1_bin', 'val1')
obj.add_index('field2_int', 100000)

In Riakc, the naming convention is hidden from the user. Instead, the name the field will end up with is derived from the type of the value. The Python code looks like the following in Riakc:

let module R = Riakc.Robj in
let index1 =
  R.index_create
    ~k:"field1"
    ~v:(R.Index.String "val1")
in
let index2 =
  R.index_create
    ~k:"field2"
    ~v:(R.Index.Integer 100000)
in
R.set_content
  (R.Content.set_indices [index1; index2] content)
  robj

When the Robj.t is written to the DB, "field1" and "field2" will be transformed into their appropriate names.

Reading from Riak results in the same translation happening in reverse. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field keeps the exact name it had in Riak and the value is a Riakc.Robj.Index.Unknown string.

In this way, we are guaranteed at compile-time that the name of the field will always match its type.
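A rough sketch of that translation, in both directions, might look like the following. The constructor names mirror the ones used above, but wire_name and of_wire are hypothetical helpers written for illustration, not functions from Riakc.

```ocaml
type index_value =
  | String of string
  | Integer of int
  | Unknown of string

(* Writing: the suffix is derived from the constructor, so the name
   cannot disagree with the type of the value. *)
let wire_name ~k = function
  | String _ -> k ^ "_bin"
  | Integer _ -> k ^ "_int"
  | Unknown _ -> k

let has_suffix ~suffix s =
  let ls = String.length s and lx = String.length suffix in
  ls >= lx && String.sub s (ls - lx) lx = suffix

let chop n s = String.sub s 0 (String.length s - n)

(* Reading: strip a known suffix, otherwise keep the name untouched and
   fall back to [Unknown]. *)
let of_wire name raw =
  if has_suffix ~suffix:"_bin" name then (chop 4 name, String raw)
  else if has_suffix ~suffix:"_int" name then
    match int_of_string_opt raw with
    | Some i -> (chop 4 name, Integer i)
    | None -> (name, Unknown raw)
  else (name, Unknown raw)
```

Falling back to Unknown for an unparsable integer is a choice made for this sketch; the post does not say how Riakc handles that case.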

2i Searching

With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or by ranges of values. While the Riak docs leave it unclear, Riakc enforces that the two values in a range query are of the same type. Also, as when setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.

Here is a Python 2i search followed by the equivalent search in Riakc.

results = client.index('mybucket', 'field1_bin', 'val1', 'val5').run()

Riakc.Conn.index_search
  conn
  ~b:"mybucket"
  ~index:"field1"
  (range_string
     ~min:"val1"
     ~max:"val5"
     ~return_terms:false)
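One way to see why the two ends of a range cannot disagree is to model the query as a variant, where each constructor fixes the type of both bounds. This is only a sketch of the idea: the query type and the index_name and describe helpers are hypothetical and are not Riakc's actual definitions.

```ocaml
type query =
  | Eq_string of string
  | Eq_int of int
  | Range_string of string * string   (* both bounds must be strings *)
  | Range_int of int * int            (* both bounds must be ints *)

(* The Riak index name is derived from the kind of query. *)
let index_name ~index = function
  | Eq_string _ | Range_string _ -> index ^ "_bin"
  | Eq_int _ | Range_int _ -> index ^ "_int"

(* Render a query as text, just to have something observable. *)
let describe ~index q =
  let name = index_name ~index q in
  match q with
  | Eq_string v -> Printf.sprintf "%s = %s" name v
  | Eq_int v -> Printf.sprintf "%s = %d" name v
  | Range_string (lo, hi) -> Printf.sprintf "%s in [%s, %s]" name lo hi
  | Range_int (lo, hi) -> Printf.sprintf "%s in [%d, %d]" name lo hi
```

With this shape, Range_int (1, "val5") simply does not type check, so a mixed-type range can never reach the server.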

Conclusion

It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.

Thursday, July 4, 2013

Riakc In Five Minutes

This is a simple example using Riakc to PUT a key into a Riak database. It assumes that you already have a Riak database up and running.

First you need to install riakc. Simply do: opam install riakc. As of this writing, the latest version of riakc is 2.0.0 and the code given depends on that version.

Now, the code. The following is a complete CLI tool that will PUT a key and print back the result from Riak. It handles all errors that the library can generate as well as outputting siblings correctly.

(*
 * This example is valid for version 2.0.0, and possibly later
 *)
open Core.Std
open Async.Std

(*
 * Take a string of bytes and convert them to hex string
 * representation
 *)
let hex_of_string =
  (* %02X pads each byte to two digits so the output is unambiguous *)
  String.concat_map ~f:(fun c -> sprintf "%02X" (Char.to_int c))

(*
 * An Robj can have multiple values in it, each one with its
 * own content type, encoding, and value.  This just prints
 * the value, which is a string blob
 *)
let print_contents contents =
  List.iter
    ~f:(fun content ->
      let module C = Riakc.Robj.Content in
      printf "VALUE: %s\n" (C.value content))
    contents

let fail s =
  printf "%s\n" s;
  shutdown 1

let exec () =
  let host = Sys.argv.(1) in
  let port = Int.of_string Sys.argv.(2) in
  (*
   * [with_conn] is a little helper function that will
   * establish a connection, run a function on the connection
   * and tear it down when done
   *)
  Riakc.Conn.with_conn
    ~host
    ~port
    (fun c ->
      let module R = Riakc.Robj in
      let content  = R.Content.create "some random data" in
      let robj     = R.create [] |> R.set_content content in
      (*
       * Put takes a bucket, a key, and an optional list of
       * options.  In this case we are setting the
       * [Return_body] option which returns what the key
       * looks like after the put.  It is possible that
       * siblings were created.
       *)
      Riakc.Conn.put
        c
        ~b:"test_bucket"
        ~k:"test_key"
        ~opts:[Riakc.Opts.Put.Return_body]
        robj)

let eval () =
  exec () >>| function
    | Ok (robj, key) -> begin
      (*
       * [put] returns a [Riakc.Robj.t] and a [string
       * option], which is the key if Riak had to generate
       * it
       *)
      let module R = Riakc.Robj in
      (*
       * Extract the vclock, if it exists, and convert it to
       * something printable
       *)
      let vclock =
        Option.value
          ~default:"<none>"
          (Option.map ~f:hex_of_string (R.vclock robj))
      in
      let key = Option.value ~default:"<none>" key in
      printf "KEY: %s\n" key;
      printf "VCLOCK: %s\n" vclock;
      print_contents (R.contents robj);
      shutdown 0
    end
    (*
     * These are the various errors that can be returned.
     * Many of them come directly from the ProtoBuf layer
     * since there aren't really any more semantics to apply
     * to the data if it matches the PB frame.
     *)
    | Error `Bad_conn           -> fail "Bad_conn"
    | Error `Bad_payload        -> fail "Bad_payload"
    | Error `Incomplete_payload -> fail "Incomplete_payload"
    | Error `Notfound           -> fail "Notfound"
    | Error `Incomplete         -> fail "Incomplete"
    | Error `Overflow           -> fail "Overflow"
    | Error `Unknown_type       -> fail "Unknown_type"
    | Error `Wrong_type         -> fail "Wrong_type"

let () =
  ignore (eval ());
  never_returns (Scheduler.go ())

Now compile it:

ocamlfind ocamlopt -thread -I +camlp4 -package riakc -c demo.ml
ocamlfind ocamlopt -package riakc -thread -linkpkg \
-o demo.native demo.cmx

Finally, you can run it: ./demo.native hostname port

...And More Detail

The API for Riakc is broken up into two modules: Riakc.Robj and Riakc.Conn with Riakc.Opts being a third helper module. Below is in reference to version 2.0.0 of Riakc.

Riakc.Robj

Riakc.Robj defines a representation of an object stored in Riak. Robj is completely pure code. The API can be found here.

Riakc.Conn

This is the I/O layer. All interaction with the actual database happens through this module. Riakc.Conn is somewhat clever in that it has a compile-time requirement that you have called Riakc.Robj.set_content on any value you want to PUT. This guarantees you have resolved all siblings, somehow. Its API can be found here.

Riakc.Opts

Finally, various options are defined in Riakc.Opts. These are options that GET and PUT take. Not all of them are actually supported but support is planned. The API can be viewed here.

Hopefully Riakc has a fairly straightforward API. While the example code might be longer than other clients, it is complete and correct (I hope).

Saturday, May 25, 2013

Setting Up NixOps On Mac OS X With VirtualBox

Disclaimer

I am a new user of nixops, so I cannot guarantee these directions work for everyone. I have successfully set it up on two machines.

Preamble

The following directions describe how to set up nixops on a Mac OS X machine with VirtualBox. By the end of this you should be able to spawn as many NixOS instances in VirtualBox as your machine can handle. NixOps is similar to vagrant, except it deploys NixOS instances. It can deploy them locally, using VirtualBox, or remotely using EC2. It allows you to deploy clusters of machines, automatically allowing them to communicate with each other. At a high-level, nixops deploys an instance by doing the following:

  1. It builds the environment you ask for on another NixOS instance. This could be your local machine or a build server.
  2. It creates a VM on the service or system you defined (VirtualBox, EC2, etc).
  3. It uploads the environment you've defined to the machine.

The main problem is that nixops must build the environment on the same OS and arch it is deploying to. NixOS is a Linux distro, which means you cannot build the environment on your Mac. The minor problem is that, by default, the OS X filesystem that everyone gets is case insensitive and that doesn't play well with nix, the package manager.

This post will accomplish the following:

  1. Install and setup VirtualBox.
  2. If your OS X file system is case insensitive (assume it is if you haven't done anything to change it), we will create a loopback mount to install nix on.
  3. Install nix on OS X.
  4. An initial NixOS VirtualBox instance will be created to bootstrap the process and act as a distributed build server.
  5. Create a user on the build system.
  6. Set up signing keys, so we can copy environments between build server, host, and deployed VM.
  7. Set up local nix to use this VM as a build server.
  8. Deploy a VM.

1. Install VirtualBox

Download VirtualBox and install it. Just follow the directions. The only interesting thing you have to do is make sure you have the vboxnet0 adapter set up in networking. To do this:

  1. Start VirtualBox.
  2. Go to preferences (Cmd-,).
  3. Click on Network.
  4. If vboxnet0 is not present, add it by clicking the green +.
  5. Edit vboxnet0 and make sure DHCP Server is turned on. The settings I use are below.
  • Server Address: 192.168.56.100
  • Server Mask: 255.255.255.0
  • Lower Address Bound: 192.168.56.101
  • Upper Address Bound: 192.168.56.254

2. Creating a case-sensitive file system

Unless you have explicitly changed it, your OS X machine likely has a case insensitive file system. This means nix cannot build some packages. The method I have chosen to get around this is to create a loopback filesystem and mount that.

  1. Create an image. I have been using a 5GB one successfully, but if you plan on being a heavy user of nix, you should make it larger.
    hdiutil create ~/nix-loopback -megabytes 5000 -ov -type UDIF
  2. Load it but do not mount:
    hdiutil attach -imagekey diskimage-class=CRawDiskImage -nomount ~/nix-loopback.dmg
  3. Determine which disk and partition your newly created image corresponds to. Specifically you want to find the image that corresponds to the Apple_HFS entry you just created. It will probably be something like disk2s2, but could be anything.
    diskutil list
  4. Create a case-sensitive file system on this partition:
    newfs_hfs -s /dev/disk2s2
  5. Make the mountpoint:
    sudo mkdir /nix
  6. Mount it:
    sudo mount -t hfs /dev/disk2s2 /nix

At this point if you run mount you should see something mounted on /nix.

NOTE: I don't know how to make this mount on reboot, so you will need to attach and mount it again by hand if you want to use nix after restarting your system.

3. Install Nix

  1. Download the binary nix darwin package from nixos.org.
  2. Go to root:
    cd /
  3. Untar nix:
    sudo tar -jxvf /path/to/nix-1.5.2-x86_64-darwin.tar.bz2
  4. Chown it to your user:
    sudo chown -R your-user /nix
  5. Finish the install:
    nix-finish-install
  6. nix-finish-install will print out some instructions. Copy the 'source' line to your ~/.profile and run it in your current shell (and in any other shell you plan to keep open and use nix in).
  7. Delete the installer:
    sudo rm /usr/bin/nix-finish-install

4. Setup Nix

  1. Add the nixos channel:
    nix-channel --add http://nixos.org/releases/nixos/channels/nixos-unstable
  2. Update:
    nix-channel --update
  3. Install something:
    nix-env -i tmux

5. Install NixOps

  1. Set NIX_PATH:
    export NIX_PATH=/nix/var/nix/profiles/per-user/`whoami`/channels/nixos
  2. Get nixops:
    git clone git://github.com/NixOS/nixops.git
  3. cd nixops
  4. Install:
    nix-env -f . -i nixops
  5. Verify it is installed:
    nixops --version

6. Setup Distributed Builds

When deploying an instance, nixops needs to build the environment somewhere and then transfer it to the instance. In order to do this, it needs an already existing NixOS instance to build on. If you were running NixOS already, this would be the machine you are deploying from. On OS X, you need a NixOS instance running in a VM instead. Eventually nixops will probably set this up for you, but for now it needs to be done manually. Luckily, installing NixOS on VirtualBox is pretty straightforward.

  1. Install a NixOS on VirtualBox from the directions here. This doesn't need any special settings, just SSH.
  2. Set up a port forward so you can SSH into the machine. I'll assume this port forward is 3223.
  3. Make a user called 'nix' on the VM. This is the user that we will SSH through for building. The name of the user doesn't matter, but these directions will assume its name is 'nix'.
  4. On OS X, create two pairs of passwordless SSH keys. One pair will be the login for the nix user. The other will be signing keys.
  5. Install the login public key.
  6. On OS X, create /etc/nix/ (mkdir /etc/nix)
  7. Copy the private signing key to /etc/nix/signing-key.sec. Make sure this is owned by the user you'll be running nixops as and is readable only by that user.
  8. Create a public signing key from your private signing key using openssl. This needs to be in whatever format openssl produces which is not the same as what ssh-keygen created. This output should be in /etc/nix/signing-key.pub. The owner and permissions don't matter as long as the user you'll run nixops as can read it.
    openssl rsa -in /etc/nix/signing-key.sec -pubout > /etc/nix/signing-key.pub
  9. Copy the signing keys to the build server, putting them in the same location. Make sure the nix user owns the private key and is the only one that can read it.
  10. Tell nix to do distributed builds:
    export NIX_BUILD_HOOK=$HOME/.nix-profile/libexec/nix/build-remote.pl
  11. Tell the distributed builder where to store load content:
    export NIX_CURRENT_LOAD=/tmp/current-load
    mkdir /tmp/current-load
  12. Go into a directory you can create files in and describe the build machine there:
    cat <<EOF > remote-systems.conf
    nix@nix-build-server x86_64-linux /Users/`whoami`/.ssh/id_rsa 1 1
    EOF
  13. Tell the remote builder where to find machine information:
    export NIX_REMOTE_SYSTEMS=$PWD/remote-systems.conf
  14. Add an entry to ~/.ssh/config so that the fake host 'nix-build-server' resolves to your actual VM:
    Host nix-build-server
        HostName localhost
        Port 3223

7. Start An Instance

  1. Create your machine's nix expression:
    cat <<EOF > test-vbox.nix
    {
      test = 
        { config, pkgs, ... }:
        { deployment.targetEnv = "virtualbox";
          deployment.virtualbox.memorySize = 512; # megabytes
        };
    }
    EOF
  2. Create a machine instance named test:
    nixops create ./test-vbox.nix --name test
  3. Deploy it:
    nixops deploy -d test

This could take a while, and at some points it might not seem like it's doing anything because it's waiting for a build or a transfer. It will push around a fair amount of data. After all is said and done you should be able to do nixops ssh -d test test to connect to it.

Troubleshooting

  • I do a deploy and it sits forever waiting for SSH - You probably forgot to setup your vboxnet0 adapter properly. See Section 1.
  • It dies while building saying a store isn't signed - Only root can import unsigned stores, so this means your signing keys aren't set up properly. Double check your permissions.

Other problems? Post them in the comments and I'll add them to the list.

Known Bugs

  • nixops stop -d test never returns - I've only experienced this on one of my installations. It is okay, though. Wait a bit and exit out of the command, then you can run any command as if stop succeeded.
  • My Mac grey-screens of death! - This has happened to me once. I updated my version of VirtualBox and installed any updates from Apple and I have not experienced it again.

Further Reading

Sunday, March 17, 2013

[ANN] Riakc 0.0.0

Note, since writing this post, Riakc 1.0.0 has already been released and merged into opam. It fixes the below issue of Links (there is a typo in the release notes: 'not' should be 'now'). The source code can be found here. The 1.0.0 version number does not imply any stability or completeness of the library, just that it is not backwards compatible with 0.0.0.

Riakc is a Riak Protobuf client for Ocaml. Riakc uses Jane St Core/Async for concurrency. Riakc is in early development and so far supports a subset of the Riak API. The supported methods are:

  • ping
  • client_id
  • server_info
  • list_buckets
  • list_keys
  • bucket_props
  • get
  • put
  • delete

A note on GET

Links are currently dropped altogether in the implementation, so if you read a value with links and write it back, you will have lost them. This will be fixed in the very near future.

As with anything, please feel free to submit issues and pull requests.

The source code can be found here. Riakc is in opam and you can install it by doing opam install riakc.

Usage

There are two API modules in Riakc. Examples of all existing API functions can be found here.

Riakc.Conn

Riakc.Conn provides the API for performing actions on the database. The module interface can be read here.

Riakc.Robj

Riakc.Robj provides the API for objects stored in Riak. The module interface can be read here. Riakc.Conn.get returns a Riakc.Robj.t and Riakc.Conn.put takes one. Robj.t supports representing siblings; however, Riakc.Conn.put cannot PUT objects with siblings, and this is enforced using phantom types. A Riakc.Robj.t that might have siblings is converted to one that doesn't using Riakc.Robj.set_content.

[ANN] Protobuf 0.0.2

Protobuf is an Ocaml library for communicating with Google's protobuf format. It provides a method for writing parsers and builders. There is no protoc support yet, and writing it is not a top goal right now. Protobuf is meant to be fairly lightweight and straightforward to use. The only other Protobuf support for Ocaml I am aware of is through piqi, however that was too heavy for my needs.

Protobuf is meant to be very low level, mostly dealing with representation of values and not semantics. For example, the fixed32 and sfixed32 values are both parsed as Int32.t's. Dealing with being signed or not is left up to the user.
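As an illustration of what "left up to the user" can mean in practice, here is one way to interpret a parsed Int32 as unsigned or signed using only the standard library. The function names are hypothetical and are not part of the Protobuf library.

```ocaml
(* fixed32: reinterpret the 32 bits as unsigned by widening to Int64 and
   masking off the sign extension. *)
let unsigned_of_fixed32 (i : int32) : int64 =
  Int64.logand (Int64.of_int32 i) 0xFFFF_FFFFL

(* sfixed32: the bits are already a signed value, so just widen. *)
let signed_of_sfixed32 (i : int32) : int64 = Int64.of_int32 i
```

The same 32 bits, for example all ones, come out as 4294967295 through the first function and as -1 through the second.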

The source code can be viewed here. Protobuf is in opam; to install it, run opam install protobuf.

The hope is that parsers and builders look reasonably close to the .proto files such that translation is straightforward, at least until protoc support is added. This is an early release and, without a doubt, has bugs in it. Please submit pull requests and issues.

https://github.com/orbitz/ocaml-protobuf/tree/0.0.2/

Examples

The best collection of examples right now is the tests. An example from the file:

let simple =
  P.int32 1 >>= P.return

let complex =
  P.int32 1           >>= fun num ->
  P.string 2          >>= fun s ->
  P.embd_msg 3 simple >>= fun emsg ->
  P.return (num, s, emsg)

let run_complex str =
  let open Result.Monad_infix in
  P.State.create (Bitstring.bitstring_of_string str)
  >>= fun s ->
  P.run complex s

The builder for this message looks like:

let build_simple i =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i >>= fun () ->
  Ok (B.to_string b)

let build_complex (i1, s, i2) =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i1                 >>= fun () ->
  B.string b 2 s                 >>= fun () ->
  B.embd_msg b 3 i2 build_simple >>= fun () ->
  Ok (B.to_string b)

Thursday, February 7, 2013

[ANN] ocaml-vclock - 0.0.0

I ported some Erlang vector clock code to Ocaml for fun and learning. It's not well tested and it hasn't had any performance optimizations. It's not ready yet, but I have some projects in mind to use it, so it will likely get fleshed out more.

Vector clocks are a system for determining the partial ordering of events in a distributed environment. You can determine if one value is the ancestor of another, equal, or was concurrently updated. It is one mechanism that distributed databases, such as Riak, use to automatically resolve some conflicts in data while maintaining availability.

The vector clock implementation allows for a user-defined site id type. It also allows metadata to be encoded in the site id, which is useful if you want your vector clock to be prunable by encoding timestamps in it.
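For readers who have not seen one, here is a small sketch of a vector clock parameterized over the site id type, written against the standard library only. It is not the ocaml-vclock API: Make, increment, compare_clocks and the order type are names made up for this example, and pruning is left out.

```ocaml
module Make (Site : Map.OrderedType) = struct
  module M = Map.Make (Site)

  (* One counter per site; a missing site counts as 0. *)
  type t = int M.t
  type order = Equal | Ancestor | Descendant | Concurrent

  let empty : t = M.empty

  let count site (t : t) = try M.find site t with Not_found -> 0

  let increment site (t : t) : t = M.add site (count site t + 1) t

  (* [a] is dominated by [b] when no counter in [a] exceeds [b]'s. *)
  let dominated (a : t) (b : t) =
    M.for_all (fun site n -> n <= count site b) a

  let compare_clocks a b =
    match dominated a b, dominated b a with
    | true, true -> Equal
    | true, false -> Ancestor      (* a happened before b *)
    | false, true -> Descendant    (* b happened before a *)
    | false, false -> Concurrent   (* neither saw the other's update *)
end

(* Site ids here are plain strings; any ordered type would do. *)
module V = Make (String)

let base = V.increment "a" V.empty
let left = V.increment "a" base
let right = V.increment "b" base
```

Here left and right both descend from base but were updated at different sites, so comparing them yields Concurrent, which is exactly the case a database like Riak surfaces as siblings.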

The repo can be found here. If you'd like to learn more about vector clocks read the wikipedia page here. The Riak website also has some content on vector clocks here.