<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[GopherSRE]]></title><description><![CDATA[Musings on Golang, DevOps and everything in between]]></description><link>http://www.golangdevops.com/</link><image><url>http://www.golangdevops.com/favicon.png</url><title>GopherSRE</title><link>http://www.golangdevops.com/</link></image><generator>Ghost 1.26</generator><lastBuildDate>Mon, 05 May 2025 22:54:32 GMT</lastBuildDate><atom:link href="http://www.golangdevops.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Using SSH and Domain Sockets For Serving]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Have you ever found the need to secure access to administrative traffic? This might be a system agent or a diagnostic endpoint for a service.</p>
<p>Recently I had the need to secure traffic for code examples I'm using in a book I am writing. I wanted to secure the traffic</p></div>]]></description><link>http://www.golangdevops.com/2021/11/03/using-ssh-and-domain-sockets-for-serving/</link><guid isPermaLink="false">61829b49fe463001faf63663</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Wed, 03 Nov 2021 16:18:00 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Have you ever found the need to secure access to administrative traffic? This might be a system agent or a diagnostic endpoint for a service.</p>
<p>Recently I had the need to secure traffic for code examples I'm using in a book I am writing. I wanted to secure the traffic to a bunch of host machines without putting my readers (or myself) through the hassle of TLS mutual authentication or OAuth.</p>
<h2 id="sshtotherescue">SSH To The Rescue</h2>
<p><img src="https://www.openssh.com/images/openssh.gif" alt="SSH" title="SSH"></p>
<p>SSH seemed like the perfect solution to the problem, and it came with the side benefit of segmenting my serving traffic from admin traffic.<br>
SSH is generally not exposed through a load balancer and should only be available through a VPN, from a bastion host, or to services within a serving cluster. By leveraging SSH for admin traffic, I was simply consolidating administrative traffic onto a port that is already set up for that purpose.</p>
<p>If SSH is exposed, the box is no more at risk than it already was with just that user account exposed.</p>
<p>To prevent any local leaking, we can serve that traffic on a domain socket. The socket can be set up so that only a single user on the device may access it. External traffic reaches that domain socket over SSH, authenticating with secure keys as that local user. No internal exposure.<br>
People trying out the exercises can simply set up a key on their bastion device that allows login to all their nodes, and they are ready to rock.</p>
<h2 id="timetorockandroll">Time to Rock and Roll</h2>
<p><img src="https://www.hoteng.com/wp-content/uploads/June-2017-Oilfield-Technology_Ready-to-rock_Article-cover_website2-1.jpg" alt="Ready To Rock" title="Ready To Rock"></p>
<p>Sound interesting and want to give it a try? I've packaged up a library that handles the SSH part of this:</p>
<p><a href="https://github.com/johnsiilver/serveonssh">Serve On SSH</a></p>
<p>Here are two examples for doing this with HTTP and gRPC:<br>
<a href="https://github.com/johnsiilver/serveonssh/tree/main/examples/http">HTTP Example</a><br>
<a href="https://github.com/johnsiilver/serveonssh/tree/main/examples/grpc">gRPC Example</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[WebGears: I'll get your HTML/Javascript and your little dog Toto too!]]></title><description><![CDATA[<div class="kg-card-markdown"><p><img src="https://girlsdofilm.files.wordpress.com/2014/04/the_wizard_of_oz_margaret_hamilton_6.jpg" alt="me and web standards"></p>
<h2 id="tldr">TLDR</h2>
<ul>
<li>Web development stinks</li>
<li>I don't want to program in 5 languages to serve a webpage</li>
<li>I wrote the <a href="http://github.com/johnsiilver/webgear">webgear package</a> so I can just program in Go with Web Components</li>
<li>This package powers <a href="http://golangbasics.com">Golang Basics</a></li>
</ul>
<h2 id="preamble">Preamble</h2>
<p>Hey, if you don't want to listen to my meandering (and I can</p></div>]]></description><link>http://www.golangdevops.com/2021/03/29/webgears-ill-get-your-html-css-javascript-and-your-little-dog-toto-too-2/</link><guid isPermaLink="false">5ee25c54f526eb67058e5f5b</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Tue, 30 Mar 2021 00:31:18 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p><img src="https://girlsdofilm.files.wordpress.com/2014/04/the_wizard_of_oz_margaret_hamilton_6.jpg" alt="me and web standards"></p>
<h2 id="tldr">TLDR</h2>
<ul>
<li>Web development stinks</li>
<li>I don't want to program in 5 languages to serve a webpage</li>
<li>I wrote the <a href="http://github.com/johnsiilver/webgear">webgear package</a> so I can just program in Go with Web Components</li>
<li>This package powers <a href="http://golangbasics.com">Golang Basics</a></li>
</ul>
<h2 id="preamble">Preamble</h2>
<p>Hey, if you don't want to listen to my meandering (and I can meander with the best of them), just skip towards the bottom.</p>
<h2 id="doesanyonelikehtmlforapps">Does anyone like HTML for Apps?</h2>
<p><img src="https://media1.giphy.com/media/bGaWWMlQGuKR2/giphy.gif" alt="not me"></p>
<p>Well, probably, but then they've probably never had the experience of using anything better.  Or they are masochists.  Maybe both?</p>
<p>Many programmers today have grown up in the web era, and people get used to whatever is around them, no matter how bad it is.</p>
<p>HTML was great for what it was designed for: making research papers link to each other.</p>
<p>It's not so good at making apps.  The foundation is bad.  Can you do it?  Sure...</p>
<p>Google/Microsoft/Facebook have spent billions making apps that are semi-functional. Semi-functional because they are never as good as desktop apps.  Not in response time, amount of code, performance, or memory consumption (have you seen how much RAM your browser uses?!).  But they run everywhere and are connected to the most powerful compute resources in the world. Still, most web apps written by a dozen people could be beaten in performance by a single developer on a Mac IIci from the 80s.</p>
<p>Instagram has thousands of employees to make an online version of a super simplistic desktop app, somewhat equivalent to Photoshop in 1990. Even they use native interfaces on platforms such as Android/iOS; it's so much easier and more responsive (though maybe today they are all on React Native?).</p>
<p>We now have websites that build websites, like Squarespace/Wix, because it's so hard to get right (and responsive). I've tried using these for my most simple sites and am always disappointed with the results in the long term. And they can only do this for the simplest of pages, not really worthy of the &quot;web app&quot; moniker.</p>
<p>So everyone tries to find some framework that gets in the way the least and has decent performance (React/Angular/...).</p>
<p>However much I would love to avoid web programming, every so often I have to deal with HTML/CSS/Javascript.</p>
<h2 id="programmingingotemplateshtmlcssjavascriptgah">Programming in Go/Templates/HTML/CSS/Javascript, GAH!!!</h2>
<p><img src="https://i1.wp.com/media.giphy.com/media/YPEpEDDFs7sEU/giphy.gif?zoom=2" alt="me"></p>
<p>A couple of years ago I was creating a web site that delivers videos from Vimeo with notes on teaching Go Basics (plug: www.golangbasics.com).</p>
<p>I wrote that site in Go, cause you know.....</p>
<p>This required I write HTML.  But you can't just write HTML, oh no......</p>
<p>I want to use Web Components so that my CSS doesn't clash (oh dear Pike...), which means you have to use Javascript.</p>
<p>To be able to do dynamic content when the page loads and avoid AJAX/Websocket/GRPC calls using Javascript, I want to load dynamic content from my Go code.  Now I need templates.....</p>
<p>Oh and I would like it to look decent, so now I've got to do CSS.</p>
<p>I think that means that I now need 5 languages to render a page.</p>
<p>Ok, done with that nonsense....</p>
<p>I decided that I never wanted to do that again, and this would be the first site I rewrote.</p>
<h2 id="newgoals">New Goals</h2>
<p>I don't want to write in more than 1 language, and I want that language to be Go. Go templates don't count as being part of Go for this conversation.</p>
<p>My goals:</p>
<ul>
<li>Write in one language</li>
<li>Have web components so that CSS stuff won't interfere with other code</li>
<li>Reusable parts</li>
<li>Avoid name collisions that components are prone to</li>
<li>Can reproduce the site I've already written</li>
</ul>
<h2 id="webgear">Webgear</h2>
<p>I believe I have succeeded in everything but the one language; I'm still going to style with CSS.</p>
<p>The <a href="http://github.com/johnsiilver/webgear">webgear package</a> has several major parts:</p>
<ol>
<li>html/ Defines HTML doc, html elements and the dynamic element for generating HTML from Go.</li>
<li>html/builder/ Defines a package for building HTML code more dynamically.</li>
<li>components/ Defines encapsulated Web Components written in Go.</li>
<li>handlers/ Defines simple wrappers around http.Server/Mux to serve your content.</li>
<li>wasm/ Defines a WebAssembly package that is still experimental</li>
</ol>
<h3 id="whatdoesthisthingbuyme">What does this thing buy me</h3>
<p>Today:</p>
<ul>
<li>Only use two languages: Go and CSS (maybe sprinkle some basic JS)</li>
<li>Encapsulate each part of a page as a reusable component</li>
<li>Launch the site quickly with the handlers package</li>
<li>Have separate tests and a visual viewer test just for components, separate from the page</li>
<li>Some static type checking, so you can only add form elements to forms</li>
<li>Nothing is stringly typed</li>
<li>You can't accidentally misspell an attribute</li>
<li>You can't forget closing tags</li>
</ul>
<p>Future:</p>
<ul>
<li>Some more validation checks</li>
<li>Getting the WASM code to higher quality</li>
<li>Adding more tags, as I haven't added all tags or event types</li>
</ul>
<h3 id="whyareyoulettingcssgetaway">Why are you letting CSS get away</h3>
<p>When you change the HTML you have to reload your server.  That isn't too tedious if you are keeping your style away from your layout.  So I consider that very low cost.</p>
<p>What isn't low cost is reloading while you are styling your CSS.  You have to do that a lot.  So by keeping the styles in external style sheets and providing a simple way to reload CSS from those style sheets during development, you get a good mix.</p>
<p>HTML already has a dizzying number of tags I have to recreate, without even looking at CSS and how I would deal with dynamic changes.  I have a day job and carpal tunnel already.</p>
<h3 id="enoughtalkshowmethegoods">Enough talk, show me the goods!</h3>
<p>You can read all about Webgear with sample code <a href="http://github.com/johnsiilver/webgear">here</a></p>
<p>You can view the Golang Basics site code <a href="https://github.com/johnsiilver/go_basics/tree/master/site">here</a></p>
<p>You can checkout the future wasm package <a href="https://github.com/johnsiilver/webgear/tree/master/wasm">here</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Flatbuffers in Go Fall Flat]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Flatbuffers are a Google message format in the same vein as Protocol Buffers or JSON. They were designed for game programmers in C++ who want to avoid heap allocations at all costs.</p>
<p>It isn't a new tech, but I started seeing articles recently saying people should use them.  I was</p></div>]]></description><link>http://www.golangdevops.com/2021/02/15/flatbuffers-in-go-fall-flat/</link><guid isPermaLink="false">602aae6634a6b70162ee9bea</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Mon, 15 Feb 2021 20:08:12 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Flatbuffers are a Google message format in the same vein as Protocol Buffers or JSON. They were designed for game programmers in C++ who want to avoid heap allocations at all costs.</p>
<p>It isn't a new tech, but I started seeing articles recently saying people should use them.  I was truly skeptical about this as all the example code for Go looked painful.</p>
<p>I recently had a very niche use case where I have a proxy service that needs to inspect incoming messages before passing them along. I did not want to have to do a Marshal/Unmarshal to determine if this was a data passthru or a control plane message.</p>
<p>I could have solved this in other ways, but this seemed like an ideal use case for Flatbuffers, so I thought I'd give Flatbuffers a Go (pun intended).</p>
<h2 id="inshortineverwanttousethemagain">In short, I never want to use them again</h2>
<p>Flatbuffers really satisfy a niche.  I'm skeptical that their use in game programming is truly valuable in anything but the most demanding titles.</p>
<p>Google has been able to create low latency services using protocol buffers for a decade (JSON/REST... is not a real contender for speed/space against schema'd binary encoded messages).</p>
<p>Protocol buffers are great in Go, EXCEPT that protoc and the Go generators are just stupidly hard to use outside Google. Getting gRPC code to generate is ridiculous, and go.mod made everything worse when I didn't think that was possible.</p>
<p>Flatbuffers is probably a Stadia tech.  In that case, maybe they are worthwhile (large display frames being sent that you don't want to copy around).</p>
<p>Except for the most intense games, I've got to imagine the time savings on the client probably are not a real factor. In the server, maybe. But is there nothing else in the game loop that could save you this amount of memory/time to avoid this complexity (note: I'm not a game programmer)?</p>
<p>If I were doing a game server that needed something faster than Go's protocol buffers, I'd probably look at how much faster. 2x? I'd probably just re-write the proto generators to be more efficient.  The current official ones are based on a legacy implementation that could lose a little convenience to get some real speed gains.  That would be simpler than using Flatbuffers in the long run.</p>
<h2 id="sowhyareflatbuffershard">So why are Flatbuffers hard?</h2>
<p>Flatbuffers are similar to writing disk file formats. You are always writing to the tail of an array. To find your data, you must serialize a table at the end to tell you the offset in the buffer where your data lives.</p>
<p>By doing this, you only have to make a single allocation when you read the buffer into memory. After that, you simply access the location of your data and convert it to the specific type. That conversion happens on the stack.</p>
<p>With other formats, you read in the bytes, allocate a representation in memory (like a struct), then copy in each field.</p>
<p>With flatbuffers, to allow for a struct to include another struct, you have to create the contained struct first. This means you have to write everything backwards.</p>
<p>Normally, if you have struct B inside struct A, you write struct A and add struct B. With flatbuffers, you write struct B, then struct A and add struct B to A.</p>
<p>Not so bad, right?</p>
<p>Well, you need to do that for slices of data as well, writing them in reverse order.  Basically anything that uses a vector (slices) is painful.</p>
<p>The specific ordering that must be done with vectors gets more complex the more levels of hierarchy you have. Those errors occur at runtime and can be difficult to trace once you build in any level of abstraction.</p>
<p>Then there are oddities like how to do vectors of enums. If vectors of other types are not straightforward, these are worse.</p>
<p>And one more note: if you want a field that represents []byte, in Flatbuffers you need to use [ubyte] instead of [byte]; otherwise you can't access the entire slice at once, you have to do it byte by byte.</p>
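<p>As a sketch, the schema declaration looks like this (table and field names are hypothetical, not from my actual message set); with [ubyte], the generated Go code can hand the whole payload back as a []byte in a single call:</p>

```
// Hypothetical FlatBuffers schema fragment.
table Data {
  // [ubyte], not [byte]: lets the generated Go accessors return
  // the entire vector as a []byte at once.
  content:[ubyte];
}
```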
<h2 id="thatdoesntsoundtoobad">That doesn't sound too bad</h2>
<p>After you write something four levels deep, you'll change your mind.</p>
<p>This means you need to write a constructor for every message type and a test for every constructor (because it's easy to cause a runtime error if you do things out of order).</p>
<p>To give you a sense of complexity, here is a very simple two level message in both protocol buffer and flatbuffer.</p>
<p>Here is a protocol buffer that holds some data:</p>
<pre><code class="language-go">msg := &amp;pb.Message{
    Type: pb.MessageType_MTDataType,
    Data: &amp;pb.Data{Dest: pb.MyDest, Fd: 10, Data: b},
}
</code></pre>
<p>Here is the equivalent constructor you must write in flatbuffers:</p>
<pre><code class="language-go">// MarshalData creates a flatbuffer []byte representing a Message that contains data.
func MarshalData(dest fb.DestType, fd uint32, data []byte) []byte {
	size := len(data) + 4
	builder := flatbuffers.NewBuilder(size)

	// This is the only simple Vector operation; everything else is
	// reverse for loops.
	content := builder.CreateByteVector(data)

	// Builds a Data message.
	fb.DataStart(builder)
	fb.DataAddDest(builder, dest)
	fb.DataAddFd(builder, fd)
	fb.DataAddContent(builder, content)
	dataMsg := fb.DataEnd(builder)

	// Builds our outer message.
	fb.MessageStart(builder)
	fb.MessageAddType(builder, fb.MessageTypeMTDataType)
	fb.MessageAddData(builder, dataMsg)
	msg := fb.MessageEnd(builder)
	builder.Finish(msg)

	return builder.FinishedBytes()
}
</code></pre>
<p>Extracting:</p>
<p>Protocol Buffer:</p>
<pre><code class="language-go">    msg := &amp;pb.Message{}
    if err := proto.Unmarshal(b, msg); err != nil {
        // Do something
    }
</code></pre>
<p>Flatbuffer:</p>
<pre><code class="language-go">// ReadMessage reads a fb.Message represented by b. 
// Flatbuffers in Go panic if they are
// not correct, so we do a recover and return the error.
func ReadMessage(b []byte) (msg *fb.Message, err error) {
	defer func() {
		if r := recover(); r != nil {
			msg = nil
			err = fmt.Errorf(&quot;deformed message: %v&quot;, r)
		}
	}()

	return fb.GetRootAsMessage(b, 0), nil
}

// ExtractData extracts the Data message from a Message type. If .Data is not
// set, *fb.Data will be nil.
func ExtractData(b []byte) *fb.Data {
	defer func() {
		if r := recover(); r != nil {
			// Do nothing
		}
	}()

	m, err := ReadMessage(b)
	if err != nil {
		return nil
	}

	if m.Type() != fb.MessageTypeMTDataType {
		return nil
	}

	return m.Data(nil)
}

func main() {
    ...
    data := ExtractData(b)
    if data == nil {
        // Do something
    }
}
</code></pre>
<p>I'm not sure you can trust any of the fields you are using here either. I think at this point we are only sure that the flatbuffers data is within the bounds of the slice, but not that any data inside is what is expected.</p>
<p>Accessing a field is going to interpret a range of bytes in the []byte as the specified type.</p>
<p>Each function that accesses a field may still need the following to prevent a panic from escaping:</p>
<pre><code class="language-go">defer func() {
    if r := recover(); r != nil {
        // Do something
    }
}()
</code></pre>
<h2 id="butiwantthespeedsoitstillsoundsworthit">But I want the speed, so it still sounds worth it</h2>
<p>Maybe your project really needs it (I bet most uses are over optimizing the wrong portion of their code).</p>
<p>But let's move on to the next problem:</p>
<h3 id="allbaddatacausesapanic">All bad data causes a panic!</h3>
<p>Flatbuffers Go assumes that all messages come through without a problem. In C++ there is some type of verifier, but they did not port that to Go.</p>
<p>For speed reasons, they did not want to deal with returning an error and instead let panics happen.</p>
<p>I find it hard to believe that this is truly valid or worth the risk/reward/complexity.</p>
<p>At least for my use case, I have ZERO trust in packets sent to me. So now you must deal with a kind of buffer attack (bad data crashing your program) every time you read a message if you can't trust what is coming in (I don't trust myself sending packets, much less a client not under my control).</p>
<h2 id="gettersarepanicytoo">Getters are panicy too</h2>
<p>So flatbuffers allow you to access struct data sources such as:</p>
<pre><code class="language-go">field := data.StructA(nil).StructB(nil).Field1()
</code></pre>
<p>Note: The &quot;nil&quot; you see allows you to reuse a pointer for that type instead of allocating a new one. Flatbuffers are all about zero allocations.</p>
<p>However, if StructA or StructB is nil, this is going to panic. It could really use GetStructA()-style methods to prevent this kind of thing:</p>
<pre><code class="language-go">structA := data.StructA(nil)
if structA == nil {
    // Do something
}
structB := structA.StructB(nil)
if structB == nil {
    // Do something
}
field := structB.Field1()
</code></pre>
<p>Way better to have:</p>
<pre><code class="language-go">field := data.GetStructA(nil).GetStructB(nil).Field1()
</code></pre>
<p>If the Get* is too slow in your use case, you can always do it the other way. It probably isn't, especially if the code generator always passes back a global static version of that type which has no mutable fields.</p>
<h2 id="arethereothergoodalternatives">Are there other good alternatives?</h2>
<p>You really need that speed, so what are your choices?</p>
<p>There were really only two that I considered:</p>
<ul>
<li>Flatbuffers</li>
<li>Cap'n Proto</li>
</ul>
<p>I really wanted to use Cap'n Proto, but I just couldn't make myself. Cap'n Proto is written by the Proto2 author, a super smart guy who I believe only programs for fun nowadays (jealous of that!).</p>
<p>The only implementations of his that I am aware of are in C++, plus a simple one for JS. Other languages are implemented by other maintainers. I think there are 6 levels of complexity that an implementation can support. The more it supports, the more advanced features you get.</p>
<p>The Go package supports level 1. Many of the others only support level 1, giving you an idea of how complex this is. From reading, I don't think any of the other languages (except Python, which is a wrapper around the C++ lib) actually supports more than level 1.  Several say that they are alpha-level code.</p>
<p>Go's version is considered beta and is looking for a new maintainer.  So Cap'n Proto is for people who do C++ only, unless you want to use beta code that is in search of a maintainer.</p>
<p>Flatbuffers has people actively working on it, vs. a pet project whose author is only investing in the C++ version (as far as I can tell).</p>
<h2 id="okaysowhenshouldiuseflatbuffers">Okay, so when should I use Flatbuffers?</h2>
<p>With all this, you might think I believe that Flatbuffers are a bad messaging format. I don't. I do think it is niche.</p>
<p>The Go implementation of the spec probably does not benefit from panicking on bad data and having getters that can panic.</p>
<p>Flatbuffers seem to be useful when:</p>
<ul>
<li>You need more speed and you've turned all the other knobs</li>
<li>You have very few types of messages that are not complex</li>
<li>You are core infrastructure</li>
</ul>
<p>For Go I keep thinking that in all likelihood if I'm down to this dial, I should at least consider using Rust to have more dials and more efficient protocol buffers implementations before committing to Flatbuffers in Go.  While Go is wonderful and I do love the language, maybe in that case I need to have all the tools that a non-GC'd language can give me.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Proto vs encoding/json - No Contest]]></title><description><![CDATA[<div class="kg-card-markdown"><p>The standard library's JSON encoder is slow. And JSON is not efficient. BSON is where JSON should have been, at least eliminating base64 encoding to transfer bytes.</p>
<p>The non-standard library JSON encoders either require code generation (in which case you might as well use proto, as it encodes better) or can't support</p></div>]]></description><link>http://www.golangdevops.com/2021/01/29/proto-vs-encoding-json/</link><guid isPermaLink="false">6014363834a6b70162ee9bd8</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Fri, 29 Jan 2021 16:54:40 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>The standard library's JSON encoder is slow. And JSON is not efficient. BSON is where JSON should have been, at least eliminating base64 encoding to transfer bytes.</p>
<p>The non-standard library JSON encoders either require code generation (in which case you might as well use proto, as it encodes better) or can't support the full feature set.</p>
<p>I was being lazy recently and decided to use JSON for encoding an internal frame over a Unix socket.  When I compared it to gRPC, I was being destroyed in packets per second. Switching to a proto fixed this.</p>
<p>I thought it might be the base64 encoding of the []byte content.  That added a small bit of overhead, but frankly the built-in encoder is just slow.</p>
<p>I think the benchmarks speak for themselves.  I didn't even bother benchmarking the decodes; the encoding was enough to convince me.</p>
<ul>
<li>*WithStr means the data is just a string</li>
<li>*WithBytes means the data is a []byte, which in JSON gets base64 encoded</li>
</ul>
<pre><code class="language-go">BenchmarkJSONEncodingWithStr/_10kiBStr-16         	  119774	      9748 ns/op	   10914 B/op	       2 allocs/op
BenchmarkJSONEncodingWithStr/_100kiBStr-16        	   13214	     91671 ns/op	  107140 B/op	       2 allocs/op
BenchmarkJSONEncodingWithStr/_1miBStr-16          	    1292	    875316 ns/op	 1080500 B/op	       2 allocs/op
BenchmarkJSONEncodingWithBytes/_10kiBBytes-16     	  100412	     11924 ns/op	   15562 B/op	       3 allocs/op
BenchmarkJSONEncodingWithBytes/_100kiBBytes-16    	   10000	    110331 ns/op	  143101 B/op	       3 allocs/op
BenchmarkJSONEncodingWithBytes/_1miBBytes-16      	    1125	   1053090 ns/op	 1674998 B/op	       4 allocs/op
BenchmarkProtoEncodingWithBytes/_10kiBBytes-16    	  718411	      1440 ns/op	   10880 B/op	       1 allocs/op
BenchmarkProtoEncodingWithBytes/_100kiBBytes-16   	   95144	     12543 ns/op	  106496 B/op	       1 allocs/op
BenchmarkProtoEncodingWithBytes/_1miBBytes-16     	    8842	    126329 ns/op	 1032192 B/op	       1 allocs/op
</code></pre>
</div>]]></content:encoded></item><item><title><![CDATA[Socket to me: A Set Of Unix Socket Packages for Go]]></title><description><![CDATA[<div class="kg-card-markdown"><h1 id="introduction">Introduction</h1>
<p>Unix Domain Sockets are an Interprocess Communication(IPC) mechanism that is available on Linux/OSX/BSD/Windows systems.</p>
<p>Go has support for unix sockets via the net.Dial() and net.Listen() calls. However this lacks higher level wrappers that provide:</p>
<ul>
<li>Authentication/Security</li>
<li>Chunking</li>
<li>Message streaming</li>
<li>RPCs</li>
</ul>
<p>As such I</p></div>]]></description><link>http://www.golangdevops.com/2021/01/27/uds-unix-domain-socket-packages/</link><guid isPermaLink="false">6011a99634a6b70162ee9bd3</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Thu, 28 Jan 2021 05:53:10 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h1 id="introduction">Introduction</h1>
<p>Unix Domain Sockets are an Interprocess Communication(IPC) mechanism that is available on Linux/OSX/BSD/Windows systems.</p>
<p>Go has support for unix sockets via the net.Dial() and net.Listen() calls. However, this lacks higher-level wrappers that provide:</p>
<ul>
<li>Authentication/Security</li>
<li>Chunking</li>
<li>Message streaming</li>
<li>RPCs</li>
</ul>
<p>As such I am releasing a set of packages to provide these on Linux/OSX systems.</p>
<h1 id="tldr">TLDR</h1>
<ul>
<li>Packages to do IPC via unix sockets made easy
<ul>
<li>Raw sockets support</li>
<li>Chunking support</li>
<li>Streaming support (Protocol Buffers/JSON)</li>
<li>RPC support (Protocol Buffers/JSON)</li>
<li>Support OSX/Linux</li>
<li>Provides authentication</li>
</ul>
</li>
<li>Benchmarked proto RPC client/server against gRPC
<ul>
<li>Faster in most cases</li>
<li>Way less allocations</li>
</ul>
</li>
<li>Find here: <a href="https://github.com/johnsiilver/golib/tree/master/ipc/uds">https://github.com/johnsiilver/golib/tree/master/ipc/uds</a></li>
</ul>
<h1 id="tableofcontents">Table of Contents</h1>
<ol>
<li><a href="#ipcchoices">IPC Choices</a></li>
<li><a href="#examples">Examples</a><br>
a. <a href="#usingtherawstream">Raw Stream</a><br>
b. <a href="#bidirectionalystreamingwithjson">Bi-Directionally Streaming with JSON</a><br>
c. <a href="#rpcswithprotocolbuffers">RPCs with Protocol Buffers</a></li>
<li><a href="#benchmarks">Benchmarks</a></li>
</ol>
<h1 id="ipcchoices">IPC Choices</h1>
<p>There are a few flavors of IPC that a user can choose from:</p>
<ul>
<li>Unix Domain Sockets</li>
<li>Various Message Queuing (like SystemV message queues)</li>
<li>Shared memory (shm)</li>
<li>Using the IP stack on loopbacks</li>
<li>...</li>
</ul>
<p>For the highest speeds, I'd recommend a message queue. There are not a lot of recent benchmarks that I could find, but the ones from around 10 years ago point to shared memory being the fastest (by a large margin), then message queues, then unix sockets, and finally IP on loopback.</p>
<p>shm is painful to use and I have shied away from it. While I was working at Google, I would see notes in major packages about possibly using shm in the future.  I noticed that never happened in any of those packages and I'm guessing it wasn't necessary and it was painful to implement. Maybe I just haven't found the best wrapper for shm yet, but using it always looks like &quot;YIKES&quot;.</p>
<p>Message queues come in different flavors, but don't seem to have ubiquitous support. Linux and most BSDs provide sysv queues, but OSX and Windows don't. If speed is paramount and you are on supported systems, maybe message queues are the way to go.</p>
<p>Unix sockets work on all OSs (though not all features are shared) and are faster than the IP stack. In addition, with some local knowledge you can pull authentication information from connections in addition to file rights for security. With IP, you must deal with this on your own.</p>
<p>If unix sockets are fast enough for your internal processing and you are using Linux/OSX, this package is for you (I may add support for BSD and Windows).</p>
<h1 id="examples">Examples</h1>
<h2 id="usingtherawstream">Using the raw stream</h2>
<p><strong>Package</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds<br>
<strong>Example</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds/example</p>
<p>Use the raw stream if you want to implement your own io where you just need io.ReadWriteCloser types. This is probably most helpful when forwarding content you aren't reasoning about. Most users will at least want to use the higher level chunking on top of this package.</p>
<p>An example server that returns the current time in UTF-8 UTC every 10 seconds.</p>
<pre><code class="language-go">package main

import (
	&quot;fmt&quot;
	&quot;log&quot;
	&quot;os&quot;
	&quot;path/filepath&quot;
	&quot;time&quot;

	&quot;github.com/google/uuid&quot;
	&quot;github.com/johnsiilver/golib/ipc/uds&quot;
)

func main() {
	socketAddr := filepath.Join(os.TempDir(), uuid.New().String())

	cred, _, err := uds.Current()
	if err != nil {
		panic(err)
	}

	// This will set the socket file to have a uid and gid of whatever the
	// current user is. 0770 will be set for the file permissions (though on some
	// systems the sticky bit gets set, resulting in 1770).
	serv, err := uds.NewServer(socketAddr, cred.UID.Int(), cred.GID.Int(), 0770)
	if err != nil {
		panic(err)
	}

	fmt.Println(&quot;Listening on socket: &quot;, socketAddr)

	// This listens for a client connecting and returns the connection object.
	for conn := range serv.Conn() {
		conn := conn

		// We spin off handling of this connection to its own goroutine and
		// go back to listening for another connection.
		go func() {
			// We check the client's user ID to make sure it is the same as
			// ours, otherwise we reject the connection. Cred objects give you
			// the client's uid/gid/pid for filtering.
			if conn.Cred.UID.Int() != cred.UID.Int() {
				log.Printf(&quot;unauthorized user uid %d attempted a connection&quot;, conn.Cred.UID.Int())
				conn.Close()
				return
			}
			// Write to the stream every 10 seconds until the connection closes.
			for {
				if _, err := conn.Write([]byte(fmt.Sprintf(&quot;%s\n&quot;, time.Now().UTC()))); err != nil {
					conn.Close()
					return
				}
				time.Sleep(10 * time.Second)
			}
		}()
	}
}
</code></pre>
<p>An example client that connects to the server and writes whatever the server sends to stdout:</p>
<pre><code class="language-go">package main

import (
	&quot;flag&quot;
	&quot;fmt&quot;
	&quot;io&quot;
	&quot;os&quot;

	&quot;github.com/johnsiilver/golib/ipc/uds&quot;
)

var (
	addr = flag.String(&quot;addr&quot;, &quot;&quot;, &quot;The path to the unix socket to dial&quot;)
)

func main() {
	flag.Parse()

	if *addr == &quot;&quot; {
		fmt.Println(&quot;did not pass --addr&quot;)
		os.Exit(1)
	}

	cred, _, err := uds.Current()
	if err != nil {
		panic(err)
	}

	// Connects to the server at socketAddr that must have the file uid/gid of
	// our current user and one of the os.FileMode specified.
	client, err := uds.NewClient(*addr, cred.UID.Int(), cred.GID.Int(), []os.FileMode{0770, 1770})
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	// client implements io.ReadWriteCloser and this will print to the screen
	// whatever the server sends until the connection is closed.
	io.Copy(os.Stdout, client)
}
</code></pre>
<p>The server will print out the socket it is listening on, and you can pass that to the client via --addr.</p>
<h2 id="bidirectionalystreamingwithjson">Bi-Directional Streaming with JSON</h2>
<p><strong>Package</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds/highlevel/json/stream<br>
<strong>Example</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds/highlevel/json/stream/example</p>
<p><strong>Note:</strong> We also support bi-directional proto streaming; it works the same but with protos.</p>
<p><strong>Note:</strong> I'm going to omit the boilerplate from this example forward. Full code is provided in the links.</p>
<p>This example is a simple server: the client sends random words to the server, and after every 10 words the server returns a summary JSON message containing them.</p>
<p>This is of course silly, but it shows bi-directional, full-duplex JSON streaming.</p>
<p>Here's the server:</p>
<pre><code class="language-go">udsServ, err := uds.NewServer(socketAddr, cred.UID.Int(), cred.GID.Int(), 0770)
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}

fmt.Println(&quot;Listening on socket: &quot;, socketAddr)

for conn := range udsServ.Conn() {
    conn := conn

    go func() {
        // Cred checks
        ...

        // This is a summary message of the last 10 words we have received. We will
        // reuse this.
        sum := messages.Sum{
            Sum: make([]string, 0, 10),
        }

        // This wraps our conn in a stream client for reading/writing JSON.
        streamer, err := stream.New(conn)
        if err != nil {
            log.Println(err)
            conn.Close()
            return
        }

        // Receive 10 words from the client and then send back a list of the last 10
        // we got on this conn.
        for {
            m := messages.Word{}

            // Receive a JSON message from the stream.
            if err := streamer.Read(&amp;m); err != nil {
                if err != io.EOF {
                    log.Println(err)
                }
                conn.Close()
                return
            }

            // Add the contained word to our summary.
            sum.Sum = append(sum.Sum, m.Word)

            // Send back the sum once we have received 10 words. We don't wait
            // for the write to finish; we immediately go back to reading.
            if len(sum.Sum) == 10 {
                sendSum := sum
                sum = messages.Sum{Sum: make([]string, 0, 10)}
                go func() {
                    if err := streamer.Write(sendSum); err != nil {
                        conn.Close()
                        return
                    }
                }()
            }
        }
    }()
}
</code></pre>
<p>Here's the client:</p>
<pre><code class="language-go">// Connects to the server at socketAddr that must have the file uid/gid of
// our current user and one of the os.FileMode specified.
client, err := uds.NewClient(*addr, cred.UID.Int(), cred.GID.Int(), []os.FileMode{0770, 1770})
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}

streamer, err := stream.New(client)
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}

wg := sync.WaitGroup{}
wg.Add(2)

babbler := babble.NewBabbler()

// Client writes.
go func() {
    defer wg.Done()

    for {
        m := messages.Word{Word: babbler.Babble()}
        if err := streamer.Write(m); err != nil {
            if err != io.EOF {
                log.Println(err)
            }
            return
        }
        time.Sleep(500 * time.Millisecond)
    }
}()

// Client reads.
go func() {
    defer wg.Done()

    sum := messages.Sum{}
    for {
        if err := streamer.Read(&amp;sum); err != nil {
            if err != io.EOF {
                log.Println(err)
            }
            return
        }
        fmt.Printf(&quot;Sum message: %s\n&quot;, strings.Join(sum.Sum, &quot; &quot;))
    }
}()

wg.Wait()
</code></pre>
<h2 id="rpcswithprotocolbuffers">RPCs with Protocol Buffers</h2>
<p><strong>Package</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds/highlevel/proto/rpc<br>
<strong>Example</strong>: github.com/johnsiilver/golib/tree/master/ipc/uds/highlevel/proto/rpc/example</p>
<p><strong>Note:</strong> We also support RPCs using JSON, works the same but with JSON.</p>
<p>This is a simple RPC mechanism. You call an RPC method along with a proto representing the request. The server responds by calling an internal method, giving the request message a look over and responding with a protocol buffer.</p>
<p>Each call is synchronous, but you can make multiple calls at the same time, similar to gRPC.</p>
<p>Our example will be a server that prints out one of several famous quotes it has when a client queries it.</p>
<p>Server code:</p>
<pre><code class="language-go">serv, err := rpc.NewServer(socketAddr, cred.UID.Int(), cred.GID.Int(), 0770)
if err != nil {
    panic(err)
}

fmt.Println(&quot;Listening on socket: &quot;, socketAddr)

// We can reuse our requests to get better allocation performance.
reqPool := sync.Pool{
    New: func() interface{} {
        return &amp;pb.QuoteReq{}
    },
}
// We can reuse our responses to get better allocation performance.
respPool := sync.Pool{
    New: func() interface{} {
        return &amp;pb.QuoteResp{}
    },
}

// Register a method to handle calls for &quot;quote&quot;. I did this inline; normally
// you would do this in its own func block.
serv.RegisterMethod(
    &quot;quote&quot;,
    func(ctx context.Context, req []byte) (resp []byte, err error) {
        reqpb := reqPool.Get().(*pb.QuoteReq)
        defer func() {
            reqpb.Reset()
            reqPool.Put(reqpb)
        }()

        // Get the request.
        if err := proto.Unmarshal(req, reqpb); err != nil {
            return nil, err
        }

        resppb := respPool.Get().(*pb.QuoteResp)
        defer func() {
            resppb.Reset()
            respPool.Put(resppb)
        }()

        resppb.Quote = quotes[rand.Intn(len(quotes))]
        return proto.Marshal(resppb)
    },
)

// This blocks until the server stops.
serv.Start()
</code></pre>
<p>Client code:</p>
<pre><code class="language-go">// Connects to the server at socketAddr that must have the file uid/gid of
// our current user and one of the os.FileMode specified.
client, err := rpc.New(*addr, cred.UID.Int(), cred.GID.Int(), []os.FileMode{0770, 1770})
if err != nil {
    fmt.Println(err)
    os.Exit(1)
}

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

resp := pb.QuoteResp{}
if err := client.Call(ctx, &quot;quote&quot;, &amp;pb.QuoteReq{}, &amp;resp); err != nil {
    fmt.Println(err)
    os.Exit(1)
}

fmt.Println(&quot;Quote: &quot;, resp.Quote)
</code></pre>
<h1 id="benchmarks">Benchmarks</h1>
<p>So I certainly haven't benchmarked everything. But I did bench the proto RPC, as it uses the chunking RPC package, which uses chunked streams.</p>
<p>I compared this to gRPC, as it provides a managed RPC mechanism over unix sockets. gRPC was my chosen mechanism for doing IPC over unix sockets in the past.</p>
<p>No matter what my internal settings were, I beat gRPC at the 10 kB size and doubled its performance at the 102 kB size. To get better performance at large sizes, I had to add some kernel buffer space over the defaults, which led to close to double the performance.</p>
<p>But the real killer here is allocations: this package drastically reduces heap allocations compared to gRPC. Keep this in mind for high performance applications. If you delve deep into making Go fast, everything points at one thing: keeping your allocations down. Once you leave the micro-benchmark world (like these benchmarks), your app starts crawling if the GC has to deal with lots of objects. The key is buffer reuse, and gRPC's design for ease of use (and its support for much more pluggable behavior) hurts that ability.</p>
<p>To be fair to gRPC on speed, it is possible that some buffer adjustments would tune this away. I didn't go looking.</p>
<p>Benchmark platform was OSX running on an 8-core MacBook Pro, circa 2019. You can guess that your Threadripper Linux box will do much better.</p>
<pre><code>Test Results(uds):
==========================================================================
[Speed]

[16 Users][10000 Requests][1.0 kB Bytes] - min 100.729µs/sec, max 10.029661ms/sec, avg 348.084µs/sec, rps 45751.37

[16 Users][10000 Requests][10 kB Bytes] - min 282.04µs/sec, max 8.067866ms/sec, avg 685.19µs/sec, rps 23269.75

[16 Users][10000 Requests][102 kB Bytes] - min 1.512654ms/sec, max 12.380839ms/sec, avg 2.528536ms/sec, rps 6314.61

[16 Users][10000 Requests][1.0 MB Bytes] - min 9.33996ms/sec, max 65.578487ms/sec, avg 20.282241ms/sec, rps 788.33


[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 330,858

[10000 Requests][10 kB Bytes] - allocs 354,272

[10000 Requests][102 kB Bytes] - allocs 415,754

[10000 Requests][1.0 MB Bytes] - allocs 523,738

Test Results(grpc):
==========================================================================
[Speed]

[16 Users][10000 Requests][1.0 kB Bytes] - min 59.624µs/sec, max 3.571806ms/sec, avg 305.171µs/sec, rps 51137.15

[16 Users][10000 Requests][10 kB Bytes] - min 93.19µs/sec, max 2.397846ms/sec, avg 875.864µs/sec, rps 18216.72

[16 Users][10000 Requests][102 kB Bytes] - min 1.221068ms/sec, max 8.495421ms/sec, avg 4.434272ms/sec, rps 3606.63

[16 Users][10000 Requests][1.0 MB Bytes] - min 21.448849ms/sec, max 54.920306ms/sec, avg 34.307985ms/sec, rps 466.28


[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 1,505,165

[10000 Requests][10 kB Bytes] - allocs 1,681,061

[10000 Requests][102 kB Bytes] - allocs 1,947,529

[10000 Requests][1.0 MB Bytes] - allocs 3,250,309
</code></pre>
<p>Benchmark Guide:</p>
<pre><code>[# Users] = number of simultaneous clients
[# Requests] = the total number of requests we sent
[# Bytes] = the size of our input and output requests
min = the minimum seen RTT
max = the maximum seen RTT
avg = the average seen RTT
rps = requests per second
</code></pre>
</div>]]></content:encoded></item><item><title><![CDATA[Autopool: Speeding Up gRPC With Finalizers]]></title><description><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>It might be the fever talking, but I have found a new use for the much maligned runtime.Finalizer.</p>
<p>Between urgent care runs through the holidays for my family members and trying to adapt songs about Christmas into songs about coughing (aka &quot;Hard Candy Christmas&quot; becomes &quot;</p></div>]]></description><link>http://www.golangdevops.com/2019/12/31/autopool/</link><guid isPermaLink="false">5e0b99bef526eb67058e5f25</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Tue, 31 Dec 2019 19:59:08 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>It might be the fever talking, but I have found a new use for the much maligned runtime.Finalizer.</p>
<p>Between urgent care runs through the holidays for my family members and trying to adapt songs about Christmas into songs about coughing (aka &quot;Hard Candy Christmas&quot; becomes &quot;Hard Coughing Christmas&quot;), this little problem popped into my head:</p>
<p><strong>How do you use a *sync.Pool to recover gRPC protocol buffers?</strong></p>
<p>Go program speed seems linked to some combination of the number and size of allocations. Allocations cause the GC to work, and most write-ups point to this as the biggest speed difference between non-GC languages and Go.</p>
<p>We can see that Java and C# programs can often get close to Go's speed in common applications like web services, just not its memory usage.</p>
<p>Many Go programmers want to get close to C/C++/Rust speed with Go, so they spend a lot of time trying to control allocations: doing weird things with GOMAXPROCS, changing when GCs happen, or allocating large virtual memory chunks to trick the GC.</p>
<h2 id="grpcproblem">gRPC Problem</h2>
<p>gRPC is my chosen platform for RPCs, though this problem affects most Go RPC services, or any situation where object control leaves code you own and doesn't return.</p>
<p>Repeated large allocations are bad for speed, both for the time it takes to allocate the objects and the time the garbage collector spends tracking them.</p>
<p>In the old days, experts would tell you to use circular buffers built on channels to reuse your expensive objects on demand. This lowered your large allocations, but the circular buffer wouldn't automatically adjust its size: it might hold more memory than you needed, or not enough, and constantly create new objects. It needed some automation.</p>
<p>The Go authors added a standard one called sync.Pool. This is a fast free list that you can store heap objects in for reuse.</p>
<p>But here's the rub:</p>
<p>gRPC and third-party libs control when an object goes out of scope. In gRPC, this can make expensive slices un-poolable. When you create and return an output object, you cannot pool a contained []byte slice because the gRPC service takes control and you no longer have lifetime control.</p>
<h2 id="deeperlook">Deeper Look</h2>
<p>This isn't actually just a gRPC problem: any time you have to pass an object to a third-party package, you lose the ability to reuse it by pooling.</p>
<p>Here is a simple proto definition for a gRPC service with a method called Record().</p>
<pre><code class="language-protobuf">service Recorder {
   rpc Record(Input) returns (Output) {}
}
</code></pre>
<p>The code below implements the interface.</p>
<pre><code class="language-go">func (g *grpcService) Record(ctx context.Context, in *pb.Input) (*pb.Output, error) {
	out := &amp;pb.Output{}
	...
	return out, nil
}
</code></pre>
<p>The first problem is that &quot;in&quot; is a complete loss.  We can't reuse it because we cannot tell gRPC where to get its next input object.</p>
<p>The output object we create within the Record() function <strong>would</strong> be reusable, but it is returned to the gRPC service object that then has control of its lifetime.</p>
<p>gRPC is unfortunately caught in a bad position here: it cannot pool objects because it cannot control input/output object lifetimes.</p>
<p>Stubby, the Google internal version of gRPC, in the early days, had an interface that was similar to:</p>
<pre><code class="language-go">func (g *grpcService) Record(in *pb.Input, out *pb.Output) error {
	...
}
</code></pre>
<p>In this model they could have pooled, but the user would have to make copies of input/output objects if they were to live past the Record() call. This is probably why they moved to a more standard function model: most SWEs would forget that detail and end up with data race issues. This is conjecture on my part.</p>
<h2 id="autopoolauseforruntimesetfinalizer">Autopool - a use for runtime.SetFinalizer()</h2>
<p>If you've never used runtime.SetFinalizer(), good for you. People like to think of finalizers as destructors, but in a GC language that makes no guarantees on object lifetime, that thinking just leads to problems.</p>
<p>A finalizer is simply a function that is called when the object it is attached to is about to be garbage collected.</p>
<p>David Crawshaw has written good articles about finalizers being less than useful, so I will list them here and let you hear why they are bad from an expert:</p>
<ul>
<li><a href="https://crawshaw.io/blog/tragedy-of-finalizers">https://crawshaw.io/blog/tragedy-of-finalizers</a></li>
<li><a href="https://crawshaw.io/blog/sharp-edged-finalizers">https://crawshaw.io/blog/sharp-edged-finalizers</a></li>
</ul>
<p>What if we could use a finalizer to reclaim an object we have lost track of into a sync.Pool for reuse?</p>
<p>Enter <a href="https://github.com/johnsiilver/golib/tree/master/development/autopool/blend">autopool</a>.  Let's put it into the service.</p>
<pre><code class="language-go">
type grpcService struct {
	...
	pool *autopool.Pool
	rescID int
	...
}

func newGRPC() *grpcService {
	...
	serv := &amp;grpcService{}
	
	// Create our pool object.
	p := autopool.New()
	
	// Add a pool that will serve this object type and get the ID of the
	// internal sync.Pool to pull from.
	serv.rescID = p.Add(reflect.TypeOf(&amp;pb.Resource{}))
	serv.pool = p

	return serv
}

...

// Record implements the gRPC service Record() call defined in the protocol buffer.
func (g *grpcService) Record(ctx context.Context, in *pb.Input) (*pb.Output, error) {
	// Create our standard Output struct.
	var out = &amp;pb.Output{...}
	
	// The output.Resource object has a []byte, which we want to be able to reuse.
	// So we yank it from our pool and reset the []byte to 0 length. You may have
	// to reset other fields.
	out.Resc = g.pool.Get(g.rescID).(*pb.Resource)
	out.Resc.Payload = out.Resc.Payload[0:0]
	
	// Somewhere here we'd want to modify the payload.
	...

	return out, nil
}

</code></pre>
<p>What you see happening here is that when we create our output object, we pull a sub-object that contains a []byte from our Pool.</p>
<p>So why are we able to reclaim our Protocol Buffers here where we could not before?  And where is that happening?</p>
<p>autopool wraps standard sync.Pool(s) for object types you define. Pulling one of these objects works exactly like a sync.Pool, except we add a finalizer to the object that inserts it back into the pool when garbage collection tries to free it.</p>
<h3 id="butyoucannotguaranteewhenthepoolwillbeaddedto">But you cannot guarantee when the pool will be added to?</h3>
<p>That is correct, especially if you are trying to hack your GC with a lot of the tricks I see around the web. The GC runs at certain memory pressures, so autopool finalizers won't necessarily run when the object goes out of scope, or ever.</p>
<p>But on any service that is getting a constant stream of requests, this should happen often enough to keep our pool filled.</p>
<p>The cost of adding the finalizer is fairly low.</p>
<h3 id="isthisworthdoing">Is this worth doing?</h3>
<p>From what I can tell, if your service is getting enough requests to keep a sync.Pool from freeing the memory and you have payloads at around 100KiB or higher, you start to see non-trivial gains.</p>
<h3 id="whynotdothiswithallmessages">Why not do this with all messages?</h3>
<p>I gave that a try to see if it gave any benefit. I could not detect benefits based on the number of allocations alone; the size mattered.</p>
<h3 id="whynotfinalizejusttheslicesthenwouldntthatbesafer">Why not finalize just the slices then, wouldn't that be safer?</h3>
<p>You can only finalize an object created by new() or taking the address of a composite literal. Reference types don't count.</p>
<p>Since my initial problem was about gRPC and protocol buffers (and proto3 specifically), I could not wrap my buffer. Even if I did, that would not guarantee that all references to the underlying array would be clear when the wrapper went away.</p>
<p>Keith Randall on go-nuts had a cool way of finalizing a slice's array, you can read about it <a href="https://groups.google.com/d/msg/golang-nuts/VJjUyR1m9f0/xhu5EiUMBgAJ">here</a> (thanks Keith).</p>
<p>However, that method did not allow me to capture the slice itself and was banking on a loophole that he was kind enough to point out is not spec compliant.</p>
<h3 id="garbagecollectionistrickybeastareyousurethiswontcauseproblems">Garbage Collection is a tricky beast, are you sure this won't cause problems?</h3>
<p>Short answer: No</p>
<p>Longer answer:</p>
<p><strong>Using this is like the unsafe package, you better be sure on what you are doing, and even then you might get bitten in the future.</strong></p>
<p>When you manually use a sync.Pool.Put(), you are ensuring that the entire object is free for reuse otherwise you get some nasty bugs.  When an object is finalized, you have no idea if a reference to an underlying slice is held somewhere.</p>
<p>So this technique is not completely safe: you have to <strong>KNOW</strong> that no references to any slices or maps are held in the third-party code (like gRPC). When using this package you should pin or vendor the dependency in your mod file to avoid nasty surprises from changes in the upstream code.</p>
<p>gRPC seems quite happy at the moment with taking my output object and keeping no references to any of my fields once it serializes the output.</p>
<p><img src="https://d23.com/app/uploads/2016/02/780x463-muppet-treasure-island-20th-did-you-know_5.jpg" alt="long john silver"><br>
But imagine a pirate voice here:  &quot;Thar be bugs out thar!&quot;</p>
<h3 id="letsseesomenumbers">Let's See Some Numbers</h3>
<p>I thought you'd never ask:</p>
<p>We have two types of Benchmarks testing a gRPC service:</p>
<ul>
<li>Using the autopool</li>
<li>Not using the autopool</li>
</ul>
<p>My benchmark environment:</p>
<ul>
<li>Mac Pro Laptop circa 2015</li>
<li>Go 1.13</li>
</ul>
<p>Note a few things:</p>
<ul>
<li>I don't have a benchmarking machine</li>
<li>I'm not on Linux, which I am sure the go compiler makes more optimized binaries for</li>
<li>I could have done something wrong in my benchmarks. This is likely</li>
<li>I could be drawing the wrong conclusions</li>
</ul>
<p>Let's talk about what the server does:</p>
<ul>
<li>Receives a message</li>
<li>Creates an output message. That output message has a []byte field.</li>
<li>The []byte field is filled to some buffer size at 64 byte chunks at a time.</li>
<li>Sends the output message back, which drops it</li>
</ul>
<br>
<h4 id="grpcservicebenchmark">GRPC Service Benchmark</h4>
<br>
<h5 id="withoutpoolsummary">Without Pool Summary:</h5>
<table>
<thead>
<tr>
<th>Clients</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>1K</td>
<td>100K</td>
<td>1238696350</td>
<td>1703020680</td>
<td>16959429</td>
<td>0m1.965s</td>
<td>0m8.653s</td>
<td>0m1.095s</td>
</tr>
<tr>
<td>100</td>
<td>10K</td>
<td>100K</td>
<td>3634045380</td>
<td>11715026968</td>
<td>17832296</td>
<td>0m4.472s</td>
<td>0m17.188s</td>
<td>0m7.466s</td>
</tr>
<tr>
<td>100</td>
<td>50K</td>
<td>10K</td>
<td>1311663465</td>
<td>4924447952</td>
<td>1932345</td>
<td>0m2.072s</td>
<td>0m4.658s</td>
<td>0m2.731s</td>
</tr>
<tr>
<td>100</td>
<td>50K</td>
<td>100K</td>
<td>15146798078</td>
<td>49332431456</td>
<td>19474287</td>
<td>0m16.331s</td>
<td>0m40.992s</td>
<td>0m27.367s</td>
</tr>
<tr>
<td>100</td>
<td>100K</td>
<td>10K</td>
<td>2602280177</td>
<td>10144042400</td>
<td>2150441</td>
<td>0m3.793s</td>
<td>0m7.413s</td>
<td>0m5.290s</td>
</tr>
<tr>
<td>100</td>
<td>100K</td>
<td>100K</td>
<td>33020455793</td>
<td>101572491024</td>
<td>21440598</td>
<td>0m34.581s</td>
<td>1m14.539s</td>
<td>0m57.384s</td>
</tr>
<tr>
<td>100</td>
<td>3M</td>
<td>10K</td>
<td>185087993259</td>
<td>329795532600</td>
<td>9040208</td>
<td>3m7.449s</td>
<td>4m16.773s</td>
<td>7m6.174s</td>
</tr>
</tbody>
</table>
<br>
<h5 id="withpoolsummary">With Pool Summary:</h5>
<table>
<thead>
<tr>
<th>Clients</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>1K</td>
<td>100K</td>
<td>1228820620</td>
<td>1514841392</td>
<td>16515117</td>
<td>0m1.942s</td>
<td>0m8.842s</td>
<td>0m1.170s</td>
</tr>
<tr>
<td>100</td>
<td>10K</td>
<td>100K</td>
<td>3373151710</td>
<td>7457301736</td>
<td>16614532</td>
<td>0m4.187s</td>
<td>0m15.015s</td>
<td>0m6.795s</td>
</tr>
<tr>
<td>100</td>
<td>50K</td>
<td>10K</td>
<td>1208806099</td>
<td>3166273448</td>
<td>1764917</td>
<td>0m1.992s</td>
<td>0m3.987s</td>
<td>0m2.550s</td>
</tr>
<tr>
<td>100</td>
<td>50K</td>
<td>100K</td>
<td>12690099859</td>
<td>31439344120</td>
<td>17791186</td>
<td>0m13.965s</td>
<td>0m31.267s</td>
<td>0m23.260s</td>
</tr>
<tr>
<td>100</td>
<td>100K</td>
<td>10K</td>
<td>2118462469</td>
<td>5874870160</td>
<td>1890988</td>
<td>0m3.351s</td>
<td>0m5.465s</td>
<td>0m4.063s</td>
</tr>
<tr>
<td>100</td>
<td>100K</td>
<td>100K</td>
<td>29157830020</td>
<td>59277607656</td>
<td>19369437</td>
<td>0m30.743s</td>
<td>0m57.534s</td>
<td>0m52.227s</td>
</tr>
<tr>
<td>100</td>
<td>3M</td>
<td>10K</td>
<td>131743623587</td>
<td>178073871536</td>
<td>8470306</td>
<td>2m14.139s</td>
<td>2m53.578s</td>
<td>4m32.568s</td>
</tr>
</tbody>
</table>
<h5 id="conclusions">Conclusions</h5>
<br>
<h4 id="1kslices">1K Slices</h4>
<table>
<thead>
<tr>
<th>Has Pool</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>No</td>
<td>1K</td>
<td>100K</td>
<td>1238696350</td>
<td>1703020680</td>
<td>16959429</td>
<td>0m1.965s</td>
<td>0m8.653s</td>
<td>0m1.095s</td>
</tr>
<tr>
<td>Yes</td>
<td>1K</td>
<td>100K</td>
<td>1228820620</td>
<td>1514841392</td>
<td>16515117</td>
<td>0m1.942s</td>
<td>0m8.842s</td>
<td>0m1.170s</td>
</tr>
</tbody>
</table>
<p>9.87573ms decrease in op time, roughly 180 MiB allocation savings, 444,312 reduction in allocs.</p>
<p>Virtually no real time saved and a slight increase in kernel time.  I'd say that there isn't enough benefit here to warrant usage.</p>
<h4 id="10kslices">10K Slices</h4>
<table>
<thead>
<tr>
<th>Has Pool</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>No</td>
<td>10K</td>
<td>100K</td>
<td>3634045380</td>
<td>11715026968</td>
<td>17832296</td>
<td>0m4.472s</td>
<td>0m17.188s</td>
<td>0m7.466s</td>
</tr>
<tr>
<td>Yes</td>
<td>10K</td>
<td>100K</td>
<td>3373151710</td>
<td>7457301736</td>
<td>16614532</td>
<td>0m4.187s</td>
<td>0m15.015s</td>
<td>0m6.795s</td>
</tr>
</tbody>
</table>
<p>261ms decrease in op time, 4.0 GiB in allocation savings, 1,217,764 reduction in allocs.</p>
<p>Still almost no real-world savings in time, with a slight reduction in user-space and kernel-space time. I wouldn't get excited about using it here.</p>
<h4 id="50kslices">50K Slices</h4>
<table>
<thead>
<tr>
<th>Has Pool</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>No</td>
<td>50K</td>
<td>10K</td>
<td>1311663465</td>
<td>4924447952</td>
<td>1932345</td>
<td>0m2.072s</td>
<td>0m4.658s</td>
<td>0m2.731s</td>
</tr>
<tr>
<td>Yes</td>
<td>50K</td>
<td>10K</td>
<td>1208806099</td>
<td>3166273448</td>
<td>1764917</td>
<td>0m1.992s</td>
<td>0m3.987s</td>
<td>0m2.550s</td>
</tr>
</tbody>
</table>
<p>103ms decrease in op time, 1.6 GiB in allocation savings, 167,428 reduction in allocs.</p>
<p>Again, nothing to write home about here. But if you look at the runs for 100K requests, we start to see several seconds of reduction in both real time and CPU time.</p>
<h4 id="100kslices">100K Slices</h4>
<table>
<thead>
<tr>
<th>Has Pool</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>No</td>
<td>100K</td>
<td>10K</td>
<td>2602280177</td>
<td>10144042400</td>
<td>2150441</td>
<td>0m3.793s</td>
<td>0m7.413s</td>
<td>0m5.290s</td>
</tr>
<tr>
<td>Yes</td>
<td>100K</td>
<td>10K</td>
<td>2118462469</td>
<td>5874870160</td>
<td>1890988</td>
<td>0m3.351s</td>
<td>0m5.465s</td>
<td>0m4.063s</td>
</tr>
</tbody>
</table>
<p>484ms decrease in op time, 4 GiB in allocation savings, 259,453 reduction in allocs.</p>
<p>Here is where things get interesting. Real time doesn't change much, but we are spending noticeably less CPU time: we gain a second on both User and Sys.</p>
<p>So if you are averaging over 100K in byte slices, this is where it might start to help.</p>
<h4 id="3mibslices">3MiB Slices</h4>
<table>
<thead>
<tr>
<th>Has Pool</th>
<th>Buffer Size</th>
<th>Requests</th>
<th>ns/op</th>
<th>B/op</th>
<th>allocs/op</th>
<th>Real</th>
<th>User</th>
<th>Sys</th>
</tr>
</thead>
<tbody>
<tr>
<td>No</td>
<td>3M</td>
<td>10K</td>
<td>185087993259</td>
<td>329795532600</td>
<td>9040208</td>
<td>3m7.449s</td>
<td>4m16.773s</td>
<td>7m6.174s</td>
</tr>
<tr>
<td>Yes</td>
<td>3M</td>
<td>10K</td>
<td>131743623587</td>
<td>178073871536</td>
<td>8470306</td>
<td>2m14.139s</td>
<td>2m53.578s</td>
<td>4m32.568s</td>
</tr>
</tbody>
</table>
<p>53s decrease in op time, 141 GiB in allocation savings, 569,902 reduction in allocs.</p>
<p>At the far end here we can see some significant savings. We saved over a minute on our runtime and several minutes of reduction in CPU time.</p>
<p>So in the MiB region of slice size, being able to recover these slices can provide some significant savings.</p>
<h2 id="finalconclusions">Final Conclusions</h2>
<p>I <strong>think</strong> I've found a good use for finalizers that could really help speed up software where control of objects is lost to third party packages.</p>
<p>You might be asking, why I <strong>think</strong> I've found a good use.</p>
<ul>
<li>It is possible a mistake was made or an assumption went into this that is incorrect.</li>
<li>The conclusions may be wrong or attributed to another factor.</li>
<li>This is not peer reviewed.</li>
<li>This was not tested on the most popular platform, Linux. There could be optimizations there that make this moot.</li>
</ul>
<p>I would also note that I'm not recommending this use. There may be hidden gotchas I haven't thought of and it is easy to have packages outside your control change how they treat your slices. In most use cases, you give up control when you pass an object.</p>
<p>The code is published. You are welcome to duplicate my findings or show where it is incorrect.</p>
<p>Until someone does, we won't know if this was just the fever making me delusional or if I have found something interesting.</p>
<p>Until then, cheers and happy holidays.</p>
<h3 id="noteonsomegotchas">Note on some gotchas</h3>
<ul>
<li>Protocol Buffers have a Reset() function. In proto3, this simply points the pointer at a new version of the struct, which destroys the slices. This means you cannot use Reset() together with this technique.</li>
<li>If you are thinking of linking to this code, realize it is in a development branch, it is subject to change.</li>
</ul>
</div>]]></content:encoded></item><item><title><![CDATA[Not Required: gRPC Client Certs in Go]]></title><description><![CDATA[<div class="kg-card-markdown"><h2 id="alittleabouttlsuseingrpc">A little about TLS use in gRPC</h2>
<p>When you look at most examples for gRPC client and server, there are two examples given:</p>
<ul>
<li>Set grpc.WithInsecure for the client, turning off TLS</li>
<li>Create a self-signed certificate for the server and supply the public cert to the client</li>
</ul>
<p>Both of these</p></div>]]></description><link>http://www.golangdevops.com/2019/06/21/grpc-dont-supply-a-client-cert/</link><guid isPermaLink="false">5d0bbec0e688e701a14d0218</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Fri, 21 Jun 2019 21:45:16 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="alittleabouttlsuseingrpc">A little about TLS use in gRPC</h2>
<p>When you look at most examples for gRPC client and server, there are two examples given:</p>
<ul>
<li>Set grpc.WithInsecure for the client, turning off TLS</li>
<li>Create a self-signed certificate for the server and supply the public cert to the client</li>
</ul>
<p>Both of these are bad practices.</p>
<p>WithInsecure just says: &quot;Hey! Please man-in-the-middle me&quot;. There are plenty of websites you might turn up where you really don't care about hacking potential or state-sponsored manipulation. But generally, the gRPC services for your company are not going to be those.</p>
<p>Self-signed certificates are just a bad habit to get into. How are the certificates managed, are they properly secured (much harder than you think), and how often do you rotate them? Having a Certificate Authority (CA) provides you with robust mechanisms for generating, revoking and distributing your certificates and private keys.</p>
<p>Your cloud providers such as Azure, AWS, and GCP offer services for interacting with many popular CAs and storage of these secrets.</p>
<p>In addition, for smaller shops, Let's Encrypt can offer TLS certificates for free.</p>
<h2 id="whysupplyapubliccertanyways">Why supply a public cert anyways?</h2>
<p>I was recently using gRPC for a few projects.  I hadn't really used it with certificates before, as my previous experience had been with Google's internal version where this is automated away from the developer.</p>
<p>I was surprised that the examples required you to supply a certificate. Web browsers don't ship with every site's public certificate, so why should my application?</p>
<p>Not being a browser's internals expert, I simply read up on the process. A browser reaches a site with TLS, retrieves the public cert from the service, validates its certificate chain and checks to see that the root CA that signed it is in the browser's trusted CA list.</p>
<p>So a small tool for auto-retrieval of certificates from a TLS server was born:</p>
<p><a href="https://godoc.org/github.com/johnsiilver/getcert">https://godoc.org/github.com/johnsiilver/getcert</a></p>
<p>Usage is simple:</p>
<pre><code class="language-go">tlsCert, xCerts, err := FromTLSServer(&quot;service.com:443&quot;, true)
</code></pre>
<p>This call returns the TLS certificate (including the public cert, plus the intermediate and root certs if provided by the server) for use in gRPC or any other HTTPS client. In addition, it returns the chain in x509 form in case you wish to inspect the information (such as whether you trust the CA).</p>
<p>The boolean simply runs a validation check against the certificate chain to make sure that the root signs the intermediate, the intermediate signs the public cert, and that the certificate is for the endpoint specified. You may turn off validation, but that is not recommended unless you are going to do the validations yourself.</p>
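<p>If you do turn validation off, the standard library has everything needed to do the chain check yourself. A rough sketch using only crypto/x509; verifyChain is a name I made up for this sketch, and the self-signed certificate in main merely stands in for a chain you retrieved from a server:</p>
<pre><code class="language-go">package main

import (
	&quot;crypto/ecdsa&quot;
	&quot;crypto/elliptic&quot;
	&quot;crypto/rand&quot;
	&quot;crypto/x509&quot;
	&quot;crypto/x509/pkix&quot;
	&quot;fmt&quot;
	&quot;math/big&quot;
	&quot;time&quot;
)

// verifyChain approximates the validation the flag performs: confirm the
// presented chain verifies back to a trusted root for the given host.
func verifyChain(leaf *x509.Certificate, intermediates, roots []*x509.Certificate, host string) error {
	inters := x509.NewCertPool()
	for _, c := range intermediates {
		inters.AddCert(c)
	}
	trusted := x509.NewCertPool()
	for _, c := range roots {
		trusted.AddCert(c)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		DNSName:       host,
		Intermediates: inters,
		Roots:         trusted,
	})
	return err
}

func main() {
	// A throwaway self-signed cert stands in for what a server presents;
	// real use would feed the chain returned by FromTLSServer.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &amp;x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: &quot;service.com&quot;},
		DNSNames:              []string{&quot;service.com&quot;},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &amp;key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	// Self-signed: the cert acts as its own root, so this verifies.
	fmt.Println(verifyChain(cert, nil, []*x509.Certificate{cert}, &quot;service.com&quot;))
}
</code></pre>
<p>Note that Verify() here, like all stdlib validation, does no revocation checking.</p>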
<p>You can use this to supply your applications now without pulling the public certificate from storage.</p>
<h2 id="anoteonpkcs12">A note on PKCS#12</h2>
<p>I have been dealing with PKCS#12 lately, as that is what one of the CAs I use returns. As Go really likes PEM, I have been using the PKCS#12 library that is published in <a href="http://golang.org/x/crypto/pkcs12">golang.org/x/crypto/pkcs12</a>.</p>
<p>I'm grateful for the library, as I wouldn't want to have to try and unravel the RFC if there even is one. However, the library still has that &quot;in progress&quot; API feel to it and lacks some features.</p>
<p>The Decode() function, for one, cannot handle public certificate chains (and its documentation notes this). You are pushed toward another function whose return values still don't get you where you need to be.</p>
<p>It also lacks the ability to decode PKCS#12 certificates that can be generated by Windows CAs. There are a few OIDs there that are used in OCSP, and others I haven't found good documentation on. Those aren't required for the general use case.</p>
<p>I have a wrapper that should make this easier based on a vendored version of the pkcs12 package:</p>
<p><a href="https://godoc.org/github.com/johnsiilver/getcert/pkcs12">https://godoc.org/github.com/johnsiilver/getcert/pkcs12</a></p>
<p>Please be aware there are no tests here, so your mileage may vary. I'm sure I haven't compensated for all uses and encodings. If nothing else, maybe someone can use this to make a much better package.</p>
<h2 id="authornote">Author Note:</h2>
<p>I am not a TLS expert. At best I am a TLS hacker by reverse engineering code by my betters.</p>
<p>Because of that I use existing code with wrappers or slight modifications.</p>
<p>All validation comes out of the stdlib, which doesn't do revocation checking. You may wish to test for this and other things (such as you trust the CA) before using the certificate.</p>
<h2 id="abouttheauthor">About the Author</h2>
<p>John Doak is the manager of Process Engineering for the Azure Fleet Program and the Principal Automation SWE for Azure Fleet at Microsoft.</p>
<p>Previously he was a Google Staff Site Reliability Engineer, a Network Systems Engineer (a now defunct subtype of SRE for Network Systems of which he was Google’s first), and a Network Engineer (among other titles).</p>
<p>In a previous life he worked on movies and games for LucasArts/LucasFilm/ILM as a Network Engineer/Systems Admin.</p>
<p><strong>Contact</strong><br>
Website (Golang, SRE): <a href="http://www.gophersre.com">www.gophersre.com</a>,<br>
Website (Photography): <a href="http://www.obscuredworld.com">www.obscuredworld.com</a><br>
Linkedin: <a href="https://www.linkedin.com/in/johngdoak/">https://www.linkedin.com/in/johngdoak/</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[HalfPike: A framework to avoid problems with standard regexes in operational tooling]]></title><description><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>It is my contention that naked regular expression use (regexes) within SRE and DevOps software and tooling provides for:</p>
<ul>
<li>Hard to read code</li>
<li>Difficult to reason when errors occur</li>
<li>Bad matches that are assumed to be good data</li>
<li>Causing outages or unintentional consequences in tooling</li>
</ul>
<p>It is not the</p></div>]]></description><link>http://www.golangdevops.com/2019/03/07/halfpike-a-framework-to-avoid-problems-with-standard-regexes-in-operational-tooling/</link><guid isPermaLink="false">5c800093e688e701a14d01f1</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Thu, 07 Mar 2019 19:20:32 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>It is my contention that naked regular expression use (regexes) within SRE and DevOps software and tooling provides for:</p>
<ul>
<li>Hard to read code</li>
<li>Difficult to reason when errors occur</li>
<li>Bad matches that are assumed to be good data</li>
<li>Causing outages or unintentional consequences in tooling</li>
</ul>
<p>It is not the contention that regexes are bad in nature, but that their ease of use, combined with their ability to match large and variable runs of characters (wildcards), allows for problematic use cases. Not all use cases are equal.</p>
<p>This paper concentrates on a single use case with which I have experience, and introduces a framework, called the HalfPike method, to alleviate problems with that use case.</p>
<h2 id="beforegoingfurther">Before going further</h2>
<p>It should be noted that like a lexer/parser implementation, this methodology takes longer to develop.</p>
<p>You are trading these:</p>
<ul>
<li>10x longer to develop a solution</li>
<li>Much more verbosity (100x or more)</li>
<li>Must teach new people the methodology (about 20 minutes)</li>
</ul>
<p>For these advantages:</p>
<ul>
<li>Debug time is almost instantaneous</li>
<li>Code can be read and deciphered quickly</li>
<li>Validation of your data before use</li>
<li>Death to <a href="http://wiki.c2.com/?StringlyTyped">stringly typed</a> fields (was it “Enabled”, “enabled”, “ENablEd”)</li>
</ul>
<p>Many of these advantages can also be had with carefully written regexes.  But like many good programming paradigms, the structure is a guard rail that encourages best practices across large groups of people.  This encouragement is powerful in creating good software.  Regexes tend to be bad at this encouragement.</p>
<p>There is upfront cost, but I believe over time these costs are paid back with interest in time savings from debugging and operational errors.</p>
<p>Finally, the halfpike package and examples can be found <a href="http://github.com/johnsiilver/halfpike">here</a>.</p>
<p>Documentation can be found <a href="https://godoc.org/github.com/johnsiilver/halfpike">here</a></p>
<h2 id="background">Background</h2>
<p>During my years working on network automation systems for Google, we had a constant problem.  We needed to talk with network devices and extract various states from a human readable string format into concrete types in our native languages.  SNMP could not provide complete information, and streaming telemetry was still a pipe dream.</p>
<p>Of our early platforms, only Juniper routers could give us the data in a native format, XML.  The XML was quite painful to use, and IOS, IOSX, EOS, and whatever Brocade calls their OS could not export to a machine readable format like JSON or Protocol Buffers.  This would change in the future, but at that time we could not wait for the vendors to deliver.</p>
<p>For years, the common way we did this was using regular expressions.</p>
<p>This was not a well thought out decision on our part.  Most of us doing this type of work had never written a lexer or had done so long ago and were not eager to relive the pain.  Regexes were simply something that we could get off the shelf.</p>
<p>Our output was often multi-line, so the early versions either used some type of loop with detection of a starting point, where we then regexed a line we understood, or built extraordinarily hard-to-read regexes that could handle multiple lines of output.</p>
<p>Rarely did we use named matches, instead choosing to use positional arguments.  Worse yet, we assumed the matches were correct (I mean, they matched our 1 line of test data didn’t they!?).</p>
<p>One of my co-workers eventually created a framework around regexes.  His implementation allowed you to build complex multi-line regexes that would handle getting matches back as a table in which you could then use the positional arguments of the table matches to store into an object’s attributes.</p>
<p>This certainly looked like a massive improvement at first, but uses of the tooling proved hard to debug.  As with most regexes we wrote, a new version of an OS or new internal conditions might alter the output.  Because the output was provided by third-party software outside our control, there was no dictionary of terms we could use to be 100% sure of what would be returned across every version.</p>
<p>The solution to the debugging problem was to create a debugging tool to help resolve issues that arose.  However, I felt that needing a debugger for my regex matching issues was proof that this wasn’t the method we should be using (very similar to how I felt when our Python programs needed us to substitute our own malloc to make them work, #deathtopython).</p>
<p>In addition, the framework would often use overly greedy regexes to match something incorrectly and cause operational problems by providing us with bad data.  When 30 engineers might make changes, things slip through if no guard rails exist.</p>
<p>This caused configuration problems and tooling issues in different automation software when it would act incorrectly.</p>
<p>I began researching various lexing software such as ANTLR.  I was sure I was on the right track, but I just couldn’t seem to make it through more than 2 pages of the ANTLR book without falling asleep.  The only other technical book that had this effect on me was the Sendmail book where I found if I was having insomnia, I would just try to read a chapter.</p>
<p>Eventually, because of my work using Golang #deathtopython, I came across a talk by Rob Pike on lexers, and I was intrigued by Pike’s turn on the standard state machine for lexing.  It is not an easy talk to get through, but once you get the concept down (on my 3rd viewing), you appreciate the methodology.</p>
<p>However, the lexer/parser he described was not a perfect fit for us.  It was for a well described language where the syntax was completely known ahead of time.  It also is word oriented, which provides for more verbosity.</p>
<p>Writing lexer/parsers like this was also too far of a leap to bring along my SRE colleagues from “simple regexes”.  Looking at lexer/parser code is much more verbose and until you reason about them for a bit, they look harder to understand, not easier.</p>
<p>But I had hard requirements for a new system:</p>
<ul>
<li>Do not want to decipher long regexes</li>
<li>No special debuggers, when it breaks I want to know exactly why</li>
<li>Bad data can never sneak through, so everything must validate</li>
<li>Avoid <a href="http://wiki.c2.com/?StringlyTyped">stringly typed</a> fields, instead use enumeration.</li>
</ul>
<p>What emerged was a technique I called the HalfPike.  The proving ground was to be a command/configuration abstraction service I was writing.  Up to this point, different automations each wrote and read commands via a connection abstraction service to different device types.  The new service created a common way to command/configure a device with common data structures, regardless of the device type.  This allowed all automation services to use a common way to ask about BGP state, have a device ping a neighbor address or configure an interface (what <a href="http://www.openconfig.net/">OpenConfig</a> wants to do with configuration).</p>
<p>The service was written to extract and transform data from devices that had machine readable formats (JSON, XML, Protocol Buffers) into our common data types when available.  When it wasn’t, the human readable format would be converted via a HalfPike lexer/parser into the common data type.</p>
<p>In both cases, we would transform string data into enumerated types when possible.</p>
<p>Finally, we required that each HalfPike implement a Validate() method that checked each attribute for the correct values or combination of values required.  By codifying this as a requirement, we avoided misconfiguring devices with bad data; instead, the read would fail.</p>
<p>Fixes went from days to minutes and updates to systems in a few hours.</p>
<h2 id="whatisahalfpike">What Is A HalfPike</h2>
<p>The HalfPike method derives its name from borrowing half of its implementation from Rob Pike’s lexer talk.  Mr. Pike certainly has not endorsed this.</p>
<p>The HalfPike differs from Pike’s lexer/parser in the following ways:</p>
<ul>
<li>Lexed Item(s) fall into predefined categories and there is a common lexer for all use cases</li>
<li>Uses line boundaries over word boundaries in the Parser</li>
<li>May use regexes for complex decomposition of an Item or Line</li>
<li>Provides methods for skipping lines or finding a particular line in output</li>
</ul>
<p>The HalfPike methodology differs from standard regex based approaches by:</p>
<ul>
<li>Preference on enumerations over string values</li>
<li>Requiring validation of data fields</li>
<li>Encouraging numeric conversion from strings</li>
<li>Line based approach (regexes can be multi-line)</li>
</ul>
<p>None of this is 100% foolproof.  But we found that using this methodology we were never surprised by our field data.  We would need to adjust our parser for command output when we would encounter new data output we had not seen before or the format would change when a new OS version was being tested.</p>
<p>This technique was used until we could deprecate older devices unable to do structured output and/or get vendors to implement structured output for their devices.  However, many shops still have these problems (not everyone can update hardware with new OS capabilities so quickly), and the technique is usable for other system output that needs to be parsed from human readable text into machine readable structured data.</p>
<p>The original version that I used at Google was simply a codified state machine, a few interfaces and enforcement by code review.</p>
<p>Below I will provide a framework to make this as simple as possible for a new user to get into.</p>
<h2 id="halfpikelexer">HalfPike Lexer</h2>
<p>To start, if you have not seen Rob Pike’s <a href="https://www.youtube.com/watch?v=HxaD_trXwRE">Lexical Scanning in Go</a> talk, it provides a good introduction to a lexing engine used for templates.</p>
<p>Our HalfPike lexer emits items similar to the item type in Pike’s talk.  Here’s a look:</p>
<pre><code class="language-go">// Item represents a token created by the Lexer.
type Item struct {
	// Type is the type of item that is stored in .Val.
	Type ItemType
	// Val is the value of the item that was in the text output.
	Val string
}
</code></pre>
<p>We support a few ItemType(s) that the Parser will have to deal with:</p>
<pre><code class="language-go">const (
	// ItemUnknown indicates that the Item is an unknown. This should only happen on
	// a Item that is the zero type.
	ItemUnknown ItemType = iota
	// ItemEOF indicates that the end of input is reached. No further tokens will be sent.
	ItemEOF
	// ItemText indicates that it is a block of text separated by some type of space (including tabs).
	// This may contain numbers, but if it is not a pure number it is contained in here.
	ItemText
	// ItemInteger indicates that an integer was found.
	ItemInteger
	// ItemFloat indicates that a float was found.
	ItemFloat
	// ItemEOL indicates the end of a line was reached.
	ItemEOL
)
</code></pre>
<p>The lexer will only emit these tokens.  The most common will be ItemText.  But if the text contained a pure integer or pure float, these will be emitted.</p>
<p>Once an ItemEOF is reached, the lexer is done emitting tokens.  An ItemUnknown should never be emitted; if it is seen by the parser, it always indicates an internal error in the framework.</p>
<p>Spaces are never emitted nor are blank lines.</p>
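<p>To make the categorization concrete, here is roughly the rule applied to each space-separated token; classify and the trimmed-down ItemType are mine for illustration, not the package’s actual code:</p>
<pre><code class="language-go">package main

import (
	&quot;fmt&quot;
	&quot;strconv&quot;
)

// ItemType mirrors the constants above, abbreviated to the three
// categories this sketch distinguishes.
type ItemType int

const (
	ItemText ItemType = iota
	ItemInteger
	ItemFloat
)

// classify applies the rule described above to one space-separated token:
// a pure integer lexes as ItemInteger, a pure float as ItemFloat, and
// everything else (including numbers with trailing commas) as ItemText.
func classify(s string) ItemType {
	if _, err := strconv.Atoi(s); err == nil {
		return ItemInteger
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return ItemFloat
	}
	return ItemText
}

func main() {
	for _, s := range []string{&quot;1522&quot;, &quot;1522,&quot;, &quot;1000mbps&quot;} {
		fmt.Println(s, classify(s))
	}
}
</code></pre>
<p>Under this rule, “1522” lexes as an ItemInteger, while “1522,” and “1000mbps” are ItemText; that is why the parsing examples later match values with their trailing commas attached.</p>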
<p>In our HalfPike framework, the lexer is hidden.  The user simply has to deal with items output by our lexer and use the Parser framework to parse the output.</p>
<h2 id="halfpikeparser">HalfPike Parser</h2>
<p>The Parser is where all the magic comes in for the user.  Here we have our structure for storing the data we parse, helpers to skip through input, and so on.</p>
<p>But to talk about the Parser correctly, we need to talk about a few other constructs.</p>
<h3 id="lineobjects">Line Objects</h3>
<pre><code class="language-go">// Line represents a line in the input.
type Line struct {
	// Items are the Item(s) that make up a line.
	Items   []Item
	// LineNum is the line number in the content this represents,
	// starting at 1.
	LineNum int
	// Raw is the actual raw string that made up the line.
	Raw     string
}
</code></pre>
<p>The Line object details the content of a line.  .Items gives us the list of lexed Items that make up the line.  No spaces are provided, but each Line will end with either an ItemEOL or an ItemEOL followed by an ItemEOF.</p>
<h3 id="parsefn">ParseFn</h3>
<pre><code class="language-go">type ParseFn func(ctx context.Context, p *Parser) ParseFn
</code></pre>
<p>The ParseFn is where you write the meat of the program.  You receive our Parser object and use it to loop through Line objects until an ItemEOF is reached.</p>
<p>Once you have finished with a line or set of lines, you return the next ParseFn that will handle the content that follows.  This makes up a basic state machine.</p>
<p>If you return nil, then parsing stops.</p>
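<p>The loop that drives these states is handled for you by the package, but conceptually it is only a few lines.  A sketch with a stub Parser:</p>
<pre><code class="language-go">package main

import (
	&quot;context&quot;
	&quot;fmt&quot;
)

// Parser is a stub; only the state machine loop is of interest here.
type Parser struct{}

// ParseFn matches the signature above.
type ParseFn func(ctx context.Context, p *Parser) ParseFn

// run drives the machine: each ParseFn does its work and returns the next
// state; a nil return (including from Errorf) stops parsing.
func run(ctx context.Context, p *Parser, start ParseFn) {
	for fn := start; fn != nil; {
		fn = fn(ctx, p)
	}
}

func main() {
	second := func(ctx context.Context, p *Parser) ParseFn {
		fmt.Println(&quot;second state&quot;)
		return nil // done
	}
	first := func(ctx context.Context, p *Parser) ParseFn {
		fmt.Println(&quot;first state&quot;)
		return second
	}
	run(context.Background(), &amp;Parser{}, first)
}
</code></pre>
<p>Returning p.Errorf() from a ParseFn fits this loop naturally: it records the error and returns nil, stopping the machine.</p>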
<p>Within your ParseFn, you need to move through content. This is where the Parser comes in.</p>
<h2 id="parser">Parser</h2>
<p>The Parser has a few methods worth noting:</p>
<pre><code class="language-go">// Errorf records an error in parsing and returns a nil ParseFn.
func (p *Parser) Errorf(str string, args ...interface{}) ParseFn {...}
</code></pre>
<p>You will use return p.Errorf()  whenever you want to return an error and stop parsing.  It returns a nil ParseFn so you don’t have to do a separate “return nil”.</p>
<pre><code class="language-go">// Next moves to the next Line sent from the Lexer. That Line is returned. If we haven't
// received the next Line, the Parser will block until that Line has been received.
func (p *Parser) Next() Line {...}
</code></pre>
<p>Next() is our basic method of getting content.  You call Next() to receive the next Line object in the content.</p>
<pre><code class="language-go">// Backup undoes a Next() call and returns the items in the previous line.
func (p *Parser) Backup() Line {...}
</code></pre>
<p>Backup() goes back one Line of content and returns that Line.  It is often used after an initial ParseFn is used to detect the start of input but another ParseFn will do the parsing.</p>
<pre><code class="language-go">// EOF returns true if the last Item in []Item is a ItemEOF.
func (p *Parser) EOF(line Line) bool {...}
</code></pre>
<p>EOF() is used to detect if a Line is the end of the file.  Next() will continue to return the last Line once the end of content is reached.  EOF() allows you to detect and break out of any loop.</p>
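<p>Putting Next() and EOF() together, the canonical loop inside a ParseFn looks something like countLines below.  The stub Parser just replays pre-lexed lines; only the loop shape matters here:</p>
<pre><code class="language-go">package main

import &quot;fmt&quot;

// Minimal stand-ins for the package's Item/Line types.
type ItemType int

const (
	ItemText ItemType = iota
	ItemEOL
	ItemEOF
)

type Item struct {
	Type ItemType
	Val  string
}

type Line struct {
	Items []Item
}

// Parser is a stub that replays pre-lexed lines; the real Parser receives
// them from the lexer. It assumes at least one Line is present.
type Parser struct {
	lines []Line
	pos   int
}

// Next returns the next Line, continuing to return the last Line at EOF.
func (p *Parser) Next() Line {
	if p.pos &lt; len(p.lines) {
		p.pos++
	}
	return p.lines[p.pos-1]
}

// EOF reports whether the last Item in the Line is an ItemEOF.
func (p *Parser) EOF(line Line) bool {
	return len(line.Items) &gt; 0 &amp;&amp; line.Items[len(line.Items)-1].Type == ItemEOF
}

// countLines shows the canonical loop: handle lines until EOF is seen.
func countLines(p *Parser) int {
	n := 0
	for {
		line := p.Next()
		n++ // ...a real ParseFn would decompose the line here...
		if p.EOF(line) {
			break
		}
	}
	return n
}

func main() {
	p := &amp;Parser{lines: []Line{
		{Items: []Item{{Type: ItemText, Val: &quot;hello&quot;}, {Type: ItemEOL}}},
		{Items: []Item{{Type: ItemText, Val: &quot;world&quot;}, {Type: ItemEOL}, {Type: ItemEOF}}},
	}}
	fmt.Println(countLines(p)) // 2
}
</code></pre>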
<pre><code class="language-go">// Peek returns the item in the next position, but does not change the current position.
func (p *Parser) Peek() Line {...}
</code></pre>
<p>Peek() is used to see the next Line of content without moving to that Line.</p>
<pre><code class="language-go">// FindStart looks for an exact match of starting items in a line represented by Line
// continuing to call .Next() until a match is found or EOF is reached.
// Once this is found, Line is returned. This is done from the current position.
func (p *Parser) FindStart(find []string) (Line, error) {...}
</code></pre>
<p>FindStart() takes a list of strings that represent Item.Val at the beginning of a line and calls Next() until it finds a line that has a match.</p>
<p>A special constant called Skip can be used to match any content.  An error is returned if the end of content is reached and we have not found a Line with a match.</p>
<pre><code class="language-go">// FindUntil searches a Line until it matches &quot;find&quot;, matches &quot;until&quot; or reaches the EOF. If &quot;find&quot; is
// matched, we return the Line. If &quot;until&quot; is matched, we call .Backup() and return true. This
// is useful when you wish to discover a line that represent a sub-entry of a record (find) but wish to
// stop searching if you find the beginning of the next record (until).
func (p *Parser) FindUntil(find []string, until []string) (matchFound Line, untilFound bool, err error) {
</code></pre>
<p>FindUntil() is similar to FindStart(), except it stops searching when either the &quot;find&quot; or the &quot;until&quot; argument is matched.  If &quot;find&quot; is matched, the line is returned.  If &quot;until&quot; is matched, .Backup() is called and untilFound is returned as true.</p>
<p>This allows searching through entries that belong to a record for &quot;find&quot; but stopping if we find the beginning of the next record denoted by &quot;until&quot;.</p>
<pre><code class="language-go">// IsAtStart checks to see that &quot;find&quot; is at the beginning of &quot;line&quot;.
func (p *Parser) IsAtStart(line Line, find []string) bool {...}
</code></pre>
<p>IsAtStart() is the basis for FindStart() and FindUntil(). This can be used to make your own searching method if the others don’t fit.</p>
<pre><code class="language-go">FindREStart(find []*regexp.Regexp) (Line, error) {...}
IsREStart(line Line, find []*regexp.Regexp) bool {...}
</code></pre>
<p>These are the same as FindStart() and IsAtStart(), except that instead of exact string matches they use regular expressions.</p>
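<p>To make the Skip semantics concrete, here is the matching logic simplified to plain strings.  The real IsAtStart() works on Item values, and Skip’s actual value is internal to the package; the one below is made up:</p>
<pre><code class="language-go">package main

import &quot;fmt&quot;

// Skip is a sentinel meaning &quot;an item must exist here, but any value
// matches&quot;. Its real value is internal to the package; this one is made up.
const Skip = &quot;%skip%&quot;

// isAtStart reports whether find matches the first len(find) words of
// line, treating Skip entries as single-word wildcards. This is the logic
// behind IsAtStart(), simplified to plain strings instead of Items.
func isAtStart(line, find []string) bool {
	if len(line) &lt; len(find) {
		return false
	}
	for i, want := range find {
		if want == Skip {
			continue
		}
		if line[i] != want {
			return false
		}
	}
	return true
}

func main() {
	phyStart := []string{&quot;Physical&quot;, &quot;interface:&quot;, Skip, Skip, &quot;Physical&quot;, &quot;link&quot;, &quot;is&quot;, Skip}
	line := []string{&quot;Physical&quot;, &quot;interface:&quot;, &quot;ge-3/0/2,&quot;, &quot;Enabled,&quot;, &quot;Physical&quot;, &quot;link&quot;, &quot;is&quot;, &quot;Up&quot;}
	fmt.Println(isAtStart(line, phyStart)) // true
}
</code></pre>
<p>With this, a pattern like the phyStart used in the example below matches any “Physical interface: … Physical link is …” line regardless of the interface name and state.</p>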
<h2 id="letsparse">Let’s Parse</h2>
<p>Let’s have a really simple example for parsing:</p>
<pre><code>Physical interface: ge-3/0/2, Enabled, Physical link is Up
  Link-level type: 52, MTU: 1522, Speed: 1000mbps, Loopback: Disabled,
Physical interface: ge-3/0/3, Enabled, Physical link is Up
  Link-level type: 52, MTU: 1522, Speed: 1000mbps, Loopback: Disabled,
</code></pre>
<p>This is output from a Juniper router’s “show interfaces brief” command.  There is actually more output than this, but these are the two lines at the start of each entry that I care about in our example.</p>
<h3 id="datatypestostorein">Data types to store in</h3>
<pre><code class="language-go">// Interfaces is a collection of Interface information for a device.
type Interfaces []*Interface
func (i Interfaces) Validate() error {
	for _, v := range i {
		if err := v.Validate(); err != nil {
			return err
		}
	}
	return nil
}

type LinkLevel int8
const (
	LLUnknown LinkLevel = 0
	LL52 LinkLevel = 1
	LLPPP LinkLevel = 2
	LLEthernet LinkLevel = 3
)

type InterState int8
const (
	IStateUnknown InterState = 0
	IStateEnabled InterState = 1
	IStateDisabled InterState = 2
)

type InterStatus int8
const (
	IStatUnknown InterStatus = 0
	IStatUp InterStatus = 1
	IStatDown InterStatus = 2
)

// Interface is a brief description of a network interface.
type Interface struct {
	// VendorDesc is the name a vendor gives the interface, like ge-10/2/1.
	VendorDesc string
	// Blade is the blade in the routing chassis.
	Blade int
	// Pic is the pic position on the blade.
	Pic int
	// Port is the port in the pic.
	Port int
	// State is the interface's current state.
	State InterState
	// Status is the interface's current status.
	Status InterStatus
	// LinkLevel is the type of encapsulation used on the link.
	LinkLevel LinkLevel
	// MTU is the maximum amount of bytes that can be sent on the frame.
	MTU int
	// Speed is the interface's speed in bits per second.
  	Speed int

  	initCalled bool
}

// init initializes Interface.
func (i *Interface) init() {
		i.Blade = -1
		i.Pic = -1
		i.Port = -1
		i.MTU = -1
		i.Speed = -1
		i.initCalled = true
}

// Validate implements halfpike.Validator.
func (i Interface) Validate() error {
	if i.VendorDesc == &quot;&quot; {
		return fmt.Errorf(&quot;interface did not have a valid VendorDesc&quot;)
	}
	switch -1 {
	case i.Blade, i.Pic, i.Port:
		return fmt.Errorf(&quot;interface %s did not have a valid Blade/Pic/Port(%d/%d/%d)&quot;, i.VendorDesc, i.Blade, i.Pic, i.Port)
	case i.MTU:
		return fmt.Errorf(&quot;interface %s did not have a valid MTU&quot;, i.VendorDesc)
	case i.Speed:
		return fmt.Errorf(&quot;interface %s did not have a valid Speed&quot;, i.VendorDesc)
	}

	if i.State == IStateUnknown {
		return fmt.Errorf(&quot;interface %s did not have a valid state&quot;, i.VendorDesc)
	}

	if i.Status == IStatUnknown {
		return fmt.Errorf(&quot;interface %s did not have a valid status&quot;, i.VendorDesc)
	}

	if i.LinkLevel == LLUnknown {
		return fmt.Errorf(&quot;interface %s did not have a valid link level&quot;, i.VendorDesc)
	}

	return nil
}
</code></pre>
<p>Note: This is not a particularly good format.  First, I’d use protocol buffers or some other cross-language type for storage.  Second, not everything has a blade and a pic; you need a better methodology to handle those cases.  But this isn’t a lesson on network representation formats.  Also, you should be able to get structured output from the Juniper; this is just an example.</p>
<h3 id="defineourparsingstateswithparsefns">Define our parsing states with ParseFn(s)</h3>
<p>To make this transformation, we will need to create states in a state machine to handle searching output and turning it into this format.</p>
<p>I’m choosing here to bundle our ParseFn(s) into a type called interBriefParsers.  It stores the interfaces we find and a copy of our *Parser object.</p>
<pre><code class="language-go">type interBriefParsers struct {
	parser *Parser
	inters Interfaces
}
</code></pre>
<pre><code class="language-go">func (i *interBriefParsers) errorf(s string, a ...interface{}) ParseFn{
	if len(i.inters) &gt; 0 {
		v := i.current().VendorDesc
		if v != &quot;&quot; {
			return i.parser.Errorf(&quot;interface(%s): %s&quot;, v, fmt.Sprintf(s, a...))
		}
	}
	return i.parser.Errorf(s, a...)
}
</code></pre>
<p>Here we have a convenience wrapper for writing errors.  If we were able to parse the VendorDesc of an interface (like ge-0/1/1), we use that in our error output.  If not, we just detail the error.</p>
<pre><code class="language-go">var phyStart = []string{&quot;Physical&quot;, &quot;interface:&quot;, Skip, Skip, &quot;Physical&quot;, &quot;link&quot;, &quot;is&quot;, Skip}

// Physical interface: ge-3/0/2, Enabled, Physical link is Up
func (i *interBriefParsers) findInterface(ctx context.Context, p *Parser) ParseFn {
	if i.parser == nil {
		i.parser = p
	}

	// The Skip here says that we need to have an item here, but we don't care what it is.
	// This way we can deal with dynamic values and ensure we
	// have the minimum values we need.
	// p.FindREStart() can be used if you require more
	// complex matching than exact static values.
	_, err := p.FindStart(phyStart) 
	if err != nil {
		if len(i.inters) == 0 {
			return i.errorf(&quot;could not find a physical interface in the output&quot;)
		}
		return nil
	}
	// Create our new entry.
	inter := &amp;Interface{}
	inter.init()
	i.inters = append(i.inters, inter)

	p.Backup() // I like to start all ParseFn with either Find...() or p.Next() for consistency.
	return i.phyInter
}
</code></pre>
<p>Here is our starting ParseFn.  Simply we use the FindStart() to locate the first line that matches phyStart.  This is the beginning of a record for us to store.</p>
<p>Once found, we create a new Interface{} object and append it to our list of interfaces we find.</p>
<p>Finally, we do a .Backup() and pass the line to a ParseFn called phyInter to break down the line.  We could have just done this here, but I find this cleaner, at least for the first line of a record.</p>
<pre><code class="language-go">var toInterState = map[string]InterState{
	&quot;Enabled,&quot;: IStateEnabled,
	&quot;Disabled,&quot;: IStateDisabled,
}

var toStatus = map[string]InterStatus{
	&quot;Up&quot;: IStatUp,
	&quot;Down&quot;: IStatDown,
}

// Physical interface: ge-3/0/2, Enabled, Physical link is Up
func (i *interBriefParsers) phyInter(ctx context.Context, p *Parser) ParseFn {
	// These are indexes within the line where our values are.
	const (
		name = 2
		stateIndex = 3
		statusIndex = 7
	)
	line := p.Next() // fetches the next line of output.

	i.current().VendorDesc = line.Items[name].Val // this will be ge-3/0/2 in the example above
	if err := i.interNameSplit(line.Items[name].Val); err != nil {
		return i.errorf(&quot;error parsing the name into blade/pic/port: %s&quot;, err)
	}
	
	state, ok := toInterState[line.Items[stateIndex].Val]
	if !ok {
		return i.errorf(&quot;error parsing the interface state, got %s is not a known state&quot;, line.Items[stateIndex].Val)
	}
	i.current().State = state

	status, ok := toStatus[line.Items[statusIndex].Val]
	if !ok {
		return i.errorf(&quot;error parsing the interface status, got %s which is not a known status&quot;, line.Items[statusIndex].Val)
	}
	i.current().Status = status
	return i.findLinkLevel
}
</code></pre>
<p><strong>Note:</strong> There is a convenience method called .current() that gives us the current Interface{} that we are working on.</p>
<p>phyInter starts by grabbing the line via the .Next() call.  We want to record the Vendor’s description of the interface before we break it down.  There are a few constants that record the index in the Line.Items where that entry should be located.  We record the VendorDesc attribute by simply referencing the index where it should be stored.</p>
<p>Then we need to break down the vendor’s description and turn it into our Blade/Pic/Port entries.  To do this we pass that string representation to interNameSplit().  We will detail it a little further on.</p>
<p>Next, we want to get our interface state and status.  This is very similar to VendorDesc, except that we want to convert to a known enumerator type.  We define a few maps, toInterState and toStatus to handle this.</p>
<p>Finally, if we have had no issues, we return our next state, which is findLinkLevel.</p>
<p>Now, let’s go back to interNameSplit().</p>
<pre><code class="language-go">// ge-3/0/2
var interNameRE = regexp.MustCompile(`(?P&lt;inttype&gt;ge)-(?P&lt;blade&gt;\d+)/(?P&lt;pic&gt;\d+)/(?P&lt;port&gt;\d+),`)

func (i *interBriefParsers) interNameSplit(s string) error {
	matches, err := Match(interNameRE, s)
	if err != nil {
		return fmt.Errorf(&quot;error dissecting the interface name(%s): %s&quot;, s, err)
	}

	for k, v := range matches {
		if k == &quot;inttype&quot; {
			continue
		}
		in, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf(&quot;could not convert value for %s(%s) to an integer&quot;, k, v)
		}
		switch k {
		case &quot;blade&quot;:
			i.current().Blade = in
		case &quot;pic&quot;:
			i.current().Pic = in
		case &quot;port&quot;:
			i.current().Port = in
		}
	}
	return nil
}
</code></pre>
<p>Here we use a regex in a very limited capacity.  We use named matches to break the interface name apart into the interface type, the blade, the pic and finally the port.  We then convert those entries to their numerical representations and store them.  We skip the interface type, because it isn’t important here (in real life it might actually matter on a platform once you reach speeds like 40g, 100g or 400g that can be broken into multiple logical ports via breakouts).  We will simply use the port’s Speed attribute instead.</p>
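<p>The Match() function used above is a helper from the HalfPike package.  With nothing but the standard library, the same named-capture extraction could be sketched like this (match() here is an illustrative stand-in, not the package’s actual code):</p>

```go
package main

import (
	"fmt"
	"regexp"
)

// interNameRE mirrors the article's pattern: named groups for the
// interface type, blade, pic and port of a name like "ge-3/0/2,".
var interNameRE = regexp.MustCompile(`(?P<inttype>ge)-(?P<blade>\d+)/(?P<pic>\d+)/(?P<port>\d+),`)

// match returns a map of named capture groups to their matched values.
func match(re *regexp.Regexp, s string) (map[string]string, error) {
	sub := re.FindStringSubmatch(s)
	if sub == nil {
		return nil, fmt.Errorf("no match for %q", s)
	}
	m := map[string]string{}
	for i, name := range re.SubexpNames() {
		if i == 0 || name == "" {
			continue // skip the whole-match entry and unnamed groups.
		}
		m[name] = sub[i]
	}
	return m, nil
}

func main() {
	m, err := match(interNameRE, "ge-3/0/2,")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["blade"], m["pic"], m["port"]) // 3 0 2
}
```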
<pre><code class="language-go">// Link-level type: 52, MTU: 1522, Speed: 1000mbps, Loopback: Disabled,
func (i *interBriefParsers) findLinkLevel(ctx context.Context, p *Parser) ParseFn {
	const (
		llTypeIndex = 2
		mtuIndex = 4
		speedIndex = 6
	)

	line, until, err := p.FindUntil([]string{&quot;Link-level&quot;, &quot;type:&quot;, Skip, &quot;MTU:&quot;, Skip, &quot;Speed:&quot;, Skip}, phyStart)
	if err != nil {
		return i.errorf(&quot;did not find Link-level before end of file reached&quot;)
	}
	if until {
		return i.errorf(&quot;did not find Link-level before finding the next interface&quot;)
	}

	ll, ok := toLinkLevel[line.Items[llTypeIndex].Val]
	if !ok {
		return i.errorf(&quot;unknown link level type: %s&quot;, line.Items[llTypeIndex].Val)
	}
	i.current().LinkLevel = ll

	mtu, err := strconv.Atoi(strings.Split(line.Items[mtuIndex].Val, &quot;,&quot;)[0])
	if err != nil {
		return i.errorf(&quot;mtu did not seem to be a valid integer: %s&quot;, line.Items[mtuIndex].Val)
	}
	i.current().MTU = mtu

	if err := i.speedSplit(line.Items[speedIndex].Val); err != nil {
		return i.errorf(&quot;problem interpreting the interface speed: %s&quot;, err)
	}

	return i.record
}
</code></pre>
<p>Here we search through all of the record’s entries until we either find the Link-level line or we find the next record.</p>
<p>If we find the next record without finding a Link-level line, it is an error.</p>
<p>A similar conversion is done for the numeric values.  There is a speedSplit() method that converts the measured multiplier (kbps, mbps, gbps) so the value can be recorded as bps, the common denominator for all interfaces.</p>
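<p>speedSplit() itself isn’t shown in the article; a minimal sketch of the conversion it describes might look like the following (the regex, the unit table and the function name speedToBPS are assumptions for illustration):</p>

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// speedRE splits a vendor speed string such as "1000mbps," into its
// numeric value and its unit. The pattern is an assumption based on the
// sample output shown in the article.
var speedRE = regexp.MustCompile(`(?P<value>\d+)(?P<unit>[kmg]bps),?`)

// unitToBPS maps a measured multiplier down to bits per second, the
// common denominator recorded for all interfaces.
var unitToBPS = map[string]int{
	"kbps": 1_000,
	"mbps": 1_000_000,
	"gbps": 1_000_000_000,
}

// speedToBPS converts "1000mbps," style strings to a bps count.
func speedToBPS(s string) (int, error) {
	sub := speedRE.FindStringSubmatch(s)
	if sub == nil {
		return 0, fmt.Errorf("could not parse speed %q", s)
	}
	v, err := strconv.Atoi(sub[1])
	if err != nil {
		return 0, err
	}
	mult, ok := unitToBPS[sub[2]]
	if !ok {
		return 0, fmt.Errorf("unknown speed unit %q", sub[2])
	}
	return v * mult, nil
}

func main() {
	bps, err := speedToBPS("1000mbps,")
	if err != nil {
		panic(err)
	}
	fmt.Println(bps) // 1000000000
}
```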
<p>If everything is successful, we move to a state that records our record.</p>
<pre><code class="language-go">// record our data back to the parser.
func (i *interBriefParsers) record(ctx context.Context, p *Parser) ParseFn {
	i.parser.Validator = i.inters
	return i.findInterface
}
</code></pre>
<p>Here we assign our internal slice back to our parser’s Validator attribute.</p>
<p>And finally, we go searching for more interfaces going back to the start of our state machine (findInterface()).</p>
<p>Now, let’s do some parsing!</p>
<pre><code class="language-go">func main() {
	// Creates our parser object that our various ParseFn functions will use to move
	// through the input.
	p, err := NewParser(showIntBrief, Interfaces{})
	if err != nil {
		panic(err)
	}

	// An object that contains various ParseFn methods.
	states := &amp;interBriefParsers{}

	// Parses our content in showIntBrief and begins parsing with states.findInterface,
	// which is a ParseFn.
	if err := Parse(context.Background(), p, states.findInterface); err != nil {
		panic(err)
	}

	fmt.Println(pretty.Sprint(p.Validator.(Interfaces)))
}
</code></pre>
<p>Here we create our Parser via NewParser().  We pass it the content we wish to parse (showIntBrief) and what we will store the data into (Interfaces{}) which must satisfy the Validator interface.</p>
<p>Next we create an instance of our state machine and assign that to states.</p>
<p>Finally, we start our parsing by calling Parse() and pass it the start of our state machine, states.findInterface.</p>
<p>Parse() will run through all the states and call Validate() on the object that is stored in the Parser.Validator attribute.  If that fails it will cause Parse() to return an error.</p>
<p>Out of necessity, I needed to distinguish between a zero value for numbers and an attribute never being set.  I did this by including an .init() method on Interface{} objects.  This sets numeric fields to values like -1, which are not valid values for those attributes.  Validate() can then check for those values.  To ensure that .init() is called, it sets a private variable called initCalled = true.</p>
<p>When Validate() runs, it automatically fails if !initCalled.</p>
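<p>A minimal sketch of that sentinel pattern (the field names are illustrative, not the book’s actual code):</p>

```go
package main

import (
	"errors"
	"fmt"
)

// Interface records parsed attributes. Numeric fields start at -1,
// which is not a valid value, so Validate() can tell "unset" apart
// from a genuine zero.
type Interface struct {
	Blade, Pic, Port, MTU int
	initCalled            bool
}

// init sets sentinel values and proves it ran.
func (i *Interface) init() {
	i.Blade, i.Pic, i.Port, i.MTU = -1, -1, -1, -1
	i.initCalled = true
}

// Validate fails automatically if init() was never called, then
// checks that every required field was actually set.
func (i *Interface) Validate() error {
	if !i.initCalled {
		return errors.New("Interface was not created via init()")
	}
	if i.Blade < 0 || i.Pic < 0 || i.Port < 0 || i.MTU < 0 {
		return fmt.Errorf("an attribute was never set: %+v", *i)
	}
	return nil
}

func main() {
	var i Interface
	i.init()
	i.Blade, i.Pic, i.Port, i.MTU = 3, 0, 2, 1522
	fmt.Println(i.Validate()) // <nil>
}
```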
<h2 id="conclusion">Conclusion</h2>
<p>Sometimes human-readable data needs to be converted to concrete representations.  While we can work with vendors and upstream providers to get machine-readable output, sometimes we just need to get work done in the meantime.</p>
<p>To this end, regexes tend to be problematic in large development environments, where data quality can affect operational reliability.  Bad data is worse than no data in this regard.  Bad assumptions lead to bad operations that can be catastrophic.</p>
<p>The HalfPike is one way to mitigate those issues in a way that is easy to reason about and to diagnose.</p>
<p>Happy coding and may your pager stay silent!</p>
<h2 id="abouttheauthor">About the Author</h2>
<p>John Doak is the manager of Process Engineering for the Azure Fleet Program and the Principal Automation SWE for Azure Fleet at Microsoft.</p>
<p>Previously he was a Google Staff Site Reliability Engineer, a Network Systems Engineer (a now defunct subtype of SRE for Network Systems of which he was Google’s first), and a Network Engineer  (among other titles).</p>
<p>In a previous life he worked on movies and games for LucasArts/LucasFilm/ILM as a Network Engineer/Systems Admin.</p>
<p><strong>Contact</strong><br>
Website (Golang, SRE): <a href="http://www.gophersre.com">www.gophersre.com</a>,<br>
Website (Photography): <a href="http://www.obscuredworld.com">www.obscuredworld.com</a><br>
Linkedin: <a href="https://www.linkedin.com/in/johngdoak/">https://www.linkedin.com/in/johngdoak/</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Go Language Basics]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Today I am announcing a Go Language Basics class.  The class is provided as a free video and accompanying class materials in PDF format.</p>
<p>The class is meant to bring new programmers up to speed with Go.  It is heavy on exercises and because it's video, it provides animated illustrations</p></div>]]></description><link>http://www.golangdevops.com/2018/12/28/go-language-basics/</link><guid isPermaLink="false">5c2658a1e688e701a14d01ca</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Fri, 28 Dec 2018 22:24:26 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Today I am announcing a Go Language Basics class.  The class is provided as a free video and accompanying class materials in PDF format.</p>
<p>The class is meant to bring new programmers up to speed with Go.  It is heavy on exercises and because it's video, it provides animated illustrations of slice internals, pointers, ...</p>
<p>This class provides similar materials to what I taught at Google and Microsoft around the globe.</p>
<p>The video series can be found here:<br>
<a href="http://www.golangbasics.com/">http://www.golangbasics.com/</a></p>
<p>The class syllabus:</p>
<ul>
<li>Why use Go</li>
<li>Why not to use Go</li>
<li>Packages</li>
<li>Types</li>
<li>Variables</li>
<li>Loops</li>
<li>Conditionals</li>
<li>Functions</li>
<li>Public/Private</li>
<li>Scopes</li>
<li>Structs</li>
<li>Pointers</li>
<li>Maps/Slices</li>
<li>Variadic Functions</li>
<li>Error Handling</li>
<li>Anonymous Functions</li>
<li>Defer/Panic/Recover</li>
<li>Interfaces</li>
<li>Go Routines</li>
<li>Channels</li>
<li>Synchronization</li>
<li>Constants</li>
<li>Blank Interfaces</li>
<li>Embedding/Composition</li>
<li>Writing Tests</li>
</ul>
</div>]]></content:encoded></item><item><title><![CDATA[To Hell With Testing and Documentation!]]></title><description><![CDATA[What the worst projects say!]]></description><link>http://www.golangdevops.com/2018/07/05/to-hell-with-testing-and-documentation/</link><guid isPermaLink="false">5b3d973de688e701a14d00fa</guid><category><![CDATA[#devops]]></category><category><![CDATA[#sre]]></category><category><![CDATA[#swe]]></category><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Thu, 05 Jul 2018 08:07:32 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="whattheworstprojectssay">What the worst projects say!</h2>
<p><img src="https://media.giphy.com/media/lWnWVVvNLL9hC/giphy.gif" alt="What I think while writing tests"></p>
<p>Have you heard any of these:</p>
<ul>
<li>What's a test?</li>
<li>I don't have the time to write a test!</li>
<li>We have enough internal knowledge, we don't need to document.</li>
<li>I'd rather be working on code, not documenting!</li>
</ul>
<p>When I talk to people about their projects, or when trying to decide which third-party projects to rely on, I often look at testing and documentation methodologies.</p>
<p>The projects that I come across without adequate testing and documentation are the ones I find problematic over time to deal with.  There are certainly other things that make a project bad to rely on, but these two are usually sure signs.</p>
<p>And the engineers always have excuses for why they don't have these things, but I find that it usually comes down to a single thing:</p>
<p><strong>Writing tests and documentation sucks.</strong></p>
<p>Occasionally, I have met the individuals who love writing tests.  I am not one of them.  Most programmers I have met would much rather be building something instead of validating that something works or writing documentation.  The best programmers I have met play with ideas, but start writing their mental models into documentation before code starts becoming a project.</p>
<p>Like wearing a seatbelt is critical for safety, writing tests and documentation is critical for long-term success and velocity.  That I don't like wearing a seatbelt, writing documentation or writing tests has no effect on the validity of that statement.</p>
<h2 id="wannawatchvelocitydiedontwritetests">Wanna watch velocity die? Don't write tests</h2>
<p>So I will occasionally hear about how writing tests causes velocity to slow.  I always get a chuckle out of this.</p>
<p>There are two things that you can do to guarantee that your code velocity will die:</p>
<ul>
<li>Use a dynamically typed language like Python or Node.js</li>
<li>Don't write tests</li>
</ul>
<p>I'm not going to spend time on dynamic typing here, but basically, you are just waiting for runtime bugs to occur that could have been caught at compile time.  Yeah it's fast at first.  That doesn't last.</p>
<p>Without tests though, how do you know your functions and methods even work?  When someone makes a change to the code, how do you know that those 500 lines work the way you think they should?</p>
<p>When the new person comes in and makes a change, how do you know the change they made is good?  If you are in a dynamic language, hell you can't even be sure they are returning the right type!  If you are in a typed language, they might be returning the right type, but doesn't mean they are returning the right value.</p>
<p>Can you look at complicated code from 3 years ago you wrote and be sure in a review, without tests, that the code change is good?  If you said &quot;yes&quot;, well you are certainly better than me.  Matter of fact, you are better than all the programmers I've met, and I've met some of the best.</p>
<p>Now there may be things you can't test, or where the time to write a test vs. the payoff is too low.  Game engines probably have display code that is really hard to test and isn't worth it.  But I have also seen game programmers write virtually no tests for &quot;velocity&quot;.  Having met some others from the game industry, this seems like a norm (certainly haven't done a study, it's just hearsay with a few people's observations).</p>
<p>The long-term results are always the same, each change keeps accumulating small bugs that would have been caught with adequate testing.  The larger the code base, the more of these that will creep in.  And velocity will go off a cliff.</p>
<p><img src="https://i.makeagif.com/media/11-22-2015/fIs5Bn.gif" alt="bye bye"></p>
<h2 id="dontdocumentwhatsthatsmellohrightburningmoney">Don't document? What's that smell? Oh right, burning money</h2>
<p><img src="http://www.reactiongifs.com/r/tma.gif" alt="I love the smell of burning money in the morning, smells like Python"></p>
<p>There are two types of documentation, both are important:</p>
<ul>
<li>Code documentation</li>
<li>Software documentation</li>
</ul>
<p>These two very separate beasts but are critical to understanding software.</p>
<p>Code documentation refers to comments within the code that can be extracted to understand libraries that make up software.  Godoc/Pydoc/... offer structure to write this type of documentation and extract it to allow understanding of library structure.</p>
<p>Software documentation is a level up from code documentation.  It explains the overall architecture of the software, how to use the software, etc...</p>
<p>If you are a small company, these can be critical.  Hiring new people or replacing people on a project becomes a long onboarding exercise without software/code documentation.</p>
<p>If you are a large company, this makes handing off software between teams difficult.  And it incurs the same costs as a small company, though these are typically hidden by company scale.  But nevertheless, it is costing you money.</p>
<p>Some engineers like the idea of being the only person who can fix a problem or use lack of documentation as a way to have control.  Managers sometimes also encourage this behavior.  It is like job security through obscurity.  These are typically bad behaviors for the engineers and the company that should be discouraged.</p>
<h2 id="youhavedocumentationbutitisgatheringdust">You have documentation, but it is gathering dust</h2>
<p><img src="https://uploads.disquscdn.com/images/3573dabef7f0322e562c08094303c24526846ec3a6f78d3c20cfc2b387ab8968.gif" alt="Ah, documentation"></p>
<p>This is one of the most common problems I see.  Large companies, small companies, ... documentation gets out of date.</p>
<p>Usually, this is at the software documentation level.  If the initial documentation was good, then this usually isn't because the team didn't care.  Usually, it is because it is an afterthought.</p>
<p>This is a hard problem to solve.  My general belief is that to fix this, you have to put documentation next to the code.  Both have to be checked into version control.</p>
<p>This does a few things which I think are important:</p>
<ul>
<li>Documentation is always seen.  To ignore updates, usually, the engineers have to make a conscious choice.</li>
<li>Allows you to create documentation metrics alongside code change metrics.</li>
</ul>
<p>The first brings awareness of the documentation to the engineer.  Otherwise, this usually is a secondary thought, if thought of at all.  The code reviewer (and if you don't have these, you are in trouble already) also has the opportunity to stop the submission without a documentation update.</p>
<p>The second is critical.  Without tracking of metrics, teams cannot be aware when large deviations between code updates and software updates have occurred.  This can be leveraged for company-wide tracking of documentation.</p>
<h2 id="incentivizing">Incentivizing</h2>
<p><img src="http://leveleleven.com/wp-content/uploads/2013/12/Simpsons.gif" alt="Just thought it was funny"></p>
<p>I once witnessed a new management system for an operations group come online.  This system required adding lots of assets manually.  But adding these entries didn't help the engineers in any way.</p>
<p>I tried to explain to the manager in charge of integrating the system into our processes that this wasn't going to work. The system required the engineer to do twice the work with no benefit to them.  The manager replied, &quot;they will have to because management is dictating it must be done&quot;.  I looked back at the guy like he was nuts and he looked at me as if this was a sure thing.  One of my failures at that time was not having the skills to communicate with him in a way that would have led to the right outcome.  By this time there was a lot of pressure to get the system out the door.  To have gotten this right, the software team would have needed another year of work.</p>
<p>Looking back, the politics of delaying any launch would not have been in his favor.  It was easier to believe that mandates would work.  If the system failed to be used, the blame would not fall on him.  If he caused delays on the launch, however, he would suffer repercussions.  The dev team would have been upset that the correct requirements had not been pushed to them.  The Ops team members who were supposed to be shepherding development would have been on the hook for some blame.  So...</p>
<p>The system came online and within a week the diff between the live state and the recorded state was thousands of pages.  This mistake on human behavior on the part of management and the software designers cost the company tens of millions of dollars over the years.  Not only in the hacks that were required to keep the data up to date but the huge cost of removing the hacks (which took years of work).</p>
<p>Human behavior often dictates the success or failure of something within an organization, not technology.  Not that mandates cannot help a situation, but it certainly won't guarantee it.</p>
<p>The key to any organization change is encouraging the right behavior.  Management saying &quot;you must document&quot; or &quot;you must test&quot; will not get you the desired outcome.  The people writing your software must see the value and be incentivized to do it.</p>
<p>I've always liked the saying that you can have something &quot;good, fast or cheap.  Choose two!&quot;  If you want to get good documentation, you will have to incentivize it and lose either the &quot;fast&quot; or the &quot;cheap&quot; option.</p>
<p>How you incentivize will be organizationally dependent.  But using metrics and adding in compliance checking can help.  However only if you are giving the right incentives and promoting the right software culture.  If you keep asking for &quot;fast&quot; and &quot;cheap&quot;, you will keep reaping the same long-term problems.</p>
<h2 id="sowhatcanyoudo">So what can you do?</h2>
<p>Well, you have a few problems to solve:</p>
<ul>
<li>Increasing the visibility of documentation to encourage regular updates</li>
<li>Incentivizing documentation updates and tracking documentation drift</li>
<li>Incentivizing testing and providing metrics around code coverage</li>
<li>Changing the culture</li>
</ul>
<p>The first one is easy and has multiple ways to do it.  If I was to give a recommendation, I would use markdown and store it close to the code.  Multiple languages can render it in whatever style you choose, it is easy to index, and for every project you know where the documentation is.</p>
<p>This allows you to track how often documentation changes alongside code changes.  This can give you warning signs on documentation drift.</p>
<p>The second and third are harder and require some personalization for the particular company.  Having design documentation as part of promotion consideration could be one way (though sometimes this can incentivize the wrong thing, I've seen this go sideways).  Preventing product launches without up-to-date documentation and code test coverage of, say, 70% can be another.  Additional bonuses based on documentation/testing metrics could add another early incentive.  Hiring in people with strong positions on these things can also help turn a culture around.  Remember that the idea is to create the culture and eventually it will self-reinforce.</p>
<p>The key thing to understand is:</p>
<p><strong>The long-term viability of a project can often be tracked to documentation and testing.</strong></p>
<p>Whatever you can do to make that better will help you in the long run.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Using No-Op Objects For Cleaner Code]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Sooner or later every Go programmer will want to extract an object from another object such as Context or utilize an object that can be set to nil throughout the call stack.  However, this can lead to ugly code that looks like:</p>
<pre><code class="language-go">func someFunc(obj *MyObj) { 
    if obj != nil {
        DoThis(</code></pre></div>]]></description><link>http://www.golangdevops.com/2018/04/15/using-no-op-objectx/</link><guid isPermaLink="false">5ab290311fc9e76c71d7ffbd</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Sun, 15 Apr 2018 22:51:33 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Sooner or later every Go programmer will want to extract an object from another object such as Context or utilize an object that can be set to nil throughout the call stack.  However, this can lead to ugly code that looks like:</p>
<pre><code class="language-go">func someFunc(obj *MyObj) { 
    if obj != nil {
        DoThis()
    }
  
    otherFunc(obj)
}

func otherFunc(obj *MyObj) {
    if obj != nil {
        DoThis()
    }
    ...
}
</code></pre>
<p>In this case, if our <em>obj != nil</em>, we want to perform an action.</p>
<p>A use case I recently ran into was the extraction of a *Tracer object from a Context in an RPC call.  A Tracer object not being part of the Context is a valid state we want to support, as not all executions should receive a Tracer.</p>
<p>The basic structure of the tracer library looked similar to this:</p>
<pre><code class="language-go">// Extract a Tracer from a Context object that is at key.
// Note: I'm using a string as the key type for brevity, best
// practices say use a custom type.
func FromContext(ctx context.Context, key string) *Tracer {
    v := ctx.Value(key)
    if v == nil {
        return nil
    }
    return v.(*Tracer)
}

// NewContext adds a Tracer to the Context at key.
func NewContext(ctx context.Context, key string, t *Tracer) context.Context {
    ...
}

// Tracer provides execution tracing for RPCs.
type Tracer struct{
    ...
}

// NewTracer creates a new Tracer object that logs to fPath.
func NewTracer(fPath string) (*Tracer, error){
    ...
}

// FuncTimer creates a timer and immediately logs the function entrance
// time.  This should be the first line in a function call.
func (*Tracer) FuncTimer() *Timer {
    ...
}

// Close closes the Tracer and any open files.
func (*Tracer) Close() {
    ...
}

// Timer provides methods for logging timing information for function calls.
type Timer struct{
    ...
}

// Close stops the Timer and logs the function exit and duration.
// This is normally used in a defer as the second line in a function call.
func (t *Timer) Close() {
    ...
}
</code></pre>
<p>The most obvious way of using this library would be:</p>
<pre><code class="language-go">func myFunc(ctx context.Context) {
    t := FromContext(ctx, &quot;someKey&quot;).FuncTimer()
    if t != nil {
        defer t.Close()
    }
    ...
}
</code></pre>
<p>While this isn't too burdensome, I don't like the idea of having to test against nil in 40 different function calls or having the possibility that a call to the Timer could ever cause a panic.</p>
<p>In addition, if I was to extend Timer to have a method like:</p>
<pre><code class="language-go">func (t *Timer) Logf(s string, a ...interface{}) {
    ...
}
</code></pre>
<p>I would then have a lot more burden:</p>
<pre><code class="language-go">func myFunc(ctx context.Context) {
    t := FromContext(ctx, &quot;someKey&quot;).FuncTimer()
    if t != nil {
        defer t.Close()
    }
    ...
    if t != nil {
        t.Logf(&quot;this is interesting&quot;)
    }
    ...
    if t != nil {
        t.Logf(&quot;this too&quot;)
    }
}
</code></pre>
<p>We can take the burden away from the user simply by providing no-op Timer<br>
methods.</p>
<p>Instead of the previous Logf() and Close() options as written above, we can add the following to them to allow no-op calls:</p>
<pre><code class="language-go">func (t *Timer) Logf(s string, a ...interface{}) {
  if t == nil {  // Test to see if we are a *Timer that is nil.
      return
  }
  ...
}

func (t *Timer) Close() {
  if t == nil {
      return
  }
  ...
}
</code></pre>
<p>This allows us to shrink our code to:</p>
<pre><code class="language-go">func myFunc(ctx context.Context) {
    t := FromContext(ctx, &quot;someKey&quot;).FuncTimer()
    defer t.Close()
    ...
    t.Logf(&quot;this is interesting&quot;)
    ...
    t.Logf(&quot;this too&quot;)
}
</code></pre>
<p>Regardless of the presence of &quot;someKey&quot; on the Context object, a *Timer will always be returned.  In the case of a nil *Timer, no code will panic and the code flow will be easier to follow.</p>
<p>When using interfaces instead of concrete types, no-op objects can be implemented as well. Simply implement the interface with a type that has no-op methods and provide it when a value isn't available.</p>
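<p>For example, a hypothetical Logger interface could be given a no-op implementation like this (the names here are illustrative, not part of the tracer library):</p>

```go
package main

import "fmt"

// Logger is the interface our code depends on.
type Logger interface {
	Logf(format string, a ...interface{})
}

// nopLogger satisfies Logger but does nothing.
type nopLogger struct{}

func (nopLogger) Logf(string, ...interface{}) {}

// realLogger is the working implementation.
type realLogger struct{}

func (realLogger) Logf(format string, a ...interface{}) {
	fmt.Printf(format+"\n", a...)
}

// loggerFromContextish stands in for a lookup that may come up empty:
// instead of returning nil, it returns the no-op implementation so
// callers never need a nil check.
func loggerFromContextish(have bool) Logger {
	if !have {
		return nopLogger{}
	}
	return realLogger{}
}

func main() {
	l := loggerFromContextish(false)
	l.Logf("dropped silently") // no-op, no nil check needed.
	l = loggerFromContextish(true)
	l.Logf("hello %s", "tracer") // prints: hello tracer
}
```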
<p>No-Ops can be powerful tools in your developer toolbelt.  Don't forget to use them.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Unbounded Queue: A tale of premature optimization]]></title><description><![CDATA[<div class="kg-card-markdown"><h2 id="asimplestart">A simple start</h2>
<p>When you work in this industry for a while, if you're introspective, you can discover what type of programmer you are.  You might be a programmer obsessed with performance or trying to find clever tricks.</p>
<p>I am neither of those.  While I do like to find new</p></div>]]></description><link>http://www.golangdevops.com/2017/12/29/unbounded-queue/</link><guid isPermaLink="false">5a4336c41fc9e76c71d7fef7</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Sat, 30 Dec 2017 03:23:42 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="asimplestart">A simple start</h2>
<p>When you work in this industry for a while, if you're introspective, you can discover what type of programmer you are.  You might be a programmer obsessed with performance or trying to find clever tricks.</p>
<p>I am neither of those.  While I do like to find new techniques or simpler ways to do something, I generally prefer to steal from others who are much smarter than I.  I dislike tracking down performance problems in systems unless it is necessary.  I tend to be interested in delivering a product that functions with reasonable performance characteristics.  I do not analyze assembly output in order to add custom assembly code to take advantage of a specific Intel instruction that does xyz faster (but thank God for a few people I've met who do).  I rarely think that is necessary.</p>
<p>So, for reasons unknown to me, I have ended up going down a rabbit hole of exploring some Go concepts in order to make something faster and use less memory.  Let me explain.</p>
<p><img src="http://4.bp.blogspot.com/-BQQvUj4Z06k/Uf7VreFe0rI/AAAAAAAAWFo/gloJu52YpT4/s1600/Screen+Shot+2013-08-04+at+23.28.44.png" alt="rabbit hole" title="rabbit hole"></p>
<p>Recently I have had a need for an unbounded queue.  Buffered channels are my go to for a standard queue.  And they are nice when you are okay with either:</p>
<ul>
<li>blocking when the buffer is full</li>
<li>dropping an update when the buffer is full</li>
</ul>
<p>Neither of those situations was going to work in this case.  In reality, the queue's use would have a bound, but I would not know what that bound was ahead of time.  I could not drop an update, have an update out of order, or block for an update.  However, I could deal with higher memory use while we built up to whatever the bound would be.</p>
<p>What I wanted was something that:</p>
<ul>
<li>FIFO if single reader</li>
<li>pushed new items fast</li>
<li>read items fast</li>
<li>grew when the buffer became full</li>
<li>shrank when buffer space was not required</li>
<li>was unbounded</li>
<li>provide methods that could pop if possible or return !ok</li>
<li>provide methods that blocked until it could pop</li>
<li>if an above method was waiting, it would not eat up the CPU</li>
<li>no goroutines</li>
<li>solved a generic use case, so []interface{}</li>
</ul>
<p><strong>Note on being Generic:</strong>  <em>Yeah, I hate []interface{}.  Empty interface{} conversion is a time suck and I hate not being typed at compile.  However, I'm anti-generics with the proposals I've seen, they are just ugly and I don't want to spend 6 weeks explaining them to newbies.  I'm fairly convinced that with the brain power behind trying to come up with generics in Go that if there was a good answer it would have happened.  They will either screw up compile time or make the code more difficult to understand. With that said, this code is probably ripe for using <a href="https://github.com/cheekybits/genny">Genny</a> or some other type of pre-compiled code generator to save you.</em></p>
<p>To test implementations, I eventually decided on an interface I wanted for testing:</p>
<pre><code class="language-go">type Buffer interface{
    // Push an item on to the buffer. If we have reached maximum
    // buffer size, return false.
    Push(item interface{}) (ok bool)
    // Force is like Push() except it will block until we can push onto the buffer.
    Force(item interface{})
    // Pop retrieves a value from the buffer. It returns false if the buffer
    // is empty.
    Pop() (val interface{}, ok bool)
    // Pull is like Pop() except it will block until it can retrieve a value.
    Pull() interface{}
}
</code></pre>
<p>This isn't the most Goish interface.  Pull() for example could return a channel that keeps returning values until another method like Close() was called.  But I wanted to avoid spinning off goroutines.  My final version might include these.</p>
<p>But before I go any further, other than Attempt #1, everything I am doing here is wrong.  This is a near-perfect example of over-optimizing.  When writing an application you should concentrate on:</p>
<ul>
<li>Application structure</li>
<li>Nice interfaces to data structures that can have the internals switched out</li>
<li>Good data structures</li>
</ul>
<p>Performance optimization should come after testing, tracing, etc... of bottlenecks that are unacceptable in your application.  Good data structures lead to more optimized algorithms if you need them.  And if you abstract your interfaces cleanly, you can always optimize behind the scenes.</p>
<p>For reference, you can find the various code that I show or don't show here:</p>
<p>Different versions of the FIFO queue:<br>
<a href="http://github.com/johnsiilver/golib/queue/fifo/unbounded/experimental/">http://github.com/johnsiilver/golib/queue/fifo/unbounded/experimental/</a></p>
<p>The final version:<br>
<a href="http://github.com/johnsiilver/golib/queue/fifo/unbounded/">http://github.com/johnsiilver/golib/queue/fifo/unbounded/</a></p>
<h2 id="attempt1notbeingclever">Attempt 1: Not Being Clever</h2>
<p>My first crack at this was to avoid being clever at all and use Go in the most simple way possible.</p>
<p>This involved:</p>
<ul>
<li>using a slice to hold the queue</li>
<li>using a slice's sliding window over the array to dequeue entries</li>
<li>using a mutex to guard enqueues and dequeues on the slice</li>
<li>using a channel so blocking calls can wait for the next value instead of needing sleep loops</li>
</ul>
<p>Here's what the first attempt looked like:</p>
<pre><code class="language-go">package unbounded

import &quot;sync&quot;

// Buffer provides a first in first out non-blocking (but not lock-free) Buffer.
type Buffer struct {
    mu     sync.Mutex
    nextCh chan interface{}
    data   []interface{}
}

// New is the constructor for Buffer.
func New() *Buffer {
    return &amp;Buffer{nextCh: make(chan interface{}, 1)}
}

// Push adds an item to the Buffer.
func (q *Buffer) Push(item interface{}) {
    q.mu.Lock()
    q.data = append(q.data, item)
    q.shift()
    q.mu.Unlock()
}

// Force will push the item onto the buffer and blocks until the operation completes.
func (q *Buffer) Force(item interface{}) {
    q.Push(item)
}

// shift will move the next available entry from our queue into the channel to
// be returned to the user. Must be locked by a calling function.
func (q *Buffer) shift() {
    if len(q.data) &gt; 0 {
        select {
        case q.nextCh &lt;- q.data[0]:
            q.data = q.data[1:]
        default:
        }
    }
}

// Pop returns the next value in the buffer.  It returns ok == false if there
// is no value in the buffer.
func (q *Buffer) Pop() (val interface{}, ok bool) {
    select {
    case item := &lt;-q.nextCh:
        q.mu.Lock()
        q.shift()
        q.mu.Unlock()
        return item, true
    default:
        return nil, false
    }
}

// Pull pulls an item off the circular buffer. It blocks until a value is found.
func (q *Buffer) Pull() interface{} {
    item := &lt;-q.nextCh
    q.mu.Lock()
    q.shift()
    q.mu.Unlock()
    return item
}
</code></pre>
<p>This one worked well enough.  But it got me thinking about other ways to do this.  I have no idea why; I should have just stopped.  I had a good data structure for what I needed.  I could optimize later if I needed to.  But for some reason, I decided I wanted to waste a day, very unlike me.  Sorry, time to dive head first into the rabbit hole.</p>
<p><img src="https://bettinahorvath.files.wordpress.com/2010/02/down_the_rabbit_hole_by_cyril_helnwein.jpg?w=600&amp;h=400&amp;zoom=2" alt="dive into the rabbit hole" title="dive into the rabbit hole"></p>
<p>First, for whatever crazy reason, I dislike using Mutex. In recent times I've begun using atomic.Value and other atomic operations to protect constantly changing values.  I've benchmarked these and found them to provide significant speedups if you hammer the variable with changes.  But that got me thinking about just doing atomic.CompareAndSwap* for my Mutex needs.</p>
<p>Second, there are a lot of array copies happening in this version.  Array copies are very fast<sup>1</sup>, but I don't like doing quite so many (this irked me, but it has no bearing on the efficiency of the method).  Wouldn't some form of circular buffer be faster?  Of course, it would need to be a circular buffer that didn't have a set size.</p>
<p><sup>1</sup><strong>Note:</strong> I saw a talk in the last year from someone showing that array copies were faster than heap allocations and pointer swaps by some crazy multiple.  I remember his evidence being fairly good and for this reason, I concentrated on either using channels for the Buffer or slices.  However, there are two old sayings from Google that apply here:</p>
<ul>
<li>In God we trust, everyone else must bring the data</li>
<li>Trust but verify</li>
</ul>
<p>I'm sure these originate somewhere else, for anyone who is really nitpicky, but Google is the first place I heard them said.</p>
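<p>In that spirit, bringing your own data is cheap.  A micro-benchmark along these lines (my own sketch, not the benchmark from the talk) lets you measure slice copies against heap-allocated list records on your hardware with <code>go test -bench=.</code>:</p>
<pre><code class="language-go">package queuebench

import &quot;testing&quot;

// sliceDrain enqueues n ints into a slice and drains it with a sliding
// window, returning the sum of dequeued values.
func sliceDrain(n int) int {
    var q []int
    for j := 0; j &lt; n; j++ {
        q = append(q, j)
    }
    sum := 0
    for len(q) &gt; 0 {
        sum += q[0]
        q = q[1:]
    }
    return sum
}

// entry is a heap-allocated linked-list record.
type entry struct {
    v    int
    next *entry
}

// listDrain enqueues n ints into a linked list and drains it by chasing
// pointers, returning the sum of dequeued values.
func listDrain(n int) int {
    var head, tail *entry
    for j := 0; j &lt; n; j++ {
        e := &amp;entry{v: j}
        if tail == nil {
            head, tail = e, e
        } else {
            tail.next = e
            tail = e
        }
    }
    sum := 0
    for head != nil {
        sum += head.v
        head = head.next
    }
    return sum
}

func BenchmarkSliceWindow(b *testing.B) {
    for i := 0; i &lt; b.N; i++ {
        sliceDrain(1024)
    }
}

func BenchmarkLinkedList(b *testing.B) {
    for i := 0; i &lt; b.N; i++ {
        listDrain(1024)
    }
}
</code></pre>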
<h2 id="gowantsyoutobecleverbutnottooclever">Go wants you to be clever, but not too clever</h2>
<p>So Go isn't the language for people who want to be clever with optimizations.  C is the language for clever optimizations.  No runtime to contend with, crazy pointer arithmetic and direct control.</p>
<p>Go really wants you to be clever with your data structures, APIs, etc...  Make it easy for your users and maintainers.  There is an entire philosophy at play here, and after you swim in its waters for a while you do find it refreshing.  At first, it just feels cold.</p>
<p>With that said, when you start looking at the internals of low-level tasks, you will often find that the standard library has access to code your programs don't.  For example, if you want to spin as efficiently as the Mutex implementation does, you can't: Mutex has access to runtime_doSpin(), which your program doesn't.</p>
<p>There are other memory barrier tricks that your code can't take advantage of unless you want to use Cgo, which for the purposes of this article is cheating.  And I'm sure I could do a hack to gain access to these things, but that's not portable.  One of Go's strengths is that until you start using unsafe or syscall, your code really is portable (while C just dreams it is actually portable).</p>
<p>It is important to understand this, because if you try to write a Mutex that is more optimized for your use case, you are not going to be as efficient.  You can attempt to do some of this with runtime.Gosched(), but you are rolling that rock uphill.  Apparently, I like rolling rocks.</p>
<p><img src="https://adventuresinwonderlandar.files.wordpress.com/2012/02/dsc_0638.jpg" alt="rolling the rock" title="rolling the rock"></p>
<h2 id="tryingtobeclever">Trying to be clever</h2>
<p>My first attempt at being clever was to use channels as a circular buffer.  The circular buffer would be protected by a simple atomic.CompareAndSwapInt32().  If I couldn't insert a value into the channel, I would simply create a new channel with double the capacity, copy values from the old channel to the new channel, insert the new value, and swap the old channel for the new one.  Pretty much what append() does for slices.</p>
<p>My reasoning was:</p>
<ul>
<li>A lot of work has been done in channels to prevent the use of locks when the channel is not blocked</li>
<li>select has some very nice properties that can be used</li>
<li>atomic.CompareAndSwapInt32 is much faster than sync.Mutex</li>
</ul>
<p>I knew that there was some risk:</p>
<ul>
<li>Mutex makes certain guarantees that a caller will not be starved.  CompareAndSwapInt32() by itself does not.</li>
<li>I needed to write my own sleep code that spun at first and eventually slept up to some maximum interval.  Otherwise, when idle, these locking loops would eat the CPUs alive.</li>
</ul>
<p>The sleep code took a while to get to a point where it didn't kill performance or choke my CPU for too long.</p>
<pre><code class="language-go">package spin

import (
    &quot;runtime&quot;
    &quot;time&quot;
)

// Sleeper allows sleeping for an increasing time period up to 1 second
// after yielding the processor 2^16 times.
// This is not thread-safe and should be thrown away once the loop that calls it
// is able to perform its function.
type Sleeper struct {
    loop uint16

    at time.Duration
}

// Sleep at minimum allows another goroutine to be scheduled and after 2^16
// calls will begin to sleep from 1 nanosecond to 1 second, with each
// call raising the sleep time by a factor of 10.
func (s *Sleeper) Sleep() {
    const maxSleep = 1 * time.Second

    if s.loop &lt; 65535 {
        runtime.Gosched()
        s.loop++
        return
    }

    if s.at == 0 {
        s.at = 1 * time.Nanosecond
    }

    time.Sleep(s.at)

    if s.at &lt; maxSleep {
        s.at = s.at * 10
        if s.at &gt; maxSleep {
            s.at = maxSleep
        }
    }
}
</code></pre>
<p>The first custom version came out like:</p>
<pre><code class="language-go">import (
    &quot;runtime&quot;
    &quot;sync/atomic&quot;
    &quot;time&quot;

    &quot;github.com/johnsiilver/golib/queue/fifo/internal/spin&quot;
)

// Unbounded indicates that the Queue should have no memory bounds.
const Unbounded = -1

const (
    unlocked = int32(0)
    locked   = int32(1)
)

// Buffer provides a FIFO circular buffer that can grow and shrink as required.
// Buffer must not be copied after creation (which means use a pointer if
// passing between functions).
type Buffer struct {
    // Max is the maximum size the Buffer can grow to.  Use Unbounded if
    // you wish to grow the buffer to any size. By default, this will grow to 1k items.
    Max int

    lockInt    int32
    lastShrink time.Time
    data       chan interface{}
}

// Push pushes an item into the circular buffer. &quot;ok&quot; indicates if this happens.
func (c *Buffer) Push(item interface{}) (ok bool) {
    c.lock()
    defer c.unlock()

    select {
    case c.data &lt;- item:
        return true
    default:
    }

    // The buffer was too small, grow the buffer and then insert.
    c.grow()
    select {
    case c.data &lt;- item:
        return true
    default:
        return false
    }
}

// Force will push the item onto the buffer and blocks until the operation completes.
func (c *Buffer) Force(item interface{}) {
    sleeper := spin.Sleeper{}

    for {
        if c.Push(item) {
            return
        }
        sleeper.Sleep()
    }
}

// Pop returns the next value off the circular buffer. If the buffer is empty
// ok will be false.
func (c *Buffer) Pop() (value interface{}, ok bool) {
    c.lock()
    defer c.unlock()

    select {
    case v := &lt;-c.data:
        c.shrink()
        return v, true
    default:
        return nil, false
    }
}

// Pull pulls an item off the circular buffer. It blocks until a value is found.
func (c *Buffer) Pull() interface{} {
    sleeper := spin.Sleeper{}

    for {
        v, ok := c.Pop()
        if !ok {
            sleeper.Sleep()
            continue
        }
        return v
    }
}

// grow will double the size of the internal buffer until we hit the max size
// if the buffer is currently full.
// Note: grow must be protected with .lock/.unlock.
func (c *Buffer) grow() {
    if c.Max == 0 {
        c.Max = 1000
    }

    if cap(c.data) == c.Max {
        return
    }

    if len(c.data) == cap(c.data) {
        if c.Max == Unbounded {
            size := cap(c.data) * 2
            if size == 0 {
                size = 8
            }
            ch := make(chan interface{}, size)
            c.copy(ch, c.data)
            c.data = ch
            return
        }
        if cap(c.data) &lt; c.Max {
            size := cap(c.data) * 2
            if size == 0 {
                size = 8
            }
            if size &gt; c.Max {
                size = c.Max
            }
            ch := make(chan interface{}, size)
            c.copy(ch, c.data)
            c.data = ch
            return
        }
    }
}

// shrink shrinks the size of the internal buffer if the length of the buffer
// is &lt; 50% of the capacity.  The reduction will be by 25% but will produce
// a buffer size of no less than 8 slots.
// Note: shrink must be protected with .lock/.unlock.
func (c *Buffer) shrink() {
    if cap(c.data) == 8 {
        return
    }

    // Only consider shrinking if we haven't shrunk in the last 10 minutes.
    if time.Now().Sub(c.lastShrink) &lt; 10*time.Minute {
        return
    }

    // If the current unused capacity is &gt; 50% of the buffer, reduce it by 25%.
    if (cap(c.data) - len(c.data)) &gt; (cap(c.data) / 2) {
        size := int(float64(cap(c.data)) * .75)
        if size &lt; 8 {
            size = 8
        }

        ch := make(chan interface{}, size)
        c.copy(ch, c.data)
        c.data = ch
        c.lastShrink = time.Now()
    }
}

func (c *Buffer) copy(dst chan&lt;- interface{}, src chan interface{}) {
    if src == nil {
        return
    }
    if (cap(dst) - len(dst)) &lt; len(src) {
        panic(&quot;internal error: Buffer.copy() cannot be called when dst is smaller than src&quot;)
    }

    close(src)
    for v := range src {
        dst &lt;- v
    }
}

func (c *Buffer) lock() {
    for {
        if atomic.CompareAndSwapInt32(&amp;c.lockInt, unlocked, locked) {
            return
        }
        runtime.Gosched()
    }
}

func (c *Buffer) unlock() {
    for {
        if atomic.CompareAndSwapInt32(&amp;c.lockInt, locked, unlocked) {
            return
        }
        runtime.Gosched()
    }
}
</code></pre>
<p>This one had some nice properties.  It had a useful zero value, which I tried to use with other implementations instead of a constructor (which was not always possible).</p>
<p>It also only shrunk at intervals, reducing the need for array copies.</p>
<h2 id="thetruthisinthepudding">The Proof Is in the Pudding</h2>
<p>Now I have two versions, but how does each stack up against the other?  Oh, the fun of benchmarking!</p>
<p>bench.go</p>
<pre><code class="language-go">package bench

import (
    &quot;sync&quot;
    &quot;testing&quot;
)

type unbounded interface {
    Force(item interface{})
    Pull() interface{}
}

func singleRun(bench *testing.B, n func() unbounded, items, senders, receivers int) {
    for i := 0; i &lt; bench.N; i++ {
        bench.StopTimer()

        b := n()
        sendCh := make(chan int, items)
        wg := sync.WaitGroup{}
        wg.Add(items)

        // Setup senders.
        for i := 0; i &lt; senders; i++ {
            go func() {
                for v := range sendCh {
                    b.Force(v)
                }
            }()
        }

        // Setup receivers.
        for i := 0; i &lt; receivers; i++ {
            go func() {
                for {
                    b.Pull()
                    wg.Done()
                }
            }()
        }

        bench.StartTimer()

        // Send to Buffer (which the receivers will read from)
        go func() {
            for i := 0; i &lt; items; i++ {
                sendCh &lt;- i
            }
            close(sendCh)
        }()

        wg.Wait()
    }
}
</code></pre>
<p>bench_test.go</p>
<pre><code class="language-go">package bench

import (
    &quot;fmt&quot;
    &quot;testing&quot;

    &quot;github.com/johnsiilver/golib/queue/fifo/experimental/unbounded/custom_locks&quot;
    &quot;github.com/johnsiilver/golib/queue/fifo/experimental/unbounded/custom_sleep_atomic&quot;
    &quot;github.com/johnsiilver/golib/queue/fifo/experimental/unbounded/sliding_slice_atomic&quot;
    &quot;github.com/johnsiilver/golib/queue/fifo/experimental/unbounded/sliding_slice_locks&quot;
)

func BenchmarkUnboundedQueues(b *testing.B) {
    runs := []struct{ items, senders, receivers int }{
        {100000, 1, 1},
        {100000, 10, 1},
        {100000, 1, 10},
        {100000, 10, 10},
        {100000, 100, 10},
        {100000, 10, 100},
    }

    benchmarks := []struct {
        name    string
        newFunc func() unbounded
    }{
        {
            &quot;sliding_slice_atomic&quot;,
            func() unbounded { return sliding_slice_atomic.New() },
        },
        {
            &quot;sliding_slice_locks&quot;,
            func() unbounded { return sliding_slice_locks.New() },
        },
        {
            &quot;custom_sleep_atomic&quot;,
            func() unbounded { return &amp;custom_sleep_atomic.Buffer{} },
        },
        {
            &quot;custom_locks&quot;,
            func() unbounded { return &amp;custom_locks.Buffer{} },
        },
    }

    for _, run := range runs {
        for _, benchmark := range benchmarks {
            b.Run(
                benchmark.name+fmt.Sprintf(&quot;-items: %d, senders: %d, receivers: %d&quot;, run.items, run.senders, run.receivers),
                func(b *testing.B) {
                    singleRun(b, benchmark.newFunc, run.items, run.senders, run.receivers)
                },
            )
        }
    }
}

</code></pre>
<p>This benchmark was set up to test a few scenarios:</p>
<ul>
<li>Single sender and receiver</li>
<li>10 senders and 1 receiver</li>
<li>1 sender and 10 receivers</li>
<li>10 senders and 10 receivers</li>
<li>100 senders and 10 receivers</li>
<li>10 senders and 100 receivers</li>
</ul>
<p>This gives a certain static scaling of the number of senders and receivers to see how each scales for different scenarios.  I want a good generic solution that handles each of these reasonably.</p>
<p>In this initial test, I also included versions of both that use atomic.CompareAndSwapInt32() and sync.Mutex.  Here are the initial results:</p>
<p><em>Note:</em> Bold is the fastest version.</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>50688582</code></td>
</tr>
<tr>
<td><strong>sliding_slice_locks</strong></td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>41097381</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td style="text-align:right"><code>60742172</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>61267323</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>46452147</code></td>
</tr>
<tr>
<td><strong>sliding_slice_locks</strong></td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>37041679</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>10</td>
<td style="text-align:right"><code>106059804</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>20</td>
<td style="text-align:right"><code>116124583</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>59664733</code></td>
</tr>
<tr>
<td><strong>sliding_slice_locks</strong></td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>39991409</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>1048544140</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>2</td>
<td style="text-align:right"><code>699103394</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>30</td>
<td style="text-align:right"><code>50459215</code></td>
</tr>
<tr>
<td><strong>sliding_slice_locks</strong></td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>30</td>
<td style="text-align:right"><code>42415787</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>20</td>
<td style="text-align:right"><code>76158348</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>20</td>
<td style="text-align:right"><code>74131977</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>20</td>
<td style="text-align:right"><code>82940141</code></td>
</tr>
<tr>
<td><strong>sliding_slice_locks</strong></td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>30</td>
<td style="text-align:right"><code>45741994</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>5</td>
<td style="text-align:right"><code>221296458</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>20</td>
<td style="text-align:right"><code>97246758</code></td>
</tr>
</tbody>
</table>
<br>
<p>Well....huh...</p>
<p>First takeaways:</p>
<ul>
<li>Atomics and custom sleeping are not beating the standard Mutex</li>
<li>The array copies are most likely beating the custom channel copies</li>
<li>Using select and channels, while providing some advantages, is probably hurting us with a lot of extra locking and other magic</li>
</ul>
<p>There was some profiling and such, but I decided to go on and test a bunch of other versions.</p>
<h2 id="ifatfirstyoudontsucceedfailsomemore">If At First You Don't Succeed, Fail Some More??</h2>
<p>It would be a lie to say I'm covering all the steps I took here.  It was a winding path of outright failure and tuning.  There were other tests I wrote to prove that CPU usage would drop after being stuck either pushing or pulling, and other things I'm not going to cover.  The rabbit hole seems to keep going.</p>
<p><img src="https://brandivisitswonderland.weebly.com/uploads/4/5/6/8/45686305/7524993_orig.gif" alt="large rabbit hole" title="large rabbit hole"></p>
<p>In the end, I decided to benchmark against a few more methods:</p>
<ul>
<li>Versions of each that had no sleep.  These would eat CPU, but I wanted to see if it made a difference with using atomic for locking</li>
<li>I wanted to run against a reference type that used a channel with a buffer of 100 and was not unbounded.  Resizing to support unbounded is expensive, but how expensive?</li>
<li>What if I used a more CS-standard queue: records with pointers to the next entry in the queue (I called this the heap version)?  According to the talk I saw, this should be much slower, but how much slower?</li>
</ul>
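<p>The reference type is simple enough to sketch here (my reconstruction, not the repo's exact code): it is just a buffered channel, so Force() is a blocking send and Pull() a blocking receive.</p>
<pre><code class="language-go">package reference

// Buffer is the bounded reference: a FIFO backed by a buffered channel.
// It satisfies the same interface as the other versions, but Force() blocks
// once 100 items are queued instead of growing the buffer.
type Buffer struct {
    ch chan interface{}
}

// New is the constructor for Buffer.
func New() *Buffer {
    return &amp;Buffer{ch: make(chan interface{}, 100)}
}

// Force blocks until the item is on the buffer.
func (b *Buffer) Force(item interface{}) { b.ch &lt;- item }

// Pull blocks until a value can be retrieved.
func (b *Buffer) Pull() interface{} { return &lt;-b.ch }
</code></pre>
<p>Everything here is what the runtime gives you for free, which is why it makes a good baseline: any unbounded version pays for its flexibility relative to this.</p>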
<p>Well, let's have a look:</p>
<p><em>Note:</em> Bold is for the fastest, discounting the reference.<br>
<em>Note:</em> The reference is in blue when it was the fastest.</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td><span style="color:blue">reference</span></td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>100</td>
<td style="text-align:right"><code>20660880</code></td>
</tr>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>49620908</code></td>
</tr>
<tr>
<td>sliding_slice_locks</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>38928647</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>60764994</code></td>
</tr>
<tr>
<td>custom_nosleep</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td style="text-align:right"><code>61262504</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td style="text-align:right"><code>63590469</code></td>
</tr>
<tr>
<td>heap_lock</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>27832756</code></td>
</tr>
<tr>
<td><strong>heap_atomic</strong></td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>100</td>
<td style="text-align:right"><code>25798467</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>reference</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>31719731</code></td>
</tr>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>20</td>
<td style="text-align:right"><code>99060012</code></td>
</tr>
<tr>
<td>sliding_slice_locks</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>78695899</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>10</td>
<td style="text-align:right"><code>119628479</code></td>
</tr>
<tr>
<td>custom_nosleep</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>10</td>
<td style="text-align:right"><code>105837735</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>10</td>
<td style="text-align:right"><code>111282187</code></td>
</tr>
<tr>
<td>heap_lock</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>28885042</code></td>
</tr>
<tr>
<td><strong>heap_atomic</strong></td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>28278157</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>reference</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>29424432</code></td>
</tr>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>45287610</code></td>
</tr>
<tr>
<td>sliding_slice_locks</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>49388002</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>2</td>
<td style="text-align:right"><code>1014270608</code></td>
</tr>
<tr>
<td>custom_nosleep</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>1018922777</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>3</td>
<td style="text-align:right"><code>478910481</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>100</td>
<td style="text-align:right"><code>26944540</code></td>
</tr>
<tr>
<td>heap_atomic</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>28301942</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>reference</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>50</td>
<td style="text-align:right"><code>29539230</code></td>
</tr>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>30</td>
<td style="text-align:right"><code>47400717</code></td>
</tr>
<tr>
<td>sliding_slice_locks</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>30</td>
<td style="text-align:right"><code>44891524</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>20</td>
<td style="text-align:right"><code>73116554</code></td>
</tr>
<tr>
<td>custom_nosleep</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>20</td>
<td style="text-align:right"><code>74409189</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>20</td>
<td style="text-align:right"><code>70609204</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>50</td>
<td style="text-align:right"><code>23445934</code></td>
</tr>
<tr>
<td>heap_atomic</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>50</td>
<td style="text-align:right"><code>26386795</code></td>
</tr>
</tbody>
</table>
<br>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td><span style="color:blue">reference</span></td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>50</td>
<td style="text-align:right"><code>24754023</code></td>
</tr>
<tr>
<td>sliding_slice_atomic</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>20</td>
<td style="text-align:right"><code>58304944</code></td>
</tr>
<tr>
<td>sliding_slice_locks</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>50</td>
<td style="text-align:right"><code>46103498</code></td>
</tr>
<tr>
<td>custom_sleep_atomic</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>5</td>
<td style="text-align:right"><code>203424993</code></td>
</tr>
<tr>
<td>custom_nosleep</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>5</td>
<td style="text-align:right"><code>211169360</code></td>
</tr>
<tr>
<td>custom_locks</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>20</td>
<td style="text-align:right"><code>89729445</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>50</td>
<td style="text-align:right"><code>25450734</code></td>
</tr>
<tr>
<td>heap_atomic</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>100</td>
<td style="text-align:right"><code>34094147</code></td>
</tr>
</tbody>
</table>
<p>Well....huh...., again!</p>
<p>It is not surprising that the reference is faster in several scenarios.  But what surprised me was the standard pointer-based queue: heap_lock and heap_atomic took this test by storm.</p>
<p>heap_atomic and heap_lock being so close in most benchmarks leads me to believe that the custom.* versions are slower because there is a bunch of lock contention with the combination of channels/locks(or atomic)/select.  I could go work on that, but I don't think that is worth my time.</p>
<p>Now, I'm not sure why the speaker's assessment I listened to was wrong.  I tend to think that this kind of thing is my fault.  It could be:</p>
<ul>
<li>I misheard what he was saying</li>
<li>I'm optimizing in some way he wasn't</li>
<li>The compiler has changed</li>
<li>The benchmark is broken</li>
<li>He was just dead wrong</li>
</ul>
<p>No matter which one it is, heap_lock is where I'm going to concentrate my effort.</p>
<p>But why not heap_atomic? While it did win in some benchmarks, simpler is better.  There isn't enough difference to make me want to use atomic over Mutex, and Mutex makes guarantees about goroutine starvation that atomic does not.</p>
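<p>The heap versions aren't shown above, so here is the shape heap_lock takes (my reconstruction; the real code lives in the repo linked earlier): a Mutex-guarded linked list that pushes at the tail and pops at the head, allocating one record per item.</p>
<pre><code class="language-go">package heap_lock

import &quot;sync&quot;

// entry is a single heap-allocated queue record.
type entry struct {
    val  interface{}
    next *entry
}

// Buffer is a Mutex-guarded linked-list FIFO.  The zero value is usable.
type Buffer struct {
    mu    sync.Mutex
    front *entry
    back  *entry
}

// Push adds an item to the back of the queue. It cannot fail, so ok is
// always true (kept to satisfy the Buffer interface above).
func (b *Buffer) Push(item interface{}) (ok bool) {
    e := &amp;entry{val: item}
    b.mu.Lock()
    if b.back == nil {
        b.front, b.back = e, e
    } else {
        b.back.next = e
        b.back = e
    }
    b.mu.Unlock()
    return true
}

// Pop removes the item at the front. ok is false if the queue is empty.
func (b *Buffer) Pop() (val interface{}, ok bool) {
    b.mu.Lock()
    defer b.mu.Unlock()
    if b.front == nil {
        return nil, false
    }
    e := b.front
    b.front = e.next
    if b.front == nil {
        b.back = nil
    }
    return e.val, true
}
</code></pre>
<p>Note how little happens under the lock: one pointer update on push, two on pop, and no array copies at all.</p>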
<h2 id="whatifimadetheheapversioncircular">What If I Made the Heap Version Circular?</h2>
<p>The heap version uses a simple queue.  It is memory efficient and apparently fast.  But I'm still haunted by allocating extra memory when I don't need to.</p>
<p>What if instead of throwing away entries when dequeuing I just move a pointer in a circular buffer?  That entry could be reused without the need for allocating a new entry.  And if I run out of room I can always insert more entries at the back.</p>
<p>This will add some overhead in checking if things like the front and tail pointers are the same to know if we need to do an allocation.</p>
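<p>The pointer bookkeeping described above might look something like this (a sketch of the idea only, not the actual heap_circular code): a singly-linked ring where a write pointer chases a read pointer, splicing in a fresh record only when the ring is full.</p>
<pre><code class="language-go">package heap_circular

// entry is a record in the ring; values are overwritten in place rather
// than freed.
type entry struct {
    val  interface{}
    next *entry
}

// Ring is a circular FIFO that reuses its records. write points at the last
// record written, read at the next record to pop, and length tracks
// occupancy so we know when an allocation is needed.
type Ring struct {
    read     *entry
    write    *entry
    length   int
    capacity int
}

// New starts the ring with a single record pointing at itself.
func New() *Ring {
    e := &amp;entry{}
    e.next = e
    return &amp;Ring{read: e, write: e, capacity: 1}
}

// Push writes into the next slot, splicing in a new record first if that
// slot still holds unread data. This is the extra front/tail check the
// circular version pays for.
func (r *Ring) Push(v interface{}) {
    if r.length == r.capacity {
        e := &amp;entry{next: r.write.next}
        r.write.next = e
        r.capacity++
    }
    r.write = r.write.next
    r.write.val = v
    r.length++
}

// Pop reads the oldest value, freeing its slot for reuse.
func (r *Ring) Pop() (interface{}, bool) {
    if r.length == 0 {
        return nil, false
    }
    v := r.read.val
    r.read.val = nil // drop the reference so the value can be collected
    r.read = r.read.next
    r.length--
    return v, true
}
</code></pre>
<p>This sketch is not goroutine-safe on its own; it would be wrapped with the same locking as heap_lock.</p>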
<p>Here's a first attempt at it with 100k items:</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>26110856</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>100000</td>
<td>1</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>26823725</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>29731315</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>100000</td>
<td>10</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>34592901</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>50</td>
<td style="text-align:right"><code>30892013</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>100000</td>
<td>100</td>
<td>1</td>
<td>30</td>
<td style="text-align:right"><code>40746096</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>50</td>
<td style="text-align:right"><code>27873525</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>100000</td>
<td>10</td>
<td>10</td>
<td>50</td>
<td style="text-align:right"><code>35315318</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>20</td>
<td style="text-align:right"><code>57328038</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>100000</td>
<td>10</td>
<td>100</td>
<td>30</td>
<td style="text-align:right"><code>60764377</code></td>
</tr>
</tbody>
</table>
<br>
<p>Well, it looks like the queue is still winning.  Not by a huge amount, but enough.</p>
<p>I wonder if we see better results with 1m items:</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td>heap_lock</td>
<td>1000000</td>
<td>1</td>
<td>1</td>
<td>5</td>
<td style="text-align:right"><code>275110923</code></td>
</tr>
<tr>
<td><strong>heap_circular</strong></td>
<td>1000000</td>
<td>1</td>
<td>1</td>
<td>5</td>
<td style="text-align:right"><code>251509839</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>1000000</td>
<td>10</td>
<td>1</td>
<td>5</td>
<td style="text-align:right"><code>316068383</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>1000000</td>
<td>10</td>
<td>1</td>
<td>3</td>
<td style="text-align:right"><code>367007666</code></td>
</tr>
<tr>
<td>heap_lock</td>
<td>1000000</td>
<td>100</td>
<td>1</td>
<td>3</td>
<td style="text-align:right"><code>489702046</code></td>
</tr>
<tr>
<td><strong>heap_circular</strong></td>
<td>1000000</td>
<td>100</td>
<td>1</td>
<td>3</td>
<td style="text-align:right"><code>397814725</code></td>
</tr>
<tr>
<td>heap_lock</td>
<td>1000000</td>
<td>10</td>
<td>10</td>
<td>3</td>
<td style="text-align:right"><code>418825312</code></td>
</tr>
<tr>
<td><strong>heap_circular</strong></td>
<td>1000000</td>
<td>10</td>
<td>10</td>
<td>3</td>
<td style="text-align:right"><code>404838549</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>1000000</td>
<td>10</td>
<td>100</td>
<td>2</td>
<td style="text-align:right"><code>579100509</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>1000000</td>
<td>10</td>
<td>100</td>
<td>2</td>
<td style="text-align:right"><code>630555319</code></td>
</tr>
</tbody>
</table>
<br>
<p>How about 10m items:</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>#Items</th>
<th>#Senders</th>
<th>#Receivers</th>
<th>Iterations</th>
<th style="text-align:right">ns/op</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>heap_lock</strong></td>
<td>10000000</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>2719649748</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>10000000</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>2896301313</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>10000000</td>
<td>10</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>3239424103</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>10000000</td>
<td>10</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>3812422117</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>10000000</td>
<td>100</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>3314603774</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>10000000</td>
<td>100</td>
<td>1</td>
<td>1</td>
<td style="text-align:right"><code>4393173577</code></td>
</tr>
<tr>
<td><strong>heap_lock</strong></td>
<td>10000000</td>
<td>10</td>
<td>10</td>
<td>1</td>
<td style="text-align:right"><code>4271595984</code></td>
</tr>
<tr>
<td>heap_circular</td>
<td>10000000</td>
<td>10</td>
<td>10</td>
<td>1</td>
<td style="text-align:right"><code>4358516405</code></td>
</tr>
<tr>
<td>heap_lock</td>
<td>10000000</td>
<td>10</td>
<td>100</td>
<td>1</td>
<td style="text-align:right"><code>6051491583</code></td>
</tr>
<tr>
<td><strong>heap_circular</strong></td>
<td>10000000</td>
<td>10</td>
<td>100</td>
<td>1</td>
<td style="text-align:right"><code>5518405633</code></td>
</tr>
</tbody>
</table>
<br>
<p>Now, my growing circular buffer is probably overly complicated, which costs some time.  And it is close in time to heap_lock for the common cases (1:1/10:10/10:100 senders to receivers).</p>
<p>But looking at the trace tool, the allocations are only minor time sinks; GC rarely kicks in because of them on heap_lock.  That makes heap_lock the overall winner in my view.  The code is easy to follow and fast.</p>
<p>I think heap_circular has merit, but with a slightly different interface, and only if the data being stored is a concrete type rather than interface{}.  If the entries held large data, such as []byte, we could use it as a combination FIFO queue and free list.  It could have a method that returns an empty value: either a recycled, dequeued entry or, if none are free, a newly allocated one.  That value could then be filled and enqueued without a large allocation.  But that is a very specific use case, which I am not going to test.  And of course, there is the very good argument for keeping the free list and the queue separate.</p>
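<p>A rough sketch of that combined FIFO-queue-plus-free-list idea.  This is a hypothetical API; the entry, Empty and Release names are mine, not tested code from this post:</p>

```go
package main

import "fmt"

// entry holds large data we would rather not reallocate per enqueue.
type entry struct {
	data []byte
	next *entry
}

// queue is a FIFO linked list plus a free list of recycled entries.
type queue struct {
	front, back *entry // live FIFO
	free        *entry // recycled entries awaiting reuse
}

// Empty returns a recycled entry if one is available, otherwise a new
// one; the caller fills .data and passes it to Enqueue.
func (q *queue) Empty() *entry {
	if q.free != nil {
		e := q.free
		q.free, e.next = e.next, nil
		e.data = e.data[:0] // keep the capacity, drop the contents
		return e
	}
	return &entry{data: make([]byte, 0, 4096)}
}

func (q *queue) Enqueue(e *entry) {
	if q.back == nil {
		q.front, q.back = e, e
		return
	}
	q.back.next = e
	q.back = e
}

func (q *queue) Dequeue() (*entry, bool) {
	if q.front == nil {
		return nil, false
	}
	e := q.front
	if q.front = e.next; q.front == nil {
		q.back = nil
	}
	e.next = nil
	return e, true
}

// Release hands a consumed entry back for reuse.
func (q *queue) Release(e *entry) {
	e.next, q.free = q.free, e
}

func main() {
	q := &queue{}
	e := q.Empty()
	e.data = append(e.data, "hello"...)
	q.Enqueue(e)
	out, _ := q.Dequeue()
	fmt.Printf("%s\n", out.data)
	q.Release(out)
	// The next Empty reuses out's 4KB buffer instead of allocating.
	fmt.Println(cap(q.Empty().data) >= 4096)
}
```

<p>The explicit Release step is what makes the buffer reuse safe: the caller hands the entry back only after it is done with the data.</p>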
<p>I am absolutely sure that many of these can be made faster with some trivial work (but I've wasted enough time with this thought experiment).  And there were all kinds of things I didn't try (using []unsafe.Pointer and CompareAndSwapPointer() or a slice of slices or ...).</p>
<p>But now I am absolutely sure that:</p>
<ul>
<li>I should have used the one in Attempt #1</li>
<li>I am not a computer scientist</li>
</ul>
<p>If you read this far, I'm sorry to have dragged you down the rabbit hole with me.  Misery occasionally loves company.</p>
<p><img src="http://78.media.tumblr.com/tumblr_lz4yr3NQI31qizhaoo1_500.gif" alt="safe landing" title="safe landing"></p>
</div>]]></content:encoded></item><item><title><![CDATA[Abstraction of Storage APIs]]></title><description><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>One of the common problems in software development is dealing with storage.  When writing micro-services, you are often required to store application data in some type of long-term storage (filesystem, NFS, SQL, cloud storage by X vendor, ...).</p>
<p>Instead of choosing a storage system and writing code to that</p></div>]]></description><link>http://www.golangdevops.com/2017/09/05/abstracting-the-storage-layer/</link><guid isPermaLink="false">599b4ae41fc9e76c71d7fdc6</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Tue, 05 Sep 2017 12:55:04 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="introduction">Introduction</h2>
<p>One of the common problems in software development is dealing with storage.  When writing micro-services, you are often required to store application data in some type of long-term storage (filesystem, NFS, SQL, cloud storage by X vendor, ...).</p>
<p>Instead of choosing a storage system and writing code to that storage system's API, it is often better to extract your use cases into a storage API and write concrete implementations that implement the API.</p>
<p>This provides multiple storage solution choices, simple migration strategies, multi-vendor support (in cloud or database vendors), ...</p>
<p>This is different from the abstraction layers your programming language may provide for a type of storage (Go, for example, has the io.Reader, io.Writer and io.ReadWriter abstractions for byte streams and the database/sql package for SQL storage systems).  Our abstraction is for application-specific data, and its implementations may use these lower-level abstractions.</p>
<p>NOTE: Like most articles I write, this article will use the Go language for all examples, but the methodology is valid for other languages as well.</p>
<h2 id="asimplisticstorageapi">A simplistic storage API</h2>
<p>For this article we are going to implement a storage API for an application that reads and writes employee records.  We will not detail the service itself, just the storage layer.</p>
<p>Below is our storage interface packaged as <strong>storage.Employee</strong>.<br>
The <strong>Employee</strong> interface can be implemented by many storage systems.</p>
<pre><code class="language-go">package storage

import (
	&quot;errors&quot;
	&quot;strings&quot;
)

// NotFoundError indicates that a record could not be located.
// This differentiates between not finding a record and the
// storage layer having an error.
type NotFoundError struct{
    error
}

func (n NotFoundError) isNotFound(){}

// NotFound indicates if the error is that the ID could
// not be found.
func NotFound(e error) bool {
	if _, ok := e.(NotFoundError); ok {
		return true
	}
	return false
}

// EmployeeRec represents an employee record.
type EmployeeRec struct {
	// ID is the employee ID.
	ID uint64
	// First and Last are the first and last name of the employee
	First, Last string
	// Title is the employee's title.
	Title string
	// Dept is the employee's department number.
	Dept uint8
}

// Validate validates the fields are valid.
func (e *EmployeeRec) Validate() error {
	if e.ID == 0 {
		return errors.New(&quot;ID field cannot be 0&quot;)
	}
	
	switch &quot;&quot; {
	case strings.TrimSpace(e.First):
		return errors.New(&quot;First field cannot be empty string&quot;)
	case strings.TrimSpace(e.Last):
		return errors.New(&quot;Last field cannot be empty string&quot;)
	case strings.TrimSpace(e.Title):
		return errors.New(&quot;Title field cannot be empty string&quot;)
	}
	
	if e.Dept == 0 {
		return errors.New(&quot;Dept field cannot be 0&quot;)
	}
	return nil
}

// EmployeeSearch returns a single result of a search of 
// employee records.
type EmployeeSearch struct {
	// Rec exists if a valid response was returned.
	Rec *EmployeeRec
	// Err exists if the storage system had an error mid search.
	Err error
}

// Employee allows access to the system storing employee records.	
type Employee interface {
	// Get retrieves an employee record by their employee ID.
	Get(id uint64) (*EmployeeRec, error)
	// Put stores a record.
	Put(r *EmployeeRec) error
	// Search searches for a record matching on all fields
	// that do not have the zero value for that field type.
	Search(r EmployeeRec) (chan EmployeeSearch, error)
}
</code></pre>
<h2 id="developmentstorageisnotproductionstorage">Development storage is not production storage</h2>
<p>By abstracting storage into an API we now have the ability to use multiple storage implementations.  For development purposes, the first implementation I write is an <strong>&quot;in-memory&quot;</strong> implementation.</p>
<p>The <strong>&quot;in-memory&quot;</strong> representation allows tests to use a storage implementation that provides automatic cleanup at the end of any tests.  Other systems doing integration tests can spin up the service and not worry about system cleanup or storage setup.</p>
<p>Finally, this method spares a user running the system locally from having to spin up the storage mechanisms, which might require access permissions or the creation of databases and tables.</p>
<pre><code class="language-go">package inmemory

import (
	&quot;fmt&quot;

	&quot;.../storage&quot;
)

// Employee implements storage.Employee.  Note that access to store is
// not synchronized, which is fine for the tests this serves; concurrent
// callers would need a mutex.
type Employee struct {
	store map[uint64]*storage.EmployeeRec
}

// New is the constructor for Employee.
func New() storage.Employee {
	return &amp;Employee{store: map[uint64]*storage.EmployeeRec{}}
}

// Get implements storage.Employee.Get().
func (e *Employee) Get(id uint64) (*storage.EmployeeRec, error) {
	v, ok := e.store[id]
	if !ok {
		return nil, storage.NotFoundError{fmt.Errorf(&quot;could not find id %d&quot;, id)}
	}
	return v, nil
}

// Put implements storage.Employee.Put().
func (e *Employee) Put(r *storage.EmployeeRec) error {
	if err := r.Validate(); err != nil {
		return fmt.Errorf(&quot;cannot store record: %s&quot;, err)
	}
	e.store[r.ID] = r
	return nil
}

// Search implements storage.Employee.Search().
func (e *Employee) Search(s storage.EmployeeRec) (chan storage.EmployeeSearch, error) {
	ch := make(chan storage.EmployeeSearch, 10)
	go func() {
		defer close(ch)
		for _, v := range e.store {
			if s.ID != 0 {
				if s.ID != v.ID {
					continue
				}
			}
			if s.Last != &quot;&quot; {
				if s.Last != v.Last {
					continue
				}
			}
			if s.First != &quot;&quot; {
				if s.First != v.First {
					continue
				}
			}
			if s.Title != &quot;&quot; {
				if s.Title != v.Title{
					continue
				}
			}
			if s.Dept != 0 {
				if s.Dept != v.Dept{
					continue
				}
			}
			ch &lt;- storage.EmployeeSearch{Rec: v}
		}
	}()
	return ch, nil
}

</code></pre>
<p>The above implements an <strong>&quot;in-memory&quot;</strong> storage implementation of <strong>storage.Employee</strong>.  This implementation is not highly optimized, using an O(n) search for example, which is fine when <strong>n</strong> is small, as in the tests where this will be used.</p>
<h2 id="choosingstoragebasedonflags">Choosing storage based on flags</h2>
<p>When starting our application, it is easy to choose which type of storage to use.  For example, say we have our <strong>&quot;in-memory&quot;</strong> implementation and a MySQL implementation:</p>
<pre><code class="language-go">package main

import (
	&quot;flag&quot;
	&quot;os&quot;

	&quot;.../server&quot;
	&quot;.../storage&quot;
	&quot;.../storage/inmemory&quot;
	&quot;.../storage/mysql&quot;
)

var (
	inmemoryStore = flag.Bool(&quot;inmemory&quot;, false, &quot;Use the in-memory storage implementation, useful for tests and experimentation.&quot;)
	mysqlStore = flag.Bool(&quot;mysql&quot;, false, &quot;Use a MySQL storage layer.  Must set certain env variables.&quot;)
)

func main() {
	flag.Parse()

	var store storage.Employee

	switch {
	case *inmemoryStore:
		store = inmemory.New()
	case *mysqlStore:
		var err error
		u := os.Getenv(&quot;mysqlUsr&quot;)
		p := os.Getenv(&quot;mysqlPass&quot;)
		a := os.Getenv(&quot;mysqlAddr&quot;)
		store, err = mysql.New(a, u, p)
		if err != nil {
			panic(err)
		}
	default:
		panic(&quot;must set either --inmemory or --mysql&quot;)
	}
	
	s, err := server.New(store)
	if err != nil {
		panic(err)
	}

	// Blocks forever unless system error.
	if err := s.Run(); err != nil {
		panic(err)
	}
}
</code></pre>
<p>Our application can simply choose what storage system to use based on a passed flag.  Adding additional storage layers is also as simple as adding new case statements.</p>
<h2 id="addnewstoragewithease">Add new storage with ease</h2>
<p>Various scenarios can occur that require changing your storage system.  This could be:</p>
<ul>
<li>A new storage system with less support costs becomes available</li>
<li>Your storage system is being phased out for a new storage system</li>
<li>The storage system no longer meets your needs</li>
<li>Rising storage costs from a vendor</li>
<li>Switching cloud vendors or utilizing multiple cloud vendors</li>
</ul>
<p>By implementing storage behind an API, you only need to write the new implementation.  Once the implementation is completed, you can:</p>
<ul>
<li>Create simple migration tools between any two storage systems</li>
<li>Create a unified benchmark suite to test performance of each implementation</li>
</ul>
<p>A simple migration tool might look like:</p>
<pre><code class="language-go">package main

import (
	&quot;os&quot;

	&quot;.../storage&quot;
	&quot;.../storage/mysql&quot;
	&quot;.../storage/postgres&quot;
)

func main() {
	// Grab the address, user, and password to the mysql storage
	// from an environmental variable.
	fu := os.Getenv(&quot;mysqlUsr&quot;)
	fp := os.Getenv(&quot;mysqlPass&quot;)
	fAddr := os.Getenv(&quot;mysqlAddr&quot;)
	
	// Grab the address, user, and password to the postgres storage
	// from an environmental variable.
	tu := os.Getenv(&quot;postgresUser&quot;)
	tp := os.Getenv(&quot;postgresPass&quot;)
	tAddr := os.Getenv(&quot;postgresAddr&quot;)

	// Let's copy from a mysql version of the storage.
	from, err := mysql.New(fAddr, fu, fp)
	if err != nil {
		panic(err)
	}
	
	// Let's copy to a postgres version of the storage.
	to, err := postgres.New(tAddr, tu, tp)
	if err != nil {
		panic(err)
	}
	
	// Search for all records.
	ch, err := from.Search(storage.EmployeeRec{})
	if err != nil {
		panic(err)
	}
	
	// Write all records.
	for sr := range ch {
		if sr.Err != nil {
			panic(sr.Err)
		}
		if err := to.Put(sr.Rec); err != nil {
			panic(err)
		}
	}
}
</code></pre>
<p>NOTE: Not highly optimized and does not include any retries in case of errors.</p>
<h2 id="onlywritethetestsonce">Only write the tests once</h2>
<p>Testing storage systems is a complicated subject because of the requirements needed to test an implementation.</p>
<ul>
<li>Do you have a real integration test or mock implementations of the storage system?</li>
<li>How does test turnup/turndown work?</li>
<li>...</li>
</ul>
<p>While you still have to figure out how that process works for any given storage system, the tests must only be written once for all implementations.</p>
<p>You write tests for the Storage API calls once and simply add storage implementations to the test suite.  This greatly simplifies your testing if your application supports multiple storage mechanisms.  No tests for MySQL storage and CloudSQL, just a single unified test against the API.</p>
<pre><code class="language-go">...

var stores = map[string]storage.Employee{}

func init() {
	stores[&quot;in-memory&quot;] = inmemory.New()
}

func TestGet(t *testing.T) {
	rec := &amp;storage.EmployeeRec{
		First: &quot;John&quot;,
		Last: &quot;Doe&quot;,
		ID: 1,
		Title: &quot;unknown&quot;,
		Dept: 1,
	}

	tests := []struct{
		desc string
		id uint64
		want *storage.EmployeeRec
		err bool
		notFound bool
	}{
		{
			desc: &quot;BadID&quot;,
			id: 0,
			err: true, 
		},
		{
			desc: &quot;Not found&quot;,
			id: 3,
			err: true,
			notFound: true,
		},
		{
			desc: &quot;Success&quot;,
			id: 1,
			want: rec,
		},
	}

	for k, store := range stores {
		if err := store.Put(rec); err != nil {
			t.Errorf(&quot;TestGet(%s): %s&quot;, k, err)
			continue
		}
		for _, tc := range tests {
			r, err := store.Get(tc.id)
			switch {
			case tc.err &amp;&amp; err == nil:
				t.Errorf(&quot;TestGet(%s)(%s): got err == nil, want err != nil&quot;, tc.desc, k)
				continue
			case !tc.err &amp;&amp; err != nil:
				t.Errorf(&quot;TestGet(%s)(%s): got err == %s, want err == nil&quot;, tc.desc, k, err)
				continue 
			case tc.err &amp;&amp; tc.notFound:
				if _, ok := err.(storage.NotFoundError); !ok {
					t.Errorf(&quot;TestGet(%s)(%s): got error, but it was not of type NotFoundError&quot;, tc.desc, k)
				}
				continue
			case tc.err:
				continue
			}
			if diff := pretty.Compare(tc.want, r); diff != &quot;&quot; {
				t.Errorf(&quot;TestGet(%s)(%s): -want/+got:\n%s&quot;, tc.desc, k, diff)
			}
		}
	}
}
</code></pre>
<p>The above example will test the Get() method of a storage implementation.  When adding a new storage implementation that requires a test, you simply need to add a new <strong>storage.Employee</strong> to the <strong>stores</strong> map in the <strong>init()</strong> function.</p>
<h2 id="summary">Summary</h2>
<p>Wrapping your storage layer in abstraction allows:</p>
<ul>
<li>Adding new storage solutions quickly</li>
<li>Development and integration tests to use storage suited to their needs</li>
<li>Writing tests and benchmarks once that cover every storage solution</li>
<li>Reusable code that provides migration support</li>
</ul>
</div>]]></content:encoded></item><item><title><![CDATA[Protocol buffers: Avoid these uses]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Note: This is not an article saying protocol buffers are bad.  Protocol buffers are a great serialization technology.  Compact, easy to understand and readable/writable in several different languages.  They easily beat JSON or other schema-less technologies when you know the structure you want to send/receive.  But they aren't</p></div>]]></description><link>http://www.golangdevops.com/2017/08/16/why-not-to-use-protos-in-code/</link><guid isPermaLink="false">598f29531fc9e76c71d7fdb8</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Wed, 16 Aug 2017 21:07:27 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Note: This is not an article saying protocol buffers are bad.  Protocol buffers are a great serialization technology.  Compact, easy to understand and readable/writable in several different languages.  They easily beat JSON or other schema-less technologies when you know the structure you want to send/receive.  But they aren't appropriate for every use case.</p>
<p>Note: This article can be applied to multiple languages, but the code examples and talking points will be Go centric.</p>
<h2 id="protosmakebadcomplexobjects">Protos make bad complex objects</h2>
<h3 id="thebasicproblem">The basic problem</h3>
<p>The problem with protocol buffers is that they transform into language-native data objects, but not into optimal structures for use in a language.</p>
<p>If your software simply changes a protocol buffer or records a protocol buffer, then using it directly in your code should not pose a problem.</p>
<p>But for more complex use cases, they are generally not a great fit.</p>
<h3 id="asimpleexample">A simple example</h3>
<p>Protocol buffers support very basic types.  But this certainly doesn't cover all the types provided by a language.  A simple example is representing time.</p>
<p>Go has a great time library.  Protos don't have such a representation, though they do provide a helper for recording timestamps. The most common way to support timestamps is via an int64 based on epoch.</p>
<p>Take a simple proto:</p>
<pre><code class="language-proto">syntax = &quot;proto3&quot;;

message Time {
    int64 submit = 1;
    int64 access = 2;
}
</code></pre>
<p>This turns into Go code like:</p>
<pre><code class="language-go">
type Time struct{
    Submit int64
    Access int64
    
    // Some hidden and XXX_ fields.
    ...
}
// Methods here for accessing these fields
...
</code></pre>
<p>You have lost the simplicity that the time library gives you.  If you plumb the proto through your code, you always pull the value into a native type to use it, then push it back into the proto.  This gives code such as:</p>
<pre><code class="language-go">now := time.Now()

// Don't accept a submission over 30 minutes.
if now.Sub(time.Unix(proto.Submit, 0)) &gt; 30 * time.Minute {
    return fmt.Errorf(&quot;this request has expired&quot;)
}
// Record last access time.
proto.Access = time.Now().Unix()
</code></pre>
<p>Certainly not impossible to use, but less readable and this is a simplistic example with a type that has a built-in conversion method.</p>
<p>What is important to grasp here is that every time I want to call a method on this type, I have to do a time conversion to gain access to these methods.</p>
<h3 id="losinglogicalobjects">Losing logical objects</h3>
<p>While protos represent objects, they represent public data objects only.  You lose access to private types, types with complex semantics (service RPC connections, file IO, ...), unsupported native types (like Go channels) and methods.</p>
<p>What you have is a native object, but one that is designed to allow data transfer between languages.  It is not optimized for use in your language.</p>
<p>This causes your language to lose access to common design patterns.</p>
<p>One of my favorite patterns is a validation pattern I use on nested data objects.  The basic idea is to call a Validate() method on the top object which calls Validate() methods on sub-objects which do the same for lower tier objects.</p>
<p>This pattern is very effective and keeps validation code bonded to the type they represent.</p>
<pre><code class="language-go">type Record struct {
	Basic      Basic
	Employment Employment
}

func (r Record) Validate() error {
    // This uses reflection to run Validate() on all fields.
    // It is not detailed here for brevity.
	if err := validateStruct(r); err != nil {
		return err
	}
	return nil
}

type Basic struct {
	First string
	Last  string
}

func (b Basic) Validate() error {
	if b.First == &quot;&quot; {
		return fmt.Errorf(&quot;Basic.First must not be an empty string&quot;)
	}
	if b.Last == &quot;&quot; {
		return fmt.Errorf(&quot;Basic.Last must not be an empty string&quot;)
	}
	return nil
}

type Employment struct {
	ID         uint
	Department string
}

func (e Employment) Validate() error {
	if e.ID == 0 {
		return fmt.Errorf(&quot;Employment.ID must be &gt; 0&quot;)
	}
	if e.Department == &quot;&quot; {
		return fmt.Errorf(&quot;Employment.Department must be set&quot;)
	}
	return nil
}
</code></pre>
<p>Note: Full code <a href="https://play.golang.org/p/mz_OzSJqrJ">here</a></p>
<p>This simple code allows validation against Record in a single call.</p>
<p>You can validate your proto, but not using a similar method.  You would need to do this as a function or series of functions with the proto as a passed argument.  You cannot use reflection to dig into the object hierarchy and call Validate(), because that method doesn't exist. Finally, you will need explicit mappings to all sub-messages.</p>
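<p>A sketch of what that function-based validation ends up looking like.  The PB* structs below are stand-ins for generated proto types, not real generated code:</p>

```go
package main

import (
	"errors"
	"fmt"
)

// PBBasic and PBRecord are stand-ins for generated proto structs
// (real generated code also carries XXX_ fields and getters).
type PBBasic struct {
	First string
	Last  string
}

type PBRecord struct {
	Basic *PBBasic
}

// With no way to hang methods on the generated types, validation
// becomes free functions with an explicit call per sub-message.
func validateBasic(b *PBBasic) error {
	if b == nil {
		return errors.New("Basic must be set")
	}
	if b.First == "" {
		return errors.New("Basic.First must not be an empty string")
	}
	if b.Last == "" {
		return errors.New("Basic.Last must not be an empty string")
	}
	return nil
}

func validateRecord(r *PBRecord) error {
	// Every new sub-message requires another explicit mapping here.
	return validateBasic(r.Basic)
}

func main() {
	fmt.Println(validateRecord(&PBRecord{Basic: &PBBasic{First: "John"}}))
}
```

<p>Each new sub-message means another hand-written function and another explicit call, instead of the reflection-driven pattern available with native types.</p>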
<p>What if you want to just add a GRPC connection object?  You can't, because you can't add a new field.  In some dynamic languages like Python, this is possible.  But it is bad form to dynamically create new attributes on objects during runtime.  It hides from developers what fields are available without deep code introspection.</p>
<p>Synchronization becomes another major hurdle.  You can use a single mutex to encompass the entire proto, but you cannot embed individual field mutexes.<br>
You lose the ability to use the faster synchronization types in the atomic package where appropriate (or your language's equivalent).  Channels, queue.Queue objects, ... are not available for use.</p>
<p>What gets lost as you begin to solve these problems is that your architecture is being dictated by the limitations of the protocol buffer's implementation.  You can do these things, but not in an optimal way.</p>
<p>You are asking protocol buffers to do more than what it was designed to do.</p>
<h3 id="solutionswrappersinjectionornativetypeconversion">Solutions: wrappers, injection or native type conversion</h3>
<h4 id="wrappers">Wrappers</h4>
<p>Wrappers, depending on the language, can be a solution.  Think of this as the lazy developer's solution: you want to get the benefits of your language without doing much work.</p>
<p>Adding a wrapper allows direct use of the proto while gaining access to new fields and methods.  Simply embed the proto message within a native type.</p>
<pre><code class="language-go">type Record struct {
    proto.Record
}

func (r Record) Validate() error {
    ...
}

type Basic struct {
	proto.Basic
}

func (b Basic) Validate() error {
	if b.First == &quot;&quot; {
		return fmt.Errorf(&quot;Basic.First must not be an empty string&quot;)
	}
	if b.Last == &quot;&quot; {
		return fmt.Errorf(&quot;Basic.Last must not be an empty string&quot;)
	}
	return nil
}
...
</code></pre>
<p>By doing composition we added methods around these fields.</p>
<p>But this method has a few negatives:</p>
<ul>
<li>You lose access to native types like time.Time or time.Duration, because your timestamp is still an int64.  You can fix this, but it's not pretty.</li>
<li>Reflection methods that were used in Record.Validate() are much harder to write.  You have to extract the sub-messages such as Basic and Employment into their own compositions when reflecting through the protos.</li>
<li>This adds more methods and fields than it appears.  Protos contain public XXX_ fields, getters, ... that are not as compact as your native code.  This may or may not matter to your application.</li>
<li>You lose access to most object diffing packages.  The proto libraries supply protocol buffer comparison functions, but not with native wrappers.  You would need to write a custom compare function.</li>
</ul>
<h4 id="injection">Injection</h4>
<p>Label this as controversial.</p>
<p>In many proto languages it is possible to inject methods into your protocol buffer.</p>
<p>In Go you would add a file that has the same package name as the proto within the same directory where you generate your native proto file.  Using this method, you can add new methods around your types.</p>
<p>This works fairly well if you don't need to add new fields to the structs.</p>
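<p>A minimal illustration of the injection idea.  Here both pieces are collapsed into one runnable file; in practice the struct is generated and the method lives in a separate hand-written file in the same package and directory:</p>

```go
package main

import (
	"fmt"
	"time"
)

// Time would normally live in the generated time.pb.go file.
type Time struct {
	Submit int64
	Access int64
}

// SubmitTime would live in a hand-written .go file dropped into the
// same package as the generated code; that is the "injection".
func (t *Time) SubmitTime() time.Time {
	return time.Unix(t.Submit, 0)
}

func main() {
	t := &Time{Submit: 1500000000}
	fmt.Println(t.SubmitTime().UTC().Year())
}
```

<p>Callers now get a time.Time directly, but the fields themselves are still int64s, which is the limitation discussed below.</p>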
<p>However, I don't recommend this.  It is doubtful this is compatible with build systems like Bazel that automatically compile your protos, and the methodology is brittle.</p>
<p>This also doesn't provide the ability to provide fields based on native types.  If you ever want to add native types, you are out of luck.  Future proofing is something to strive for.</p>
<h4 id="nativetypes">Native Types</h4>
<p>With this method protocol buffers are utilized for what they are meant for:  data serialization.</p>
<p>This method would require conversion to/from native objects.  This allows customization of an object with native type representations, complex objects and the addition of methods.  It also allows you to hide fields that are needed only for client/server communication.</p>
<p>This is the most work-intensive method.  It requires writing very similar native representations of the data fields and conversion methods to and from those types (int64 to time.Time or time.Duration).  Enumerators need conversion, etc.</p>
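<p>A small sketch of that conversion layer.  The names are illustrative; PBTime mimics the earlier generated message:</p>

```go
package main

import (
	"fmt"
	"time"
)

// PBTime mimics the generated proto message: epoch-second int64s.
type PBTime struct {
	Submit int64
	Access int64
}

// Times is the native representation the rest of the program uses.
type Times struct {
	Submit time.Time
	Access time.Time
}

// fromProto and toProto are the hand-written conversion layer; this is
// the boilerplate the native-types approach pays for its flexibility.
func fromProto(p *PBTime) Times {
	return Times{Submit: time.Unix(p.Submit, 0), Access: time.Unix(p.Access, 0)}
}

func toProto(t Times) *PBTime {
	return &PBTime{Submit: t.Submit.Unix(), Access: t.Access.Unix()}
}

func main() {
	n := fromProto(&PBTime{Submit: 1500000000})
	// The native type gets the full time package for free.
	fmt.Println(time.Unix(1500000030, 0).Sub(n.Submit))
}
```

<p>The conversions happen once at the serialization boundary, so the rest of the program works purely with native types.</p>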
<p>It would be great if someone provided a library that wrote the skeleton for this, but I haven't seen one (maybe a weekend project in my future).</p>
<p>This gives the ultimate in flexibility to use the language without restrictions put on protos by cross-language support.</p>
<p>It avoids all the weird naming conventions, extra fields and methods that are not required.  In large projects, this can make a difference in how your program is structured.</p>
<p>And it provides separation between RPC calls/storage formats and the software's internal representations.  This allows for variations between representations where it makes sense.</p>
<h2 id="protosmakeabadconfigurationlanguage">Protos make a bad configuration language</h2>
<p>I'm only going to briefly touch on this and follow up with another article.</p>
<p>Protocol buffers do not make a good configuration language.  This is another place where the versatility of protocol buffers gets us in trouble.</p>
<p>The string representation lacks:</p>
<ul>
<li>good documentation, string representation is really a debug tool</li>
<li>support for serializing the string representation with new fields into an older version of the protocol buffer</li>
<li>multi-line strings in a human readable format</li>
</ul>
<p>This representation is simply for debug use.  YAML and TOML are both superior formats meant for human consumption.</p>
<p>But enough of this for now.</p>
<h2 id="whattouseprotosfor">What to use protos for</h2>
<p>Protos are great serialization objects.  When you need to communicate with another service or store data onto disk there are few serializers than can match both its language support and efficiency.</p>
<p>But utilizing protos within your software's logic can lead to architectural decisions that are not optimal on your language platform.</p>
<p>Like any article, this one certainly has its opinion.  This does not mean it is the right methodology for every language or every use case.</p>
<p>When thinking of plumbing protocol buffers throughout your software, just give a thought on how you would structure your software if you were using native language concepts and how future versions might change your needs.</p>
<p>This might save you a lot of time and code complexity working around proto limitations.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Why I moved from Python to Go(Part II)]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Note: This is a continuation of this <a href="http://www.gophersre.com/2017/08/05/why-i-moved-from-python-to-go-part-i/">article</a></p>
<h2 id="typesafetyandthecompilertotherescue">Type safety and the compiler to the rescue</h2>
<p><img src="http://68.media.tumblr.com/df7ae0186ccbf207a6c15586c21c9e65/tumblr_n5xtgahZdk1s02vreo1_400.gif" alt="rescue me"></p>
<p>One of the first things we noticed using Go was that once a program compiled, it tended to work.</p>
<p>This was not the experience that we had in Python.  Small programs usually took a</p></div>]]></description><link>http://www.golangdevops.com/2017/08/10/why-i-moved-from-python-to-go-part-ii/</link><guid isPermaLink="false">5984d864211cd133a1983be8</guid><dc:creator><![CDATA[John Doak]]></dc:creator><pubDate>Fri, 11 Aug 2017 00:17:00 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Note: This is a continuation of this <a href="http://www.gophersre.com/2017/08/05/why-i-moved-from-python-to-go-part-i/">article</a></p>
<h2 id="typesafetyandthecompilertotherescue">Type safety and the compiler to the rescue</h2>
<p><img src="http://68.media.tumblr.com/df7ae0186ccbf207a6c15586c21c9e65/tumblr_n5xtgahZdk1s02vreo1_400.gif" alt="rescue me"></p>
<p>One of the first things we noticed using Go was that once a program compiled, it tended to work.</p>
<p>This was not the experience that we had in Python.  Small programs usually took a lot of runs before we got the bugs out.  But Go usually just worked.</p>
<p>Now of course, larger programs are different, Go isn't magic.  And Go does have a few runtime problems, namely nil pointer dereferences, uninitialized channels and assigning to a nil map.  Python seems to have an infinite number of these because of its dynamic nature and when evaluation happens.</p>
<p>An early project where a service was re-written from Python to Go and run side by side showed us immediate advantages.  Much lower memory use, faster response times, and significantly higher loads.  And remember, still pre-Go 1.0 (again, we aren't using multi-processing in Python).</p>
<p>It also just worked out of the gate, something we didn't see from our Python code.   I chalk this up simply to type safety.</p>
<h2 id="goroutines">Goroutines</h2>
<p>The idea that threading could be simple had been completely lost on me.  After years of deciding on which pain pill I wanted to take, inheriting from threading.Thread or running the thread via some archaic method?  Do I need thread local data, is it going to be a daemon thread, ...  I got this instead:</p>
<pre><code class="language-go">go func() {
    // ... do work concurrently ...
}()
</code></pre>
<p>Want a promise?</p>
<pre><code class="language-go">ch := make(chan bool, 1)

go func() {
    // ... do the work ...
    ch &lt;- true
}()

// ... do other work ...
promiseKept := &lt;-ch
</code></pre>
<p><img src="https://media.giphy.com/media/Nx85vtTY70T3W/giphy.gif" alt="multitasking"></p>
<p>This was certainly easier than what Python provided in terms of threading/multi-processing.  It was first class within the language, not a standard library add-on, and that made all the difference.</p>
<p>And even better was how cheap a goroutine was compared to a thread or a fork call (even with all of Linux's optimizations, if you're on Linux).</p>
<h2 id="objectswithoutinheritancemostly">Objects without inheritance, mostly...</h2>
<p>What I learned about objects over the years is that people get way too excited about them after they learn how they work.  I'd look through code that inherited from three classes, each which inherited from two other classes, that inherited from ...</p>
<p><img src="http://3.bp.blogspot.com/-IySnx19-_Qw/UnRWkdjkXLI/AAAAAAAAGnY/5y8OTLBHsg8/s1600/3r3paq.jpg" alt="Too excited"></p>
<p>When you're trying to dissect someone's code, this kind of thing can drive you mad.  And if I was running into this at Google, I've got to imagine it isn't much better in other places.</p>
<p>Go does composition, which has similar traits to inheritance.  I've noticed that composition tends to be used sparingly and in ways that are easy to understand.  I think that might be because it is not presented as something that the language design is based around, so it might not get as abused, but I don't really know.</p>
<p>This isn't a language &quot;feature&quot;, but whatever it is I appreciated it.  People often argue that when someone is using X feature wrong that it is not a problem with the language.  I think that if you are seeing a feature used incorrectly a lot, it is a fault of the language.</p>
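<p>For readers coming from class hierarchies, composition in Go is just struct embedding.  A small sketch (the <code>Logger</code>/<code>Server</code> names are made up for illustration):</p>

```go
package main

import "fmt"

// Logger is a small reusable behavior.
type Logger struct{ prefix string }

func (l Logger) Log(msg string) string { return l.prefix + ": " + msg }

// Server embeds Logger: composition, not inheritance.  Server gains
// Log as a promoted method without any class hierarchy.
type Server struct {
	Logger
	addr string
}

func main() {
	s := Server{Logger: Logger{prefix: "srv"}, addr: ":8080"}
	fmt.Println(s.Log("started"))
}
```

<p>There is no override chain to trace: <code>s.Log</code> is exactly <code>s.Logger.Log</code>, which keeps the dissection problem above from appearing.</p>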
<h2 id="interfacecontractsnotobjectmethodoverloads">Interface contracts, not object method overloads</h2>
<p>Interfaces as contracts just made so much sense, or at least once I understood them.  They were the most mysterious part of the language for me at first.</p>
<p><img src="http://pa1.narvii.com/6393/527c169edd6bd5155a9ca32d09e3ac61987cede1_hq.gif" alt="mysterious"></p>
<p>In comparison to how I would see it done in Python, this was better thought out.  In Python, what I saw most often was creating a base class that had methods that subclasses would overload.  But there were no guarantees here, just crashes when something wasn't implemented.</p>
<p>The compile time constraints around interface contracts prevented a lot of bugs and made code more readable.</p>
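<p>A minimal sketch of the contract idea (the <code>Notifier</code>/<code>Email</code> names are hypothetical): the function depends only on the interface, and the <code>var _ Notifier = Email{}</code> line makes the compiler verify the contract is satisfied.</p>

```go
package main

import "fmt"

// Notifier is the contract: anything with a Notify method satisfies
// it, and the compiler checks this at the point of use.
type Notifier interface {
	Notify(msg string) string
}

type Email struct{ to string }

func (e Email) Notify(msg string) string { return "mail to " + e.to + ": " + msg }

// send depends only on the contract, not on a concrete type.
func send(n Notifier, msg string) string { return n.Notify(msg) }

// Compile-time assertion that Email implements Notifier; if Notify is
// renamed or its signature drifts, the build fails here.
var _ Notifier = Email{}

func main() {
	fmt.Println(send(Email{to: "ops"}, "disk full"))
}
```

<p>Compare that to a Python base class whose missing override only shows up as a crash at runtime.</p>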
<h2 id="nomocksjustinterfacesfortests">No mocks, just interfaces for tests</h2>
<p>I didn't do a lot of programming in Python outside the Google sphere, but one of the main methods of testing used internally was mock libraries.</p>
<p><img src="http://ak-hdl.buzzfed.com/static/2014-11/13/6/enhanced/webdr02/anigif_enhanced-buzz-17120-1415879109-4.gif" alt="mocking"></p>
<p>I really hated every mock library we had.  I found the tests hard to read and very brittle.  And to get our code to work, we were always trying to get 90% coverage (by our definition of coverage; there are many definitions) in order to stave off runtime bugs.</p>
<p>This often made code changes break lots of tests.  Engineers were spending more time trying to fix the tests than working on the code.</p>
<p>Go didn't have a mock library when I started using it, though one did come out within a couple of years.  You simply used private interfaces for object attributes, which allowed you to switch out for a fake at any time.</p>
<p>Fakes were generally easy to write and self-explanatory.  Along with the table driven test method, it made reading tests much easier.</p>
<p>Because tests weren't as brittle, we would spend far less time debugging test breakage, except when we had actually broken something.</p>
<p>To be fair, you can certainly do fakes in Python.  And you can do table driven tests with named tuples (I started doing this).  But this wasn't the culture, which is sometimes as important as the language.  And there was nothing that was going to make the amount of testing lessen, type safety just isn't there to pick up the slack.</p>
<h2 id="reflection">Reflection</h2>
<p>Python makes reflection very easy.  Objects are really dictionaries, and it's easy to do runtime reflection on any object.</p>
<p>Go has built-in runtime reflection, but it's not the easiest thing to learn.  I remember thinking that I'd come across an alchemist cookbook for making gold, with the book cover in English and the contents in Greek.</p>
<p>Much of that difficulty, I believe, comes from the nature of a compiled language that doesn't want all code bogged down by the runtime.</p>
<p>However it was a compiled language with reflection and introspection.  That was pretty awesome.</p>
<p>But if there was something Python did better than Go, reflection was it, hands down.</p>
<p><img src="http://i1.kym-cdn.com/photos/images/newsfeed/000/614/639/9df.gif" alt="you win"></p>
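<p>That said, Go's reflection is workable once learned.  A minimal sketch, using the standard <code>reflect</code> package to list a struct's field names at runtime (the <code>Config</code> type is invented for the example):</p>

```go
package main

import (
	"fmt"
	"reflect"
)

type Config struct {
	Host string
	Port int
}

// fieldNames lists a struct's field names at runtime via reflection.
func fieldNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Name)
	}
	return names
}

func main() {
	fmt.Println(fieldNames(Config{Host: "localhost", Port: 8080}))
}
```

<p>It's more ceremony than Python's <code>__dict__</code>, but it is there, and in a compiled language.</p>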
<h2 id="searchandreplace">Search and replace</h2>
<p>Have you ever wanted to replace a variable name across all files in a directory?  Or worse, want to change an object name across all files in a repository?</p>
<p>In Python, there was just no end to the bugs.  We'd spend forever tracking down all the issues.</p>
<p><img src="http://i.imgur.com/JRfFAL0.gif" alt="bug explosion"></p>
<p>Go provides out of the box tooling for doing this work.  And it just worked.</p>
<p>Better yet, Go provided the AST library for building your own tools.  The negative was, there was no good documentation.  Today the documentation situation is slightly better with some blogs giving examples, but the godoc is severely lacking.  But at least there is a way to do this.</p>
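<p>As a taste of what the standard <code>go/parser</code> and <code>go/ast</code> packages give you, here is a small sketch that extracts every top-level function name from Go source text, which is the starting point for rename-style tooling:</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames parses Go source text and returns the name of every
// top-level function declaration it finds.
func funcNames(src string) []string {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil
	}
	var names []string
	for _, d := range f.Decls {
		if fn, ok := d.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names
}

func main() {
	src := "package p\nfunc Hello() {}\nfunc world() {}\n"
	fmt.Println(funcNames(src))
}
```

<p>A real rename tool would walk identifiers and print the modified tree back out, but even this much shows how little code it takes to get a typed view of a source file.</p>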
<h2 id="doesnotsupportruntimeattributes">Does not support runtime attributes</h2>
<p>Python has this annoying or powerful feature, depending on what viewpoint you are coming from: adding runtime attributes to object instances.</p>
<p>The problem with adding runtime attributes is that spelling errors are now problematic.  If I assign a value to &quot;object.supercalifragilisicexpialidocious&quot;  instead of &quot;object.supercalifragilisticexpialidocious&quot;, Python just creates a new attribute and assigns the value.</p>
<p>My nice web application isn't displaying what it should, but I'm not sure where the problem lies in my code.  Worse, I'm dependent on my eyes seeing this misspelling.</p>
<p>In Go, you cannot add an attribute at runtime to a struct, you will always get a compiler error:<br>
<span style="color:red">b.hello undefined (type blah has no field or method hello)</span></p>
<h2 id="doitourway">Do it our way</h2>
<p>One of the things that Go is known for is being very opinionated.  This sometimes really upsets people.  They've done things with methodology X and they don't like being told that Go isn't going to work that way.</p>
<p><img src="https://media.giphy.com/media/7PsNe7kURy84E/giphy.gif" alt="your opinion"></p>
<p>I think all humans like the illusion of choice.  We want a myriad of options.  But we are often confused when there are a lot of choices.</p>
<p>When I'm programming, I don't want to be the guy at the restaurant who is looking at the menu for an hour trying to decide what to eat.  I want to go to the restaurant that has only one thing on the menu.</p>
<p>Go seems the right fit for me here, where I think Perl was the opposite extreme.  I could never read another engineer's Perl code, because there were so many options to choose from that I was always seeing calls I had never encountered before.  The tedium of looking up every other call bogs you down.</p>
<p>Go was easy to read because there was a limited way of doing things.  And each method felt like the Go authors took a lot of time thinking and experimenting on what was the best way to do something.  It felt well curated.</p>
<p>At first the &quot;do it my way&quot; irked me a little, but after a while I really appreciated this &quot;feature&quot;.</p>
<h2 id="mustuseimportsandfunctionvariables">Must use imports and function variables</h2>
<p>Go has this really annoying feature when I started to use it, must use imports.  I would get so irritated when I'd need to put in debugging statements only to find that I had deleted the log import because the program wouldn't compile, because it wasn't used.</p>
<p>But that quickly faded as I figured out the benefits.  Those huge binary sizes in my Python deployments were sometimes caused by imports that weren't needed.  We of course got better with linters, but you can still fool Python's linters in ways the Go compiler won't let you.  Go keeps your code neat by forcing you to get rid of variables you're not using.  You can't skip around this because you're annoyed or want to get something done faster.  Which is great for code health.</p>
<p>With the inclusion of goimports in the toolset, adding/removing of imports was no longer annoying.  My editor could now just add or remove my log import whenever I saved a file.</p>
<h2 id="performance">Performance</h2>
<p>When it came to memory performance, Go won hands down.  I remember the number was around 4x the size for basic types in Python vs Go.  I've heard Java also has a similar type of memory consumption once a type becomes an object, but I've never verified that claim.</p>
<p><img src="https://media.giphy.com/media/JvjebY3LwZkA/giphy.gif" alt="Python memory"></p>
<p>With Goroutines vs. the GIL, Go was the clear winner.  Remember, multi-processing wasn't an option for us at the time.  But even if it had been, it's just not as convenient as a shared memory threading model.</p>
<p><img src="http://i.imgur.com/LmJ95eT.gif" alt="speed"></p>
<p>I certainly watched micro-benchmarks of Python beat Go.  Especially in code where what is tested is really optimized C code.  But once we left the micro-benchmarks, Python just wasn't holding up.</p>
<p>Sometimes these micro-benchmarks were for things like regexes.  I'm sure Python's re library still beats Go's regexp package.  But it didn't account for things like the bounded memory consumption and linear runtime that Go's regexp provides and re does not (regexp is based on <a href="https://github.com/google/re2/wiki/WhyRE2">RE2</a> by Russ Cox).</p>
<p>In the end, we found the realtime performance of Go was multiples of Python (again, no multi-processing) with memory consumption far less.  In modern Go, I imagine this gap is getting wider, though we no longer have multiple implementations of the same systems for realistic benchmarks.</p>
<h2 id="thiswasalanguagethatcouldbetaughttonovices">This was a language that could be taught to novices</h2>
<p>Python was a language that could be taught fairly easily to people who had never programmed before.  One of the problems with C++ was that this was not the case.  There is a reason dynamic languages have picked up over the years, with one of those being that they are far easier to learn.</p>
<p>We had a few people in a remote office coding at least part-time in Python.  And within the local group in Mountain View, we had a lot of people with at least passing familiarity with Python.</p>
<p>These were not hard-core software engineers.  Their main job wasn't coding, it was running a network.  So any language that we switched to would need to be easy to learn.</p>
<p>I've taught classes in both Python and Go across the globe.  I've held office hours to help with coding issues.</p>
<p>What I found was that Go only had two subjects that made it harder to learn than Python:  interfaces and pointers.</p>
<p>This meant that once you could convince people this was the way to Go, you could train them fairly easily.  The trick of course is convincing people, but that is a whole other article.</p>
<h2 id="theendresult">The end result</h2>
<p>Certainly from the article's title, you can tell I switched to Go.  I tend to think, with very little data, that I am about 6x as productive.</p>
<p>One project that a team of SWEs and I wrote in Python took about a year.  And it had a lot of performance issues we worked on for a year following that development.</p>
<p>I re-wrote that project in Go, by myself, with many feature enhancements in about the same time.  The new service has about 1000x the usage with no signs of similar performance issues.</p>
<p>While it took about 4 years, I was able to migrate our engineers out of Python and into Go.  The tools group took a hybrid approach, with half the projects in C++ and the other half in Go.  I like to think some of that was my influence.</p>
<p>Today our situation looks quite different.  Engineers are concentrating on the problems they want to solve, not performance issues or random crashes.</p>
<h2 id="isgoforyou">Is Go for you?</h2>
<p>This certainly isn't something I can answer.  It might be that your projects don't have the scale or needs that ours had.  Or Go might not have critical libraries you require.</p>
<p>Go certainly might not be the answer for every programming project.  Sometimes you might need the speed that only highly optimized C++ or assembly can give you.  Or maybe something like Rust is a better fit.</p>
<p>But if you're a Python/Ruby/Node programmer, I suggest giving Go a chance.  Write 5k lines with code reviews from someone knowledgeable about Go (it's not just a language; like all languages it requires a certain mindset to truly utilize it).  5k lines, because you have to get over the &quot;why doesn't it do x, I don't like y&quot; mental game we all play when picking up a new language.</p>
<p>My guess, you won't want to Go back (pun intended).</p>
<p>Happy programming ladies and gentlemen, in whatever language you love!</p>
</div>]]></content:encoded></item></channel></rss>