Dynamic Sampling by Example

Dynamic Sampling by Example

 

Sampling with dynamic rates on arbitrarily many keys

What if we can’t predict a finite set of request quotas we want to set — e.g. if we want to cover the customer id case above? It was ugly enough to set the target rates by hand for each key (“error/latency” vs. “normal”), and incurred a lot of duplicate code. We can refactor to instead use a map for each key’s target rate and number of seen events, and do lookups to make sampling decisions. And this is how we get to what’s implemented in the dynsample-go library, which maintains a map over any number of sampling keys and allocates a fair share to each key as long as it’s novel. It looks something like this:

two graphs with varying sample rates over time labeled 200 and 500

var counts map[SampleKey]int
var sampleRates map[SampleKey]float64
var targetRates map[SampleKey]int

func neverSample(k SampleKey) bool {
	// Left to your imagination. Could be a situation where we know request is a keepalive we never want to record, etc.
	return false
}

// Boilerplate main() and goroutine init to overwrite maps and roll them over every interval goes here.

type SampleKey struct {
	ErrMsg        string
	BackendShard  int
	LatencyBucket int
}

// This might compute for each k: newRate[k] = counts[k] / (interval * targetRates[k]), for instance.
// The dynsample library has more advanced techniques of computing sampleRates based on targetRates, or even without explicit targetRates.
func checkSampleRate(resp http.ResponseWriter, start time.Time, err error, sr map[interface{}]float64, c map[interface{}]int) float64 {
	msg := ""
	if err != nil {
		msg = err.Error()
	}
	roundedLatency := 100 *(time.Since(start) / (100*time.Millisecond))
	k := SampleKey {
		ErrMsg:       msg,
		BackendShard: resp.Header().Get("Backend-Shard"),
		LatencyBucket: roundedLatency,
	}
	if neverSample(k) {
		return -1.0
	}

	c[k]++
	if r, ok := sr[k]; ok {
		return r
	} else {
		return 1.0
	}
}

func handler(resp http.ResponseWriter, req *http.Request) {
	var r float64
	if r, err := floatFromHexBytes(req.Header.Get("Sampling-ID")); err != nil {
		r = rand.Float64()
	}

	start := time.Now()
	i, err := callAnotherService(r)
	resp.Write(i)

	sampleRate := checkSampleRate(resp, start, err, sampleRates, counts)
	if sampleRate > 0 && r < 1.0 / sampleRate {
		RecordEvent(req, sampleRate, start, err)
	}
}

We’re close to having everything put together. But let’s make one last improvement by combining the tail-based sampling we’ve done so far with head-based sampling that can request tracing of everything downstream.

Pages: 1 2 3 4 5