As an experienced full-stack developer, sets are a vital data structure I utilize extensively across large-scale applications. However unlike languages such as Python, Java, or JavaScript, Go lacks native built-in support for sets.

In this comprehensive guide, we‘ll explore various methods for emulating sets in Go for production systems – from simple maps to custom types optimized for performance and memory efficiency.

Why Sets Matter

Sets represent an unordered collection of unique elements supporting efficient membership testing, duplicate removal, and mathematical operations. Some common use cases:

Deduplicating Data

Removing duplicate entries from slices, arrays, and databases tables is a very common task. With millions of records, checking each element against all others is hugely inefficient. A set provides uniqueness by design in O(1) time.

Tracking State

Sets shine for tracking distinct state across large systems. For example, efficiently tracking unique users viewing a story, registered websocket connections, or cache keys.

Membership & Validation

Checking set membership provides simple validation of domain entities. Is a given customer_id valid? Does this device_id exist? Sets provide very fast existence checks.

Mathematical Operations

Union, intersection, difference provide powerful building blocks for data manipulation. These shine for integrating data from multiple sources.

By implementing performant custom set types, we can match built-in efficiency of languages like Python while avoiding performance pitfalls.

Sets in Go – Native Maps

With no native set type, the simplest approach builds on Go‘s high performance native maps.

Use a map with keys representing set elements, and empty struct values allowing minimal storage overhead per entry:

type StringSet map[string]struct{}

set := make(StringSet)
set["foo"] = struct{}{}

This leverages hash table performance for O(1) lookup speed. Adding and checking membership is super fast.

However, native maps have limitations:

No ordering – Map iteration order is randomized
Not generic – Each set needs its own concrete map type
No custom logic – Behavior is plain map behavior

Performance is excellent but we give up features expected of purpose-built sets. Let‘s explore some ways to address these limitations.

Custom Set Types

For reusable, encapsulated sets, we can build a custom set type wrapping our native map:

type StringSet struct {
  m map[string]struct{}
}

func NewStringSet() *StringSet {
  return &StringSet{ 
       m: make(map[string]struct{})
  }
}

func (set *StringSet) Add(s string) {
  set.m[s] = struct{}{} 
}

func (set *StringSet) Contains(s string) bool {
  _, exists := set.m[s]
  return exists
}  

This provides:

  • Custom StringSet type abstracting map details
  • Constructor and methods typical of sets
  • Easy to use API

We can even add logic related to a certain data domain:

func (set *UserIDSet) Add(id int) error {
  if id <= 0 {
    return errors.New("invalid user id")
  }

  set.m[id] = struct{}{}
  return nil
}

Encapsulation allows behavior specific to data and use case.

Leveraging Structural Typing

While better, our string set type still ties us to one kind of element. We can genericize further leveraging Go‘s structural typing:

type Set struct {
  m map[interface{}]struct{}
}

func NewSet() *Set {
  return &Set{
    m: make(map[interface{}]struct{}), 
  }
}

func (set *Set) Add(item interface{}) {
  set.m[item] = struct{}{}
}

set := NewSet()
set.Add("foo")
set.Add(123)

This frees us from concrete types – now our set supports any value. The empty struct value requires no storage overhead.

Structural typing enables this kind of generic code difficult in nominally typed languages.

Custom Generic Set Functions

Generating generic set methods is simplified in Go 1.18 with type parameters:

func Set[T comparable](items ...T) []T {
    uniq := make(map[T]struct{})
    for _, item := range items {
        uniq[item] = struct{}{}
    }
    // map keys = set elements
    return keys(uniq) 
}

ints := Set(1, 2, 3, 2, 4) 
// [1, 2, 3, 4]  

This abstracts set logic into reusable functions operating on any comparable type. Pretty powerful!

Sets vs Other Languages

How does Go‘s approach to sets compare to other popular languages?

Language Native Set Notes
Python Built-in hash table set
JavaScript ES6 introduced set
Java Generic HashSet since JDK 1.2
Ruby Set classes since Ruby 1.9
Go Requires DIY approach

Lack of native sets is a common pain point cited by Go developers. But implementing performant custom sets gives us flexibility missing from rigid standard libraries.

Ordered Sets

One major limitation of the map technique is lack of ordering. For sorted sets, we need to change tools.

The golang.org/x/exp/maps package offers alternate map types. The String type provides ordered string keys via a B-Tree:

import "golang.org/x/exp/maps"

type StringSet struct {
  s maps.String
} 

func NewStringSet() *StringSet {
  return &StringSet{
     s: maps.NewString() 
  }
}

func (set *StringSet) Add(x string) {
  set.s[x] = struct{}{} 
}

// sorted keys
fmt.Println(set.s.Keys()) 

This gives us proper ordered string set behavior – including useful methods like Prefix for range queries.

The downside is significantly slower insertion and lookup vs plain hash maps. Only use ordered sets where ordering matters – benchmark first!

Case Study: User Visits Tracking

Let‘s apply sets to efficiently track unique visitors reading articles on a high traffic news site.

Naive Approach

An initial solution stores visits in a database table:

story_id | user_id
-------------------
     12345|     102
     12346|     205
     12346|     205  

This table grows without bounds as traffic increases. Counting uniques requires resource intensive GROUP BY.

Optimized Set Solution

type UserVisitSet struct {
  s maps.IntSet // set of user_ids
}

perStoryVisits := make(map[int64]*UserVisitSet) 

func VisitStory(storyID int64, userID int) {
   // init set
   if _, ok := perStoryVisits[storyID]; !ok {
      set := &UserVisitSet{s: maps.NewIntSet()}
      perStoryVisits[storyID] = set
   }

   set := perStoryVisits[storyID]
   set.s.Add(userID) // track unique user visit 
}

report := perStoryVisits[123].CountUnique() // O(1) unique visits

This scales to any traffic load while only storing truly unique data. IntSet provides O(1) Contains() checks to ensure no duplicates.

Thread Safety

maps and sets built atop them are not intrinsically thread-safe in Go. Concurrent access can put the data structure in an inconsistent, crashed state:

fatal error: concurrent map writes

Protect access with synchronization primitives like mutexes:

type Set struct {
  m map[interface{}]struct{}
  sync.Mutex 
}

func (set *Set) Add(item interface{}) {
  set.Lock()
  defer set.Unlock()

  set.m[item] = struct{}{}
} 

The standard library provides robust building blocks for thread safety beyond the scope here. Check the Go Concurrency Patterns book for best practices.

Well-designed locks and goroutine communication mean we can build highly concurrent custom sets meeting production needs.

Sets and Memory Usage

Sets aim to store distinct data optimally, but still incur overheads around allocation and garbage collection:

Set Type Overhead Notes
Map set moderate Many small allocations
Custom set higher More structure
Ordered set highest B-Trees allocate heavily

GC tuning is vital for sets containing many small keys. The runtime‘s NextGC trigger tracks heap growth:

Set.Add calls: 1436734
Heap Bytes: 56MB   
Next GC target: 64MB

A high key turnover can trigger frequent, expensive GC cycles even with low total memory usage.

Manually call runtime.GC() during key churns to recycle unused entries. Or, use Go 1.18‘s improved GC ergonomics for auto-tuning small object heap settings.

Understand your access patterns and tune the runtime accordingly. A performant GC keeps sets speedy within your memory budget.

Closing Thoughts

Sets enable vital use cases like removing duplicates and membership checks with maximally efficient data storage and access.

While Go currently lacks native sets, we have great building blocks via maps, structs and generic types to implement high performance custom sets matching needs.

Key takeaways:

  • Leverage native map semantics for simple use cases
  • Encapsulate logic in custom set types
  • Employ interface{} and type parameters for generic code
  • Use ordered map packages where sort order matters
  • Protect concurrent access with locks/goroutines
  • Mind GC pressure and overhead with small keys

I hope you‘ve enjoyed this advanced deep dive into crafting and optimizing custom sets in Go. Reach out on Twitter @matt_gopher with any other topics you‘d like to see covered!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *