Date: 2026-03-11
Scope: All 4 services (bridge, server, decoder, location) and every internal package they depend on (~50 Go source files).
Excluded: Security (authentication, authorization, TLS hardening) -- per agreement, the system will operate behind a firewall.
## Scorecard

| Category | Score | Notes |
|---|---|---|
| Architecture & layout | 8/10 | Clean microservice split, standard Go project layout, good separation of concerns |
| Error handling | 4/10 | Multiple swallowed errors, panics instead of errors, fmt.Println in production paths |
| Resource management | 4/10 | HTTP response bodies leaked, no server timeouts, no request-size limits |
| Concurrency safety | 5/10 | Good use of sync.RWMutex in most places, but 2 confirmed race conditions |
| Observability | 4/10 | Structured logging is mostly in place, but no metrics, no tracing, health reporting has a memory leak |
| API design | 6/10 | Consistent response helpers, middleware chain present, but no input validation and mixed route naming |
| Testing | 5/10 | Good test structure and SQLite-based controller tests, but test code is stale and e2e is a stub |
| Operational readiness | 5/10 | Dockerfiles and compose exist, but no Makefile, no graceful shutdown timeout, AutoMigrate in production |
Bottom line: The architecture is solid and the code is well-organized. The project needs a focused round of fixes on resource leaks, race conditions, error handling, and observability before it is production ready.
## Critical issues

### C1. HTTP response bodies never closed (file-descriptor leak)

apiclient/auth.go, apiclient/data.go, and apiclient/utils.go read from res.Body but never call res.Body.Close(). Each leaked body holds an open TCP connection and a file descriptor. Under sustained load this will exhaust the process's file descriptor limit and crash the service.
```go
// apiclient/auth.go -- current
res, err := client.Do(req)
if err != nil { ... }
var j response
json.NewDecoder(res.Body).Decode(&j) // Body never closed
```

Fix: add the defer immediately after the error check:

```go
res, err := client.Do(req)
if err != nil { ... }
defer res.Body.Close()
```
The same pattern applies to every function in data.go (GetTrackers, GetGateways, GetTrackerZones, GetZones, InferPosition) and utils.go (getRequest -- the caller must close, or close inside).
### C2. No HTTP server timeouts

internal/app/server/app.go:121-124:

```go
a.Server = &http.Server{
    Addr:    a.Cfg.HTTPAddr,
    Handler: a.RegisterRoutes(),
}
```

No ReadTimeout, WriteTimeout, or IdleTimeout is set. A single malicious (or merely slow) client can hold a goroutine and connection indefinitely.
Fix:
```go
a.Server = &http.Server{
    Addr:         a.Cfg.HTTPAddr,
    Handler:      a.RegisterRoutes(),
    ReadTimeout:  15 * time.Second,
    WriteTimeout: 15 * time.Second,
    IdleTimeout:  60 * time.Second,
}
```
### C3. SQL built by string interpolation in LocationToBeaconService

internal/pkg/service/beacon_service.go:110:

```go
db.Order(fmt.Sprintf("POW(x - %f, 2) + POW(y - %f, 2)", msg.X, msg.Y)).First(&gw)
```

msg.X and msg.Y come from a deserialized Kafka message. Because they are float32, exploitation is unlikely today, but string-interpolating values into an ORDER BY clause is fragile and becomes injectable the moment a string field is added. Use parameterized expressions instead:

```go
db.Order(gorm.Expr("POW(x - ?, 2) + POW(y - ?, 2)", msg.X, msg.Y)).First(&gw)
```
### C4. InsecureSkipVerify is always true despite the config flag

internal/pkg/location/inference.go:27:

```go
func NewDefaultInferencer(skipTLSVerify bool) *DefaultInferencer {
    tr := &http.Transport{}
    tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true} // ignores the parameter!
    ...
}
```

The skipTLSVerify parameter is accepted but never used. The same issue exists in apiclient/updatedb.go:22. Fix: use the parameter value, i.e. `tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: skipTLSVerify}`.
### C5. ActiveBeacons slice grows without bound (memory leak)

internal/pkg/common/appcontext/health.go:49-53:

```go
func (b *BaseHealth) GetActiveBeacons(m *AppState) {
    beacons := m.GetAllBeacons()
    for beacon := range beacons {
        b.ActiveBeacons = append(b.ActiveBeacons, beacon)
    }
}
```
This appends on every health tick but never resets the slice. Over hours the slice grows to millions of duplicate entries.
Fix: reset the slice before appending:
```go
func (b *BaseHealth) GetActiveBeacons(m *AppState) {
    beacons := m.GetAllBeacons()
    b.ActiveBeacons = make([]string, 0, len(beacons))
    for beacon := range beacons {
        b.ActiveBeacons = append(b.ActiveBeacons, beacon)
    }
}
```
### C6. Settings struct raced across goroutines

AppState.GetSettings() (context.go:142) returns a pointer to the settings struct, so any goroutine can read or write the pointed-to fields concurrently. AppState.UpdateSettings() (context.go:240) calls mapstructure.Decode on m.settings without holding the mutex, while GetSettingsValue() also reads it without a lock.

Fix: hold m.mu in UpdateSettings and GetSettingsValue, and either return a copy from GetSettings or document that callers must hold the lock.
### C7. ParserRegistry iterated without the read lock

internal/pkg/common/utils/beacons.go:51:

```go
for name, parser := range parserRegistry.ParserList { ... }
```
This iterates the map without holding the registry’s rw read lock, while Register/Unregister modify the map under a write lock. This is a textbook data race.
Fix: either acquire parserRegistry.rw.RLock() around the loop, or pass a snapshot copy.
### C8. context.Background() used where the parent context should be propagated

In three hot paths the parent (cancellable) context is replaced with context.Background(), preventing graceful shutdown from cancelling in-flight writes:

| File | Line | Call |
|---|---|---|
| decoder/process.go | 62 | `kafkaclient.Write(context.Background(), ...)` |
| bridge/handler.go | 53 | `kafkaclient.Write(context.Background(), ...)` |
| location/filter.go | 26 | `ctx := context.Background()` |

Fix: thread the parent ctx through these functions so the shutdown signal propagates.
### C9. getEnvPanic crashes the process on missing env vars

internal/pkg/config/config.go:50-55:

```go
func getEnvPanic(key string) string {
    if v := os.Getenv(key); v != "" { return v }
    panic(fmt.Sprintf("environment variable %s is not set", key))
}
```

A panic produces an unstructured stack trace and no chance for cleanup (log flushing, Kafka writer close, etc.). Return an error from the Load* functions instead.
## High-priority issues

### H1. No input validation in controllers

All controllers (gateways_controller.go, trackers_controller.go, zone_controller.go, etc.) deserialize JSON into model structs and pass them directly to GORM Create/Save with zero validation. Missing ID fields, empty names, malformed MACs, and extremely long strings all go straight to the database.
At minimum validate required fields and length limits before db.Create.
### H2. DB.Create errors silently swallowed during init

internal/app/server/app.go:89-91:

```go
for _, c := range configs {
    a.DB.Create(&c) // error discarded
}
```
If any config row fails to insert (e.g. duplicate primary key), the error is lost. Check and log each error.
### H3. Kafka consumer blocks forever on a full channel

internal/pkg/kafkaclient/consumer.go:34:

```go
ch <- data
```

If the event-loop goroutine is slow and the channel fills up, the consumer goroutine blocks here forever. It will not respond to ctx.Done(), because the select only checks the context in the outer loop. Fix: select on ctx.Done() when sending to the channel.

### H4. context.Context stored in a struct field

internal/app/server/app.go:35:
```go
type ServerApp struct {
    ...
    ctx context.Context
    ...
}
```
Go’s own documentation says: “Do not store Contexts inside a struct type.” The context is then passed to every route handler at registration time, meaning request-scoped contexts (timeouts, cancellation, values) cannot be used. Pass ctx as a function parameter instead.
### H5. No request-body size limits

None of the HTTP handlers call http.MaxBytesReader. A single POST with a multi-gigabyte body will be read into memory in full. Add a body-size middleware or a per-handler limit:

```go
r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MB
```
### H6. twosComp silently returns a wrong value on bad input

internal/pkg/common/utils/distance.go:24-26:

```go
func twosComp(inp string) int64 {
    i, _ := strconv.ParseInt("0x"+inp, 0, 64)
    return i - 256
}
```
If inp is not valid hex, i is 0 and the function returns -256, producing a nonsensical distance value. Check and propagate the error.
### H7. AutoMigrate in production

internal/pkg/database/database.go:29 runs db.AutoMigrate(...) on every startup. In production this can silently add columns but never removes or renames them, and it can fail or lock tables depending on database state. Use a versioned migration tool (golang-migrate, goose, atlas).
### H8. Graceful shutdown has no deadline

internal/app/server/app.go:141:

```go
a.Server.Shutdown(context.Background())
```
context.Background() means the shutdown waits forever for in-flight requests. Use a deadline:
```go
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
a.Server.Shutdown(ctx)
```
### H9. syncTable panics on models without a string ID field

internal/pkg/apiclient/updatedb.go:90:

```go
v := reflect.ValueOf(item).FieldByName("ID").String()
```
If item is not a struct, FieldByName panics at runtime. If the model has no field named ID (or ID is not a string), reflect.Value.String() does not panic but silently returns a placeholder such as "<invalid Value>", corrupting the sync. Handle both cases explicitly or use an interface constraint.
### H10. fmt.Println / fmt.Printf used in production code

The following files use fmt.Println or fmt.Printf instead of slog:

- internal/pkg/common/utils/beacons.go:49,54
- internal/pkg/apiclient/auth.go:29,37
- internal/pkg/apiclient/data.go:15,22,79,80,91,94,100
- internal/pkg/apiclient/utils.go:17
- internal/pkg/apiclient/updatedb.go:67

fmt.Println goes to stdout (not the structured log pipeline), has no timestamp, level, or request context, and is invisible to log aggregation. Replace all with slog.Error / slog.Info.
## Medium-priority issues

### M1. Consume logs errors on context cancellation

internal/pkg/kafkaclient/consumer.go:22-24:

```go
msg, err := r.ReadMessage(ctx)
if err != nil {
    slog.Error("error reading message", "error", err)
    continue
}
```
When ctx is cancelled (normal shutdown), ReadMessage returns an error. This logs a spurious error on every shutdown. Check ctx.Err() first:
```go
if err != nil {
    if ctx.Err() != nil {
        return
    }
    slog.Error(...)
    continue
}
```
### M2. Hardcoded database port and sslmode

internal/pkg/database/database.go:17:

```go
"host=%s user=%s password=%s dbname=%s port=5432 sslmode=disable"
```

The port (5432) and sslmode=disable are hardcoded; both should be configurable via config.Config.
### M3. SCREAMING_SNAKE_CASE constants

Go convention uses CamelCase for exported identifiers. The current naming:

```go
SMALL_CHANNEL_SIZE     = 200
MEDIUM_CHANNEL_SIZE    = 500
LARGE_CHANNEL_SIZE     = 2000
SMALL_TICKER_INTERVAL  = 1 * time.Second
MEDIUM_TICKER_INTERVAL = 2 * time.Second
LARGE_TICKER_INTERVAL  = 5 * time.Second
```
Should be SmallChannelSize, MediumChannelSize, etc.
### M4. Inconsistent route naming

The API mixes two styles: camelCase RPC-style paths (`/reslevis/getGateways`, `/reslevis/postTracker`) and plain resource paths (`/configs/beacons`, `/health`, `/ready`). Pick one style and be consistent. REST convention: nouns plus HTTP verbs (GET /gateways, POST /gateways, DELETE /gateways/{id}).
### M5. getEnv duplicated

Both internal/pkg/config/config.go and internal/pkg/common/appcontext/context.go define their own getEnv helper. Extract it to a shared utility.
### M6. Log file location depends on the working directory

internal/pkg/logger/logger.go:13:

```go
f, err := os.OpenFile(fname, ...)
```
fname is just "bridge.log" etc., so the file goes wherever the binary’s CWD happens to be. In a container this is fine; outside containers it is unpredictable. Consider making the log directory configurable.
### M7. No metrics or tracing

For production observability, consider adding application metrics (request rates and latencies, Kafka consumer lag, goroutine and memory stats) and distributed tracing. The current system is log-only, which makes it hard to diagnose performance issues and to set alerts.
### M8. CORS defaults to all origins

internal/pkg/api/middleware/cors.go:13:

```go
if len(origins) == 0 {
    origins = []string{"*"}
}
```
The default is * (all origins). For production, restrict to known frontend origins, even behind a firewall.
### M9. No Makefile or task runner

There is no Makefile or Taskfile for common operations (build, test, lint, docker-compose up). Adding one standardizes the developer workflow and the CI pipeline.
## Minor / style issues

| # | File | Issue |
|---|---|---|
| S1 | beacons.go:62 | Commented-out debug line `// fmt.Println("no parser can parse: ", beacon)` |
| S2 | kafkaclient/manager.go | KafkaReadersMap and KafkaWritersMap export their Lock and map fields but the package is internal; keep them unexported |
| S3 | controller/* | Parameter named context shadows the context package import; use ctx |
| S4 | alert_service.go:18 | DeleteAlertByTrackerID deletes by id = ? not tracker_id = ? -- the function name is misleading |
| S5 | model/parser.go:82 | Parse uses unsafe type assertions (.(uint16), .(float64)) that will panic if extract returns int(0) on short data |
| S6 | location/app.go:62-65 | locTicker and healthTicker have the same interval (LARGE_TICKER_INTERVAL), so they always fire together -- either intentional (combine them) or a bug |
| S7 | bridge/app.go:116 | MQTT publish uses QoS=0 with retain=true for alerts/trackers -- consider QoS=1 for reliable delivery |
| S8 | database.go:33-34 | msg variable is used only to pass to slog.Info(msg) -- just inline the string |
## Testing

End-to-end tests run only when an environment flag is set (E2E_TEST=1).

Tests appear to be out of sync with production code. The test files reference model.BeaconAdvertisement and model.Settings, but the production code defines these types in appcontext. Controller test functions call controller.GatewayListController(db) with one argument, while the production code requires two (db, ctx). The tests therefore either do not compile against the current code or test a different version.
No test coverage measurement. There is no CI step to report or enforce coverage.
E2E tests are stubs. The tests/e2e/e2e_test.go does nothing meaningful.
Recommendation: Fix the test compilation issues first, then add coverage reporting, and implement at least one real E2E flow (e.g., publish MQTT message -> verify it reaches the Kafka topic -> verify the decoder processes it).
## Summary

| Priority | Count | Action |
|---|---|---|
| Critical | 9 | Fix before any production deployment |
| High | 10 | Fix before GA / first customer |
| Medium | 9 | Fix in next sprint |
| Minor/Style | 8 | Fix during regular cleanup |
The biggest risks are: resource leaks (C1, C5), race conditions (C6, C7), SQL injection (C3), and missing timeouts (C2). Fixing these 6 items alone would significantly improve production readiness.