# Code Review -- AFASystems Presence

**Date:** 2026-03-11
**Scope:** All 4 services (`bridge`, `server`, `decoder`, `location`) and every internal package they depend on (~50 Go source files).
**Excluded:** Security (authentication, authorization, TLS hardening) -- per agreement, the system will operate behind a firewall.

---

## Production Readiness Rating: 5.5 / 10

| Category | Score | Notes |
|---|---|---|
| Architecture & layout | 8/10 | Clean microservice split, standard Go project layout, good separation of concerns |
| Error handling | 4/10 | Multiple swallowed errors, panics instead of errors, `fmt.Println` in production paths |
| Resource management | 4/10 | HTTP response bodies leaked, no server timeouts, no request-size limits |
| Concurrency safety | 5/10 | Good use of `sync.RWMutex` in most places, but 2 confirmed race conditions |
| Observability | 4/10 | Structured logging is mostly in place, but no metrics, no tracing, and health reporting has a memory leak |
| API design | 6/10 | Consistent response helpers, middleware chain present, but no input validation and mixed route naming |
| Testing | 5/10 | Good test structure and SQLite-based controller tests, but test code is stale and e2e is a stub |
| Operational readiness | 5/10 | Dockerfiles and compose exist, but no Makefile, no graceful shutdown timeout, `AutoMigrate` in production |

**Bottom line:** The architecture is solid and the code is well-organized. The project needs a focused round of fixes on resource leaks, race conditions, error handling, and observability before it is production ready.

---

## Critical Issues (Must Fix)

### C1. HTTP response bodies are never closed

`apiclient/auth.go`, `apiclient/data.go`, and `apiclient/utils.go` read from `res.Body` but never call `res.Body.Close()`. Each leaked body holds an open TCP connection and file descriptor. Under sustained load this will exhaust the process's file descriptor limit and crash the service.
```go
// apiclient/auth.go -- current
res, err := client.Do(req)
if err != nil { ... }
var j response
json.NewDecoder(res.Body).Decode(&j) // Body never closed

// fix: add defer immediately after the error check
res, err := client.Do(req)
if err != nil { ... }
defer res.Body.Close()
```

The same pattern applies to every function in `data.go` (`GetTrackers`, `GetGateways`, `GetTrackerZones`, `GetZones`, `InferPosition`) and `utils.go` (`getRequest` -- either the caller must close the body, or it must be closed inside).

---

### C2. No HTTP server timeouts -- Slowloris vulnerability

`internal/app/server/app.go:121-124`:

```go
a.Server = &http.Server{
	Addr:    a.Cfg.HTTPAddr,
	Handler: a.RegisterRoutes(),
}
```

No `ReadTimeout`, `WriteTimeout`, or `IdleTimeout` is set. A single malicious (or slow) client can hold a goroutine and connection indefinitely.

**Fix:**

```go
a.Server = &http.Server{
	Addr:         a.Cfg.HTTPAddr,
	Handler:      a.RegisterRoutes(),
	ReadTimeout:  15 * time.Second,
	WriteTimeout: 15 * time.Second,
	IdleTimeout:  60 * time.Second,
}
```

---

### C3. SQL injection in `LocationToBeaconServiceAI`

`internal/pkg/service/beacon_service.go:110`:

```go
db.Order(fmt.Sprintf("POW(x - %f, 2) + POW(y - %f, 2)", msg.X, msg.Y)).First(&gw)
```

`msg.X` and `msg.Y` come from a deserialized Kafka message. Although they are `float32`, a manipulated Kafka payload could exploit this string-interpolated ORDER BY clause. Use parameterized queries or GORM's `Expr`:

```go
db.Order(gorm.Expr("POW(x - ?, 2) + POW(y - ?, 2)", msg.X, msg.Y)).First(&gw)
```

---

### C4. `InsecureSkipVerify` is always `true` despite config flag

`internal/pkg/location/inference.go:27`:

```go
func NewDefaultInferencer(skipTLSVerify bool) *DefaultInferencer {
	tr := &http.Transport{}
	tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true} // ignores the parameter!
	...
}
```

The `skipTLSVerify` parameter is accepted but never used. The same issue exists in `apiclient/updatedb.go:22`. **Fix:** use the parameter value.

---

### C5. `ActiveBeacons` slice grows without bound (memory leak)

`internal/pkg/common/appcontext/health.go:49-53`:

```go
func (b *BaseHealth) GetActiveBeacons(m *AppState) {
	beacons := m.GetAllBeacons()
	for beacon := range beacons {
		b.ActiveBeacons = append(b.ActiveBeacons, beacon)
	}
}
```

This appends on every health tick but never resets the slice. Over hours the slice grows to millions of duplicate entries.

**Fix:** reset the slice before appending:

```go
func (b *BaseHealth) GetActiveBeacons(m *AppState) {
	beacons := m.GetAllBeacons()
	b.ActiveBeacons = make([]string, 0, len(beacons))
	for beacon := range beacons {
		b.ActiveBeacons = append(b.ActiveBeacons, beacon)
	}
}
```

---

### C6. Race condition on `Settings`

- `AppState.GetSettings()` (`context.go:142`) returns a **pointer** to the settings struct. Any goroutine can read/write the pointed-to fields concurrently.
- `AppState.UpdateSettings()` (`context.go:240`) calls `mapstructure.Decode` on `m.settings` **without holding the mutex**. Meanwhile `GetSettingsValue()` reads it without a lock too.

**Fix:** hold `m.mu` in `UpdateSettings` and `GetSettingsValue`, and either return a copy from `GetSettings` or document that callers must hold the lock.

---

### C7. Race condition on `ParserRegistry` iteration

`internal/pkg/common/utils/beacons.go:51`:

```go
for name, parser := range parserRegistry.ParserList { ... }
```

This iterates the map without holding the registry's `rw` read lock, while `Register`/`Unregister` modify the map under a write lock. This is a textbook data race.

**Fix:** either acquire `parserRegistry.rw.RLock()` around the loop, or iterate over a snapshot copy.

---

### C8. `context.Background()` used where the parent context should be propagated

In three hot paths the parent (cancellable) context is replaced with `context.Background()`, preventing graceful shutdown from cancelling in-flight writes:

| File | Line | Call |
|---|---|---|
| `decoder/process.go` | 62 | `kafkaclient.Write(context.Background(), ...)` |
| `bridge/handler.go` | 53 | `kafkaclient.Write(context.Background(), ...)` |
| `location/filter.go` | 26 | `ctx := context.Background()` |

**Fix:** thread the parent `ctx` through these functions so the shutdown signal propagates.

---

### C9. `getEnvPanic` crashes the process on missing env vars

`internal/pkg/config/config.go:50-55`:

```go
func getEnvPanic(key string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	panic(fmt.Sprintf("environment variable %s is not set", key))
}
```

A `panic` produces an unstructured stack trace and no chance for cleanup (log flushing, Kafka writer close, etc.). Return an error from the `Load*` functions instead.

---

## High Priority Issues

### H1. No input validation on API endpoints

All controllers (`gateways_controller.go`, `trackers_controller.go`, `zone_controller.go`, etc.) deserialize JSON into model structs and pass them directly to GORM `Create`/`Save` with zero validation. Missing `ID` fields, empty names, malformed MACs, and extremely long strings all go straight to the database. At minimum, validate required fields and length limits before `db.Create`.

---

### H2. `DB.Create` errors silently swallowed during init

`internal/app/server/app.go:89-91`:

```go
for _, c := range configs {
	a.DB.Create(&c) // error discarded
}
```

If any config row fails to insert (e.g. duplicate primary key), the error is lost. Check and log each error.

---

### H3. Kafka consumer channel can block indefinitely

`internal/pkg/kafkaclient/consumer.go:34`:

```go
ch <- data
```

If the event-loop goroutine is slow and the channel fills up, the consumer goroutine blocks here forever. It will not respond to `ctx.Done()` because the select only checks the context in the outer loop.

Options:

- Use a `select` with `ctx.Done()` when sending to the channel.
- Or implement a backpressure / drop-oldest strategy.

---

### H4. `context.Context` stored in a struct field

`internal/app/server/app.go:35`:

```go
type ServerApp struct {
	...
	ctx context.Context
	...
}
```

Go's own documentation says: "Do not store Contexts inside a struct type." The context is then passed to every route handler at registration time, meaning request-scoped contexts (timeouts, cancellation, values) cannot be used. Pass `ctx` as a function parameter instead.

---

### H5. No request body size limits

None of the HTTP handlers call `http.MaxBytesReader`. A single POST with a multi-gigabyte body will be read into memory in full. Add a body-size middleware or a limit in each handler:

```go
r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MB
```

---

### H6. `twosComp` silently returns a wrong value on bad input

`internal/pkg/common/utils/distance.go:24-26`:

```go
func twosComp(inp string) int64 {
	i, _ := strconv.ParseInt("0x"+inp, 0, 64)
	return i - 256
}
```

If `inp` is not valid hex, `i` is 0 and the function returns -256, producing a nonsensical distance. Check the error.

---

### H7. `AutoMigrate` in production

`internal/pkg/database/database.go:29` runs `db.AutoMigrate(...)` on every startup. In production this can silently add columns but never removes or renames them. It can also fail or lock tables depending on DB state. Use a versioned migration tool (`golang-migrate`, `goose`, `atlas`).

---

### H8. No graceful HTTP shutdown timeout

`internal/app/server/app.go:141`:

```go
a.Server.Shutdown(context.Background())
```

`context.Background()` means the shutdown waits forever for in-flight requests. Use a deadline:

```go
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
a.Server.Shutdown(ctx)
```

---

### H9. `syncTable` panics on models without a string `ID` field

`internal/pkg/apiclient/updatedb.go:90`:

```go
v := reflect.ValueOf(item).FieldByName("ID").String()
```

If any model passed to `syncTable` doesn't have a field named `ID`, or if `ID` isn't a string, this panics at runtime. Use an interface constraint or handle the error.

---

### H10. `fmt.Println` / `fmt.Printf` used in production code

The following files use `fmt.Println` or `fmt.Printf` instead of `slog`:

- `internal/pkg/common/utils/beacons.go:49,54`
- `internal/pkg/apiclient/auth.go:29,37`
- `internal/pkg/apiclient/data.go:15,22,79,80,91,94,100`
- `internal/pkg/apiclient/utils.go:17`
- `internal/pkg/apiclient/updatedb.go:67`

`fmt.Println` writes to stdout (not the structured log pipeline), has no timestamp, level, or request context, and is invisible to log aggregation. Replace all of these with `slog.Error` / `slog.Info`.

---

## Medium Priority Issues

### M1. `Consume` logs errors on context cancellation

`internal/pkg/kafkaclient/consumer.go:22-24`:

```go
msg, err := r.ReadMessage(ctx)
if err != nil {
	slog.Error("error reading message", "error", err)
	continue
}
```

When `ctx` is cancelled (normal shutdown), `ReadMessage` returns an error. This logs a spurious error on every shutdown. Check `ctx.Err()` first:

```go
if err != nil {
	if ctx.Err() != nil {
		return
	}
	slog.Error(...)
	continue
}
```

---

### M2. Database port hardcoded

`internal/pkg/database/database.go:17`:

```go
"host=%s user=%s password=%s dbname=%s port=5432 sslmode=disable"
```

`port=5432` and `sslmode=disable` are hardcoded. Both should be configurable via `config.Config`.

---

### M3. Constant naming convention

Go convention uses `CamelCase` for exported identifiers. The current naming:

```go
SMALL_CHANNEL_SIZE  = 200
MEDIUM_CHANNEL_SIZE = 500
LARGE_CHANNEL_SIZE  = 2000

SMALL_TICKER_INTERVAL  = 1 * time.Second
MEDIUM_TICKER_INTERVAL = 2 * time.Second
LARGE_TICKER_INTERVAL  = 5 * time.Second
```

These should be `SmallChannelSize`, `MediumChannelSize`, etc.

---

### M4. Inconsistent route naming

The API mixes two styles:

- RPC-style: `/reslevis/getGateways`, `/reslevis/postTracker`
- REST-style: `/configs/beacons`, `/health`, `/ready`

Pick one style and be consistent. REST convention: use nouns plus HTTP verbs (`GET /gateways`, `POST /gateways`, `DELETE /gateways/{id}`).

---

### M5. `getEnv` duplicated

Both `internal/pkg/config/config.go` and `internal/pkg/common/appcontext/context.go` define their own `getEnv` helper. Extract it to a shared utility.

---

### M6. Logger writes to a CWD-relative file

`internal/pkg/logger/logger.go:13`:

```go
f, err := os.OpenFile(fname, ...)
```

`fname` is just `"bridge.log"` etc., so the file goes wherever the binary's CWD happens to be. In a container this is fine; outside containers it is unpredictable. Consider making the log directory configurable.

---

### M7. No metrics or tracing

For production observability, consider adding:

- **Prometheus metrics:** request count, latency histograms, Kafka consumer lag, error rates.
- **Distributed tracing:** propagate trace IDs through Kafka messages and HTTP requests (OpenTelemetry).

The current system is log-only, making it hard to diagnose performance issues and set alerts.

---

### M8. CORS allows all origins

`internal/pkg/api/middleware/cors.go:13`:

```go
if len(origins) == 0 {
	origins = []string{"*"}
}
```

The default is `*` (all origins). For production, restrict this to known frontend origins, even behind a firewall.

---

### M9. No Makefile

There is no `Makefile` or `Taskfile` for common operations (build, test, lint, docker-compose up). Adding one standardizes the developer workflow and CI pipeline.
---

## Minor / Style Issues

| # | File | Issue |
|---|---|---|
| S1 | `beacons.go:62` | Commented-out debug line `// fmt.Println("no parser can parse: ", beacon)` |
| S2 | `kafkaclient/manager.go` | `KafkaReadersMap` and `KafkaWritersMap` export their `Lock` and map fields but the package is internal; keep them unexported |
| S3 | `controller/*` | Parameter named `context` shadows the `context` package import; use `ctx` |
| S4 | `alert_service.go:18` | `DeleteAlertByTrackerID` deletes by `id = ?`, not `tracker_id = ?` -- the function name is misleading |
| S5 | `model/parser.go:82` | `Parse` uses unsafe type assertions (`.(uint16)`, `.(float64)`) that will panic if `extract` returns `int(0)` on short data |
| S6 | `location/app.go:62-65` | `locTicker` and `healthTicker` have the same interval (`LARGE_TICKER_INTERVAL`), so they always fire together -- either intentional (combine them) or a bug |
| S7 | `bridge/app.go:116` | MQTT publish uses QoS=0 with retain=true for alerts/trackers -- consider QoS=1 for reliable delivery |
| S8 | `database.go:33-34` | `msg` variable is used only to pass to `slog.Info(msg)` -- just inline the string |

---

## Test Assessment

### What exists

- **Unit tests** for bridge MQTT handling, decoder processing, utils, model, logger, config, and appcontext.
- **Controller tests** using in-memory SQLite, covering CRUD for gateways, trackers, zones, settings.
- **Integration test stubs** for bridge and decoder with mock Kafka writers.
- **E2E test** is a placeholder (skips unless `E2E_TEST=1`).

### Issues

1. **Tests appear to be out of sync with production code.** The test files reference `model.BeaconAdvertisement` and `model.Settings`, but the production code has these types in `appcontext`. Controller test functions call `controller.GatewayListController(db)` with one argument, but production code requires two (`db, ctx`). This means the tests either do not compile against the current code or test a different version.
2. **No test coverage measurement.** There is no CI step to report or enforce coverage.
3. **E2E tests are stubs.** `tests/e2e/e2e_test.go` does nothing meaningful.

**Recommendation:** Fix the test compilation issues first, then add coverage reporting, and implement at least one real E2E flow (e.g., publish an MQTT message -> verify it reaches the Kafka topic -> verify the decoder processes it).

---

## Summary of Recommended Fix Priority

| Priority | Count | Action |
|---|---|---|
| **Critical** | 9 | Fix before any production deployment |
| **High** | 10 | Fix before GA / first customer |
| **Medium** | 9 | Fix in next sprint |
| **Minor/Style** | 8 | Fix during regular cleanup |

The biggest risks are: **resource leaks** (C1, C5), **race conditions** (C6, C7), **SQL injection** (C3), and **missing timeouts** (C2). Fixing these six items alone would significantly improve production readiness.