Code Review -- AFASystems Presence

Date: 2026-03-11
Scope: All 4 services (bridge, server, decoder, location) and every internal package they depend on (~50 Go source files).
Excluded: Security (authentication, authorization, TLS hardening) -- per agreement, the system will operate behind a firewall.


Production Readiness Rating: 5.5 / 10

Category Score Notes
Architecture & layout 8/10 Clean microservice split, standard Go project layout, good separation of concerns
Error handling 4/10 Multiple swallowed errors, panics instead of errors, fmt.Println in production paths
Resource management 4/10 HTTP response bodies leaked, no server timeouts, no request-size limits
Concurrency safety 5/10 Good use of sync.RWMutex in most places, but 2 confirmed race conditions
Observability 4/10 Structured logging is mostly in place, but no metrics, no tracing, health reporting has a memory leak
API design 6/10 Consistent response helpers, middleware chain present, but no input validation and mixed route naming
Testing 5/10 Good test structure and SQLite-based controller tests, but test code is stale and e2e is a stub
Operational readiness 5/10 Dockerfiles and compose exist, but no Makefile, no graceful shutdown timeout, AutoMigrate in production

Bottom line: The architecture is solid and the code is well-organized. The project needs a focused round of fixes on resource leaks, race conditions, error handling, and observability before it is production ready.


Critical Issues (Must Fix)

C1. HTTP response bodies are never closed

apiclient/auth.go, apiclient/data.go, and apiclient/utils.go read from res.Body but never call res.Body.Close(). Each leaked body holds an open TCP connection and file descriptor. Under sustained load this will exhaust the process’s file descriptor limit and crash the service.

// apiclient/auth.go -- current
res, err := client.Do(req)
if err != nil { ... }
var j response
json.NewDecoder(res.Body).Decode(&j) // Body never closed; Decode error also ignored

// fix: add defer immediately after the error check
res, err := client.Do(req)
if err != nil { ... }
defer res.Body.Close()

The same pattern applies to every function in data.go (GetTrackers, GetGateways, GetTrackerZones, GetZones, InferPosition) and utils.go (getRequest -- the caller must close, or close inside).


C2. No HTTP server timeouts -- Slowloris vulnerability

internal/app/server/app.go:121-124:

a.Server = &http.Server{
    Addr:    a.Cfg.HTTPAddr,
    Handler: a.RegisterRoutes(),
}

No ReadTimeout, WriteTimeout, or IdleTimeout is set. A single malicious (or slow) client can hold a goroutine and connection indefinitely.

Fix:

a.Server = &http.Server{
    Addr:         a.Cfg.HTTPAddr,
    Handler:      a.RegisterRoutes(),
    ReadTimeout:  15 * time.Second,
    WriteTimeout: 15 * time.Second,
    IdleTimeout:  60 * time.Second,
}

C3. SQL injection in LocationToBeaconServiceAI

internal/pkg/service/beacon_service.go:110:

db.Order(fmt.Sprintf("POW(x - %f, 2) + POW(y - %f, 2)", msg.X, msg.Y)).First(&gw)

msg.X and msg.Y come from a deserialized Kafka message. Because they are float32, %f formatting makes direct injection unlikely, but the pattern is fragile: NaN or Inf values render as invalid SQL, and the clause becomes injectable the moment either field's type changes. Bind the values instead. Note that GORM v2's Order only recognizes strings and clause.OrderByColumn, so use an OrderBy clause with a bound expression:

db.Clauses(clause.OrderBy{
    Expression: gorm.Expr("POW(x - ?, 2) + POW(y - ?, 2)", msg.X, msg.Y),
}).First(&gw)

C4. InsecureSkipVerify is always true despite config flag

internal/pkg/location/inference.go:27:

func NewDefaultInferencer(skipTLSVerify bool) *DefaultInferencer {
    tr := &http.Transport{}
    tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true} // ignores the parameter!
    ...
}

The skipTLSVerify parameter is accepted but never used. Same issue in apiclient/updatedb.go:22. Fix: use the parameter value.


C5. ActiveBeacons slice grows without bound (memory leak)

internal/pkg/common/appcontext/health.go:49-53:

func (b *BaseHealth) GetActiveBeacons(m *AppState) {
    beacons := m.GetAllBeacons()
    for beacon := range beacons {
        b.ActiveBeacons = append(b.ActiveBeacons, beacon)
    }
}

This appends on every health tick but never resets the slice. Over hours the slice grows to millions of duplicate entries.

Fix: reset the slice before appending:

func (b *BaseHealth) GetActiveBeacons(m *AppState) {
    beacons := m.GetAllBeacons()
    b.ActiveBeacons = make([]string, 0, len(beacons))
    for beacon := range beacons {
        b.ActiveBeacons = append(b.ActiveBeacons, beacon)
    }
}

C6. Race condition on Settings

  • AppState.GetSettings() (context.go:142) returns a pointer to the settings struct. Any goroutine can read/write the pointed-to fields concurrently.
  • AppState.UpdateSettings() (context.go:240) calls mapstructure.Decode on m.settings without holding the mutex. Meanwhile GetSettingsValue() reads it without a lock too.

Fix: hold m.mu in UpdateSettings and GetSettingsValue, and either return a copy from GetSettings or document that callers must hold the lock.
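The copy-returning variant can be sketched as follows (Settings fields and method names here are illustrative stand-ins, not the real struct): readers get a value copy under the read lock, writers replace the struct under the write lock, so no caller ever holds a pointer into shared state.

```go
package main

import (
	"fmt"
	"sync"
)

// Settings is a stand-in for the real settings struct (hypothetical fields).
type Settings struct {
	Threshold int
	Name      string
}

type AppState struct {
	mu       sync.RWMutex
	settings Settings
}

// GetSettings returns a copy, so callers can never race on the shared struct.
func (m *AppState) GetSettings() Settings {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.settings
}

// UpdateSettings replaces the settings under the write lock.
func (m *AppState) UpdateSettings(s Settings) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.settings = s
}

func main() {
	st := &AppState{}
	var wg sync.WaitGroup
	// Concurrent readers and writers: clean under `go run -race`.
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(n int) { defer wg.Done(); st.UpdateSettings(Settings{Threshold: n}) }(i)
		go func() { defer wg.Done(); _ = st.GetSettings() }()
	}
	wg.Wait()
	fmt.Println("final threshold non-negative:", st.GetSettings().Threshold >= 0) // true
}
```

Returning a copy costs one struct copy per read, which is negligible for a settings-sized struct and removes an entire class of races.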


C7. Race condition on ParserRegistry iteration

internal/pkg/common/utils/beacons.go:51:

for name, parser := range parserRegistry.ParserList { ... }

This iterates the map without holding the registry’s rw read lock, while Register/Unregister modify the map under a write lock. This is a textbook data race.

Fix: either acquire parserRegistry.rw.RLock() around the loop, or pass a snapshot copy.
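The snapshot approach can look like this (ParserRegistry's shape is assumed from the review, with a simplified parser signature): the map is copied under the read lock, and callers iterate the copy at leisure.

```go
package main

import (
	"fmt"
	"sync"
)

// ParserRegistry mirrors the shape described above (field names assumed).
type ParserRegistry struct {
	rw         sync.RWMutex
	ParserList map[string]func([]byte) bool
}

// Snapshot copies the map under the read lock so callers can iterate it
// without racing against Register/Unregister.
func (r *ParserRegistry) Snapshot() map[string]func([]byte) bool {
	r.rw.RLock()
	defer r.rw.RUnlock()
	out := make(map[string]func([]byte) bool, len(r.ParserList))
	for name, p := range r.ParserList {
		out[name] = p
	}
	return out
}

func main() {
	reg := &ParserRegistry{ParserList: map[string]func([]byte) bool{
		"ibeacon": func(b []byte) bool { return len(b) > 0 },
	}}
	// Safe even if another goroutine calls Register concurrently.
	for name, parser := range reg.Snapshot() {
		fmt.Println(name, parser([]byte{0x01}))
	}
}
```

Snapshotting trades a small allocation per call for lock-free iteration; if the hot loop runs per message, holding RLock() around the loop instead avoids that allocation.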


C8. context.Background() used where parent context should be propagated

In three hot paths the parent (cancellable) context is replaced with context.Background(), preventing graceful shutdown from cancelling in-flight writes:

File Line Call
decoder/process.go 62 kafkaclient.Write(context.Background(), ...)
bridge/handler.go 53 kafkaclient.Write(context.Background(), ...)
location/filter.go 26 ctx := context.Background()

Fix: thread the parent ctx through these functions so shutdown signal propagates.


C9. getEnvPanic crashes the process on missing env vars

internal/pkg/config/config.go:50-55:

func getEnvPanic(key string) string {
    if v := os.Getenv(key); v != "" { return v }
    panic(fmt.Sprintf("environment variable %s is not set", key))
}

A panic produces an unstructured stack trace and no chance for cleanup (log flushing, Kafka writer close, etc). Return an error from the Load* functions instead.


High Priority Issues

H1. No input validation on API endpoints

All controllers (gateways_controller.go, trackers_controller.go, zone_controller.go, etc.) deserialize JSON into model structs and pass them directly to GORM Create/Save with zero validation. Missing ID fields, empty names, malformed MACs, or extremely long strings all go straight to the database.

At minimum validate required fields and length limits before db.Create.
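A sketch of what per-model validation might look like (Gateway's fields and limits here are assumptions, not the real schema): a Validate method runs before db.Create and collects every problem into one error.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Gateway is a minimal stand-in for the real model (field names assumed).
type Gateway struct {
	ID   string
	Name string
	MAC  string
}

var macRe = regexp.MustCompile(`^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// Validate checks required fields, formats, and length limits before the
// struct is handed to db.Create; all failures are reported together.
func (g Gateway) Validate() error {
	var errs []string
	if g.ID == "" {
		errs = append(errs, "id is required")
	}
	if g.Name == "" || len(g.Name) > 128 {
		errs = append(errs, "name must be 1-128 characters")
	}
	if !macRe.MatchString(g.MAC) {
		errs = append(errs, "mac must look like aa:bb:cc:dd:ee:ff")
	}
	if len(errs) > 0 {
		return fmt.Errorf("invalid gateway: %s", strings.Join(errs, "; "))
	}
	return nil
}

func main() {
	bad := Gateway{Name: "gw-1", MAC: "not-a-mac"}
	fmt.Println(bad.Validate() != nil) // true: rejected before hitting the DB
	good := Gateway{ID: "gw-1", Name: "Dock A", MAC: "aa:bb:cc:dd:ee:ff"}
	fmt.Println(good.Validate()) // <nil>
}
```

The controller then returns 400 with the joined message on validation failure, keeping garbage out of the database and giving API clients actionable errors.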


H2. DB.Create errors silently swallowed during init

internal/app/server/app.go:89-91:

for _, c := range configs {
    a.DB.Create(&c) // error discarded
}

If any config row fails to insert (e.g. duplicate primary key), the error is lost. Check and log each error.


H3. Kafka consumer channel can block indefinitely

internal/pkg/kafkaclient/consumer.go:34:

ch <- data

If the event-loop goroutine is slow and the channel fills up, the consumer goroutine blocks here forever. It will not respond to ctx.Done() because the select only checks context in the outer loop. Options:

  • Use a select with ctx.Done() when sending to the channel.
  • Or implement backpressure / drop-oldest strategy.

H4. context.Context stored in struct field

internal/app/server/app.go:35:

type ServerApp struct {
    ...
    ctx context.Context
    ...
}

Go’s own documentation says: “Do not store Contexts inside a struct type.” The context is then passed to every route handler at registration time, meaning request-scoped contexts (timeouts, cancellation, values) cannot be used. Pass ctx as a function parameter instead.


H5. No request body size limits

None of the HTTP handlers call http.MaxBytesReader. A single POST with a multi-gigabyte body will be read into memory in full. Add a body-size middleware or limit in each handler:

r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MB

H6. twosComp silently returns wrong value on bad input

internal/pkg/common/utils/distance.go:24-26:

func twosComp(inp string) int64 {
    i, _ := strconv.ParseInt("0x"+inp, 0, 64)
    return i - 256
}

If inp is not valid hex, i is 0 and the function returns -256, producing nonsensical distance. Check the error.
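A corrected version might look like this -- same arithmetic as the original (unconditional subtraction of 256, which assumes the input byte always encodes a negative value), but with the parse error surfaced:

```go
package main

import (
	"fmt"
	"strconv"
)

// twosComp keeps the original i - 256 arithmetic but reports invalid hex
// instead of silently producing -256.
func twosComp(inp string) (int64, error) {
	i, err := strconv.ParseInt("0x"+inp, 0, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid hex %q: %w", inp, err)
	}
	return i - 256, nil
}

func main() {
	v, err := twosComp("f0") // 0xf0 = 240 -> 240 - 256 = -16
	fmt.Println(v, err)      // -16 <nil>
	_, err = twosComp("zz")  // not hex: now an error, not -256
	fmt.Println(err != nil)  // true
}
```

Callers in the distance pipeline can then skip or log the bad sample rather than feeding a bogus -256 into the position math.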


H7. AutoMigrate in production

internal/pkg/database/database.go:29 runs db.AutoMigrate(...) on every startup. In production this silently adds columns but never removes or renames anything, and it can fail or lock tables depending on database state. Use a versioned migration tool (golang-migrate, goose, atlas).


H8. No graceful HTTP shutdown timeout

internal/app/server/app.go:141:

a.Server.Shutdown(context.Background())

context.Background() means the shutdown waits forever for in-flight requests, and the returned error is discarded. Use a deadline and check the error:

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if err := a.Server.Shutdown(ctx); err != nil {
    slog.Error("server shutdown", "error", err)
}

H9. syncTable panics on models without string ID field

internal/pkg/apiclient/updatedb.go:90:

v := reflect.ValueOf(item).FieldByName("ID").String()

If any model passed to syncTable doesn’t have a field named ID, or if ID isn’t a string, this panics at runtime. Use an interface constraint or handle the error.
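If the models cannot easily gain a shared interface, the reflection can at least be made panic-free (stringID is a hypothetical helper illustrating the checks):

```go
package main

import (
	"fmt"
	"reflect"
)

// stringID extracts a string "ID" field via reflection, returning an error
// instead of panicking when the field is missing or has the wrong type.
func stringID(item any) (string, error) {
	v := reflect.ValueOf(item)
	if v.Kind() == reflect.Pointer {
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return "", fmt.Errorf("expected struct, got %s", v.Kind())
	}
	f := v.FieldByName("ID")
	if !f.IsValid() {
		return "", fmt.Errorf("%s has no ID field", v.Type())
	}
	if f.Kind() != reflect.String {
		return "", fmt.Errorf("%s.ID is %s, not string", v.Type(), f.Kind())
	}
	return f.String(), nil
}

// Demo types: one well-formed model, one that would have panicked before.
type Tracker struct{ ID string }
type Legacy struct{ Key int }

func main() {
	id, _ := stringID(Tracker{ID: "t-42"})
	fmt.Println(id) // t-42
	_, err := stringID(Legacy{Key: 7})
	fmt.Println(err != nil) // true: reported, not a runtime panic
}
```

The cleaner long-term fix is an interface such as `interface{ GetID() string }` implemented by every synced model, which moves the check to compile time.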


H10. fmt.Println / fmt.Printf used in production code

The following files use fmt.Println or fmt.Printf instead of slog:

  • internal/pkg/common/utils/beacons.go:49,54
  • internal/pkg/apiclient/auth.go:29,37
  • internal/pkg/apiclient/data.go:15,22,79,80,91,94,100
  • internal/pkg/apiclient/utils.go:17
  • internal/pkg/apiclient/updatedb.go:67

fmt.Println goes to stdout (not the structured log pipeline), has no timestamp, level, or request context, and is invisible to log aggregation. Replace all with slog.Error / slog.Info.


Medium Priority Issues

M1. Consume logs errors on context cancellation

internal/pkg/kafkaclient/consumer.go:22-24:

msg, err := r.ReadMessage(ctx)
if err != nil {
    slog.Error("error reading message", "error", err)
    continue
}

When ctx is cancelled (normal shutdown), ReadMessage returns an error. This logs a spurious error on every shutdown. Check ctx.Err() first:

if err != nil {
    if ctx.Err() != nil {
        return
    }
    slog.Error(...)
    continue
}

M2. Database port hardcoded

internal/pkg/database/database.go:17:

"host=%s user=%s password=%s dbname=%s port=5432 sslmode=disable"

The port (5432) and sslmode=disable are both hardcoded. Both should be configurable via config.Config.


M3. Constant naming convention

Go convention uses MixedCaps (e.g. SmallChannelSize) for constants, not SCREAMING_SNAKE_CASE. The current naming:

SMALL_CHANNEL_SIZE     = 200
MEDIUM_CHANNEL_SIZE    = 500
LARGE_CHANNEL_SIZE     = 2000
SMALL_TICKER_INTERVAL  = 1 * time.Second
MEDIUM_TICKER_INTERVAL = 2 * time.Second
LARGE_TICKER_INTERVAL  = 5 * time.Second

Should be SmallChannelSize, MediumChannelSize, etc.


M4. Inconsistent route naming

The API mixes two styles:

  • RPC-style: /reslevis/getGateways, /reslevis/postTracker
  • REST-style: /configs/beacons, /health, /ready

Pick one style and be consistent. REST convention: nouns for resources plus HTTP methods (GET /gateways, POST /gateways, DELETE /gateways/{id}).


M5. getEnv duplicated

Both internal/pkg/config/config.go and internal/pkg/common/appcontext/context.go define their own getEnv helper. Extract to a shared utility.


M6. Logger writes to CWD-relative file

internal/pkg/logger/logger.go:13:

f, err := os.OpenFile(fname, ...)

fname is just "bridge.log" etc., so the file goes wherever the binary’s CWD happens to be. In a container this is fine; outside containers it is unpredictable. Consider making the log directory configurable.


M7. No metrics or tracing

For production observability, consider adding:

  • Prometheus metrics: request count, latency histograms, Kafka consumer lag, error rates.
  • Distributed tracing: propagate trace IDs through Kafka messages and HTTP requests (OpenTelemetry).

The current system is log-only, making it hard to diagnose performance issues and set alerts.


M8. CORS allows all origins

internal/pkg/api/middleware/cors.go:13:

if len(origins) == 0 {
    origins = []string{"*"}
}

The default is * (all origins). For production, restrict to known frontend origins, even behind a firewall.


M9. No Makefile

There is no Makefile or Taskfile for common operations (build, test, lint, docker-compose up). Adding one standardizes the developer workflow and CI pipeline.


Minor / Style Issues

# File Issue
S1 beacons.go:62 Commented-out debug line // fmt.Println("no parser can parse: ", beacon)
S2 kafkaclient/manager.go KafkaReadersMap and KafkaWritersMap export their Lock and map fields but the package is internal; keep unexported
S3 controller/* Parameter named context shadows the context package import; use ctx
S4 alert_service.go:18 DeleteAlertByTrackerID deletes by id = ? not tracker_id = ? -- function name is misleading
S5 model/parser.go:82 Parse uses unsafe type assertions (.(uint16), .(float64)) that will panic if extract returns int(0) on short data
S6 location/app.go:62-65 locTicker and healthTicker have the same interval (LARGE_TICKER_INTERVAL), so they always fire together -- either intentional (combine them) or a bug
S7 bridge/app.go:116 MQTT publish QoS=0 with retain=true for alerts/trackers -- consider QoS=1 for reliable delivery
S8 database.go:33-34 msg variable used only to pass to slog.Info(msg) -- just inline the string

Test Assessment

What exists

  • Unit tests for bridge MQTT handling, decoder processing, utils, model, logger, config, and appcontext.
  • Controller tests using in-memory SQLite, covering CRUD for gateways, trackers, zones, settings.
  • Integration test stubs for bridge and decoder with mock Kafka writers.
  • E2E test is a placeholder (skips unless E2E_TEST=1).

Issues

  1. Tests appear to be out of sync with production code. The test files reference model.BeaconAdvertisement and model.Settings, but the production code has these types in appcontext. Controller test functions call controller.GatewayListController(db) with one arg, but production code requires two (db, ctx). This means the tests either do not compile against the current code or test a different version.

  2. No test coverage measurement. There is no CI step to report or enforce coverage.

  3. E2E tests are stubs. The tests/e2e/e2e_test.go does nothing meaningful.

Recommendation: Fix the test compilation issues first, then add coverage reporting, and implement at least one real E2E flow (e.g., publish MQTT message -> verify it reaches the Kafka topic -> verify the decoder processes it).


Priority Count Action
Critical 9 Fix before any production deployment
High 10 Fix before GA / first customer
Medium 9 Fix in next sprint
Minor/Style 8 Fix during regular cleanup

The biggest risks are: resource leaks (C1, C5), race conditions (C6, C7), SQL injection (C3), and missing timeouts (C2). Fixing these 6 items alone would significantly improve production readiness.