Project 2 - Decision Story

Decision Log & Lessons Learned

28 decisions: 12 ADRs + 16 from lessons learned

Every architectural, design, and process decision is recorded with a 4-dimension scoring model (Cost, Time, Safety, Resume), 2 evaluated alternatives, and explicit rationale. Decisions derived from project experience are framed with their lesson context and key takeaways.

How to read this page

Architecture Decisions (12 ADRs) are scored and tiered by impact. Decisions from Lessons Learned (16 entries) show what we learned and what we decided as a result. Expand any entry for details.

Total Decisions: 28
Architecture ADRs: 12 (scored)
Lesson Decisions: 16 (4 categories)

Architecture Decisions

12 ADRs

3 Architecture | 5 Design | 4 Process

From Lessons Learned

16 Decisions

Safety | Infra | Testing | Process

Scoring Model

Cost | Time | Safety | Resume

1-3 points each, total out of 12

Top Score

ADR-005: 10/12

STM32G474RE MCU platform choice

Architecture Decisions

T4 | 5/12 | ADR-001: Structure docs under ASPICE process areas

Cost 1 | Time 1 | Safety 1 | Resume 2

Decision: Organize documentation under docs/aspice/ by process area (SYS, SWE, SUP, MAN, HWE)

Why it wins: File-based ALM in Git gives assessor-friendly structure at zero cost with full version control.

T4 | 4/12 | ADR-002: Keep master-plan.md as source baseline

Cost 1 | Time 1 | Safety 1 | Resume 1

Decision: Keep docs/plans/master-plan.md as the single strategic plan, with ASPICE execution plans as operational breakdowns

Why it wins: Master plan provides strategic context (architecture, BOM, demos) that execution plans don't cover.

T4 | 4/12 | ADR-003: Create central docs/research/ repository

Cost 1 | Time 1 | Safety 1 | Resume 1

Decision: Centralize all external references and research notes in docs/research/

Why it wins: Git-versioned research log keeps provenance with the codebase. Greppable and auditable.

T4 | 5/12 | ADR-004: Add MAN.3 live tracking set

Cost 1 | Time 1 | Safety 1 | Resume 2

Decision: Add progress dashboard, issue log, decision log, and gate readiness checklist to MAN.3

Why it wins: Markdown tracking files in-repo satisfy ASPICE evidence requirements and baseline with the code.

T1 | 10/12 | ADR-005: STM32G474RE Nucleo for 3 zone ECUs

Cost 2 | Time 2 | Safety 3 | Resume 3

Decision: Use 3x STM32G474RE Nucleo-64 boards for CVC, FZC, and RZC zone controllers. 3x FDCAN, 5x ADC, CORDIC+FMAC, 170 MHz Cortex-M4F.

Why it wins: Gains CAN-FD, extra ADCs for dual-sensor plausibility, and HW math accelerators vs F446RE. Arduino Mega disqualified (8-bit/8KB/no CAN).

T1 | 9/12 | ADR-006: AUTOSAR Classic layered BSW

Cost 1 | Time 2 | Safety 3 | Resume 3

Decision: Custom AUTOSAR Classic-inspired layered BSW: MCAL, EAL, Services, RTE — 16 modules, ~5,000 LOC shared across 3 STM32 ECUs.

Why it wins: 16+ automotive resume keywords, ASPICE SWE.2/SWE.3 compliance, HW abstraction for SIL testing. 10/10 resume impact vs 3/10 bare-metal.

T2 | 7/12 | ADR-007: POSIX SocketCAN for simulated ECU MCAL

Cost 1 | Time 2 | Safety 1 | Resume 3

Decision: POSIX SocketCAN API as MCAL layer for 3 simulated ECUs in Docker — 100% code reuse between physical and simulated ECUs.

Why it wins: Only option achieving 100% code reuse. python-can would break the AUTOSAR portability story. Vector CANoe is $10K+.

T2 | 6/12 | ADR-008: BMW vsomeip for SOME/IP demo

Cost 1 | Time 1 | Safety 1 | Resume 3

Decision: vsomeip (COVESA/BMW) for service-oriented communication between simulated ECUs and Pi gateway.

Why it wins: 4-7 days for working SOME/IP demo with BMW/COVESA credibility. Custom reimplementation wastes 4-8 weeks.

T2 | 5/12 | ADR-009: Docker containers for simulated ECU runtime

Cost 1 | Time 1 | Safety 1 | Resume 2

Decision: Run 3 simulated ECUs (BCM, ICU, TCU) in Docker containers with virtual CAN (vcan/vxcan) interfaces.

Why it wins: Same 1-2 days as native processes but adds isolation, reproducibility, and CI/CD portability. QEMU costs 3-15x more setup effort.

T2 | 7/12 | ADR-010: Unity + CCS Test + pytest for testing

Cost 1 | Time 1 | Safety 3 | Resume 2

Decision: Unity (ThrowTheSwitch) for C unit testing, CCS built-in for TMS570, pytest for Python gateway/cloud.

Why it wins: Pure C, 2-4 hours setup vs 2-4 days for gtest. CMock auto-generates mocks. CUnit has been abandoned since 2018.

T1 | 7/12 | ADR-011: CAN 2.0B at 500 kbps (no CAN FD)

Cost 1 | Time 1 | Safety 3 | Resume 2

Decision: CAN 2.0B at 500 kbps for all inter-ECU communication. TMS570 DCAN requires classic CAN. 35% bus utilization — no bandwidth pressure.

Why it wins: Zero-cost baseline. CAN FD adds 8-13 days for unneeded bandwidth. Ethernet requires replacing 3 MCU boards + 6-10 weeks.
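The 35% utilization figure follows from straightforward bus-load arithmetic. The sketch below shows the shape of that calculation; the frame size (worst-case classic CAN frame with bit stuffing) is a standard approximation, and the frames-per-second input is an illustrative placeholder, not the project's actual message set.

```c
#include <assert.h>

/* Rough CAN 2.0B bus-load estimate. A standard-ID frame with 8 data bytes
 * is ~111 bits on the wire, ~135 bits worst case with bit stuffing.
 * These constants are approximations, not measured project values. */
#define BITS_PER_FRAME 135u
#define BITRATE_BPS    500000u

/* Returns bus utilization in percent for a given total frame rate. */
static unsigned bus_load_pct(unsigned frames_per_sec)
{
    return (frames_per_sec * BITS_PER_FRAME * 100u) / BITRATE_BPS;
}
```

At roughly 1,300 frames/s the estimate lands at 35%, which is why CAN FD's extra bandwidth buys nothing here.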

T2 | 6/12 | ADR-012: AWS IoT Core + Timestream + Grafana

Cost 1 | Time 1 | Safety 1 | Resume 3

Decision: AWS IoT Core (MQTT) + Amazon Timestream (time-series DB) + Grafana for cloud telemetry, batched 1 msg/5sec for free tier.

Why it wins: ~$4-7/month with highest automotive credibility (Mercedes, VW, Toyota use AWS). Azure is 4-5x more expensive. Self-hosted lacks resume value.

Decisions from Lessons Learned

Safety | LL-001: Execute security hardening in strict phase order

Decision: Always harden in order: emergency fixes → auth/abuse protection → access control → API hardening → monitoring → verification gate. Never skip to advanced features before basics are locked.

What we learned: Jumping to OAuth and reCAPTCHA before basic auth, rate limiting, and CSRF were solid caused significant rework across 10 phases.

  • Fail-closed pattern: reject on missing config, never silently skip auth
  • Use generic SMTP from day one — vendor SDK lock-in (Resend) cost multiple phases to remove
  • Audit marketing claims against code BEFORE launch — stats go stale, absolutes are wrong
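The fail-closed pattern from the first bullet can be sketched in a few lines. This is a minimal illustration, not the project's actual auth layer; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fail-closed auth check: missing or empty configuration must REJECT,
 * never silently skip the check. (Sketch only; a real implementation
 * would use a constant-time comparison.) */
static bool auth_allowed(const char *configured_secret, const char *presented)
{
    if (configured_secret == NULL || configured_secret[0] == '\0') {
        return false;   /* fail closed: no config -> reject */
    }
    if (presented == NULL) {
        return false;
    }
    return strcmp(configured_secret, presented) == 0;
}
```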
Infrastructure | LL-002: Test CI tool version first, not local

Decision: Always validate against the CI tool version (cppcheck 2.13) before assuming local results (2.17) will pass. Pin tool versions in CI config.

What we learned: cppcheck 2.13 (CI/Ubuntu apt) vs 2.17 (local/pip) had 6 incompatibilities that only surfaced in CI after pushing.

  • No # comments in suppressions.txt — cppcheck 2.13 can't parse them
  • .gitignore blocking suppressions.txt — use ! exception when ignoring directory patterns
  • 1,536 to 0 violations in 1 day with systematic triage: mandatory first, required next, advisory last
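A suppressions file compatible with cppcheck 2.13 looks like the sketch below: one `errorId:fileglob` entry per line, with no comment lines at all. The rule IDs and paths here are placeholders for illustration, not the project's actual suppression set.

```text
misra-c2012-21.5:src/bsw/mcal_posix/*.c
misra-c2012-21.6:src/bsw/mcal_posix/*.c
unusedFunction:tests/*.c
```

If the file's directory is ignored in .gitignore, an explicit `!` exception line (e.g. `!suppressions.txt` after the directory pattern) is needed so the file reaches CI.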
Testing | LL-003: Use stdint.h types in host tests, not platform types

Decision: Use stdint.h fixed-width types (uint32_t) in all host-compiled test code. Never rely on AUTOSAR platform types that assume ILP32.

What we learned: LP64 (host) vs ILP32 (target): uint32 mapped to unsigned int on target but unsigned long on host, causing 99 type mismatch failures in CI.

  • Source inclusion pattern for Unity tests — include .c files, don't link; standard embedded TDD
  • Mock state must be cleared in setUp() — zero ALL mock globals before each test, no exceptions
  • Header guard collisions across test files (DEM_H, E2E_H) — prefix guards with TEST_ namespace
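The LP64/ILP32 pitfall can be made concrete in a few lines. A minimal sketch, assuming nothing beyond stdint.h: fixed-width types wrap identically everywhere, whereas a platform typedef like `unsigned long` is 64-bit on an LP64 host and 32-bit on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width types behave identically on LP64 hosts and ILP32 targets.
 * An AUTOSAR-style "uint32" typedef'd as unsigned long would be 64 bits
 * wide on an LP64 host -- the mismatch behind the 99 CI failures. */
static uint32_t wrap_add(uint32_t a, uint32_t b)
{
    return a + b;   /* well-defined modulo-2^32 wrap on every platform */
}
```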
Infrastructure | LL-004: Use 4-bit modular arithmetic for heartbeat alive counters

Decision: Heartbeat alive counter wraps at 4-bit (0-15) per AUTOSAR E2E Profile 1, not 8-bit. Use modular arithmetic and single monitor per ECU.

What we learned: Duplicate heartbeat monitors and 8-bit wrap assumption caused persistent UI flickering in the SIL demo dashboard.

  • Docker Compose v2: 'docker compose' (space) not 'docker-compose' (hyphen) — breaks scripts silently
  • NET_RAW capability required for CAN in containers — not obvious from error messages
  • RTE_MAX_SIGNALS too small causes silent init failure — validate signal count at startup
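The 4-bit counter logic is small enough to show in full. A sketch in the spirit of AUTOSAR E2E Profile 1's alive counter (the real profile also covers CRC and Data ID, which are omitted here):

```c
#include <assert.h>
#include <stdint.h>

/* E2E Profile 1 style alive counter: 4 bits, wraps 15 -> 0. */
static uint8_t alive_next(uint8_t counter)
{
    return (uint8_t)((counter + 1u) & 0x0Fu);
}

/* Delta between received and last-seen counter, modulo 16.
 * Delta 1 = OK; 0 = repeated frame; >1 = lost frames. */
static uint8_t alive_delta(uint8_t received, uint8_t last)
{
    return (uint8_t)((received - last) & 0x0Fu);
}
```

Assuming an 8-bit wrap (255 -> 0) makes the monitor flag a counter error on every real 15 -> 0 wrap, which is exactly the flicker described above.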
Process | LL-005: Single source of truth for cross-document parameters

Decision: Define every shared parameter (CAN bit timing, sensor specs, pin assignments) in exactly one canonical document. All other docs reference, never copy.

What we learned: CAN bit timing (87.5% vs 80% sample point) was inconsistent across 4 docs. ACS723 sensitivity was wrong in 5 places.

  • CAN IDs most duplicated data — grep project-wide after any change to catch mismatches
  • MG996R servo: 180-degree, not 360-degree (Amazon BOM listing error) — always verify datasheet
  • Cross-doc review checklist catches what per-doc review misses
Testing | LL-006: Enforce test-first with pre-commit hook for all BSW modules

Decision: Block writes to firmware .c source files unless corresponding test_*.c exists. Enforce AUTOSAR BSW layer build order: MCAL, ECUAL, Services, RTE.

What we learned: Without enforcement, developers skip tests under time pressure. Hook-enforced TDD produced 16 modules with 1,067 tests and zero skipped modules.

  • Table-driven state machines and dual sensor plausibility patterns proven across all BSW modules
  • E2E as standalone module: dependency-free, reusable across physical and simulated ECUs
  • Phase-based execution with status table keeps momentum and prevents scope creep
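The table-driven state machine pattern mentioned in the first bullet can be sketched as follows. States, events, and transitions here are illustrative placeholders, not the project's actual BSW tables; the point is that transitions live in data rather than nested if/else chains.

```c
#include <assert.h>

/* Table-driven state machine sketch (illustrative states/events). */
typedef enum { ST_INIT, ST_RUN, ST_SAFE, ST_COUNT } state_t;
typedef enum { EV_GO, EV_FAULT, EV_CLEAR, EV_COUNT } event_t;

static const state_t transition[ST_COUNT][EV_COUNT] = {
    /*            EV_GO    EV_FAULT  EV_CLEAR */
    /* INIT */ { ST_RUN,  ST_SAFE,  ST_INIT },
    /* RUN  */ { ST_RUN,  ST_SAFE,  ST_RUN  },
    /* SAFE */ { ST_SAFE, ST_SAFE,  ST_INIT },  /* leave SAFE only via CLEAR */
};

static state_t step(state_t s, event_t e)
{
    return transition[s][e];
}
```

Because the table is data, every transition is enumerable, which is what makes the pattern easy to cover exhaustively in unit tests.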
Process | LL-007: Use HITL-LOCK markers for human-AI co-development reviews

Decision: Protect human-authored review comments with HITL-LOCK markers. Use Why/Tradeoff/Alternative structure. Date every review comment for CM traceability.

What we learned: AI reformatted and moved human review comments without markers. 443+ HITL comments across 29 docs would be unmanageable without protection.

  • HITL-LOCK markers are immutable once set — AI must never edit, reformat, or delete locked content
  • Per-requirement lessons-learned files (SYS-NNN) keep insights linked to their source
  • Consolidated lessons-learned folder better than scattered per-doc files
Process | LL-008: Use file-based ALM over heavyweight tools

Decision: Git + Markdown covers 90% of DOORS/Polarion/Jama functionality at 0% licensing cost. ADR format with 4-dimension scoring enables cross-decision comparison.

What we learned: 7-ECU hybrid architecture (4 physical + 3 simulated) proved that CAN bus doesn't distinguish hardware from Docker — file-based tools scale the same way.

  • Decision audit script catches undocumented decisions via heuristic grep
  • Tier assignment (T1-T4) follows from scores, not arbitrary classification
  • Zonal architecture decision driven by resume value + learning opportunity, not just technical merit
Safety | LL-009: Use GND-only fault injection for budget HIL setups

Decision: Skip VBAT fault injection on budget hardware — GND-only + open-circuit covers ~70% of fault modes safely. Add preemptive Zener protection on every channel.

What we learned: VBAT fault injection on unprotected Nucleo boards risks destroying the MCU. A EUR 0.10 Zener diode prevents a EUR 16 board replacement.

  • DIY HIL is legitimate beyond hobby tier — 7-ECU AUTOSAR with 1,067 tests is industry-relevant
  • Relay-based fault injection matrix: MUX topology reduces relay count from N*M to N+M
  • Needle-bed adapter pattern from professional HIL benches scales down to budget hardware
Safety | LL-010: Document shared SPI bus as CCF in Dependent Failure Analysis

Decision: 1oo2D on shared SPI bus is acceptable if detection path exists (CRC + plausibility → safe state). Shared bus failure is a detected multi-point fault. Always document in DFA.

What we learned: Initial design assumed SPI bus independence. HITL review revealed shared SPI is a Common Cause Failure that must be explicitly analyzed.

  • Both CRCs failing simultaneously = detected fault, not undetected — makes the safety argument
  • DMA/driver complexity is not a valid argument against SPI1+SPI2 — config is trivial copy-paste
  • Separate SPI buses justified only if ASIL decomposition requires proven independence
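The 1oo2D voting logic behind this argument can be sketched compactly. Names and the plausibility threshold below are illustrative, not the project's actual code; the essential property is that a double CRC failure on the shared bus routes to the safe state as a detected fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { VOTE_USE_VALUE, VOTE_SAFE_STATE } vote_t;

/* 1oo2D voter: each channel carries its own CRC verdict; any CRC failure
 * (including both at once, e.g. a shared-SPI common cause) or a
 * plausibility violation is a DETECTED fault -> safe state. */
static vote_t vote_1oo2d(bool crc_a_ok, bool crc_b_ok,
                         int32_t a, int32_t b, int32_t max_delta)
{
    if (!crc_a_ok || !crc_b_ok) {
        return VOTE_SAFE_STATE;   /* detected fault, incl. shared-bus CCF */
    }
    if ((a > b ? a - b : b - a) > max_delta) {
        return VOTE_SAFE_STATE;   /* sensors disagree beyond threshold */
    }
    return VOTE_USE_VALUE;
}
```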
Process | LL-011: Block CI merge on traceability gaps, not just warn

Decision: CI must block merge on traceability gaps — advisory checks are universally ignored. Stub scripts that pass CI give false confidence; build tooling properly or skip entirely.

What we learned: 4 broken requirement links and 6 untested SWRs were invisible until automated traceability checking found them.

  • trace-gen.py validates SG → FSR → TSR → SSR → code → test chain completeness
  • Orphan requirements (no implementation) and orphan tests (no requirement) both caught
  • Traceability report generated on every CI run — assessors can verify at any time
Safety | LL-012: Build safety case top-down from HARA, never bottom-up

Decision: Always build HARA → Safety Goals → FSR → TSR → SSR. Bottom-up produces rationalization, not defense. FTTI must be derived from mechanisms, not guessed.

What we learned: Attempting to write SSRs before completing HARA led to requirements that didn't trace to any safety goal — wasted effort.

  • S3/E4/C3 = ASIL D for drive-by-wire — the matrix numbers are unambiguous, no debate needed
  • Safety goals need FTTI justification: detect_time + react_time derived from concrete mechanisms
  • GSN (Goal Structuring Notation) makes the argument structure visible and auditable
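The FTTI justification rule reduces to a budget inequality: the measured detect and react times of the concrete mechanisms must fit inside the FTTI with margin. A minimal sketch; the function name and the numbers in the tests are placeholders, not project values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FTTI budget check: the fault-handling chain (detection + reaction)
 * must complete strictly within the Fault Tolerant Time Interval. */
static bool ftti_budget_ok(uint32_t detect_ms, uint32_t react_ms,
                           uint32_t ftti_ms)
{
    return (detect_ms + react_ms) < ftti_ms;
}
```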
Infrastructure | LL-013: Design HAL for testability — it enables SIL for free

Decision: HAL abstraction designed for unit testing is the same abstraction that enables SIL simulation. SocketCAN uses identical API for vcan and real CAN — swap by interface name only.

What we learned: POSIX SocketCAN backend was added for SIL, but turned out to be the exact same abstraction boundary we needed for host-based unit tests.

  • POSIX backend files are MISRA-exempt — blanket suppress rules 21.5, 21.6, 21.8, 21.10, 17.7 from day one
  • vcan module auto-load + error clearing on container restart prevents stale CAN state
  • 100% firmware code reuse between physical ECU and Docker container validates the abstraction
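The abstraction boundary can be sketched as a function-pointer table. Everything below (type names, the heartbeat CAN ID) is hypothetical illustration: the same `tx` slot could point at a SocketCAN backend, a real CAN driver, or, as here, a mock that records frames so host-side unit tests need no bus at all.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal CAN HAL: application code depends only on this interface. */
typedef struct {
    int (*tx)(uint32_t id, const uint8_t *data, uint8_t len);
} can_hal_t;

/* Mock backend for host tests: records the last transmitted frame. */
static uint32_t mock_last_id;
static uint8_t  mock_last_len;

static int mock_tx(uint32_t id, const uint8_t *data, uint8_t len)
{
    (void)data;
    mock_last_id  = id;
    mock_last_len = len;
    return 0;
}

/* Application code: knows the HAL, not the backend. 0x101 is illustrative. */
static int send_heartbeat(const can_hal_t *hal, uint8_t alive)
{
    uint8_t payload[1] = { alive };
    return hal->tx(0x101u, payload, 1u);
}
```

Swapping the mock for a SocketCAN backend then changes only which function the table points at, which is exactly why the same firmware runs on vcan and real CAN.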
Infrastructure | LL-014: Make fault injection sequences deterministic and state-aware

Decision: Every fault injection must follow a defined inject → detect → react → recovery sequence. DTC broadcasts need arbitration backoff. ML anomaly baselines vary per operating state.

What we learned: Random fault injection produced unreproducible results. ML anomaly scores stuck at baseline because thresholds didn't account for different operating states.

  • DTC broadcast needs randomized 0-50ms backoff to prevent CAN bus flooding from simultaneous DTCs
  • E-STOP event spam on reset — need proper fault clear sequence before re-enabling
  • Controller-viewer lock pattern prevents multi-user conflicts in live demo
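The randomized backoff from the first bullet is one line of arithmetic. A sketch, assuming `rand()` is an acceptable entropy source for demo purposes (a real ECU would use a hardware timer or per-node seed):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Randomized 0-50 ms DTC broadcast backoff: staggers simultaneous DTC
 * transmissions after a common-cause event so they don't flood the bus. */
static uint32_t dtc_backoff_ms(void)
{
    return (uint32_t)(rand() % 51);   /* uniform-ish in [0, 50] ms */
}
```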
Testing | LL-015: Run SIL integration tests nightly — mocks miss real integration bugs

Decision: SIL tests catch integration issues mocks cannot: ECU-to-ECU communication, E2E protection, fault state transitions. Run nightly in CI. Unknown test step actions must fail loudly.

What we learned: COM TX/RX bridge timing issues and E2E sequence counter mismatches were invisible in unit tests — only surfaced in multi-ECU SIL runs.

  • COM TX/RX bridge is the most critical SIL component — build and test it before writing scenarios
  • Fail-closed applies to test infrastructure too — unknown step actions must error, not skip
  • Plant simulator thermal model needs tuning per operating point, not one-size-fits-all
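The "fail loudly on unknown steps" rule looks like this in a step dispatcher. The action names are illustrative placeholders; the point is that an unrecognized action returns an error instead of silently skipping and faking a green run.

```c
#include <assert.h>
#include <string.h>

/* Fail-closed test-step dispatcher sketch (illustrative action names). */
static int run_step(const char *action)
{
    if (strcmp(action, "inject_fault") == 0) { return 0; }
    if (strcmp(action, "expect_dtc") == 0)   { return 0; }
    return -1;   /* unknown action: fail loudly, never skip */
}
```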
Infrastructure | LL-016: Verify every component against manufacturer datasheet before ordering

Decision: Never trust marketplace listings — always download and verify the manufacturer datasheet for every parameter. Include full part number including variant suffix in BOM.

What we learned: MG996R listed as 360-degree was actually 180-degree. ACS723-05A vs ACS723-20A have 4x different sensitivity (400 vs 100 mV/A). Both caused rework.

  • SOT-23 components need breakout boards — check package vs assembly method before ordering
  • Lead time for automotive-grade components can be 8-12 weeks — order early, prototype with commercial grade
  • BOM must include: full part number, manufacturer, distributor link, datasheet link, package type

Scoring Model

Dimension | 1 (Low) | 2 (Medium) | 3 (High)
Cost | < $50 | $50-$500 | > $500
Time | < 1 week | 1-4 weeks | > 4 weeks
Safety | QM | ASIL A-C | ASIL D
Resume | Generic | Industry-relevant | Top keyword
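The model reduces to a simple sum, as a sketch (tier thresholds are deliberately not reproduced here; tiers follow from scores in the project's own rubric):

```c
#include <assert.h>

/* Total score for the 4-dimension model: each dimension 1-3, max 12. */
static int adr_total(int cost, int time, int safety, int resume)
{
    return cost + time + safety + resume;
}
```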

Decision Process

Captured

  • Problem and constraints documented
  • At least 2 alternatives listed with effort estimates

Evaluated

  • Cost/Time/Safety/Resume scored (1-3 each)
  • Tradeoff rationale written with explicit comparison

Baselined

  • Approved with ADR-NNN identifier and tier classification
  • Linked to implementation evidence in code and docs

Why This Log Is Strong

Every decision has 2 alternatives with effort estimates
  • Not just 'we chose X' — each ADR shows what was rejected and why
  • Effort in hours + dollars makes tradeoffs concrete
  • Auditable by any ASPICE assessor or interviewer
4-dimension scoring enables cross-ADR comparison
  • Cost, Time, Safety, Resume consistently scored 1-3
  • ADR-005 (MCU, 10/12) vs ADR-003 (research repo, 4/12) shows clear priority ranking
  • Tier assignment (T1-T4) follows from scores — not arbitrary
Decisions trace to execution evidence
  • ADR-006 (AUTOSAR BSW) traces to 16 implemented modules and 195 tests
  • ADR-005 (STM32G474RE) traces to 6 CVC SWCs and 88 tests
  • 16 lesson-derived decisions link experience back to adopted project standards

Top Scoring Decisions

ADR-005

10/12

STM32G474RE MCU platform

ADR-006

9/12

AUTOSAR Classic layered BSW

ADR-007

7/12

SocketCAN simulation MCAL

ADR-010

7/12

Unity + CCS + pytest testing

ADR-011

7/12

CAN 2.0B at 500 kbps

Related Portfolio Pages

Use this summary together with the Safety Platform page for a complete project overview.