How to read this page
Architecture Decisions (12 ADRs) are scored and tiered by impact. Decisions from Lessons Learned (16 entries) show what we learned and what we decided as a result. Expand any entry for details.
Every architectural, design, and process decision is recorded with a 4-dimension scoring model (Cost, Time, Safety, Resume), 2 evaluated alternatives, and explicit rationale. Decisions derived from project experience are framed with their lesson context and key takeaways.
Architecture Decisions
12 ADRs
3 Architecture | 5 Design | 4 Process
From Lessons Learned
16 Decisions
Safety | Infra | Testing | Process
Scoring Model
Cost | Time | Safety | Resume
1-3 each, total /12
Top Score
ADR-005: 10/12
STM32G474RE MCU platform choice
Cost 1 | Time 1 | Safety 1 | Resume 2
Decision: Organize documentation under docs/aspice/ by process area (SYS, SWE, SUP, MAN, HWE)
Why it wins: File-based ALM in Git gives assessor-friendly structure at zero cost with full version control.
Cost 1 | Time 1 | Safety 1 | Resume 1
Decision: Keep docs/plans/master-plan.md as the single strategic plan, with ASPICE execution plans as operational breakdowns
Why it wins: Master plan provides strategic context (architecture, BOM, demos) that execution plans don't cover.
Cost 1 | Time 1 | Safety 1 | Resume 1
Decision: Centralize all external references and research notes in docs/research/
Why it wins: Git-versioned research log keeps provenance with the codebase. Greppable and auditable.
Cost 1 | Time 1 | Safety 1 | Resume 2
Decision: Add progress dashboard, issue log, decision log, and gate readiness checklist to MAN.3
Why it wins: Markdown tracking files in-repo satisfy ASPICE evidence requirements and baseline with the code.
Cost 2 | Time 2 | Safety 3 | Resume 3
Decision: Use 3x STM32G474RE Nucleo-64 boards for CVC, FZC, and RZC zone controllers. 3x FDCAN, 5x ADC, CORDIC+FMAC, 170 MHz Cortex-M4F.
Why it wins: Gains CAN-FD, extra ADCs for dual-sensor plausibility, and HW math accelerators vs F446RE. Arduino Mega disqualified (8-bit/8KB/no CAN).
Cost 1 | Time 2 | Safety 3 | Resume 3
Decision: Custom AUTOSAR Classic-inspired layered BSW: MCAL, ECUAL, Services, RTE — 16 modules, ~5,000 LOC shared across 3 STM32 ECUs.
Why it wins: 16+ automotive resume keywords, ASPICE SWE.2/SWE.3 compliance, HW abstraction for SIL testing. 10/10 resume impact vs 3/10 bare-metal.
Cost 1 | Time 2 | Safety 1 | Resume 3
Decision: POSIX SocketCAN API as MCAL layer for 3 simulated ECUs in Docker — 100% code reuse between physical and simulated ECUs.
Why it wins: Only option achieving 100% code reuse. Python-can destroys AUTOSAR portability story. Vector CANoe is $10K+.
Cost 1 | Time 1 | Safety 1 | Resume 3
Decision: vsomeip (COVESA/BMW) for service-oriented communication between simulated ECUs and Pi gateway.
Why it wins: 4-7 days for working SOME/IP demo with BMW/COVESA credibility. Custom reimplementation wastes 4-8 weeks.
Cost 1 | Time 1 | Safety 1 | Resume 2
Decision: Run 3 simulated ECUs (BCM, ICU, TCU) in Docker containers with virtual CAN (vcan/vxcan) interfaces.
Why it wins: Same 1-2 days as native processes but adds isolation, reproducibility, and CI/CD portability. QEMU costs 3-15x more setup.
Cost 1 | Time 1 | Safety 3 | Resume 2
Decision: Unity (ThrowTheSwitch) for C unit testing, CCS built-in for TMS570, pytest for Python gateway/cloud.
Why it wins: Pure C, 2-4 hours setup vs 2-4 days gtest. CMock auto-generates mocks. CUnit is abandoned (2018).
Cost 1 | Time 1 | Safety 3 | Resume 2
Decision: CAN 2.0B at 500 kbps for all inter-ECU communication. TMS570 DCAN requires classic CAN. 35% bus utilization — no bandwidth pressure.
Why it wins: Zero-cost baseline. CAN FD adds 8-13 days for unneeded bandwidth. Ethernet requires replacing 3 MCU boards + 6-10 weeks.
Cost 1 | Time 1 | Safety 1 | Resume 3
Decision: AWS IoT Core (MQTT) + Amazon Timestream (time-series DB) + Grafana for cloud telemetry, batched 1 msg/5sec for free tier.
Why it wins: ~$4-7/month with highest automotive credibility (Mercedes, VW, Toyota use AWS). Azure is 4-5x more expensive. Self-hosted lacks resume value.
From Lessons Learned
Decision: Always harden in order: emergency fixes → auth/abuse protection → access control → API hardening → monitoring → verification gate. Never skip to advanced features before the basics are locked.
What we learned: Jumping to OAuth and reCAPTCHA before basic auth, rate limiting, and CSRF were solid caused significant rework across 10 phases.
Decision: Always validate against the CI tool version (cppcheck 2.13) before assuming local results (2.17) will pass. Pin tool versions in CI config.
What we learned: cppcheck 2.13 (CI/Ubuntu apt) vs 2.17 (local/pip) had 6 incompatibilities that only surfaced in CI after pushing.
Decision: Use stdint.h fixed-width types (uint32_t) in all host-compiled test code. Never rely on AUTOSAR platform types that assume ILP32.
What we learned: LP64 (host) vs ILP32 (target): uint32 mapped to unsigned int on target but unsigned long on host, causing 99 type mismatch failures in CI.
Decision: Heartbeat alive counter wraps at 4 bits (0-15) per AUTOSAR E2E Profile 1, not 8 bits. Use modular arithmetic and a single monitor per ECU.
What we learned: Duplicate heartbeat monitors and 8-bit wrap assumption caused persistent UI flickering in the SIL demo dashboard.
Decision: Define every shared parameter (CAN bit timing, sensor specs, pin assignments) in exactly one canonical document. All other docs reference, never copy.
What we learned: CAN bit timing (87.5% vs 80% sample point) was inconsistent across 4 docs. ACS723 sensitivity was wrong in 5 places.
Decision: Block writes to firmware .c source files unless corresponding test_*.c exists. Enforce AUTOSAR BSW layer build order: MCAL, ECUAL, Services, RTE.
What we learned: Without enforcement, developers skip tests under time pressure. Hook-enforced TDD produced 16 modules with 1,067 tests and zero skipped modules.
Decision: Protect human-authored review comments with HITL-LOCK markers. Use Why/Tradeoff/Alternative structure. Date every review comment for CM traceability.
What we learned: AI reformatted and moved human review comments without markers. 443+ HITL comments across 29 docs would be unmanageable without protection.
Decision: Git + Markdown covers 90% of DOORS/Polarion/Jama functionality at zero licensing cost. ADR format with 4-dimension scoring enables cross-decision comparison.
What we learned: 7-ECU hybrid architecture (4 physical + 3 simulated) proved that CAN bus doesn't distinguish hardware from Docker — file-based tools scale the same way.
Decision: Skip VBAT fault injection on budget hardware — GND-only + open-circuit covers ~70% of fault modes safely. Add preemptive Zener protection on every channel.
What we learned: VBAT fault injection on unprotected Nucleo boards risks destroying the MCU. A EUR 0.10 Zener diode prevents a EUR 16 board replacement.
Decision: 1oo2D on shared SPI bus is acceptable if detection path exists (CRC + plausibility → safe state). Shared bus failure is a detected multi-point fault. Always document in DFA.
What we learned: Initial design assumed SPI bus independence. HITL review revealed shared SPI is a Common Cause Failure that must be explicitly analyzed.
Decision: CI must block merge on traceability gaps — advisory checks are universally ignored. Stub scripts that pass CI give false confidence; build tooling properly or skip entirely.
What we learned: 4 broken requirement links and 6 untested SWRs were invisible until automated traceability checking found them.
Decision: Always build HARA → Safety Goals → FSR → TSR → SSR. Bottom-up produces rationalization, not defense. FTTI must be derived from mechanisms, not guessed.
What we learned: Attempting to write SSRs before completing HARA led to requirements that didn't trace to any safety goal — wasted effort.
Decision: HAL abstraction designed for unit testing is the same abstraction that enables SIL simulation. SocketCAN uses identical API for vcan and real CAN — swap by interface name only.
What we learned: POSIX SocketCAN backend was added for SIL, but turned out to be the exact same abstraction boundary we needed for host-based unit tests.
Decision: Every fault injection must follow a defined inject → detect → react → recovery sequence. DTC broadcasts need arbitration backoff. ML anomaly baselines vary per operating state.
What we learned: Random fault injection produced unreproducible results. ML anomaly scores stuck at baseline because thresholds didn't account for different operating states.
Decision: SIL tests catch integration issues mocks cannot: ECU-to-ECU communication, E2E protection, fault state transitions. Run nightly in CI. Unknown test step actions must fail loudly.
What we learned: COM TX/RX bridge timing issues and E2E sequence counter mismatches were invisible in unit tests — only surfaced in multi-ECU SIL runs.
Decision: Never trust marketplace listings — always download and verify the manufacturer datasheet for every parameter. Include full part number including variant suffix in BOM.
What we learned: An MG996R servo listed as 360-degree rotation was actually 180-degree. The ACS723-05A and ACS723-20A differ 4x in sensitivity (400 vs 100 mV/A). Both caused rework.
| Dimension | 1 (Low) | 2 (Medium) | 3 (High) |
|---|---|---|---|
| Cost | < $50 | $50-$500 | > $500 |
| Time | < 1 week | 1-4 weeks | > 4 weeks |
| Safety | QM | ASIL A-C | ASIL D |
| Resume | Generic | Industry-relevant | Top keyword |
| ADR | Score | Decision |
|---|---|---|
| ADR-005 | 10/12 | STM32G474RE MCU platform |
| ADR-006 | 9/12 | AUTOSAR Classic layered BSW |
| ADR-007 | 7/12 | SocketCAN simulation MCAL |
| ADR-010 | 7/12 | Unity + CCS + pytest testing |
| ADR-011 | 7/12 | CAN 2.0B at 500 kbps |
Use this summary together with the Safety Platform page for a complete project overview.