Paul Abrams
Software Engineer

Software Development

Professional experience building enterprise web applications, modernizing legacy systems, and delivering working software across the Government of Canada and private sector. A consistent pattern across two decades: noticing a labor-cost burden running silently somewhere outside my immediate assignment — manual regression cycles, weekend availability checks, brittle reports — and quietly automating it away.

TWNR: A Space Trading Game

A web-based remake of the classic Trade Wars 2002 BBS door game: an online multiplayer space trading and combat sandbox. Players warp between sectors, trade commodities at ports, build planets and starbases, and fight other players across a procedurally generated universe.

Architected and built end-to-end as a TypeScript monorepo: an Express + WebSocket server backed by PostgreSQL, and a terminal-style xterm.js client served by Vite.

This is a pre-alpha demo; playable today, but not yet open to public sign-ups.

  • Live Demo
Technologies & tools
TypeScript
Node.js
Express
PostgreSQL
WebSockets
JWT
Vite
xterm.js
Docker

LLM Training & Code Evaluation

Training and evaluating Large Language Models on software-development tasks for platforms including DataAnnotation.tech and Stellar.ai. The work spans expert code review of AI-generated output, comprehensive unit and integration testing across multiple languages, and the design of prompt fixtures and rubrics that probe specific model failure modes. A consistent discipline is to anchor evaluations to real codebases at specific revisions rather than synthetic toy problems, because real multi-package projects expose failure modes that toy challenges cannot.

Representative task types:

  • Security audits at known-vulnerable revisions. Pin a real OSS project to a specific commit, build a categorized vulnerability inventory (each finding annotated with its primary CWE plus the cousin CWEs commonly mistaken for it), optionally seed extra vulnerabilities, then score competing models on which they find, how accurately they classify them under CWE / CVSS v4.0, and whether their fixes match the maintainer’s. Defense-in-depth versus minimalism is itself a graded dimension.
  • Refactor tasks designed to grade for deletion, not addition. Refactors where success means removing legacy types, duplicated mapping tables, and obsolete protocol shapes - not piling new compatibility shims on top of the old ones.
  • Adversarial unit testing. Prompts where the model has to make a new set of tests pass without breaking the existing regression suite. Cleanly separates the models that fix the implementation from the ones that fix the tests.
  • Cross-stack migration prompts. MongoDB → PostgreSQL migrations that require correct concurrency (SELECT … FOR UPDATE inside a transaction), JavaScript → TypeScript migrations introducing branded nominal types for physical units, dataframe-library swaps. Probes API discipline, FK integrity, and the discipline of finishing top-down designs bottom-up.
  • Architecture-tracing evaluations. Multi-turn sessions where the model walks through a complex flow across a multi-package codebase. Snapshots are captured at every model error - wrong field, conflated status, fabricated logic, claimed-mandatory steps optional - to score depth-of-trace and confabulation resistance.
  • Multi-hop API chaining (Python). Prompts that force the model to combine several public APIs (catalog endpoints, nested response shapes, multi-criterion search) to satisfy a single data question.
  • Multi-deliverable scientific-Python pipelines. Catalog cross-matches with KDTree, clustering in proper-motion space, isochrone fitting with reddening corrections, PRL-style LaTeX outputs - graded against 30–50 item rubrics covering byte-order conversion, masking discipline, expected scientific results, and academic register.
  • Adversarial deliverable prompts with intentionally messy inputs. Real-world deliverables (PowerPoint reports, BI dashboards) where one input is subtly broken - for example, a CSV exported with a different local timezone than its siblings - paired with a high-specificity rubric (24+ items checking exact numeric values, colors, and label text). Probes domain expertise: a practitioner in the field spots the trap immediately.
  • VBA / Office COM Interop. Enterprise-style challenges exercising document generation, event handling, and integration across Outlook, Word, and PowerPoint — a stack rarely covered by mainstream LLM benchmarks.
  • Skill authoring. Short policy documents (Gemini CLI / Claude Code “skills”) that codify a methodology for a class of task - for example, a test-failure-triage skill that classifies failures before fixing and never bypasses them by .skip-ing or weakening assertions. Each skill is evaluated by running the same task with and without it and grading the delta.
  • Personal codebases as fixtures. Reuse real multi-package projects - including TWNR, the WebSocket trading game also listed on this site - as evaluation environments. Anchoring tasks to a real game server makes refactor and security probes meaningfully harder than they’d be against a contrived test repo.
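
As an illustration of the "branded nominal types for physical units" named in the cross-stack migration item, here is a minimal sketch of the pattern. Every name in it (Brand, Meters, Seconds, MetersPerSecond, speed) is hypothetical and chosen for this example; it is not taken from any actual evaluation task.

```typescript
// Hypothetical illustration of branded (nominal) types for physical units.
// The brand exists only at the type level; at runtime the values are plain numbers.
declare const brand: unique symbol;
type Brand<T, Name extends string> = T & { readonly [brand]: Name };

type Meters = Brand<number, "Meters">;
type Seconds = Brand<number, "Seconds">;
type MetersPerSecond = Brand<number, "MetersPerSecond">;

// Constructors are the only place where a raw number is cast into a unit.
const meters = (n: number): Meters => n as Meters;

const seconds = (n: number): Seconds => {
  // Validate on the raw number before branding it.
  if (!(n > 0)) throw new RangeError("duration must be positive");
  return n as Seconds;
};

// Arithmetic on branded numbers yields a plain number, so the result is re-branded.
function speed(distance: Meters, elapsed: Seconds): MetersPerSecond {
  return (distance / elapsed) as MetersPerSecond;
}

const v = speed(meters(100), seconds(20));
// speed(seconds(20), meters(100)); // rejected at compile time: Seconds is not assignable to Meters
```

The point of the pattern for a JavaScript → TypeScript migration is that swapped arguments, which are silent bugs in untyped code, become compile errors without adding any runtime wrapper objects.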

Cross-cutting methodology: identical starting state across competing runs (tar/zip snapshots, redacted transcripts), pre-authored ground-truth inventories so the scoring isn’t a moving target, mid-session interruption probes when a model rabbit-holes (interaction quality is graded separately from task success), and a codified definition of “real failure” that excludes formatting nits and source-retrieval differences while penalizing reasoning errors and missed safety considerations heavily.

Recent work for Stellar.ai has also included unit-test contributions for the Discourse forum software, exercising its Docker-heavy build and runtime environment.

Technologies & tools
TypeScript
JavaScript
Python
C#
Go
Rust
VBA
PostgreSQL
Docker
LaTeX

Azure DevOps Web Extensions

Built a series of custom Azure DevOps web extensions and scheduled-pipeline tools to fill specific workflow gaps that the out-of-the-box product didn’t address. First exposure to Node.js, TypeScript, and React professionally - picked up while solving real problems.

  • Build/deploy analytics widget. Configurable by pipeline folder path, the widget recursively considers every build and deployment under that path and renders success-rate charts by team using the ADO Builds API and Chart.js. Lets a team or program lead see at a glance which pipelines are healthy, with no SQL or external BI infrastructure required.
  • People-by-project reporter. ADO tracks user identities at the organization level, not the project level, which makes “who is on which project” surprisingly hard to answer. Built an extension that uses an in-memory hashmap to cross-reference users against project memberships efficiently — the naive query pattern was unusably slow.
  • Weekly work-item digest emailer. ADO had no time-based notification triggers, so I set up scheduled ADO pipelines that ran PowerShell against the work-item REST API and sent digest emails through a departmental SMTP server. A small but heavily-used piece of internal plumbing.
  • VSTS Team Calendar fork (CIC era). Forked Microsoft’s vsts-team-calendar extension and added configurable colors, Outlook integration, release-management features, and IE8 backward-compatibility for the department’s mandated browser. Maintained the fork against upstream changes and contributed back where possible.

Built across multiple departments, with a small handful of contributions made back upstream to Microsoft’s vsts-team-calendar and tfs-cli repos along the way.
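
The hashmap cross-reference behind the people-by-project reporter can be sketched roughly as follows. This is a simplified illustration under assumed data shapes: OrgUser, Membership, and usersByProject are hypothetical names, and the real extension's types and API calls are not shown in this page.

```typescript
// Hypothetical shapes; real Azure DevOps identity and membership objects carry more fields.
interface OrgUser {
  id: string;
  displayName: string;
}

interface Membership {
  projectName: string;
  userId: string;
}

// Builds a project -> member display names lookup in a single pass.
function usersByProject(users: OrgUser[], memberships: Membership[]): Map<string, string[]> {
  // Index users by id once, so each membership resolves in O(1)
  // instead of scanning the whole user list per membership.
  const usersById = new Map<string, OrgUser>();
  for (const user of users) {
    usersById.set(user.id, user);
  }

  const result = new Map<string, string[]>();
  for (const membership of memberships) {
    const user = usersById.get(membership.userId);
    if (!user) continue; // membership points at an identity missing from the org list
    const names = result.get(membership.projectName) ?? [];
    names.push(user.displayName);
    result.set(membership.projectName, names);
  }
  return result;
}
```

With U users and M memberships this does O(U + M) work, whereas looking up each membership by scanning the user list would be O(U × M), which is one plausible reading of why the naive pattern was too slow.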

Technologies & tools
TypeScript
React
Node.js
Azure DevOps SDK
PowerShell
Chart.js

Legal Case Management System (DOJ)

Multi-year role as a core contributor on the Department of Justice Canada’s flagship Legal Case Management System (iCase) - a distributed enterprise application used by 5,000+ lawyers and agents across Canada to manage sensitive case files, timekeeping, billing, and reporting. The system was multi-tier (web/app/db) with role- and object-based security, integrated with the department’s records management, financial, and email systems, and had to remain available 24/7.

Homepage / “My iCase” portal. Led development of a customizable widget-based portal that became the new entry point to iCase. Gathered requirements directly from the business analyst team, ran JAD sessions, produced prototypes, and authored the BDD/TDD documentation before implementing the application largely solo. Built on ASP.NET Web Parts and the Personalization framework with a custom SQL personalization provider, Knockout.js for MVVM binding, async AJAX web service calls, an async HTTP handler that served images out of cached business objects, Microsoft ReportViewer in remote SSRS mode, and a JavaScript polyfill to make HTML5 work in the department-mandated IE8.

  • Diagnosed a homepage report that took ~10 seconds uncached: profiled it with SQL Profiler, Report Execution logs, and actual execution plans, then replaced the heavy live joins with a flat table populated by a nightly job (the report only needed up-to-yesterday data). Result: roughly 50× faster uncached, 3–4× faster cached.

Reliability and monitoring. Built a self-initiated availability-monitoring suite for the web and document servers because support staff were logging in evenings and weekends to run the same checks manually. Scheduled tests pinged each server, and failures were emailed to the appropriate operations group. Eliminated significant overtime hours and reduced the human-resource risk of relying on volunteer monitoring.

COM Interop memory leak. Diagnosed and resolved a memory leak across the document servers and implemented automated health monitoring to catch regressions.

Reporting modernization. Contributed to the multi-year migration from Crystal Reports to SSRS, including authoring 14 new SSRS reports for the Legal Risk Management module, and provided technical guidance to the reports team on the conversion pattern.

Dynamics CRM successor (2016–2018). Returned to the project as dev lead on the legacy iCase side while concurrently helping build its Microsoft Dynamics CRM replacement. On the iCase side, modified schema and front-end UI to surface migration status to users in real time, and adjusted the integration stored procedures (icisp_*) consumed by the financial information system. On the Dynamics side, engineered CRM plug-ins, custom actions, and processes; rewrote SQL stored procedures as C# with LINQ inside CRM; and used SSIS for the ETL between the two systems.

Other contributions. Co-led the SQL Server 2005 to 2012 database upgrade, including dev coordination and testing; rebuilt iCase’s UI as a widget-based interface using Knockout.js and ASP.NET to eliminate full-page postbacks and reduce server load; built a self-service icon library so the business team could maintain iCase iconography without developer involvement; contributed to the Internet Explorer 8 compatibility upgrade; prototyped a potential next-generation iCase on ASP.NET MVC3 with Razor and Entity Framework; participated in the GoC GCTools Hackathon; built a Timekeeping Compliance Indicator giving employees and managers a real-time view of progress against expected hours.

Mentoring. Trained co-op students on iCase and the automated test framework over consecutive summers, mentored a junior developer in SSRS and T-SQL on a sister project, and supported a deaf colleague (ASL/LSF first language) through the SQL Server upgrade - a useful exercise in patient, written-first technical collaboration.

Recognition. Department of Justice Team Merit Award (2012, Cost Recovery Process Improvement); Team Merit Award (2011, Chart of Accounts implementation); Team Spirit Award (2009).

Technologies & tools
ASP.NET
C#
SQL Server
Knockout.js
SSRS
JavaScript
Microsoft Dynamics CRM
LINQ
SSIS

Internal QA Tools & Test Automation

Co-architected an internal web application enabling QA staff to author structured test scenarios via a web UI at ESDC. Designed the database schema and developed the C# code bridging the application with Azure DevOps. Integrated with the departmental headless CMS platform.

Engineered ADO pipelines for automated testing, provisioning agent machines to execute parallel Selenium UI tests and seamlessly integrating logged test results back into ADO. Championed the transition from LoadRunner to JMeter, significantly reducing licensing costs.

Technologies & tools
C#
Selenium
ADO Pipelines
PowerShell
JMeter

Automated Test Framework (DOJ)

Identified a critical regression-testing bottleneck at the Department of Justice and voluntarily architected a custom Java-based framework in IBM Rational Functional Tester (RFT) that parsed English-language scenario documents and auto-generated executable test scripts. Originally attempted in VB.NET; pivoted to Java on advice from IBM support, who confirmed the .NET path was effectively unsupported.

Replaced the department’s “all-hands” manual regression cycle - roughly 20–25 testers running for two weeks every release - with one person reviewing automated results in five days. ~98% reduction in release validation effort, eliminated the standing 2-week code freeze per release, and saved an estimated $160,000 per cycle.
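
The ~98% figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes that "two weeks" means 10 working days per tester, which the text does not state; the function and constant names are hypothetical.

```typescript
// Assumption (not stated in the text): a two-week cycle is 10 working days per tester.
const WORKING_DAYS_PER_CYCLE = 10;

// Fraction of person-days saved when `testers` people running a full manual
// cycle are replaced by one reviewer working `reviewDays` days.
function effortReduction(testers: number, reviewDays: number): number {
  const manualPersonDays = testers * WORKING_DAYS_PER_CYCLE;
  return 1 - reviewDays / manualPersonDays;
}

const low = effortReduction(20, 5);  // 1 - 5/200 = 0.975
const high = effortReduction(25, 5); // 1 - 5/250 = 0.98
```

Under that assumption the reduction falls between 97.5% and 98%, which is consistent with the rounded figure quoted above.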

  • Wrote a separate test manager in C# and SQL Server to schedule scripts, execute them on RFT VMs, and track results.
  • Built round-trip code generation: not only did the framework turn English scenarios into scripts, it could also reproduce updated English-language scenarios from the script source, so QA staff could keep working in their native medium.
  • Trained a team of QA Specialists to author scenarios and ran consecutive co-op summers training students to extend the framework for new modules.
  • Authored a multi-document knowledge base covering installation, configuration, maintenance, troubleshooting, and test management.
  • Contributed over 500 posts on the IBM developerWorks Automated Functional Testing forums, helping other practitioners with Java/RFT framework patterns.
Technologies & tools
Java
C#
SQL Server
IBM Rational Functional Tester
Test Automation

BDM Onboarding Application

Enhanced and maintained a microservices-architecture application for onboarding personnel at Employment and Social Development Canada’s Benefits Delivery Modernization (BDM) project. Integrated with enterprise IAM in Azure AD and Azure DevOps, mirrored in Entra.

Technologies & tools
C# .NET
Blazor
Azure SQL
Azure Logic Apps
APIM
GraphQL
Azure AD