Next-Gen Candidate Judge

What is Next-Gen Candidate Judge?

Next-Gen Candidate Judge is a candidate evaluation platform that provisions isolated Docker-based workspaces for each task attempt, evaluates submissions through multiple judging strategies, and provides real-time feedback via WebSocket — all wrapped in a modern, role-based UI.

Think of it as a self-hosted HackerRank alternative: admins define tasks with Docker environments, and candidates get sandboxed workspaces with SSH access, timers, and automated scoring under an attempt-based penalty system.

The Problem

Traditional candidate evaluation workflows suffer from:

No isolation — candidates run code on shared environments, risking interference
Manual grading — evaluators spend hours checking submissions by hand
No real-time feedback — candidates wait indefinitely for results
Rigid evaluation — one-size-fits-all question formats don’t test diverse skills

I wanted a platform where each candidate gets their own ephemeral Docker workspace with SSH access, works on real-world tasks in a sandboxed environment, and gets evaluated automatically through configurable judging pipelines.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                   Frontend (React + Inertia.js)         │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌────────┐  │
│  │ Dashboard│  │ Task CRUD │  │ Workspace│  │ Users  │  │
│  │ (Admin/  │  │  (Admin)  │  │  (User)  │  │ Mgmt   │  │
│  │Candidate)│  │           │  │          │  │(Admin) │  │
│  └──────────┘  └───────────┘  └──────────┘  └────────┘  │
└────────────────────────┬────────────────────────────────┘
                         │ Inertia Protocol
┌────────────────────────┴────────────────────────────────┐
│                  Backend (Laravel 12)                   │
│  ┌───────────────┐  ┌──────────────┐  ┌───────────────┐ │
│  │  Controllers  │  │   Services   │  │  Job Chains   │ │
│  │ (6 controllers│  │ (Workspace,  │  │ (Server Prov, │ │
│  │   + Dashboard)│  │  Judge, AI,  │  │  Workspace    │ │
│  │               │  │  Provision)  │  │  Provisioning)│ │
│  └───────────────┘  └──────────────┘  └───────┬───────┘ │
│                                               │         │
│  ┌───────────────┐  ┌──────────────┐  ┌───────┴───────┐ │
│  │  Broadcasting │  │   Models     │  │  Queue Worker │ │
│  │  (Pusher/     │  │  (12 models, │  │  (Redis +     │ │
│  │   Soketi)     │  │   3 enums)   │  │   Horizon)    │ │
│  └───────────────┘  └──────────────┘  └───────────────┘ │
└────────────────────────┬────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────┐
│                  Infrastructure                         │
│  ┌──────────┐  ┌───────────┐  ┌────────────┐            │
│  │  Docker  │  │  Traefik  │  │ Remote SSH │            │
│  │ Compose  │  │  (Reverse │  │  Servers   │            │
│  │ Per-Task │  │   Proxy)  │  │            │            │
│  └──────────┘  └───────────┘  └────────────┘            │
└─────────────────────────────────────────────────────────┘

Key Features

1. Docker-Based Sandboxed Workspaces

Each task can define a docker-compose.yaml template with placeholders. When a candidate starts a task, the system:

Creates a Linux user on the target server
Finds a free port for the container
Runs pre-provisioning scripts
Fills the Docker Compose template with dynamic data (domain, ports, paths)
Starts the Docker Compose stack
Runs post-provisioning scripts
Optionally configures SSH access into the container
Finalizes and marks the workspace as ready

All of this happens through a Laravel job chain with real-time progress tracking via WebSocket.

2. Multi-Type Judging System

The platform supports four distinct judging strategies, each implementing a JudgeInterface:

Judge Type	How It Works	Use Case
QuizJudge	Multiple-choice questions with correct answers stored server-side	Knowledge checks, MCQs
TextJudge	Exact text matching against expected answers	CLI output verification, specific commands
AiJudge	Sends answers to OpenAI API with custom prompts for semantic evaluation	Open-ended questions, code review
AutoJudge	Runs a custom judge script on the server (WIP)	Custom automated testing

Each judge returns a standardized result: a score, per-question details, and a recommendation on whether the task should be locked.
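The shape of that contract can be sketched in TypeScript. This is only an illustration: the real JudgeInterface lives in the PHP backend, and every name below (Judge, JudgeResult, TextJudgeSketch, shouldLock) is hypothetical. Trimming whitespace before the exact comparison is also an assumption, not something the source confirms.

```typescript
// Hypothetical TypeScript mirror of the judge contract (the real one is PHP).
interface QuestionDetail {
  questionId: number;
  correct: boolean;
  awarded: number;
}

interface JudgeResult {
  score: number;
  details: QuestionDetail[];
  shouldLock: boolean; // the lock recommendation
}

interface TextQuestion {
  id: number;
  expected: string;
  points: number;
}

interface Judge<Q> {
  evaluate(questions: Q[], answers: Record<number, string>): JudgeResult;
}

// Exact text matching, in the spirit of TextJudge.
class TextJudgeSketch implements Judge<TextQuestion> {
  evaluate(questions: TextQuestion[], answers: Record<number, string>): JudgeResult {
    const details = questions.map((q) => {
      // Assumption: surrounding whitespace is ignored before comparing.
      const given = (answers[q.id] ?? "").trim();
      const correct = given === q.expected.trim();
      return { questionId: q.id, correct, awarded: correct ? q.points : 0 };
    });
    const score = details.reduce((sum, d) => sum + d.awarded, 0);
    // Recommend a success lock only when every question is answered correctly.
    const shouldLock = details.length > 0 && details.every((d) => d.correct);
    return { score, details, shouldLock };
  }
}
```

A quiz or AI judge would implement the same evaluate method and differ only in how it decides whether an answer is correct.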

3. Attempt-Based Penalty System

Scores degrade with each attempt to incentivize preparation:

Attempt 1: 100% of max score available
Attempt 2: 90% of max score
Attempt 3: 80% of max score
…and so on

When the next attempt’s maximum possible score would drop below 20% of the task’s total score, the task is locked for that candidate (penalty lock). A task also locks once the candidate answers everything correctly (success lock).

Within each attempt, candidates get a configurable number of submissions (default: 3) before the attempt is marked as failed.
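The penalty arithmetic can be sketched as follows. It assumes the 10% step continues linearly past attempt 3, which the "and so on" above suggests but does not state. The function and constant names are hypothetical, and the backend implements this in PHP rather than TypeScript.

```typescript
// Assumed constants, taken from the percentages described above.
const PENALTY_STEP_PERCENT = 10; // each additional attempt costs 10% of the max score
const LOCK_THRESHOLD_PERCENT = 20; // lock when the next attempt would offer less than 20%
const DEFAULT_SUBMISSIONS_PER_ATTEMPT = 3;

// Percentage of the task's total score available on a 1-based attempt number.
// Integer percentages avoid floating-point drift in the threshold comparison.
function availablePercent(attempt: number): number {
  return Math.max(0, 100 - (attempt - 1) * PENALTY_STEP_PERCENT);
}

// Highest score a candidate can still earn on the given attempt.
function maxScoreForAttempt(taskTotal: number, attempt: number): number {
  return (taskTotal * availablePercent(attempt)) / 100;
}

// True when the attempt after `finishedAttempt` would fall below the threshold.
function shouldPenaltyLock(finishedAttempt: number): boolean {
  return availablePercent(finishedAttempt + 1) < LOCK_THRESHOLD_PERCENT;
}

// True while the candidate may still submit within the current attempt.
function hasSubmissionsLeft(
  submissionsUsed: number,
  limit: number = DEFAULT_SUBMISSIONS_PER_ATTEMPT,
): boolean {
  return submissionsUsed < limit;
}
```

Under these assumptions, attempt 9 is worth 20% and attempt 10 would be worth 10%, so the penalty lock would trigger after the ninth attempt.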

4. Real-Time WebSocket Progress

The platform uses Laravel Broadcasting with Pusher/Soketi for real-time updates:

Server provisioning progress — admins see each provisioning step update live
Workspace provisioning progress — candidates see their workspace being set up
Job run status updates — script execution status broadcasts in real-time

Updates go out on three dedicated broadcast channels, each with its own authorization rule:

server-updates.{serverId} — admin only
workspace-updates.{attemptId} — owner or admin
job-runs-updated — admin only
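The three rules above amount to a small decision function. The sketch below expresses it in TypeScript for illustration only. In the actual project the checks would live in Laravel's channel authorization on the backend, and the ChannelUser type, the ownerOf lookup, and the assumption that IDs are numeric are all hypothetical.

```typescript
interface ChannelUser {
  id: number;
  isAdmin: boolean;
}

// Hypothetical lookup: returns the id of the user who owns an attempt, if any.
type AttemptOwnerLookup = (attemptId: number) => number | undefined;

function canJoinChannel(
  channel: string,
  user: ChannelUser,
  ownerOf: AttemptOwnerLookup,
): boolean {
  // job-runs-updated: admin only
  if (channel === "job-runs-updated") return user.isAdmin;

  // server-updates.{serverId}: admin only
  if (/^server-updates\.\d+$/.test(channel)) return user.isAdmin;

  // workspace-updates.{attemptId}: owner or admin
  const match = /^workspace-updates\.(\d+)$/.exec(channel);
  if (match) {
    const attemptId = Number(match[1]);
    return user.isAdmin || ownerOf(attemptId) === user.id;
  }

  // Anything else is denied by default.
  return false;
}
```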

5. Server Provisioning Pipeline

Admins can register remote Linux servers and provision them automatically. The provisioning pipeline:

StartProvisioningJob — Validates SSH connectivity
UpdateServerPackageJob — apt update && apt upgrade
InstallNecesseryPackagesJob — Core system packages
InstallDockerJob — Docker Engine + Docker Compose
UpdateServerFirewallJob — UFW firewall configuration
InstallAndSetupTraefikJob — Traefik reverse proxy with Cloudflare DNS + Let’s Encrypt SSL

Each step uses Blade templates for script generation and executes via SSH using the ScriptEngine service.

6. Role-Based Dashboard

The dashboard renders different views based on user role:

Admin Dashboard:

Total users, tasks, and submissions at a glance
Active vs inactive task breakdown
Recent activity table showing all candidate submissions

Candidate Dashboard:

Remaining, completed, and locked task counts
Total score earned
Completion progress bar
Recent submission history with attempt counts

Tech Stack

Backend

Laravel 12 — PHP framework with Fortify auth, Horizon queues, Inertia SSR
Laravel Horizon — Redis-powered queue dashboard for monitoring background jobs
Spatie Laravel Permission — Role & permission management (admin/user)
Pusher / Soketi — WebSocket broadcasting for real-time events
Predis — Redis client for queues and caching

Frontend

React 19 — UI library with TypeScript
Inertia.js — SPA-like navigation without building an API
Tailwind CSS 4 — Utility-first styling
Radix UI — Accessible component primitives (Dialog, Select, Tabs, Progress, etc.)
CodeMirror — In-browser YAML editor for Docker Compose templates
date-fns — Lightweight date formatting
Sonner — Toast notifications
Lucide React — Icon library

Infrastructure

Docker Compose — Per-task container orchestration with templated YAML
Traefik — Reverse proxy with automatic SSL via Cloudflare DNS challenge
SSH/Script Engine — Remote command execution for server and workspace provisioning

Database Schema

The system uses 12 Eloquent models, with the schema built up over 26 migrations:

User
 ├── hasMany → UserTaskAttempt (attempts)
 ├── hasMany → TaskUserLock (task locks)
 ├── hasMany → Server (servers)
 └── hasMany → ScriptJobRun (job runs)

Task
 ├── belongsTo → Server
 ├── hasMany → UserTaskAttempt (attempts)
 ├── hasMany → QuizJudge → hasMany → QuizQuestionAnswer
 ├── hasMany → TextJudge
 ├── hasMany → AiJudge
 ├── hasOne  → AutoJudge
 ├── hasMany → TaskUserLock (locked users)
 └── hasMany → ScriptJobRun (job runs)

UserTaskAttempt
 ├── belongsTo → User
 ├── belongsTo → Task
 ├── hasMany → ScriptJobRun (job runs)
 └── hasMany → UserTaskAttemptAnswer

Server
 └── belongsTo → User

Key enums:

AttemptTaskStatus — pending, preparing, started, running, evaluating, completed, done, failed, attempted_failed, terminated, locked
ScriptJobStatus — pending, running, in-progress, completed, failed
TaskUserLockStatus — completed, penalty
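On the React side, one way to keep these values in sync with the backend is to mirror them as string-literal unions. The sketch below copies the values from the list above. Whether the frontend actually defines such types is an assumption, and the isOneOf helper is hypothetical.

```typescript
// Values copied from the backend enums listed above.
const ATTEMPT_TASK_STATUSES = [
  "pending", "preparing", "started", "running", "evaluating", "completed",
  "done", "failed", "attempted_failed", "terminated", "locked",
] as const;
type AttemptTaskStatus = (typeof ATTEMPT_TASK_STATUSES)[number];

const SCRIPT_JOB_STATUSES = [
  "pending", "running", "in-progress", "completed", "failed",
] as const;
type ScriptJobStatus = (typeof SCRIPT_JOB_STATUSES)[number];

const TASK_USER_LOCK_STATUSES = ["completed", "penalty"] as const;
type TaskUserLockStatus = (typeof TASK_USER_LOCK_STATUSES)[number];

// Hypothetical runtime guard for payloads arriving over WebSocket or Inertia props.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return (allowed as readonly string[]).indexOf(value) !== -1;
}
```

For example, isOneOf(SCRIPT_JOB_STATUSES, payload.status) narrows an untyped status string to ScriptJobStatus before it reaches a component.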

Project Structure

app/
├── Contracts/          # JudgeInterface, TracksProgressInterface
├── Enums/              # AttemptTaskStatus, ScriptJobStatus, TaskUserLockStatus
├── Events/             # 6 broadcast events (Server, Workspace, ScriptJobRun)
├── Http/Controllers/   # Dashboard, Task, UserTask, Server, User, ScriptJobRun
├── Interfaces/         # SolutionCheckerInterface
├── Jobs/Scripts/
│   ├── Server/         # 7 server provisioning jobs
│   └── Workspace/      # 8 workspace provisioning jobs
├── Models/             # 12 Eloquent models
├── Services/
│   ├── AI/             # OpenAIService (GPT integration)
│   ├── JudgeServices/  # Quiz, Text, AI, Auto judge services
│   ├── Progress/       # WorkflowRegistry
│   └── ...             # ScriptEngine, WorkspaceService, ServerProvisionService
└── Traits/             # HasMeta, TracksProgress, NotesAccessor

resources/js/
├── components/         # 60+ React components (UI primitives + feature components)
│   ├── ui/             # Radix-based shadcn/ui components
│   ├── AdminDashboard.tsx
│   ├── CandidateDashboard.tsx
│   ├── WorkflowProgressTracker.tsx
│   ├── ServerProvisionProgressTracker.tsx
│   ├── WorkspaceProgressTracker.tsx
│   ├── SubmissionResultModal.tsx
│   ├── QuizProgressModal.tsx
│   └── AiJudgeProgressModal.tsx
├── pages/
│   ├── auth/           # Login, Register, 2FA, Password reset
│   ├── dashboard.tsx   # Role-based dashboard routing
│   ├── tasks/          # CRUD pages (admin)
│   ├── user/tasks/     # Task list + attempt workspace (candidate)
│   ├── servers/        # Server CRUD + provisioning view
│   ├── users/          # User management + attempt analytics
│   ├── jobs/           # Script job run viewer
│   └── settings/       # Profile, password, appearance, 2FA
└── layouts/            # App shell with sidebar navigation

Development Timeline

Date	Milestone
Nov 3, 2025	Project initialized with Laravel 12 + React starter kit
Nov 5	Server provisioning with Docker integration, SSH credential management
Nov 6	Script job execution engine, workspace provisioning pipeline, traits system
Nov 7	Traefik reverse proxy working, judge configuration support (Quiz, Text, AI, Auto)
Nov 9	HTTPS routing fix, SSH access to containers, task show page redesign, TextJudge evaluation working
Nov 10	QuizJudge + AiJudge services, server provision progress tracking, edit page fixes
Nov 11	Enum-based status system, phone registration, code cleanup (removed 6+ unused files), Pusher/Soketi WebSocket
Nov 12	Full WebSocket integration, Jobs page redesign, User Management module
Nov 18	Real-time workflow progress tracking via WebSocket, CodeMirror YAML editor
Nov 19	Pre/post script execution steps, ScriptJobStatus enum migration
Nov 23-24	Timer system, submission count tracking, user task view + analytics
Feb 17, 2026	Role-based dashboard (Admin + Candidate views)

Challenges & Learnings

Dynamic Docker Compose Templating

Each task defines a Docker Compose YAML with {{placeholders}}. The system fills these at runtime with attempt-specific data (domains, ports, paths). Getting Traefik routing to work correctly with dynamic subdomains and Cloudflare DNS was one of the trickier parts.
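A minimal sketch of the placeholder substitution, written in TypeScript for illustration (the backend does this in PHP). The placeholder names domain, port, and workspace_path, the sample Compose file, and the choice to throw on unknown placeholders are all assumptions.

```typescript
// Replace every {{name}} placeholder with its value; unknown names throw so
// that a half-filled Compose file is never started.
function fillTemplate(template: string, values: Record<string, string | number>): string {
  return template.replace(
    /\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g,
    (_match: string, name: string) => {
      if (!Object.prototype.hasOwnProperty.call(values, name)) {
        throw new Error(`Missing value for placeholder "${name}"`);
      }
      return String(values[name]);
    },
  );
}

// Hypothetical task template; a real task would define its own services.
const composeTemplate = [
  "services:",
  "  workspace:",
  "    image: ubuntu:24.04",
  "    ports:",
  '      - "{{port}}:22"',
  "    volumes:",
  "      - {{workspace_path}}:/home/candidate",
  "    labels:",
  "      - traefik.http.routers.workspace.rule=Host(`{{domain}}`)",
].join("\n");

// Attempt-specific values are filled in when the workspace is provisioned.
const filled = fillTemplate(composeTemplate, {
  domain: "attempt-42.example.com",
  port: 2201,
  workspace_path: "/srv/workspaces/attempt-42",
});
```

Failing loudly on a missing value is a design choice in this sketch: a leftover {{placeholder}} in a Traefik label would otherwise produce a router rule that silently never matches.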

Job Chain Orchestration with Progress Tracking

Laravel’s Bus::chain() dispatches jobs sequentially, but I needed each job to report its progress back to the frontend in real time. The solution was a TracksProgress trait that stores workflow state in a JSON metadata column and broadcasts updates via WebSocket after each step.
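The idea can be illustrated with a small synchronous TypeScript sketch. It is not the TracksProgress trait itself: the step and state shapes, the runChain name, and the broadcast callback are hypothetical, and real queue jobs run asynchronously on workers. What it does show is the pattern of updating a JSON-serializable state object around each step and emitting a snapshot every time it changes.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed";

interface WorkflowStep {
  name: string;
  run: () => void; // throws to signal failure
}

interface WorkflowState {
  steps: { name: string; status: StepStatus }[];
}

// JSON round-trip gives listeners an immutable copy, similar to persisting
// the state in a JSON column.
function snapshot(state: WorkflowState): WorkflowState {
  return JSON.parse(JSON.stringify(state)) as WorkflowState;
}

function runChain(
  steps: WorkflowStep[],
  broadcast: (state: WorkflowState) => void,
): WorkflowState {
  const tracked = steps.map((step) => ({
    step,
    entry: { name: step.name, status: "pending" as StepStatus },
  }));
  const state: WorkflowState = { steps: tracked.map((t) => t.entry) };

  for (const { step, entry } of tracked) {
    entry.status = "running";
    broadcast(snapshot(state));
    try {
      step.run();
      entry.status = "completed";
    } catch {
      entry.status = "failed";
      broadcast(snapshot(state));
      break; // later steps stay pending, as in a sequential chain
    }
    broadcast(snapshot(state));
  }
  return state;
}
```

Because every broadcast carries the full state rather than a delta, a client that connects midway can render the whole progress list from the latest message alone.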

Attempt Penalty & Locking Logic

The scoring system needed to handle multiple edge cases: successful locks vs penalty locks, submission limits within attempts, and preventing race conditions when starting new attempts (solved with lockForUpdate()).

What’s Next

AutoJudge implementation — Run custom judge scripts inside the container to validate solutions
Leaderboard — Real-time ranking of candidates by total score
Bulk task import/export — JSON/YAML format for task definitions
Container resource limits — CPU/memory constraints per workspace
Automated cleanup — Cron-based cleanup of expired workspaces and containers