
Walmart Senior Software Engineer Makarand Gujarathi on Why "Production-Grade" Is a Claim That Has to Survive Real Data

Walmart Senior Software Engineer Makarand Gujarathi explains why production-grade software must survive real data, edge cases, scale, and honest code review.

Joshua White

June 30, 2026 (5 min read)


A command-line tool that validates CSV files looks finished the moment it runs. It reads a file, checks each row, prints the good rows to one stream and the bad rows to another, and exits with a clean status code. Demo it on a tidy spreadsheet, and it is flawless. Then a single cell somewhere in the file contains a company name with a comma, something like "Gujarathi, Inc.", and the tool silently splits that cell into two columns. No crash, no error, no warning. The data is now wrong, and no one was told.

That failure does not show up in a demo. It shows up three weeks later in a downstream report that nobody can reconcile.

Makarand Gujarathi, a senior software engineer at Walmart, builds the real-time monitoring systems that watch for this exact class of failure at scale. He served as a judge at Code Olympics 2026, and his evaluations consistently separated software that runs from software that holds.

Code Olympics 2026, organized by Hackathon Raptors, challenged teams to ship working software in 72 hours under four simultaneous constraints: a core technical rule, a strict line budget, an assigned project domain, and a programming language the team did not get to choose. The result was a field of deliberately small programs, each one forced to make sharp engineering trade-offs in plain sight.

Gujarathi was a natural fit for that kind of review. At Walmart, he designed and operates a real-time monitoring platform that processes between 12 and 15 million operational events and more than 20 million data-processing events every day across a distributed infrastructure, serving over 100,000 daily active users. He has led a monolith-to-microservices decomposition under production pressure and a migration off SQL Server onto a NoSQL data model. His working life is spent watching where systems quietly bend before they break, which is precisely the lens he brought to a batch of 72-hour submissions.

The Difference Between Working and Correct

The most consistent theme in Gujarathi's evaluations was a refusal to accept "it runs" as evidence that a program is right. He treats the words "production-grade" as a testable claim rather than a label, and several of his sharpest comments are about the gap between the two.

The CSV validator above is a real example. He rated its discipline highly, calling it an "extremely clean and disciplined submission," "a textbook example of doing a small problem really well," and "very strong finalist-level work." But he did not stop at the praise. He went into the code and found the seam. "This utility has verified reliability gaps that break the 'production-grade' claim," he noted. "You have used strings.Join(row, ",") to reconstruct CSV rows, which will corrupt any field containing commas, quotes, or newlines." He also flagged a minCols constant left in the source as dead code.

This is worth dwelling on, because it captures how Gujarathi reviews. A lesser evaluation would have scored the validator on whether it validated. His scored it on whether it would survive the inputs a CSV tool exists to handle. A CSV tool that breaks on commas inside fields is broken at the one job that distinguishes a real parser from a naive split. The fix is not large, since the standard library reconstructs rows correctly if you let it. But noticing that the team hand-rolled the dangerous version, and naming exactly which inputs would trigger it, is the difference between a reviewer who reads output and one who reads code.
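The failure is easy to reproduce. The review quotes Go's strings.Join; the sketch below shows the same shape in Python, using ",".join(row) as the hand-rolled version and the standard csv module as the safe one. The function names and the sample row are illustrative assumptions, not code from the submission.

```python
import csv
import io


def naive_join(row):
    # Hand-rolled reconstruction: the Python analogue of strings.Join(row, ",").
    # No quoting happens, so a comma inside a field looks like a delimiter.
    return ",".join(row)


def csv_join(row):
    # Let the standard library quote fields that contain commas,
    # quotes, or newlines.
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().rstrip("\r\n")


def parse(line):
    # Read one CSV record back into a list of fields.
    return next(csv.reader(io.StringIO(line)))


# Hypothetical three-column row with a comma inside the second field.
row = ["42", "Gujarathi, Inc.", "active"]

naive_fields = parse(naive_join(row))
safe_fields = parse(csv_join(row))

print(len(naive_fields), naive_fields)  # 4 fields: the company name was split
print(len(safe_fields), safe_fields)    # 3 fields: the row round-trips intact
```

Nothing raises in the naive path. The row simply comes back with one more column than it went in with, which is why the defect survives a demo.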

Doing a small problem really well, in his framing, is not the same as doing a small problem completely. The constraint discipline was excellent. The correctness had a hole. Both things were true, and Gujarathi said both.

Reading Failure Modes That Only Appear at the Edges

The second project that drew his attention, Chunkers' ChunkRelay, was one of the most technically ambitious in the batch: a chunked file-transfer engine written in Rust, with SHA-256 integrity verification, LAN peer discovery, a relay path for transfers across the internet, hand-rolled TLS, and terminal QR codes, all inside a tight line budget and three clean operating modes.

Gujarathi credited the engineering directly. "Strong systems project with real utility," he wrote. "Chunked file transfer, integrity verification, and networking-focused design all show solid engineering instincts. Rust is a strong match for this kind of problem, and the implementation appears credible and product-like." For a submission written in an assigned language that punishes sloppy memory and concurrency, that is real recognition.

Then he found the edge case. "The web server's download endpoint matches on a hash prefix rather than the full SHA-256 digest, which is brittle and could collide." Read quickly, that is a one-line note. Read carefully, it is the whole discipline. A file-transfer tool whose entire value proposition is integrity, meaning chunk it, hash it, verify it, was matching download requests on a truncated piece of the hash instead of the full digest. Under light testing it works perfectly, because a handful of test files almost never share a prefix. Under real load, with many files and many chunks, the odds that two prefixes collide climb, and when they do the server can hand back the wrong bytes. The exact mechanism the project built to guarantee correctness contained a path to silently violate it.
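A minimal Python sketch of that collision follows (ChunkRelay itself is written in Rust). It assumes a prefix of only two hex characters so that a clash appears within a few hundred blobs. The article does not state the prefix length ChunkRelay used, and a longer prefix makes collisions rarer without removing them. All function names and the in-memory store are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_by_prefix(store, requested, prefix_len):
    # Brittle lookup: return the first stored blob whose digest
    # starts with the same prefix as the requested digest.
    prefix = requested[:prefix_len]
    for full, blob in store.items():
        if full.startswith(prefix):
            return blob
    return None


def find_by_digest(store, requested):
    # Safe lookup: the whole digest is the key.
    return store.get(requested)


def first_collision(prefix_len):
    # Generate blobs until two of them share a digest prefix.
    seen = {}
    i = 0
    while True:
        blob = f"chunk-{i}".encode()
        p = digest(blob)[:prefix_len]
        if p in seen:
            return seen[p], blob
        seen[p] = blob
        i += 1


PREFIX_LEN = 2  # deliberately tiny, for demonstration only
a, b = first_collision(PREFIX_LEN)
store = {digest(a): a, digest(b): b}
wanted = digest(b)

print(find_by_prefix(store, wanted, PREFIX_LEN) == b)  # False: wrong bytes served
print(find_by_digest(store, wanted) == b)              # True
```

The prefix lookup returns a perfectly valid blob, just not the one that was asked for, so no integrity check downstream of the endpoint would notice unless it re-verified against the full digest.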

This is the same shape as the CSV bug, and Gujarathi clearly saw the pattern: a system that is correct on the happy path and wrong at the scale it claims to serve. His monitoring work at Walmart is, in a sense, professionally about this category, the failures that are invisible until volume makes them visible. He reviewed these 72-hour projects as if they would have to face that volume, which is the most useful thing a judge can do for a builder who wants to grow past the demo.

When the Logic Is Hardcoded, There Is No System

Not every gap is a subtle edge case. Some are structural, and Gujarathi was equally direct about those.

Coder Army's Future_Predictor was a terminal application that asked a user a few questions and printed a personalized career roadmap with progress bars and motivational framing. It ran. It was friendly. It had personality. And Gujarathi declined to score it well, for a reason he stated plainly: "This is a good beginner Python exercise but lacks the architectural depth and constraint mastery required for a competition submission. Core logic for prediction is pretty much hardcoded and does not provide extensibility and a data model to validate the name predictor."

That comment names the actual problem rather than the surface one. The issue was never that the project was simple. The issue was that there was no model underneath it. The "prediction" was a fixed formula dressed up with randomness, with nothing extensible or data-driven behind the presentation. Strip away the progress bars and there is no system to grow. Gujarathi's standard here is the one a working engineer applies to any feature pitched as intelligent: show me the data model, show me where this extends, show me what makes the output a function of evidence rather than a script. When the answer is "there isn't one," the polish does not rescue it.

The contrast with how he treated genuinely strong work makes the standard clear rather than harsh. He was not penalizing ambition or beginners. He was distinguishing a presentation layer from an architecture, which is exactly the distinction a hiring engineer makes every week.

Calibration, Contradiction, and the Cost of Disagreeing With Your Own Code

DINooo's Know Thyself was, in Gujarathi's read, one of the more thoughtful submissions in the batch: a terminal quiz that scored not just whether you were right but how well-calibrated your confidence was, turning a simple right/wrong checker into a measure of self-knowledge. The architecture earned specific praise across the panel for letting the constraints shape the design, since a three-character naming limit pushed the team toward a compact tuple-key profile lookup that was cleaner than a verbose version would have been.

Two failure modes still surfaced, and they are instructive because they are the kind that pass every casual test. The program hung in an infinite loop on end-of-input, because the handler for an EOFError looped straight back into another input() call with no way out, so a piped or interrupted session never terminated. And the classification logic broke a tie the wrong way: a user who answered every question "fairly sure," landing exactly on the boundary, was labeled an over-claimer when the neutral, realist label fit better.
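The end-of-input hang has a recognizable shape. The sketch below is not the submission's code. It is a minimal illustration, with a hypothetical prompt and answer set, of an input loop that treats EOFError as a reason to stop rather than a reason to ask again. The reader is passed in as a parameter so that a closed stream can be simulated without touching real standard input.

```python
def ask_confidence(read=input):
    """Prompt until a valid answer arrives; return None when input ends."""
    while True:
        try:
            raw = read("Confidence (1-3): ")
        except EOFError:
            # The hanging version would `continue` here. On a closed or
            # exhausted stream the next read raises EOFError again, so
            # the loop never exits. Returning gives the caller a way out.
            return None
        raw = raw.strip()
        if raw in {"1", "2", "3"}:
            return int(raw)
        print("Please enter 1, 2, or 3.")


def closed_stdin(prompt):
    # Stand-in for a piped session that has already ended.
    raise EOFError


def scripted(answers):
    # Stand-in for piped input: yields the given lines, then end-of-input.
    it = iter(answers)

    def read(prompt):
        try:
            return next(it)
        except StopIteration:
            raise EOFError

    return read


print(ask_confidence(closed_stdin))               # None
print(ask_confidence(scripted(["maybe", "2"])))   # 2, after one re-prompt
print(ask_confidence(scripted(["maybe"])))        # None, input ran out
```

The caller still has to decide what None means, for example ending the quiz and printing partial results, but the session now terminates when the input does.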

The deeper note Gujarathi and the panel raised was about trust. The project's write-up claimed randomness had been excluded and that questions ran in a fixed order, but the code imported the random module and shuffled. Contradicting your own running code, as the feedback put it, undercuts an otherwise excellent self-assessment. This is a reliability concern dressed as a documentation nit. In production, the gap between what a system says it does and what it actually does is where incidents are born. A team that writes "no randomness" over code that shuffles has, in miniature, shipped the exact discrepancy that makes on-call engineers distrust their own dashboards. Gujarathi's instinct to flag it, on a project he otherwise rated highly, is the monitoring engineer's reflex applied to a quiz app.

What Excellent Looked Like

Because so much of Gujarathi's review was about finding seams, it is worth being precise about what cleared his bar, since that is where his standard becomes legible.

DynamicDuo's Chess_Clock was the top-scoring project in his batch, and he was unreserved about it: "Very strong use of the constraints. The project feels complete and practical, not just a technical demo. I especially liked how the Bash limitations became part of the design instead of making the result feel broken or unfinished." A flicker-free, real-time chess clock written in 49 lines of pure Bash, every identifier three characters or fewer, with live timing, turn switching, timeout handling, a clean interrupt trap, and a graceful end-of-game summary, it absorbed four hostile constraints and turned them into the shape of the product rather than apologizing for them.

Code Warriors' Sortviz drew his strongest line on craft: the code quality, he said, was the best in the batch, the kind of codebase that would make his shortlist if he were hiring. That is a meaningful sentence from someone who reviews production code for a living, and he reserved it for one project rather than spreading it around.

The pattern across his top marks is the inverse of his criticisms. The projects he rated highest were the ones where the claim and the code agreed, where "production-grade" was not a phrase in a README but a property you could verify by reading the source. Constraint elegance, correctness under real input, and honesty between documentation and behavior: when all three held, he said so without hedging.

A Review Framework Builders Can Use

Gujarathi's evaluations, taken together, amount to a usable checklist for anyone who wants their own work to survive the jump from demo to deployment.

Test the claim, not the path. "It runs" is the beginning of review, not the end. Before calling anything production-grade, identify the inputs the program exists to handle and confirm it handles the ugly ones: the field with a comma, the file at scale, the piped input that hits end-of-file mid-session.

Never truncate an identity check. If correctness depends on a hash, a key, or a digest, compare the whole thing. Prefix matching is the kind of shortcut that is invisible in testing and catastrophic at volume, and it defeats the very guarantee it was meant to provide.

Demand a data model behind anything "intelligent." If a feature claims to predict, recommend, or analyze, there must be a model and a path to extend it. A hardcoded formula with a presentation layer is a script, not a system, and no amount of interface polish changes that.

Make your documentation agree with your code. The gap between stated and actual behavior is where production incidents live. If the README says one thing and the source does another, fix the discrepancy before you ship. It is a reliability defect, not a cosmetic one.

Delete dead code. A leftover unused constant is small, but it is a signal. It tells a reviewer the author was not reading their own final draft, and reviewers extrapolate.

None of these require more lines, more time, or more cleverness. They require reading your own work the way someone who has to operate it at scale will read it.

Why This Kind of Review Matters

There is a quiet argument running through Gujarathi's Code Olympics evaluations, and it is one the broader industry keeps relearning. The hard part of software is not making it work once. The hard part is making it keep working against inputs you did not anticipate, at volumes you did not test, in conditions you did not control. A 72-hour hackathon compresses that lesson into a sharp form: every submission runs, so the only thing left to judge is whether it would hold.

Engineers who operate systems at the scale Gujarathi does develop a specific kind of vision. They stop trusting the demo and start asking where the edges are, because the edges are where their pager goes off. Turning that vision on student and early-career projects is generous, because it gives builders the feedback that is hardest to get on your own: not "does it work" but "where will it fail, and what does that failure cost." A judge who reads the source, names the exact line, and traces it to the input that will break it is handing a builder a map of the territory between a working demo and a system someone can depend on.

That is the territory most engineering careers are actually spent crossing. Gujarathi spent his hours as a Code Olympics judge handing out the map.

Code Olympics 2026 was organized by Hackathon Raptors, a Community Interest Company supporting innovation in software development. The event challenged teams to build working software across 72 hours under four simultaneous constraints: a core technical rule, a line budget, an assigned project domain, and a programming language teams did not choose. Makarand Gujarathi served as a judge evaluating projects for functionality, constraint mastery, code quality, and innovation.
