Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes

AI
Education
Programming
Developer Tools

The article reports that failing grades spiked in a handful of UC Berkeley computer science courses, especially introductory ones, and quotes instructors who see two linked problems. Students are using AI to complete homework and take-home exams, and many are arriving with weaker mathematical foundations than those courses assume. One cited example was a prerequisite linear algebra class that reportedly allowed unrestricted internet and AI use on both homework and exams. Berkeley instructors also pushed back on grade curves, arguing that curving hides a real drop in mastery and turns a standards problem into a sorting exercise.

Most comments landed on a blunt point: the issue is not that AI exists, but that students are using it to skip the hard reps that make knowledge stick. People drew the same distinction over and over. LLMs can work as tutors, sparring partners, quiz generators, worksheet builders, or critics of your own draft; they are destructive when used as answer machines. Several commenters said they already feel this in their own work. They can ship more with coding agents, but their patience, attention span, memory for details, and willingness to wrestle with difficult problems all degrade if they let the model carry too much of the load. The loss people worry about is less factual recall than the ability to sit with confusion long enough to build new mental models.

The strongest pushback was against the article's framing, not the broader concern. Several commenters noted that the piece appears to cherry-pick a few Berkeley classes and small cohorts, especially spring sections of intro courses, rather than showing a campus-wide rise across all CS offerings. Others argued the article mixes together three different failures: students being caught cheating, students arriving underprepared in math, and students failing in-person assessments after outsourcing practice to AI.

A large side debate focused on UC admissions policy. Many commenters think dropping SAT and ACT requirements weakened the math floor for STEM entrants, while others said that cannot explain a sudden 2026 jump and that the immediate change is better explained by AI becoming good enough over the last two years to let students coast through prerequisites. COVID learning loss, long-running declines in attention, and anti-intellectual drift were also cited as contributing factors.

Where the comments ended up is practical. Homework can no longer be trusted as a measurement tool. Keeping homework ungraded for practice, then measuring with frequent quizzes, oral exams, in-person practical tests, and assignments where students must explain tradeoffs, gives a more credible signal than take-home deliverables. For working engineers, the same rule applies. Using LLMs after you already know the domain can be a real force multiplier; using them before you have the basics turns you into a reviewer of code you barely understand. The thread was gloomy about current incentives but not confused about the fix: if the goal is learning, the system has to reward doing the thinking, not merely submitting something that passes.

If you run a team, school, or training program, stop treating delivered output as proof of competence. Shift evaluation toward in-person work, oral explanation, and repeated practice under constraints, because AI now makes homework and take-home artifacts a weak signal of real ability.

June 4, 2026
dailycal.org
Discuss on HN

Key insights

Homework is no longer a trustworthy signal

Once students can get polished take-home work from an LLM, grading homework stops measuring mastery and starts measuring willingness to cheat. Professors in the comments converged on a harsher but cleaner setup. Homework should be for practice and feedback, while grades come from in-person written exams, practical tests, or oral defenses that force students to explain decisions in real time. The uncomfortable part is scale. Oral exams and close proctoring work, but they do not fit classes with hundreds or thousands of students.

If you are responsible for screening or training people, demote take-home artifacts in your evaluation stack. Add live explanation, timed exercises, or oral walkthroughs before you trust that someone actually owns the work.

Attribution:

davrosthedalek #1
stefan_ #1
shagie #1
jventura #1 #2

Use LLMs as critics, not ghostwriters

The most durable workflow people described is doing the work yourself first, then using the model to check, challenge, or extend it. That keeps the learning loop intact because the human still has to generate the plan, notice errors, and defend choices. Several commenters said this reduces the headline productivity boost but preserves the ability to debug subtle failures and reason in novel situations. In their view the trade is worth it, because teams going all-in on agentic output are already struggling when the model misses edge cases.

Set a default rule for yourself and your team: draft first, model second. Use LLMs for review, adversarial questioning, practice problems, and targeted help, not as the first author of anything you need to truly understand.

Attribution:

Otterly99 #1
wffurr #1
Cthulhu_ #1
CuriouslyC #1
camelmel #1

The article's data may be selectively framed

A careful pass over Berkeley grade data suggests the story may be choosing the most dramatic classes and semesters instead of showing the whole CS picture. One commenter pulled grade histories for all still-offered CS courses and did not see a broad spike in F rates across the board. Another pointed out that the most dramatic CS10 number came from a tiny spring cohort, where intro classes are not at peak enrollment and may include more repeat takers. That does not disprove a real problem, but it does weaken the claim that Berkeley CS as a whole suddenly collapsed for one simple reason.

Treat the Berkeley story as a warning sign, not a clean measurement of system-wide decline. Before changing policy, check course-level context, cohort size, and whether a headline jump survives a wider dataset.
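To see why cohort size matters, here is a minimal sketch, assuming you only have a failure count and an enrollment count per section. It computes a Wilson score interval (a standard confidence interval for a proportion; our choice of method, not something from the article or the thread) around the F rate. The counts are hypothetical and are not Berkeley data, and the helper name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a failure proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("cohort size must be positive")
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical cohorts (not real grade data): same 20% F rate, very different sizes.
for label, failures, n in [("small spring section", 8, 40), ("large fall section", 160, 800)]:
    lo, hi = wilson_interval(failures, n)
    print(f"{label}: n={n}, F rate={failures / n:.0%}, 95% CI {lo:.1%} to {hi:.1%}")
```

With the same 20% F rate, the 40-student section's interval spans roughly 10% to 35%, while the 800-student section's stays near 17% to 23%. A dramatic-looking jump inside a small cohort can sit comfortably within that noise, which is the commenters' point about checking a wider dataset.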

Attribution:

rahimnathwani #1 #2
danso #1
krull10 #1

Math preparedness and AI are separate failures

Comments drew a useful distinction between long-building math weakness and the more recent AI shock. Dropping standardized tests and weakening math preparation may have lowered the floor for some STEM students over several years. But the abrupt jump in failures looks more like students reaching introductory or advanced CS courses after passing prerequisite classes with permissive AI policies, then hitting in-person exams they cannot fake. In other words, poor preparation made the system brittle, and LLMs let students glide past the warning lights until they hit a hard assessment.

Do not collapse every bad outcome into one cause. If you want to fix the pipeline, you need separate interventions for admissions readiness, prerequisite integrity, and AI-resistant assessment.

Attribution:

somenameforme #1
insane_dreamer #1
gamblor956 #1
amiga386 #1

CS cheating is easier to detect than other cheating

Part of the spike may reflect better visibility, not just more misconduct. In CS, copied code and pasted LLM output leave structural clues. Professors mentioned MOSS, submission logs, and obvious mismatches between student history and suddenly polished solutions with advanced constructs. Math and humanities cheating can be just as common but harder to prove. That means failure counts and conduct cases in CS may be a leading indicator of a broader education problem, not evidence that CS students are uniquely dishonest.

Do not read CS conduct numbers too narrowly. If you manage any assessment system, assume the same behavior is happening elsewhere with weaker detection and design controls accordingly.
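Why does copied code leave structural clues? MOSS, mentioned above, is built on a document-fingerprinting technique called winnowing. The sketch below is a toy version of that idea, not MOSS's interface or its exact algorithm: normalize tokens so renamed identifiers look the same, hash every run of `k` tokens, keep the smallest hash in each sliding window of `w` hashes, and compare the resulting fingerprint sets. `KEYWORDS`, the parameters `k` and `w`, and all function names are hypothetical, and the tokenizer is deliberately crude.

```python
import hashlib
import re

# Hypothetical, deliberately tiny keyword list; a real tool would use a proper per-language lexer.
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while", "import"}


def normalize(source: str) -> list[str]:
    """Tokenize and replace every non-keyword identifier with "ID" so renamed variables still match."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return ["ID" if (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS else t for t in tokens]


def kgram_hashes(tokens: list[str], k: int) -> list[int]:
    """Hash every run of k consecutive tokens."""
    return [
        int(hashlib.md5(" ".join(tokens[i:i + k]).encode()).hexdigest(), 16)
        for i in range(len(tokens) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> set[int]:
    """Keep the smallest hash from each window of w consecutive k-gram hashes."""
    if len(hashes) <= w:
        return set(hashes)
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity(a: str, b: str, k: int = 4, w: int = 4) -> float:
    """Jaccard overlap of two fingerprint sets; 1.0 means identical fingerprints."""
    fa = winnow(kgram_hashes(normalize(a), k), w)
    fb = winnow(kgram_hashes(normalize(b), k), w)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


original = """def total(xs):
    s = 0
    for x in xs:
        s += x
    return s"""
renamed = """def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc"""
unrelated = """import math
print(math.sqrt(2))"""

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

The renamed copy scores 1.0 because identifiers are normalized away before hashing, while structurally different code scores far lower. That is the kind of structural evidence professors described, and part of why CS misconduct may simply be easier to see than cheating in other subjects.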

Attribution:

Manuel_D #1
acbart #1
elictronic #1
jknoepfler #1

Frequent low-stakes quizzes beat clever pedagogy

Several commenters cut through the debate over flipped classrooms and AI policy with a simpler point from learning science. Retrieval practice, spacing, and interleaving still work. Daily or near-daily quizzes, especially ones that students cannot outsource in the moment, create the feedback loop that homework used to provide. Flipped classrooms only help if students actually prepare, and AI now makes fake preparation cheap. Regular short quizzes restore the incentive to come in with material in your head rather than a summary in your chat history.

If you teach or onboard people, increase the cadence of short closed-book checks. Frequent retrieval is a better defense against superficial AI-assisted learning than one big exam at the end.
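Spacing can be made concrete with a Leitner box schedule, one simple and well-known scheme. It is our illustration, not a method the commenters specified. Each correct closed-book answer moves an item to a box that is reviewed less often, and a miss sends it back to daily review. The interval list `INTERVALS` and the names `Card`, `review`, and `due_cards` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical review intervals (in days) for Leitner boxes 0 through 4.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    box: int = 0      # box 0 is reviewed most often
    due_day: int = 0  # first day the card should be quizzed again


def review(card: Card, correct: bool, today: int) -> Card:
    """Promote the card one box on a correct answer, reset it to box 0 on a miss, then reschedule."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if correct else 0
    card.due_day = today + INTERVALS[card.box]
    return card


def due_cards(cards: list[Card], today: int) -> list[Card]:
    """Cards whose next quiz falls on or before today."""
    return [c for c in cards if c.due_day <= today]


card = Card("Explain what a grade curve hides")
for day, correct in [(0, True), (2, True), (6, False)]:
    review(card, correct, day)
    print(f"day {day}: {'correct' if correct else 'missed'} -> box {card.box}, due day {card.due_day}")
```

Tracing one card: a correct answer on day 0 schedules it for day 2, another on day 2 schedules day 6, and a miss on day 6 resets it to box 0, due on day 7. Missed items come back quickly, so shallow familiarity is exposed long before a final exam.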

Attribution:

phantom784 #1
raphman #1
CuriouslyC #1
MichaelNolan #1
jjice #1

Against the grain

LLMs can strengthen reasoning when used adversarially

A smaller camp argued that models do not have to make you passive. Treated as a bluffing sparring partner, an LLM can force you to refine assumptions, defend your plan, and probe alternative viewpoints faster than you could alone. The value is not that the model is right. The value is that it creates a cheap loop for stress-testing your own thinking, especially if you reset context often and use it to argue the opposite side.

If you already have strong fundamentals, experiment with using models as debate partners rather than answer engines. This works best when you are explicitly trying to surface holes in your own reasoning, not outsource it.

Attribution:

Hasz #1 #2
visarga #1

LLMs can fill real teaching gaps

Some commenters pushed back on the blanket anti-AI tone by pointing out that many students never had reliable human help in the first place. Professors can be inaccessible, condescending, or overloaded, and tutors cost money. In that setting, a patient model that explains a concept ten different ways can be better than what many students actually get from institutions. The catch is obvious. It helps only when the student uses it like a tutor, not when they ask it to do the assignment.

Do not confuse bad institutional teaching with proof that AI tutoring is worthless. If your organization cannot offer enough human support, structured AI tutoring may be better than leaving learners stuck and alone.

Attribution:

trismus #1
wtp1saac #1
Cthulhu_ #1

Capability gains may outweigh skill loss for professionals

A minority view held that the hand-wringing is partly loss aversion. For experienced engineers, not understanding every line as intimately as before may be an acceptable trade if they ship better systems with fewer bugs and can still reason well enough to hit the outcome. This camp sees capability, not untouched raw skill, as the relevant metric in real work. They also reject the assumption that any reduction in manual fluency automatically means meaningful decline in practical intelligence.

For senior people in production environments, judge LLM use by outcome quality and failure handling, not by nostalgia for fully manual workflows. The risk is real, but so are the gains when expertise is already in place.

Attribution:

adamtaylor_13 #1 #2 #3

AI may be exposing broken assessments, not breaking learning

A few commenters argued that the collapse in grades may say as much about education as about students. If assignments and tests are easy to game, then they were weak proxies for learning all along. From this angle, AI is less the cause of educational failure than a stress test that reveals how much the system relies on brittle rituals like homework completion, standardized tests, and passive lectures rather than robust proof of understanding.

Use AI as a forcing function to audit what your assessments actually measure. If a tool can beat the assignment without real understanding, redesign the assignment instead of pretending the old signal still works.

Attribution:

rafaelmn #1
eudamoniac #1

In plain English

ACT

A standardized college admissions test widely used in the United States.

CS

Computer science, the academic field that studies computation, software, algorithms, and related systems.

LLM

Large language model, a type of AI system trained on huge amounts of text to generate and analyze language.

MOSS

Measure of Software Similarity, a tool used by educators to detect likely plagiarism in programming assignments by comparing code structure.

SAT

A standardized college admissions test widely used in the United States.

STEM

Science, technology, engineering, and mathematics.

Reference links

Reporting and background on admissions and grading

Daily Cal article on Berkeley CS failing grades and AI use
The main article under discussion
Daily Bruin on UC faculty calling to reinstate SAT and ACT for STEM applicants
Referenced as related evidence that UC faculty see weaker math readiness after test-free admissions
UC student success site hosting faculty letter and materials
Source for the actual faculty letter about reinstating standardized testing
Yale admissions statement archive about predictive value of test scores
Used to support claims that standardized tests predict college performance
AP News on colleges turning to oral exams because of AI
Example of institutions changing assessments to counter AI-assisted cheating

Data and measurement

Berkeleytime grade distributions
Primary data source commenters used to check whether the article cherry-picked courses
Google spreadsheet of CS10 grade data by semester
Detailed grade history shared to question the article's framing of CS10 failure rates
UCSD admissions and math preparedness report
Cited as evidence that more students now need remedial math and that basic numeracy has weakened

Learning methods and pedagogy

The Flipped Classroom article
Shared as an example of reversing lecture and homework time in response to changing educational needs
Technorealism archive
Suggested as a useful framing for how universities should adapt to AI without falling into hype or panic

Cheating detection and academic integrity

MOSS plagiarism detection system
Referenced as the long-running tool used to detect copied programming assignments
Stanford Daily on honor code violations in intro CS
Used to argue that CS cheating was already common before LLMs

Books and cultural references

The Diamond Age
Referenced for its vision of AI-guided personalized education
Surely You're Joking, Mr. Feynman
Mentioned for stories about mental calculation and approximation skill
SMBC comic from 2014-12-17
Linked as a cultural joke about humans and AI-like delegation
The Whispering Earring
Shared as a short fiction piece relevant to AI-assisted thinking and dependence

Health, attention, and cognition references

APA Speaking of Psychology on attention spans
Cited to argue that attention decline predates LLMs and is tied to broader digital overstimulation
BBC Future on AI chatbots possibly making you stupider
Used to support claims about cognitive offloading and declining mental effort
