“Step one? FIX. Step two? IT.”
“Step three? FIXIT. Now repeat till it is FIXED.”

Wise words from Kenan, patron saint of the first Khan Academy fixit.

Fixit day is just one more in the long list of solid dev lessons I’ve been learning from the Googlers around here. Since I couldn’t find a reputable source explaining the fixit culture (who reads the NY Times?), I figured it’s my duty to share.


Who’s gonna fix it? They are. Because they broke it.

What’s fixit day?

“The goal of a fixit is to address niggling concerns that bother you time and again, but never enough to actually fix them.”

That’s straight from somebody who has more experience with fixits than either of us. I prefer to view fixit day as that moment when you finally wake up in the middle of the night with enough consciousness to properly rearrange the blanket (or swipe away the breadcrumb, if you’re gross like me) that’s been just barely interrupting your sleep for the past five hours, but never enough to break you out of your dream. Man that’s a good feeling. I hate breadcrumbs in bed…but I also really like cinnamon toast.

Anyway, our first crack at the whole thing tackled two levels of fixits:

  1. Two teams split up to handle upgrading us to the latest version of Backbone and refactoring our code to reduce our circular import problem, respectively. These were both nontrivial tasks that would’ve been difficult for an individual to get done in the face of a rapidly changing codebase, so they were perfect fixit fodder.

  2. The rest of us swept away just about any breadcrumb that needed sweeping. Error logs were cleaned up. 404 pages became prettier. Old, unused code was removed. Crashes were fixed. Little CSS bugs were stomped on.

The “fixedit!” Trello column got quite full, fast. And while it’s easy to take potshots at fixit day by claiming that everybody should be fixing this stuff all the time, I was even happier with the team culture inspired by fixit day than with any individual change. Every single person on our team was having fun working on the same thing, and no fix was too small, too annoying, or too UggggghhhhWhatIsThisBrowserDoing to be undeserving of absolutely anybody’s attention*. Our first fixit day felt very healthy. It was a little chaotic at the end due to our refactoring ambitions, but they were good ambitions that paid off. The next fixit will be even better. I look forward to it.

*Making shit work is everyone’s job, unless you’re the shit umbrella for ever-…wait…nevermind. Too much cursing. This one’s not gonna get past the censors on the official KA blog.

4/18/12, 4:51am
Teacher effectiveness ratings, programmers, and Khan Academy’s data

Today I was doing whatever it is I do when I ran across this link from Joel:

…and my brain started pattern matching. Replace “NYC teacher” with “programmer” in this tweet, and you’d be in classic Joel on Software land. After all, one of Joel’s self-stated missions is to make programmers’ lives better (you can see his efforts played out in both the principles of Fog Creek and Stack Overflow), and he’s spent plenty of time trying to convince us of the stupidity of using automated metrics to assess programmers.

Before we go any further, here’s the gist of the post about NY’s teacher effectiveness ratings that were recently released to the public: they have major flaws. Read the post — but this chart says a lot:


Every point is a teacher that taught the same subject to two different grades in the same year. Think of a middle school teacher handling both 6th and 7th grade math. The x-axis is their effectiveness rating for one of the grades, the y-axis is the other grade.

You don’t have to stare at the graph long to see a surprising lack of correlation. If you’re effective at teaching 7th grade math, shouldn’t you be effective at 6th? Wait. Before you throw effectiveness ratings out the window, read the comment further down the page that points out the increasing usefulness of the published data as you look across multiple years of teachers’ past. Ok, that makes sense. But still, judging teachers on test score summaries alone is madness. Perhaps they’re useful feedback when given to teachers appropriately, with caveats, and all? Maybe the ratings need some tweaking?

None of this matters if the data is published to the public and uses a single, automated metric to reward or punish teachers.

There’s a good reason Bill Gates tore apart the decision to publish this data. He knows that a single metric is bound to be not only flawed, but, if used as an incentive system, also destructive to both teachers and any attempts to improve the metric. His nuanced argument for a system that combines data and highly trained teachers evaluating their peers sounds pretty similar to the belief that highly technical programmers should be the only ones managing other programmers:

But student test scores alone aren’t a sensitive enough measure to gauge effective teaching, nor are they diagnostic enough to identify areas of improvement. Teaching is multifaceted, complex work. A reliable evaluation system must incorporate other measures of effectiveness, like students’ feedback about their teachers and classroom observations by highly trained peer evaluators and principals.

For Teachers, Shame is Not the Solution

Again, replace “teachers” with “software developers” up there and you’ll see Bill Gates is waging the same crusade he waged for developers: protection from the type of overly simplified management incentives that destroy your ability to focus on the tasks at hand when working in a complex, creative profession.

I was lucky enough to step into professional programming at a time when an exploding number of forward-thinking companies were starting to treat and recruit programmers effectively. I was never promoted or demoted based on the number of bugs I created or lines of code I wrote. But it’s clear that wasn’t always the way things worked, and when I was in college I remember reading Joel and Paul Graham, who both stood out as Defenders of The Programmer against destructive management.

This is why we’d never, ever use Khan Academy data to single-handedly “rank” teachers or anything else so ridiculous. Khan Academy data (and there’s a lot of it, we just passed 400 million practice problems done) is to be put in the hands of teachers, for teachers, as a powerful tool that lets them dive deep into their students’ individual levels of mastery. We aim to empower teachers with the best tools available and believe that the only people assessing them should be highly trained teachers who understand the nuances of their craft and work with them to improve. Sound familiar to you, devs?

I’ve never been a teacher, but I do know that I’ll never even consider working for a company that assesses my performance based on a single automated metric. I think I have Microsoft and Google and Joel and Paul Graham and co. to thank for the software world’s culture of respect for both data and individuals. And now it’s really cool to see Bill Gates and Joel taking a similar stand in defense of teachers.

Ten bucks says none of the teachers in The Academy for Software Engineering suffer from a single metric incentive system.

3/2/12, 11:49am
Required code reviews

This is the story of the growing Khan Academy team converting me into a passionate fan of requiring a code review for every single changeset.

Those who have worked with me know that it’s a surprising position for me to take. On the spectrum of “Follows good development practices even if it slows down the product” to “Just ship the thing, code doesn’t matter, only users matter,” I tend to fall…right about…[furiously scribbling]…here:

Even though I’ve long been a fan of code reviews at both Fog Creek and Khan, I never would’ve suggested requiring them for every single changeset, no matter how small. At first glance it appears all P̝̂R̫͙̼̽ͪ̽̋Ö̝̹̿ͬ́̐̆̈CÈ̝̱S̮̜͙̩̠S̹͍̳̖͍̆̐Y̩̟̥̟̘̺̠ͭͫ̔, this is web development, not rocket science, and Hey what if there’s an emergency and Wait are you serious, you want me to review my single-line change to a trivial #comment???

Luckily for everyone, we’ve been hiring smarter and smarter people at Khan who can save me from myself. (What’s that theory about smart people and some lake? Lake Okeechobee, I think. With the crocodiles.) A couple weeks ago, we told our team that we’re requiring code reviews for all pushes. Spoiler alert: it’s not that processy, and even a Just Ship It clown like me is already seeing immense value from the experiment.


Croc! No, sorry, wait…log!…Or, no! Wait! Sorry…Croc!

What’s been so great about requiring reviews?


  1. For a team that values code reviews so highly, we were missing a lot of them. Turns out there are quite a few excuses you’ll tell yourself to explain why this changeset doesn’t need a review. By explicitly setting the expectation, everything’s much simpler. Just get it reviewed.

  2. Frequent, digestible reviews. By committing frequently and reviewing everything, we don’t let features build up to huge, threatening diffs that make your nose bleed like the emo kids from Chronicle. Any change that makes reviews more enjoyable will amplify all of the other well-understood benefits of code review that we’ve listed in our public Khan onboarding docs so I don’t have to make this sentence any longer by talking about them. Small reviews are just more fun.

  3. Those quick one-line reviews aren’t actually a problem. They take 30 seconds to review, see #1. If you have a legitimate emergency, push it and get it reviewed later, ain’t no thang. If I believed for a second that this decision would hurt our ability to Get Things Done, I’d be fighting it tooth and nail.

  4. In two weeks of doing this, I feel I’ve improved as a developer more than in any other period in recent memory. All props to my reviewers, of course, and it probably doesn’t hurt that I’m actually writing code for the first time in a while. But…still.


All changesets reviewed.

It’s not for every group. I’m not convinced it would’ve been right when Jason and I were hacking together alone. But requiring code reviews has already made for a better product and a healthier team, two things that I personally care about far more than a healthy codebase. That’s just a nice side-effect.



P.S. When working on Kiln, I was adamantly opposed to building code review requirements directly into the source control product. I stand by that belief, and I’m almost certain the Kiln team still agrees. Reviews should be required by your team’s dynamics and strongly encouraged by your tooling, not the other way around.

2/28/12, 12:07pm
Sharing the inspiring, personal stories of Khan Academy users

We have these emails hanging up all over our office, sent in from Khan Academy users with incredible, personal stories to tell. Every time I read a new one I’m emotionally affected, which means my robot emotion chip is faulty.

So when some curious soul (like a reporter) wanders in and asks me, “How will you know if Khan Academy is really successful?” I always answer their (totally valid) question with an explanation of our data, analytics, and fancy metrics, but what I’m really thinking is, “You haven’t read these letters.”

Let’s change that. These students’ and parents’ and teachers’ stories are now available for anyone to be inspired by. It is impossible to read them…go ahead, I challenge you…and not come away with the conclusion that a free educational resource like Khan Academy simply must exist.

Call me a softie. It’s not like I don’t believe in data as the final arbiter of any learning tool’s effectiveness. I do. But if our key data metric happened to be “# of page-long, authentic stories sent in from users who have turned their lives around in the face of drug addiction, unleashed their 2nd-grade son on the advanced math he’s fully qualified to handle, or earned acceptance into a university despite being stuck in a country that does not value education,” I don’t think I’d second guess seeing that number skyrocket.

Hopefully these stories inspire others as much as they inspire everybody on our team. We spent time designing the page to celebrate the authors, their letters, and the fact that these are real lives, not product testimonials.

Does the page accomplish this? Feedback welcome, good and bad.

1/24/12, 3:50am
Khan Academy Internship, Fall ‘11
I already can’t wait to drop some major challenges in the laps of our two incoming Fall interns to see what they can build.

- Khan Academy Internship, Summer ‘11

Check!

David Hu and Julian Pulgarin stepped up to the plate this Fall during their coops…er, internships from University of Waterloo. We call ‘em internships because we’re from Amurrrrica, you crazy Canadians.

When hanging with friends and family recently, I found myself shocked by how willing my brain is to completely forget stories that were surely once described as unforgettable. I don’t have any desire to forget what we’re doing at Khan Academy, and that’s kinda sorta why being as open as possible is one of our core dev principles.

The Summer ‘11 story is already a nice piece of shared history that helps me answer every intern candidate who asks, “What kind of project do interns work on at Khan Academy?”. Here’s my version of the Fall story.


If tales about battling a bison for the right to cross a road can get up and walk out of my head,
then…well…better keep blogging.

David

It doesn’t get much more open than David’s post about how we use machine learning to assess student mastery. If I tried to summarize it for you, you’d see my managerial hair start spiking up and up and up toward the ceiling, Pinocchio-style. I won’t insult the work by talking about the statistics. Instead, I’ll just say that we now have a much better understanding of how competent each Khan student is in each math subject. Thanks, David.

That’s not all he did. From a dashboard to emphasize the importance of our exercises to a much smoother way of asking students to review work they’ve done in the past, he covers it all in this Vi Hart-inspired internship post-mortem.


Keeping an eye on our students’ activity via David’s dashboard


Forgetful Ben from the future is grateful for this video. And he’s also super-jacked and smart.

Julian

If anybody reading this has ever beaten Julian Pulgarin in chess, please rub his nose in it. Julian wiped the floor with me so many times that he would beg me to play him while he was blindfolded. My ego can’t handle that sort of hit, so I usually just deleted some data from production and pretended to be disappointed at each newfound emergency’s particularly poor timing.

When he wasn’t humoring me, Julian made major contributions across the board.

He started by building a number of new exercises for students to learn from, including an experimental crack at a new way of teaching fraction intuition. While working on this, he realized that it was painful to test our open source contributors’ GitHub pull requests, so he disappeared for a few days and came back with Sandcastle. Sandcastle automatically tags every pull request with a link that lets our developers test out the requests’ new exercise content in one click.

It’s a shining example of the reason we hire smart people and set them free to get things done. The first time Julian told me about his Sandcastle idea, I didn’t even really understand the direction. Now it’s indispensable.

Julian also gave our first KA Friday Tech Talk about how to do gradual feature rollouts for various segments of a large userbase. He ended up bringing this full-circle at the end of his internship by building Gandalf, our tool for doing the following:


Gandalf lets us selectively roll out features to all sorts of different subsections of our userbase.


“YOU SHA — ok, you guys, Hey!, you guys over there, you can pass — BUT OTHER THAN THEM YOU SHALL NOT PASS.”
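
For the curious, here is a minimal sketch of the general idea behind segment-based rollouts. It is not Gandalf's code or API: `hashToPercent`, `canPass`, and the shape of the feature and user objects are all hypothetical names invented for illustration. The sketch assumes two kinds of rules, an allow-list of groups and a percentage of everybody else, with the percentage decided by hashing so that a given user gets the same answer on every request.

```javascript
// Hypothetical sketch of segment-based feature gating.
// None of these names come from Gandalf itself.

// FNV-1a style 32-bit hash over the string's UTF-16 code units,
// reduced to a bucket in [0, 100).
function hashToPercent(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

// A user passes if they belong to an allowed group, or if their
// stable bucket for this feature falls under the rollout percentage.
function canPass(feature, user) {
  const inAllowedGroup = feature.allowedGroups.some((group) =>
    user.groups.includes(group)
  );
  if (inAllowedGroup) return true;
  return hashToPercent(feature.name + ":" + user.id) < feature.percent;
}

// Assumed example data: developers always pass, nobody else does yet.
const devOnlyFeature = {
  name: "new-fraction-exercise",
  percent: 0,
  allowedGroups: ["developers"],
};
const developer = { id: "user-1", groups: ["developers"] };
const student = { id: "user-2", groups: [] };

console.log(canPass(devOnlyFeature, developer)); // true
console.log(canPass(devOnlyFeature, student)); // false
```

Including the feature name in the hash input means a user who lands in the first 10% for one feature isn't automatically in the first 10% for every other feature.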

Julian even “accidentally” left his chess set in Mountain View for us to mail back to him, which I’m pretty sure was his way of dropping the mic and walking off stage. “You clean up.”

In conclusion, University of Waterloo is legit. I skipped over plenty of work done by both David and Julian, and it still makes for an impressive Fall…Winter…there are seasons in Mountain View? As argued in the previous internship’s summary, any team that’s not dedicating tons of resources to both recruiting and mentoring interns is plain old missing out. We’re loving every minute of working with our interns.

BONUS!

Much like the phoenix or a tyrannosaurus flying a fighter jet, Summer ‘11 intern Joel Burget has risen from out of nowhere and dropped entertaining stories about his summer’s work. If you read them, are impressed, and want to hire Joel…you can forget it. He’s now full-time Joel.

If you wanna tag in next, the Summer ‘12 internship class still has openings.

1/8/12, 2:49am
.end() makes jQuery DOM traversal beautiful

This is my new favorite jQuery trick. I just learned it this year and have mentioned it in enough code reviews to decide it’s worth sharing.

When manipulating the DOM with jQuery, you often see code that looks something like:

$("#container").show();
$("#container .error").hide();
$("#container .zoo").css("background-color", "white");
$("#container .zoo .monkeys").empty();
$("#container .zoo .title").text("The zoo is empty");
$("#container .zoo input").val("");
$("#container .zoo").animate({height: 250});

…or, if somebody gets concerned about performance, they might try to reduce DOM lookups:

var container = $("#container");
container.show();
container.find(".error").hide();
container.find(".zoo").css("background-color", "white");
...

…and so on. Odds are, when you’re manipulating a single element like container, you’ll probably be doing something to nearby elements in the next few lines of code.


Readers of this blog, meet Emma. Emma, meet the three readers of this blog.

Enter .end(), newlines, indentation, and chaining. When you have one jQuery chain going and modify it with, say, .find(), you’re actually pushing the new chained set of elements onto a stack. .end() pops the current jQuery chain off the stack, which lets you do stuff like this:

$("#container")
    .show()
    .find(".error")
        .hide()
        .end()
    .find(".zoo")
        .css("background-color", "white")
        .find(".monkeys")
            .empty()
            .end()
        .find(".title")
            .text("The zoo is empty")
            .end()
        .find("input")
            .val("")
            .end()
        .animate({height: 250});

It’s easy to read, because the indentation is significant and matches up with DOM nesting levels. It gets rid of unnecessary DOM lookups. But most importantly for me, it feels natural to indent in and out as I write the code, using small additional selectors to step deeper into the DOM and .end() to find my way back out.
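
If the stack behavior feels like magic, a toy model may help. The sketch below is not jQuery's source: `ToySet`, its `set()` helper, and the plain-object "DOM" are hypothetical stand-ins invented for illustration. The one idea it borrows is that each traversal returns a new set that remembers the set it came from (in jQuery, I believe, via a `prevObject` reference), and `.end()` simply hands that previous set back.

```javascript
// Toy model of the traversal stack. NOT jQuery's implementation;
// ToySet and the tree format are made up for illustration.
class ToySet {
  constructor(nodes, prev = null) {
    this.nodes = nodes; // currently matched nodes
    this.prev = prev; // the set we were on before the last find()
  }

  // Like .find(): match descendants by class name and remember the caller.
  find(className) {
    const matches = [];
    const walk = (node) => {
      for (const child of node.children || []) {
        if (child.cls === className) matches.push(child);
        walk(child);
      }
    };
    this.nodes.forEach(walk);
    return new ToySet(matches, this);
  }

  // Like .end(): step back to the previous set (or an empty set).
  end() {
    return this.prev || new ToySet([]);
  }

  // Stand-in for .text(), .hide(), etc.: mutate every match, keep chaining.
  set(key, value) {
    this.nodes.forEach((node) => {
      node[key] = value;
    });
    return this;
  }
}

const container = {
  cls: "container",
  children: [
    { cls: "error" },
    { cls: "zoo", children: [{ cls: "monkeys" }, { cls: "title" }] },
  ],
};

const result = new ToySet([container])
  .set("visible", true)
  .find("error")
    .set("visible", false)
    .end()
  .find("zoo")
    .find("title")
      .set("text", "The zoo is empty")
      .end()
    .set("height", 250);

console.log(result.nodes[0].cls); // "zoo"
```

Each `.find()` goes one level deeper and each `.end()` comes one level back out, which is why the indentation in the chain can mirror the nesting.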

I’ve been destroying my old, ugly var’s left and right with this trick. I hope it helps somebody else.

12/26/11, 1:52pm
Laughing at Others’ Code

Those who have worked with me will know that I’m an expert on this topic because my code gets laughed at all the time.

I’ve seen Good laughing and Bad laughing. Good is what I imagine happens when Robert De Niro sits next to Al Pacino as they’re watching Al’s cameo in some trainwreck of an Adam Sandler movie, and Bob turns to Al smiling and says, “This is awful.” It’s when you stare at some code and think, “Good grief. I can just imagine whatever took priority over making this code more reasonable.” There’s a knowing wink and a friendly jab between the laughed-at ghost-coder and the laugher:

Hah! I almost feel bad that you wound up solving it this way. I’ve been there before. You must be exhausted. Why don’t you sit down, I’ll take it from here.

Good laughter contains respect for the fact that this code exists at all and is being worked on by more than one programmer, which is more than you can say for what I’d bet is a whole boatload of perfectly refactored, unused files littering hard drives around the world right now.

That’s the type of laughter that’s erupting at Khan Academy these days, and it’s no coinkydink that now is right about when a team of top-notch coders is getting its first gazes into some of my previous creations…er, abysses. I can say with certainty that in these cases the knowing winks and playful jabs are well-deserved.


The rare (and best) third type of laughter.

A quick story before we get to the Bad laughter. Did you know that when Sal was a one-man show, the entire Khan Academy application was one big main.py file? At least the server-side stuff. All the request handlers, URL mappings, datastore models, data migrations, and even some HTML generation in one big 1000+ line file.

How funny is that! And it stayed that way for months! I mean, please. Who the heck wants to work on a codebase that’s one big file? If you haven’t detected the dripping sarcasm yet, recalibrate your sarcasm detector and start this paragraph again.

It’s easy to see how judging Sal by that one big file while he was busy making 2400+ free educational videos would be like judging a geek for wearing a t-shirt while she was just trying to look presentable enough to get to her computer to start writing code. Guess what? Sal’s code is still around, and it’s responsible for helping teach literally millions of students. Yet, as time goes on, it gets more and more likely that one day somebody will laugh at an out-of-place line with the type of judgment that can only come from being out-of-touch with what really mattered back then…and the fact that that line helped change many students’ lives.


You’ve gotta bend this brilliant tweet a bit to apply it to non-profit education,
but you’ll notice “good code” and “bad code” aren’t on the list.

Bad laughter doesn’t need much explanation, it just lacks respect. It happens too much in our industry, and I’m not sure why. I’m proud to say that it’s not a problem at Khan at all, but we haven’t always been 100% immune. We weren’t always 100% immune at Fog Creek. I’ve been guilty of this laughter myself, and I’d bet money it exists elsewhere. It’s most common among coders who feel like they need to prove themselves, and it can be combatted by a team that emphasizes shipping and the healthy laughter that comes from reminiscing about the last crappy hack you were responsible for when you decided to Just Ship It.

Here’s (just one of) ours:

PLAYLIST_STRUCTURE = [
    {
        "name": "Math",
        "items":
            [
                {
                    "name": "Arithmetic",
                    "playlist": "Arithmetic"
                },
                {
                    "name": "Developmental Math",
                    "items": [
                    ...

That’s our current Khan playlist structure, defined in code. When Sal adds or renames a playlist, we have to change the code. Laugh it up.

This will actually go away soon in favor of something much nicer, and I’m not arguing for crappy code. I’m putting this example here because it got us this far, without many problems, and who knows where we’d be if Sal had spent the first iteration of khanacademy.org trying to decide on the perfect playlist data structure.


…and a more rambling attempt to get a similar point across to a group of UIUC students by playing this powerful Thank You, Khan Academy video before revealing main.py.

12/15/11, 11:39am
After giving logged out users access to pretty much everything*

We had some heated debates a while ago about what would happen if we opened up all of Khan Academy’s content for logged out users. Sal’s videos have always been open in this way, of course, but all the interactive exercises and statistical tracking and badges and stuff required an account.

It felt like the right move when we reconfirmed our belief that educational content should be as open and available as possible. We were also persuaded by Fred Wilson’s belief that giving logged out users more power is an effective way to empower your community. But we worried that registrations would drop, because by handing our 250+ exercises to logged out users, we’d be drastically shrinking the carrot on the other side of the “please login!” boundary.

We decided to go for it this summer. Figured I’d share some results.


A dog

4+ months after the change, we know a little more. Jace looked at the data and found:

  • Registration rate hasn’t changed. Bit of a surprise for both sides of the debate.
  • Percentage of our visitors who try an exercise problem increased by (relative) 10-15%.

So more visitors are trying exercises, but they’re not converting into more registered users…yet. We could have fun making up all sorts of explanations. Maybe we need to show off the badges and points unregistered users are accumulating a little bit more, maybe we’re not asking visitors to login forcefully enough, maybe doing math exercises makes people tired, maybe we don’t have enough pictures of golden retrievers on the login page.


Percentage of all visitors who use our exercises

Far more important than any random explanation is the fact that we’re now getting X% more data about how users learn (or don’t learn) thanks to these new exercise visitors. That data is powering some of the best work coming out of Khan Academy so far, so feeding more logged out users into our exercises is, as Jace puts it, a big win on data collection.

Plus, we’re confident that we could iterate and A/B test our way to higher registration rates for our new exercise users. It’s not our priority (for this week, at least). We’re busy cooking up important changes to the core learning experience.

*They don’t have access to everything. The ability to coach and communicate with other users is still restricted to users who have logged in.

12/11/11, 4:48am
How to make “consecutive days of Khan Academy activity” badges

  1. Think how cool it would be if users were rewarded after learning for X consecutive days.
  2. Think about implementation for a second and then go, “Sweet! This’ll be trivial Let’s knock it out Here we go I’ve got things to do!”

  3. Write a line of code or two.
  4. Get sucker-punched by the gruesome specter of timezones when realizing that a student doing work at 8am on Thursday and 9pm on Friday should probably still earn their badge even though over 24 hours of non-work elapsed.

  5. Think “How does Stack Exchange do this?” and discover that while they suffer from the timezone bug, passionate Stack Exchange users have written actual code to do the trick without asking users for timezone information.
  6. Follow those passionate users’ suggestions, knock it out, and enjoy a few weeks later when the most dedicated users start earning badges for 30 days of consecutive learning.
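
Step 4's example makes the trouble concrete: 8am Thursday to 9pm Friday is a 37-hour gap. Here is a minimal sketch of one timezone-free heuristic. It is not necessarily the approach those Stack Exchange users wrote, nor the one that shipped. The 40-hour grace window and the names `updateStreak` and `streakDays` are assumptions made for illustration. The idea: a streak survives as long as each new activity lands within the grace window of the last one, and its length is measured in elapsed 24-hour periods rather than calendar days.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Assumed grace window: long enough that "morning one day, evening the
// next" still counts, short enough that skipping a full day does not.
const MAX_GAP_MS = 40 * HOUR_MS;

// Returns the streak after recording an activity at `activityMs`
// (milliseconds since the epoch). Pass null to start fresh.
function updateStreak(streak, activityMs) {
  if (streak === null || activityMs - streak.last > MAX_GAP_MS) {
    return { start: activityMs, last: activityMs }; // streak (re)starts
  }
  return { start: streak.start, last: Math.max(streak.last, activityMs) };
}

// Streak length in days, counted as elapsed 24-hour periods plus one.
function streakDays(streak) {
  return Math.floor((streak.last - streak.start) / DAY_MS) + 1;
}

// The example from step 4, with the clock times written in UTC.
const thursday8am = Date.UTC(2011, 10, 24, 8);
const friday9pm = Date.UTC(2011, 10, 25, 21);

let streak = updateStreak(null, thursday8am);
streak = updateStreak(streak, friday9pm);
console.log(streakDays(streak)); // 2
```

The trade-off is that no clock is ever consulted for "what day is it where the student lives?", so edge cases go the other way: work at 11pm followed by work at 1am counts as one day here, not two.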

11/29/11, 10:17pm
A/B testing still works. [Sarcastic *PHEW*].

After releasing GAE/Bingo, we received a number of worried correspondences from various very worried correspondents. It seems that GAE/Bingo, along with practically every other A/B testing framework out there, violates some purist principles of how to do significance testing.

The crux of the argument, reworded so simply that I’m pretty sure all statisticians (I admittedly know nothing about stats) would string me up:

If you repeatedly check the results of an experiment, sometimes you’ll see statistically significant results that aren’t actually significant.

So if you’re constantly checking your A/B dashboard and making decisions based on what it tells you, you’re often screwing up.

It’s a mathematically sound argument, as explained to me by my much smarter teammates. And it must be absolutely devastating for all the programmers who went out and bought the Razer Mamba Elite Wireless Gaming Mouse just to increase their click speed so they could mash the refresh button on their A/B dashboards as fast as possible.
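
To see the effect without any math, here is a rough simulation sketch. It has nothing to do with GAE/Bingo's code: the function names, the traffic numbers, and the choice of a two-proportion z-test at roughly 95% confidence are all assumptions made for illustration. It runs a pile of A/A experiments, where both alternatives convert at the same true rate, and compares how often a "significant" difference shows up if you stop at the first sign of one versus if you only look once at the end.

```javascript
// Small seeded PRNG (mulberry32) so every run gives the same numbers.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two-proportion z-test with n users per alternative:
// is the gap between A and B "significant" at roughly 95%?
function isSignificant(convA, convB, n) {
  const pooled = (convA + convB) / (2 * n);
  const stdErr = Math.sqrt(pooled * (1 - pooled) * (2 / n));
  if (stdErr === 0) return false;
  return Math.abs((convA - convB) / n / stdErr) > 1.96;
}

function simulate({ experiments, looks, usersPerLook, rate, seed }) {
  const random = mulberry32(seed);
  let sigAtAnyLook = 0;
  let sigAtFinalLook = 0;
  for (let e = 0; e < experiments; e++) {
    let convA = 0;
    let convB = 0;
    let n = 0;
    let everSig = false;
    let lastSig = false;
    for (let look = 0; look < looks; look++) {
      // Both alternatives share the same true conversion rate.
      for (let u = 0; u < usersPerLook; u++) {
        if (random() < rate) convA++;
        if (random() < rate) convB++;
      }
      n += usersPerLook;
      lastSig = isSignificant(convA, convB, n); // peek at the dashboard
      everSig = everSig || lastSig;
    }
    if (everSig) sigAtAnyLook++;
    if (lastSig) sigAtFinalLook++;
  }
  return {
    peeking: sigAtAnyLook / experiments, // stop at first sign of stat sig
    patient: sigAtFinalLook / experiments, // look once, at the end
  };
}

const rates = simulate({
  experiments: 400,
  looks: 20,
  usersPerLook: 100,
  rate: 0.1,
  seed: 42,
});
console.log(rates);
```

Since there is no real difference between the alternatives, every "significant" result here is a false alarm, and the `peeking` rate can never be lower than the `patient` one because the final look is one of the peeks.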

Here’s the thing. I know of absolutely nobody who runs an A/B test like a crazed puppy who keeps sprinting loops around your legs hoping that…..ohboyohboyohboyohboy…..after the next 360° you’ll have lowered the puppy treat to the floor.

That doesn’t mean the argument isn’t valid. If you do check your dashboard every 5 seconds like a crazed puppy and immediately end experiments at the first sign of stat sig, then you probably should read the article and…..ummmmm…..find better uses for your time.

Luckily for us, one of my much smarter teammates with much more experience analyzing numbers’n’stuff landed an early modification to GAE/Bingo that should pacify all worried correspondents:


A historical graph of one of our A/B tests and each alternative’s performance. On our dashboard it’s interactive, weeeeeeeee.

By showing this graph everywhere our dashboard shows A/B results and waiting for the results to stabilize, we can be confident that we’re not making a snap judgment in the zone of idiotic decisions.


Danga zone

Ok, good. We’re safe. But what about everybody else? Did Fog Creek and 37signals and everybody including Google immediately start hemorrhaging money due to their reliance on faulty A/B tests when this truth came to light???

My guess is no, because A) they aren’t making snap judgments at the first sign of stat sig and B) with significant traffic, many of our A/B test experiments don’t even have a zone of idiotic decisions. Lots of ‘em look something like this:

 

…and it’s pretty clear in which cases a difference has been made.

11/15/11, 2:51am