Yep, you nailed it. Had it not been for the ACA, making the career leap to co-founder would have been incredibly difficult. Two years later, it's an honor to be able to provide insurance to our employees.
My wife works for a very small business -- five people, to be exact. They get group insurance through the company that handles their payroll processing, not by going directly to the exchange. It seems like that should be an option any founder of a small business could take advantage of?
Would love to better expose our data set. It’s a bit tricky, however: we’re specifically interested in enterprise software engineering (the patterns of which differ radically from the open source world).
In order to dig into enterprise data, we essentially need a ToS that only allows us to talk abstractly about aggregate data.
You don't have to expose the raw dataset explicitly, but you do need to describe your methodology (more so than an arbitrary "impact" score; give a formula) and give actual numbers for the analysis, not just argue a "strong correlation" with a highly arbitrary matrix of attributes.
It's a marketing article - trying to explain their product to potential customers, not a scientific publication.
I'm not saying that you're necessarily wrong, but I do wonder whether making an article like this read more like a formal publication would actually make it better or worse at its real goal, which is presumably to sell more.
Yes, the standards of a marketing article are nonacademic, but you can't make an article containing a data-driven argument and skimp on both the data and the argument.
There is no downside for a legitimate marketing article in making the underlying data and analysis available to those who want to drill down. If it is valid, you stand a chance of persuading the skeptics.
More importantly, calling something a marketing article does not immunize it from critical analysis, and especially not on a forum such as HN.
You’re right: this post is a narrative about product development, and a strong correlation we found between two variables across 20 million+ commits that we thought was fascinating and that supports general 'kitchen logic' around best practices. The axes are not labeled, but if you like we can set you up with a demo account and walk through your data with you.
One other note: the most common use case for the product has been stakeholder management, something we have doubled down on in product development. Any specific critiques about how we can improve are most welcome!
Specific critiques about how you can improve your blog post:
1. Data is always necessary to back up claims like these. Almost _any_ kind of quantitative data will do - even a simple average! Just give me something I can replicate on other data sets. Similarly, saying "we made up these variables as a combination of these other variables and called them Flavorfulness and Musicality, look how nicely our circles line up!" is not useful to anyone - a formula or methodology would be.
2. As a software engineer, I came to this post expecting to find a way to improve myself. I invested time in reading it because I expected a payoff in the form of, for example, a metric I could apply to my own work. I suspect that the author of this post intentionally gave the impression that the post contained tools like this. I didn't find anything remotely like that.
3. More generally, this post does not show signs of being written with the reader in mind. You have not given me any new information that I can apply; instead you have given me a marketing pitch (we did this data analysis, we promise! Don't you want us to work with you?) dressed up like information.
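To make point 1 concrete, here's the kind of fully replicable definition I mean. This is a toy example I just made up, not GitPrime's actual formula; it assumes you've parsed per-commit (insertions, deletions) counts, e.g. from `git log --numstat`:

```python
from statistics import median

def churn_score(commits):
    """Summarize a list of (insertions, deletions) tuples, one per commit.

    Returns a fully replicable summary: commit count, median commit size,
    and a made-up 'impact' that weights deletions double, on the theory
    that editing existing code is harder than adding new code. The point
    is not that these weights are right, but that anyone can recompute
    them on their own repo and argue about them.
    """
    sizes = [ins + dels for ins, dels in commits]
    return {
        "commits": len(commits),
        "median_size": median(sizes),
        "impact": sum(ins + 2 * dels for ins, dels in commits),
    }
```

Even something this crude would let readers check the claimed correlation against their own data sets.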
My actual suggestion was to hop on a call with the product team/co-founder. This article is a narrative about something we found pretty interesting during product development.
Similarly, the offer here is to take a deeper look at what we're building and a (quite genuine) offer to incorporate any suggestions you might have into our roadmap.
Sure, specific critique: It appears that you are making fancy metrics that are more complex but fundamentally no better than a LOC metric. Is there a way that you could create metrics reflecting actual monetary business value created?
@trevyn We've got some things in that vein working in the app today, and we're building toward more. If you’d like, we could hop on a call, give you a tour of the app, and show you where we’re headed?
Seems like you've given some thought to how this should be done right; we'd love to include your ideas in our product development discussions.
This post and the follow-up "talk to sales" comment cost your company a lot of credibility.
Your post gives prescriptive conclusions not supported by the data presented. Of the data you did analyze, so few numbers are actually published that everyone is forced to question the fundamental methodology, which itself is not clearly stated. The specific critique is: make well-founded claims.
"Every measure of a good developer is qualitative. That makes them all subjective."
^ This post scratches the surface of a timely and relevant question many teams face, but the author seems to throw their hands up and say everything is subjective.
I'm currently working on performance optimization. Often my job is days (even weeks) of running tests and reading metrics, followed by a 5-line change one day that significantly speeds things up. This is one reason why metrics can't hope to capture everything. Maybe it's not all subjective, but a lot of it is - especially for more senior people, it's about how much they contribute to the overall business goals rather than any data you might find in the codebase.
Of course there are people who do this much better/faster than me, but the nature of the job still means the actual code output is pretty low.
I can imagine many other functions where code output is super high (adding lots of text of any kind - blogs, templates, new library/config) or super low. I don't think code output correlates well to productivity.
I have a longstanding suspicion that these issues are why there are so many different 'measurement' structures for programming, which don't ever converge to a few winners. I totally believe the GitPrime result is accurate, but my first thought was "I wonder which teams this doesn't apply to?"
'Programmers' is a coherent grouping in terms of "people who write code as a job", but it doesn't generalize well to "people whose daily work is similar". At the extremes, it's a bit like calling a typist, a copy editor, and a poet all "writers" because they produce text. That's a useful grouping when you're designing Microsoft Word, but it's a lot less useful when you're judging everyone on "lines written today". (I worry that there's a value implication in those three careers, and it's not intended; I just wanted vastly different daily workflows.)
And so the GitPrime result seems useful to know about, but there are clear cases like yours where it's inapplicable. I suspect it would also run into issues with things like embedded or critical-failure code - NASA's commit patterns might have nothing in common with the industry standard. Even in my largely standard work, I realize that of my last two projects, one had ~50 commits and the other ~3. The 3-commit one was a tech-debt project with lots of reading and documentation work, and I'm skeptical the two are at all comparable.
Even within similar work, I imagine things like branching strategies and test coverage alter commit frequency. For the near future, we'll probably continue to see the value of a given assessment stay company-and-role specific.
Those that self-identify as programmers may or may not be current HS or undergrad students. They often know a single language moderately well, maybe a second. Though they know the language, they often don't know what is available to them in the language of choice's standard library, so they often fall victim to re-inventing the wheel.
A developer has a few seasons under their belt. They may know several languages, and even have a fair grasp of what the standard libraries of those languages offer, and so are less likely to reinvent the wheel (although they still sometimes do). They're more likely to have a larger toolbox and a decent grasp of when it's reasonable to use a certain tool (language or framework) to tackle a new problem.
An Engineer is more likely to bring tools to bear on problems. Got a performance problem? They're going to profile instead of going by gut feel. If your company can afford it, VTune is a great tool here. Engineers are more adept and comfortable using tools such as gdb, valgrind, and strace on Linux, or say WinDbg on Windows. Bonus: they usually don't need to refer to the manuals to specify the correct commands or interpret the results. This doesn't come by accident or pure reading - you have to have real-world practice and experience to know when and which tool is appropriate for the problem at hand.
The Senior Engineer knows everything the Engineer does, as briefly outlined above, but also takes the time to teach/mentor/coach the juniors around him or her when they see room for improvement. And this should not come in the form of "you're doing it wrong, do this instead"; there should be a "why" component in the "you're doing it wrong".
Your description of senior engineer fits me reasonably well - eight years experience, have had 'senior' in my title for the last three jobs, routinely teach the people I'm working with - but I identify as a programmer, not an engineer, because there is no such thing as software engineering, and if by some chance there is and I've missed it, I'm certainly not doing it.
So true! I have yet to see a metric that wasn't gamed somehow, even by people who are generally honest and upright otherwise.
If you base someone's livelihood on a number, expect them to attempt to maximize that number, probably at the expense of other numbers that you may or may not be considering.
This is a big part of why we think engineering metrics are a tough nut to crack: there’s a fair bit of research that suggests tying KPIs to compensation is a huge anti-pattern for certain types of work (chiefly those involving lots of novel problem solving).
Highly recommend Daniel Pink’s “Drive” on this topic; it’s pretty fascinating stuff.
I agree with the overall message of the post but not the analysis behind it. It depends completely on the codebase and where you as an engineer are assigned to work in that codebase. There are so many potential ways for bias to be introduced into that kind of analysis (e.g., what if I'm a junior dev assigned to copy edits in the static sections of the website?). There's a qualitative perspective on commits that's completely ignored in this post.
This ties into what I was talking about above with respect to "heavy algorithmic stuff". Most developers can work with intricate legacy code by adding an if here and there to handle special cases. Very few can recognize that it's encoding a state machine with state transition function Y, and rewrite it. Even then, very few have enough 'attention to detail' to do the rewrite, and NOT lose any legacy functionality. This is an example of what I consider to be 'hard stuff'.
This is why I like when people contribute to major open source projects - chances are good they will be exposed to some of these hard problems, which can help them solve future ones.
Engineer happily using GitPrime here. After sitting in on a Skype session we had with your team once upon a time, I took your earlier (pre-Impact) observations on the correlation between commit volume and overall performance to heart and consciously focused on chunking my work more. It will come as no surprise to you that this strategy's helped me stay consistently in the upper right of the 2x2 chart. Thanks, Ben!
From what I've seen, a high frequency of commits means one of two things:
- The author has very little understanding and churns out code one day only to remove it the next.
- The author thinks a lot about their code and divides it into digestible, independent chunks.
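You can get a crude read on which of the two you're looking at from the commit history itself. Here's a naive proxy I made up for illustration (not anyone's real metric), again assuming per-commit (insertions, deletions) counts parsed from `git log --numstat`:

```python
def churn_ratio(commits):
    """Naive churn proxy: fraction of all written lines later deleted.

    commits: chronological list of (insertions, deletions) per commit.
    A high ratio hints at the write-then-remove pattern; a low ratio
    combined with many commits hints at small, digestible chunks.
    Crude by design: deletions aren't necessarily of the author's own
    recent code (refactors and file moves inflate it).
    """
    written = sum(ins for ins, _ in commits)
    deleted = sum(dels for _, dels in commits)
    return deleted / written if written else 0.0
```

For example, an author who adds 100 lines and then deletes 50 of them scores 0.5, while an author making many small additive commits scores near 0.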
Ah nice, @Eiriksmal — 'once upon a time…' and 'pre-Impact', sounds like this was a while ago now :) Encouraging to hear that this has been helpful for you.
It's one of the only clear-metric breakdowns that actually makes sense in light of my programming experiences. I'm hesitant about the idea of using it as a management tool (easy to game, and high commit count or even 'high impact' isn't proof of quality). And, as some other people noted, it's quite role-specific - a performance engineer's commit count won't be comparable to a line engineer's.
But as a personal insight, or a mentoring tip? This seems excellent. I know I want to write good code, I'm not going to game my personal metrics, so a clear guideline like "break up code more" is a great thing to have.
@Bartweiss thanks for the feedback. The thing about “gaming the system” is that it presumes an adversarial relationship between engineers and non-engineers. That’s pretty unfortunate, and it’s our view that a good portion of this is due to non-engineers not really understanding what happens in software development.
But yeah, my co-founder has a pretty interesting take on gaming the system.
Adversarial work does seem like it's largely a sign of communication breakdown. I think 'gaming' isn't always as adversarial as it seems, though. I was considering it in terms of Goodhart's Law, where even with good intentions, representative metrics become less representative as you organize around them. The "active days" entry in your link seems like an example; if everyone pushes most days for the sake of pushing, then active days isn't feedback on programmer skill.
Of course, your link touches on that. "Everyone pushes every day" is a great outcome, even if the metric is no longer clear. The common trend is to pick good metrics, then struggle to keep them relevant. I really like the alternative of just setting metrics that will be good after people optimize.
The other side of this is that if engineers are optimising for a metric important enough to be tied to compensation, the company's non-engineers are likely to place a high value on that metric too, exacerbating the issue.
This is a situation where I feel gaming the system is not an inherently adversarial relationship.
Love this. In my experience too, good programmers make a lot of small commits, while less experienced people commit occasionally, regardless of whether the changes are big or small.
Something your chart doesn't show is people who make a lot of commits/work but the quality is subpar. They behave as prolific programmers but still need to improve.
Surely this assessment introduces false positives? Someone who implements a poorly specified feature exceptionally well suffers rework, not because their code is poor but due to outside factors. Someone who cherry-picks easy items is reflected as prolific, while a perfectionist might just be given more difficult problems.
Idk, I worry that people will interpret this backwards and, rather than having the characteristics of a good developer, ape those characteristics to appear as a good developer.
As a management metric, I would find this concerning - perverse incentives and false positives all over the place. But as a personal observation, I'm going to take it to heart.
This (like most other metrics) seems most useful from a starting point of "while doing my daily work and seeking to write good code, I will apply this". That doesn't make a good tool for deciding who's talented, but I'm still going to remember it.
Whatever you measure, when people know it, will affect how people act. People will try to move the needle in the direction they see profitable, even unconsciously.
That's actually fascinating. Is the formula for the mentioned "developer impact" a hush-hush trade secret, or can you go into more detail on how you calculate it? Would be really interested in hearing more!
The danger of developing a quantitative metric for work/job performance is that you now have a recipe to game the system for anyone so inclined. Of course, if this metric is used as an indicator for promotion and raises, nearly everyone will be inclined to game the system.
It might not even be as overt as that. It is a well known phenomenon that what you measure, you improve.
So as soon as you have a quantitative metric for work performance, almost by definition, it can become problematic to use, unless you are very careful how it is used.
This is not to say that finding a quantitative method is useless, far from it. It can lead to valuable insights and improve the overall quality of the team/business.
Nice article indeed, thanks for sharing. I'm currently struggling to go from being a low-frequency perfectionist to committing at high frequency.
I've also noticed that experienced programmers commit more frequently and do the right job - they spend less time on unimportant details.
It also depends on the codebase. A programmer who has worked on similar codebases for 10 years is more likely to have high impact than one who has worked less and finds those codebases rather new, even while their approach and skill set may be similar.
Quantitative information can only ever be used to inform qualitative valuation. It should not be absolutely linked (in other words, you should not make hire/fire decisions solely based on metrics of any sort). Your work is definitely interesting, though.