I was hoping to spark a conversation about this approach as no such thing was mentioned in the article even though it should be possible to do.
The whole point of these comment sections is to have a discussion. If the point of this site was just to read articles there wouldn't be a comment section.
I think your question was phrased in a way that suggested you had not understood the article. The described system detects unused libraries and binaries, but does not seem to look for dead code paths in source files.
Dead code path detection is an interesting topic which I think the linked article doesn’t address at all.
Profiling is inherently probabilistic and I don't think it should be used. Anything that inspects the runtime (dynamic) behavior of code isn't good enough for a code deletion tool. Only static analysis will do.
And with enough samples we can learn which code paths are being taken. You can have your dead code bot look at the number of samples that have been collected to decide whether it should be enabled for something. For example, if you have 6 months' worth of samples of a piece of software running on 10k machines, you will get a good idea of what code isn't being used.
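To make that concrete, here is a rough sketch of how the bot could gate itself on sample volume. Everything in it is an assumption for illustration: the `ProfileSummary` shape, the function names, and the two thresholds are made up, not anything the article or GWP actually exposes.

```python
from dataclasses import dataclass

# Hypothetical thresholds; nothing in the article specifies these values.
MIN_DAYS_OBSERVED = 180
MIN_TOTAL_SAMPLES = 1_000_000


@dataclass
class ProfileSummary:
    """Aggregated profiling data for one service (illustrative shape only)."""
    days_observed: int
    samples_by_function: dict[str, int]  # function name -> samples seen


def bot_enabled(summary: ProfileSummary) -> bool:
    """Only trust the profile once it covers enough time and enough samples."""
    total = sum(summary.samples_by_function.values())
    return summary.days_observed >= MIN_DAYS_OBSERVED and total >= MIN_TOTAL_SAMPLES


def deletion_candidates(summary: ProfileSummary, all_functions: set[str]) -> list[str]:
    """Return functions with zero samples, or nothing if the data is too thin."""
    if not bot_enabled(summary):
        return []
    return sorted(
        f for f in all_functions if summary.samples_by_function.get(f, 0) == 0
    )
```

A zero-sample function is only a candidate for human review here, not proof that the code is dead.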
>Anything that inspects the runtime (dynamic) behavior of code isn't good enough for a code deletion tool.
I disagree, since code that is being run is by definition not dead code.
When the developer reviews the CL they can use their judgment to determine if the proposed deletion makes sense. The bot can remember not to send a CL for that same code in the future.
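The "remember not to send it again" part could be as simple as a suppression list keyed by the code being proposed for deletion. This is only a sketch; the `SuppressionList` class and the string targets are hypothetical, and a real bot would need persistent storage.

```python
class SuppressionList:
    """Tracks deletion targets a reviewer has already rejected (in memory only)."""

    def __init__(self) -> None:
        self._rejected: set[str] = set()

    def record_rejection(self, target: str) -> None:
        # Called when a developer declines the proposed deletion CL.
        self._rejected.add(target)

    def should_propose(self, target: str) -> bool:
        return target not in self._rejected


def next_proposals(candidates: list[str], suppressions: SuppressionList) -> list[str]:
    """Filter out anything a human has already said no to."""
    return [c for c in candidates if suppressions.should_propose(c)]
```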
This is hyper dangerous. People are prone to accepting robot CLs and even a small error rate of deleting important code will be a big problem. GWP is also only able to instrument a very small number of requests.
It sounds like the same issue exists with this approach of removing unused libraries and binaries that have not been run in a while. My suggestion is just expanding it from binaries that haven't run to libraries that haven't run. You could even just put this information into some code health dashboard. If code isn't run that means that it isn't providing value or it isn't being tested in production. How can you be sure some rare failure case actually works and doesn't take down the system if you never test it out to see if it works?
>GWP is also only able to instrument a very small number of requests.
As I mentioned before this could be limited to popular services where it is enabled and where a small number of a giant number of requests is still a big number.
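A back-of-the-envelope way to reason about this: if a code path is hit by a fraction p of requests and the profiler has sampled N requests, the chance that the path never appears in the samples is (1 - p)^N, assuming each sampled request is an independent draw. That independence assumption and the function name below are mine, not something from the article.

```python
import math


def prob_never_sampled(path_rate: float, sampled_requests: int) -> float:
    """Probability that a path hit by `path_rate` of requests is absent
    from `sampled_requests` independent samples: (1 - p)^N."""
    if not 0.0 <= path_rate <= 1.0:
        raise ValueError("path_rate must be between 0 and 1")
    if path_rate == 1.0:
        return 0.0 if sampled_requests > 0 else 1.0
    # log1p keeps precision when path_rate is tiny.
    return math.exp(sampled_requests * math.log1p(-path_rate))
```

So a one-in-a-million path still has roughly a 37% chance of being missed after a million sampled requests, which is why the number of samples matters before trusting a "never seen" result.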
> It sounds like the same issue exists with this approach of removing unused libraries and binaries that have not been run in a while.
Not really, it's the difference between knowing that something isn't plugged in and hoping that it isn't powered on.
Libraries that aren't run are deleted; you're suggesting expanding this to stochastically unexercised code paths within libraries that are run (as opposed to provably unexercisable code paths, which I think there are tools for, but they are opt-in instead of on-by-default).
> If code isn't run that means that it isn't providing value or it isn't being tested in production.
Or it can't show up in profiling for any number of reasons. (The first thing that comes to mind is a CHECK fail, which I don't think would ever show up in profiling but could be exercised regularly; if it is included intentionally, it is likely preventing some kind of data corruption issue, so removing it would be terrible.)
Yes, but then you run into signal-to-noise issues. It's much easier when I can trust that 90 or 95 or 99% of the automatic deletions will be legitimate; if only 50% or 20% are, they become an annoyance.