I started making hyprwhspr because no other STT library was quite there for performance and model availability.
After that I started writing opub.dev because even minimal success in recent oss showed me just how much has changed, and I’m worried about how expensive everything will get for maintainers.
So, now I’m trying to GIVE people compute so they can start building a helpful filter layer above their projects.
The furthest I've gone in these jazz style culture interviews is asking people what they do outside of work for fun. This was for fully remote async positions. And it was important to know you had other stuff going on because the mental/personal health risk in failing at remote work is massive and life altering.
If, through wherever that discussion went, I wasn't 100% sure that you could stand on your own feet and wouldn't sink into the abyss, it was impossible to move forward. It was a tough line to walk sometimes because you don't want to pry personally. But that doesn't appear to be a universal opinion, it turns out.
Even if I wanted to, these questions aren't allowed in the company I work for, along with feedback related to "team fit". This is dictated by execs, who are in turn dictated to by legal, because it has nothing to do with proving competence, and it opens the company up to employment discrimination lawsuits since you're persuading candidates (you have to understand the power dynamic) to reveal potentially protected info. For example, if a man says "Oh, I go hiking with my boyfriend!", he could also say "They didn't hire me because I told them I was gay!". Or even "I spend time with my kids.", since familial status is a legally protected class where I am.
As a person who does interviews, I have exactly zero interest in what people do for fun. I just want competent people that are nice to work with (in a productivity sense), and I only have 45 minutes to prove that, knowing that nearly everyone fucking lies. I see it serving no purpose other than helping enforce some monoculture within the group, because, genuinely, why else would you ask about free time activities during an interview?
Related, the only time I've asked this was early on when I didn't know how to interview. The only time I've been asked this, and answered, was with people who had just started interviewing (small startups and new hiring managers).
Great comment. It's really shocking how close to the legal line Silicon Valley tech companies get, and the extent to which many of them actually cross way over the line. A huge number of interviewers I've encountered are in extreme need of training so they don't so casually put their companies at legal risk. If I was Lawful Evil, I could probably make a career out of just suing companies for discriminatory hiring practices, due to the various landmines poorly trained interviewers routinely step into.
BigTech seems to be the best at it. They tend to have rigorous training, and often have a "safe question bank" that interviewers pull questions from, which are all vetted by lawyers and are known not to put the company at legal risk.
I think that's the best you can do for culture fit, cause at the end of the day it's just "can they shoot the shit and are they pleasant to be around". You can't really know a person technically or socially until they've been in the job for at least a little bit though.
On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for clean up.
Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.
Incidentally, waiting for Apple to blow this all up with native STT any day now. :)
How does it compare to the more well established https://github.com/cjpais/handy? Are there any stand out features (for either option)? What was the reason for writing your own rather than using or improving existing software?
I've been running whisper large-v3 on an m2 max through a self-hosted endpoint and honestly the accuracy is good enough that i stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks, like anything over 30 seconds starts feeling sluggish even with metal acceleration. Haven't tried whisperkit specifically but curious how it handles longer audio compared to the full model.
Yeah that makes sense, chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic fastapi wrapper so it just takes whatever audio blob gets thrown at it, no chunking logic on the server side. Might be worth adding a vad pass before sending to whisper though, would cut down on processing dead air too.
Maintainer of WhisperKit here, confirming we do exactly that for longform. We search for the longest "low energy" silence in the second half of the audio window and set the chunking point to the middle of that silence. It uses a version of the webrtc vad algorithm, and significantly speeds up longform because we can run a large amount of concurrent inference requests through CoreML's async prediction api. Whisper is also pretty smart with silent portions since the encoder will tell it if there are any words at all in the chunk, and simply stop predicting tokens after the prefill step - although you could save the ~100ms encoder run entirely with a good vad model, which our recently opensourced pyannote CoreML pipeline can do.
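For anyone wanting to try the "split in the middle of the longest silence" idea on their own server, here is a rough sketch in plain Python. To be clear, this is not WhisperKit's code: a crude per-frame energy threshold stands in for the webrtc-style VAD, and the function names, `frame_len`, and `threshold` are all made up for illustration.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def longest_quiet_run(
    energies: List[float], start: int, threshold: float
) -> Optional[Tuple[int, int]]:
    """(first, last) frame indices of the longest run below threshold,
    considering only frames at or after `start`. None if nothing is quiet."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # Iterate one step past the end so a run touching the end gets closed.
    for i in range(start, len(energies) + 1):
        quiet = i < len(energies) and energies[i] < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if best is None or (i - run_start) > (best[1] - best[0] + 1):
                best = (run_start, i - 1)
            run_start = None
    return best


def pick_split_point(
    samples: List[float], frame_len: int = 160, threshold: float = 1e-4
) -> int:
    """Sample index at which to cut the window: the middle of the longest
    quiet run in the second half. Falls back to the end of the window."""
    energies = frame_energies(samples, frame_len)
    run = longest_quiet_run(energies, len(energies) // 2, threshold)
    if run is None:
        return len(samples)
    first, last = run
    # The run covers samples [first * frame_len, (last + 1) * frame_len).
    return (first + last + 1) * frame_len // 2
```

You would call `pick_split_point` on each audio window, send everything before the returned index to the model, and start the next window from there. A real VAD would be far more robust than a fixed energy threshold against background noise.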
Oh nice, the pyannote coreml port is interesting. Last time I looked at pyannote it was pytorch only so getting it to run efficiently on apple silicon was kind of a pain. Does the coreml version handle diarization or just activity detection?
The same is true for database rankings (db-engines).
If entrants are not artificially inflating "organic" signals via fake content spam (Twitter/X), then the criteria themselves are losing their signal strength (StackOverflow/GitHub).
The diffusion makes it increasingly difficult to understand which channels are important and which correlate to strength in the market.
Unfortunately, these can be more than vanity metrics.
Some VCs or financial markets may use these as methods towards valuation.
Happy to answer any questions about deduplication. One thing that's not included in the write-up is that we also address out-of-order indexing alongside deduplication.
i think part of the problem is these kind of messages are alienating exactly because they appear on screen. the meat-space sentiments rarely match the "thoughts and prayers" type online speech-acts, or at least, they are basically never extended as readily.
Nothing personal (I mean, seriously, nothing personal)
Little (probably hard) advice for if/when you're going to say something like that to a zoomer irl (based on personal experience from the receiving end):
The "you aren't as alone as it might seem" gets the "what you're saying is just factually incorrect and what you're trying to do is to bullshit me and maybe possibly yourself" thing going. I have never heard something like that from a person "in the weeds".
Same for "We'll figure it out". How much time have you personally spent "figuring it out" and how much time have you spent playing hot potato with the problem? How important is it compared to your own problems? I guess, not very, so there is no "us" figuring it out.
Basically, don't be a disingenuous dense motherfucker and don't bullshit other people and yourself. Not saying you personally are doing it, but there are definitely more people that do, than that don't.
> Whatever precipitating causes led to such suffering, know that we're _here_, _now_, together.
The article comments on this though:
"All the things that have traditionally made life worth living — love, community, country, faith, work, and family — have been “debunked.”"
This is absolutely true and no wonder young folks are feeling down. I think the counter-culture types starting 50+ years ago wanted to tear down the old, but forgot to put something constructive in its place. (Well the leftist/Marxist types tried, but then the USSR imploded)