Cube.js: Headless Semantic Layer (github.com/cube-js)
113 points by klaussilveira on May 1, 2023 | 48 comments


It looks very important, popular and well established, but what is it? I looked at the README hoping to understand what a semantic layer for building data applications means but no love.

> It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application.

Like an ORM? Or a middleware?

> Cube was designed to work with all SQL-enabled data sources, including cloud data warehouses like Snowflake or Google BigQuery, query engines like Presto or Amazon Athena, and application databases like Postgres.

Still not getting it. Is it that it can perform a single query across multiple databases?


As part of the Cube team, I have to admit that all the descriptions in the sibling comments make a lot of sense. Of course, the "semantic layer" concept is well known to data engineers, analysts, and other data folks in general (they also know terms like "metrics store", "headless BI", etc.) but not that well known outside of the data space. It would probably be best to describe the major use cases Cube was created for.

1. Embedded analytics — you have your data somewhere (data warehouse, database, etc.) and you'd like to embed it into a data app. Cube would provide connectivity to data sources, data modeling to define the metrics, caching to make your analytics fast, and APIs and SDKs to deliver them to the data app. E.g., if you decided to add a chart to your front-end app, fetching the data from the API would be as easy as sending a JSON query to Cube.

2. Semantic layer for the internal BI — you have your data somewhere and you'd like to provide access to insights based on that data to business users. Cube would provide connectivity to data sources, data modeling to define the metrics, access control to make sure only ones who need access to metrics have it, caching to make sure every dashboard loads instantly, and APIs to deliver the data to BI tools, notebooks, etc. E.g., if you want to create some dashboards in Superset, Metabase, Tableau, or Power BI, you'd just need to connect Cube's SQL API as if it was a regular database and start creating charts/dashboards.
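To make "sending a JSON query to Cube" from the first use case a bit more concrete, here is a minimal sketch that builds such a request in Python. The host, the token placeholder, and the `orders` cube with its members are assumptions for illustration, not details from this thread; the `/cubejs-api/v1/load` path and the `measures` / `dimensions` / `timeDimensions` keys follow Cube's REST API as I understand it, so check the docs for your version.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Assumptions: local deployment, placeholder token, hypothetical "orders" cube.
CUBE_API_URL = "http://localhost:4000/cubejs-api/v1/load"
API_TOKEN = "<jwt-signed-with-your-api-secret>"


def build_load_request(query: dict) -> Request:
    """Build (but do not send) a GET request carrying the JSON query."""
    params = urlencode({"query": json.dumps(query)})
    return Request(f"{CUBE_API_URL}?{params}", headers={"Authorization": API_TOKEN})


# Daily order counts by status for the last 30 days.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {
            "dimension": "orders.created_at",
            "dateRange": "last 30 days",
            "granularity": "day",
        }
    ],
}

request = build_load_request(query)

# Round-trip check: the query string decodes back to the same JSON query.
decoded = json.loads(parse_qs(urlparse(request.full_url).query)["query"][0])
print(decoded == query)
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` against a running instance would be the next step.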



That makes a lot of sense to me, and I see why it would be hard to coalesce all of that functionality into one or two sentences that would make sense to a more general, non-data, tech audience.


So how does this compare to an embedded analytics service like Sisense or Looker? Is this sort of in between?


My understanding is that it's essentially Looker minus the dashboarding. What you would define via LookML is essentially the "semantic layer" that this is addressing. DBT is attempting to do similar work: https://www.getdbt.com/product/semantic-layer/


"Looker minus dashboarding" plus APIs (SQL/REST/GraphQL) and, subjectively, better aggregate awareness (AKA "pre-aggregations" in Cube).


Cube has saved me hundreds of hours. I use it as the backend for reporting and dashboards inside our SaaS. In our frontend I've built a light version of Power BI, with Cube as the backend. Instead of manipulating SQL directly I use Cube's JSON query format. Kind of difficult to explain, but Cube might be the best piece of software I have ever used.

Maybe a good tagline would be "self-hostable Backend as a Service for data analysis"?


Let’s say you work for a SaaS doing analytics. Your boss says “hey! We need to start reporting on new logos. Can you snag those from the DB?”

But what counts as a new logo? Does a professional services engagement that doesn’t use the product count? What about a business using the SaaS but still in a trial period? Etc.

A semantic layer helps provide common, agreed-upon definitions to the business. So anyone looking for common data entities can just look those things up… and can come to published definitions (which are backed by queries to databases, data lakes, etc).

Does that help? Another example of this would be dbt.


It is kind of like an ORM. I find ORMs and semantic layers to be similar in many ways, except that semantic layers are meant for defining metrics too. These metrics describe how data is aggregated, like summing order amounts to get revenue, or counting order_ids to get sales.
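A toy version of that idea, assuming a made-up `orders` table: metrics are named aggregate expressions, and a small compiler turns metric names into SQL. This is only a sketch of the concept, not how Cube or any other semantic layer is implemented, and every name in it is hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str  # aggregate expression over the orders table


# Hypothetical definitions mirroring the comment: revenue sums order
# amounts, sales counts order ids.
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)"),
    "sales": Metric("sales", "COUNT(order_id)"),
}


def compile_query(metric_names, group_by=None):
    """Turn metric names into one SQL statement against `orders`."""
    selects = [f"{METRICS[m].sql} AS {m}" for m in metric_names]
    if group_by:
        selects.insert(0, group_by)
    sql = f"SELECT {', '.join(selects)} FROM orders"
    if group_by:
        sql += f" GROUP BY {group_by} ORDER BY {group_by}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 10.0), (2, "DE", 15.0), (3, "US", 20.0)],
)

rows = conn.execute(compile_query(["revenue", "sales"], group_by="country")).fetchall()
print(rows)  # [('DE', 25.0, 2), ('US', 20.0, 1)]
```

The point is that consumers ask for "revenue by country" and never restate what revenue means; the definition lives in one place.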

I wrote a series on semantic layers on my substack, hopefully it helps: https://davidsj.substack.com/p/semantic-superiority-part-1


I think ORMs have got some bad press because they were intended to be used bi-directionally: map data from the data source to business objects and back. With semantic layers, data is only mapped to metrics and rarely back - which makes things much simpler, IMO.


I can't vouch for cube itself as I haven't used it but can confidently say such tools are highly valuable. I built one for use in my own business and have operated other businesses on similar tools.

It brings all data together, provides a consistent interface, and is way faster than writing SQL (though there will still be use cases for that). There is some up front cost to getting configured but it pays off in my case at least.

https://github.com/totalhack/zillion


Say you want to build a dashboard with charts and custom timerange selection using data you already have in Postgres/other DB, without killing your DB under the pressure of queries AND without having to write an additional API?

Cube.js is the tool for that. Handles data modeling (you can define a schema on top of your SQL schema), caching, access control and API for you.
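The "without killing your DB" part usually comes down to answering dashboard queries from a precomputed rollup instead of the raw table. A minimal sketch of that idea, with SQLite standing in for both the source database and the cache; the table names and data are invented, and this is not Cube's actual caching mechanism:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2023-04-29 10:00:00", 1),
        ("2023-04-29 18:30:00", 2),
        ("2023-04-30 09:15:00", 1),
        ("2023-05-01 12:00:00", 3),
    ],
)

# Build the rollup once (or on a schedule); this is the only scan of raw data.
conn.execute(
    "CREATE TABLE events_daily AS "
    "SELECT date(ts) AS day, COUNT(*) AS event_count FROM events GROUP BY date(ts)"
)


def events_per_day(start: str, end: str):
    """Serve a dashboard time range from the rollup, not the raw table."""
    return conn.execute(
        "SELECT day, event_count FROM events_daily "
        "WHERE day BETWEEN ? AND ? ORDER BY day",
        (start, end),
    ).fetchall()


print(events_per_day("2023-04-29", "2023-04-30"))  # [('2023-04-29', 2), ('2023-04-30', 1)]
```

Every change of the time-range picker then hits a table with one row per day, however large `events` grows.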


Data modelling is important to highlight: if OP isn't familiar with the concept and the need for it, they likely won’t see the obvious value of Cube.


From what I read it’s a way to expose SQL via APIs (along with stuff needed to do it like auth, perf, query reuse, etc)

Instead of starting from a general purpose web framework+orm you have your data/schema and can query it over http conveniently to build BI/dashboards.


> It looks very important, popular and well established, but what is it?

It's easier to explain what Cube is if we first define what the Semantic Layer (SL) is. In a few words, the SL is the abstract representation of business objects, for example: sales, users, conversion rates, etc. Cube provides the language to define the SL, an API to access it, access control mechanisms and a caching layer. It's important to emphasize that Cube is a stand-alone SL, decoupled from any BI visualization tool. That's the "headless" part, and I would also add that it is "feetless", since it supports multiple source DBs. Looker, the other big name in the space, has the incentive of selling you more usage of BigQuery and of locking you in with their UI; it only recently started to open up to the idea of APIs. The idea is that you have a central place where you define the SL, so you don't need to duplicate the definition in every downstream application, where duplication may lead to errors or inconsistencies.

> Is it that it can perform a single query across multiple databases?

Cube allows you to join data from multiple databases at the caching layer, which is fundamentally different from a federated query engine. From the downstream application's perspective, though, the outcome is the same. Because the join happens at the caching layer, it has inherent advantages and limitations compared with federated queries.
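Here is a rough sketch of what "joining at the caching layer" can mean, as opposed to federating one query across live sources: each source is queried on its own, the results are materialized into a separate store, and the join runs only there. Three in-memory SQLite databases stand in for the two sources and the cache; all table names and data are hypothetical, and this does not reflect Cube's internals.

```python
import sqlite3

# Two independent source databases (stand-ins for, say, Postgres and a warehouse).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, plan TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "free"), (2, "pro"), (3, "pro")]
)

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(2, 50.0), (2, 50.0), (3, 80.0)]
)

# The cache store: each source is queried separately and materialized here.
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE customers_rollup (id INTEGER, plan TEXT)")
cache.execute("CREATE TABLE invoices_rollup (customer_id INTEGER, total REAL)")
cache.executemany(
    "INSERT INTO customers_rollup VALUES (?, ?)",
    crm.execute("SELECT id, plan FROM customers"),
)
cache.executemany(
    "INSERT INTO invoices_rollup VALUES (?, ?)",
    billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
    ),
)

# The cross-source join touches only the cache, never the sources.
revenue_by_plan = cache.execute(
    "SELECT c.plan, COALESCE(SUM(i.total), 0) AS revenue "
    "FROM customers_rollup c "
    "LEFT JOIN invoices_rollup i ON i.customer_id = c.id "
    "GROUP BY c.plan ORDER BY c.plan"
).fetchall()
print(revenue_by_plan)
```

The trade-off shows up directly: queries are fast and the sources are not hit at query time, but the answer is only as fresh as the last materialization.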

I really like this series of articles by David Jayatillake that goes into deeper detail:

1. https://davidsj.substack.com/p/semantic-superiority-part-1
2. https://davidsj.substack.com/p/semantic-superiority-part-2
3. https://davidsj.substack.com/p/semantic-superiority-part-3
4. https://davidsj.substack.com/p/semantic-superiority-part-4
5. https://davidsj.substack.com/p/semantic-superiority-part-5


I saw a project in a similar space, Malloy, go by recently in a post, "What Happened to the Semantic Layer", which I thought nicely set up the space and the problem. https://carlineng.com/?postid=semantic-layer#blog https://news.ycombinator.com/item?id=35715410

They indeed mentioned Cube.js.

Bunch of Malloy links over time: https://hn.algolia.com/?dateRange=all&query=Malloy&sort=byDa...

And cube.js, https://hn.algolia.com/?dateRange=all&query=cube.js&sort=byD...

It's all very interesting to me because I get the sense these folks feel like we haven't figured out how to make meaning from the data we have. As a developer, that rings true; the developer is the agent responsible for understanding what we have in SQL now, how to eke out meaning, and how to structure more info in. The information architectures are so occluded and concealed, so stodgily low level. I love these attempts to try to help us think more about our data.


So good you brought Malloy here. I like it quite a bit because the folks really try to innovate (heck, they even have their own data querying syntax to replace SQL). But what I like even more, being part of the Cube team, is that the "cons" of existing solutions that Carlin mentions in his blog are actually already solved by Cube.

With Cube, data exploration, ideation on the data model, querying, and bringing the insights all the way down to BI tools or data apps take minutes rather than hours or days. Done, case closed :-)


My understanding of Cube is that iterating on the data model requires the user to (1) write SQL to develop a metric (2) edit YAML or JS config to incorporate the new metric (3) issue API request to Cube server and (4) compare results to raw SQL. Am I mistaken? Does Cube offer a smoother way to do this exploration/iteration?


Hey, Carlin! Nice to see you here in comments! (Waving "hi" to the Malloy team.)

Usually, the experience would look like this: one directly develops the data model in YAML (with only bits of SQL, if needed) and instantly explores metrics. No need to start with SQL in a separate tool/place (1), no need to use the API to check metrics (3) (for that, we have Playground, an interactive UI tool), and, thus, no need to compare results to raw SQL (4). You iterate by changing the data model and seeing the metrics in an instant, quite similar to how you work with Malloy, if I may.


My operating theory is that in app centric development the data store is a component of the application. As a result, much of the metadata that changes data into information (data with meaning) is stored in app code and config. Upon ETL to reporting, warehouse, lake, or the like, the app’s semantic layer is lost.


It's a crying shame how little of the data-core makes its way out of applications. For both the app's owning entity & especially for the users.

I paid respects[1] recently yet again to Window Manager Improved Improved (wmii), which kept its state in a 9p filesystem any user could easily browse & modify. The state is still a component of the app/window manager, but it's at least malleable to all.

[1] https://news.ycombinator.com/item?id=35768686


Beware of their quite well hidden opt-out telemetry collection:

https://cube.dev/docs/config#options-reference-telemetry

It's really quite improper to not have this clearly mentioned anywhere obvious.


A hot topic. Related recent discussions on active-by-default telemetry:

Dropbox telemetry can't be disabled (4 days ago, 241 comments) https://news.ycombinator.com/item?id=35724939

1Password to Add Telemetry (7 days ago, 342 comments) https://news.ycombinator.com/item?id=35691383

Telemetry in Front-End Tools (26 days ago, 141 comments) https://news.ycombinator.com/item?id=35458974

Go claims telemetry objectors arguing in bad faith and violating Code of Conduct (77 days ago, 337 comments) https://news.ycombinator.com/item?id=34771472

Transparent telemetry for open-source projects (82 days ago, 305 comments) https://news.ycombinator.com/item?id=34707583

p.s. The story I first thought of when I saw your reply was the Golang discussion, aka "Transparent telemetry" above.


Both the configuration option and environment variable for its anonymous telemetry are documented. It's not hidden.


Fair enough, although it's not as clearly advertised as I think it should be, either.


There used to be a product called Statsbot which was a friendly UI built on top of Cube.js. They shuttered the service a few years back and we're still struggling to find an alternative that is as simple to use by anyone in the company and easy to set up. We've gone through all the usual BI suspects and nothing comes close. I'd love to find the time to rebuild this.


Oh, it's interesting to meet a Statsbot user in the wild! Indeed, Cube was spun off Statsbot and became the foundation on which others can build products like the one mentioned in the sibling comment: Delphi.

Cube acts as the semantic layer, providing the access to data sources and centralizing the data model. Delphi acts as the UI for the end user, enabling them to ask questions in natural language. I've blogged about Cube and Delphi here: https://cube.dev/blog/conversational-interface-for-semantic-.... Also, here's a demo video on YouTube I've recorded recently: https://www.youtube.com/watch?v=FotEaaf20gY


You might take a look at Delphi (https://www.delphihq.com/) which is built on top of semantic layers and recently integrated with Cube.


I've never used Statsbot, but you can easily integrate Cube with Retool. They even have a guide for it; it's the first result on Google for cube + retool.

Also worth mentioning: Retool is really cheap compared to the "Delphi" that I'm seeing shared in other comments.


Here's the Retool guide (https://cube.dev/blog/building-an-internal-dashboard-with-re...) but I don't think Retool should be compared to Delphi; one is a low-code tool builder; the other one is a conversational interface for the semantic layer. Both are great for their purposes, both can be used with Cube, even at the same time :-)


Fair enough, I don't know anything about Delphi except its price tag. Given OP's request for a friendly UI to sit on top of Cube, I thought Retool could fit that use case well. And thanks for sharing the guide; I use and love Cube but I didn't want to pass as a shill :)


Does cube support dynamically figuring out joins or is each cube a hard coded set of joins?

My similar, much less polished project doesn't require you to specify joins ahead of time outside of optionally defining a tree-like table lineage: https://github.com/totalhack/zillion


Yep. Cube would automatically figure out the join path for you on top of the defined join graph in the data model using the Dijkstra algorithm. The best practice however is to use views: https://cube.dev/docs/schema/reference/view/#views. Those can be used to explicitly control join paths and get an effect similar to what Looker Explore can provide.
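For readers who haven't seen join-path resolution before, here is a small sketch of Dijkstra's algorithm over a join graph. The graph, the cube names, the uniform edge cost of 1, and the choice to treat joins as undirected are all assumptions made for illustration; this is not Cube's implementation, whose join definitions and cost model may well differ.

```python
import heapq

# Hypothetical join graph: each edge is a join declared in the data model.
# Every join costs 1 here, so the shortest path is the one with the fewest joins.
JOIN_GRAPH = {
    "orders": {"users": 1, "line_items": 1},
    "users": {"orders": 1, "companies": 1},
    "line_items": {"orders": 1, "products": 1},
    "products": {"line_items": 1, "suppliers": 1},
    "companies": {"users": 1},
    "suppliers": {"products": 1},
}


def join_path(graph, start, goal):
    """Dijkstra's algorithm: return the cheapest list of cubes from start to goal."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # no join path connects the two cubes


print(join_path(JOIN_GRAPH, "companies", "products"))
# ['companies', 'users', 'orders', 'line_items', 'products']
```

When a graph has several equally short paths, which one wins depends on tie-breaking, which is one reason an explicit mechanism such as the views mentioned above can be preferable for controlling join paths.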


Cool, do you have an example showing a config/setup that can do dynamic joins? All the examples I see have joins explicitly defined at the cube level in the yaml config.

Zillion uses networkx to create a graph of tables and relationships so it sounds like it's doing something similar.

Views are a nice guardrail but can also get in the way! I've experienced this frustration first hand (and seen it with my business users) when using tools that focus on such a premise. Guess it depends how complex your data model is and how quickly your business is evolving / adding to that model. In the move-fast-break-stuff phase of a company they can get annoying at least.


What's happening with this license? I am always worried when the GH license detector says ":shrug:" and I don't know of any easy way to diff the license file against what I presume is a GPLv3 base layer(?)

https://github.com/totalhack/zillion/blob/v0.9.14/LICENSE


I just did a diff of it against <https://www.gnu.org/licenses/lgpl-3.0.txt> and the only differences are the lines above "GNU LESSER GENERAL PUBLIC LICENSE" at the top and a blank line at the bottom.


I didn't realize GitHub would struggle with this. I'll consider cleaning that up, thanks!


I’ve toyed around with Cube and it’s certainly a solid and mature product!

My only gripe was the cube definition. It uses a weird JS-like DSL, but not real JS. You can’t use any packages or anything. Feels like a strange limitation. Also lacks types, which are table stakes these days.

We’ll probably adopt it in the coming SaaS build-out!


You actually can use any npm packages. Here's an example on how you can use `node-fetch`: https://cube.dev/docs/schema/advanced/dynamic-schema-creatio.... Same capabilities are coming soon for Python and YAML as well.


(Igor from the Cube team here.) Whoa! Great to see Cube here. Would love to take questions about all things Cube, use cases, our docs, developer experience, etc.


Cube.js originally allowed embedding the app within an express server as just another route. They took this away and insist on using their Docker deployment approach. I'm still using the original versions that support this.


This is a frontend for data, is that right? That's a phenomenal concept if performance is a major priority. Does it work with Salesforce?


Frontend as in "access layer" or "API"? Yes, kind of (more than that). Frontend as in "HTML/SVG/JavaScript in the browser"? Certainly not :)

Re: Salesforce. The best way to integrate the two is to move (ETL) the data from Salesforce to a data warehouse and have Cube connect to that data warehouse. Works really well in the real world!


Is this tool kind of useless? I tried to go through all the documentation on the website but found no way to embed the dashboard inside my Next.js application, or at least to expose a metric as an API to be consumed directly from the frontend.


Come on. The three relevant ways to access the system via API (REST, GraphQL, SQL) plus the available frontend integrations are top-level entries in the documentation (https://cube.dev/docs).


Looks like the documentation has been updated.


https://cube.dev/docs/config/downstream might be a great starting point for that. Here you can get a grasp of how dashboard apps can be built: https://cube.dev/docs/examples#tutorials-front-end-integrati.... Here's the post about embedding into Next.js app in particular: https://cube.dev/blog/building-nextjs-dashboard-with-dynamic....



