Comparing graph databases
There are many graph database vendors; see the ranking at https://db-engines.com/en/ranking/graph+dbms and the overview at https://www.marktechpost.com/2024/06/03/top-open-source-graph-databases/
Neo4j is the market leader, but there are many competitors:
- TigerGraph
- ArangoDB
- OrientDB
- Aerospike Graph
- etc.
Some of these databases are multi-model: they combine graph and document models.
You are the CTO of an innovative, groundbreaking, fully funded startup. You have identified the need for a graph database in order to deploy critical user features. Now you must choose the right graph database.
In this workshop, you will pick one graph database from the websites above and compare it to Neo4j.
Your task is to write a report that justifies your choice of database: Neo4j or an alternative.
Base the report partly on hands-on work with both databases and partly on information found online.
The report should include:
- The tests you carried out and the issues you encountered. This is the hands-on part of the work. Start by simply trying to install both databases on your local machine. Do not use cloud-based hosting.
Then investigate at least two of the dimensions listed below.
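Once both databases are installed, a quick check that they are actually running locally can go into the report. The sketch below is one possible way to do it, using only the Python standard library. It assumes default configurations: Neo4j usually listens for Bolt connections on port 7687 and ArangoDB serves HTTP on port 8529. Adjust the ports if your installation differs or if you picked another database.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumed default ports; change them to match your own installation.
DEFAULT_PORTS = {"Neo4j (Bolt)": 7687, "ArangoDB (HTTP)": 8529}

for name, port in DEFAULT_PORTS.items():
    status = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on 127.0.0.1:{port}: {status}")
```

A reachable port only shows that something is listening. You still need to connect with the database's own client or web interface to confirm the installation works.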
- You can use an LLM to generate some text.
- But you must always verify what the LLM has written.
For instance, ask the LLM:
Compare Neo4j and ArangoDB in terms of existing Managed cloud offerings.
The LLM will answer something along the lines of:
Neo4j AuraDB offers specialized graph database capabilities with predictable monthly pricing, while ArangoGraph provides multi-model flexibility with more granular hourly pricing. Choose Neo4j for pure graph workloads requiring advanced graph algorithms, or ArangoDB for applications needing multiple data models in a single platform.
This sounds great, but how do we know if it is true?
It is important that you verify that these statements are correct. You must make sure that what you write is true or at least very probable.
You should include quotes:
- from user-generated content platforms (Reddit is a good source to get a feel for user experience)
- from websites, independent or vendor-run
- from forums, etc.
When you quote a statement, you must include the link to the source of the quote.
Comparison criteria
The list below is quite comprehensive.
Pick a subject, then a specific topic within it, and start investigating.
Total Cost of Ownership
- Licensing costs (open-source vs. commercial)
- Cloud hosting fees (managed vs. self-hosted)
- Support costs and professional services
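To compare total cost of ownership, it helps to put the three cost categories above into one formula. The sketch below is a minimal illustration, not a pricing model: the `CostProfile` class, the option names, and every figure are hypothetical placeholders that you should replace with prices you have verified and cited.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Cost inputs for one database option (all figures are placeholders)."""
    name: str
    license_per_year: float    # commercial license or subscription; 0 for open source
    hosting_per_month: float   # managed service fee or self-hosted infrastructure
    support_per_year: float    # support contract and professional services


def total_cost_of_ownership(profile: CostProfile, years: int) -> float:
    """Sum licensing, hosting, and support costs over the given number of years."""
    yearly = (
        profile.license_per_year
        + 12 * profile.hosting_per_month
        + profile.support_per_year
    )
    return yearly * years


# Hypothetical figures only, chosen to show the arithmetic.
option_a = CostProfile("Option A", license_per_year=0, hosting_per_month=400, support_per_year=5000)
option_b = CostProfile("Option B", license_per_year=20000, hosting_per_month=250, support_per_year=0)

for option in (option_a, option_b):
    cost = total_cost_of_ownership(option, years=3)
    print(f"{option.name}: {cost:,.0f} over 3 years")
```

This simple sum ignores engineering time, migration costs, and price changes as data grows. Mention in your report which of these you left out.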
Scalability & Performance
- Can it handle 10x, 100x your current data?
- What’s the performance degradation curve?
- Can you add nodes without downtime?
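One way to approach the performance degradation question in the hands-on part is to time the same query at increasing data sizes. The sketch below shows a possible harness using only the standard library. The names `time_query`, `degradation_curve`, and `fake_query_factory` are hypothetical, and the workload is a stand-in: replace `fake_query_factory` with a function that loads a dataset of the given size and returns a callable running a real query through your database's driver.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of calling run_query."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def degradation_curve(
    make_query: Callable[[int], Callable[[], object]],
    sizes: List[int],
) -> Dict[int, float]:
    """Map each data size to the median latency of the query built for that size."""
    return {size: time_query(make_query(size)) for size in sizes}


def fake_query_factory(size: int) -> Callable[[], object]:
    """Stand-in workload: summing a list of `size` integers, not a real database query."""
    data = list(range(size))
    return lambda: sum(data)


curve = degradation_curve(fake_query_factory, [1_000, 10_000, 100_000])
for size, seconds in curve.items():
    print(f"{size:>8} items: {seconds * 1000:.3f} ms")
```

The median is used rather than the mean so that one slow run, for example a cold cache, does not dominate the result. Report the hardware and dataset you used alongside the numbers.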
Developer Experience
- Developer productivity (faster development = lower costs)
- Learning curve for your team
- Query language complexity (Cypher vs. SQL-like vs. proprietary)
Multi-Model Support
Reduce technical debt and complexity
- Document storage alongside graphs
- Key-value operations for caching
- Full-text search capabilities
- Geospatial features
Cloud-Native Readiness
- Managed cloud offerings (AWS, GCP, Azure)
- Kubernetes deployment options
- Backup/restore automation
- Monitoring and alerting capabilities
- Auto-scaling features
Vendor Lock-in Risk
Protect your future flexibility
- Open standards compliance
- Data export capabilities
- Query portability between systems
- Community vs. commercial ecosystem
Security & Compliance
- Authentication/authorization systems
- Data encryption (at rest and in transit)
- Audit logging capabilities
- Compliance certifications and regulations (SOC 2, GDPR, etc.)
Ecosystem Integration
Leverage existing investments
- BI/Analytics tools integration (Tableau, Power BI)
- Application frameworks support
- ETL/ELT pipelines compatibility
- Microservices architecture fit
Geostrategic risks
- Country of creation and development
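Once you have investigated several criteria, a weighted decision matrix is one possible way to turn your findings into a justified choice. The sketch below is only an illustration: the criterion names, the weights, the 1 to 5 scores, and the labels "Database A" and "Database B" are all hypothetical. Your own weights should reflect your startup's priorities, and each score should be backed by a test or a cited source.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of scores; weights do not need to sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical weights: a higher number means the criterion matters more.
weights = {"cost": 3, "scalability": 2, "developer_experience": 2, "lock_in": 1}

# Hypothetical scores from 1 (poor) to 5 (excellent).
candidate_scores = {
    "Database A": {"cost": 2, "scalability": 4, "developer_experience": 5, "lock_in": 3},
    "Database B": {"cost": 4, "scalability": 3, "developer_experience": 3, "lock_in": 4},
}

for name, scores in candidate_scores.items():
    print(f"{name}: {weighted_score(scores, weights):.3f}")
```

A close result like this one shows that the outcome depends heavily on the weights, so explain in the report why you chose them.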
Deliverable
- Write the report in a Google Doc.
- Invite me as editor.
- Add the link to the Google Doc report in the Google spreadsheet in the “Compare Neo4j databases” column.
You can also share your report in the Discord channel.