I had the opportunity to join a great group of people from my organization (ISE) during the Microsoft Hackathon 2024, where I learned a lot about using GraphDBs for RAG (Retrieval Augmented Generation) purposes with AI. Combining knowledge about relationships between multiple elements and LLMs resulted in informative outputs from our own data that we had not seen before.
Using the GraphRAG project (https://github.com/microsoft/graphrag) and regular prompts, we were able to discover insights about how engineers in our organization are using different technologies, as well as commonalities across customer projects that we can leverage in future engagements.
One additional benefit of using GraphRAG is that it generates the Graph DB elements, including nodes and edges, based on the content provided during the indexing process. This is usually a time-consuming effort when designing Graph DBs, and even when it’s not perfect, I was very impressed with the results. In one of the images, I´m showing the generated graph from feeding the project with the “A Christmas Carol” book.
My contribution to the hackathon was to write a utility in Python to project the GraphRAG .parquet files to the Azure CosmosDB implementation of Apache Gremlin, being able to visualize the Graph and run queries against it. A future effort is to augment GraphRAG to support Apache Gremlin natively, as an alternative to the current pysparrow / .parquet files combo.
Big thanks to Amy Huang Caroline Quigg Laurent Bananier Abigail Pham Dylan Freeland and Jeff Stoker for the great collaboration on this project.