Exploring Network Data
As part of a research project, we conducted a social graph analysis based on user interaction data. To make exploratory research on the data possible, I needed to build a pipeline for cleaning, preparing, and eventually ingesting the data into a JanusGraph instance, and then connect to that instance from a Jupyter notebook. The technical challenges of this project included transforming relational data into network data and migrating data across multiple framework and language boundaries.
The following steps describe the general process:
- Reading the relational PostgreSQL data into Node.js
- Cleaning the data
- Generating node and edge lists
- Deploying the JanusGraph / Elasticsearch / Cassandra stack
- Ingesting nodes and edges into JanusGraph using Java
- Creating multiple graphs within the same JanusGraph instance using the Gremlin query language
- Deploying a JupyterLab server
- Connecting to JanusGraph from the notebook using Gremlin-Python
- Social graph analysis and data visualization with Python
- Running an additional web app for visualizing the graph
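The step of turning relational rows into node and edge lists can be sketched as follows. This is a minimal illustration and not the project's actual code: the original pipeline did this step in Node.js, while the sketch uses Python, and the row layout (interaction_id, from_user, to_user, kind), the sample data, and the helper names build_node_and_edge_lists and to_csv are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical rows as they might come out of a relational "interactions" table:
# (interaction_id, from_user, to_user, kind). Not the project's real schema.
rows = [
    (1, "alice", "bob", "reply"),
    (2, "alice", "bob", "like"),
    (3, "bob", "carol", "reply"),
    (4, "carol", "alice", "like"),
    (5, "alice", None, "like"),  # incomplete row, dropped during cleaning
]


def build_node_and_edge_lists(rows):
    """Return (nodes, edges): sorted user ids and weighted directed edges."""
    weights = Counter()
    for _id, src, dst, _kind in rows:
        # Cleaning: skip rows with a missing endpoint and self-interactions.
        if not src or not dst or src == dst:
            continue
        # Repeated interactions between the same pair become an edge weight.
        weights[(src, dst)] += 1
    nodes = sorted({user for pair in weights for user in pair})
    edges = [(src, dst, w) for (src, dst), w in sorted(weights.items())]
    return nodes, edges


def to_csv(header, records):
    """Serialize records as CSV text, e.g. as input for a separate ingest step."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


nodes, edges = build_node_and_edge_lists(rows)
node_csv = to_csv(["id"], [[n] for n in nodes])
edge_csv = to_csv(["source", "target", "weight"], edges)
print(edge_csv)
```

Collapsing repeated interactions into a weight is one possible design choice. Keeping one edge per interaction, with the interaction kind as an edge property, would be an equally valid way to model the same data.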
With this architecture and process, we were able to gain insights from the data and successfully conclude the research project. The notebook setup also allowed researchers to conduct further analysis and include data visualizations in their reports.
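To give a sense of the kind of analysis a notebook cell might run, here is a small sketch that computes weighted in-degree and out-degree from an edge list. It is illustrative only: the edge data and the function name degree_summary are hypothetical, and it uses plain Python rather than the graph queries or libraries used in the project.

```python
from collections import defaultdict

# Hypothetical weighted, directed edge list: (source, target, weight).
edges = [
    ("alice", "bob", 2),
    ("bob", "carol", 1),
    ("carol", "alice", 1),
    ("dave", "alice", 3),
]


def degree_summary(edges):
    """Return weighted in-degree and out-degree for every user in the edge list."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst, weight in edges:
        out_deg[src] += weight
        in_deg[dst] += weight
    users = sorted(set(in_deg) | set(out_deg))
    return {u: {"in": in_deg[u], "out": out_deg[u]} for u in users}


summary = degree_summary(edges)
# The user who receives the most interactions (highest weighted in-degree).
most_addressed = max(summary, key=lambda u: summary[u]["in"])
print(most_addressed, summary[most_addressed])
```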