Social Knowledge Graph Exploration

We apply Social Network Analysis (SNA) measures to a Knowledge Graph to help novice users explore DBpedia more efficiently, without needing to master query languages or graph structures.

Proposal: Social Knowledge Graph: Employing SNA measures to Knowledge Graph

Introduction

When novice users query DBpedia, the information they actually want is often buried under a large number of query results. In this project, we leverage the DBpedia Knowledge Graph to develop a graph-query tool that helps end users obtain information relevant to their input. The tool returns a subgraph in which the queried concept/entity is at the center, surrounded by its most important related concepts (for example the top 5 or top 10, as ranked by Social Network Analysis measures).

Process

The image above shows what the system does after a user submits a query.

The user enters the central entity they want to query.
The system converts the user's input into a SPARQL query.
The system sends the query to the public endpoint.
The system calculates the importance of all returned entities.
The system takes the top-10 most important entities and plots the graph.
The user can click an entity in the graph to continue exploring.
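
The query-building and request steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `build_query` and `fetch_neighbors`, the shape of the query, and the `limit` default are assumptions. The sketch uses the full resource IRI instead of a prefixed name and relies only on Python's standard library.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint

# Characters that the SPARQL grammar does not allow inside <...> IRIs.
_FORBIDDEN_IN_IRI = set('<>"{}|^`\\')


def build_query(entity, limit=1000):
    """Build a SPARQL query for the outgoing links of one DBpedia resource.

    Hypothetical helper: the real system may build its query differently.
    """
    resource = "_".join(entity.split())  # "Albert Einstein" -> "Albert_Einstein"
    if not resource or any(ch in _FORBIDDEN_IN_IRI or ord(ch) <= 0x20 for ch in resource):
        raise ValueError(f"cannot build an IRI from {entity!r}")
    iri = "http://dbpedia.org/resource/" + resource
    return (
        "SELECT ?relation ?neighbor WHERE { "
        f"<{iri}> ?relation ?neighbor . "
        "FILTER(isIRI(?neighbor)) "
        f"}} LIMIT {int(limit)}"
    )


def fetch_neighbors(entity, endpoint=DBPEDIA_ENDPOINT):
    """Send the query to the endpoint and return (relation, neighbor) IRI pairs."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": build_query(entity)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        (row["relation"]["value"], row["neighbor"]["value"])
        for row in data["results"]["bindings"]
    ]
```

Calling `fetch_neighbors("Albert Einstein")` needs network access to the public endpoint; `build_query` can be inspected offline.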

Details

This section provides a more detailed explanation of the above process.

Data preprocessing

Handling user input
- Replace spaces with underscores
- Add backslashes before non-alphanumeric characters
Cleaning the data returned by the public endpoint
- Keep only meaningful entities and relationships; remove, for example, entities that represent times and wiki-related relationships
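
A minimal sketch of both preprocessing steps is shown below. The helper names (`to_local_name`, `clean_pairs`), the date pattern, and the substring test for "wiki" are assumptions made for illustration; the project's actual filtering rules may differ.

```python
import re

# Assumption: a relationship is "wiki-related" if its name contains this keyword.
EXCLUDED_RELATION_KEYWORDS = ("wiki",)

# Assumption: values such as 1879, 1879-03 or 1879-03-14 count as time entities.
DATE_LIKE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def to_local_name(user_input):
    """Turn free text into an escaped name usable after a prefix such as dbr:."""
    name = "_".join(user_input.split())  # spaces -> underscores
    # Backslash-escape every character that is not a letter, digit or underscore,
    # e.g. "Paris (mythology)" becomes Paris_\(mythology\)
    return re.sub(r"([^\w])", r"\\\1", name)


def clean_pairs(pairs):
    """Drop wiki-related relationships and date-like entities."""
    kept = []
    for relation, neighbor in pairs:
        if any(keyword in relation.lower() for keyword in EXCLUDED_RELATION_KEYWORDS):
            continue
        if DATE_LIKE.match(neighbor):
            continue
        kept.append((relation, neighbor))
    return kept
```

Note that SPARQL only permits backslash escapes for a fixed set of punctuation characters in prefixed names, so input containing other symbols may still need to be rejected or handled separately.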

Ranking methods

We use two methods to calculate the importance of entities. The first ranks entities by degree; the second is a new method that we propose.

For the first method, a higher degree suggests that a node is more expandable and therefore more useful for helping users grow the graph.
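
Degree-based ranking can be sketched in a few lines. The function name `top_k_by_degree` and the edge-list input format are assumptions for illustration; ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter


def top_k_by_degree(edges, candidates, k=10):
    """Return the k candidate nodes with the highest degree.

    edges: iterable of (source, target) pairs from the retrieved data.
    candidates: the neighbors of the central entity to be ranked.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    ranked = sorted(candidates, key=lambda node: (-degree[node], node))
    return ranked[:k]


# Toy data, not taken from DBpedia.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(top_k_by_degree(edges, ["B", "C", "D"], k=2))  # ['B', 'C']
```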
Our proposed method consists of the following components:
- Clustering
  - Group similar relationships together
- Normalization
  - Normalize scores within each cluster
- Attenuation
  - Apply different levels of decay based on the normalized ranking
- Ranking
  - Sort all nodes and output them

Our proposed method works well at reducing data bias and at surfacing as many different kinds of attributes as possible.
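
The four components can be illustrated with the sketch below. It is not the authors' implementation: clustering is replaced by grouping on identical relationship names (the real method groups similar relationships), normalization divides by the largest degree in the cluster, and the geometric `decay` factor is an assumed form of attenuation. The degrees in the example are made up.

```python
from collections import defaultdict


def rank_with_attenuation(candidates, k=10, decay=0.5):
    """candidates: iterable of (node, relation, degree) triples."""
    # Clustering (stand-in): one cluster per relationship name.
    clusters = defaultdict(list)
    for node, relation, degree in candidates:
        clusters[relation].append((node, degree))

    scored = []
    for relation, members in clusters.items():
        largest = max(degree for _, degree in members) or 1
        members.sort(key=lambda member: (-member[1], member[0]))
        for position, (node, degree) in enumerate(members):
            normalized = degree / largest          # Normalization within the cluster
            score = normalized * decay ** position  # Attenuation by rank in the cluster
            scored.append((score, node, relation))

    # Ranking: sort all nodes across clusters, ties broken by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(node, relation) for _, node, relation in scored[:k]]


example = [
    ("Physics", "fields", 100),
    ("Philosophy", "fields", 80),
    ("Alfred Kleiner", "doctoral advisor", 3),
    ("United States", "citizenship", 500),
]
print(rank_with_attenuation(example, k=3))
# [('Alfred Kleiner', 'doctoral advisor'), ('Physics', 'fields'), ('United States', 'citizenship')]
```

In this toy example, plain degree ranking would return two `fields` entries in the top three, whereas the attenuated score lets a low-degree node from a different relationship appear.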

The following two figures show the different results obtained by the two methods above when Maxwell is the central node.

{
    "Einstein family": "rdf-schema#seeAlso",
    "United States": "citizenship",
    "German Empire": "birth place",
    "Mileva Marić": "spouse",
    "Fellow of the Royal Society": "award",
    "Heinrich Friedrich Weber": "doctoral advisor",
    "Physics": "fields",
    "University of Oxford": "institution",
    "Philosophy": "fields",
    "General relativity": "famous",
    "Alfred Kleiner": "doctoral advisor"
}

We can see that the method we propose shows more dimensions while taking both scalability (degree) and diversity (relationships) into account.

Because of the limitations of the second method (see Difficulty), the code we release ranks by degree, but you can try the beta version of the proposed method here with Colab.

Note: Although degree-based expansion looks like the simplest option, we chose it after reading the literature and experimenting with alternatives. Several common expansion methods can be seen in this paper.

Difficulty

Reasons why the new method is not used in the final version:

We could not download the data, and compared with the degree method, the new method takes more time to compute.
More importantly, relationship names in DBpedia are not always plain words, so a word dictionary has to be built manually for the word-similarity calculations to work.

e.g. influencedBy -> influence by
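
The sketch below shows one way such a dictionary could be combined with an automatic fallback. The names `MANUAL_DICTIONARY` and `relation_to_words` are hypothetical. The camelCase split alone yields "influenced by"; mapping it to "influence by" is exactly the kind of entry that still has to be added by hand.

```python
import re

# Hand-written entries take priority; this single entry is the example from the text.
MANUAL_DICTIONARY = {"influencedBy": "influence by"}


def relation_to_words(relation, dictionary=MANUAL_DICTIONARY):
    """Convert a relationship name or IRI into space-separated lowercase words."""
    local = re.split(r"[/#]", relation)[-1]  # keep the part after the last / or #
    if local in dictionary:
        return dictionary[local]
    # Insert a space at every lowercase-or-digit to uppercase boundary (camelCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local)
    return spaced.replace("_", " ").lower()


print(relation_to_words("http://dbpedia.org/ontology/doctoralAdvisor"))  # doctoral advisor
print(relation_to_words("rdf-schema#seeAlso"))                           # see also
print(relation_to_words("influencedBy"))                                 # influence by
```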

Dashboard

Users can click nodes in the graph to explore further.

Features

input: Enter the name of an entity, e.g. James Clerk Maxwell or Albert Einstein.
hover: Hover over an edge to see the specific relationship.
click: Click an entity to expand it; the system returns the top-10 entities around the clicked entity.

Development Life Cycle

Tools and Frameworks used for developing this system:

Flask framework (for backend operations and handling requests)
Plotly and D3.js (for visualizations)
NetworkX (for visualization during methodological exploration)
VS Code (for efficient development and continuous integration)

Running on your local system

git clone https://github.com/dbpedia/social-knowledge-graph.git
cd social-knowledge-graph
pip install -r requirements.txt
python app.py

Future Work

Put service on the cloud

We did not deploy the code to a server because the cloud service I had purchased required a VPN connection to reach the public endpoint, and I kept getting errors when trying to download the data, which turned out to be caused by storage problems.

Convert the network requests to local queries

Ideally, we would download the data so that network requests become local queries, and then compute and store the results of both methods in advance, which would be efficient and stable for users. With that in place, we could use a more appealing expansion method.
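
Storing results in advance could look like the sketch below. It is only an illustration of the idea: the class name `TopKCache`, the JSON file layout, and the `compute` callback are assumptions, and a real deployment would more likely use a local triple store or database.

```python
import json
from pathlib import Path


class TopKCache:
    """Keep computed top-k results in a JSON file so repeated queries stay local."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get_or_compute(self, entity, compute):
        """Return the stored result for entity, calling compute(entity) only once."""
        if entity not in self.data:
            self.data[entity] = compute(entity)
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False), encoding="utf-8"
            )
        return self.data[entity]
```

Here `compute` stands for whichever ranking function is in use; it runs on a cache miss, and later lookups for the same entity are answered from the file.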
