Data Modeling

In the “Introduction to Graph Databases” blog, we discussed the many applications of graph databases and how mainstream companies across a wide range of industries are using the technology to meet their needs. What they all had in common was the foundational step they took at the very beginning: data modeling.

In this article, we will explore how data modeling plays a critical role in fraud detection, a demanding application where data is constantly changing and the quality of the insights drawn from your data set directly affects how efficiently fraud is caught and prevented.

What is Data Modeling?

Every database has its building blocks (i.e., data objects, their associations, and their governing rules) and a way of organizing them. Mapping out a diagram of how the data will be stored and how its pieces relate to one another is the essence of data modeling. In other words, it is a way to organize and define our data and the relationships within it by giving it a structure. Ultimately, such an abstract model provides guidance on how the actual database will be built.
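As a rough illustration of these building blocks, the sketch below expresses two data objects, one association, and one governing rule with Python dataclasses. The names `User`, `Device`, and `UsedDevice`, and the rule that a device ID must be non-empty, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A data object (entity) with its attributes."""
    user_id: str
    name: str


@dataclass(frozen=True)
class Device:
    """Another data object; the non-empty ID check is an assumed rule."""
    device_id: str

    def __post_init__(self):
        # Governing rule (hypothetical): every device needs an identifier.
        if not self.device_id:
            raise ValueError("device_id must not be empty")


@dataclass(frozen=True)
class UsedDevice:
    """An association: which user used which device."""
    user: User
    device: Device


alice = User("u1", "Alice")
phone = Device("d1")
link = UsedDevice(alice, phone)
print(link.user.name, "used", link.device.device_id)
```

An abstract model like this says nothing yet about tables or nodes; it only fixes what exists, how things relate, and which rules must hold.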

The Necessity of Data Modeling

Data modeling is a must. It is a communication tool that brings the technical team and subject matter experts together. From the beginning, we collaborate to understand the business’s needs and identify the data that drives them. Through this kind of strategic planning, we bring clarity to our data by establishing clearly defined business rules and enforcing them to improve consistency across projects. In practice, these rules describe how the organization will operate in response to different circumstances.

To appreciate the importance of data modeling, let’s look at some of its key outcomes and what they contribute to a business.

Ensures completeness: all required data objects are implemented, which prevents omitted data and errors in future reports.
Reduces costs and accelerates development: data modeling catches errors early, which shortens development time.

The Data Modeling Process

To gain an appreciation of what the process involves, we will walk through both the relational and graph data modeling approaches in the sections that follow. However, some essentials are common to both techniques:

It starts with intensive planning of what your application will do. This requires you to identify your business needs, the goals of your database, and a plan for how they will be achieved.
Classifying data objects as entities or attributes, identifying relationships between entities, determining the types of transactions within the database, and marking the rules that govern the data are important issues to address in this planning phase.
Next, a rough draft of an entity-relationship model (a basic whiteboard sketch) is critical for getting a first look at the connections between your entities and identifying any key attributes.
The next step is to map those attributes to their corresponding entities to capture the meaning behind your business’s needs.
Finally, the data model is validated and then refined over time to accommodate the constant changes in the data.

Relational vs. Graph Data Modeling

Fraud detection is a meticulous process of detecting and deflecting fraudulent activity. One cunning technique poses a present-day challenge to detection analysts: identity fraud rings. These arise when a person, or a group of people, uses false forms of identification (addresses, phone numbers, devices, credit cards, etc.) to purchase goods and services.

From the perspective of the detection analyst, one can imagine how important it is to have visual clarity in your data. Consider this: a criminal detective tracks down a suspect by studying a board of pictures, locations, and dates interconnected with strings. Similarly, the ability to clearly visualize a model is important in fraud detection, because a clear model sharpens your insights while a cluttered one complicates them.

Keep in mind that although fraud rings can be identified with both approaches below, doing so in real time is the primary goal.

The Relational Data Model


In the relational approach, once the whiteboard draft is settled, it is converted to a conceptual model that covers the full set of business concepts. At a high level, we can represent the general entities useful for detecting fraud: identities, accounts, and devices. We then move to a logical model that further defines the structure of the data entities and the relationships among them. This model is in turn mapped to tables and indexes, with JOIN tables linking them so that tables of users, accounts, orders, etc. can be related to one another. In fraud detection, multiple users placing orders from the same device is inevitable, and this is where greater caution is needed in detecting fraudulent activity. In the relational model, this relationship is expressed with a bridge table between users and devices. In the long run, as the data grows, these extra tables add clutter and can slow down the detection and prevention process.
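To make the bridge idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`users`, `devices`, `user_devices`), the columns, and the sample rows are assumptions made for illustration, not a schema taken from a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (user_id   TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, kind TEXT);
    -- Bridge table: one row per (user, device) pair.
    CREATE TABLE user_devices (
        user_id   TEXT REFERENCES users(user_id),
        device_id TEXT REFERENCES devices(device_id),
        PRIMARY KEY (user_id, device_id)
    );
    INSERT INTO users VALUES ('u1', 'Alice'), ('u2', 'Bob'), ('u3', 'Carol');
    INSERT INTO devices VALUES ('d1', 'phone'), ('d2', 'laptop');
    INSERT INTO user_devices VALUES ('u1', 'd1'), ('u2', 'd1'), ('u3', 'd2');
""")

# Devices used by more than one user: two JOINs through the bridge table.
shared_devices = conn.execute("""
    SELECT d.device_id, COUNT(*) AS user_count
    FROM devices d
    JOIN user_devices ud ON ud.device_id = d.device_id
    JOIN users u         ON u.user_id = ud.user_id
    GROUP BY d.device_id
    HAVING COUNT(*) > 1
""").fetchall()
print(shared_devices)
```

Even this small question needs two JOINs. Each additional hop (device to order, order to card, and so on) adds more bridge tables and more JOINs.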

The Graph Data Model


The graph data model is another approach, and it involves fewer steps. Once the whiteboard sketch is settled, instead of converting the sketch (a graph-like template) into tables, all that is needed is to enrich it further: formalizing our entities, adding properties, and establishing relationships between them. In essence, the initial whiteboard sketch closely resembles what will actually be stored in the database. With this approach, we can simply create the user accounts as nodes with links (showing their relationships) to corresponding nodes such as pay accounts, devices, orders, etc. This removes the need for the bridge table seen in the relational model above to accommodate multiple users placing orders from the same device. Consequently, we can reduce our model to precisely what is needed, and this simplicity aids pattern discovery in real time. From here, different algorithms can be applied to identify fraudulent claims.
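The same idea can be sketched in plain Python, with nodes as strings and relationships as labeled edges. This is only an in-memory illustration, not how a graph database stores data internally; the node names and the `USED` and `PLACED` relationship labels are hypothetical.

```python
from collections import defaultdict

# Nodes are plain strings; each edge is (source, relationship, target).
edges = [
    ("user:alice", "USED", "device:d1"),
    ("user:bob", "USED", "device:d1"),
    ("user:carol", "USED", "device:d2"),
    ("user:alice", "PLACED", "order:o1"),
]

# Index the edges in both directions so we can traverse from any node.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for source, relationship, target in edges:
    outgoing[source].append((relationship, target))
    incoming[target].append((relationship, source))


def users_sharing_a_device(user):
    """Follow USED edges to the user's devices, then back to other users."""
    others = set()
    for relationship, device in outgoing[user]:
        if relationship != "USED":
            continue
        for rel_back, other in incoming[device]:
            if rel_back == "USED" and other != user:
                others.add(other)
    return others


print(users_sharing_a_device("user:alice"))
```

Here the link between a user and a device is itself the stored relationship, so no separate bridge structure is needed to ask who else used the same device.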

How the Two Compare

With the advanced fraud rings we see today, fraudsters have outsmarted the traditional prevention technique, which focuses primarily on the specific entities themselves: accounts, orders, devices, etc. However, fraud detection, like many other prominent applications, relies heavily on connections. It is therefore crucial to give the connections between data points a higher priority than the individual data points alone. With that in mind, analysts need to be able to traverse the data and be presented with a model that offers an insightful, easily navigable visual. Since action must be taken quickly before it's too late, an analyst, like the criminal detective, needs the ability to identify any anomalous patterns in a timely manner.
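One simple way to put connections first is to group accounts that are linked through shared identifiers. The sketch below does this with a breadth-first search over a small in-memory graph. The sample accounts, the identifier strings, and the `min_size` threshold of 3 are assumptions made for illustration; real detection would apply further checks before treating a group as a fraud ring.

```python
from collections import defaultdict, deque

# Hypothetical data: each account and the identifiers it was registered with.
account_identifiers = {
    "acct1": {"phone:555-0100", "addr:12 Elm St"},
    "acct2": {"phone:555-0100", "card:1111"},
    "acct3": {"card:1111", "addr:9 Oak Ave"},
    "acct4": {"phone:555-0199"},
}

# Build an undirected graph linking accounts to their identifiers.
neighbors = defaultdict(set)
for account, identifiers in account_identifiers.items():
    for identifier in identifiers:
        neighbors[account].add(identifier)
        neighbors[identifier].add(account)


def connected_accounts(start):
    """Breadth-first search returning every account reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {n for n in seen if n in account_identifiers}


def candidate_rings(min_size=3):
    """Group accounts into connected components and keep the larger ones."""
    rings, visited = [], set()
    for account in sorted(account_identifiers):
        if account in visited:
            continue
        component = connected_accounts(account)
        visited |= component
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


print(candidate_rings())
```

No single account here looks unusual on its own; the group only stands out once the shared phone number and card are followed as connections.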

Data Modeling Is Not a One-Off Activity

Change is inescapable. Users’ needs change and so do business requirements; businesses typically do not abide by one schema. The model you use can make a significant difference in how those changes are addressed. With the relational model, a schema migration is performed every time the database’s schema needs to be updated. Given the inevitability of constant changes in user needs, this task can become quite involved and time-consuming. Also, keep in mind that data preservation is not always guaranteed when making schema changes.

Instead of dealing with the hassle of schema migrations and a rigid schema, why not use a graph data model, whose flexible schema adapts to changes dynamically? After all, graph data models are designed to work with constantly evolving needs while preserving your data and maintaining its integrity, which is crucial for fraud detection applications.
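The contrast can be sketched in a few lines of Python. The relational side uses the built-in `sqlite3` module and an `ALTER TABLE` statement as a stand-in for a migration. The graph-style side uses plain dictionaries as a simplified stand-in for nodes with properties. The table, column, and property names are hypothetical.

```python
import sqlite3

# Relational side: a new attribute requires a schema change (a migration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES ('u1', 'Alice')")
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")  # the migration step
conn.execute("UPDATE users SET phone = '555-0100' WHERE user_id = 'u1'")
relational_row = conn.execute(
    "SELECT user_id, name, phone FROM users"
).fetchone()

# Graph-style side: nodes carry their own properties, so one node can gain
# a new property without touching the others.
nodes = {
    "u1": {"label": "User", "name": "Alice"},
    "u2": {"label": "User", "name": "Bob"},
}
nodes["u1"]["phone"] = "555-0100"

print(relational_row)
print(nodes["u1"], nodes["u2"])
```

Adding one nullable column is the easy case for a migration. Renaming columns or splitting tables on a large, live database is where the effort and risk described above tend to appear.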

Conclusion

Relying on the discrete data points themselves was the essence of the traditional fraud detection method. Recently, fraudsters have outsmarted this method using advanced measures like fraud rings. Rings composed of fake identities can be easily disguised and overlooked, which has necessitated much deeper analysis of the connections linking the data. In other words, the traditional method could detect blatant outliers but not today’s disguised fraud rings, as seen in Figure 3 below.

**Figure 3:** Disguised fraud rings that evade detection with the relational data model but can be uncovered using graph data technology.


Fraud detection is a connected data problem. The solution therefore lies in graph data technology, where relationships between the data can be studied efficiently to draw valuable insights and uncover patterns that may not have been discoverable through the relational model. Using a query language like Cypher, we can gain these advantages by traversing the graph model quickly and easily and running queries in a time-efficient manner. Having put it under the lens, we have seen how effective graph database technology can be in detecting fraudulent behavior and preventing potential threats in the future.

Stay tuned for the next blog in the series for a tutorial on building a real-time recommendation engine!


