Using Shortest Paths for Risk Assessment and Compliance
Business Case: Transaction Network Risk Analysis
Financial institutions need to assess the risk of new customers by understanding their potential connections to known high-risk entities. In a transaction network:
- Nodes represent entities (individuals or businesses)
- Edges represent transactions or other relationships
- Some nodes are flagged as high-risk (blacklisted)
The key insight is that entities closer to high-risk nodes in the transaction network may pose greater risk themselves.
Why Shortest Paths Matter
The shortest path between two entities in a transaction network represents:
- The minimum number of intermediaries needed to connect them
- The most direct potential influence or relationship path
- A quantifiable measure of "relationship distance"
This analysis helps:
- Score new customers based on proximity to high-risk entities
- Identify potential money laundering routes
- Assess indirect exposure to sanctioned entities
- Support Know Your Customer (KYC) processes
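Before moving to the warehouse-scale implementation, a toy example helps make "relationship distance" concrete. The sketch below is plain Python over hand-made data (it is not part of the IbisGraph workflow) and computes hop counts with a breadth-first search, which is exactly what an unweighted shortest path measures:
from collections import deque

def hop_distances(edges, start):
    # Build an undirected adjacency list from (source, target) pairs
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    # Breadth-first search: each hop adds one intermediary
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# A blacklisted entity connected to a new customer via one intermediary
edges = [("blacklisted", "intermediary"), ("new_customer", "intermediary")]
print(hop_distances(edges, "blacklisted"))
# {'blacklisted': 0, 'intermediary': 1, 'new_customer': 2}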
Implementation with IbisGraph in Snowflake
Here's how to implement this analysis using IbisGraph while keeping all processing within Snowflake:
import ibis
import ibisgraph as ig
# Connect to Snowflake
conn = ibis.snowflake.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA'
)
# Assume we have these tables in Snowflake:
# - TRANSACTIONS: Historical transaction data
# - BLACKLIST: Known high-risk entities
# - NEW_CUSTOMERS: Customers requiring risk assessment
# Load required tables
transactions = conn.table('TRANSACTIONS')
blacklist = conn.table('BLACKLIST')
new_customers = conn.table('NEW_CUSTOMERS')
# Create a graph from transactions
graph = ig.Graph(
    transactions,
    source_col='from_entity',
    target_col='to_entity',
)
# Get list of blacklisted entity IDs
blacklisted_ids = blacklist.select('entity_id').execute().entity_id.tolist()
# Calculate shortest paths from all blacklisted nodes.
# The shortest_paths routine returns a Map<landmark -> distance>;
# here we only need the distance values.
paths = (
    ig.traversal.shortest_paths(graph, landmarks=blacklisted_ids)
    .select(
        ibis._["node_id"],
        ibis._["distances"].values().name("distances"),
    )
)
# Create risk scores based on distances
# (mins/means/length are Ibis per-row array operations)
risk_assessment = (
    paths.select([
        paths.node_id,
        # Minimum distance to any blacklisted entity
        paths.distances.mins().name('min_distance_to_blacklist'),
        # Average distance to blacklisted entities
        paths.distances.means().name('avg_distance_to_blacklist'),
        # How many blacklisted entities are within 2 steps
        paths.distances.filter(lambda x: x <= 2).length().name('close_blacklist_count'),
    ])
)
# Join with new customers to get their risk assessment
new_customer_risk = (
    new_customers
    .join(risk_assessment, new_customers.entity_id == risk_assessment.node_id)
    .select([
        'entity_id',
        'customer_name',
        'min_distance_to_blacklist',
        'avg_distance_to_blacklist',
        'close_blacklist_count'
    ])
)
# Execute the analysis
results = new_customer_risk.execute()
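Because the expression is lazy, nothing is computed until execute() is called, and all of the work happens inside Snowflake. If downstream compliance tooling needs the scores, the result can also be materialized as a table without pulling raw data out of the warehouse; here is a minimal sketch using Ibis's create_table (the table name RISK_ASSESSMENT is a placeholder):
# Materialize the assessment back into Snowflake; RISK_ASSESSMENT is a
# placeholder name, and the computation still runs inside the warehouse
conn.create_table('RISK_ASSESSMENT', obj=new_customer_risk, overwrite=True)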
Risk Scoring Framework
Here's how to interpret and use the distances for risk scoring:
# Create a more sophisticated risk score
risk_scores = (
    new_customer_risk
    .mutate(
        risk_score=ibis.case()
        .when(new_customer_risk.min_distance_to_blacklist <= 1, 1.0)  # Direct connection
        .when(new_customer_risk.min_distance_to_blacklist == 2, 0.7)  # One intermediary
        .when(new_customer_risk.min_distance_to_blacklist == 3, 0.4)  # Two intermediaries
        .when(new_customer_risk.min_distance_to_blacklist == 4, 0.2)  # Three intermediaries
        .else_(0.1)  # More distant
        .end()
    )
)
# Add risk categories
categorized_risks = (
    risk_scores
    .mutate(
        risk_category=ibis.case()
        .when(risk_scores.risk_score >= 0.8, 'HIGH')
        .when(risk_scores.risk_score >= 0.5, 'MEDIUM')
        .else_('LOW')
        .end()
    )
)
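A natural use of the categories is routing the riskiest customers to manual review. As a short usage sketch:
# Pull only HIGH-risk customers, e.g. to feed a manual-review queue
high_risk_customers = categorized_risks.filter(
    categorized_risks.risk_category == 'HIGH'
).execute()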
Advanced Analysis Techniques
Time-Based Analysis
Consider transaction recency in your analysis:
# Restrict the graph to recent transactions
recent_transactions = transactions.filter(
    transactions.transaction_date >= '2024-01-01'
)
# Separate graphs for different time periods
recent_graph = ig.Graph(
    recent_transactions,
    source_col='from_entity',
    target_col='to_entity'
)
historical_graph = ig.Graph(
    transactions.filter(transactions.transaction_date < '2024-01-01'),
    source_col='from_entity',
    target_col='to_entity'
)
# Shortest paths for each time period (compared below)
recent_paths = ig.traversal.shortest_paths(recent_graph, blacklisted_ids)
historical_paths = ig.traversal.shortest_paths(historical_graph, blacklisted_ids)
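The two result sets can then be compared, for example to surface entities whose minimum distance to a blacklisted node has shrunk recently. A sketch under the same assumptions as above (each result exposes a node_id column and a Map-valued distances column):
# Per-period minimum distance to any blacklisted entity
recent_min = recent_paths.select(
    ibis._["node_id"],
    ibis._["distances"].values().mins().name('recent_min_distance'),
)
historical_min = historical_paths.select(
    ibis._["node_id"],
    ibis._["distances"].values().mins().name('historical_min_distance'),
)
# Entities that have moved closer to high-risk nodes in the recent period
moving_closer = (
    recent_min
    .join(historical_min, recent_min.node_id == historical_min.node_id)
    .filter(ibis._.recent_min_distance < ibis._.historical_min_distance)
)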
Transaction Volume Consideration
Weight paths by transaction volumes:
# Create graph with transaction amount weights
weighted_graph = ig.Graph(
    transactions,
    source_col='from_entity',
    target_col='to_entity',
    weight_col='transaction_amount'
)
# Note: shortest-path algorithms treat weights as costs, so larger
# weights make a path "longer", not stronger
weighted_paths = ig.traversal.shortest_paths(weighted_graph, blacklisted_ids)
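If the intent is for high transaction volumes to read as stronger, "closer" connections, the amounts should be inverted before being used as costs, since shortest-path algorithms minimize total weight. A sketch assuming strictly positive transaction_amount values:
# Invert amounts so that high-volume edges have low traversal cost
inv_weighted = transactions.mutate(
    inv_amount=1.0 / transactions.transaction_amount
)
inv_graph = ig.Graph(
    inv_weighted,
    source_col='from_entity',
    target_col='to_entity',
    weight_col='inv_amount'
)
inv_paths = ig.traversal.shortest_paths(inv_graph, blacklisted_ids)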
Benefits of Using IbisGraph with Snowflake
- Data Security
  - Sensitive transaction data never leaves Snowflake
  - Complies with data governance policies
  - Maintains audit trail within Snowflake
- Performance
  - Leverages Snowflake's computational resources
  - Scales automatically with data volume
  - Efficient processing of large transaction networks
- Real-time Analysis
  - Can be integrated into customer onboarding flows
  - Supports continuous monitoring
  - Easy to update as new transactions occur
- Compliance
  - Maintains data lineage
  - Supports regulatory reporting requirements
  - Provides audit trails for risk decisions
Practical Applications
- Customer Onboarding
  - Pre-screen new customers
  - Set initial risk levels
  - Determine required due diligence level
- Ongoing Monitoring
  - Track changes in risk proximity
  - Identify emerging risk patterns
  - Support suspicious activity reporting
- Portfolio Risk Management
  - Assess aggregate exposure to high-risk entities
  - Monitor risk concentration
  - Support strategic decisions
This approach provides a data-driven, scalable solution for risk assessment while maintaining data security and leveraging existing infrastructure. It can be easily integrated into existing compliance workflows and supports both batch and real-time analysis needs.