This article is a guide for grouping
mutually related identifiers in groups using SQL. In math terminology, we could say it will find all connected subgraphs of an undirected graph (Graph Theory - Connectivity).
A graph is said to be connected if there is a path between every pair of a vertex.
Input, connected ident pairs:
Ident1 | Ident2
1 | 2
1 | 3
4 | 5
4 | 6
So, in the example above 1, 2 and 3 are in a group (subgraph) 1.
4, 5 and 6 are in group 2.
Note that 1, 2 and 3 are in the same group although 2 and 3 are not directly related.
Output, idents with related groups:
Ident | Group
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
So, how to make connectivity groups of related elements in SQL?
My suggestion which is relatively easy to achieve is to use sp_GetIdentByGroup stored procedure. This stored procedure expects temporary table #PairIds to be created and filled. #PairIds have connected pairs ids. Result set returns Ident columns with Ids from #PairIds which are assigned to a specific group of connected elements.
First stored procedure sp_GetIdentByGroup should be created and after that sp_GetIdentByGroup can be executed to get a result.
1
2
3
4
5
6
7
8
9
10
11
12
13 | CREATE TABLE #PairIds
(
Ident1 INT,
Ident2 INT
)
INSERT INTO #PairIds
VALUES (1, 2),
(1, 3),
(4, 5),
(4, 6)
exec [dbo].[sp_GetIdentByGroup]
|
This stored procedure use cursor for grouping together related elements. The index makes it relatively fast. It is SQL 2012+ compatible.
Script for create [dbo].[sp_GetIdentByGroup] procedure:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137 | CREATE PROCEDURE [dbo].[sp_GetIdentByGroup]
AS
BEGIN
DECLARE @message VARCHAR(70);
DECLARE @IdentInput1 varchar(20)
DECLARE @IdentInput2 varchar(20)
DECLARE @Counter INT
DECLARE @Group1 INT
DECLARE @Group2 INT
DECLARE @Ident varchar(20)
DECLARE @IdentCheck1 varchar(20)
DECLARE @IdentCheck2 varchar(20)
SET @Counter = 1
DECLARE @IdentByGroupCursor TABLE (
Ident varchar(20) UNIQUE CLUSTERED,
GroupID INT
);
-- Use a cursor to select your data, which enables SQL Server to extract
-- the data from your local table to the variables.
declare ins_cursor cursor for
select Ident1, Ident2 from #PairIds
open ins_cursor
fetch next from ins_cursor into @IdentInput1, @IdentInput2 -- At this point, the data from the first row
-- is in your local variables.
-- Move through the table with the @@FETCH_STATUS=0
WHILE @@FETCH_STATUS=0
BEGIN
SET @Group1 = null
SET @Group2 = null
SELECT TOP 1 @Group1 = GroupID, @IdentCheck1 = Ident
FROM @IdentByGroupCursor
WHERE Ident in (@IdentInput1)
SELECT TOP 1 @Group2 = GroupID, @IdentCheck2 = Ident
FROM @IdentByGroupCursor
WHERE Ident in (@IdentInput2)
IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
BEGIN
IF @Group1 > @Group2
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
GroupID = @Group1
END
IF @Group2 > @Group1
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
GroupID = @Group2
END
END
ELSE IF @Group1 IS NOT NULL
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
Ident IN (@IdentInput1)
END
ELSE IF @Group2 IS NOT NULL
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
Ident IN (@IdentInput2)
END
IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
BEGIN
IF @Group1 > @Group2
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
GroupID = @Group1
END
IF @Group2 > @Group1
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
GroupID = @Group2
END
END
IF @Group1 IS NULL
BEGIN
INSERT INTO @IdentByGroupCursor (Ident, GroupID)
VALUES (@IdentInput1, ISNULL(@Group2, @Counter))
END
IF @Group2 IS NULL
BEGIN
INSERT INTO @IdentByGroupCursor (Ident, GroupID)
VALUES (@IdentInput2, ISNULL(@Group1, @COunter))
END
IF (@Group1 IS NULL OR @Group2 IS NULL)
BEGIN
SET @COunter = @COunter +1
END
-- Once the execution has taken place, you fetch the next row of data from your local table.
fetch next from ins_cursor into @IdentInput1, @IdentInput2
End
-- When all the rows have inserted you must close and deallocate the cursor.
-- Failure to do this will not let you re-use the cursor.
close ins_cursor
deallocate ins_cursor
SELECT Ident ,DENSE_RANK() OVER( ORDER BY GroupID ASC) AS GroupID
FROM @IdentByGroupCursor
END
GO
|