This article is a guide for grouping mutually related identifiers in groups using SQL. In math terminology, we could say it will find all connected subgraphs of an undirected graph (Graph Theory - Connectivity).
A graph is said to be connected if there is a path between every pair of a vertex.
Input, connected ident pairs:
Ident1 | Ident2
---------------------------------
1 | 2
1 | 3
4 | 5
4 | 6
So, in the example above 1, 2 and 3 are in a group (subgraph) 1.
4, 5 and 6 are in group 2.
Note that 1, 2 and 3 are in the same group although 2 and 3 are not directly related.
Output, idents with related groups:
Ident | Group
---------------------------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
So, how to make connectivity groups of related elements in SQL?
My suggestion which is relatively easy to achieve is to use sp_GetIdentByGroup stored procedure. This stored procedure expects temporary table #PairIds to be created and filled. #PairIds have connected pairs ids. Result set returns Ident columns with Ids from #PairIds which are assigned to a specific group of connected elements.
First stored procedure sp_GetIdentByGroup should be created and after that sp_GetIdentByGroup can be executed to get a result.
1 2 3 4 5 6 7 8 9 10 11 12 13 | CREATE TABLE #PairIds ( Ident1 INT, Ident2 INT ) INSERT INTO #PairIds VALUES (1, 2), (1, 3), (4, 5), (4, 6) exec [dbo].[sp_GetIdentByGroup] |
This stored procedure use cursor for grouping together related elements. The index makes it relatively fast. It is SQL 2012+ compatible.
Script for create [dbo].[sp_GetIdentByGroup] procedure:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 | CREATE PROCEDURE [dbo].[sp_GetIdentByGroup] AS BEGIN DECLARE @message VARCHAR(70); DECLARE @IdentInput1 varchar(20) DECLARE @IdentInput2 varchar(20) DECLARE @Counter INT DECLARE @Group1 INT DECLARE @Group2 INT DECLARE @Ident varchar(20) DECLARE @IdentCheck1 varchar(20) DECLARE @IdentCheck2 varchar(20) SET @Counter = 1 DECLARE @IdentByGroupCursor TABLE ( Ident varchar(20) UNIQUE CLUSTERED, GroupID INT ); -- Use a cursor to select your data, which enables SQL Server to extract -- the data from your local table to the variables. declare ins_cursor cursor for select Ident1, Ident2 from #PairIds open ins_cursor fetch next from ins_cursor into @IdentInput1, @IdentInput2 -- At this point, the data from the first row -- is in your local variables. -- Move through the table with the @@FETCH_STATUS=0 WHILE @@FETCH_STATUS=0 BEGIN SET @Group1 = null SET @Group2 = null SELECT TOP 1 @Group1 = GroupID, @IdentCheck1 = Ident FROM @IdentByGroupCursor WHERE Ident in (@IdentInput1) SELECT TOP 1 @Group2 = GroupID, @IdentCheck2 = Ident FROM @IdentByGroupCursor WHERE Ident in (@IdentInput2) IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL) BEGIN IF @Group1 > @Group2 BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group2 WHERE GroupID = @Group1 END IF @Group2 > @Group1 BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group1 WHERE GroupID = @Group2 END END ELSE IF @Group1 IS NOT NULL BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group1 WHERE Ident IN (@IdentInput1) END ELSE IF @Group2 IS NOT NULL BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group2 WHERE Ident IN (@IdentInput2) END IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL) BEGIN IF @Group1 > @Group2 BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group2 WHERE GroupID = @Group1 END IF @Group2 > @Group1 BEGIN UPDATE @IdentByGroupCursor SET GroupID = @Group1 WHERE GroupID = @Group2 END END IF @Group1 IS NULL BEGIN INSERT INTO @IdentByGroupCursor (Ident, GroupID) VALUES (@IdentInput1, ISNULL(@Group2, @Counter)) END IF @Group2 IS NULL BEGIN INSERT INTO @IdentByGroupCursor (Ident, GroupID) VALUES (@IdentInput2, ISNULL(@Group1, @COunter)) END IF (@Group1 IS NULL OR @Group2 IS NULL) BEGIN SET @COunter = @COunter +1 END -- Once the execution has taken place, you fetch the next row of data from your local table. fetch next from ins_cursor into @IdentInput1, @IdentInput2 End -- When all the rows have inserted you must close and deallocate the cursor. -- Failure to do this will not let you re-use the cursor. close ins_cursor deallocate ins_cursor SELECT Ident ,DENSE_RANK() OVER( ORDER BY GroupID ASC) AS GroupID FROM @IdentByGroupCursor END GO |
No comments:
Post a Comment