Monday, February 22, 2021

SQL How to group identifiers that are related with each other in specific groups

This article is a guide for grouping mutually related identifiers in groups using SQL. In math terminology, we could say it will find all connected subgraphs of an undirected graph (Graph Theory - Connectivity).

A graph is said to be connected if there is a path between every pair of a vertex.

Input, connected ident pairs:

Ident1 | Ident2 
---------------------------------
 1     | 2         
 1     | 3         
 4     | 5         
 4     | 6        

So, in the example above 1, 2 and 3 are in a group (subgraph) 1.

4, 5 and 6 are in group 2.

Note that 1, 2 and 3 are in the same group although 2 and 3 are not directly related.

Output, idents with related groups:

Ident  | Group
---------------------------------
 1     | 1         
 2     | 1         
 3     | 1         
 4     | 2 
 5     | 2
 6     | 2

So, how to make connectivity groups of related elements in SQL?

My suggestion which is relatively easy to achieve is to use sp_GetIdentByGroup stored procedure. This stored procedure expects temporary table #PairIds to be created and filled. #PairIds have connected pairs ids. Result set returns Ident columns with Ids from #PairIds which are assigned to a specific group of connected elements.

First stored procedure sp_GetIdentByGroup should be created and after that sp_GetIdentByGroup can be executed to get a result.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
CREATE TABLE #PairIds
(
	Ident1 INT,
	Ident2 INT
)

INSERT INTO #PairIds
VALUES (1, 2),
(1, 3),
(4, 5),
(4, 6)

exec [dbo].[sp_GetIdentByGroup]

This stored procedure use cursor for grouping together related elements. The index makes it relatively fast. It is SQL 2012+ compatible.

Script for create [dbo].[sp_GetIdentByGroup] procedure:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
CREATE PROCEDURE [dbo].[sp_GetIdentByGroup]
AS
BEGIN

	DECLARE @message VARCHAR(70);
	DECLARE @IdentInput1 varchar(20)
	DECLARE @IdentInput2 varchar(20)
	DECLARE @Counter INT
	DECLARE @Group1 INT
	DECLARE @Group2 INT
	DECLARE @Ident varchar(20)
	DECLARE @IdentCheck1 varchar(20)
	DECLARE @IdentCheck2 varchar(20)

	SET @Counter = 1

	DECLARE @IdentByGroupCursor TABLE (
	Ident varchar(20) UNIQUE CLUSTERED,
	GroupID INT  
	);

	-- Use a cursor to select your data, which enables SQL Server to extract
	-- the data from your local table to the variables.
	declare ins_cursor cursor for
	select  Ident1, Ident2 from #PairIds


	open ins_cursor
	fetch next from ins_cursor into @IdentInput1, @IdentInput2 -- At this point, the data from the first row
	 -- is in your local variables.

	-- Move through the table with the @@FETCH_STATUS=0
	WHILE @@FETCH_STATUS=0
	BEGIN

		SET @Group1 = null
		SET @Group2 = null

		SELECT TOP 1  @Group1 = GroupID,  @IdentCheck1 = Ident
		FROM @IdentByGroupCursor
		WHERE Ident in (@IdentInput1)

		SELECT TOP 1  @Group2 = GroupID,  @IdentCheck2 = Ident
		FROM @IdentByGroupCursor
		WHERE Ident in (@IdentInput2)

		IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
		BEGIN
			IF @Group1 > @Group2
			BEGIN
				UPDATE @IdentByGroupCursor
				SET GroupID = @Group2
				WHERE
				GroupID = @Group1
			END

			IF @Group2 > @Group1
			BEGIN
				UPDATE @IdentByGroupCursor
				SET GroupID = @Group1
				WHERE
				GroupID = @Group2
			END
		END
		ELSE IF @Group1 IS NOT NULL
		BEGIN
			UPDATE @IdentByGroupCursor
			SET GroupID = @Group1
			WHERE
			Ident IN (@IdentInput1)
		END
		ELSE IF @Group2 IS NOT NULL
		BEGIN
			UPDATE @IdentByGroupCursor
			SET GroupID = @Group2
			WHERE
			Ident IN (@IdentInput2)
		END

		IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
		BEGIN
			IF @Group1 > @Group2
			BEGIN
				UPDATE @IdentByGroupCursor
				SET GroupID = @Group2
				WHERE
				GroupID = @Group1
			END

			IF @Group2 > @Group1
			BEGIN
				UPDATE @IdentByGroupCursor
				SET GroupID = @Group1
				WHERE
				GroupID = @Group2
			END

		END

			IF @Group1 IS NULL
			BEGIN

				INSERT INTO @IdentByGroupCursor (Ident, GroupID)
				VALUES (@IdentInput1, ISNULL(@Group2, @Counter))

			END

			IF @Group2 IS NULL
			BEGIN
				INSERT INTO @IdentByGroupCursor (Ident, GroupID)
				VALUES (@IdentInput2, ISNULL(@Group1, @COunter))
			END

			IF (@Group1 IS NULL OR @Group2 IS NULL)
			BEGIN

			SET @COunter =  @COunter  +1

		END

		-- Once the execution has taken place, you fetch the next row of data from your local table.
		fetch next from ins_cursor into @IdentInput1, @IdentInput2

	End


	-- When all the rows have inserted you must close and deallocate the cursor.
	-- Failure to do this will not let you re-use the cursor.    
	close ins_cursor
	deallocate ins_cursor


	SELECT Ident ,DENSE_RANK() OVER( ORDER BY GroupID ASC) AS GroupID
	FROM @IdentByGroupCursor

END
GO

No comments: