Tuesday, 5 February 2008

Linking 2 tables based on row positions - SQL2000 vs SQL2005

A simple (and common??) scenario - you have 2 tables of ordered integers and want to combine them into one recordset where the first row of the first table is paired with the first row of the second table and so on, like


Table1
val(int)
---
1
3
5
9

Table2
val(int)
---
4
9
14
17


we want to return


val val
--- ---
1   4
3   9
5   14
9   17


And we want to achieve this WITHOUT the use of cursors (obviously!).

If we could assign a row number to each table then we can join on the row number, row1 = row1, row2=row2, etc... now SQL 2005 gives us a couple of ways of doing this using either the RANK() function or the ROW_NUMBER() function.


if object_id('tempdb..#t1') is not null drop table #t1
if object_id('tempdb..#t2') is not null drop table #t2

create table #t1 (val int)
create table #t2 (val int)

insert into #t1 values (1)
insert into #t1 values (3)
insert into #t1 values (5)
insert into #t1 values (9)

insert into #t2 values (4)
insert into #t2 values (9)
insert into #t2 values (14)
insert into #t2 values (17)


using RANK()


select t3a.val, t3b.val
from
(select val, rank() over (order by val) as seq from #t1) t3a
join
(select val, rank() over (order by val) as seq from #t2) t3b
on t3a.seq=t3b.seq


or using ROW_NUMBER()


select t3a.val, t3b.val
from
(select val, row_number() over (order by val) as seq from #t1) t3a
join
(select val, row_number() over (order by val) as seq from #t2) t3b
on t3a.seq=t3b.seq


But in SQL2000 these functions do not exist. An elegant way to achieve this is to join the table to itself and determine the 'rank' by counting the number of elements that are less than or equal to the current i.e.


select t11.val, count(*) as seq
from #t1 t11, #t1 t12 where t11.val>=t12.val group by t11.val


returns


val seq
--- ---
1   1
3   2
5   3
9   4


hence we can use this as the ranking function


select t3a.val, t3b.val
from
(select t11.val, count(*) as seq
from #t1 t11, #t1 t12 where t11.val>=t12.val group by t11.val) t3a
join
(select t21.val, count(*) as seq
from #t2 t21, #t2 t22 where t21.val>=t22.val group by t21.val) t3b
on t3a.seq=t3b.seq


[Credit goes to Richard Dean for the original idea]

No comments: