program story

SQL, 보조 숫자 테이블

inputbox 2020. 12. 15. 19:16
반응형

SQL, 보조 숫자 테이블


특정 유형의 SQL 쿼리의 경우 보조 숫자 테이블이 매우 유용 할 수 있습니다. 특정 작업에 필요한만큼의 행이있는 테이블 또는 각 쿼리에 필요한 행 수를 반환하는 사용자 정의 함수로 생성 될 수 있습니다.

그러한 기능을 만드는 최적의 방법은 무엇입니까?


ㅎㅎ .. 예전 게시물에 너무 늦어서 죄송합니다. 그리고 네,이 스레드에 대한 가장 인기있는 답변 (당시에는 14 개의 다른 메서드에 대한 링크가 포함 된 Recursive CTE 답변)이 음 ... 성능에 문제가 있기 때문에 응답해야했습니다.

첫째, 14 개의 다른 솔루션이 포함 된 기사는 Numbers / Tally 테이블을 즉석에서 생성하는 다양한 방법을 보는 데 적합하지만 기사와 인용 된 스레드에서 지적했듯이 매우 중요한 인용문이 있습니다.

"효율성 및 성능에 대한 제안은 종종 주관적입니다. 쿼리가 사용되는 방식에 관계없이 물리적 구현은 쿼리의 효율성을 결정합니다. 따라서 편향된 지침에 의존하는 대신 쿼리를 테스트하고 어떤 쿼리를 결정하는 것이 중요합니다. 더 잘 수행됩니다. "

아이러니하게도, 기사 자체는 많은 주관적인 진술과 "바이어스 가이드 라인"포함 "재귀 CTE가 목록 번호 생성 할 수 있습니다 매우 효율적을 ""이것은 효율적인 방법 Itzik 벤 겐하여 뉴스 그룹 포스팅에서 WHILE 루프를 사용을" ( 나는 그가 단지 비교 목적으로 게시했다고 확신합니다). 어서 여러분 ... Itzik의 좋은 이름을 언급하는 것만으로도 그 끔찍한 방법을 실제로 사용하게 될 수 있습니다. 저자는 특히 확장성에 직면하여 말도 안되게 잘못된 진술을하기 전에 자신이 설교하는 것을 연습하고 약간의 성능 테스트를해야합니다.

코드가 무엇을하는지 또는 누군가가 "좋아하는"것에 대해 주관적인 주장을하기 전에 실제로 몇 가지 테스트를 수행한다는 생각으로, 여기에 자신의 테스트를 수행 할 수있는 코드가 있습니다. 테스트를 실행중인 SPID에 대한 프로파일 러를 설정하고 직접 확인해보십시오. "즐겨 찾기"번호에 대해 숫자 1000000을 "Search'n'Replace"하고 확인하십시오.

--===== Test for 1000000 rows ==================================
GO
--===== Traditional RECURSIVE CTE method
   WITH Tally (N) AS 
        ( 
         SELECT 1 UNION ALL 
         SELECT 1 + N FROM Tally WHERE N < 1000000 
        ) 
 SELECT N 
   INTO #Tally1 
   FROM Tally 
 OPTION (MAXRECURSION 0);
GO
--===== Traditional WHILE LOOP method
 CREATE TABLE #Tally2 (N INT);
    SET NOCOUNT ON;
DECLARE @Index INT;
    SET @Index = 1;
  WHILE @Index <= 1000000 
  BEGIN 
         INSERT #Tally2 (N) 
         VALUES (@Index);
            SET @Index = @Index + 1;
    END;
GO
--===== Traditional CROSS JOIN table method
 SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
   INTO #Tally3
   FROM Master.sys.All_Columns ac1
  CROSS JOIN Master.sys.ALL_Columns ac2;
GO
--===== Itzik's CROSS JOINED CTE method
   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT N
   INTO #Tally4
   FROM cteTally
  WHERE N <= 1000000;
GO
--===== Housekeeping
   DROP TABLE #Tally1, #Tally2, #Tally3, #Tally4;
GO

여기에 100, 1000, 10000, 100000 및 1000000 값에 대해 SQL 프로필러에서 얻은 숫자가 있습니다.

SPID TextData                                 Dur(ms) CPU   Reads   Writes
---- ---------------------------------------- ------- ----- ------- ------
  51 --===== Test for 100 rows ==============       8     0       0      0
  51 --===== Traditional RECURSIVE CTE method      16     0     868      0
  51 --===== Traditional WHILE LOOP method CR      73    16     175      2
  51 --===== Traditional CROSS JOIN table met      11     0      80      0
  51 --===== Itzik's CROSS JOINED CTE method        6     0      63      0
  51 --===== Housekeeping   DROP TABLE #Tally      35    31     401      0

  51 --===== Test for 1000 rows =============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method      47    47    8074      0
  51 --===== Traditional WHILE LOOP method CR      80    78    1085      0
  51 --===== Traditional CROSS JOIN table met       5     0      98      0
  51 --===== Itzik's CROSS JOINED CTE method        2     0      83      0
  51 --===== Housekeeping   DROP TABLE #Tally       6    15     426      0

  51 --===== Test for 10000 rows ============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method     434   344   80230     10
  51 --===== Traditional WHILE LOOP method CR     671   563   10240      9
  51 --===== Traditional CROSS JOIN table met      25    31     302     15
  51 --===== Itzik's CROSS JOINED CTE method       24     0     192     15
  51 --===== Housekeeping   DROP TABLE #Tally       7    15     531      0

  51 --===== Test for 100000 rows ===========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method    4143  3813  800260    154
  51 --===== Traditional WHILE LOOP method CR    5820  5547  101380    161
  51 --===== Traditional CROSS JOIN table met     160   140     479    211
  51 --===== Itzik's CROSS JOINED CTE method      153   141     276    204
  51 --===== Housekeeping   DROP TABLE #Tally      10    15     761      0

  51 --===== Test for 1000000 rows ==========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method   41349 37437 8001048   1601
  51 --===== Traditional WHILE LOOP method CR   59138 56141 1012785   1682
  51 --===== Traditional CROSS JOIN table met    1224  1219    2429   2101
  51 --===== Itzik's CROSS JOINED CTE method     1448  1328    1217   2095
  51 --===== Housekeeping   DROP TABLE #Tally       8     0     415      0

As you can see, the Recursive CTE method is the second worst only to the While Loop for Duration and CPU and has 8 times the memory pressure in the form of logical reads than the While Loop. It's RBAR on steroids and should be avoided, at all cost, for any single row calculations just as a While Loop should be avoided. There are places where recursion is quite valuable but this ISN'T one of them.

As a side bar, Mr. Denny is absolutely spot on... a correctly sized permanent Numbers or Tally table is the way to go for most things. What does correctly sized mean? Well, most people use a Tally table to generate dates or to do splits on VARCHAR(8000). If you create an 11,000 row Tally table with the correct clustered index on "N", you'll have enough rows to create more than 30 years worth of dates (I work with mortgages a fair bit so 30 years is a key number for me) and certainly enough to handle a VARCHAR(8000) split. Why is "right sizing" so important? If the Tally table is used a lot, it easily fits in cache which makes it blazingly fast without much pressure on memory at all.

Last but not least, every one knows that if you create a permanent Tally table, it doesn't much matter which method you use to build it because 1) it's only going to be made once and 2) if it's something like an 11,000 row table, all of the methods are going to run "good enough". So why all the indigination on my part about which method to use???

The answer is that some poor guy/gal who doesn't know any better and just needs to get his or her job done might see something like the Recursive CTE method and decide to use it for something much larger and much more frequently used than building a permanent Tally table and I'm trying to protect those people, the servers their code runs on, and the company that owns the data on those servers. Yeah... it's that big a deal. It should be for everyone else, as well. Teach the right way to do things instead of "good enough". Do some testing before posting or using something from a post or book... the life you save may, in fact, be your own especially if you think a recursive CTE is the way to go for something like this. ;-)

Thanks for listening...


The most optimal function would be to use a table instead of a function. Using a function causes extra CPU load to create the values for the data being returned, especially if the values being returned cover a very large range.


This article gives 14 different possible solutions with discussion of each. The important point is that:

suggestions regarding efficiency and performance are often subjective. Regardless of how a query is being used, the physical implementation determines the efficiency of a query. Therefore, rather than relying on biased guidelines, it is imperative that you test the query and determine which one performs better.

I personally liked:

WITH Nbrs ( n ) AS (
    SELECT 1 UNION ALL
    SELECT 1 + n FROM Nbrs WHERE n < 500 )
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 500 )

This view is super fast and contains all positive int values.

CREATE VIEW dbo.Numbers
WITH SCHEMABINDING
AS
    WITH Int1(z) AS (SELECT 0 UNION ALL SELECT 0)
    , Int2(z) AS (SELECT 0 FROM Int1 a CROSS JOIN Int1 b)
    , Int4(z) AS (SELECT 0 FROM Int2 a CROSS JOIN Int2 b)
    , Int8(z) AS (SELECT 0 FROM Int4 a CROSS JOIN Int4 b)
    , Int16(z) AS (SELECT 0 FROM Int8 a CROSS JOIN Int8 b)
    , Int32(z) AS (SELECT TOP 2147483647 0 FROM Int16 a CROSS JOIN Int16 b)
    SELECT ROW_NUMBER() OVER (ORDER BY z) AS n
    FROM Int32
GO

Using SQL Server 2016+ to generate numbers table you could use OPENJSON :

-- range from 0 to @max - 1
DECLARE @max INT = 40000;

SELECT rn = CAST([key] AS INT) 
FROM OPENJSON(CONCAT('[1', REPLICATE(CAST(',1' AS VARCHAR(MAX)),@max-1),']'));

LiveDemo


에서 가져온 아이디어 우리가 OPENJSON 숫자의 시리즈를 생성하는 데 사용할 수있는 방법은?


편집 : 아래 Conrad의 의견을 참조하십시오.

Jeff Moden의 대답은 훌륭하지만 Postgres에서 E32 행을 제거하지 않으면 Itzik 방법이 실패한다는 것을 알았습니다.

postgres에서 약간 더 빠른 (40ms 대 100ms) 여기 에서 찾은 또 다른 방법 은 postgres에 맞게 조정되었습니다.

WITH 
    E00 (N) AS ( 
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ),
    E01 (N) AS (SELECT a.N FROM E00 a CROSS JOIN E00 b),
    E02 (N) AS (SELECT a.N FROM E01 a CROSS JOIN E01 b ),
    E03 (N) AS (SELECT a.N FROM E02 a CROSS JOIN E02 b 
        LIMIT 11000  -- end record  11,000 good for 30 yrs dates
    ), -- max is 100,000,000, starts slowing e.g. 1 million 1.5 secs, 2 mil 2.5 secs, 3 mill 4 secs
    Tally (N) as (SELECT row_number() OVER (ORDER BY a.N) FROM E03 a)

SELECT N
FROM Tally

SQL Server에서 Postgres 세계로 이동하면서 해당 플랫폼에서 집계 테이블을 수행하는 더 좋은 방법을 놓쳤을 수 있습니다. INTEGER ()? 순서()?


훨씬 나중에, 약간 다른 '전통적인'CTE를 제공하고 싶습니다 (행의 양을 얻기 위해 기본 테이블을 건드리지 않음).

--===== Hans CROSS JOINED CTE method
WITH Numbers_CTE (Digit)
AS
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9)
SELECT HundredThousand.Digit * 100000 + TenThousand.Digit * 10000 + Thousand.Digit * 1000 + Hundred.Digit * 100 + Ten.Digit * 10 + One.Digit AS Number
INTO #Tally5
FROM Numbers_CTE AS One CROSS JOIN Numbers_CTE AS Ten CROSS JOIN Numbers_CTE AS Hundred CROSS JOIN Numbers_CTE AS Thousand CROSS JOIN Numbers_CTE AS TenThousand CROSS JOIN Numbers_CTE AS HundredThousand

This CTE performs more READs then Itzik's CTE but less then the Traditional CTE. However, it consistently performs less WRITES then the other queries. As you know, Writes are consistently quite much more expensive then Reads.

The duration depends heavily on the number of cores (MAXDOP) but, on my 8core, performs consistently quicker (less duration in ms) then the other queries.

I am using:

Microsoft SQL Server 2012 - 11.0.5058.0 (X64) 
May 14 2014 18:34:29 
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

on Windows Server 2012 R2, 32 GB, Xeon X3450 @2.67Ghz, 4 cores HT enabled.

ReferenceURL : https://stackoverflow.com/questions/10819/sql-auxiliary-table-of-numbers

반응형