Reset C int array to zero: the fastest way?

program story

inputbox 2020. 9. 5. 09:54

Assuming that I have a T myarray[100] with T = int, unsigned int, long long int or unsigned long long int, what is the fastest way to reset all of its content to zero (not only for initialization, but to reset the content several times in my program)? Maybe with memset?

Same question for a dynamic array like T *myarray = new T[100].

memset (from <string.h>) is probably the fastest standard way, since it's usually a routine written directly in assembly and optimized by hand.

memset(myarray, 0, sizeof(myarray)); // for automatically-allocated arrays
memset(myarray, 0, N*sizeof(*myarray)); // for heap-allocated arrays, where N is the number of elements

By the way, in C++ the idiomatic way would be to use std::fill (from <algorithm>):

std::fill(myarray, myarray+N, 0);

This might be automatically optimized into a memset. I'm quite sure it will be as fast as memset for ints, though it may perform slightly worse for smaller types if the optimizer isn't smart enough. Still, when in doubt, profile.

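This question, although rather old, needs some benchmarks. It asks not for the most idiomatic way, or the way that can be written in the fewest lines, but for the fastest way, and it is silly to answer that question without some actual testing. So I compared four solutions: memset vs. std::fill vs. ZERO from AnT's answer vs. a solution I made using AVX intrinsics.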

Note that this solution is not generic: it only works on 32-bit or 64-bit data. Please comment if this code does something incorrect.

#include <stddef.h>
#include <immintrin.h>
/* Zero the first n elements of a; elements must be 4 or 8 bytes wide.
   Requires AVX (e.g. -mavx or /arch:AVX). */
#define intrin_ZERO(a, n) do {\
    size_t x_ = 0;\
    const size_t n_ = (n);\
    /* elements per 256-bit register: 8 for 32-bit types, 4 for 64-bit types */\
    const size_t inc_ = 32 / sizeof(*(a));\
    /* bulk: unaligned 256-bit stores; x_ + inc_ <= n_ cannot underflow */\
    for (; x_ + inc_ <= n_; x_ += inc_)\
        _mm256_storeu_ps((float *)((a) + x_), _mm256_setzero_ps());\
    /* fewer than 32 bytes remain: one 128-bit store if at least 16 are left */\
    if (n_ - x_ >= 16 / sizeof(*(a))) {\
        _mm_storeu_ps((float *)((a) + x_), _mm_setzero_ps());\
        x_ += 16 / sizeof(*(a));\
    }\
    /* scalar tail: at most 3 ints or 1 long long */\
    for (; x_ < n_; ++x_)\
        (a)[x_] = 0;\
} while (0)

I will not claim that this is the fastest method, since I am not a low-level optimization expert. Rather, it is an example of a correct architecture-dependent implementation that is faster than memset.

Now, onto the results. I measured performance for arrays of 100 ints and 100 long longs, both statically and dynamically allocated. With the exception of MSVC, which performed dead-code elimination on the static arrays, the results were very similar, so I will show only the dynamic-array performance. Times are in ms for 1 million iterations, measured with the low-precision clock() function from time.h.

clang 3.8 (using the clang-cl frontend, optimization flags: /OX /arch:AVX /Oi /Ot):

int:
memset:      99
fill:        97
ZERO:        98
intrin_ZERO: 90

long long:
memset:      285
fill:        286
ZERO:        285
intrin_ZERO: 188

gcc 5.1.0 (optimization flags: -O3 -march=native -mtune=native -mavx):

int:
memset:      268
fill:        268
ZERO:        268
intrin_ZERO: 91

long long:
memset:      402
fill:        399
ZERO:        400
intrin_ZERO: 185

msvc 2015 (optimization flags: /OX /arch:AVX /Oi /Ot):

int:
memset:      196
fill:        613
ZERO:        221
intrin_ZERO: 95

long long:
memset:      273
fill:        559
ZERO:        376
intrin_ZERO: 188

There is a lot of interesting stuff going on here: LLVM soundly beating gcc, and MSVC's typically spotty optimizations (it does an impressive dead-code elimination on static arrays and then has awful performance for fill). Although my implementation is significantly faster, this may only be because it recognizes that bit clearing has much less overhead than any other setting operation.

Clang's implementation deserves a closer look, as it is significantly faster. Some additional testing shows that its memset is in fact specialized for zero: nonzero memsets on a 400-byte array are much slower (~220 ms) and comparable to gcc's. However, nonzero memsetting of an 800-byte array makes no speed difference. That is probably why, for the 800-byte long long array, their memset performs worse than my implementation: the specialization only applies to small arrays, and the cutoff is right around 800 bytes. Also note that gcc's fill and ZERO are not being optimized into memset (judging by the generated code); gcc is simply generating code with identical performance characteristics.

Conclusion: memset is not really optimized for this task as well as people pretend it is (otherwise gcc's, MSVC's and LLVM's memset would all have the same performance). If performance matters, memset should not be the final solution, especially for these awkward medium-sized arrays, because it is not specialized for bit clearing and it is not hand-optimized any better than what the compiler can do on its own.

Using memset():

memset(myarray, 0, sizeof(myarray));

You can use sizeof(myarray) if the size of myarray is known at compile-time. Otherwise, if you are using a dynamically-sized array, such as obtained via malloc or new, you will need to keep track of the length.

You can use memset, but only because our selection of types is restricted to integral types.

In the general case, in C it makes sense to implement a macro:

#define ZERO_ANY(T, a, n) do{\
   T *a_ = (a);\
   size_t n_ = (n);\
   for (; n_ > 0; --n_, ++a_)\
     *a_ = (T) { 0 };\
} while (0)

This gives you C++-like functionality that lets you "reset to zeros" an array of objects of any type without having to resort to hacks like memset. Basically, this is a C analog of a C++ function template, except that you have to specify the type argument explicitly.

On top of that you can build a "template" for non-decayed arrays

#define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))
#define ZERO_ANY_A(T, a) ZERO_ANY(T, (a), ARRAY_SIZE(a))

In your example it would be applied as

int a[100];

ZERO_ANY(int, a, 100);
// or
ZERO_ANY_A(int, a);

It is also worth noting that specifically for objects of scalar types one can implement a type-independent macro

#define ZERO(a, n) do{\
   size_t i_ = 0, n_ = (n);\
   for (; i_ < n_; ++i_)\
     (a)[i_] = 0;\
} while (0)

and

#define ZERO_A(a) ZERO((a), ARRAY_SIZE(a))

turning the above example into

int a[100];

ZERO(a, 100);
// or
ZERO_A(a);

For a static declaration, I think you could use:

T myarray[100] = {0};

For a dynamic allocation, I suggest the same approach: memset.

zero(myarray); is all you need in C++.

Just add this to a header:

#include <cstring>

template<typename T, std::size_t SIZE> inline void zero(T(&arr)[SIZE]){
    std::memset(arr, 0, SIZE*sizeof(T));
}

Here's the function I use:

#include <algorithm>
#include <cstddef>

template<typename T>
static void setValue(T arr[], size_t length, const T& val)
{
    std::fill(arr, arr + length, val);
}

template<typename T, size_t N>
static void setValue(T (&arr)[N], const T& val)
{
    std::fill(arr, arr + N, val);
}

You can call it like this:

//fixed arrays
int a[10];
setValue(a, 0);

//dynamic arrays
int *d = new int[length];
setValue(d, length, 0);

The above is more idiomatic C++ than using memset. You also get a compile-time error if you pass a dynamic array without specifying the size.

Reference URL: https://stackoverflow.com/questions/9146395/reset-c-int-array-to-zero-the-fastest-way
