TensorFlow: InternalError: Blas SGEMM launch failed
When I run sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}), I get InternalError: Blas SGEMM launch failed. Here is the full error and stack trace:
InternalError Traceback (most recent call last)
<ipython-input-9-a3261a02bdce> in <module>()
1 batch_xs, batch_ys = mnist.train.next_batch(100)
----> 2 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
338 try:
339 result = self._run(None, fetches, feed_dict, options_ptr,
--> 340 run_metadata_ptr)
341 if run_metadata:
342 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
562 try:
563 results = self._do_run(handle, target_list, unique_fetches,
--> 564 feed_dict_string, options, run_metadata)
565 finally:
566 # The movers are no longer used. Delete them.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
635 if handle is None:
636 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637 target_list, options, run_metadata)
638 else:
639 return self._do_call(_prun_fn, self._session, handle, feed_dict,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
657 # pylint: disable=protected-access
658 raise errors._make_specific_exception(node_def, op, error_message,
--> 659 e.code)
660 # pylint: enable=protected-access
661
InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]
Caused by op u'MatMul', defined at:
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py", line 3, in <module>
app.launch_new_instance()
File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 596, in launch_instance
app.start()
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 442, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 883, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 391, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 199, in do_execute
shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2723, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2825, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-d7414c4b6213>", line 4, in <module>
y = tf.nn.softmax(tf.matmul(x, W) + b)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1036, in matmul
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 911, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
Stack: EC2 g2.8xlarge machine, Ubuntu 14.04
This is an old question, but it may help others.
Close any interactive sessions that are active in other processes (if it's an IPython notebook, restart the kernel). This helped me!
Additionally, I use this code to close the kernel's local session during experiments:
if 'session' in locals() and session is not None:
print('Close interactive session')
session.close()
I ran into this problem and solved it by setting allow_soft_placement=True and gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3), which explicitly caps the fraction of GPU memory the process may use. I believe this helped me avoid two TensorFlow processes competing for GPU memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
    gpu_options=gpu_options,  # pass the options in, otherwise the cap has no effect
    allow_soft_placement=True, log_device_placement=True))
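An alternative that is often suggested for the same symptom (my sketch, not part of the original answer) is to let TensorFlow grow its GPU allocation on demand instead of reserving a fixed fraction up front:

# Hedged alternative: allocate GPU memory on demand rather than all at once.
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))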
I got this error when running distributed TensorFlow. Did you check whether any of the workers reported a CUDA_OUT_OF_MEMORY error? If so, it may be related to where you place your weight and bias variables. For example:
with tf.device("/job:paramserver/task:0/cpu:0"):
    W = weight_variable([input_units, num_hidden_units])
    b = bias_variable([num_hidden_units])
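Note that weight_variable and bias_variable are helpers the answer assumes but does not define; a minimal sketch in the style of the TensorFlow MNIST tutorial could look like this:

# Hypothetical helpers (my addition, in the style of the TF MNIST tutorial).
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))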
My environment is Python 3.5, TensorFlow 0.12 and Windows 10 (no Docker). I am training neural networks on both CPU and GPU, and I got the same InternalError: Blas SGEMM launch failed whenever I trained on the GPU.
I could not find the reason why this error happens, but I was able to run my code on the GPU by avoiding the TensorFlow function tensorflow.contrib.slim.one_hot_encoding(). Instead, I do the one-hot encoding operation in numpy (on the input and output variables).
The following code reproduces the error and the fix. It is a minimal setup that learns the function y = x ** 2 using gradient descent.
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
def test_one_hot_encoding_using_tf():
    # This function raises the "InternalError: Blas SGEMM launch failed" when run on the GPU
    # Initialize
    tf.reset_default_graph()
    input_size = 10
    output_size = 100
    input_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='input')
    output_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='output')
    # Define network
    input_oh = slim.one_hot_encoding(input_holder, input_size)
    output_oh = slim.one_hot_encoding(output_holder, output_size)
    W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
    output_v = tf.matmul(input_oh, W1)
    output_v = tf.reshape(output_v, [-1])
    # Define updates
    loss = tf.reduce_sum(tf.square(output_oh - output_v))
    trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    update_model = trainer.minimize(loss)
    # Optimize
    init = tf.initialize_all_variables()
    steps = 1000
    # Force CPU/GPU
    config = tf.ConfigProto(
        # device_count={'GPU': 0}  # uncomment this line to force CPU
    )
    # Launch the tensorflow graph
    with tf.Session(config=config) as sess:
        sess.run(init)
        for step_i in range(steps):
            # Get sample
            x = np.random.randint(0, 10)
            y = np.power(x, 2).astype('int32')
            # Update
            _, l = sess.run([update_model, loss],
                            feed_dict={input_holder: [x], output_holder: [y]})
    # Check model
    print('Final loss: %f' % l)
def test_one_hot_encoding_no_tf():
    # This function does not raise the "InternalError: Blas SGEMM launch failed" when run on the GPU
    def oh_encoding(label, num_classes):
        return np.identity(num_classes)[label:label + 1].astype('int32')
    # Initialize
    tf.reset_default_graph()
    input_size = 10
    output_size = 100
    input_holder = tf.placeholder(shape=[1, input_size], dtype=tf.float32, name='input')
    output_holder = tf.placeholder(shape=[1, output_size], dtype=tf.float32, name='output')
    # Define network
    W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
    output_v = tf.matmul(input_holder, W1)
    output_v = tf.reshape(output_v, [-1])
    # Define updates
    loss = tf.reduce_sum(tf.square(output_holder - output_v))
    trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    update_model = trainer.minimize(loss)
    # Optimize
    init = tf.initialize_all_variables()
    steps = 1000
    # Force CPU/GPU
    config = tf.ConfigProto(
        # device_count={'GPU': 0}  # uncomment this line to force CPU
    )
    # Launch the tensorflow graph
    with tf.Session(config=config) as sess:
        sess.run(init)
        for step_i in range(steps):
            # Get sample
            x = np.random.randint(0, 10)
            y = np.power(x, 2).astype('int32')
            # One hot encoding
            x = oh_encoding(x, 10)
            y = oh_encoding(y, 100)
            # Update
            _, l = sess.run([update_model, loss],
                            feed_dict={input_holder: x, output_holder: y})
    # Check model
    print('Final loss: %f' % l)
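For completeness, a small driver to run both variants back to back (my addition, not part of the original answer):

# Hypothetical driver, not from the original answer.
if __name__ == '__main__':
    test_one_hot_encoding_no_tf()      # numpy one-hot: runs fine on GPU
    test_one_hot_encoding_using_tf()   # slim.one_hot_encoding: triggers the error on GPU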
Maybe you did not free your GPU correctly. If you are using Linux, try ps -ef | grep python to see which jobs are using the GPU, then kill them.
In my case, I had two Python consoles open, both using Keras/TensorFlow. As soon as I closed the old console (forgotten from the previous day), everything started to work correctly.
So it is good to check that you do not have multiple consoles or processes occupying the GPU.
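One quick way to check this (my addition, assuming nvidia-smi is installed and on the PATH):

# Hedged helper: list the processes currently holding GPU memory via nvidia-smi.
import subprocess
print(subprocess.check_output(
    ['nvidia-smi', '--query-compute-apps=pid,process_name,used_memory',
     '--format=csv']).decode())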
I closed all other running Jupyter sessions, and this solved the problem. I think it was a GPU memory issue.
In my case, I first ran
conda clean --all
to clean up tarballs and unused packages.
Then I restarted my IDE (PyCharm, in this case) and it worked well. Environment: Anaconda Python 3.6, Windows 10 64-bit. I installed tensorflow-gpu with the command provided on the Anaconda website.
For me, I got this problem when I tried to run multiple TensorFlow processes (e.g. two), both of which needed access to GPU resources.
A simple solution is to make sure only one TensorFlow process is running at any given time.
For more details, you can see here.
To be clear, TensorFlow will try (by default) to consume the memory of all available GPUs. It cannot be run while other GPU programs are also active. Closing. Feel free to reopen if this is actually another problem.
I encountered this error when running Keras CuDNN tests in parallel with pytest-xdist. The solution was to run them serially.
For me, I got this error when using Keras with TensorFlow as the backend. It was because the deep learning environment in Anaconda was not activated properly, so TensorFlow didn't kick in properly either. I noticed this because the last time I activated my deep learning environment (which is called dl), the prompt in my Anaconda Prompt changed to this:
(dl) C:\Users\georg\Anaconda3\envs\dl\etc\conda\activate.d>set "KERAS_BACKEND=tensorflow"
while before it only showed dl. Therefore, what I did to get rid of the above error was to close my Jupyter notebook and Anaconda Prompt, then relaunch them, several times.
I encountered this error after recently changing my OS to Windows 10, and I never encountered it before on Windows 7.
The error occurs if I load my GPU TensorFlow model while another GPU program is running; in my case a JCuda model loaded as a socket server, which is not large. If I close my other GPU program(s), the TensorFlow model loads successfully.
The JCuda program is not large at all, just around 70 MB, whereas the TensorFlow model is more than 500 MB and much larger. But I am using a 1080 Ti, which has plenty of memory. So it is probably not an out-of-memory problem; it may be some tricky internal issue of TensorFlow regarding the OS or CUDA. (PS: I am using CUDA version 8.0.44 and haven't downloaded a newer version.)
Restarting my Jupyter processes wasn't enough; I had to reboot my computer.
In my case, it is enough to open the Jupyter notebooks on separate servers.
This error only occurs for me if I try to use more than one TensorFlow/Keras model on the same server. It doesn't matter whether I open one notebook, execute it, then close it and try opening another: if they are loaded by the same Jupyter server, the error always happens.
In my case, the network filesystem on which libcublas.so was located simply died. The node was rebooted and everything was fine. Just to add another data point.
Source: https://stackoverflow.com/questions/37337728/tensorflow-internalerror-blas-sgemm-launch-failed