Building the ideal Python development environment: using TensorFlow

1. Build a simple Python environment
2. Use Jupyter in VS Code
3. Integrate with GitHub
4. Build a custom environment with a Dockerfile
5. Use TensorFlow (you are here)
6. Use PyTorch

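In part 4, "Build a custom environment with a Dockerfile", tensorflow is listed in requirements.txt.
The question is whether it actually runs. It does not: TensorFlow complains that libcudart.so.11.0 is missing, probably because the CUDA libraries are not present in the container.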
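One quick way to confirm that guess is to ask the dynamic loader directly. The snippet below is a minimal sketch I am adding for illustration, not part of the original setup. `can_load` is a hypothetical helper name; it only uses the standard `ctypes` module to try loading a shared library by name.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can find and load the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded.
        return False


if __name__ == "__main__":
    # The library name TensorFlow complained about in this post.
    print("libcudart.so.11.0 loadable:", can_load("libcudart.so.11.0"))
```

If this prints `False` inside the container, the CUDA runtime library really is not on the loader's search path.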

Trying a stopgap fix

libcudart.so.11.0 does not exist inside the container, so as a first attempt I installed the CUDA Toolkit.

https://developer.nvidia.com/cuda-11.2.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Debian&target_version=10&target_type=deblocal

An error came up during installation (add-apt-repository: command not found); I resolved it by following the page below.

https://websetnet.net/ja/fix-add-apt-repository-command-not-found-error-on-ubuntu-and-debian/

Will the GPU be used?

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

```
dev-user@71c014b56b9c:/workspaces/study_tensor$ /usr/local/bin/python /workspaces/study_tensor/test.py
2023-05-01 14:49:32.310565: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-01 14:49:32.486163: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-01 14:49:32.486253: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (71c014b56b9c): /proc/driver/nvidia/version does not exist
```

No good...
It complains that there is no driver, and in fact none is visible inside the container.

Hmm...
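The log message points at `/proc/driver/nvidia/version`, so that file is a cheap thing to check before even importing TensorFlow. The following is a minimal sketch under that assumption; `nvidia_driver_visible` is a hypothetical helper name I made up for this post.

```python
from pathlib import Path
from typing import Optional


def nvidia_driver_visible(version_file: str = "/proc/driver/nvidia/version") -> bool:
    """Return True if the NVIDIA kernel driver's version file is visible."""
    return Path(version_file).is_file()


def read_driver_version(version_file: str = "/proc/driver/nvidia/version") -> Optional[str]:
    """Return the contents of the version file, or None if it is not there."""
    path = Path(version_file)
    if not path.is_file():
        return None
    return path.read_text().strip()


if __name__ == "__main__":
    print("driver visible:", nvidia_driver_visible())
    print(read_driver_version())
```

Inside a container that was started without GPU access, I would expect this to report that the driver is not visible, which matches the log above.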

Basing the image on an official Docker image

Next I tried building the environment on top of NVIDIA's official Docker image.
The Dockerfile below is adapted from one I found on GitHub.

#FROM python:3.8

ARG UBUNTU_VERSION=20.04

# Use the official Docker image as the base.
ARG ARCH=
ARG CUDA=11.2
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}.1-base-ubuntu${UBUNTU_VERSION} as base
# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH
ARG CUDA
ARG CUDNN=8.1.0.77-1
ARG CUDNN_MAJOR_VERSION=8
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=8.0.0-1
ARG LIBNVINFER_MAJOR_VERSION=8

# apt-get asks for the time zone during installation, so make it non-interactive
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Tokyo
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# The libXX packages appear to be the libraries needed for the GPU
SHELL ["/bin/bash", "-c"]
RUN apt update && apt upgrade -y
RUN apt-get update && apt-get install -y \
        python3.8 \
        python3-pip \
        bzip2 \
        ca-certificates \
        cmake \
        ffmpeg \
        git \
        libboost-all-dev \
        libglib2.0-0 \
        libjpeg-dev \
        libpq-dev \
        libsdl2-dev \
        libsm6 \
        libxext6 \
        libxrender1 \
        mercurial \
        subversion \
        sudo \
        swig \
        wget \
        xorg-dev \
        xvfb \
        vim \
        zip \
        zlib1g-dev \
        build-essential \
        cuda-command-line-tools-${CUDA/./-} \
        libcublas-${CUDA/./-} \
        cuda-nvrtc-${CUDA/./-} \
        libcufft-${CUDA/./-} \
        libcurand-${CUDA/./-} \
        libcusolver-${CUDA/./-} \
        libcusparse-${CUDA/./-} \
        curl \
        libcudnn8=${CUDNN}+cuda${CUDA} \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common \
        unzip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
    && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
    && ldconfig

ARG USERNAME=dev-user
ARG GROUPNAME=dev-user
ARG UID=1000
ARG GID=1000
ARG PASSWORD=dev-user

# Allow the added user to use sudo
RUN groupadd -g $GID $GROUPNAME && \
    useradd -m -s /bin/bash -u $UID -g $GID $USERNAME  && \
    adduser $USERNAME sudo && \
    echo $USERNAME:$PASSWORD | chpasswd && \
    echo "$USERNAME   ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# The root password is unknown, so set a new one
RUN echo "root:root" | chpasswd

RUN pip install --upgrade pip wheel==0.38.4 setuptools==65.5.1

In addition, specifying "runArgs" in devcontainer.json so that the container can see the GPU makes the GPU usable.

{
	"name": "handson-ml2",
	"build": {
		// Sets the run context to one level up instead of the .devcontainer folder.
		"context": "..",
		// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
		"dockerfile": "../Dockerfile"
		},
	"settings":{
		"terminal.integrated.shell.linux": "/bin/bash",
		"python.pythonPath": "/usr/local/bin/python"
		},
	"extensions":[
		"ms-python.python"
	],
	"runArgs":["--gpus","all","--shm-size","8gb"],
	"remoteUser": "dev-user",
	"postCreateCommand": "pip install -r requirements.txt"
}

When I ran the earlier code again, it completed without errors.

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
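As a side note, `device_lib` lives under `tensorflow.python`, which is an internal module path. TensorFlow 2.x also offers the public `tf.config.list_physical_devices` for the same kind of check. The sketch below is my own addition; `list_gpu_names` is a hypothetical helper, and it returns `None` when TensorFlow is not installed so that it can run anywhere.

```python
from typing import List, Optional


def list_gpu_names() -> Optional[List[str]]:
    """Return the names of the GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    # Public TF 2.x API; returns an empty list when no GPU is visible.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpu_names()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("Visible GPUs:", gpus)
```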

I then ran the book's notebook files one after another, and they all completed without problems.
Judging by resource usage, the GPU is being used as well.
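For a less eyeball-based check of resource usage, `nvidia-smi` can be queried from Python while a notebook is running. This is only a sketch: it assumes `nvidia-smi` is on the PATH inside the container (I have not verified that for this image), and `query_gpu_usage` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def query_gpu_usage() -> Optional[str]:
    """Return name, utilization and memory use per GPU as CSV text, or None."""
    if shutil.which("nvidia-smi") is None:
        # nvidia-smi is not available in this environment.
        return None
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,utilization.gpu,memory.used",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if completed.returncode != 0:
        # For example, the tool exists but cannot talk to the driver.
        return None
    return completed.stdout.strip()


if __name__ == "__main__":
    usage = query_gpu_usage()
    print(usage if usage is not None else "GPU usage not available")
```

Running it a few times during training should show utilization and memory use going up if the GPU is really doing the work.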

That said, installing the packages one by one like this is tedious...

Would installing tensorflow-gpu just solve all of this...?