Installation

The requirements are:

  • A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required, but gcc 12.x is not supported yet)

  • Python 3.9 or later

  • PyTorch 1.8 or later
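
The compiler constraint above can be checked before starting a build. The following is a minimal sketch, assuming gcc is on the PATH and understands `-dumpfullversion`. The helper names (`parse_version`, `gcc_is_supported`, `installed_gcc_version`) are hypothetical and not part of BITorch Engine, and treating every release above 11.x as unsupported is an assumption based on the note that gcc 12.x is not supported yet.

```python
import shutil
import subprocess

MIN_GCC = (9, 4, 0)   # oldest supported release per the requirements above
MAX_GCC_MAJOR = 11    # assumption: anything from 12.x on is treated as unsupported


def parse_version(text):
    """Turn a dotted version string such as '11.4.0' into a tuple of three ints."""
    numbers = [int(part) for part in text.strip().split(".")[:3]]
    # Pad short versions such as '11' to three components.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def gcc_is_supported(version_text):
    """Return True if the version is at least 9.4.0 and its major is at most 11."""
    version = parse_version(version_text)
    return version >= MIN_GCC and version[0] <= MAX_GCC_MAJOR


def installed_gcc_version(executable="gcc"):
    """Ask gcc for its full version; return None if it cannot be determined."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-dumpfullversion"],
                            capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


if __name__ == "__main__":
    found = installed_gcc_version()
    try:
        supported = found is not None and gcc_is_supported(found)
    except ValueError:
        supported = False  # output was not a plain dotted version
    print(f"gcc version: {found}, supported: {supported}")
```

Passing a different executable name, for example `installed_gcc_version("gcc-11")`, checks a specific compiler when several are installed.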

Please check your operating system’s options for the C++ compiler. For more detailed information, you can check the requirements to build PyTorch from source. In addition, to use layers that are accelerated on specific hardware (such as CUDA devices or MacOS M1/M2/M3 chips), we recommend installing:

  • CUDA Toolkit 11.8 or 12.1 for CUDA accelerated layers

  • MLX for mlx-based layers on MacOS

  • CUTLASS for cutlass-based layers

Binary Release

A first experimental binary release for Linux with CUDA 12.1 is ready. It only supports GPUs with a CUDA compute capability of 8.6 or higher (check here). For MacOS or a lower compute capability, build the package from source (additional binary release options are planned for the future). We recommend creating a conda environment to manage the installed CUDA version and other packages:

  1. Create an environment for Python 3.10 and activate it:

conda create -y --name bitorch-engine python=3.10
conda activate bitorch-engine

As an alternative, you can also store the environment in a relative path.

export BITORCH_WORKSPACE="${HOME}/bitorch-workspace"
mkdir -p "${BITORCH_WORKSPACE}" && cd "${BITORCH_WORKSPACE}"
conda create -y --prefix ./conda-env python=3.10
conda activate ./conda-env

  2. Install CUDA (if it is not already installed on the system):

conda install -y -c "nvidia/label/cuda-12.1.0" cuda-toolkit

  3. Install our customized torch, which allows gradients on INT tensors, together with BITorch Engine using pip (this URL is for CUDA 12.1 and Python 3.10; you can find other versions here):

pip install \
  "https://packages.greenbit.ai/whl/cu121/torch/torch-2.3.0-cp310-cp310-linux_x86_64.whl" \
  "https://packages.greenbit.ai/whl/cu121/bitorch-engine/bitorch_engine-0.2.6-cp310-cp310-linux_x86_64.whl"
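
Whether a machine matches the binary release can be checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the wheel check only compares the Python tag in the file name, and the GPU query runs only if some PyTorch build is already installed.

```python
import sys

MIN_CAPABILITY = (8, 6)  # minimum compute capability named for the binary release


def capability_supported(major, minor):
    """Return True if the compute capability (major, minor) is at least 8.6."""
    return (major, minor) >= MIN_CAPABILITY


def python_wheel_tag(version_info=None):
    """Build the CPython tag used in wheel file names, e.g. 'cp310' for Python 3.10."""
    info = sys.version_info if version_info is None else version_info
    return f"cp{info[0]}{info[1]}"


def wheel_matches_python(wheel_name, version_info=None):
    """Check whether a wheel file name carries the given interpreter's tag."""
    if not wheel_name.endswith(".whl"):
        return False
    # Wheel names look like: name-version-pythontag-abitag-platform.whl
    fields = wheel_name[:-len(".whl")].split("-")
    return len(fields) >= 5 and fields[-3] == python_wheel_tag(version_info)


if __name__ == "__main__":
    print("interpreter tag:", python_wheel_tag())
    try:
        import torch  # only present once some PyTorch build is installed
    except ImportError:
        print("torch not installed, skipping GPU check")
    else:
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            print(f"compute capability {major}.{minor}, supported:",
                  capability_supported(major, minor))
        else:
            print("no CUDA device visible")
```

If the interpreter tag differs from the `cp310` in the URLs above, a wheel for the matching Python version is needed instead.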

Build From Source

We provide instructions for the following options:

  • Conda on Linux (with CUDA)

  • Docker (with CUDA)

  • Conda on MacOS (with MLX)

We recommend managing your BITorch Engine installation in a conda environment (otherwise you should adapt/remove certain variables, e.g. CUDA_HOME). You may want to keep everything (environment, code, etc.) in one directory or use the default directory for conda environments. You may wish to adapt the CUDA version to 12.1 where applicable.

Conda on Linux (with CUDA)

To use these instructions, you need to have conda and a suitable C++ compiler installed.

  1. Create an environment for Python 3.9 and activate it:

conda create -y --name bitorch-engine python=3.9
conda activate bitorch-engine

  2. Install CUDA:

conda install -y -c "nvidia/label/cuda-11.8.0" cuda-toolkit

  3. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8 and Python 3.9; you can find other versions here):

pip install "https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl"

  4. To use cutlass layers, you should also install CUTLASS 2.8.0 from source. Adjust CUTLASS_HOME, which is the directory where cutlass is cloned and installed. If you have an older or newer GPU, you may need to add your CUDA compute capability to CUTLASS_NVCC_ARCHS:

export CUTLASS_HOME="/some/path"
mkdir -p "${CUTLASS_HOME}"
git clone --depth 1 --branch "v2.8.0" "https://github.com/NVIDIA/cutlass.git" --recursive "${CUTLASS_HOME}/source"
mkdir -p "${CUTLASS_HOME}/build" && mkdir -p "${CUTLASS_HOME}/install"
cd "${CUTLASS_HOME}/build"
cmake ../source -DCMAKE_INSTALL_PREFIX="${CUTLASS_HOME}/install" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'
make -j 4
cmake --install .

If you have difficulties installing cutlass, you can check the official documentation, use the other layers without installing it or try the docker installation.
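
The value passed to CUTLASS_NVCC_ARCHS in the cmake call above is a semicolon-separated list in which each compute capability is written without the dot (8.6 becomes 86). A small sketch of that conversion follows; the function names are hypothetical, and the added 7.0 entry is only an example of an older GPU.

```python
DEFAULT_CAPABILITIES = [(7, 5), (8, 0), (8, 6)]  # the values used in the cmake call above


def nvcc_arch(major, minor):
    """Format one compute capability without the dot, e.g. (8, 6) -> '86'."""
    return f"{major}{minor}"


def nvcc_archs(capabilities):
    """Join capabilities into a sorted, de-duplicated ';'-separated string."""
    unique = sorted(set(capabilities))
    return ";".join(nvcc_arch(major, minor) for major, minor in unique)


if __name__ == "__main__":
    # Example: extend the defaults with an older 7.0 GPU.
    print(nvcc_archs(DEFAULT_CAPABILITIES + [(7, 0)]))  # prints 70;75;80;86
```

The resulting string can be substituted for `'75;80;86'` in the `-DCUTLASS_NVCC_ARCHS` option.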

As an alternative to the instructions above, you can also store the environment and clone all repositories within one “root” directory.

  1. Set workspace dir (use an absolute path!):

export BITORCH_WORKSPACE="${HOME}/bitorch-workspace"
mkdir -p "${BITORCH_WORKSPACE}" && cd "${BITORCH_WORKSPACE}"

  2. Create an environment for Python 3.9 and activate it:

conda create -y --prefix ./conda-env python=3.9
conda activate ./conda-env

  3. Install CUDA:

conda install -y -c "nvidia/label/cuda-11.8.0" cuda-toolkit

  4. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8 and Python 3.9; you can find other versions here):

pip install "https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl"

  5. To use cutlass layers, you should also install CUTLASS 2.8.0. If you have an older or newer GPU, you may need to add your CUDA compute capability to CUTLASS_NVCC_ARCHS:

export CUTLASS_HOME="${BITORCH_WORKSPACE}/cutlass"
mkdir -p "${CUTLASS_HOME}"
git clone --depth 1 --branch "v2.8.0" "https://github.com/NVIDIA/cutlass.git" --recursive "${CUTLASS_HOME}/source"
mkdir -p "${CUTLASS_HOME}/build" && mkdir -p "${CUTLASS_HOME}/install"
cd "${CUTLASS_HOME}/build"
cmake ../source -DCMAKE_INSTALL_PREFIX="${CUTLASS_HOME}/install" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'
make -j 4
cmake --install .
cd "${BITORCH_WORKSPACE}"

If you have difficulties installing cutlass, you can check the official documentation, use the other layers without installing it or try the docker installation.

After setting up the environment, clone the code and build with pip (to hide the build output, remove -v):

# make sure you are in a suitable directory, e.g. your bitorch workspace
git clone --recursive https://github.com/GreenBitAI/bitorch-engine
cd bitorch-engine
# only gcc versions 9.x, 10.x, 11.x are supported
# to select the correct gcc, use:
# export CC=gcc-11 CPP=g++-11 CXX=g++-11
CPATH="${CUTLASS_HOME}/install/include" CUDA_HOME="${CONDA_PREFIX}" pip install -e . -v
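
When the build is driven from a script, the two variables in the command above (CPATH and CUDA_HOME) can be assembled and sanity-checked first. The sketch below only mirrors that command; `build_environment` and `missing_paths` are hypothetical helper names, and the paths in the example are placeholders.

```python
import os
from pathlib import Path


def build_environment(cutlass_home, conda_prefix, base_env=None):
    """Return a copy of the environment with CPATH and CUDA_HOME set as in the command above."""
    env = dict(os.environ if base_env is None else base_env)
    env["CPATH"] = str(Path(cutlass_home) / "install" / "include")
    env["CUDA_HOME"] = str(conda_prefix)
    return env


def missing_paths(env):
    """List which of the two variables point to directories that do not exist."""
    return [name for name in ("CPATH", "CUDA_HOME")
            if not Path(env[name]).is_dir()]


if __name__ == "__main__":
    # Placeholder paths; replace them with your CUTLASS_HOME and CONDA_PREFIX.
    env = build_environment("/some/path", os.environ.get("CONDA_PREFIX", "/some/conda/env"))
    print("missing directories:", missing_paths(env))
```

Passing the returned dictionary as `env=` to `subprocess.run` with the arguments `[sys.executable, "-m", "pip", "install", "-e", ".", "-v"]` would correspond to the shell command above.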

Docker (with CUDA)

You can also use our prepared Dockerfile to build a docker image (which includes building the engine under /bitorch-engine):

cd docker
docker build -t bitorch/engine .
docker run -it --rm --gpus all --volume "/path/to/your/project":"/workspace" bitorch/engine:latest

Check the docker readme for options and more details.

Conda on MacOS (with MLX)

  1. We recommend creating a virtual environment and activating it. The following example uses a conda environment for Python 3.9, but virtualenv should work as well.

conda create -y --name bitorch-engine python=3.9
conda activate bitorch-engine

  2. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for MacOS with Python 3.9; you can find other versions here):

pip install "https://packages.greenbit.ai/whl/macosx/torch/torch-2.2.1-cp39-none-macosx_11_0_arm64.whl"

  3. To use OpenMP acceleration on MacOS, install OpenMP with Homebrew and configure the environment:

brew install libomp
# the libomp installation should remind you that you need something like this:
export LDFLAGS="-L$(brew --prefix)/opt/libomp/lib"
export CPPFLAGS="-I$(brew --prefix)/opt/libomp/include"

  4. To use the mlx-accelerated MPQLinearLayer, you need to install the mlx Python library:

# use one of the following, to either install with pip or conda:
pip install mlx==0.4.0
conda install conda-forge::mlx=0.4.0

Currently, we have only tested version 0.4.0, although newer versions might also work. To train the MPQLinearLayer, you need to install our custom PyTorch version (see steps above). Without it, you need to specify requires_grad=False when initializing MPQLinearLayer.

  5. You should now be able to build with:

git clone --recursive https://github.com/GreenBitAI/bitorch-engine
cd bitorch-engine
pip install -e . -v
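
Because only mlx 0.4.0 has been tested, it can help to confirm which version ended up in the environment. The following sketch uses only the Python standard library; the helper names and the three status labels are hypothetical.

```python
from importlib import metadata

TESTED_MLX_VERSION = "0.4.0"  # the only version tested according to the text above


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def mlx_status(version):
    """Classify an mlx version string relative to the tested version."""
    if version is None:
        return "missing"
    if version == TESTED_MLX_VERSION:
        return "tested"
    return "untested"


if __name__ == "__main__":
    version = installed_version("mlx")
    print(f"mlx version: {version}, status: {mlx_status(version)}")
```

A status of "untested" does not mean the version is broken, only that it differs from the one that was tested.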