How to use ``torch.compile`` on Windows CPU/XPU
===============================================

**Author**: Zhaoqiong Zheng and Xu, Han

Introduction
------------

TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.

This tutorial introduces the steps for using TorchInductor via ``torch.compile`` on Windows CPU/XPU.

Software Installation
---------------------

This section walks through the steps for using ``torch.compile`` on Windows CPU/XPU.

Install a Compiler
^^^^^^^^^^^^^^^^^^

A C++ compiler is required for TorchInductor optimization. This tutorial uses Microsoft Visual C++ (MSVC) as an example.

#. Download and install MSVC.

#. During installation, select **Workloads** and then **Desktop & Mobile**. Check **Desktop Development with C++** and install.

.. image:: ../_static/img/install_msvc.png

.. note::

    Inductor on Windows CPU also supports the LLVM Compiler and the Intel Compiler as C++ compilers for better performance.
    See `Alternative Compiler for better performance on CPU <#alternative-compiler-for-better-performance>`_.

Set Up Environment
^^^^^^^^^^^^^^^^^^

Next, configure the environment.

#. Open a command line environment via cmd.exe.

#. Activate MSVC with the following command::

    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

#. Create and activate a virtual environment.

#. Install PyTorch 2.5 or later for CPU usage. For XPU usage, install PyTorch 2.7 or later by following the *Getting Started on Intel GPU* guide.

#. Here is an example of how to use TorchInductor on Windows:

   .. code-block:: python

      import torch

      device = "cpu"  # or "xpu" for XPU

      def foo(x, y):
          # y is accepted but unused; the result depends only on x
          a = torch.sin(x)
          b = torch.cos(x)
          return a + b

      # Compile foo with TorchInductor, the default torch.compile backend
      opt_foo1 = torch.compile(foo)
      print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))

Below is the output of the above example::

    tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
              1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
            [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
              5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
            [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
              6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
            [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
              8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
            [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
              8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
            [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
              9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
            [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
              1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
            [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
              9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
            [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
             -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
            [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
              1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])

Alternative Compiler for better performance on CPU
--------------------------------------------------

To enhance Inductor performance on Windows CPU, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first.

Intel Compiler
^^^^^^^^^^^^^^

#. Download and install the Windows version of the Intel Compiler.

#. Select it as the compiler for Inductor on Windows by setting the ``CXX`` environment variable: ``set CXX=icx-cl``.

LLVM Compiler
^^^^^^^^^^^^^

#. Download and install the LLVM Compiler, choosing the win64 version.

#. Select it as the compiler for Inductor on Windows by setting the ``CXX`` environment variable: ``set CXX=clang-cl``.

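
Before switching compilers, it can help to confirm that the chosen executable is actually reachable from the current command line. The following is a minimal sketch using only the Python standard library. The helper names ``find_compilers`` and ``select_compiler`` are hypothetical and are not part of PyTorch. The executable names ``icx-cl`` and ``clang-cl`` come from the steps above, and ``cl`` is the MSVC compiler driver. Setting ``CXX`` from Python before ``import torch`` is assumed here to be equivalent to running ``set CXX=...`` in the shell.

```python
import os
import shutil

# Compiler executables to probe: cl (MSVC), icx-cl (Intel), clang-cl (LLVM).
CANDIDATES = ("cl", "icx-cl", "clang-cl")


def find_compilers(names=CANDIDATES):
    """Map each compiler name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def select_compiler(preferred):
    """Set CXX to `preferred` if it is on PATH; return True on success.

    Hypothetical helper: it only sets the environment variable for this
    process and leaves CXX untouched when the compiler is not found.
    """
    if shutil.which(preferred) is None:
        return False
    os.environ["CXX"] = preferred
    return True


if __name__ == "__main__":
    # Report which compilers are visible in this environment.
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'not found on PATH'}")

    # Assumption: do this before importing torch so Inductor sees the value.
    if not select_compiler("clang-cl"):
        print("clang-cl not found; CXX left unchanged")
```

If a compiler is reported as not found, the usual cause is that its environment script (for MSVC, ``vcvars64.bat``) has not been run in the current command line session.
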
Conclusion
----------

This tutorial showed how to use Inductor on Windows CPU with PyTorch 2.5 or later, and on Windows XPU with PyTorch 2.7 or later. The Intel Compiler or the LLVM Compiler can also be used for better performance on CPU.