diff --git a/rust/simd/README.md b/rust/simd/README.md
index 0bc57b5f8..11fab57b9 100644
--- a/rust/simd/README.md
+++ b/rust/simd/README.md
@@ -1,4 +1,4 @@
-# WebAssembly SIMD Example
+# WebAssembly SIMD example
 
 Unlike other blockchains, the Internet Computer supports WebAssembly SIMD
 ([Single Instruction, Multiple Data](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data))
@@ -9,38 +9,31 @@ This example showcases different approaches to utilizing the new SIMD instructio
 
 The example consists of a canister named `mat_mat_mul` (matrix-matrix multiplication).
 
-## Prerequisites
+## Deploying from ICP Ninja
 
-This example requires an installation of:
+[![](https://icp.ninja/assets/open.svg)](https://icp.ninja/editor?g=https://github.com/dfinity/examples/tree/master/rust/simd)
 
-- [x] Install the [IC SDK](https://internetcomputer.org/docs/current/developer-docs/getting-started/install). Note: the WebAssembly SIMD support requires `dfx` version `0.20.2-beta.0` or later.
-- [x] Clone the example dapp project: `git clone https://github.com/dfinity/examples`
+## Build and deploy from the command-line
 
-### Example 1: Floating point matrices multiplications
+### 1. [Download and install the IC SDK.](https://internetcomputer.org/docs/building-apps/getting-started/install)
 
-- #### Step 1: Setup project environment
+### 2. Download your project from ICP Ninja using the 'Download files' button on the upper left corner, or [clone the GitHub examples repository.](https://github.com/dfinity/examples/)
 
-Navigate into the folder containing the project's files and start a local instance of the replica with the command:
+### 3. Navigate into the project's directory.
 
-```sh
-cd examples/rust/simd
-dfx start --clean
-```
+### 4. Deploy the project to your local environment:
 
-```sh
-dfx start --clean
-Running dfx start for version 0.20.2-beta.0
-[...]
-Dashboard: http://localhost:63387/_/dashboard
+```
+dfx start --background --clean && dfx deploy
 ```
 
-- #### Step 2: Open another terminal window in the same directory
+### 5. Open another terminal window in the same directory.
 
 ```sh
 cd examples/rust/simd
 ```
 
-- #### Step 3: Compile and deploy `mat_mat_mul` canister
+### 6. Compile and deploy mat_mat_mul canister.
 
 ```sh
 dfx deploy
@@ -57,7 +50,7 @@ URLs:
     mat_mat_mul: http://127.0.0.1/?canisterId=...
 ```
 
-- #### Step 4: Compare the amount of instructions used for different matrix multiplication implementations
+### 7. Compare the amount of instructions used for different matrix multiplication implementations.
 
 Call a loop performing 1K element-wise multiplications of `K x 4` packed slices
 from matrices `A` and `B` using optimized algorithm, the same algorithm with
@@ -84,48 +77,8 @@ In this example, Rust's auto-vectorization shines in optimizing matrix multiplic
 The auto-vectorized code achieves over 10x speedup compared to the optimized version!
 Also, it's on par with the hand-crafted WebAssembly SIMD multiplication.
 
-### Example 2: Integer matrices multiplications
-
-- #### Step 1: Setup project environment
-
-Navigate into the folder containing the project's files and start a local instance of the replica with the command:
-
-```sh
-cd examples/rust/simd
-dfx start --clean
-```
-
-```sh
-dfx start --clean
-Running dfx start for version 0.20.2-beta.0
-[...]
-Dashboard: http://localhost:63387/_/dashboard
-```
-
-- #### Step 2: Open another terminal window in the same directory
-
-```sh
-cd examples/rust/simd
-```
-
-- #### Step 3: Compile and deploy `mat_mat_mul` canister
-
-```sh
-dfx deploy
-```
-
-Example output:
-```sh
-% dfx deploy
-[...]
-Deployed canisters.
-URLs:
-  Backend canister via Candid interface:
-    mat_mat_mul: http://127.0.0.1/?canisterId=...
-```
-
-- #### Step 4: Compare the amount of instructions used for different matrix multiplication implementations
+### 8. Compare the amount of instructions used for different matrix multiplication implementations
 
 Call a loop performing 1K element-wise multiplications of `K x 4` packed slices
 from matrices `A` and `B` using optimized algorithm and the same algorithm
@@ -149,53 +102,6 @@ Rust auto-vectorization again demonstrates its power in this example.
 The auto-vectorized version of the integer matrix multiplication achieves
 more than a 2x speedup compared to the original code.
 
-## Further learning
-
-1. Have a look at the locally running dashboard. The URL is at the end of the `dfx start` command: `Dashboard: http://localhost/...`
-2. Check out `mat_mat_mul` canister Candid user interface. The URLs are at the end of the `dfx deploy` command: `mat_mat_mul: http://127.0.0.1/?canisterId=...`
-
-### Canister interface
-
-The `mat_mat_mul` canister provide the following interface:
-
-- `naive_f32`/`naive_u32` —
-  returns the number of instructions used for a loop performing
-  1K element-wise multiplications of matrices `A` and `B`
-  using naive algorithm.
-- `optimized_f32`/`optimized_u32` —
-  returns the number of instructions used for a loop performing
-  1K element-wise multiplications of `K x 4` packed slices
-  from matrices `A` and `B` using optimized algorithm.
-- `auto_vectorized_f32`/`auto_vectorized_u32` —
-  returns the number of instructions used for a loop performing
-  1K element-wise multiplications of `K x 4` packed slices
-  from matrices `A` and `B` using Rust loop auto-vectorization.
-- `simd_f32` —
-  Returns the number of instructions used for a loop performing
-  1K element-wise multiplications of `K x 4` packed slices
-  from matrices `A` and `B` using WebAssembly SIMD instructions.
-
-Example usage:
-
-```sh
-dfx canister call mat_mat_mul naive_f32
-```
-
-## Conclusion
-
-WebAssembly SIMD instructions unlock new possibilities for the Internet Computer,
-particularly in Machine Learning and Artificial Intelligence dApps. This example
-demonstrates potential 10x speedups for matrix multiplication with minimal effort
-using just Rust's loop auto-vectorization.
-
-As shown in Example 2, integer operations also benefit, although with a more modest
-"2x" speedup.
-
-The actual speedups will vary depending on the specific application and the type
-of operations involved.
-
 ## Security considerations and best practices
 
-If you base your application on this example, we recommend you familiarize
-yourself with and adhere to the [security best practices](https://internetcomputer.org/docs/current/references/security/)
-for developing on the Internet Computer. This example may not implement all the best practices.
+If you base your application on this example, it is recommended that you familiarize yourself with and adhere to the [security best practices](https://internetcomputer.org/docs/building-apps/security/overview) for developing on ICP. This example may not implement all the best practices.
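
Note for readers of this diff: the README text compares a naive multiplication with one that works on packed slices four elements wide. The following is a minimal sketch of that idea, not the `mat_mat_mul` canister's code. The names `naive`, `packed`, `sample`, the `Matrix` alias, and the size `N = 8` are all hypothetical and chosen only for illustration. The inner four-lane loop has a fixed trip count, which is the kind of shape a compiler's loop auto-vectorizer can often turn into SIMD instructions; whether it does depends on the target and build flags, and the sketch makes no claim about instruction counts.

```rust
// Hypothetical sketch: names and sizes are illustrative, not taken from the canister.
const N: usize = 8; // square matrix dimension, assumed to be a multiple of 4

type Matrix = [[f32; N]; N];

/// Naive triple loop: one output element at a time.
fn naive(a: &Matrix, b: &Matrix) -> Matrix {
    let mut c = [[0.0f32; N]; N];
    for i in 0..N {
        for j in 0..N {
            let mut sum = 0.0;
            for k in 0..N {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
    c
}

/// Computes four adjacent output columns together. The fixed-size
/// accumulator and the four-iteration inner loop give the compiler
/// a simple pattern it may auto-vectorize.
fn packed(a: &Matrix, b: &Matrix) -> Matrix {
    let mut c = [[0.0f32; N]; N];
    for i in 0..N {
        for j in (0..N).step_by(4) {
            let mut acc = [0.0f32; 4];
            for k in 0..N {
                let a_ik = a[i][k];
                for lane in 0..4 {
                    acc[lane] += a_ik * b[k][j + lane];
                }
            }
            c[i][j..j + 4].copy_from_slice(&acc);
        }
    }
    c
}

/// Fills a matrix with small whole numbers so every product and sum is exact in f32.
fn sample(seed: f32) -> Matrix {
    let mut m = [[0.0f32; N]; N];
    for i in 0..N {
        for j in 0..N {
            m[i][j] = seed + (i * N + j) as f32;
        }
    }
    m
}

fn main() {
    let a = sample(1.0);
    let b = sample(2.0);
    // Both versions add the same products in the same order, so they agree exactly.
    assert_eq!(naive(&a, &b), packed(&a, &b));
    println!("results match");
}
```

Both functions accumulate over `k` in the same order, so the comparison in `main` checks exact equality rather than a tolerance.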