Rust Project Structure for rust-canto

Lessons from vibe coding

Background

I created my first crate, rust-canto, even though I didn’t know Rust, so that I could bring automatic Cantonese word segmentation and romanization into Typst.

Moving from a complex xtask workspace to a streamlined build.rs and build.sh workflow is a common evolution for Rust projects, especially when targeting WebAssembly.

Five lessons learnt from a chatbot

1. Avoid the “Crates.io Path Dependency” Trap

  • The Problem: Using a Workspace with a local helper crate.

    [dependencies]
    xtask = { path = "xtask" }
    

    The above code worked locally but prevented me from publishing to crates.io. Every dependency of a published crate must be resolvable from the registry, and a path-only dependency is not.

  • The Fix: Move xtask logic into a local folder (like build_deps/) that is not its own crate. Reference it in build.rs or src/bin/ files using the #[path] attribute:

    #[path = "build_deps/mod.rs"]
    mod codegen;
    

2. Master the OUT_DIR Handshake

  • The Golden Rule: Never allow build.rs to write files into src/ or data/ folders. If cargo publish detects that the source directory was modified during the build, it will fail verification.

  • Writing: Always use the environment variable OUT_DIR provided by Cargo:

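    // Cargo sets OUT_DIR for build scripts; everything generated goes there.
    let out_dir = std::env::var("OUT_DIR").expect("OUT_DIR is set by Cargo");
    let dest = std::path::Path::new(&out_dir).join("trie.dat");
    // `data` is the generated payload; `?` assumes main returns a Result.
    std::fs::write(dest, data)?;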
    
  • Reading: Use the include_bytes! macro combined with env! to “bake” that data into your binary at compile time:

    const DATA: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/trie.dat"));
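Putting the writing side together, a build.rs might look like the following minimal sketch. The helper name write_artifact and the placeholder payload are hypothetical, and the fallback to the system temp directory is there only so the sketch also runs outside Cargo.

```rust
use std::path::{Path, PathBuf};
use std::{env, fs, io};

/// Hypothetical helper: write a generated artifact into `out_dir`
/// and return the full path of the file that was written.
fn write_artifact(out_dir: &Path, name: &str, data: &[u8]) -> io::Result<PathBuf> {
    let dest = out_dir.join(name);
    fs::write(&dest, data)?;
    Ok(dest)
}

fn main() -> io::Result<()> {
    // Cargo sets OUT_DIR when it runs a build script. The temp-dir fallback
    // is an assumption of this sketch so it can run standalone.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);

    // Placeholder payload; a real build script would generate the trie here.
    let data = b"placeholder trie bytes";
    write_artifact(&out_dir, "trie.dat", data)?;

    // Ask Cargo to rerun the script only when the script itself changes.
    println!("cargo:rerun-if-changed=build.rs");
    Ok(())
}
```

Nothing in this script touches src/ or data/, so the source tree stays unmodified during the build, and the library can still pick the file up with the include_bytes! line shown above.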
    

3. Respect Rust 2024 Keyword Changes

  • The Conflict: Rust Edition 2024 reserves gen as a keyword (for gen blocks).
  • The Mistake: Naming your code-generation module mod gen; in an Edition 2024 crate. This results in an “expected a name” error.
  • The Fix: Use a descriptive name like mod codegen; or mod generate;.
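As a small illustration, the sketch below contrasts the renamed module with the raw-identifier escape hatch (r#gen), which Rust offers for names that clash with keywords. The entry_count function is a hypothetical stand-in for real code-generation logic.

```rust
// Preferred: a descriptive module name that is not a keyword in any edition.
mod codegen {
    /// Hypothetical stand-in for real code-generation logic.
    pub fn entry_count() -> usize {
        3
    }
}

// Escape hatch: a raw identifier lets the name `gen` be used even where
// `gen` is a reserved keyword. Every use site must spell it `r#gen`.
mod r#gen {
    pub fn entry_count() -> usize {
        super::codegen::entry_count()
    }
}

fn main() {
    // Both paths resolve to the same hypothetical helper.
    assert_eq!(codegen::entry_count(), r#gen::entry_count());
    println!("entries: {}", codegen::entry_count());
}
```

The raw identifier compiles, but it has to be repeated at every call site, which is why renaming the module is usually the less surprising choice.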

4. Solve the WASM C++ Toolchain Error

  • The Error: fatal error: 'algorithm' file not found.

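  • The Cause: Some Rust crates build bundled C or C++ code as part of compilation (zstd, for example, wraps the native zstd library). The wasm32-unknown-unknown target does not have a C++ standard library, so any dependency that pulls in a C++ header fails to build.

  • The Fix: In Cargo.toml, disable default features for these crates to switch off optional native components. This does not turn a binding crate into a pure-Rust one, so if the error persists, the offending dependency has to be replaced or moved out of the build: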

    zstd = { version = "...", default-features = false }
    

5. Optimize for WASM Size and Speed

  • Pre-computing: Instead of building complex structures (like Tries) from raw text at runtime, build them in build.rs, serialize them with postcard, and compress them with zstd.

  • Binary Tools: Avoid adding heavy optimization crates like wasm-opt to your [build-dependencies]. The wasm-opt crate compiles Binaryen from C++ source, so it needs a C++ toolchain. Instead, use a shell script (build.sh) to call the wasm-opt system binary:

    wasm-opt -Oz input.wasm -o output.wasm --strip-debug
    
  • The Result: This keeps the WASM plugin small (usually 1–2 MB) and quick to load in environments like Typst, because no expensive structure has to be built at startup.
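The pre-computing idea can be sketched with the standard library alone. The example below is not the postcard + zstd pipeline described above: it swaps in a hand-rolled, length-prefixed format so that it runs without external crates. All function names are hypothetical. The encoder plays the role of build.rs, and the decoder plays the role of the runtime code that reads the embedded bytes.

```rust
/// Build-time side: serialize a sorted, de-duplicated word list as
/// a u32 count followed by (u32 length, UTF-8 bytes) records.
fn encode_words(words: &[&str]) -> Vec<u8> {
    let mut sorted: Vec<&str> = words.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    let mut out = Vec::new();
    out.extend_from_slice(&(sorted.len() as u32).to_le_bytes());
    for w in sorted {
        out.extend_from_slice(&(w.len() as u32).to_le_bytes());
        out.extend_from_slice(w.as_bytes());
    }
    out
}

/// Read a little-endian u32 at `*pos`, advancing the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Runtime side: decode the blob into string slices borrowed from it.
/// Returns None if the blob is truncated or not valid UTF-8.
fn decode_words(bytes: &[u8]) -> Option<Vec<&str>> {
    let mut pos = 0usize;
    let count = read_u32(bytes, &mut pos)? as usize;
    let mut words = Vec::new();
    for _ in 0..count {
        let len = read_u32(bytes, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        words.push(std::str::from_utf8(bytes.get(pos..end)?).ok()?);
        pos = end;
    }
    Some(words)
}

/// Lookup relies on the order established at build time.
fn contains(words: &[&str], needle: &str) -> bool {
    words.binary_search(&needle).is_ok()
}

fn main() {
    // In a real crate the blob would come from include_bytes!, not from here.
    let blob = encode_words(&["你好", "香港", "廣東話"]);
    let words = decode_words(&blob).expect("well-formed blob");
    println!("{} words, has 香港: {}", words.len(), contains(&words, "香港"));
}
```

Sorting and de-duplicating happen once at build time, so the runtime side only walks the bytes and borrows from them. A real trie serialized with postcard follows the same split, with more structure in the payload.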

