Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kserve backend implementation for wasi-nn #6867

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

geekbeast
Copy link
Contributor

@geekbeast geekbeast commented Aug 21, 2023

This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.

This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.

This PR also implements a kserve registry so that models can be loaded via load named models and adds some better error reporting within what is possible in the current framework.

geekbeast and others added 30 commits March 31, 2023 23:12
This also updates wasmtime to use the latest version of the wasi-nn spec
instead of older commit.
* main: (47 commits)
  Add core dump support to the runtime (bytecodealliance#6513)
  Resource table tracks child relationships (bytecodealliance#6779)
  Wasmtime: Move `OnDemandInstanceAllocator` to its own module (bytecodealliance#6790)
  wasi: Test the stdio streams implementation (bytecodealliance#6764)
  Don't generate same-named imports in fact modules (bytecodealliance#6783)
  Wasmtime: Add support for Wasm tail calls (bytecodealliance#6774)
  Cranelift: Fix `ABIMachineSpec::gen_add_imm` for riscv64 (bytecodealliance#6780)
  Update the wasm-tools family of crates, disallow empty component types (bytecodealliance#6777)
  Fix broken link to WASI API documentation (bytecodealliance#6775)
  A bunch of cleanups for cranelift-codegen-meta (bytecodealliance#6772)
  Implement component-to-component calls with resources (bytecodealliance#6769)
  Ignore async_stack_size if async_support is disabled (bytecodealliance#6771)
  A bunch of minor cleanups (bytecodealliance#6767)
  Fix flaky tests in preview2 streams (bytecodealliance#6763)
  Refactor and simplify component trampolines (bytecodealliance#6751)
  Cranelift: Implement tail calls on riscv64 (bytecodealliance#6749)
  WASI Preview 2: rewrite streams and pollable implementation (bytecodealliance#6556)
  cranelift-wasm: Add support for translating Wasm tail calls to CLIF (bytecodealliance#6760)
  Cranelift: Get tail calls working on aarch64 (bytecodealliance#6723)
  Implement component model resources in Wasmtime (bytecodealliance#6691)
  ...
… feature/preview2

* 'feature/preview2' of github.com:geekbeast/wasmtime:
  Change preview2 builder methods to use `&mut self` (bytecodealliance#6770)
  Add a bindgen test that exercises using error types from a different interface (bytecodealliance#6802)
  Resolve trappable error types with fully qualified package paths (bytecodealliance#6795)
  Update the dev-dependency for wit-bindgen to 0.9.0 (bytecodealliance#6800)
  Fix incorrect sample code in documentation (bytecodealliance#6796) (bytecodealliance#6797)
  Update preview1 to trap on misaligned pointers (bytecodealliance#6776)
  Fix posix-signals-on-macos on aarch64-apple-darwin (bytecodealliance#6793)
  consistient WASI preview1 rights reporting (bytecodealliance#6784)
  Wasmtime: Introduce `{Module,Component}::resources_required` (bytecodealliance#6789)
…time into feature/wasi-nn-preview-2

* 'feature/wasi-nn-preview-2' of github.com:geekbeast/wasmtime:
  Memcheck for Wasm guests in Wasmtime (bytecodealliance#6820)
  CI: upgrade to qemu 8.0.4. (bytecodealliance#6849)
  Sync wasi-cli with wit definitions in standards repo  (bytecodealliance#6806)
  Rename `preview2::preview2` to `preview2::host` (bytecodealliance#6847)
  winch: Simplify the MacroAssembler and Assembler interfaces (bytecodealliance#6841)
  There are no files in `preview1` other than `mod.rs` (bytecodealliance#6845)
  Update stdio on Unix to fall back to worker threads (bytecodealliance#6833)
  Update RELEASES.md (bytecodealliance#6838)
  Minor documentation updates to docs/WASI-tutorial.md (bytecodealliance#6839)
  Add support for vector in DataValueExt::int() (bytecodealliance#6844)
@geekbeast geekbeast marked this pull request as ready for review September 11, 2023 19:49
@geekbeast geekbeast requested review from a team as code owners September 11, 2023 19:49
@geekbeast geekbeast requested review from pchickey and removed request for a team September 11, 2023 19:49
@pchickey pchickey requested review from abrown and removed request for pchickey September 11, 2023 20:00
@elliottt elliottt removed the request for review from a team September 12, 2023 16:17
Copy link
Contributor

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I do a closer review, here might be some things to look at:

  • you'll probably want to go through and clean up the conflicts, TODOs, commented-out code, etc., so I don't have to wade through that
  • you can split this into separate PRs to parallelize this if you want, e.g., I'm thinking of the "make wasi-nn host-async" or "improve the error handling parts" (I mean, you don't have to but if you want to move some stuff forward faster it might help)
  • I'm not sure what is going on with the tensor bytes type: that hasn't merged in the specification yet but I see a Git submodule update? What is going on with that?

@geekbeast
Copy link
Contributor Author

I removed the tensor bytes type and for now I'm just assuming that tensor with an initial zero dimension can be interpreted as a string as other sentinel values would likely cause issues.

@geekbeast
Copy link
Contributor Author

Might as well do everything together at this point. Most TODOs are issues I noticed while working on the code and not TODOs for this particular feature.

- Fix cargo.toml
- Use vetted Cargo.lock
- Implement async trait for test
- Remove unused imports
- Allow dead code and mutable where compiler can't figure it out.
- Fix cargo.fmt
Copy link
Contributor

@pchickey pchickey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of expect outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.

anyhow::Result::Ok(match e {
WasiNnError::BackendError(e) => match e {
BackendError::BackendAccess(d) => {
eprintln!("{}", d.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these debugging strings, better to use tracing::error! over eprintln

Ok(ExecutionContext(Box::new(
KServeExecutionContext::new(client, &self.model_name).await?,
)))
// return Err(BackendError::UnsupportedOperation("init_execution_context"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete commented out code


self.inputs.push(kserve_tensor);

// return Err(BackendError::UnsupportedOperation("init_execution_context"));;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete


Ok(self.registry.get_mut(&model_name))

//This has to be here, because you can't do a mutable borrow twice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment doesn't make sense, maybe its dead code?

}
}

fn copy_to_destination(destination: &mut [u8], start_index: &mut usize, src: &[u8]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this helper function, use a std::io::Cursor and then the byteorder::WriteBytesExt trait at each of the call sites.

// Parse the server url
let url = server_url
.parse::<hyper::Uri>()
.expect("Unable to parse url.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code panics whenever the server url is malformed or the server doesn't respond correctly, please change it to return errors instead and handle those errors appropriately where this is invoked.

Self {
server_url: server_url.clone(),
url,
// stream,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear what these commented out members are about


async fn send_request(&mut self, request: Request<Full<Bytes>>) -> Response<Incoming> {
// Perform a TCP handshake
// let (mut sender, conn) = hyper::client::conn::http1::handshake(self.stream).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out code should be deleted?

let server_metadata_url = build_server_metadata_url(&self.server_url)
.parse::<hyper::Uri>()
.expect("Unable to parse url.");
println!("{:?}", server_metadata_url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing::debug! here

pub async fn get_server_metadata(&mut self) -> Result<KServeServerMetadata, BackendError> {
let server_metadata_url = build_server_metadata_url(&self.server_url)
.parse::<hyper::Uri>()
.expect("Unable to parse url.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? instead of expect

Copy link
Contributor

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of merging this PR. Obviously there are a bunch of Rust-level programming issues to clean up before we do that (like @pchickey points out) but, at the higher wasi-nn level, this mostly makes sense. Maybe if we resolve most of that here, we can do the following in follow up PRs:

  • testing: I'm a bit worried about the testing story. From what I see, some of these tests depend on http://localhost:8000, a real kserve implementation, but I don't see that being set up in CI--how are those tests passing? At some point this new functionality needs to be validated during CI so that we immediately know if we've broken something by refactoring
  • documentation: eventually, we should be able to look at the module documentation and get a general idea of what the kserve backend is trying to do ("it connects to ... then sends a GET request to do ..."). Also, explaining why several of APIs must become async would be helpful later

Otherwise, I think the general idea of plugging in a new backend is a good one and validates the refactoring to these Box<dyn ...> wrappers. Let's just clean up the extra code here that @pchickey notes and we can move on from there; @geekbeast, let me know if you want to pair up to finish this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants