-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kserve backend implementation for wasi-nn #6867
base: main
Are you sure you want to change the base?
Conversation
This also updates wasmtime to use the latest version of the wasi-nn spec instead of older commit.
* main: (47 commits) Add core dump support to the runtime (bytecodealliance#6513) Resource table tracks child relationships (bytecodealliance#6779) Wasmtime: Move `OnDemandInstanceAllocator` to its own module (bytecodealliance#6790) wasi: Test the stdio streams implementation (bytecodealliance#6764) Don't generate same-named imports in fact modules (bytecodealliance#6783) Wasmtime: Add support for Wasm tail calls (bytecodealliance#6774) Cranelift: Fix `ABIMachineSpec::gen_add_imm` for riscv64 (bytecodealliance#6780) Update the wasm-tools family of crates, disallow empty component types (bytecodealliance#6777) Fix broken link to WASI API documentation (bytecodealliance#6775) A bunch of cleanups for cranelift-codegen-meta (bytecodealliance#6772) Implement component-to-component calls with resources (bytecodealliance#6769) Ignore async_stack_size if async_support is disabled (bytecodealliance#6771) A bunch of minor cleanups (bytecodealliance#6767) Fix flaky tests in preview2 streams (bytecodealliance#6763) Refactor and simplify component trampolines (bytecodealliance#6751) Cranelift: Implement tail calls on riscv64 (bytecodealliance#6749) WASI Preview 2: rewrite streams and pollable implementation (bytecodealliance#6556) cranelift-wasm: Add support for translating Wasm tail calls to CLIF (bytecodealliance#6760) Cranelift: Get tail calls working on aarch64 (bytecodealliance#6723) Implement component model resources in Wasmtime (bytecodealliance#6691) ...
… feature/preview2 * 'feature/preview2' of github.com:geekbeast/wasmtime: Change preview2 builder methods to use `&mut self` (bytecodealliance#6770) Add a bindgen test that exercises using error types from a different interface (bytecodealliance#6802) Resolve trappable error types with fully qualified package paths (bytecodealliance#6795) Update the dev-dependency for wit-bindgen to 0.9.0 (bytecodealliance#6800) Fix incorrect sample code in documentation (bytecodealliance#6796) (bytecodealliance#6797) Update preview1 to trap on misaligned pointers (bytecodealliance#6776) Fix posix-signals-on-macos on aarch64-apple-darwin (bytecodealliance#6793) consistient WASI preview1 rights reporting (bytecodealliance#6784) Wasmtime: Introduce `{Module,Component}::resources_required` (bytecodealliance#6789)
…time into feature/wasi-nn-preview-2 * 'feature/wasi-nn-preview-2' of github.com:geekbeast/wasmtime: Memcheck for Wasm guests in Wasmtime (bytecodealliance#6820) CI: upgrade to qemu 8.0.4. (bytecodealliance#6849) Sync wasi-cli with wit definitions in standards repo (bytecodealliance#6806) Rename `preview2::preview2` to `preview2::host` (bytecodealliance#6847) winch: Simplify the MacroAssembler and Assembler interfaces (bytecodealliance#6841) There are no files in `preview1` other than `mod.rs` (bytecodealliance#6845) Update stdio on Unix to fall back to worker threads (bytecodealliance#6833) Update RELEASES.md (bytecodealliance#6838) Minor documentation updates to docs/WASI-tutorial.md (bytecodealliance#6839) Add support for vector in DataValueExt::int() (bytecodealliance#6844)
…eature/kserve * 'feature/kserve' of github.com:geekbeast/wasmtime: Refactor Wasmtime CLI to support components (bytecodealliance#6836) Bump the wasm-tools family of crates (bytecodealliance#6861) Wasmtime: refactor the pooling allocator for components (bytecodealliance#6835)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I do a closer review, here might be some things to look at:
- you'll probably want to go through and clean up the conflicts, TODOs, commented-out code, etc., so I don't have to wade through that
- you can split this into separate PRs to parallelize this if you want, e.g., I'm thinking of the "make wasi-nn host-
async
" or "improve the error handling parts" (I mean, you don't have to but if you want to move some stuff forward faster it might help) - I'm not sure what is going on with the tensor bytes type: that hasn't merged in the specification yet but I see a Git submodule update? What is going on with that?
I removed the tensor bytes type and for now I'm just assuming that tensor with an initial zero dimension can be interpreted as a string as other sentinel values would likely cause issues. |
Might as well do everything together at this point. Most TODOs are issues I noticed while working on the code and not TODOs for this particular feature. |
- Fix cargo.toml - Use vetted Cargo.lock - Implement async trait for test - Remove unused imports - Allow dead code and mutable where compiler can't figure it out. - Fix cargo.fmt
da66c4e
to
6675e5b
Compare
1869695
to
ac103bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of expect
outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.
anyhow::Result::Ok(match e { | ||
WasiNnError::BackendError(e) => match e { | ||
BackendError::BackendAccess(d) => { | ||
eprintln!("{}", d.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these debugging strings, better to use tracing::error!
over eprintln
Ok(ExecutionContext(Box::new( | ||
KServeExecutionContext::new(client, &self.model_name).await?, | ||
))) | ||
// return Err(BackendError::UnsupportedOperation("init_execution_context")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete commented out code
|
||
self.inputs.push(kserve_tensor); | ||
|
||
// return Err(BackendError::UnsupportedOperation("init_execution_context"));; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete
|
||
Ok(self.registry.get_mut(&model_name)) | ||
|
||
//This has to be here, because you can't do a mutable borrow twice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment doesn't make sense, maybe its dead code?
} | ||
} | ||
|
||
fn copy_to_destination(destination: &mut [u8], start_index: &mut usize, src: &[u8]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this helper function, use a std::io::Cursor
and then the byteorder::WriteBytesExt
trait at each of the call sites.
// Parse the server url | ||
let url = server_url | ||
.parse::<hyper::Uri>() | ||
.expect("Unable to parse url."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code panics whenever the server url is malformed or the server doesn't respond correctly, please change it to return errors instead and handle those errors appropriately where this is invoked.
Self { | ||
server_url: server_url.clone(), | ||
url, | ||
// stream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear what these commented out members are about
|
||
async fn send_request(&mut self, request: Request<Full<Bytes>>) -> Response<Incoming> { | ||
// Perform a TCP handshake | ||
// let (mut sender, conn) = hyper::client::conn::http1::handshake(self.stream).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented out code should be deleted?
let server_metadata_url = build_server_metadata_url(&self.server_url) | ||
.parse::<hyper::Uri>() | ||
.expect("Unable to parse url."); | ||
println!("{:?}", server_metadata_url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracing::debug! here
pub async fn get_server_metadata(&mut self) -> Result<KServeServerMetadata, BackendError> { | ||
let server_metadata_url = build_server_metadata_url(&self.server_url) | ||
.parse::<hyper::Uri>() | ||
.expect("Unable to parse url."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? instead of expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of merging this PR. Obviously there are a bunch of Rust-level programming issues to clean up before we do that (like @pchickey points out) but, at the higher wasi-nn level, this mostly makes sense. Maybe if we resolve most of that here, we can do the following in follow up PRs:
- testing: I'm a bit worried about the testing story. From what I see, some of these tests depend on
http://localhost:8000
, a real kserve implementation, but I don't see that being set up in CI--how are those tests passing? At some point this new functionality needs to be validated during CI so that we immediately know if we've broken something by refactoring - documentation: eventually, we should be able to look at the module documentation and get a general idea of what the kserve backend is trying to do ("it connects to ... then sends a GET request to do ..."). Also, explaining why several of APIs must become
async
would be helpful later
Otherwise, I think the general idea of plugging in a new backend is a good one and validates the refactoring to these Box<dyn ...>
wrappers. Let's just clean up the extra code here that @pchickey notes and we can move on from there; @geekbeast, let me know if you want to pair up to finish this.
This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.
This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.
This PR also implements a kserve registry so that models can be loaded via load named models and adds some better error reporting within what is possible in the current framework.