
Memory not being freed after validation #675

Open
PrbNoblem opened this issue Jan 22, 2025 · 3 comments

PrbNoblem commented Jan 22, 2025

After performing seemingly any type of JSON Schema validation, memory is not being freed. For example, running this version of the example code with the cap crate to print memory usage:

use serde_json::json;
use std::alloc;
use cap::Cap;

#[global_allocator]
static ALLOCATOR: Cap<alloc::System> = Cap::new(alloc::System, usize::MAX);

fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
    {
        let schema = json!({"maxLength": 5});
        let instance = json!("foo");
        let validator = jsonschema::validator_for(&schema)?;
        assert!(validator.validate(&instance).is_ok());
    }
    println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);
    Ok(())
}

results in the following being printed

Allocated before validation: 0KB
Allocated after validation: 4184KB

Using the jsonschema::is_valid approach instead results in the same behavior:

println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
{
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    assert!(jsonschema::is_valid(&schema, &instance));
    assert!(jsonschema::validate(&schema, &instance).is_ok());
}
println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);

Allocated before validation: 0KB
Allocated after validation: 4184KB

Creating a more complex schema seems to increase the memory held after the validation scope:

println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
{
    let schema = json!({
        "type": "object",
        "properties": {
            "innerThing": {
                "type": "string",
                "maxLength": 5,
                "minLength": 1
            },
            "anotherThing": {
                "type": "string",
                "maxLength": 10,
                "minLength": 1
            },
            "arrayThing": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        },
        "required": ["innerThing"]
    });
    let instance = json!("foo");
    _ = jsonschema::is_valid(&schema, &instance);
    _ = jsonschema::validate(&schema, &instance).is_ok();
}
println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);

Allocated before validation: 0KB
Allocated after validation: 11355KB

It seems to me that the memory should be released after the program exits the scope in which jsonschema is used, but it does not. Is there an issue with how I'm doing validation here? Or is there a problem with jsonschema itself?

@Stranger6667 (Owner)

Thanks for opening!

There are a few Lazy statics that are evaluated on first access; for example, the meta-schemas are needed to validate input schemas. However, I'd think that the size of the meta-schemas is always the same, so maybe it is some other cache (like the one for patterns, though no patterns are present here).

In any event, I'll take a look to see whether there is anything that is not cleaned up, or at least not capped.

Stranger6667 (Owner) commented Jan 31, 2025

I think you are right. I am inclined to think that the problem is with how the meta-schema registry is used during the compilation process. It is cloned and resources are merged together; I assume that some of the data is not properly dropped.

For my own reference later:

  • meta-schemas also use SPECIFICATION.clone(), but they should instead have their own specific set of resources.
  • currently, the size of the input schema influences the total size of the allocation that is left at the end of the block, which makes me think that this object is stored somewhere globally.

Also - https://github.com/Stranger6667/jsonschema/actions/runs/13093817151/job/36533592518

@Stranger6667 (Owner)

After some investigation, I realized that the behavior you observed happens because of how $ref and similar keywords are implemented in jsonschema. Right now they are lazy, because that is the simplest way to handle deeply recursive schemas, and for this reason they are only resolved (and cached inside the validator) on first access to the specific keyword. The largest portion of the memory growth happens in the meta-schemas that are used to validate input schemas, and it grows as new keywords are validated (new levels of nesting also affect it).

There are also some small parts, like the seed values for ahash, but that is only around 88 bytes on first access.

This approach is clearly far from optimal, and right now I plan to reduce memory usage with #686 and one extra PR after it. A better way is to implement a proper virtual machine (#641; right now jsonschema uses a tree-walking approach to validation), which would resolve cycles and avoid the need for lazy evaluation.
