Adding bloom command metadata, bloom group and bloom data type documentation #233

zackcam · 2025-02-20T00:26:56Z

This is one of three PRs for adding information about the bloom module to the Valkey website:
Bloom repo json command files: valkey-io/valkey-bloom#47
valkey-io.github.io: valkey-io/valkey-io.github.io#212

This PR has three main changes:

1. Adding the bloom command group
2. Adding bloom command metadata files (example for bf.add below)
3. Adding bloom data type documents

as well Signed-off-by: zackcam <[email protected]>

zuiderkwast

Very interesting!

I skimmed through it very quickly. The documentation itself looks great AFAICT. I can do a more detailed review later.

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install. I think we should mention it on the bloom filters topic page with a link to the GitHub repo. The BF command pages should link to that topic page, so the pages are all linked together.

To build man pages, the scripts in this repo need to be able to take multiple command JSON files. This needs to be added to the Makefile, the README and maybe the python scripts too. Please try to build the man pages as described in the README of this repo.

groups.json

zuiderkwast · 2025-02-20T08:33:23Z

Many of the spellcheck errors can be fixed simply by writing the command names in backticks. Stuff in backticks is excluded from spellcheck IIRC.

zackcam · 2025-02-20T18:53:34Z

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Does this seem like something that would be wanted?

zuiderkwast · 2025-02-20T19:07:11Z

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Yes, something like that would be good. In your screenshot it looks like the "Extensions" sub-heading is part of "Module Data Types" though, because of the levels of the headings used. If we do this, then "Module Data Types" should be a level-2 heading and "Bloom Filter" a level-3 heading under it.

How about just mentioning the module within the description? Something like this?

 ## Bloom Filter
 
 [Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
+Bloom filters are provided by the module `valkey-bloom`.
 For more information, see:

 * [Overview of Bloom Filters](bloomfilters.md)
 * [Bloom filter command reference](../commands/#bloom)
+* [The valkey-bloom module on GitHub](https://github.com/valkey-io/valkey-bloom/)

madolson · 2025-02-20T19:54:25Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. In the current structure they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are a part of.

zuiderkwast · 2025-02-20T20:37:24Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. In the current structure they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are a part of.

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

madolson · 2025-02-20T20:40:45Z

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

I don't have a strong preference one way or the other about flat/nested, so sticking with flat is OK for me.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

Yeah, I guess immediately let's make sure there is something in the JSON file. Maybe Module Required: <link to Bloom>.
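To make the "optional key in the command JSON file" idea concrete, here is a rough Python sketch of how a page generator could turn such a key into a "Module required" line. The `module` key, its `name`/`url` fields and the `module_line` helper are all hypothetical; nothing here reflects the actual schema or scripts. It also assumes each command file has a single top-level key holding the command name.

```python
import json


def module_line(command_json: str) -> str:
    """Return a 'Module required' markdown line, or '' for core commands."""
    spec = json.loads(command_json)
    # Assumption: one top-level key per file, the command name.
    ((_name, body),) = spec.items()
    module = body.get("module")  # hypothetical optional key
    if module is None:
        return ""
    return f"Module required: [{module['name']}]({module['url']})"


# Hypothetical inputs: a module command and a core command.
bf_add = json.dumps({
    "BF.ADD": {
        "summary": "Add a single item to a bloom filter",
        "module": {
            "name": "valkey-bloom",
            "url": "https://github.com/valkey-io/valkey-bloom/",
        },
    }
})
get = json.dumps({"GET": {"summary": "Get the value of a key"}})

print(module_line(bf_add))
print(repr(module_line(get)))
```

Core command files would simply omit the key, so existing JSON files would not need to change.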

madolson

Not a super deep review. I think we should indicate more clearly that the commands are from a module and not part of the core. That can maybe come from the JSON docs only, though.

madolson · 2025-02-20T20:43:09Z

commands/bf.add.md

+* key (required) - A Valkey key of Bloom data type
+* item (required) - Item to add


Suggested change

* key (required) - A Valkey key of Bloom data type

* item (required) - Item to add

We typically omit this, since the usage would be included at the top which will indicate if something is required.

Yeah, makes sense. I removed all of these from the bloom commands and, where I think the arguments need explaining, updated the heading name.

madolson · 2025-02-20T20:44:18Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.


Suggested change

Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

Updated and made it less wordy as well by removing 'specified' from the description

madolson · 2025-02-20T20:45:03Z

commands/bf.exists.md

@@ -0,0 +1,16 @@
+Determines if a specified item has been added to the specified bloom filter.
+Syntax


Suggested change

Syntax

madolson · 2025-02-20T20:46:16Z

commands/bf.info.md

@@ -0,0 +1,35 @@
+Returns information about a bloomfilter
+
+## Arguments


These need to be kept because they include the info data, but I would change this to be about info fields or something.

madolson · 2025-02-20T20:47:53Z

commands/bf.info.md

+## Arguments
+* key (required) - A valkey key of bloom data type
+* CAPACITY (optional) - Returns the number of unique items that would need to be added before scaling would happen
+* SIZE (optional) - Returns the memory size which is the number of bytes allocated


Suggested change

* SIZE (optional) - Returns the memory size which is the number of bytes allocated

* SIZE (optional) - Returns the number of bytes allocated

Why waste time say lot word when few word do trick?

madolson · 2025-02-20T20:57:57Z

topics/data-types.md

@@ -92,6 +92,14 @@ The [HyperLogLog](hyperloglogs.md) data structures provide probabilistic estimat
 * [Overview of HyperLogLog](hyperloglogs.md)
 * [HyperLogLog command reference](../commands/#hyperloglog)

+## Bloom Filter
+
+[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.


I would translate this to English with an example.

I tried to make this more understandable but I think potentially having what I use in the exists and mexists commands could also work if the new version still isn't great

madolson · 2025-02-20T20:59:37Z

topics/bloomfilters.md

+
+Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.
+
+## Bloom commands


Our other examples include the "basic commands" up front, and then the more sophisticated commands later. I think we should do the same.

madolson · 2025-02-20T21:00:41Z

topics/bloomfilters.md

+
+**Financial fraud detection**
+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


Is this a real use case? The false positive here is not ideal, since it might make it seem like a transaction is legitimate when it is not.

Updated this use case to be more about card fraud instead of location based checking

madolson · 2025-02-20T21:01:14Z

topics/bloomfilters.md

+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.
+
+For the above each user would have a Bloom filter which is then checked for every transaction.


Might just merge this into the previous paragraph.

madolson · 2025-02-20T21:28:53Z

topics/bloomfilters.md

+
+**Check if URL's are malicious**
+
+Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter. 


Suggested change

Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

zuiderkwast

Not a complete review.

We need to think about what we want regarding:

* How to show which module a command belongs to and how to store this in the JSON file(s).
* What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

commands/commands

zuiderkwast · 2025-02-20T21:28:26Z

resp2_replies.json

+    "* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
+    "* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",


With the single quotes it looks a bit like string literals. Use backticks instead to mark it as code? This seems to be how some other commands' integer replies are documented.

Suggested change

"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",

"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",

"* [Integer reply](../topics/protocol.md#integers): `1` if the item was successfully added",

"* [Integer reply](../topics/protocol.md#integers): `0` if the item already existed in the bloom filter",

Compare to for example this one:

"CLIENT UNBLOCK": [ "One of the following:", "* [Integer reply](../topics/protocol.md#integers): `0` if the client was unblocked successfully.", "* [Integer reply](../topics/protocol.md#integers): `1` if the client wasn't unblocked." ],

Makes sense, updated both add and exists in both response files to follow this.

zuiderkwast · 2025-02-20T21:30:44Z

topics/bloomfilters.md

+
+Example usage for a default bloom object:
+```
+127.0.0.1:6379> bf.insert validate_scale_fail VALIDATESCALETO 26214301


Uppercasing all commands like BF.INSERT and fixed tokens makes it easier to see what is fixed and what is variable.

zuiderkwast · 2025-02-20T21:32:41Z

topics/bloomfilters.md

+
+## Common use cases for bloom filters
+
+**Financial fraud detection**


These look like sub-headings so I think we should mark them as such. It's semantically more correct. (The others too; not only this one.)

Suggested change

**Financial fraud detection**

### Financial fraud detection

zuiderkwast · 2025-02-20T21:34:42Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
+## Arguments


An empty line before and after headings, before and after bullet lists, etc. makes it more likely to be rendered correctly on the website, man pages and GitHub. They all use different markdown implementations with some subtle differences.

Suggested change

## Arguments

## Arguments
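As an illustration of the blank-line rule, here is a small Python sketch that pads ATX headings with an empty line before and after. The `pad_headings` helper is hypothetical and not part of this repo's scripts; it only handles `#` headings, and it skips lines inside fenced code blocks so that shell-style comments are left alone.

```python
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def pad_headings(markdown: str) -> str:
    """Ensure an empty line before and after every '#' heading."""
    lines = markdown.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        if line.startswith(FENCE):
            in_fence = not in_fence
        is_heading = not in_fence and line.startswith("#")
        # Blank line before the heading, unless one is already there.
        if is_heading and out and out[-1] != "":
            out.append("")
        out.append(line)
        # Blank line after the heading, unless the next line is blank.
        if is_heading and i + 1 < len(lines) and lines[i + 1] != "":
            out.append("")
    return "\n".join(out)


print(pad_headings("Adds an item to a bloom filter.\n## Arguments\n* key"))
```

Running it a second time on its own output changes nothing, so it could be used as a check as well as a fixer.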

topics/bloomfilters.md

zuiderkwast · 2025-02-20T21:47:24Z

resp3_replies.json

+    "[Array reply](../topics/protocol.md#arrays): List of information about the bloom filter.",
+    "When an optional argument is provided:",
+    "* [Integer reply](../topics/protocol.md#integers): argument value",
+    "* [String reply??](../topics/protocol.md#simple-strings): argument value",


Why "String reply??" with double question marks??

That was accidentally left over. We only have a string reply when one of the optional arguments is provided, so I meant to come back to this, clear up the differences and clarify which case returns a string and which returns an integer.

resp3_replies.json

madolson · 2025-02-20T23:39:42Z

What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

I think for now we should show the independent module's version number, since we got alignment on that. Internally at AWS we are planning on reviving valkey-io/valkey#408 and posting some suggestions. Once that has alignment, we can maybe add more information about where it's available (i.e. Valkey core since 10.0, valkey-bloom since 1.0)

zackcam · 2025-02-21T07:53:40Z

List of non word choice/ document wording changes
The change to the version isn't done in this repo but was discussed on this PR, so adding a screenshot:

Still looking at how best to determine whether a command is from a specific module, so that it is easy to expand on for future modules as well. (The io PR has not been updated yet to include this module version change; I will push that once I find out how to distinguish between modules.)

Man page generation for modules, example for bf.add

For future modules there are only a few places they will need to add to in the Makefile.
The main change they need to make is called out below; the others should be clear:
Line 187: $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT) $(FUTURE_MODULE))

…to generate bloom man pages Signed-off-by: zackcam <[email protected]>

zackcam · 2025-02-21T20:17:37Z

New command page example with hyperlink to module repo:

KarthikSubbarao · 2025-03-05T17:36:31Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.


Suggested change

Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

KarthikSubbarao · 2025-03-05T17:38:31Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
+
+If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.


By non-standard options, you mean the non default properties. Right?

Suggested change

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.

To add multiple items to a bloom filter, you can use the BF.MADD or BF.INSERT commands.

If you want to create a bloom filter with non-default properties, use the `BF.INSERT` or `BF.RESERVE` command.

Yeah non standard meant non default, but agree makes more sense to say non default and that keeps it consistent

KarthikSubbarao · 2025-03-05T17:39:34Z

commands/bf.card.md

@@ -0,0 +1,12 @@
+Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter. 


Suggested change

Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.

Returns the cardinality of a Bloom filter which is the number of items that have been successfully added to it.

KarthikSubbarao · 2025-03-05T17:40:38Z

commands/bf.card.md

+1
+127.0.0.1:6379> BF.CARD key
+1
+127.0.0.1:6379> BF.CARD missing


Suggested change

127.0.0.1:6379> BF.CARD missing

127.0.0.1:6379> BF.CARD nonexistentkey

KarthikSubbarao · 2025-03-05T17:43:42Z

commands/bf.exists.md

@@ -0,0 +1,19 @@
+Determines if an item has been added to the bloom filter. 


Suggested change

Determines if an item has been added to the bloom filter.

Determines if an item has been added to the bloom filter previously.

KarthikSubbarao · 2025-03-05T19:02:41Z

commands/bf.insert.md

+* SEED seed - The seed the hash functions will use
+* NONSCALING - Will make it so the filter can not scale
+* VALIDATESCALETO `validatescaleto` - Checks if the filter could scale to this capacity and if not show an error and don’t create the bloom filter
+* ITEMS item - One or more items we will add to the bloom filter


Suggested change

* ITEMS item - One or more items we will add to the bloom filter

* ITEMS item - One or more items to be added to the bloom filter

KarthikSubbarao · 2025-03-05T19:03:35Z

commands/bf.insert.md

+* TIGHTENING `tightening_ratio` - The tightening ratio for the bloom filter
+* SEED seed - The seed the hash functions will use
+* NONSCALING - Will make it so the filter can not scale
+* VALIDATESCALETO `validatescaleto` - Checks if the filter could scale to this capacity and if not show an error and don’t create the bloom filter


Suggested change

* VALIDATESCALETO `validatescaleto` - Checks if the filter could scale to this capacity and if not show an error and don’t create the bloom filter

* VALIDATESCALETO `validatescaleto` - Validates if the filter can scale out and reach to this capacity based on limits and if not, return an error without creating the bloom filter

KarthikSubbarao · 2025-03-05T19:03:50Z

commands/bf.insert.md

+* NOCREATE  - Will not create the bloom filter and add items if the filter does not exist already
+* TIGHTENING `tightening_ratio` - The tightening ratio for the bloom filter
+* SEED seed - The seed the hash functions will use
+* NONSCALING - Will make it so the filter can not scale


@zackcam - you can follow wording from BF.RESERVE

KarthikSubbarao · 2025-03-05T19:04:31Z

commands/bf.insert.md

+* TIGHTENING `tightening_ratio` - The tightening ratio for the bloom filter
+* SEED seed - The seed the hash functions will use


TODO: Add more wording

KarthikSubbarao · 2025-03-05T19:04:57Z

commands/bf.insert.md

+127.0.0.1:6379> BF.INSERT key ITEMS item1 item2
+1) (integer) 1
+2) (integer) 1
+# This does not update the capcity but uses the origianl filters values


Suggested change

# This does not update the capcity but uses the origianl filters values

# This does not update the capacity since the filter already exists. It only adds the provided items.

KarthikSubbarao · 2025-03-05T19:06:39Z

I only reviewed the Command Documentation.

I will need to review the remaining sections next

KarthikSubbarao · 2025-03-06T16:35:30Z

topics/data-types.md

@@ -92,6 +92,17 @@ The [HyperLogLog](hyperloglogs.md) data structures provide probabilistic estimat
 * [Overview of HyperLogLog](hyperloglogs.md)
 * [HyperLogLog command reference](../commands/#hyperloglog)

+## Bloom Filter
+
+[Bloom filters](bloomfilters.md) are a space efficient data type that can tell you if something is definitely not in a set, or it might be in the set. 


Suggested change

[Bloom filters](bloomfilters.md) are a space efficient data type that can tell you if something is definitely not in a set, or it might be in the set.

[Bloom filters](bloomfilters.md) are a space efficient probabilistic data type that can be used to check if item/s are definitely not present in a set, or if they exist within the set (with the configured false positive rate).

KarthikSubbarao · 2025-03-06T16:40:20Z

topics/bloomfilters.md

@@ -0,0 +1,108 @@
+---
+title: "Bloom Filters"


Could we also include the section/s below:

Scaling / Non Scaling Filters and their implications

Added this section. Instead of putting the implications under performance, I added a subsection to the scaling and non-scaling section.

KarthikSubbarao · 2025-03-06T16:41:22Z

topics/bloomfilters.md

+
+Error rate - 0.01
+
+Expansion - 2


As we mention in command documentation, let us clarify the scaling and non scaling cases where expansion is nil.

KarthikSubbarao · 2025-03-06T16:42:24Z

topics/bloomfilters.md

+12) (integer) 2
+13) Max scaled capacity
+14) (integer) 26214300
+```


We can include advanced / additional properties here as a sub section within the "Default Properties":

Tightening Ratio - We do not recommend tuning this unless there is a specific use case for lower memory usage (with higher false positive) or vice versa.

Seed - This is only useful if a user has a specific 32 byte seed they want their bloom filters to use.

Added these in an advanced properties section.

KarthikSubbarao · 2025-03-06T16:44:46Z

topics/bloomfilters.md

+
+Most bloom commands are O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only work with one 1 item.
+
+There are a few bloom commands that are O(1) as they don't work on items but instead work on the data about the bloom filter itself.


Do you mean BF.CARD and BF.INFO? Maybe you can list the ones you are referring to here
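To make the complexity claim concrete, here is a toy Bloom filter in Python that counts hash evaluations. This is only a sketch, not how valkey-bloom is implemented: the `ToyBloom` class, the SHA-256-derived bit positions and the `hash_calls` counter are all assumptions chosen for illustration. Following the wording in the diff, n is the number of hash functions and k the number of items.

```python
import hashlib


class ToyBloom:
    """Illustrative Bloom filter that counts hash evaluations."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes       # n in the text above
        self.bits = bytearray(num_bits)    # one byte per bit, for clarity
        self.hash_calls = 0

    def _positions(self, item: str):
        # One hash evaluation per hash function.
        for i in range(self.num_hashes):
            self.hash_calls += 1
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def exists(self, item: str) -> bool:
        # Stops at the first unset bit, so a miss can cost fewer than n hashes.
        return all(self.bits[pos] for pos in self._positions(item))


bf = ToyBloom(num_hashes=7)
for item in ["item1", "item2", "item3"]:   # k = 3 items
    bf.add(item)
print(bf.hash_calls)        # n * k hash evaluations
print(bf.exists("item1"))   # an added item is always reported as present
```

Adding k items costs n * k hash evaluations, which matches the O(n * k) statement for multi-item commands and O(n) for single-item ones. Commands that only read stored metadata never call `_positions`, which is why they can be O(1).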

topics/bloomfilters.md

Makefile

Signed-off-by: zackcam <[email protected]>

topics/bloomfilters.md

KarthikSubbarao · 2025-03-11T19:06:23Z

topics/bloomfilters.md

+Capacity - 100
+
+Error rate - 0.01
+
+Expansion - 2


In addition to mentioning the default value, can we follow the standard wording from "command documentation" to have a one liner to explain these properties?

KarthikSubbarao · 2025-03-11T19:09:07Z

topics/bloomfilters.md

+    Introduction to Bloom Filters
+---
+
+The bloom filter data type is taken from a [separate module](https://github.com/valkey-io/valkey-bloom) that users will need to install in order to use. 


We can move this sentence to be after the introduction statement.

Suggested change

The bloom filter data type is taken from a [separate module](https://github.com/valkey-io/valkey-bloom) that users will need to install in order to use.

In Valkey, the bloom filter data type / commands are implemented in the [valkey-bloom module](https://github.com/valkey-io/valkey-bloom) which is an official valkey module compatible with versions 8.0 and above. Users will need to load this module onto their valkey server in order to use this feature.

KarthikSubbarao · 2025-03-11T19:14:16Z

topics/bloomfilters.md

+
+### When should you use scaling vs non-scaling filters
+
+If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.   


We can consider briefly explaining benefits of non scaling (better performance and less memory overhead) and its drawbacks - it will error out when it reaches capacity. If you don't want to hit an error and want use-as-you-go capacity, scaling is better, but it uses more memory for the additional capacity which is available. Also, more filters (e.g. >500-1000) means higher command latencies.
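A small Python sketch of the trade-off, under stated assumptions: it assumes each scale-out adds one sub-filter whose capacity is the previous one multiplied by the expansion rate, and uses the defaults listed in this PR (capacity 100, expansion 2). The helper names are hypothetical and the model ignores memory and false positive tightening.

```python
def filters_needed(items: int, capacity: int = 100, expansion: int = 2) -> int:
    """Number of sub-filters a scaling filter needs to hold `items`.

    Assumption: every scale-out adds a sub-filter `expansion` times
    larger than the previous one.
    """
    filters = 1
    total = capacity
    next_capacity = capacity * expansion
    while total < items:
        total += next_capacity
        next_capacity *= expansion
        filters += 1
    return filters


def non_scaling_accepts(items: int, capacity: int = 100) -> bool:
    """A non-scaling filter rejects new unique items once it is full."""
    return items <= capacity


print(filters_needed(100))       # fits in the initial filter
print(filters_needed(101))       # one scale-out
print(filters_needed(701))       # 100 + 200 + 400 is not enough
print(non_scaling_accepts(101))  # a non-scaling filter would error here
```

The more sub-filters a key accumulates, the more filters each command may have to touch, which is the latency concern mentioned above.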

KarthikSubbarao · 2025-03-11T19:15:04Z

topics/bloomfilters.md

+
+Seed - The seed used by the bloom filter can be specified by the user in the BF.INSERT command. This property is only useful if you have a specific 32 byte seed that you want your bloom filter to use. By defualt every bloom filter will use a random seed. 
+
+Tightening Ratio - We do not recommend fine tuning this unless there is a specific use case for lower memory usage with higher false positive or vice versa. 


Let's clarify that the BF.INSERT command can help specify this

KarthikSubbarao · 2025-03-12T20:29:45Z

topics/bloomfilters.md

+
+If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.   
+
+## Default bloom properties


Optional: We could add a monitoring section and briefly go over the INFO BF command response

section to bloomfilter topic, cleaned up other bloomfilter topic sections

zuiderkwast · 2025-02-21T16:08:34Z

Makefile

+	$(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT))
+	$(eval FINAL_ROOT := $(firstword $(foreach root,$(VALKEY_ROOTS),$(if $(wildcard $(root)/src/commands/$*.json),$(root)))))
+	$(if $(FINAL_ROOT),,$(eval FINAL_ROOT := $(lastword $(VALKEY_ROOTS))))


This looks complicated. We should try to do something more readable.

I don't have a clear idea about how to make this simpler, but maybe we can define the rules differently or maybe it can get more clear if we use $(call ...) instead of $(eval ...).

Or define a separate rule for the bf. commands like this?

$(MAN_DIR)/man3/bf.%.3valkey.gz: commands/bf.%.md ...

For future modules, I think making it separate could be detrimental. This way, any future module only needs to add its equivalent of VALKEY_BLOOM_ROOT to the first eval. Keeping it in this format also means that if changes are needed in the future, they only need to be made here for all commands. I am going to update the last if/eval, though, so that it doesn't actually need to have found a match and only creates pages for found matches. If this still doesn't sound ideal, let me know and I'll look in more depth at how to make it clearer (either by changing how this is done or by separating the steps).

OK, makes sense. Maybe just the first line $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT)) can be on top level instead of inside the rule?

VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT)
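For readers who find the `$(eval ...)`/`$(foreach ...)` chain hard to follow, the lookup it performs can be sketched in Python: pick the first root that contains `src/commands/<command>.json`. The `find_root` function is a hypothetical illustration of the logic only, not a proposed replacement for the Makefile rule.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_root(command: str, roots: Sequence[Path]) -> Optional[Path]:
    """Return the first root containing src/commands/<command>.json."""
    for root in roots:
        if (root / "src" / "commands" / f"{command}.json").is_file():
            return root
    # No root provides this command, so no man page would be built for it.
    return None
```

With this shape, supporting another module only means appending its root to the list passed in, which mirrors adding a variable to `VALKEY_ROOTS`.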

Makefile

zuiderkwast

It starts to look good. I just put a few comments on formatting and such things.

I didn't review the actual docs of the module carefully, because I don't know it very well. To me, it's enough if that part is reviewed by you, the module authors.

zuiderkwast · 2025-03-19T18:34:32Z

README.md

    sudo make install INSTALL_MAN_DIR=/usr/local/share/man

 Prerequisites: GNU Make, Python 3, Python 3 YAML (pyyaml), Pandoc.
-Additionally, the scripts need access to the valkey code repo,
+Additionally, the scripts need access to the valkey and valkey-bloom code repos,


Mention that valkey-bloom is optional and that those pages are excluded if the valkey-bloom path is not provided.

zuiderkwast · 2025-03-19T18:42:47Z

commands/commands

@@ -0,0 +1 @@
+../valkey-doc/commands


Don't add this file (or symlink).

zuiderkwast · 2025-03-19T18:46:05Z

commands/bf.exists.md

+A Bloom filter has two possible responses when you check if an item exists:
+
+* 0 - The item definitely does not exist since with bloom filters, false negatives are not possible.
+
+* 1 - The item exists with a given false positive (`fp`) percentage. There is an `fp` rate % chance that the item does not exist. You can create bloom filters with a more strict false positive rate as needed.


Don't include the reply docs here. They are added in resp2_replies.json and resp3_replies.json, so if we add them here too they will appear twice.

zuiderkwast · 2025-03-19T18:52:44Z

commands/bf.info.md

+127.0.0.1:6379> BF.INFO key
+ 1) Capacity
+ 2) (integer) 100
+ 3) Size
+ 4) (integer) 384
+ 5) Number of filters
+ 6) (integer) 1
+ 7) Number of items inserted


This doesn't match the documentation of the field names above. I would expect the field names to be CAPACITY, SIZE, FILTERS etc. rather than "Capacity", "Size", "Number of filters", ...

Btw, why are these uppercase? In the INFO command, the field names are lowercase.

zuiderkwast · 2025-03-19T18:56:17Z

commands/bf.insert.md

+
+## Insert Fields
+
+* CAPACITY `capacity` -  The number of unique items that would need to be added before a scale out occurs or (non scaling) before it rejects addition of unique items. 


Here, CAPACITY is a keyword and capacity is a placeholder for a number?

I suggest we use this formatting instead, with backticks for the keyword and italics for the variable:

Suggested change

* CAPACITY `capacity` - The number of unique items that would need to be added before a scale out occurs or (non scaling) before it rejects addition of unique items.

* `CAPACITY` *capacity* - The number of unique items that would need to be added before a scale out occurs or (non scaling) before it rejects addition of unique items.

zuiderkwast · 2025-03-19T18:58:51Z

commands/bf.mexists.md

+A Bloom filter has two possible responses when you check if an item exists:
+
+* 0 - The item definitely does not exist since with bloom filters, false negatives are not possible.
+
+* 1 - The item exists with a given false positive (`fp`) percentage. There is an `fp` rate % chance that the item does not exist. You can create bloom filters with a more strict false positive rate as needed.


Skip this. Responses are documented in the response JSON files.

(I know, I don't like it. It's unnecessarily complex. I want to move the reply docs into the markdown files some day. But for now, let's just follow the existing structure.)

I think this was wanted to make it explicit how the false positive rate affects the exists command and determining whether an item is present. I could try to reword it so it explains false positives without referring to the response, but I think the idea is that showing the response makes it more understandable.

The rendered page is showing the response from resp2_replies.json etc. but unfortunately it gets added at the bottom of the web page.

(On the generated man pages, the reply section gets inserted before Examples, which I think is a better place.)

You can keep this text here if you think it's better, and keep it brief in resp{2,3}_replies.json so there is not too much duplicated text.

I agree that it does have some slight duplication but in my opinion I like having this explained as one of the main behaviours of bloom filters is the false positive rate. But am happy to change if others would rather not have the duplication.

I don't mind, but feel free to formulate it in a way so that it doesn't look too much like duplication.

zuiderkwast · 2025-03-19T19:01:00Z

resp2_replies.json

+    "* [Integer reply](../topics/protocol.md#integers): `1` if the item exists in the bloom filter",
+    "* [Integer reply](../topics/protocol.md#integers): `0` if the bloom filter does not exist or the item has not been added to the bloom filter",
+    "",
+    "The command will fail if the wrong number of arguments are provided"


No need to mention error for wrong number of arguments. All commands will return syntax error in this case. This is implicit and we don't need to mention it for every command.

zuiderkwast · 2025-03-19T19:04:22Z

topics/bloomfilters.md

+
+```
+127.0.0.1:6379> info bf
+# bf_bloom_core_metrics


For the INFO command, these section headings match the argument in uppercase, so I would expect # BF here, with a blank line below it.

Suggested change

# bf_bloom_core_metrics

# BF

Are these fields matching redis bloom info fields or are they invented in valkey-bloom?

That title is determined from the bloom module (https://github.com/valkey-io/valkey-bloom/blob/unstable/src/metrics.rs#L17) and that output is exactly what I get when running info bf. I'm pretty sure they were invented in valkey-bloom.

That title is determined from the bloom module

Then maybe valkey-bloom doesn't exactly behave as documented for the INFO command:

Lines can contain a section name (starting with a # character) or a property. All the properties are in the form of field:value terminated by \r\n.

These lines with # are the section names you can also use as argument for fetching a single section. They're not comments.

Or can you do INFO bf_bloom_core_metrics too?

I'm pretty sure they were invented in valkey-bloom.

Then I'm wondering why the prefix of each field is bf_bloom and not just bf? BF stands for bloom filter already, right?

You can request a single section, so INFO bf_bloom_core_metrics is valid.
I think at some point there were thoughts about expanding the bloom module so that it wasn't confined to bloom filters, so the prefix was meant to specify that these metrics are for bloom in particular.
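To make the section-versus-property distinction from the INFO documentation concrete, here is a minimal Python sketch that splits INFO-style text into section names (lines starting with `#`) and `field:value` properties. The `parse_info` helper and the sample values are assumptions for illustration; only the section name and the field names are taken from this thread.

```python
def parse_info(text):
    """Split INFO-style output into {section: {field: value}} (illustrative helper)."""
    sections = {}
    current = None
    for raw in text.splitlines():  # splitlines() also handles \r\n terminators
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A '#' line is a section name, not a comment.
            current = line[1:].strip()
            sections[current] = {}
        elif ":" in line and current is not None:
            field, value = line.split(":", 1)
            sections[current][field] = value
    return sections


# Hypothetical sample; the zero values are made up for the example.
sample = (
    "# bf_bloom_core_metrics\r\n"
    "bf_bloom_total_memory_bytes:0\r\n"
    "bf_bloom_num_items_across_objects:0\r\n"
)
parsed = parse_info(sample)
```

With this shape, a client that asks for one section by name can look it up directly in `parsed`, which is why the text after `#` matters as an identifier.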

zuiderkwast · 2025-03-19T19:06:55Z

topics/bloomfilters.md

+
+### Bloom filter core metrics
+
+* bf_bloom_total_memory_bytes: Current total number of bytes used by all bloom filters.


Use backticks on the field names here (and below).

KarthikSubbarao · 2025-03-24T16:04:54Z

topics/bloomfilters.md

+
+### Financial fraud detection
+
+Bloom filters can help answer the question "Has this card been flagged as stolen?", use a bloom filter that has cards reported stolen added to it. Check a card on use that it is not present in the bloom filter. If it isn't then the card is not marked as stolen, if present then a check to the main database can happen or deny the purchase.


Minor rewording:

Bloom filters can be used to answer the question, "Has this card been flagged as stolen?". To do this, use a bloom filter that contains cards reported as stolen. When a card is used, check whether it is present in the bloom filter. If the card is not found, it means it is not marked as stolen. If the card is present in the filter, a check can be made against the main database, or the purchase can be denied.

KarthikSubbarao · 2025-03-24T16:11:37Z

topics/bloomfilters.md

+Bloom filters can help answer the following questions to advertisers:
+* Has the user already seen this ad?
+* Has the user already bought this product?
+
+Use a Bloom filter for every user, storing all bought products. The recommendation engine can then suggest a new product and checks if the product is in the user's Bloom filter.
+
+* If no, the ad is shown to the user and is added to the Bloom filter.
+* If yes, the process restarts and repeats until it finds a product that is not present in the filter.


Bloom filters can help advertisers answer the following questions:

* Has the user already seen this ad?
* Has the user already purchased this product?

For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.

* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.
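The flow described above could be sketched roughly as follows in Python. This is only an illustration: a plain `set` stands in for the per-user Bloom filter (so it has no false positives), and `pick_ad` is a hypothetical helper. Against a real server the membership test and the insert would presumably map to `BF.EXISTS` and `BF.ADD`.

```python
def pick_ad(candidates, seen):
    """Return the first candidate not reported by `seen` and record it.

    `seen` is a stand-in for the user's Bloom filter; returns None when
    every candidate is already reported as present.
    """
    for product in candidates:
        if product not in seen:   # roughly BF.EXISTS returning 0
            seen.add(product)     # roughly BF.ADD
            return product
    return None


seen_by_user = {"shoes"}          # hypothetical starting state
first = pick_ad(["shoes", "hat", "bag"], seen_by_user)
second = pick_ad(["shoes", "hat", "bag"], seen_by_user)
third = pick_ad(["shoes", "hat", "bag"], seen_by_user)
```

With a real Bloom filter, a false positive would only cause an ad to be skipped occasionally, which is usually an acceptable trade-off for this use case.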

KarthikSubbarao · 2025-03-24T16:12:09Z

topics/bloomfilters.md

+* If no then we allow access to the site
+* If yes then we can deny access or perform a full check of the URL


Suggested change

-* If no then we allow access to the site
-* If yes then we can deny access or perform a full check of the URL
+* If no, then we allow access to the site
+* If yes, then we can deny access or perform a full check of the URL

KarthikSubbarao · 2025-03-24T16:19:48Z

topics/bloomfilters.md

+
+Bloom filters can answer the question: Has this username/email/domain name/slug already been used?
+
+For example for usernames. Use a Bloom filter for every username that has signed up. A new user types in the desired username. The app checks if the username exists in the Bloom filter.


Suggested change

-For example for usernames. Use a Bloom filter for every username that has signed up. A new user types in the desired username. The app checks if the username exists in the Bloom filter.
+In the username example, we can use a Bloom filter to track every username that has signed up. When a new user attempts to sign up with their desired username, the app checks if the username exists in the Bloom filter.

KarthikSubbarao · 2025-03-24T16:23:17Z

topics/bloomfilters.md

+
+The difference between scaling and non scaling bloom filters is that scaling bloom filters do not have a fixed capacity, but a capacity that can grow. While non-scaling bloom filters will have a fixed capacity which also means a fixed size. 
+
+When a scaling filter reaches its capacity, adding a new unique item will cause a new bloom filter to be created and added to the vector of bloom filters. This new bloom filter will have a larger capacity (previous bloom filter's capacity * expansion rate of the bloom object).


How can one create a scalable bloom filter?

KarthikSubbarao · 2025-03-24T16:23:39Z

topics/bloomfilters.md

+
+When a scaling filter reaches its capacity, adding a new unique item will cause a new bloom filter to be created and added to the vector of bloom filters. This new bloom filter will have a larger capacity (previous bloom filter's capacity * expansion rate of the bloom object).
+
+When a non scaling filter reaches its capacity, if a user tries to add a new unique item an error will be returned


How can one create a non-scalable bloom filter?

Users can create a non scaling bloom filter using BF.RESERVE and BF.INSERT commands or by changing the default X configuration.

Example:
BF.RESERVE <filter-name> <error-rate> <capacity> NONSCALING.
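As a rough illustration of the scaling and non scaling behaviour described in the quoted text, the Python sketch below models only the capacity bookkeeping: each scale out adds a sub filter whose capacity is the previous capacity multiplied by the expansion rate, and a non scaling filter raises an error once it is full. `CapacityModel` and its numbers are hypothetical and say nothing about how valkey-bloom is actually implemented.

```python
class CapacityModel:
    """Tracks sub filter capacities only; no hashing or bit arrays involved."""

    def __init__(self, capacity, expansion=None):
        self.capacities = [capacity]
        self.expansion = expansion  # None models a NONSCALING filter
        self.items = 0

    def add_unique(self):
        if self.items == sum(self.capacities):
            if self.expansion is None:
                raise RuntimeError("non scaling filter is at capacity")
            # Scale out: new capacity = previous capacity * expansion rate.
            self.capacities.append(self.capacities[-1] * self.expansion)
        self.items += 1


scaling = CapacityModel(capacity=2, expansion=2)
for _ in range(7):
    scaling.add_unique()

fixed = CapacityModel(capacity=2)
fixed.add_unique()
fixed.add_unique()
try:
    fixed.add_unique()
    rejected = False
except RuntimeError:
    rejected = True
```

After seven unique items the scaling model holds sub filters with capacities 2, 4 and 8, while the fixed model rejects the third item.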

KarthikSubbarao · 2025-03-24T17:00:57Z

topics/bloomfilters.md

+If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.
+
+There are a few benefits for using non scaling filters, a non scaling filter will have better performance than a filter that has scaled out. A non scaling filter also will use less memory for the capacity that is available. However if you don't want to hit an error and want use-as-you-go capacity, scaling is better.


Suggested change

-If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.
-
-There are a few benefits for using non scaling filters, a non scaling filter will have better performance than a filter that has scaled out. A non scaling filter also will use less memory for the capacity that is available. However if you don't want to hit an error and want use-as-you-go capacity, scaling is better.
+If the capacity (number of items we want to add) is known and fixed, using a non-scaling bloom filter is preferred. Conversely, if the capacity is unknown or dynamically calculated, using a scaling bloom filter is ideal.
+
+There are a few benefits to using non scaling filters. A non scaling filter will have better performance than a filter that has scaled out several times (e.g. > 100). Also, non scaling filters in general use less memory than a scaling filter that has scaled out several times to hold the same capacity.
+
+However, if you want to avoid capacity related errors and want use-as-you-go capacity, scaling is better.

KarthikSubbarao · 2025-03-24T17:21:31Z

topics/bloomfilters.md

+</table>
+
+
+As bloom filters have a default expansion of 2 this means all default bloom objects will be scaling. These options are used when not specified explicitly in the commands used to create a new bloom object. For example doing a BF.ADD for a new filter will create a filter with the exact above qualities. These default properties can be configured through configs on the bloom module.


Suggested change

-As bloom filters have a default expansion of 2 this means all default bloom objects will be scaling. These options are used when not specified explicitly in the commands used to create a new bloom object. For example doing a BF.ADD for a new filter will create a filter with the exact above qualities. These default properties can be configured through configs on the bloom module.
+Since bloom filters have a default expansion of 2, all bloom filters created by default will be scaling. Additionally, the other default properties of a bloom filter can be seen in the table above and in the BF.INFO command response below. These default properties can be configured through configs on the bloom module.

KarthikSubbarao · 2025-03-24T17:21:58Z

topics/bloomfilters.md

+
+
+As bloom filters have a default expansion of 2 this means all default bloom objects will be scaling. These options are used when not specified explicitly in the commands used to create a new bloom object. For example doing a BF.ADD for a new filter will create a filter with the exact above qualities. These default properties can be configured through configs on the bloom module.
+Example of default bloom objects information:


Suggested change

-Example of default bloom objects information:
+Example of default bloom filter information:

KarthikSubbarao · 2025-03-24T17:26:04Z

topics/bloomfilters.md

+Most bloom commands are O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only work with one 1 item.
+
+As performance can rely on the number of hash functions, choosing the correct capacity and expansion rate can be very important. When you scale out you will be adding more hash functions that will be used. For this reason it is recommended that you should choose a capacity after evaluating your use case as this can avoid several scale outs. 
+
+There are a few bloom commands that are O(1): BF.CARD, BF.INFO, BF.RESERVE, and BF.INSERT (if no items are specified). These commands have constant time complexity since they don't work on items but instead work on the data about the bloom filter itself.


Suggested change

-Most bloom commands are O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only work with one 1 item.
-
-As performance can rely on the number of hash functions, choosing the correct capacity and expansion rate can be very important. When you scale out you will be adding more hash functions that will be used. For this reason it is recommended that you should choose a capacity after evaluating your use case as this can avoid several scale outs.
-
-There are a few bloom commands that are O(1): BF.CARD, BF.INFO, BF.RESERVE, and BF.INSERT (if no items are specified). These commands have constant time complexity since they don't work on items but instead work on the data about the bloom filter itself.
+The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.
+
+Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to help avoid several scale outs and reduce the number of checks.
+
+The other bloom filter commands have O(1) time complexity: BF.CARD, BF.INFO, BF.RESERVE, and BF.INSERT (when no items are provided).
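To illustrate why the initial capacity matters for performance, here is a small Python sketch that counts how many sub filters a scaling filter would need to hold a given number of items, and the resulting worst-case number of hash evaluations for a single add / exists. It assumes, purely for illustration, that every sub filter uses the same number of hash functions (`hashes_per_filter`); the function names and the numbers are hypothetical.

```python
def sub_filters_needed(items, capacity, expansion):
    """Number of sub filters needed so the summed capacity covers `items`."""
    count, total, cap = 1, capacity, capacity
    while total < items:
        cap *= expansion      # each scale out multiplies the previous capacity
        total += cap
        count += 1
    return count


def worst_case_hash_evals(items, capacity, expansion, hashes_per_filter):
    """Worst case for one lookup: every sub filter is checked."""
    return sub_filters_needed(items, capacity, expansion) * hashes_per_filter


# Holding 100,000 items with expansion 2 and an assumed 7 hashes per sub filter:
small_start = worst_case_hash_evals(100_000, 1_000, 2, 7)
right_sized = worst_case_hash_evals(100_000, 100_000, 2, 7)
```

Under these assumptions, starting at a capacity of 1,000 requires 7 sub filters (49 hash evaluations in the worst case), while a filter sized for 100,000 items up front needs only one (7 evaluations).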

KarthikSubbarao · 2025-03-24T17:29:05Z

topics/bloomfilters.md

+
+In Valkey, the bloom filter data type / commands are implemented in the [valkey-bloom module](https://github.com/valkey-io/valkey-bloom) which is an official valkey module compatible with versions 8.0 and above. Users will need to load this module onto their valkey server in order to use this feature.
+
+Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.


TODO: Explain false positive and false negative
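One possible way to illustrate the two terms is a toy Bloom filter in Python. A false negative would mean an item that was added is reported as absent, which cannot happen because adding only ever sets bits. A false positive means an item that was never added is reported as present because other items happened to set the same bits. `TinyBloom` below is a hypothetical teaching sketch using SHA-256; it is not how valkey-bloom hashes items.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter for illustration only."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly added"; False means "definitely never added".
        return all(self.bits[pos] for pos in self._positions(item))


names = ["alice", "bob", "carol"]
roomy = TinyBloom(num_bits=1024, num_hashes=3)
for name in names:
    roomy.add(name)
# No false negatives: every added item is always reported as present.
no_false_negatives = all(roomy.might_contain(name) for name in names)

# A deliberately saturated one-bit filter: once any item is added, every lookup
# reports "present", so an item that was never added is a false positive.
saturated = TinyBloom(num_bits=1, num_hashes=1)
saturated.add("alice")
false_positive = saturated.might_contain("mallory")
```

The one-bit filter is an extreme case chosen so the false positive is guaranteed; a realistically sized filter keeps the false positive probability close to the configured rate.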

KarthikSubbarao · 2025-03-24T19:11:02Z

topics/bloomfilters.md

+bf_bloom_defrag_misses:0
+```
+
+### Bloom filter core metrics


Change object to filter

KarthikSubbarao · 2025-03-24T19:11:12Z

topics/bloomfilters.md

+
+* `bf_bloom_num_items_across_objects`: Current total number of items across all bloom objects.
+
+* `bf_bloom_capacity_across_objects`: Current total number of filters across all bloom objects.


We can update this

Adding bloom command meta data and bloom group, adding bloom data type as well

6f84713

Signed-off-by: zackcam <[email protected]>

madolson requested review from zuiderkwast and madolson February 20, 2025 00:37

zackcam mentioned this pull request Feb 20, 2025

Adding functionality for the bloom module to have its commands displayed on the Valkey website valkey-io/valkey-io.github.io#212

zuiderkwast reviewed Feb 20, 2025

groups.json (outdated)

madolson reviewed Feb 20, 2025

zuiderkwast reviewed Feb 20, 2025

zackcam force-pushed the main branch from 2121cad to cd80826 Compare February 21, 2025 08:11

First round of rewording and changes to documentation. Added ability to generate bloom man pages

1f892b9

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from cd80826 to 1f892b9 Compare February 21, 2025 20:16

KarthikSubbarao reviewed Mar 5, 2025

KarthikSubbarao reviewed Mar 6, 2025

zackcam force-pushed the main branch from 9182b75 to b4e71e4 Compare March 7, 2025 19:27

roshkhatri reviewed Mar 8, 2025

Makefile (outdated)

Changes based on feedback for bloom commands and documentation

f062a8a

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from b4e71e4 to f062a8a Compare March 11, 2025 19:24

KarthikSubbarao reviewed Mar 12, 2025

Adding aditional field that can be returned by BF.INFO, added monitoring section to bloomfilter topic, cleaned up other bloomfilter topic sections

48d8cd0

zackcam force-pushed the main branch from a8cbac0 to 48d8cd0 Compare March 12, 2025 23:29

zuiderkwast reviewed Mar 17, 2025

zackcam force-pushed the main branch from d8b1637 to bd1aa77 Compare March 18, 2025 18:02

zuiderkwast reviewed Mar 18, 2025

Makefile (outdated)

zackcam force-pushed the main branch from bd1aa77 to a81712e Compare March 18, 2025 18:37

zuiderkwast reviewed Mar 19, 2025

Adding table for bloom default properties as well as cleaning up man creation and spelling

0249aec

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from a81712e to 0249aec Compare March 20, 2025 19:51

KarthikSubbarao reviewed Mar 24, 2025

zackcam force-pushed the main branch from ae90456 to fa5dccf Compare March 24, 2025 20:45

Topic documentation updates for bloomfilter

47332ac

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from fa5dccf to 47332ac Compare March 25, 2025 22:55

		* key (required) - A Valkey key of Bloom data type
		* item (required) - Item to add

		@@ -0,0 +1,12 @@
		Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

-Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
	If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

		@@ -0,0 +1,16 @@
		Determines if a specified item has been added to the specified bloom filter.
		Syntax

		@@ -0,0 +1,35 @@
		Returns information about a bloomfilter

		## Arguments

-* SIZE (optional) - Returns the memory size which is the number of bytes allocated
+* SIZE (optional) - Returns the number of bytes allocated


		Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.

		## Bloom commands


		Financial fraud detection

		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.

		For the above each user would have a Bloom filter which is then checked for every transaction.


		Check if URL's are malicious

		Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

-Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
+Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

		"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
		"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",


		## Common use cases for bloom filters

		Financial fraud detection

-Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
+Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

-If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.
+To add multiple items to a bloom filter, you can use the BF.MADD or BF.INSERT commands.
+If you want to create a bloom filter with non-default properties, use the `BF.INSERT` or `BF.RESERVE` command.

		@@ -0,0 +1,12 @@
		Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.

-Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.
+Returns the cardinality of a Bloom filter which is the number of items that have been successfully added to it.

-127.0.0.1:6379> BF.CARD missing
+127.0.0.1:6379> BF.CARD nonexistentkey

		@@ -0,0 +1,19 @@
		Determines if an item has been added to the bloom filter.

-* ITEMS item - One or more items we will add to the bloom filter
+* ITEMS item - One or more items to be added to the bloom filter


-* VALIDATESCALETO `validatescaleto` - Checks if the filter could scale to this capacity and if not show an error and don’t create the bloom filter
+* VALIDATESCALETO `validatescaleto` - Validates if the filter can scale out and reach to this capacity based on limits and if not, return an error without creating the bloom filter

		* TIGHTENING `tightening_ratio` - The tightening ratio for the bloom filter
		* SEED seed - The seed the hash functions will use

-# This does not update the capcity but uses the origianl filters values
+# This does not update the capacity since the filter already exists. It only adds the provided items.

-[Bloom filters](bloomfilters.md) are a space efficient data type that can tell you if something is definitely not in a set, or it might be in the set.
+[Bloom filters](bloomfilters.md) are a space efficient probabilistic data type that can be used to check if item/s are definitely not present in a set, or if they exist within the set (with the configured false positive rate).


		Most bloom commands are O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only work with one 1 item.

		There are a few bloom commands that are O(1) as they don't work on items but instead work on the data about the bloom filter itself.

-The bloom filter data type is taken from a [separate module](https://github.com/valkey-io/valkey-bloom) that users will need to install in order to use.
+In Valkey, the bloom filter data type / commands are implemented in the [valkey-bloom module](https://github.com/valkey-io/valkey-bloom) which is an official valkey module compatible with versions 8.0 and above. Users will need to load this module onto their valkey server in order to use this feature.


		### When should you use scaling vs non-scaling filters

		If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.


		Seed - The seed used by the bloom filter can be specified by the user in the BF.INSERT command. This property is only useful if you have a specific 32 byte seed that you want your bloom filter to use. By defualt every bloom filter will use a random seed.

		Tightening Ratio - We do not recommend fine tuning this unless there is a specific use case for lower memory usage with higher false positive or vice versa.


		If the data size is known and fixed then using a non-scaling bloom filter is preferred, for example a static dictionary could use a non scaling bloom filter as the amount of items should be fixed. Likewise the reverse case for dynamic data and unknown final sizes is when you should use a scaling bloom filters.

		## Default bloom properties


		## Insert Fields

		* CAPACITY `capacity` - The number of unique items that would need to be added before a scale out occurs or (non scaling) before it rejects addition of unique items.