MongoDB Search Adapter #482

Open
alexander-schranz opened this issue Feb 17, 2025 · 3 comments
Labels
features New feature or request help wanted Extra attention is needed

Comments

@alexander-schranz
Member

alexander-schranz commented Feb 17, 2025

As already mentioned in the Research Document, MongoDB has a search feature called Atlas Search: https://www.mongodb.com/products/platform/atlas-search

Besides our existing list of search engines, I think MongoDB would be an interesting addition. Since it is designed as a document-based database, it could be a nice fit for SEAL.

Creating a new adapter should be fairly straightforward: https://php-cmsig.github.io/search/cookbooks/create-own-adapter.html. Sadly, I'm not familiar with the MongoDB ecosystem: how the self-hosted and Atlas-hosted versions differ, what to keep in mind when implementing search, and whether one adapter can support both Atlas and self-hosted or Atlas would need its own adapter.

Maybe @GromNaN or @alcaeus can help us work out how to build a MongoDB adapter.

@alexander-schranz alexander-schranz added features New feature or request help wanted Extra attention is needed labels Feb 17, 2025
@GromNaN

GromNaN commented Feb 17, 2025

Of course, that's an excellent idea. Atlas Search uses a Lucene index next to a MongoDB database. You store BSON documents in a MongoDB collection, and they are asynchronously indexed into the search indexes. The search results are documents from the MongoDB collection. This is perfect for avoiding manual replication between the main database and the search engine. It can also be used as a search engine on its own.

In order to use MongoDB in a PHP project, you need to install the ext-mongodb extension and the mongodb/mongodb library: Installation instructions.

The MongoDB Atlas Search server is available on Atlas Cloud (including a forever-free cluster). For dev and test, we have a Docker image. Example of the configuration that I use: docker-composer.yaml. The Atlas Search feature will land in the community edition of MongoDB in the coming months.

You can play with MongoDB Search features using:

You can also check the Laravel Scout Engine for MongoDB.

@alexander-schranz
Member Author

@GromNaN thank you for the quick response. The resources help, and the Docker image is awesome: we use a Docker Compose file for the GitHub CI here, so it is nice that we can do the same with MongoDB.

I have some questions that you can maybe answer quickly:

  1. Can the document identifier be anything (number, UUID, custom identifier)?
  2. In SEAL we have strict mapping. It looks like we can force such things via dynamic: true and define for every field its type and whether it is searchable. How are nested arrays and arrays of objects handled? We have complex documents such as:

         blocks?: array<array{
             type: string,
             title?: string|null,
             description?: string|null,
             media?: int[]|string,
         }>,

     For search engines that don't support nesting, we "flatten" the object to things like "block.text.title": ["Title", "Title 2"]. For search engines like Elasticsearch, which do support it, we keep the nested array. What is your recommendation for MongoDB here?
  3. Can the search be combined with typical filters like equal, greater than, and lower than, and is nesting filters via AND and OR conditions supported?

@GromNaN

GromNaN commented Feb 18, 2025

  1. Can the document identifier be anything (number, UUID, custom identifier)?

In MongoDB, the identifier field is always _id. It can be almost any BSON type, including a subdocument; arrays are not allowed as _id values. By default, the driver generates a MongoDB\BSON\ObjectId when the _id field is not set.

  2. In SEAL we have strict mapping. It looks like we can force such things via dynamic: true and define for every field its type and whether it is searchable.

To enforce strict mapping, you must set dynamic: false and set a list of fields with their types.

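$collection->createSearchIndex(
    [
        'mappings' => [
            // Index only the fields listed below; unlisted fields are ignored.
            'dynamic' => false,
            // "fields" is a map of field name => definition, not a list.
            'fields' => [
                'rootField' => ['type' => '<field-type>'],
                'nested.field.path' => ['type' => '<field-type>'],
                // ...
            ],
        ],
    ],
);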

Dot notation is used for the path of nested fields (subdocuments) in queries and index definitions. The field path is the same whether the field holds a single value or a list of values (similar to Elasticsearch), so an indexed field can contain either.
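
As a rough illustration of turning a flat, SEAL-style list of dotted paths into a strict mapping, here is a minimal sketch. `buildStrictMappings()` is a hypothetical helper, not part of SEAL or the MongoDB library. It declares every parent of a dotted path as a `document` mapping with its own `fields`; whether a given Atlas Search version expects that nested form or accepts dotted keys directly is an assumption to verify against the Atlas Search field-mapping documentation.

```php
<?php

declare(strict_types=1);

/**
 * Builds an Atlas Search "mappings" array from dotted field paths.
 * Hypothetical helper for illustration only.
 *
 * @param array<string, string> $fieldTypes dotted path => Atlas Search field type
 *
 * @return array<string, mixed>
 */
function buildStrictMappings(array $fieldTypes): array
{
    $fields = [];

    foreach ($fieldTypes as $path => $type) {
        $segments = explode('.', (string) $path);
        $leaf = array_pop($segments);

        // Walk down the parents, declaring each one as a "document" mapping.
        $cursor = &$fields;
        foreach ($segments as $segment) {
            $cursor[$segment] ??= ['type' => 'document', 'fields' => []];
            $cursor = &$cursor[$segment]['fields'];
        }

        $cursor[$leaf] = ['type' => $type];
        unset($cursor); // break the reference before the next path
    }

    // dynamic: false means only the listed fields are indexed.
    return ['dynamic' => false, 'fields' => $fields];
}
```

The result could then be passed along the lines of `$collection->createSearchIndex(['mappings' => buildStrictMappings(['title' => 'string', 'blocks.title' => 'string'])])`. The sketch does not handle a path that is declared both as a leaf and as a parent.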

How are nested arrays and arrays of objects handled? What is your recommendation for MongoDB here?

MongoDB supports nested documents, so you can insert documents like this:

$collection->insertOne([
    // The _id can be a plain string instead of a generated ObjectId.
    '_id' => 'using a string is possible',
    'date' => new \MongoDB\BSON\UTCDateTime(new \DateTime()),
    'bool' => true,
    'float' => 2.1,
    'int' => 10,
    'list_of_ints' => [1, 2, 3, 4],
    'string' => 'Hello',
    'subdocument' => ['foo' => 'bar'],
    'list of subdocuments' => [
        ['_id' => new \MongoDB\BSON\ObjectId(), 'foo' => 'bar'],
        ['_id' => new \MongoDB\BSON\ObjectId(), 'foo' => 'baz'],
    ],
    'bigint' => new \MongoDB\BSON\Int64('9007199254740991'),
]);

Dates must be converted to MongoDB\BSON\UTCDateTime.
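
Because dates can sit anywhere in a nested document, an adapter would probably convert them recursively before inserting. A minimal sketch, assuming documents arrive as plain PHP arrays; `convertDates()` is a hypothetical helper, and the converter is passed in as a callable so the function itself does not depend on the MongoDB extension.

```php
<?php

declare(strict_types=1);

/**
 * Recursively replaces every DateTimeInterface value in a document
 * with the result of $convert. Hypothetical helper for illustration.
 *
 * @param array<mixed> $document
 * @param callable(\DateTimeInterface): mixed $convert
 *
 * @return array<mixed>
 */
function convertDates(array $document, callable $convert): array
{
    foreach ($document as $key => $value) {
        if ($value instanceof \DateTimeInterface) {
            $document[$key] = $convert($value);
        } elseif (is_array($value)) {
            // Covers subdocuments and lists of subdocuments alike.
            $document[$key] = convertDates($value, $convert);
        }
    }

    return $document;
}

// With ext-mongodb installed, the converter would typically be:
// static fn (\DateTimeInterface $date) => new \MongoDB\BSON\UTCDateTime($date)
```

Note that UTCDateTime stores a UTC instant with millisecond precision, so the original time zone would have to be kept in a separate field if it matters.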

  3. Can the search be combined with typical filters like equal, greater than, and lower than, and is nesting filters via AND and OR conditions supported?

I would recommend using the compound operator to combine criteria. This operator has four lists of conditions:

  • filter and mustNot are strict conditions
  • should and must contribute to the ranking score

$collection->aggregate([
  [
    '$search' => [
      'compound' => [
        // Documents must match this clause; it contributes to the score.
        'must' => [[
          'text' => [
            'query' => 'varieties',
            'path' => 'description',
          ],
        ]],
        // Documents matching this clause are excluded.
        'mustNot' => [[
          'text' => [
            'query' => 'apples',
            'path' => 'description',
          ],
        ]],
      ],
    ],
  ],
], ['typeMap' => ['root' => 'array', 'document' => 'array', 'array' => 'array']]);

Also, I recommend using this typeMap option to retrieve the results as nested arrays. Otherwise you get BSON objects that are more memory-efficient but less conventional.
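
To make the filter question more concrete, here is a minimal sketch of how equality, range, and nested AND/OR conditions could be assembled into a `$search` stage. All function names and the field names in the usage note (`in_stock`, `price`, `featured`) are hypothetical. It uses the Atlas Search `equals`, `range`, `text`, and `compound` operators, with OR expressed as a nested `compound` using `should` plus `minimumShouldMatch`.

```php
<?php

declare(strict_types=1);

/** Equality condition on a single field. */
function equalsFilter(string $path, mixed $value): array
{
    return ['equals' => ['path' => $path, 'value' => $value]];
}

/** Range condition; $bounds uses the keys gt, gte, lt, lte. */
function rangeFilter(string $path, array $bounds): array
{
    return ['range' => ['path' => $path] + $bounds];
}

/** OR: at least one of the given conditions must match. */
function anyOf(array $conditions): array
{
    return ['compound' => ['should' => $conditions, 'minimumShouldMatch' => 1]];
}

/** AND: all of the given conditions must match, without affecting the score. */
function allOf(array $conditions): array
{
    return ['compound' => ['filter' => $conditions]];
}

/**
 * Builds a $search stage: the text query is scored, the filters are not.
 *
 * @param list<string> $paths
 * @param list<array<string, mixed>> $filters
 */
function buildSearchStage(string $query, array $paths, array $filters = []): array
{
    $compound = [
        'must' => [['text' => ['query' => $query, 'path' => $paths]]],
    ];

    // Only add the clause when there is something to filter on.
    if ($filters !== []) {
        $compound['filter'] = $filters;
    }

    return ['$search' => ['compound' => $compound]];
}
```

For example, `buildSearchStage('varieties', ['description'], [equalsFilter('in_stock', true), anyOf([rangeFilter('price', ['gte' => 5, 'lt' => 10]), equalsFilter('featured', true)])])` yields a stage that can be placed first in the pipeline given to `$collection->aggregate()`. Each filtered field still has to be indexed with a type that the chosen operator supports.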


2 participants