Commit

Switch from config.json to config.yaml to be consistent with the rest of DDEV config files
rpkoller committed Jun 12, 2023
1 parent 33448f5 commit 7025f5b
Showing 5 changed files with 64 additions and 35 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -15,12 +15,12 @@ This repository provides an addon to use [@Spidergram](https://github.com/autogr
> While it can be used to crawl any website, we (the folks at [Autogram](https://autogram.is/)) designed it specifically for "ten websites in a trench coat" scenarios where a web property encompasses multiple CMSs, multiple domains, and multiple design systems, maintained by multiple teams.
## Installation
1. Create a new directory and go into it. For simplicity reasons I am using the name `spidergram` which equals to the project name. You are able to use any other name here.
1. Create a new directory and go into it. For simplicity reasons I am using the name `spidergram` for this guide. You are able to use any other name here.
```
mkdir spidergram && cd spidergram
```

2. Initialize the DDEV project using the suggested defaults.
2. Initialize the DDEV project using the suggested defaults. By using the defaults the project name will be equal to the directory name.
```
ddev config
```
@@ -47,7 +47,7 @@ The output should look like that:
$> ddev spidergram status
SPIDERGRAM CONFIG
Config file: /var/www/html/spidergram.config.json
Config file: /var/www/html/spidergram.config.yaml
ARANGODB
Status: online
@@ -61,21 +61,21 @@ ddev spidergram go https://ddev.com
```

3. The ArangoDB backend could be reached via the URL shown for `ddev spidergram status`. You simply have to copy http://spidergram.ddev.site:8529 into your browser.
4. For more details see the [Spidergram documentation](https://github.com/autogram-is/spidergram/tree/main/docs). All configuration changes are applied to the `spidergram.config.json` file.
4. For more details see the [Spidergram documentation](https://github.com/autogram-is/spidergram/tree/main/docs). All configuration changes are applied to the `spidergram.config.yaml` file.

## Behind the scenes
1. Adds a docker-compose file (`docker-compose.arangodb.yaml`) for ArangoDB.
1. Adds a dockerfile (`Dockerfile.spidergram`) to the web-build folder. It runs a `npm install --global spidergram`, `npx playwright install`, and a `npx playwright install-deps` when the addon is installed.
1. Adds a `spidergram` web command. You only have to call for example `ddev spidergram status` instead of `ddev exec spidergram status`.
1. Adds a `spidergram.config.json` to the project root. The json with the exact file name is mandatory for Spidergram to run. In a `post-start`-hook
1. Adds a `spidergram.config.yaml` to the project root. The YAML file with this exact name is mandatory for Spidergram to run. A `post-start` hook
ensures that the URL set in `spidergram.config.yaml` is in line with the overall project settings: a regex replaces the project name (taken from `$DDEV_PROJECT`) and the TLD (taken from `$DDEV_TLD`) in the URL.
1. The `config.spidergram.yaml` file ensures that the Node version is set to version 18.

## TODO
- [ ] Figure out the best approach how to upgrade Spidergram and it's dependencies for an already existing Spidergram DDEV instance and update the README accordingly (_I have to wait for that until the next Spidergram release_).
- [ ] Expand the number of settings in `spidergram.config.json`. At the moment I've only using the default values found at https://github.com/autogram-is/create-spidergram/tree/main/templates
- [ ] Figure out the best approach to upgrading Spidergram and its dependencies for an already existing Spidergram DDEV instance and update the README accordingly (_I have to wait for the next Spidergram release to be able to test that_).
- [ ] Expand the number of settings in `spidergram.config.yaml`. At the moment I'm only using the default values found at https://github.com/autogram-is/create-spidergram/tree/main/templates

## Contributing
Any feedback about bugs and potential improvements is welcome. PRs that add features, especially when covered with tests, will be applauded.
Any feedback in regard to bugs and potential improvements is welcome.

**Contributed and maintained by [@rpkoller](https://github.com/rpkoller) based on the original [ddev-addon-template](https://github.com/ddev/ddev-addon-template)**
2 changes: 1 addition & 1 deletion config.spidergram.yaml
@@ -1,4 +1,4 @@
nodejs_version: "18"
hooks:
post-start:
- exec: perl -pi -e "s/(https?:\/\/).*\d{4}/\1$DDEV_PROJECT.$DDEV_TLD:8529/g" ./spidergram.config.json
- exec: perl -pi -e "s/(url: https?:\/\/).*\d{4}/\1$DDEV_PROJECT.$DDEV_TLD:8529/g" ./spidergram.config.yaml
5 changes: 3 additions & 2 deletions install.yaml
@@ -1,3 +1,4 @@
---
name: ddev-spidergram

# list of files and directories listed that are copied into project .ddev directory
@@ -8,9 +9,9 @@ project_files:
- commands/web/spidergram
- web-build/Dockerfile.spidergram
- docker-compose.arangodb.yaml
- spidergram.config.json
- spidergram.config.yaml
- config.spidergram.yaml

post_install_actions:
- |
mv ./spidergram.config.json ../spidergram.config.json
mv ./spidergram.config.yaml ../spidergram.config.yaml
24 changes: 0 additions & 24 deletions spidergram.config.json

This file was deleted.

52 changes: 52 additions & 0 deletions spidergram.config.yaml
@@ -0,0 +1,52 @@
---
# The included Docker Compose file spins up a no-authentication instance
# of ArangoDB — it's simple to use for local crawling, but it's a good
# idea to set up a "real" server, even if it's running locally, once
# you're done kicking the tires.
arango:
  databaseName: db
  url: http://spidergram.ddev.site:8529
  auth:
    username: root
    password: db

# These options control the behavior of the Spider when it's actually
# crawling pages and searching them for new links. In a simple config
# file, you can change settings. In a .js or .ts configuration script,
# you can pass in custom URL filtering and response handling functions
# for more control.
spider:
  # These options control the behavior of the Spider when it's actually
  # crawling pages and searching them for new links. In a simple config
  # file, you can change settings. In a .js or .ts configuration script,
  # you can pass in custom URL filtering and response handling functions
  # for more control.
  downloadMimeTypes:
    - application/pdf
  userAgent: MyCustomSpider
  # Links will be labeled with these categories based on
  # the section of the page they're found in. Each key is
  # a region name, and each value is a CSS selector defining
  # the region. By default, the 'regions' property is empty
  # and saved links are unlabeled.
  urlOptions:
    regions:
      - header
      - footer
      - body

# Spidergram uses a global URL normalizer to ensure that the same
# rules are applied consistently and pages aren't re-visited needlessly.
# The default URL normalizer is configurable, but a custom function can be
# passed in instead of this settings object when using a .js or .ts
# configuration script.
urlNormalizer:
  # forceProtocol: "https:"
  # forceLowercase: "host"
  # discardSubdomain: "ww*"
  # discardAnchor: true
  # discardAuth: true
  # discardIndex: "**/{index,default}.{htm,html,aspx,php}"
  # discardSearch: "!{page,p}"
  # sortSearchParams: true
  discardTrailingSlash: true # false by default
