Skip to content

Commit

Permalink
feat: Include closed milestones in milestones stream (#300)
Browse files Browse the repository at this point in the history
~⚠️ This may be a disruptive change for some users who are only
interested in open milestones and are relying on the fact that this tap
only pulls open milestones. These existing users would need to apply a
stream map or downstream filter in order to continue excluding closed
milestones.~ Update: we've landed on a way to implement it as optional,
with the stream still extracting only open milestones by default.

**Why is it worth making this potentially disruptive change?**
If a user is using this tap to maintain and update a data source about
milestones and their states/attributes, but each run only extracts data
about open milestones, all past and current milestones will show up as
open whether they are open or closed. Other fields (especially counts of
open/closed issues, closed_at, updated_at) would contain potentially
stale data corresponding to the last time they were observed before they
were closed. By extracting data about all milestones, previously
extracted milestones which are now closed can be updated to a closed
state and have any other changed fields updated as well.

The implementation approach for this change follows the pattern used in
streams like IssuesStream and PullRequestsStream when diverging from the
default query parameters.
  • Loading branch information
TrishGillett authored Oct 17, 2024
1 parent 2dd96ae commit 3002f2b
Show file tree
Hide file tree
Showing 4 changed files with 51 additions and 2 deletions.
7 changes: 5 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,11 @@ This tap accepts the following configuration options:
- `metrics_log_level`
- `stream_maps`
- `stream_maps_config`
- `rate_limit_buffer` - A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
- `expiry_time_buffer` - A buffer used when determining when to refresh Github app tokens. Only relevant when authenticating as a GitHub app. Defaults to 10 minutes. Tokens generated by GitHub apps expire 1 hour after creation, and will be refreshed once fewer than `expiry_time_buffer` minutes remain until the anticipated expiry time.
- `stream_options`: Options which can change the behaviour of a specific stream are nested within.
- `milestones`: Valid options for the `milestones` stream are nested within.
- `state`: Determines which milestones will be extracted. One of `open` (default), `closed`, `all`.
- `rate_limit_buffer`: A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
- `expiry_time_buffer`: A buffer used when determining when to refresh GitHub app tokens. Only relevant when authenticating as a GitHub app. Defaults to 10 minutes. Tokens generated by GitHub apps expire 1 hour after creation, and will be refreshed once fewer than `expiry_time_buffer` minutes remain until the anticipated expiry time.

Note that modes 1-3 are `repository` modes and 4-5 are `user` modes and will not run the same set of streams.

Expand Down
9 changes: 9 additions & 0 deletions meltano.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,15 @@ plugins:
kind: array
- name: user_ids
kind: array
- name: stream_options.milestones.state
kind: options
options:
- label: Open
value: open
- label: Closed
value: closed
- label: All
value: all
- name: start_date
kind: date_iso8601
value: '2010-01-01T00:00:00Z'
Expand Down
13 changes: 13 additions & 0 deletions tap_github/repository_streams.py
Original file line number Diff line number Diff line change
Expand Up @@ -635,6 +635,19 @@ class MilestonesStream(GitHubRestStream):
state_partitioning_keys: ClassVar[list[str]] = ["repo", "org"]
ignore_parent_replication_key = True

def get_url_params(
self, context: dict | None, next_page_token: Any | None
) -> dict[str, Any]:
"""Return a dictionary of values to be used in URL parameterization."""
assert context is not None, f"Context cannot be empty for '{self.name}' stream."
params = super().get_url_params(context, next_page_token)
params["state"] = "open"

if "milestones" in self.config.get("stream_options", {}):
params.update(self.config["stream_options"]["milestones"])

return params

schema = th.PropertiesList(
# Parent Keys
th.Property("repo", th.StringType),
Expand Down
24 changes: 24 additions & 0 deletions tap_github/tap.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,30 @@ def logger(cls) -> logging.Logger:
"streams (such as repositories) if it is not selected but children are"
),
),
th.Property(
"stream_options",
th.ObjectType(
th.Property(
"milestones",
th.ObjectType(
th.Property(
"state",
th.StringType,
description=(
"Configures which states are of interest. "
"Must be one of [open, closed, all], defaults to open."
),
default="open",
allowed_values=["open", "closed", "all"],
),
additional_properties=False,
),
description="Options specific to the 'milestones' stream.",
),
additional_properties=False,
),
description="Options which change the behaviour of a specific stream.",
),
).to_dict()

def discover_streams(self) -> list[Stream]:
Expand Down

0 comments on commit 3002f2b

Please sign in to comment.