Skip to content
This repository has been archived by the owner on Jun 2, 2022. It is now read-only.

Implement updated Writable spec #664

Merged
merged 11 commits into from
Jan 3, 2020

Conversation

MikaelSmith
Copy link
Contributor

Implement the new Writable interface (defined to mirror the Readable interface). Implement Write on simple file interfaces: local, AWS S3, and Google Cloud Storage.

Resolves #620. Also fixes #656.

@MikaelSmith
Copy link
Contributor Author

MikaelSmith commented Dec 16, 2019

This is still a work in progress, but I think the approach is about right. Includes a rewrite of fuse/file to broadly mimic https://github.com/bazil/bolt-mount/blob/master/file.go, as suggested in bazil/fuse#225.

Remaining work:

  • Make the local plugin a developer option that's enabled at compile-time. It was very useful for testing.
  • Implement Write on AWS S3 objects.
  • Add more tests for fuse/file.
  • Revisit TODOs on reading data when file is opened for writing. We don't have buffering for block reading, which would be useful here so we don't reread from APIs with $ costs. Writing a file to Google Cloud Storage is currently a little slow because we repeat the read, then write updated data. In many cases we'll receive a full write of the file as well, so we don't need to read any data; we could delay reading until we get a non-contiguous write. Requiring that we can Read on Opening a file for Writing also means that entries that implement Write but not Read will fail.
  • Implement Writable for external plugins.

@MikaelSmith MikaelSmith force-pushed the writes branch 2 times, most recently from ffc29d6 to 8458a1c Compare December 16, 2019 23:24
Copy link
Contributor

@ekinanp ekinanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, just left a few initial comments.

plugin/types.go Outdated Show resolved Hide resolved
plugin/gcp/storageObject.go Show resolved Hide resolved
cmd/internal/server/core.go Outdated Show resolved Hide resolved
fuse/dir.go Outdated Show resolved Hide resolved
@MikaelSmith MikaelSmith force-pushed the writes branch 3 times, most recently from 75c31ea to 03a397d Compare December 17, 2019 22:34
@MikaelSmith MikaelSmith marked this pull request as ready for review December 17, 2019 23:24
@MikaelSmith MikaelSmith requested a review from a team December 17, 2019 23:24
@ghost ghost requested review from ekinanp and removed request for a team December 17, 2019 23:24
@MikaelSmith
Copy link
Contributor Author

I think this is ready for review again.

fuse/file.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ekinanp ekinanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments, still reviewing

api/fs/root.go Outdated Show resolved Hide resolved
fuse/file.go Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
This is a developer-centric option that has to be explicitly enabled
with an environment variable.

By default it mounts `/tmp`, can be configured with `local.basepath` to
mount any other directory.

Signed-off-by: Michael Smith <[email protected]>
Implement the new Writable interface (defined to mirror the Readable
interface). Implement Write on a local file interface for testing.

Also adds fuse/file tests.

Supports puppetlabs-toy-chest#620.

Signed-off-by: Michael Smith <[email protected]>
This makes editors that rely on size read the entire file. Saves size
from the last open on the fs node so we can use it to serve attribute
requests.

Fixes puppetlabs-toy-chest#656.

Signed-off-by: Michael Smith <[email protected]>
Also use a context for S3 get object requests.

Signed-off-by: Michael Smith <[email protected]>
Add `sizeValid` to track whether we need to call `plugin.Size` before
using `size`. This is only needed when opening a file WriteOnly to delay
a Read - to ensure we have the entire file content before Write - so we
only do it when necessary.

Handle `io.EOF` when loading the file in case a call to `Setattr`
increased it beyond the original size but didn't write to fill that
space, and pad with null characters when that happens.

Upgrade journal entries to warnings when they result in an error. FUSE
doesn't relay the actual error messages.

Delay releasing data and clearing entry cache until we release all
writers.

Tweak a few tests so we have a balance of ReadWrite tests that rely on
attribute size or calling Read to determine size. Update
TestTruncateAndWrite to ensure we don't call Read when we've called
Setattr to truncate the file first.

Add additional comments explaining how to use BlockReadable and
Writable, and some code comments to clarify how Open works.

Signed-off-by: Michael Smith <[email protected]>
Copy link
Contributor

@ekinanp ekinanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the conversation on Slack, we want a way to distinguish between file-like and non-file-like entries.

  • File-like entries are entries with content, such that Read returns that content and Write updates it. Non-file-like entries are entries without content, so that "Read" data could represent one thing and the "Written" data could represent another.

  • For plugin author docs: file-like entries are entries with a Size attribute (implicitly entry is also Readable). So an entry has content iff it has a Size attribute. Thus, people should not set size for non-file-like entries.

    • Reason: Size attribute means you can filter those things with find. When would filtering on Size make sense for Docker container logs, cloud function logs, etc?
      • We could always change these semantics later to a is-file-like key in the entry JSON on user demand
  • Implementation:

    • File-like entries are Readable, Writable, and have a Size attribute.

      • Reason: Up to now, we tell people “set size if you know it” for Readable entries. We also do that for things like metadata.json entries, which are not file-like. These semantics ensure that those entries don't accidentally become file-like.
    • For file-like entries, f.data represents the content. For non-file-like entries, f.data represents what's about to be written.

    • For file-like entries, the first Write call loads the entire data (because we're going to overwrite it anyways on a Flush). f.data is then updated appropriately with the written block.

    • Flush should just write whatever f.data is.

    • For Read, f.data should only be returned for file-like entries (and if it exists). Otherwise, we should delegate to plugin.ReadWithAnalytics. That's because f.data represents the content for file-like entries while for other entries, it's separate from what could be read.

@ekinanp
Copy link
Contributor

ekinanp commented Dec 19, 2019

One example of a non-file like entry that may have Read/Write could be a GCP log, where read could return its most recent 1000 entries while write could write a new entry.

@MikaelSmith
Copy link
Contributor Author

Updated.

Clarifies difference in how files with and without Size attributes work
by defining them as file-like and non-file-like and describing what that
means. Primarily enables interaction with files where read and write are
not symmetric.

Signed-off-by: Michael Smith <[email protected]>
@MikaelSmith MikaelSmith force-pushed the writes branch 2 times, most recently from dee1c8d to 8fc111e Compare December 20, 2019 19:26
Copy link
Contributor

@ekinanp ekinanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing the tests next

CONTRIBUTING.md Outdated Show resolved Hide resolved
api/fs/root.go Show resolved Hide resolved
### write
The `write` action lets you write data to an entry. Thus, any command that writes a file also works with these entries.

Note that Wash distinguishes between file-like and non-file-like entries. An entry is file-like if it's readable and writable and defines its size; you can edit it like a file. If it doesn't define a size then it's non-file-like, and trying to open it with a ReadWrite handle will error; reads from it may not return data you previously wrote to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also add a note about checking docs for that entry's write semantics (non-file-like entries only)

docs/docs/index.md Show resolved Hide resolved
docs/docs/external-plugins.md Outdated Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
fuse/file.go Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
fuse/file.go Outdated Show resolved Hide resolved
plugin/externalPluginEntry.go Show resolved Hide resolved
fuse/file_test.go Outdated Show resolved Hide resolved
fuse/file.go Outdated
return &file{fuseNode: newFuseNode("f", p, e), writers: make(map[fuse.HandleID]struct{})}
}

func (f *file) isFileLike() bool {
Copy link
Contributor

@ekinanp ekinanp Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe isFileLike => isFileLikeEntry ? It's slightly weird seeing file#isFileLike, esp. if you're new to this code.

fuse/file.go Show resolved Hide resolved
fuse/file.go Show resolved Hide resolved
@MikaelSmith MikaelSmith force-pushed the writes branch 3 times, most recently from 74e4ad5 to 7ba3196 Compare January 2, 2020 23:59
docs/docs/index.md Outdated Show resolved Hide resolved
interface allows sending data to the entry.

Wash distinguishes between two different patterns for things you can read and write. It considers
a "file-like" entry to be one with a defined size (so the `size` attribute is set when listing the
Copy link
Contributor

@ekinanp ekinanp Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree about using size alone for file-like things. But I think we should remove size from the metadata/data.json files because those aren't really "files" (editing them makes no sense) so they shouldn't be treated as such unless we have a good reason to do so in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think editing metadata does make sense for some things: the metadata is a reflection of a mix of state and configuration, and you may be able to make changes to that configuration. Editing those values to change the configuration would make sense. This is especially true for everything in Kubernetes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but what you write isn't going to strictly match what's read -- there isn't a 1:1 symmetry between the content because we output pretty-printed JSON. But I don't have strong feelings about keeping size around either so up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.

plugin/types.go Outdated
// entry is only Writable, then only full writes (starting from offset 0) are
// allowed, anything else initiated by the filesystem will result in an error.
//
// It's up to the implementer to decide how much data integrity to guarantee.
Copy link
Contributor

@ekinanp ekinanp Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "data integrity"? Clarifying that would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily around whether writing volumes makes sure an fsync happens so that data is serialized to persistent storage before returning. Not sure how to summarize that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. Sounds like an issue for file-like entries, which could be summed up as plugin.Write(data); data == plugin.Read (ensure that the next Read from the API will fetch the written data). Don't know if that implies "always call fsync" for volumes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing this because I don't think any plugins should explicitly implement something like calling sync after writing. It represents a possible loss of data, but the machine would have to die right after and we'd likely get a connection error if that happened. Things like vim and databases will call Fsync, and we're currently lying to them in some cases that it did something, but for now I'd suggest don't run a database on Wash's filesystem.


var _ = internal.Command(&mockCommand{})

func (suite *ExternalPluginEntryTestSuite) TestWrite() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does Write have so many tests? Seems like it should be symmetric with other InvokeAndWait tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to spend a little time on this tomorrow. I'm not sure the other tests are verifying all the assertions we expect. They also mock at a higher level (the script's InvokeAndWait call) than Write had available originally.

@MikaelSmith MikaelSmith force-pushed the writes branch 2 times, most recently from 5531a82 to adc7bc3 Compare January 3, 2020 01:20
Address review comments to try and clarify code.

Signed-off-by: Michael Smith <[email protected]>
Moves script `InvokeAndWait` behavior to a function on the invocation so
we can setup more complicated invocations before running them. Keeps the
script `InvokeAndWait` as a helper for the most common case.

Signed-off-by: Michael Smith <[email protected]>
Don't set size on console output and metadata files. The plugin system
knows how to get the size just as efficiently, and explicitly setting
the size implies they'll behave like files (which may not be true).

Signed-off-by: Michael Smith <[email protected]>
@MikaelSmith MikaelSmith merged commit 2abcd30 into puppetlabs-toy-chest:master Jan 3, 2020
@MikaelSmith MikaelSmith deleted the writes branch January 3, 2020 19:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux VIM/VSCode doesn't work with files with unknown content size General Write primitive
2 participants