Add a "make gcs fuse directories in place" function #63

Open
delgadom opened this issue May 8, 2020 · 1 comment

delgadom commented May 8, 2020

import os

from tqdm import tqdm


def add_fuse_directory_markers_to_cloud_storage(client, bucket_name, root_path="", pbar=True):
    """
    Create gcsfuse directory markers from a bucket and root path

    Parameters
    -----------
    client : google.cloud.storage.Client
        See the [google.cloud.storage.Client](https://googleapis.dev/python/storage/latest/client.html) docs for help setting this up.
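    bucket_name : str
        Name of the bucket on Google Cloud Storage.
    root_path : str, optional
        Prefix of the "directories" below which to create the directory markers.
        Defaults to ``""``, which covers the whole bucket.
    pbar : bool, optional
        If True (the default), show a tqdm progress bar while listing blobs.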

    Examples
    ---------

    The following will create directory markers for all directories within gs://my-bucket/path/to/root,
    where directories are indicated by the presence of blobs with directory separators (`'/'`) in the
    path. Empty directories will not be created, since these cannot exist on google cloud storage.

    .. code-block:: python

        >>> client = google.cloud.storage.Client.from_service_account_json('/path/to/cred.json')
        >>> add_fuse_directory_markers_to_cloud_storage(client, 'my-bucket', 'path/to/root/')

    """
    bucket = client.bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=root_path)
    pages = blobs.pages
    if pbar:
        progress_bar = tqdm(pages)
        total_items = 0

    directories = set()

    for page in pages:
        if pbar:
            total_items += page.num_items
            progress_bar.total = total_items
            progress_bar.refresh()

        for blob in page:
            if pbar:
                progress_bar.update()

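            dirname = os.path.dirname(blob.name).rstrip("\\/") + "/"

            # Walk up the tree so that intermediate directories holding no
            # blobs of their own get a marker too. A bare "/" means the
            # bucket root, which needs no marker.
            while dirname != "/" and dirname not in directories:
                dir_blob = bucket.blob(dirname)
                if not dir_blob.exists():
                    dir_blob.upload_from_string(b"")
                directories.add(dirname)

                parent = os.path.dirname(dirname.rstrip("/")).rstrip("\\/") + "/"
                if not parent.startswith(root_path):
                    # Do not create markers above the requested root.
                    break
                dirname = parent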

    if pbar:
        progress_bar.close()
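
The path handling can be checked without touching a bucket. Below is a minimal sketch of just that logic, assuming a hypothetical helper named `directory_markers` (not part of the snippet above) that takes plain blob names instead of a client. It emits a marker for every directory level between `root_path` and each blob, on the assumption that gcsfuse wants a marker object at each level unless implicit directories are enabled.

```python
import posixpath


def directory_markers(blob_names, root_path=""):
    """Return the sorted marker names ("dir/") implied by blob_names.

    Hypothetical helper for illustration only; it performs no I/O.
    """
    markers = set()
    for name in blob_names:
        # GCS object names always use "/" as the separator, so posixpath
        # is used regardless of the local operating system.
        dirname = posixpath.dirname(name)
        # Walk up through every ancestor, stopping at the bucket root
        # (empty dirname) or once we climb above root_path.
        while dirname and (dirname + "/").startswith(root_path):
            markers.add(dirname + "/")
            dirname = posixpath.dirname(dirname)
    return sorted(markers)


names = ["path/to/root/a/b/file.txt", "path/to/root/top.txt"]
print(directory_markers(names, root_path="path/to/root/"))
# → ['path/to/root/', 'path/to/root/a/', 'path/to/root/a/b/']
```

Blobs sitting directly at the bucket root yield no marker, so no object named `/` would ever be uploaded.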

delgadom commented May 8, 2020
