Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented slice finding for all possible combinations #1

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

iamrohitrc
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@venkatesh-sivaraman venkatesh-sivaraman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, looks great! No rush on this but here are a few suggestions for how to clean this up and merge:

  1. I think there might be a small typo - in the last cell, line 11, I got an error until I changed it to max_score_slice = max_score_slice & (discrete_df[features[index]] == combination[index])
  2. The recursion you wrote is correct, but it might create memory problems later on when we run it on data with hundreds of features. Could you look into refactoring this to use Python iterators instead of returning lists? For example, you can use the yield keyword to return items one by one from a function, and the yield from keyword to yield items from another iterator. Then you can simply loop over the results of the function without casting it to a list, and you'll never have to store the entire set of combinations in memory. I see you've used itertools.product, which conveniently also returns an iterator (but you'd have to remove the list() casting to take advantage of it).
  3. On the same vein, it might be good to only save the top K results instead of saving all of them. I have a function that lets you add scored items to a ranked list, which you can add to your implementation:
def add_to_ranking(ranked_list, item, k, score_idx=-1):
    """
    Adds an item to a ranked list with a maximum of k
    elements. Mutates the original list object.
    
    Args:
        ranked_list: A list of tuples containing ranked
            items.
        item: A tuple representing a new item to add, 
            where the score for the ranking is at index 
            score_idx.
        k: The maximum number of items in the ranked list.
        score_idx: The index of the score value in the tuples
            in the existing ranked_list and the new item.
            
    Returns: the updated ranked list.
    """
    added = False
    for i, comp_item in enumerate(ranked_list):
        if item[score_idx] > comp_item[score_idx]:
            ranked_list.insert(i, item)
            added = True
            break
    if not added and len(ranked_list) < k: ranked_list.append(item)
    return ranked_list[:k]
  1. Once you've done all of those things and confirmed that the code still works, could you move the reusable slice finding code to a Python module (named something like recursive.py)? For the code that's at the top level of the script (where you call populate_slices and then sort), you could wrap that in a new function and then write a docstring for it in the same style as I did with the function above. And you can put any code that might be useful across implementations, like the add_to_ranking and Slice definitions, in a utils.py module so we can easily reuse it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants