Implemented slice finding for all possible combinations #1

iamrohitrc · 2023-01-30T00:03:28Z

No description provided.

venkatesh-sivaraman

As mentioned before, looks great! No rush on this but here are a few suggestions for how to clean this up and merge:

I think there might be a small typo - in the last cell, line 11, I got an error until I changed it to max_score_slice = max_score_slice & (discrete_df[features[index]] == combination[index])
The recursion you wrote is correct, but it might create memory problems later on when we run it on data with hundreds of features. Could you look into refactoring this to use Python iterators instead of returning lists? For example, you can use the yield keyword to return items one by one from a function, and the yield from keyword to yield items from another iterator. Then you can simply loop over the results of the function without casting it to a list, and you'll never have to store the entire set of combinations in memory. I see you've used itertools.product, which conveniently also returns an iterator (but you'd have to remove the list() casting to take advantage of it).
On the same vein, it might be good to only save the top K results instead of saving all of them. I have a function that lets you add scored items to a ranked list, which you can add to your implementation:

def add_to_ranking(ranked_list, item, k, score_idx=-1):
    """
    Adds an item to a ranked list with a maximum of k
    elements. Mutates the original list object.
    
    Args:
        ranked_list: A list of tuples containing ranked
            items.
        item: A tuple representing a new item to add, 
            where the score for the ranking is at index 
            score_idx.
        k: The maximum number of items in the ranked list.
        score_idx: The index of the score value in the tuples
            in the existing ranked_list and the new item.
            
    Returns: the updated ranked list.
    """
    added = False
    for i, comp_item in enumerate(ranked_list):
        if item[score_idx] > comp_item[score_idx]:
            ranked_list.insert(i, item)
            added = True
            break
    if not added and len(ranked_list) < k: ranked_list.append(item)
    return ranked_list[:k]

Once you've done all of those things and confirmed that the code still works, could you move the reusable slice finding code to a Python module (named something like recursive.py)? For the code that's at the top level of the script (where you call populate_slices and then sort), you could wrap that in a new function and then write a docstring for it in the same style as I did with the function above. And you can put any code that might be useful across implementations, like the add_to_ranking and Slice definitions, in a utils.py module so we can easily reuse it later.

Implemented slice finding for all possible combinations

a00c2ba

venkatesh-sivaraman requested changes Jan 31, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented slice finding for all possible combinations #1

Implemented slice finding for all possible combinations #1

iamrohitrc commented Jan 30, 2023

venkatesh-sivaraman left a comment •

edited

Loading

Implemented slice finding for all possible combinations #1

Are you sure you want to change the base?

Implemented slice finding for all possible combinations #1

Conversation

iamrohitrc commented Jan 30, 2023

venkatesh-sivaraman left a comment • edited Loading

Choose a reason for hiding this comment

venkatesh-sivaraman left a comment •

edited

Loading