Support tuples as sample IDs in Bootstrapping #362
base: master
Changes from 2 commits
34d3aad
6f74634
f549140
96e9795
@@ -887,6 +887,23 @@ def reset(self) -> None:
         self._metric.reset()
         return super().reset()

+    @staticmethod
+    def _convert_tuples(ids: List[Tuple[str, int]]) -> np.ndarray:
+
+        sample_tuple = ids[0]
+        dtype_tuple = []
+
+        for i, tuple_elem in enumerate(sample_tuple):
+            if isinstance(tuple_elem, str):
+                max_len = max(len(str(el[i])) for el in ids)
+                dtype_tuple.append((f"field{i}", f"U{max_len}"))
+            else:
+                dtype_tuple.append((f"field{i}", type(tuple_elem)))
+
+        ids = np.array(ids, dtype=dtype_tuple)
+        ids = [tuple(x) for x in ids]
+        return ids
Are you returning a list of tuples or a NumPy array here?

When applying only line 903, the resulting NumPy array contains the tuples, but its elements are not hashable, and I was getting an error in the line : The fix I found was to convert the elements back to tuples, which is supported in a list but not in an array. If I then try to convert the list to np.array again, the tuples are broken into separate elements and we end up with a 2D array.
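The behaviour described in this reply can be reproduced with a small sketch. The sample IDs below are made up for illustration, and whether hashing a record fails (and with which message) may depend on the NumPy version, so the sketch only records the outcome rather than assuming it.

```python
import numpy as np

# Hypothetical sample IDs, each a (str, int) tuple.
ids = [("patient_a", 1), ("patient_b", 2)]

# Without an explicit dtype, NumPy splits the tuples into columns: a 2D array.
plain = np.array(ids)
plain_shape = plain.shape

# With a structured dtype the array stays 1D, one record per ID.
structured = np.array(ids, dtype=[("field0", "U9"), ("field1", int)])
structured_shape = structured.shape

# Records taken from a structured array are NumPy void scalars, not tuples.
record = structured[0]
try:
    hash(record)
    record_is_hashable = True
except TypeError:
    record_is_hashable = False

# Converting each record to a tuple gives plain, hashable Python tuples again,
# at the cost of leaving the ndarray world (the result is a list).
as_tuples = [tuple(x) for x in structured]
lookup = {sample_id: position for position, sample_id in enumerate(as_tuples)}
```

This is why `_convert_tuples` ends with a list comprehension: the result can be used as dictionary keys, but it no longer supports boolean-mask indexing.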
+
     def eval(
         self, results: Dict[str, Any] = None, ids: Optional[Sequence[Hashable]] = None
     ) -> Dict[str, Any]:
@@ -902,7 +919,11 @@ def eval(
             raise Exception(
                 "Error: confidence interval is supported only when a unique identifier is specified. Add key 'id' to your data"
             )
-        ids = np.array(ids)
+
+        if isinstance(ids[0], tuple):
+            ids = self._convert_tuples(ids)
+        else:
+            ids = np.array(ids)

         rnd = np.random.RandomState(self._rnd_seed)
         original_sample_results = self._metric.eval(results, ids=ids)
@@ -920,7 +941,11 @@ def eval(
                 stratum_filter = stratum_id == stratum
                 n_stratum = sum(stratum_filter)
                 random_sample = rnd.randint(0, n_stratum, size=n_stratum)
-                sampled_ids[stratum_filter] = ids[stratum_filter][random_sample]
+
+                flt_indx = np.where(stratum_filter)[0]
Did you need to move to a "for loop implementation" even though you converted ids to a NumPy array?
+                for i, idx in enumerate(random_sample):
+                    sampled_ids[flt_indx[i]] = ids[flt_indx[idx]]
+
             boot_results.append(self._metric.eval(results, sampled_ids))

         # results can be either a list of floats or a list of dictionaries
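A minimal sketch of what the new loop in the last hunk does, assuming `ids` is the list of tuples returned by `_convert_tuples` (a plain list does not accept a boolean mask as an index, which is presumably why the one-line masked assignment was replaced). The IDs, the mask, and the seed are invented for the example, and `sampled_ids` is assumed to start as a copy of `ids`.

```python
import numpy as np

# Hypothetical tuple IDs and a mask selecting one stratum.
ids = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
stratum_filter = np.array([True, False, True, True])

rnd = np.random.RandomState(0)
n_stratum = int(stratum_filter.sum())
# Draw positions within the stratum, with replacement.
random_sample = rnd.randint(0, n_stratum, size=n_stratum)

# Absolute positions of the stratum members in the full ID list.
flt_indx = np.where(stratum_filter)[0]

sampled_ids = list(ids)  # assumption: start from a copy of the original IDs
for i, idx in enumerate(random_sample):
    # The i-th slot of the stratum receives the ID drawn for it.
    sampled_ids[flt_indx[i]] = ids[flt_indx[idx]]

# The same selection written with index arithmetic only, for comparison.
drawn_positions = flt_indx[random_sample]
expected = [ids[j] for j in drawn_positions]
```

The loop fills the stratum slots with exactly the IDs at `flt_indx[random_sample]`, and positions outside the stratum are left untouched.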
Are you assuming a tuple of str and int?
Can we do something more general?

Thanks. The conversion method supports tuples of any element types and any length; I need to change the interface definition.
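As one possible answer to the "more general" question, and not what this PR implements, the IDs could be stored in a 1D NumPy array of dtype `object`. The helper name `to_object_array` is hypothetical. Under that assumption the elements stay ordinary Python tuples of any length and element type, and boolean-mask and integer-array indexing keep working.

```python
from typing import Hashable, Sequence

import numpy as np


def to_object_array(ids: Sequence[Hashable]) -> np.ndarray:
    """Hypothetical helper: store arbitrary hashable IDs in a 1D object array."""
    arr = np.empty(len(ids), dtype=object)
    # Assign element by element so NumPy never tries to unpack the tuples
    # into extra array dimensions.
    for i, sample_id in enumerate(ids):
        arr[i] = sample_id
    return arr


# Made-up IDs mixing str, int, and float fields.
ids = [("a", 1, 0.5), ("b", 2, 1.5), ("c", 3, 2.5)]
arr = to_object_array(ids)

# Masked selection followed by resampling, in the style of the original one-liner.
mask = np.array([True, False, True])
picked = arr[mask][np.array([1, 0, 1])]
```

The trade-off is that object arrays give up NumPy's typed storage, which should matter little for an array that only holds identifiers.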