Fixed Assessment Exporter Notebook #3829

Merged
merged 5 commits into from
Mar 26, 2025

Conversation

jgarciaf106
Contributor

Changes

Adjusted the path of the Lakeview "Assessment Main" dashboard in the EXPORT_ASSESSMENT_TO_EXCEL notebook to the new naming format. The notebook now looks up the dashboard name dynamically instead of relying on a hardcoded value.
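
The PR does not spell out the new naming format or how the lookup works. Here is a minimal sketch of a dynamic lookup, assuming the dashboards are available as `.lvdash.json` files in one folder and that the target file name contains the words "Assessment" and "Main". The helper `find_dashboard_path` and the example file names are hypothetical, not the notebook's actual code.

```python
import tempfile
from pathlib import Path


def find_dashboard_path(folder: Path, *keywords: str) -> Path:
    """Return the one `.lvdash.json` file in `folder` whose name contains every keyword.

    Hypothetical helper: the notebook's real naming format and lookup may differ.
    """
    matches = [
        path
        for path in sorted(folder.glob("*.lvdash.json"))  # Lakeview dashboard files
        if all(keyword.lower() in path.name.lower() for keyword in keywords)
    ]
    if not matches:
        raise FileNotFoundError(f"No dashboard matching {keywords} in {folder}")
    if len(matches) > 1:
        names = [path.name for path in matches]
        raise ValueError(f"Ambiguous dashboard lookup {keywords}: {names}")
    return matches[0]


# Usage with made-up file names in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("Assessment (Main)", "Assessment (Summary)"):
        (folder / f"{name}.lvdash.json").write_text("{}")
    found = find_dashboard_path(folder, "assessment", "main")
    print(found.name)
```

Failing loudly on zero or several matches keeps a renamed or duplicated dashboard from silently exporting the wrong data. Returning a `Path` also matches the PR's switch from string paths to `Path` objects.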

Tests

  • manually tested
Manual.Test.mov

jgarciaf106 requested a review from a team as a code owner March 10, 2025 18:24
Contributor

FastLee left a comment

A few nits.

jgarciaf106 requested a review from FastLee March 17, 2025 22:42
Contributor

FastLee left a comment

LGTM

FastLee self-requested a review March 24, 2025 18:00
Contributor

FastLee left a comment

LGTM

auto-merge was automatically disabled March 24, 2025 20:52

Head branch was pushed to by a user without write access

FastLee enabled auto-merge March 26, 2025 14:05
This was referenced Mar 26, 2025
FastLee added this pull request to the merge queue Mar 26, 2025
Merged via the queue into databrickslabs:main with commit 6dfb650 Mar 26, 2025
14 of 15 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2025
replaces #3829

---------

Co-authored-by: Andres Garcia <andres.garcia+data@databricks.com>
gueniai added a commit that referenced this pull request Apr 16, 2025
* Added ability to create account groups from nested ws-local groups ([#3818](#3818)). The new `create_account_level_groups` method creates account-level groups from workspace groups. It retrieves the valid workspace groups and recursively creates an account-level group for each one; for nested groups, it checks whether the group already exists and creates it only if necessary. The `AccountGroupCreationContext` dataclass keeps track of created, preexisting, and renamed groups. The `ComplexValue` class now includes a `ref` field that references user objects, so account groups can be created with members identified by their workspace-local user IDs. A new test, `test_create_account_level_groups_nested_groups` in `test_account.py`, checks that the account-level groups are created with the same members and membership as the corresponding workspace-local groups. Integration tests have also been added.
* Added error handling and tests for Workflow linter during pipeline fetch ([#3819](#3819)). The `_register_pipeline_task` method in `jobs.py` now handles pipelines that do not exist by yielding a `DependencyProblem` with an appropriate error message. A new private method, `_register_pipeline_library`, registers the libraries present in the pipeline. New unit and integration tests check that the Workflow linter handles missing pipelines correctly, and the feature was also tested manually.
* Added hyperlinks to tables and order the rows by type, name ([#3951](#3951)). The table names in the `Table Types` widget are now clickable hyperlinks that open a specified URL, with the table name as the link text and title. The rows are now ordered by type and then by name, making it easier to find a given table. A new set of encodings for the widget specifies how each field is displayed, including a `link` display type for the `name` field. This change addresses issue [#3259](#3259). It was tested manually and a screenshot is included in the commit; no unit or integration tests were added.
* Added links to compute summary widget ([#3952](#3952)). The `encodings` in the `spec` object of the widget's SQL file now include overrides that turn the `cluster_id` and `cluster_name` fields into links, which open the respective cluster's details in a new tab. The `finding` and `creator` fields are now displayed as strings. This gives direct access to cluster details from the compute summary widget and resolves issue [#3260](#3260). The change was tested manually.
* Adds option to install UCX in offline mode ([#3959](#3959)). UCX can now be installed in environments with restricted Internet access: install UCX on a host with Internet access, zip the installation, transfer the zip file to the target host, and unzip it there. The Databricks CLI must be v0.244.0 or higher. The documentation has been updated to describe the offline installation process. This addresses issue [#3418](#3418).
* Fixed Assessment Excel Exporter ([#3962](#3962)).
* Fixed Assessment Exporter Notebook ([#3829](#3829)). This change updates the Assessment Exporter Notebook to improve maintainability and robustness. The path of the Lakeview "Assessment Main" dashboard now follows the new naming format and is determined dynamically instead of being hardcoded; it is also built as a `Path` object rather than a string. A new method, `_process_id_columns`, checks the dataset for any column with `id` in its name and wraps those columns in quotes. The changes were tested manually; they keep the dashboard path correct and make the data in the exported Excel file more accurate.
* TECH DEBT Use right workspace api call for listing credentials ([#3957](#3957)). The `list` method in `databricks/labs/ucx/aws/credentials.py` now lists AWS credentials through the `list_credentials` method of the `_ws.credentials` object, which is part of the Databricks workspace API, instead of calling the `api_client` directly. This replaces the previous TODO comment with working code and addresses issue [#3571](#3571).
* [TECHDEBT] Remove unused code for _resolve_dbfs_root in MountCrawler ([#3958](#3958)). The unused `_resolve_dbfs_root` method and its dependencies have been removed from the `MountCrawler` class. The method resolved the DBFS root location but had been deprecated in favor of a new API call. Removing it simplifies the codebase without changing the system's functionality or behavior. This change also fixes issue [#3452](#3452).
* [Tech Debt] removing notfound if not required in test_install.py ([#3826](#3826)). The redundant `notfound` function has been removed from `test_install.py`, specifically from `test_create_database`, `test_open_config`, and `test_save_config_ext_hms`. The function raised a `NotFound` error; the tests now use a more specific error message or behavior instead. This reduces technical debt and addresses issue [#2700](#2700). No new unit tests were added; the existing tests were updated to account for the removal of `notfound`.
* [Tech Debt] standardising the error message for required parameter in cli command ([#3827](#3827)). The `databricks labs ucx` CLI commands now return clear, consistent error messages when a required parameter is missing, instead of raising a `KeyError`. Specifically, `repair_run` handles a missing `--step` parameter, and `move` and `alias` handle missing `--from_catalog`, `--to_catalog`, `--from_schema`, `--to_schema`, and `--from_table` parameters. New unit tests check that the proper error messages are displayed, addressing issue [#2740](#2740).
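
The nested-group entry (#3818) describes creating account-level groups recursively while reusing groups that already exist. Below is a minimal sketch of that traversal, assuming groups are plain in-memory records. `WorkspaceGroup`, `CreationContext`, and `create_account_groups_recursively` are hypothetical stand-ins, not UCX's API, and the account-level API call is replaced by recording member refs.

```python
from dataclasses import dataclass, field


@dataclass
class WorkspaceGroup:
    """Hypothetical stand-in for a workspace-local group."""
    name: str
    users: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)  # names of nested workspace groups


@dataclass
class CreationContext:
    """Bookkeeping loosely modeled on AccountGroupCreationContext (renamed groups omitted)."""
    created: dict[str, list[str]] = field(default_factory=dict)  # group name -> member refs
    preexisting: set[str] = field(default_factory=set)


def _create_one(
    name: str,
    workspace_groups: dict[str, WorkspaceGroup],
    existing: set[str],
    ctx: CreationContext,
    visiting: set[str],
) -> None:
    if name in ctx.created or name in ctx.preexisting:
        return  # already handled via another parent group
    if name in existing:
        ctx.preexisting.add(name)  # reuse the account group instead of recreating it
        return
    if name in visiting:
        raise ValueError(f"Cyclic group nesting involving {name!r}")
    visiting.add(name)
    group = workspace_groups[name]
    for child in group.groups:
        # Create nested groups first so they can be referenced as members.
        _create_one(child, workspace_groups, existing, ctx, visiting)
    visiting.discard(name)
    # Stand-in for the account-level API call: record members as refs.
    ctx.created[name] = [f"user:{user}" for user in group.users] + [
        f"group:{child}" for child in group.groups
    ]


def create_account_groups_recursively(
    workspace_groups: dict[str, WorkspaceGroup], existing: set[str]
) -> CreationContext:
    ctx = CreationContext()
    for name in workspace_groups:
        _create_one(name, workspace_groups, existing, ctx, set())
    return ctx


groups = {
    "data-eng": WorkspaceGroup("data-eng", users=["alice"], groups=["analysts"]),
    "analysts": WorkspaceGroup("analysts", users=["bob"]),
    "admins": WorkspaceGroup("admins", users=["carol"]),
}
ctx = create_account_groups_recursively(groups, existing={"admins"})
print(list(ctx.created))  # ['analysts', 'data-eng']
print(ctx.preexisting)  # {'admins'}
```

Creating children before parents means a nested group always exists by the time it is added as a member. The `visiting` set turns an accidental nesting cycle into an explicit error instead of infinite recursion.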
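
The Workflow linter entry (#3819) reports a missing pipeline as a `DependencyProblem` instead of failing, with library registration split into its own helper. Below is a minimal sketch of that control flow. `NotFound` and `DependencyProblem` are local stand-ins whose fields may differ from the real classes, and the problem code `pipeline-not-found` and the in-memory pipeline store are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass


class NotFound(Exception):
    """Stand-in for the SDK's "resource not found" error."""


@dataclass(frozen=True)
class DependencyProblem:
    """Stand-in for UCX's problem record; the real class may differ."""
    code: str
    message: str


def register_pipeline_task(
    pipeline_id: str,
    fetch_pipeline: Callable[[str], dict],
    register_library: Callable[[dict], Iterable[DependencyProblem]],
) -> Iterator[DependencyProblem]:
    try:
        pipeline = fetch_pipeline(pipeline_id)
    except NotFound:
        # Report the missing pipeline as a problem and stop linting this task.
        yield DependencyProblem("pipeline-not-found", f"Could not find pipeline: {pipeline_id}")
        return
    for library in pipeline.get("libraries", []):
        # Library registration lives in its own helper, like _register_pipeline_library.
        yield from register_library(library)


# Usage with a hypothetical in-memory "workspace" holding one pipeline.
PIPELINES = {"p1": {"libraries": [{"notebook": {"path": "/Workspace/etl"}}]}}


def fetch(pipeline_id: str) -> dict:
    if pipeline_id not in PIPELINES:
        raise NotFound(pipeline_id)
    return PIPELINES[pipeline_id]


def no_problems(library: dict) -> Iterable[DependencyProblem]:
    return []


missing = list(register_pipeline_task("p2", fetch, no_problems))
present = list(register_pipeline_task("p1", fetch, no_problems))
print(missing)
print(present)  # []
```

Because the function is a generator, one missing pipeline becomes one reported problem and the rest of the workflow can still be linted.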
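
The offline-install entry (#3959) describes zipping an installation on a connected host, transferring the archive, and unzipping it on the target. The sketch below illustrates only the zip and unzip steps, using the standard library's `shutil`. The folder names are hypothetical, and the entry does not say which directory holds the installation. The Databricks CLI (v0.244.0 or higher) is still what performs the real install.

```python
import shutil
import tempfile
from pathlib import Path


def pack_installation(install_dir: Path, archive_base: Path) -> Path:
    """Zip an installation folder on the host with Internet access."""
    # make_archive appends ".zip" to archive_base and returns the archive's path.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=install_dir))


def unpack_installation(archive: Path, target_dir: Path) -> None:
    """Unzip the transferred archive on the offline host."""
    shutil.unpack_archive(str(archive), str(target_dir))


# Simulate both hosts with temporary folders; "ucx" is a hypothetical install location.
with tempfile.TemporaryDirectory() as online, tempfile.TemporaryDirectory() as offline:
    install = Path(online) / "ucx"
    (install / "lib").mkdir(parents=True)
    (install / "lib" / "version.txt").write_text("demo")
    archive = pack_installation(install, Path(online) / "ucx-offline")
    # Transfer step: in practice, copy the archive over whatever channel the offline host allows.
    copied = Path(offline) / archive.name
    shutil.copy(archive, copied)
    unpack_installation(copied, Path(offline) / "ucx")
    restored = (Path(offline) / "ucx" / "lib" / "version.txt").read_text()
```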
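
The Assessment Exporter Notebook entry (#3829) says `_process_id_columns` finds columns with `id` in their name and wraps them in quotes. Below is a minimal sketch over a list of row dictionaries. It assumes the values of those columns are what gets quoted, presumably so that long numeric IDs stay text in the spreadsheet. `process_id_columns` is a hypothetical stand-in, and the real method's input type may differ.

```python
from typing import Any


def process_id_columns(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of `rows` where values of ID-like columns are wrapped in double quotes.

    Hypothetical stand-in for `_process_id_columns`.
    """
    processed = []
    for row in rows:
        processed.append(
            {
                # Substring match on the column name, as described in the release note.
                column: (f'"{value}"' if "id" in column.lower() and value is not None else value)
                for column, value in row.items()
            }
        )
    return processed


rows = [{"cluster_id": 1234567890123456789, "cluster_name": "etl", "success": 1}]
quoted = process_id_columns(rows)
print(quoted)
```

A plain substring test also matches names such as `valid` or `width`. A stricter rule, for example matching only an `_id` suffix, would avoid quoting such columns.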
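
The CLI entry (#3827) replaces a bare `KeyError` with a consistent message when required flags are missing. Below is a minimal sketch of such a check, assuming the parameters arrive as a dictionary of flag names to strings. The helper `missing_parameters_message`, the simplified `repair_run`, and the exact message wording are hypothetical, not UCX's actual code.

```python
from typing import Optional


def missing_parameters_message(params: dict[str, str], required: list[str]) -> Optional[str]:
    """Return one consistently worded error for absent or empty flags, or None if all are set."""
    missing = [name for name in required if not params.get(name)]
    if not missing:
        return None
    return "Missing required parameter(s): " + ", ".join(f"--{name}" for name in missing)


def repair_run(params: dict[str, str]) -> str:
    # A direct params["step"] lookup would raise a bare KeyError when --step is omitted.
    error = missing_parameters_message(params, ["step"])
    if error:
        return error
    return f"Repairing run for step {params['step']}"


MOVE_REQUIRED = ["from_catalog", "to_catalog", "from_schema", "to_schema", "from_table"]

print(repair_run({}))  # Missing required parameter(s): --step
print(repair_run({"step": "assessment"}))  # Repairing run for step assessment
print(missing_parameters_message({"from_catalog": "a", "to_catalog": "b"}, MOVE_REQUIRED))
# Missing required parameter(s): --from_schema, --to_schema, --from_table
```

Listing every missing flag at once saves the user from fixing one flag, rerunning, and hitting the next error.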
gueniai mentioned this pull request Apr 16, 2025
gueniai added a commit that referenced this pull request Apr 16, 2025
Labels
None yet
Projects
None yet

2 participants