add timeout to requests and optimize directory traversal #151

Open
AliSoua wants to merge 1 commit into base: main


AliSoua commented Jul 21, 2025

Performance Optimization: Early Skipping of Excluded Directories

This update introduces an optimization in the repository traversal logic to improve performance by avoiding unnecessary processing of excluded directories.

Problem

Previously, directories that matched an exclusion pattern (e.g., vendor/, node_modules/, dist/) were still being entered and traversed. Although their files were later excluded, this resulted in:

  • Unnecessary file system access
  • Redundant API calls
  • Increased processing time

This behavior led to performance degradation, especially in repositories with large dependency or build directories.

Performance Before Optimization

The application spent significant time processing excluded directories, even when they contained hundreds or thousands of irrelevant files:

[Image: Old Performance]


Solution

The new implementation introduces a pre-check that evaluates whether a directory matches any exclude pattern before entering it. If a match is found, the traversal skips that directory entirely, eliminating further file reads and API calls within it.
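The pre-check can be illustrated with a minimal sketch. It assumes glob-style exclude patterns and a hypothetical `list_dir` callable that stands in for whatever call lists a directory (in an API-backed crawler, each call would cost one request). The names `is_excluded`, `walk`, and `list_dir` are illustrative and are not taken from this PR's code.

```python
import fnmatch


def is_excluded(path, exclude_patterns):
    """Return True if the path, or any single component of it, matches a pattern."""
    normalized = path.strip("/")
    parts = normalized.split("/")
    for pattern in exclude_patterns:
        pat = pattern.strip("/")  # treat "vendor/" the same as "vendor"
        if fnmatch.fnmatch(normalized, pat):
            return True
        if any(fnmatch.fnmatch(part, pat) for part in parts):
            return True
    return False


def walk(path, list_dir, exclude_patterns):
    """Yield file paths, skipping excluded directories before listing them.

    list_dir(path) is a hypothetical callable returning (name, is_dir) pairs.
    """
    for name, is_dir in list_dir(path):
        child = f"{path}/{name}" if path else name
        if is_excluded(child, exclude_patterns):
            continue  # pre-check: never descend, so no further listing calls
        if is_dir:
            yield from walk(child, list_dir, exclude_patterns)
        else:
            yield child
```

Because the check runs before the recursive call, an excluded directory costs no listing call at all, however many files it contains.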

Performance After Optimization

With early exclusion applied, the processing time and API calls were significantly reduced:

[Image: New Performance]


Summary of Changes

  • Added a timeout to requests.get() calls so that a stalled connection raises an error instead of hanging indefinitely.
  • Optimized recursive traversal logic to skip excluded directories before processing their contents.

This optimization enhances both performance and scalability, particularly when analyzing large repositories.

- Added timeout=(30, 30) to requests.get for improved network robustness
- Avoid unnecessary recursion into excluded directories to reduce API calls

AliSoua commented Jul 21, 2025

Improved Network Robustness with Request Timeouts

The requests.get() call now includes an explicit timeout:

response = requests.get(url, headers=headers, timeout=(30, 30))
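In Requests, a tuple passed as `timeout` is read as (connect timeout, read timeout) in seconds, so `(30, 30)` allows up to 30 seconds to establish the connection and up to 30 seconds between bytes received from the server. The sketch below shows one possible way to handle the resulting exception. The helper name `fetch` and the choice to return `None` on timeout are assumptions for illustration, not part of this PR.

```python
import requests

TIMEOUT = (30, 30)  # (connect timeout, read timeout) in seconds, as in the PR


def fetch(url, headers=None):
    """Return the response, or None if the request timed out.

    `fetch` is a hypothetical helper; the PR only adds the timeout argument.
    """
    try:
        response = requests.get(url, headers=headers, timeout=TIMEOUT)
    except requests.exceptions.Timeout:
        # Timeout is the base class of ConnectTimeout and ReadTimeout.
        return None
    return response
```

A caller could treat `None` as a signal to retry or to skip the affected path, depending on how strict the crawl needs to be.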

zachary62 (Member) commented

I'm on vacation but will check it out this weekend/next week
