
Add metrics scraping for additional services #1424


Open: creatorrr wants to merge 3 commits into dev

@creatorrr (Contributor) commented May 20, 2025

User description

Summary

  • include Temporal, LiteLLM and Traefik scrape jobs in Prometheus config
  • document new scrape targets in monitoring README

Testing

  • ruff format
  • ruff check

PR Type

enhancement, documentation


Description

  • Add Prometheus scrape jobs for Temporal, LiteLLM, and Traefik services.

  • Update Prometheus config to separate and clarify scrape targets.

  • Document new scrape targets in monitoring README.


Changes walkthrough 📝

Relevant files

Enhancement
  prometheus.yml (monitoring/prometheus/config/prometheus.yml, +43/-9)
  Add and organize Prometheus scrape jobs for new services
  • Added separate scrape jobs for Temporal, LiteLLM, and Traefik.
  • Removed Temporal from agents-api scrape targets.
  • Clarified and organized scrape_configs for better maintainability.
  • Included both standard and managed variants for new services.

Documentation
  README.md (monitoring/README.md, +10/-0)
  Document new Prometheus scrape targets in README
  • Documented new Prometheus scrape targets: Temporal, LiteLLM, Traefik.
  • Explained ports and purpose for each new target.
  • Clarified that dashboards will include new metrics automatically.

    Important

    Add Prometheus scrape jobs for Temporal, LiteLLM, and Traefik, and update documentation.

    • Prometheus Configuration:
      • Add temporal scrape job in prometheus.yml for temporal:15000 and temporal-managed:15000.
      • Add litellm scrape job in prometheus.yml for litellm:4000 and litellm-managed:4000.
      • Add traefik scrape job in prometheus.yml for gateway:8082.
    • Documentation:
      • Update README.md to include new scrape targets: Temporal, LiteLLM, and Traefik.

    This description was created by Ellipsis for 2b77843.

    @creatorrr marked this pull request as ready for review May 20, 2025 19:06

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Service Availability

    The PR adds new scrape targets for both standard and managed variants of services. Verify that all these services actually exist in the environment and are accessible at the specified ports.

        - targets: ['temporal:15000', 'temporal-managed:15000']
    
    # AIDEV-NOTE: LiteLLM metrics endpoint
    - job_name: litellm
      honor_timestamps: true
      scrape_interval: 5s
      scrape_timeout: 3s
      metrics_path: /metrics
      scheme: http
      follow_redirects: true
      static_configs:
        - targets: ['litellm:4000', 'litellm-managed:4000']
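One way to act on this focus area is to request each target's metrics endpoint from somewhere on the same network as Prometheus. The sketch below assumes the targets listed in the PR summary and the `/metrics` path and `http` scheme shown in the excerpt; `SCRAPE_TARGETS`, `metrics_url`, `probe`, and `check_all` are hypothetical helpers, not part of the repository. The default timeout mirrors the 3s `scrape_timeout` in the excerpt.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical table of targets, copied from the scrape jobs described in this PR.
SCRAPE_TARGETS = {
    "temporal": ["temporal:15000", "temporal-managed:15000"],
    "litellm": ["litellm:4000", "litellm-managed:4000"],
    "traefik": ["gateway:8082"],
}


def metrics_url(target: str, metrics_path: str = "/metrics", scheme: str = "http") -> str:
    """Build the URL Prometheus would scrape for a host:port target."""
    return f"{scheme}://{target}{metrics_path}"


def probe(target: str, timeout: float = 3.0) -> bool:
    """Return True if the target answers its metrics endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(metrics_url(target), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError, DNS failures, refused connections and timeouts are
        # all OSError subclasses; treat any of them as "not scrapeable".
        return False


def check_all(targets=SCRAPE_TARGETS, timeout: float = 3.0) -> dict:
    """Map each (job, target) pair to its reachability."""
    return {
        (job, target): probe(target, timeout)
        for job, hosts in targets.items()
        for target in hosts
    }
```

Running `check_all()` from a container on the monitoring network and looking for `False` entries would show which hosts (for instance the `-managed` variants) do not exist in a given environment. Once the config is loaded, Prometheus's own Targets page reports the same information per job.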


    CI Feedback 🧐

    A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

    Action: Typecheck

    Failed stage: Generate openapi code [❌]

    Failure summary:

    The action failed due to a dependency conflict in the npm packages. Specifically:

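  • There's a conflict between different versions of @typespec/compiler:
    - The project requires @typespec/compiler@0.61.x.
    - But typespec-http-new@1.0.1 (installed as an alias of npm:@typespec/http@^1.0.1) requires @typespec/compiler@^1.0.0.

  • npm couldn't resolve this conflict (npm error code ERESOLVE, lines 214-242), so the step exited with code 1.
  • The error suggests fixing the upstream dependency conflict, or retrying with the --force or --legacy-peer-deps flags, which the log warns would accept an incorrect (and potentially broken) dependency resolution.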

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    150:  prune-cache: true
    151:  ignore-nothing-to-cache: false
    152:  ##[endgroup]
    153:  Downloading uv from "https://github.com/astral-sh/uv/releases/download/0.7.6/uv-x86_64-unknown-linux-gnu.tar.gz" ...
    154:  [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/work/_temp/69e367f4-90bc-48d3-a96b-41a5d77ac065 -f /home/runner/work/_temp/0343e3e3-0156-402e-a48d-4ff68baedb8b
    155:  Added /opt/hostedtoolcache/uv/0.7.6/x86_64 to the path
    156:  Added /home/runner/.local/bin to the path
    157:  Set UV_CACHE_DIR to /home/runner/work/_temp/setup-uv-cache
    158:  Successfully installed uv version 0.7.6
    159:  Searching files using cache dependency glob: **/uv.lock
    160:  /home/runner/work/julep/julep/agents-api/uv.lock
    161:  /home/runner/work/julep/julep/cli/uv.lock
    162:  /home/runner/work/julep/julep/integrations-service/uv.lock
    163:  Found 3 files to hash.
    164:  Trying to restore uv cache from GitHub Actions cache with key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.6-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
    165:  ##[warning]Failed to restore: Cache service responded with 422
    166:  No GitHub Actions cache found for key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.6-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
    ...
    
    199:  npm warn   @typespec/compiler@"0.61.x" from the root project
    200:  npm warn   7 more (@typespec/events, @typespec/http, @typespec/openapi, ...)
    201:  npm warn
    202:  npm warn Could not resolve dependency:
    203:  npm warn peer @typespec/compiler@"^1.0.0" from @typespec/asset-emitter@0.70.1
    204:  npm warn node_modules/@typespec/asset-emitter
    205:  npm warn   @typespec/asset-emitter@"^0.70.0" from typespec-openapi3-new@1.0.0
    206:  npm warn   node_modules/typespec-openapi3-new
    207:  npm warn
    208:  npm warn Conflicting peer dependency: @typespec/compiler@1.0.0
    209:  npm warn node_modules/@typespec/compiler
    210:  npm warn   peer @typespec/compiler@"^1.0.0" from @typespec/asset-emitter@0.70.1
    211:  npm warn   node_modules/@typespec/asset-emitter
    212:  npm warn     @typespec/asset-emitter@"^0.70.0" from typespec-openapi3-new@1.0.0
    213:  npm warn     node_modules/typespec-openapi3-new
    214:  npm error code ERESOLVE
    215:  npm error ERESOLVE could not resolve
    216:  npm error
    217:  npm error While resolving: @typespec/http@1.0.1
    218:  npm error Found: @typespec/compiler@0.61.2
    219:  npm error node_modules/@typespec/compiler
    220:  npm error   @typespec/compiler@"0.61.x" from the root project
    221:  npm error   peer @typespec/compiler@"~0.61.0" from @typespec/events@0.61.0
    222:  npm error   node_modules/@typespec/events
    223:  npm error     @typespec/events@"0.61.x" from the root project
    224:  npm error     peer @typespec/events@"~0.61.0" from @typespec/sse@0.61.0
    225:  npm error     node_modules/@typespec/sse
    226:  npm error       @typespec/sse@"0.61.x" from the root project
    227:  npm error   6 more (@typespec/http, @typespec/openapi, @typespec/openapi3, ...)
    228:  npm error
    229:  npm error Could not resolve dependency:
    230:  npm error peer @typespec/compiler@"^1.0.0" from typespec-http-new@1.0.1
    231:  npm error node_modules/typespec-http-new
    232:  npm error   typespec-http-new@"npm:@typespec/http@^1.0.1" from the root project
    233:  npm error
    234:  npm error Conflicting peer dependency: @typespec/compiler@1.0.0
    235:  npm error node_modules/@typespec/compiler
    236:  npm error   peer @typespec/compiler@"^1.0.0" from typespec-http-new@1.0.1
    237:  npm error   node_modules/typespec-http-new
    238:  npm error     typespec-http-new@"npm:@typespec/http@^1.0.1" from the root project
    239:  npm error
    240:  npm error Fix the upstream dependency conflict, or retry
    241:  npm error this command with --force or --legacy-peer-deps
    242:  npm error to accept an incorrect (and potentially broken) dependency resolution.
    243:  npm error
    244:  npm error
    245:  npm error For a full report see:
    246:  npm error /home/runner/.npm/_logs/2025-05-20T19_06_38_933Z-eresolve-report.txt
    247:  npm error A complete log of this run can be found in: /home/runner/.npm/_logs/2025-05-20T19_06_38_933Z-debug-0.log
    248:  ##[error]Process completed with exit code 1.
    249:  Post job cleanup.
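The conflict can be confirmed by comparing the ranges quoted in the log. The sketch below models only a small subset of npm's range syntax (caret, tilde, and `MAJOR.MINOR.x` ranges over plain `MAJOR.MINOR.PATCH` versions, no prereleases); it is an illustration, not the `semver` package npm actually uses, and the helper names are hypothetical. It shows that the root's `0.61.x` and the `~0.61.0` peer range of `@typespec/events` allow only `0.61.*`, while the `^1.0.0` peer range of `typespec-http-new` allows only `1.*`, so no single `@typespec/compiler` version can satisfy all three.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def bounds(spec: str) -> Tuple[Version, Version]:
    """Return the half-open interval [lower, upper) that a range spec allows."""
    if spec.startswith("^"):
        # Caret: keep the left-most non-zero component fixed.
        lower = parse(spec[1:])
        major, minor, patch = lower
        if major > 0:
            return lower, (major + 1, 0, 0)
        if minor > 0:
            return lower, (0, minor + 1, 0)
        return lower, (0, 0, patch + 1)
    if spec.startswith("~"):
        # Tilde with a full version: patch-level changes only.
        lower = parse(spec[1:])
        return lower, (lower[0], lower[1] + 1, 0)
    if spec.endswith(".x"):
        # X-range "MAJOR.MINOR.x": any patch of that minor.
        major, minor = (int(part) for part in spec[:-2].split("."))
        return (major, minor, 0), (major, minor + 1, 0)
    exact = parse(spec)
    return exact, (exact[0], exact[1], exact[2] + 1)


def satisfies(version: str, spec: str) -> bool:
    lower, upper = bounds(spec)
    return lower <= parse(version) < upper


def ranges_overlap(*specs: str) -> bool:
    """True if at least one version could satisfy every spec at once."""
    intervals = [bounds(spec) for spec in specs]
    return max(lo for lo, _ in intervals) < min(hi for _, hi in intervals)


# Ranges for @typespec/compiler quoted in the ERESOLVE output above.
ROOT_RANGE = "0.61.x"      # root project
EVENTS_PEER = "~0.61.0"    # peer of @typespec/events@0.61.0
HTTP_NEW_PEER = "^1.0.0"   # peer of typespec-http-new@1.0.1

print(satisfies("0.61.2", ROOT_RANGE))                         # True
print(satisfies("0.61.2", HTTP_NEW_PEER))                      # False
print(ranges_overlap(ROOT_RANGE, EVENTS_PEER, HTTP_NEW_PEER))  # False
```

Because the ranges are disjoint, `--legacy-peer-deps` would only hide the problem; presumably the durable fix is to keep every `@typespec/*` package, including the `-new` aliases, on the same compiler line.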
    

    qodo-merge-for-open-source bot commented May 20, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Suggestion: Fix service name reference

    The Traefik metrics endpoint is typically exposed at /metrics, but the service
    name should be traefik instead of gateway to match Traefik's default container
    name in most deployments. This mismatch could prevent metrics collection.

    monitoring/prometheus/config/prometheus.yml [48-57]

     # AIDEV-NOTE: Traefik gateway metrics endpoint
     - job_name: traefik
       honor_timestamps: true
       scrape_interval: 5s
       scrape_timeout: 3s
       metrics_path: /metrics
       scheme: http
       follow_redirects: true
       static_configs:
    -    - targets: ['gateway:8082']
    +    - targets: ['traefik:8082']
    Suggestion importance[1-10]: 7


    Why: The suggestion correctly identifies a likely mismatch between the service name 'gateway' and the expected 'traefik' for the Traefik metrics endpoint, which could prevent Prometheus from scraping metrics. This is a moderate-impact fix that ensures correct metrics collection but does not address a critical bug or security issue.

    Impact: Medium
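Whether `gateway` or `traefik` is the right host depends on how the Traefik service is actually named in this deployment: on a Docker Compose network, containers reach each other by service name, so the target must use whichever name resolves from the Prometheus side. A quick check, sketched below with hypothetical helpers `resolves` and `first_resolvable`, is to try resolving each candidate from a container on that network; the candidate list is taken from the suggestion above and is not verified.

```python
import socket
from typing import Iterable, Optional


def resolves(hostname: str) -> bool:
    """Return True if hostname resolves to at least one address from here."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror covers "name not known" and temporary resolver failures.
        return False


def first_resolvable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate hostname that resolves, or None."""
    for name in candidates:
        if resolves(name):
            return name
    return None
```

Calling `first_resolvable(["gateway", "traefik"])` from a container attached to the monitoring network would settle the question: if it returns `gateway`, the PR's original target is correct and the suggested rename would break scraping.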

    ellipsis-dev bot left a comment

    Important

    Looks good to me! 👍

    Reviewed everything up to 2b77843 in 48 seconds.
    • Reviewed 81 lines of code in 2 files
    • Skipped 0 files when reviewing.
    • Skipped posting 2 draft comments. View those below.
    1. monitoring/README.md:52
    • Draft comment:
      Consider updating the README to mention the additional 'managed' endpoints (e.g. temporal-managed, litellm-managed) if they are intended to be scraped, to ensure consistency with the Prometheus config.
    • Reason this comment was not posted:
      Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment suggests updating the README, which is not allowed according to the rules. It does not provide a specific code suggestion or ask for a test to be written; it is purely informative and only asks to ensure consistency, which is not allowed.
    2. monitoring/prometheus/config/prometheus.yml:15
    • Draft comment:
      Consider using YAML anchors or a shared configuration snippet to DRY common fields (scrape_interval, scrape_timeout, metrics_path, scheme, follow_redirects) across scrape jobs.
    • Reason this comment was not posted:
      Confidence changes required: 50% <= threshold 50%. None
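As an alternative to YAML anchors (a different technique from the one the draft comment proposes), the repetition can be removed by generating the jobs from one table of shared fields. The sketch below assumes all three new jobs use the values visible in the litellm and traefik excerpts above; the Temporal job's fields are not shown in this thread, so treating them as identical is an assumption, and `SHARED`, `JOBS`, `scrape_configs`, and `render` are hypothetical names. It also leaves out the existing jobs such as agents-api. Note that `metrics_path: /metrics`, `scheme: http`, `honor_timestamps: true`, and `follow_redirects: true` already match Prometheus's defaults, and `scrape_interval`/`scrape_timeout` can be set once under `global`, which may be the simplest fix of all.

```python
import json

# Fields repeated verbatim in each scrape job shown in this PR.
SHARED = {
    "honor_timestamps": True,
    "scrape_interval": "5s",
    "scrape_timeout": "3s",
    "metrics_path": "/metrics",
    "scheme": "http",
    "follow_redirects": True,
}

# Per-job targets from the PR summary (Temporal's other fields assumed to match SHARED).
JOBS = {
    "temporal": ["temporal:15000", "temporal-managed:15000"],
    "litellm": ["litellm:4000", "litellm-managed:4000"],
    "traefik": ["gateway:8082"],
}


def scrape_configs(jobs=JOBS, shared=SHARED) -> list:
    """Build one scrape job per entry, merging the shared fields into each."""
    return [
        {"job_name": name, **shared, "static_configs": [{"targets": list(targets)}]}
        for name, targets in jobs.items()
    ]


def render(jobs=JOBS) -> str:
    """Serialize the generated jobs; JSON is also valid YAML flow syntax."""
    return json.dumps({"scrape_configs": scrape_configs(jobs)}, indent=2)
```

The rendered output could be merged into prometheus.yml; running `promtool check config` on the result is a cheap way to confirm Prometheus accepts it.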

    Workflow ID: wflow_qZJycnGvDunyohCd


    gitguardian bot commented Jun 7, 2025

    ⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

    Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

    🔎 Detected hardcoded secret in your pull request
    GitGuardian id: 17693055
    GitGuardian status: Triggered
    Secret: JSON Web Token
    Commit: 3e9d8ae
    Filename: cli/tests/test_auth.py
    🛠 Guidelines to remediate hardcoded secrets
    1. Understand the implications of revoking this secret by investigating where it is used in your code.
    2. Replace and store your secret safely, following secret-management best practices.
    3. Revoke and rotate this secret.
    4. If possible, rewrite git history. Rewriting git history is not a trivial act: you might completely break other contributing developers' workflows, and you risk accidentally deleting legitimate data.

    To avoid such incidents in the future consider

