path: root/benchmark
2024-02-29  fix(benchmark/reports): Resolve error in format.py when `attempt.cost` is `None`  [Reinier van der Leer; 1 file, +2/-1]
2024-02-20  fix(benchmark/reports): Make format.py executable  [Reinier van der Leer; 1 file, +2/-0]
2024-02-20  fix(benchmark/challenges): Improve spec and eval of TicTacToe challenge  [Albert Örwall; 2 files, +2/-2]

    * In the challenge specification, specify `subprocess.PIPE` for `stdin` and `stderr` for completeness
    * Additional tweak: let Pytest load only the current file when running the test file as a script

    Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2024-02-19  feat(benchmark): Add reports/format.py script to convert report.json to markdown  [Reinier van der Leer; 1 file, +136/-0]
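The log doesn't include the script itself; a toy version of such a report.json-to-markdown conversion might look like this. The field names `tests`, `metrics`, `attempted`, and `success_percentage` are assumptions based on the report models mentioned elsewhere in this log:

```python
def report_to_markdown(report: dict) -> str:
    # Emit one markdown table row per test in the report.
    # NOTE: this schema is assumed, not the actual agbenchmark report schema.
    lines = [
        "| Challenge | Attempted | Success % |",
        "|-----------|-----------|-----------|",
    ]
    for name, test in report.get("tests", {}).items():
        metrics = test.get("metrics", {})
        lines.append(
            f"| {name} "
            f"| {metrics.get('attempted', False)} "
            f"| {metrics.get('success_percentage', 0)}% |"
        )
    return "\n".join(lines)
```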
2024-02-19  feat(benchmark): Include Steps in Report  [Reinier van der Leer; 4 files, +16/-1]
2024-02-18  debug(benchmark): Improve `TestResult` validation error output format  [Reinier van der Leer; 1 file, +8/-5]
2024-02-17  debug(benchmark): Add more debug code to pinpoint cause of rare crash  [Reinier van der Leer; 2 files, +23/-15]

    Target: https://github.com/Significant-Gravitas/AutoGPT/actions/runs/7941977633/job/21684817491
2024-02-17  debug(benchmark): Make sure `TestResult` validator error output is sufficient to debug  [Reinier van der Leer; 1 file, +1/-1]
2024-02-17  debug(benchmark): Add log statement to validator on `TestResult`  [Reinier van der Leer; 1 file, +8/-0]

    Validation errors don't mention the values causing the error, making it hard to debug. This happened a few times in autogpts-benchmark.yml, so let's put this log statement here until we figure out what makes it crash.
2024-02-16  fix(benchmark): Fix `TestResult.fail_reason` assignment condition  [Reinier van der Leer; 1 file, +1/-1]

    The condition must be the same as for `success`, because otherwise it causes a crash when `call.excinfo` evaluates to `False` but is not `None`.
2024-02-16  fix(benchmark): Unbreak `-N`/`--attempts` option  [Reinier van der Leer; 3 files, +4/-4]
2024-02-16  feat(benchmark): Get agent task cost from `Step.additional_output`  [Reinier van der Leer; 3 files, +18/-0]
2024-02-16  feat(benchmark/report): Add and record `TestResult.n_steps`  [Reinier van der Leer; 4 files, +9/-0]

    - Added `n_steps` attribute to the `TestResult` type
    - Added logic to record the number of steps in `BuiltinChallenge.test_method`, `WebArenaChallenge.test_method`, and `.reports.add_test_result_to_report`

2024-02-16  lint(benchmark): Remove unnecessary `pass` statement in __main__.py  [Reinier van der Leer; 1 file, +0/-1]
2024-02-16  fix(benchmark): Include `WebArenaSiteInfo.additional_info` (e.g. credentials) in task input  [Reinier van der Leer; 1 file, +19/-7]

    Without the `additional_info`, it is impossible to get past the login page on challenges where that is necessary.
2024-02-16  feat(benchmark/cli): Add `challenge list`, `challenge info` subcommands  [Reinier van der Leer; 5 files, +219/-6]

    - Add `challenge list` command with options `--all`, `--names`, `--json`
    - Add `tabular` dependency
    - Add `.utils.utils.sorted_by_enum_index` function to easily sort lists by an enum value/property, based on the order of the enum's definition
    - Add `challenge info [name]` command with option `--json`
    - Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models
    - Refactor `config` subcommand to use `pretty_print_model`
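A plausible sketch of the `sorted_by_enum_index` helper described above — the actual signature isn't shown in the log, so the parameter names and the "unknown values sort last" default are assumptions:

```python
from enum import Enum
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def sorted_by_enum_index(
    items: Iterable[T],
    enum_cls: type[Enum],
    key: Callable[[T], Enum] = lambda item: item,  # assumed default
) -> list[T]:
    # Rank each enum member by its position in the enum's definition order;
    # items whose key is not a member of the enum sort last.
    order = {member: index for index, member in enumerate(enum_cls)}
    return sorted(items, key=lambda item: order.get(key(item), len(order)))
```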
2024-02-16  refactor(benchmark): `load_webarena_challenges`  [Reinier van der Leer; 2 files, +43/-22]

    - Reduce duplicate and nested statements
    - Add `skip_unavailable` parameter

    Related changes:
    - Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec`
    - Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run
2024-02-15  feat(benchmark): Make report output folder configurable  [Reinier van der Leer; 5 files, +15/-8]

    - Make `AgentBenchmarkConfig.reports_folder` directly configurable (through the `REPORTS_FOLDER` env variable). The default is still `./agbenchmark_config/reports`.
    - Change all mentions of `REPORT_LOCATION` (which fulfilled the same function at some point in the past) to `REPORTS_FOLDER`.
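The described behavior reduces to an environment-variable lookup with a fallback. A minimal sketch — the real implementation lives on `AgentBenchmarkConfig`, so the standalone function here is an assumption:

```python
import os
from pathlib import Path

def get_reports_folder() -> Path:
    # REPORTS_FOLDER overrides the default ./agbenchmark_config/reports
    # location, as described in the commit above.
    return Path(os.getenv("REPORTS_FOLDER", "./agbenchmark_config/reports"))
```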
2024-02-14  lint(benchmark): Remove unused imports  [Reinier van der Leer; 2 files, +1/-2]
2024-02-14  fix(benchmark): Mock mode, python evals, `--attempts` flag, challenge definitions  [Reinier van der Leer; 6 files, +63/-44]

    - Fixed `--mock` mode
      - Moved interrupt to the beginning of the step iterator pipeline (from `BuiltinChallenge` to `agent_api_interface.py:run_api_agent`). This ensures that any finish-up code is properly executed after executing a single step.
      - Implemented mock mode in `WebArenaChallenge`
    - Fixed `fixture 'i_attempt' not found` error when `--attempts`/`-N` is omitted
    - Fixed handling of `python`/`pytest` evals in `BuiltinChallenge`
    - Disabled left-over Helicone code (see 056163e)
    - Fixed a couple of challenge definitions
      - WebArena task 107: fix spelling of months (Sepetember, Octorbor *lmao*)
      - synthesize/1_basic_content_gen (SynthesizeInfo): remove empty string from `should_contain` list
    - Added some debug logging in agent_api_interface.py and challenges/builtin.py

2024-02-13  chore(benchmark): Update `python-multipart` dependency to mitigate vulnerability  [Reinier van der Leer; 2 files, +6/-6]

    - python-multipart vulnerable to Content-Type Header ReDoS
      https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/55
2024-02-13  chore(benchmark): Update `aiohttp` and `fastapi` dependencies to mitigate vulnerabilities  [Reinier van der Leer; 2 files, +92/-92]

    Addressed vulnerabilities:
    - python-multipart vulnerable to Content-Type Header ReDoS
      https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/55
      Dependants:
      - FastAPI Content-Type Header ReDoS
        https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/53
      - Starlette Content-Type Header ReDoS
        https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/48
    - aiohttp is vulnerable to directory traversal
      https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/46
    - aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators
      https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/43
2024-01-22  feat(benchmark): Add `-N`, `--attempts` option for multiple attempts per challenge  [Reinier van der Leer; 12 files, +177/-137]

    LLMs are probabilistic systems, so reproducibility of completions is not guaranteed. It only makes sense to account for this by running challenges multiple times, to obtain a success ratio rather than a boolean success/failure result.

    Changes:
    - Add `-N`, `--attempts` option to the CLI and an `attempts_per_challenge` parameter to `main.py:run_benchmark`.
    - Add dynamic `i_attempt` fixture through the `pytest_generate_tests` hook in conftest.py to achieve multiple runs per challenge.
    - Modify the `pytest_runtest_makereport` hook in conftest.py to handle multiple reporting calls per challenge.
    - Refactor report_types.py, reports.py, and process_report.py to allow multiple results per challenge.
    - Calculate `success_percentage` from the results of the current run, rather than from all known results ever.
    - Add docstrings to a number of models in report_types.py.
    - Allow `None` as a success value, e.g. for runs that did not render any results before being cut off.
    - Make `SingletonReportManager` thread-safe.
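The dynamic `i_attempt` fixture described above can be sketched as a minimal conftest.py. The option names come from the commit message; the hook bodies are assumptions, not the actual code:

```python
# conftest.py sketch (assumed implementation)

def pytest_addoption(parser):
    # Register the -N/--attempts CLI option described in the commit.
    parser.addoption(
        "-N", "--attempts", type=int, default=1,
        help="Number of attempts to run per challenge",
    )

def pytest_generate_tests(metafunc):
    # Parametrize the `i_attempt` fixture so that every test requesting it
    # runs once per attempt, yielding a success ratio instead of a boolean.
    if "i_attempt" in metafunc.fixturenames:
        n = metafunc.config.getoption("attempts")
        metafunc.parametrize("i_attempt", range(n))
```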
2024-01-19  feat(benchmark): JungleGym WebArena (#6691)  [Reinier van der Leer; 4 files, +1005/-1]

    * feat(benchmark): Add JungleGym WebArena challenges
      - Add `WebArenaChallenge`, `WebArenaChallengeSpec`, and other logic to make these challenges work
      - Add WebArena challenges to the Pytest collection endpoint generate_test.py
    * feat(benchmark/webarena): Add hand-picked selection of WebArena challenges
2024-01-19  fix(benchmark/report): Fix and clean up logic in `update_challenges_already_beaten`  [Reinier van der Leer; 1 file, +5/-8]

    - `update_challenges_already_beaten` incorrectly marked challenges as beaten if the challenge was present in the report file but set to `false`

2024-01-19  fix(benchmark): Fix challenge input artifact upload  [Reinier van der Leer; 1 file, +3/-1]
2024-01-18  refactor(benchmark): Interface & type consolidation, and arch change, to allow adding challenge providers  [Reinier van der Leer; 16 files, +923/-814]

    Squashed commit of the following:

    commit 7d6476d3297860f74c276d571da995d958a8cc1a
    Author: Reinier van der Leer <pwuts@agpt.co>
    Date:   Tue Jan 9 18:10:45 2024 +0100

        refactor(benchmark/challenge): Set up structure to support more challenge providers

        - Move `Challenge`, `ChallengeData`, `load_challenges` to `challenges/builtin.py` and rename to `BuiltinChallenge`, `BuiltinChallengeSpec`, `load_builtin_challenges`
        - Create `BaseChallenge` to serve as interface and base class for different challenge implementations
        - Create `ChallengeInfo` model to serve as universal challenge info object
        - Create `get_challenge_from_source_uri` function in `challenges/__init__.py`
        - Replace `ChallengeData` by `ChallengeInfo` everywhere except in `BuiltinChallenge`
        - Add strong typing to `task_informations` store in app.py
        - Use `call.duration` in `finalize_test_report` and remove `timer` fixture
        - Update docstring on `challenges/__init__.py:get_unique_categories`
        - Add docstring to `generate_test.py`

    commit 5df2aa7939b45d85a2c2b5de9ac0522330d1502a
    Author: Reinier van der Leer <pwuts@agpt.co>
    Date:   Tue Jan 9 16:58:01 2024 +0100

        refactor(benchmark): Refactor & rename functions in agent_interface.py and agent_api_interface.py

        - `copy_artifacts_into_temp_folder` -> `copy_challenge_artifacts_into_workspace`
        - `copy_agent_artifacts_into_folder` -> `download_agent_artifacts_into_folder`
        - Reorder parameters of `run_api_agent`, `copy_challenge_artifacts_into_workspace`; use `Path` instead of `str`

    commit 6a256fef4c7950b7ee82fb801e70c83afe6b6f8b
    Author: Reinier van der Leer <pwuts@agpt.co>
    Date:   Tue Jan 9 16:02:25 2024 +0100

        refactor(benchmark): Refactor & typefix report generation and handling logic

        - Rename functions in reports.py and ReportManager.py to better reflect what they do
          - `get_previous_test_results` -> `get_and_update_success_history`
          - `generate_single_call_report` -> `initialize_test_report`
          - `finalize_reports` -> `finalize_test_report`
          - `ReportManager.end_info_report` -> `SessionReportManager.finalize_session_report`
        - Modify `pytest_runtest_makereport` hook in conftest.py to finalize the report immediately after the challenge finishes running, instead of after teardown
        - Move result processing logic from `initialize_test_report` to `finalize_test_report` in reports.py
        - Use `Test` and `Report` types from report_types.py where possible instead of untyped dicts: reports.py, utils.py, ReportManager.py
        - Differentiate `ReportManager` into `SessionReportManager`, `RegressionTestsTracker`, `SuccessRateTracker`
        - Move filtering of optional challenge categories from challenge.py (`Challenge.skip_optional_categories`) to conftest.py (`pytest_collection_modifyitems`)
        - Remove unused `scores` fixture in conftest.py

    commit 370d6dbf5df75d78e3878877968e8cd309d6d7fb
    Author: Reinier van der Leer <pwuts@agpt.co>
    Date:   Tue Jan 9 15:16:43 2024 +0100

        refactor(benchmark): Simplify models in report_types.py

        - Removed `ForbidOptionalMeta` and `BaseModelBenchmark` classes.
        - Changed model attributes to optional: `Metrics.difficulty`, `Metrics.success`, `Metrics.success_percentage`, `Metrics.run_time`, and `Test.reached_cutoff`.
        - Added validator to `Metrics` model to require `success` and `run_time` fields if `attempted=True`.
        - Added default values to all optional model fields.
        - Removed duplicate imports.
        - Added condition in process_report.py to prevent null lookups if `metrics.difficulty` is not set.
2024-01-16  chore(benchmark): Upgrade OpenAI client lib from v0 to v1  [Reinier van der Leer; 4 files, +33/-23]
2024-01-16  refactor(benchmark): Disable Helicone integrations  [Reinier van der Leer; 5 files, +123/-160]

    We want to upgrade the OpenAI library, but `helicone` does not support `openai@^1.0.0`, so we're disabling the Helicone integration for now.
2024-01-02  AGBenchmark codebase clean-up (#6650)  [Reinier van der Leer; 46 files, +2119/-7749]

    * refactor(benchmark): Deduplicate configuration loading logic
      - Move the configuration loading logic to a separate `load_agbenchmark_config` function in the `agbenchmark/config.py` module.
      - Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to the `load_agbenchmark_config` function.
    * fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py
      - Fixed type errors and linting errors in `__main__.py`
      - Improved the readability of CLI argument validation by introducing a separate function for it
    * refactor(benchmark): Lint and typefix app.py
      - Rearranged and cleaned up import statements
      - Fixed type errors caused by improper use of `psutil` objects
      - Simplified a number of `os.path` usages by converting to `pathlib`
      - Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`
    * refactor(benchmark): Replace `.agent_protocol_client` by `agent-protocol-client`, clean up schema.py
      - Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
      - Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
      - Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
      - Remove all unused types from schema.py (= most of them).
    * refactor(benchmark): Use pathlib in agent_interface.py and agent_api_interface.py
    * refactor(benchmark): Improve typing, response validation, and readability in app.py
      - Simplified response generation by leveraging type checking and conversion by FastAPI.
      - Introduced use of `HTTPException` for error responses.
      - Improved naming, formatting, and typing in `app.py::create_evaluation`.
      - Updated the docstring on `app.py::create_agent_task`.
      - Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
      - Added default values to optional attributes on models in report_types_v2.py.
      - Removed unused imports in `generate_test.py`
    * refactor(benchmark): Clean up logging and print statements
      - Introduced use of the `logging` library for unified logging and better readability.
      - Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
      - Improved descriptiveness of log statements.
      - Removed unnecessary print statements.
      - Added log statements to unspecific and non-verbose `except` blocks.
      - Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
      - Added `.utils.logging` module with a `configure_logging` function to easily configure the logging library.
      - Converted raw escape sequences in `.utils.challenge` to use `colorama`.
      - Renamed `generate_test.py::generate_tests` to `load_challenges`.
    * refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent
      - Remove unused server.py file
      - Remove unused run_agent function from agent_interface.py
    * refactor(benchmark): Clean up conftest.py
      - Fix and add type annotations
      - Rewrite docstrings
      - Disable or remove unused code
      - Fix definition of arguments and their types in `pytest_addoption`
    * refactor(benchmark): Clean up generate_test.py file
      - Refactored the `create_single_test` function for clarity and readability
      - Removed unused variables
      - Made creation of `Challenge` subclasses more straightforward
      - Made bare `except` more specific
      - Renamed `Challenge.setup_challenge` method to `run_challenge`
      - Updated type hints and annotations
      - Made minor code/readability improvements in `load_challenges`
      - Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module
    * fix(benchmark): Fix and add type annotations in execute_sub_process.py
    * refactor(benchmark): Simplify const determination in agent_interface.py
      - Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`
    * fix(benchmark): Register category markers to prevent warnings
      - Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.
    * refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieval_2/data.json
    * refactor(benchmark): Update agent_api_interface.py
      - Add type annotations to the `copy_agent_artifacts_into_temp_folder` function
      - Add note about broken endpoint in the `agent_protocol_client` library
      - Remove unused variable in `run_api_agent` function
      - Improve readability and resolve linting error
    * feat(benchmark): Improve and centralize pathfinding
      - Search the path hierarchy for an applicable `agbenchmark_config`, rather than assuming it's in the current folder.
      - Create `agbenchmark.utils.path_manager` with `AGBenchmarkPathManager`, exporting a `PATH_MANAGER` const.
      - Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.
    * feat(benchmark/cli): Clean up and improve CLI
      - Updated commands, options, and their descriptions to be more intuitive and consistent
      - Moved slow imports into the entrypoints that use them to speed up application startup
      - Fixed type hints to match output types of Click options
      - Hid deprecated `agbenchmark start` command
      - Refactored code to improve readability and maintainability
      - Moved main entrypoint into `run` subcommand
      - Fixed `version` and `serve` subcommands
      - Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
      - Renamed `--no_dep` to `--no-dep` for consistency
      - Fixed string formatting issues in log statements
    * refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py
      - Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
      - Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
      - Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
      - Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
      - Update all code references according to the changes mentioned above.
    * refactor(benchmark): Fix ReportManager init parameter types and use pathlib
      - Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
      - Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
      - Rename `self.filename` to `self.report_file` in `ReportManager`.
      - Change the way the report file is created, opened and saved to use the `Path` object.
    * refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation
      - Use `ChallengeData` objects instead of untyped dicts in app.py, generate_test.py, reports.py.
      - Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from the `ChallengeData` class.
      - Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from the `ChallengeData` class.
      - Update function signatures and annotations of the `create_challenge` and `generate_single_test` functions in generate_test.py.
      - Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
      - Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).
    * refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py
      - Cleaned up generate_test.py and conftest.py
      - Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
      - Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
      - Converted methods in the `Challenge` class to class methods where appropriate.
      - Improved argument handling in the `run_benchmark` function in `__main__.py`.
    * refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state
      - Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
      - Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
      - Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.
    * feat(benchmark/serve): Configurable port for `serve` subcommand
      - Added `--port` option to the `serve` subcommand to allow specifying the port to run the API on.
      - If no `--port` option is provided, the port defaults to the value of the `PORT` environment variable, or 8080 if that is not set.
    * feat(benchmark/cli): Add `config` subcommand
      - Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.
    * fix(benchmark): Gracefully handle incompatible challenge spec files in app.py
      - Added a check to skip deprecated challenges
      - Added logging to allow debugging of the loading process
      - Added handling of validation errors when parsing challenge spec files
      - Added missing `spec_file` attribute to `ChallengeData`
    * refactor(benchmark): Move `run_benchmark` entrypoint to main.py, use it in `/reports` endpoint
      - Move `run_benchmark` and `validate_args` from __main__.py to main.py
      - Replace the agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark`
      - Move `get_unique_categories` from __main__.py to challenges/__init__.py
      - Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
      - Reduce operations on updates.json (including `initialize_updates_file`) outside of the API
    * refactor(benchmark): Remove unused `/updates` endpoint and all related code
      - Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
      - Remove `get_updates` and `_initialize_updates_file` in app.py
      - Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
      - Remove call to `append_updates_file` in challenge.py
    * refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig`
      - Add and update docstrings
      - Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility
      - Make naming of path attributes on `AgentBenchmarkConfig` more consistent
      - Remove unused `agent_home_directory` attribute
      - Remove unused `workspace` attribute
    * fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config
    * fix(benchmark): Update agent-protocol-client to v1.1.0
      - Fixes issue with fetching task artifact listings
2023-11-21  Clean up & fix GitHub workflows (#6313)  [Reinier van der Leer; 5 files, +10/-8]

    * ci: Mitigate security issues in autogpt-ci.yml
      - Remove unnecessary pull_request_target paths and related variables and config
      - Set permissions for contents to read only
    * ci: Simplify steps in autogpt-ci.yml workflow using GitHub CLI
      - Simplify step in 'autogpt-ci.yml' by using the GitHub CLI instead of the API for adding label and comment functionality
      - Replace curl command with 'gh issue edit' to add the "behaviour change" label to the pull request
      - Replace gh api command with 'gh issue comment' to leave a comment about the changed behavior of AutoGPT in the pull request
    * ci: Fix issues in workflows
      - Move environment variable definition to top level in benchmark-ci.yml (because the other job also needs it)
      - Removed invalid 'branches: [hackathon]' restriction in the hackathon.yml workflow
      - Removed redundant 'ref' and 'repository' fields in the 'checkout' step of both workflows.
    * ci: Delete legacy benchmarks.yml workflow
    * ci: Add triggers for CI workflows
      - Add triggers to run CI workflows when they are edited.
      - Update the paths for the CI workflows in the trigger configuration.
    * fix: Fix benchmark lint error
      - Removed unnecessary blank lines in report_types.py
      - Fixed string quotes in challenge.py to maintain consistency
    * fix: Update task description in password generator data.json
      - Update the task description in the `data.json` file for the password generator challenge to clarify the input requirements and error handling.
      - This change is made in an attempt to make the Benchmark CI pass.
    * fix: Fix PasswordGenerator challenge in CI
      - Fix the behavior of the reference password_generator.py to align with the task description
      - Use default password length 8 instead of a random length in the generate_password function
      - Retrieve the password length from the command line arguments if "--length" is provided, else set it to 8
2023-11-09  fix: Fixing Benchmarking  [SwiftyOS; 2 files, +6/-0]

    - Importing missing metadata field in Test class in report_types.py
    - Adding GAIA categories 1, 2, and 3 in data_types.py
2023-10-20  reverting new challenges  [Silen Naihin; 6 files, +0/-107]
2023-10-20  case sensitivity, updating challenges  [Silen Naihin; 7 files, +13/-1]
2023-10-20  fix capitalization, rename  [Silen Naihin; 2 files, +1/-2]
2023-10-19  fix data challenges  [Silen Naihin; 2 files, +2/-2]
2023-10-19  scrape synthesize challenge additions  [Silen Naihin; 10 files, +133/-5]
2023-10-17  fixing password gen and revenue retrieval 2 challenges  [Silen Naihin; 2 files, +17/-16]
2023-10-17  Fix subproject dependency compatibility  [Reinier van der Leer; 2 files, +31/-11]
2023-10-14  Update data.json  [Silen Naihin; 1 file, +1/-1]
2023-10-13  Update test.py (#5721)  [merwanehamadi; 1 file, +2/-6]
2023-10-09  fix label csv (#5656)  [merwanehamadi; 1 file, +12/-12]
2023-10-06  Fix password generator (#5581)  [merwanehamadi; 1 file, +2/-6]
2023-10-06  Fix agbenchmark client (#5578)  [merwanehamadi; 1 file, +1/-12]

    Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>

2023-10-05  Fix challenges (#5561)  [merwanehamadi; 2 files, +5/-5]

    Fix challenges and CI
    Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-10-03  Fix custom_python not being copied (#5512)  [merwanehamadi; 2 files, +12/-6]
2023-10-02  Correct create_game method definition in the challenge input (#5460)  [Albert Örwall; 1 file, +1/-1]

    Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>

2023-10-02  Fix benchmark ci (#5478)  [merwanehamadi; 2 files, +4/-3]

    Fix benchmark CI
    Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>

2023-10-02  add load_dotenv (#5474)  [merwanehamadi; 1 file, +2/-0]
2023-10-02  Correct revenue retrieval challenge (#5471)  [merwanehamadi; 2 files, +2/-2]