Auto-GPT.git
An experimental open-source attempt to make GPT-4 fully autonomous.
Torantulino
path: root/benchmark/agbenchmark
| Age | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2024-02-20 | fix(benchmark/challenges): Improve spec and eval of TicTacToe challenge | Albert Örwall | 2 | -2/+2 |
| 2024-02-19 | feat(benchmark): Include Steps in Report | Reinier van der Leer | 4 | -1/+16 |
| 2024-02-18 | debug(benchmark): Improve `TestResult` validation error output format | Reinier van der Leer | 1 | -5/+8 |
| 2024-02-17 | debug(benchmark): Add more debug code to pinpoint cause of rare crash | Reinier van der Leer | 2 | -15/+23 |
| 2024-02-17 | debug(benchmark): Make sure `TestResult` validator error output is sufficient... | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-17 | debug(benchmark): Add log statement to validator on `TestResult` | Reinier van der Leer | 1 | -0/+8 |
| 2024-02-16 | fix(benchmark): Fix `TestResult.fail_reason` assignment condition | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-16 | fix(benchmark): Unbreak `-N`/`--attempts` option | Reinier van der Leer | 3 | -4/+4 |
| 2024-02-16 | feat(benchmark): Get agent task cost from `Step.additional_output` | Reinier van der Leer | 3 | -0/+18 |
| 2024-02-16 | feat(benchmark/report): Add and record `TestResult.n_steps` | Reinier van der Leer | 4 | -0/+9 |
| 2024-02-16 | lint(benchmark): Remove unnecessary `pass` statement in __main__.py | Reinier van der Leer | 1 | -1/+0 |
| 2024-02-16 | fix(benchmark): Include `WebArenaSiteInfo.additional_info` (e.g. credentials)... | Reinier van der Leer | 1 | -7/+19 |
| 2024-02-16 | feat(benchmark/cli): Add `challenge list`, `challenge info` subcommands | Reinier van der Leer | 3 | -5/+203 |
| 2024-02-16 | refactor(benchmark): `load_webarena_challenges` | Reinier van der Leer | 2 | -22/+43 |
| 2024-02-15 | feat(benchmark): Make report output folder configurable | Reinier van der Leer | 4 | -7/+14 |
| 2024-02-14 | lint(benchmark): Remove unused imports | Reinier van der Leer | 2 | -2/+1 |
| 2024-02-14 | fix(benchmark): Mock mode, python evals, `--attempts` flag, challenge definit... | Reinier van der Leer | 6 | -44/+63 |
| 2024-01-22 | feat(benchmark): Add `-N`, `--attempts` option for multiple attempts per chal... | Reinier van der Leer | 12 | -137/+177 |
| 2024-01-19 | feat(benchmark): JungleGym WebArena (#6691) | Reinier van der Leer | 4 | -1/+1005 |
| 2024-01-19 | fix(benchmark/report): Fix and clean up logic in `update_challenges_already_b... | Reinier van der Leer | 1 | -8/+5 |
| 2024-01-19 | fix(benchmark): Fix challenge input artifact upload | Reinier van der Leer | 1 | -1/+3 |
| 2024-01-18 | refactor(benchmark): Interface & type consoledation, and arch change, to allo... | Reinier van der Leer | 16 | -814/+923 |
| 2024-01-16 | chore(benchmark): Upgrade OpenAI client lib from v0 to v1 | Reinier van der Leer | 1 | -4/+4 |
| 2024-01-16 | refactor(benchmark): Disable Helicone integrations | Reinier van der Leer | 3 | -21/+22 |
| 2024-01-02 | AGBenchmark codebase clean-up (#6650) | Reinier van der Leer | 43 | -6682/+1274 |
| 2023-11-21 | Clean up & fix GitHub workflows (#6313) | Reinier van der Leer | 5 | -8/+10 |
| 2023-11-09 | fix: Fixing Benchmarking | SwiftyOS | 2 | -0/+6 |
| 2023-10-20 | reverting new challenges | Silen Naihin | 6 | -107/+0 |
| 2023-10-20 | case sensitivity, updating challenges | Silen Naihin | 7 | -1/+13 |
| 2023-10-20 | fix capitalization, rename | Silen Naihin | 2 | -2/+1 |
| 2023-10-19 | fix data challenges | Silen Naihin | 2 | -2/+2 |
| 2023-10-19 | scrape synthesize challenge additions | Silen Naihin | 10 | -5/+133 |
| 2023-10-17 | fixing password gen and revenue retrieval 2 challenges | Silen Naihin | 2 | -16/+17 |
| 2023-10-14 | Update data.json | Silen Naihin | 1 | -1/+1 |
| 2023-10-13 | Update test.py (#5721) | merwanehamadi | 1 | -6/+2 |
| 2023-10-09 | fix label csv (#5656) | merwanehamadi | 1 | -12/+12 |
| 2023-10-06 | Fix password generator (#5581) | merwanehamadi | 1 | -6/+2 |
| 2023-10-06 | Fix agbenchmark client (#5578) | merwanehamadi | 1 | -12/+1 |
| 2023-10-05 | Fix challenges (#5561) | merwanehamadi | 1 | -1/+1 |
| 2023-10-03 | Fix custom_python not being copied (#5512) | merwanehamadi | 2 | -6/+12 |
| 2023-10-02 | Correct create_game method definition in the challenge input (#5460) | Albert Örwall | 1 | -1/+1 |
| 2023-10-02 | Fix benchmark ci (#5478) | merwanehamadi | 1 | -1/+2 |
| 2023-10-02 | add load_dotenv (#5474) | merwanehamadi | 1 | -0/+2 |
| 2023-10-02 | Correct revenue retrieval challenge (#5471) | merwanehamadi | 2 | -2/+2 |
| 2023-10-01 | Correct Battleship Challenge (#5450) | merwanehamadi | 2 | -2/+2 |
| 2023-09-28 | Add more data challenges (#5390) | merwanehamadi | 77 | -208/+1097 |
| 2023-09-28 | Fix pathing issues | SwiftyOS | 1 | -4/+17 |
| 2023-09-28 | fix artifact bug | SwiftyOS | 1 | -2/+5 |
| 2023-09-28 | Fixed CORS and proxy timeout issues | SwiftyOS | 1 | -1/+3 |
| 2023-09-27 | Add more challenges + cleanup (#5368) | merwanehamadi | 35 | -39/+132 |