| Age | Commit message | Author | Files | Lines |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Update frontend build based on commit e44ca4185a0d56273b7e2cd68fc5ca42a4f9b73e frontend-build/master | Pwuts | 3 | -11046/+11057 |
| 2024-02-21 | fix(frontend): Unbreak `ChatInputField` | Reinier van der Leer | 1 | -2/+1 |
| 2024-02-21 | fix(ci/frontend): Add trigger on `push` including workflow file | Reinier van der Leer | 1 | -0/+1 |
| 2024-02-21 | fix(ci/frontend): Add and fix trigger on workflow file | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-21 | ci: Revise Frontend CI | Reinier van der Leer | 2 | -46/+59 |
| 2024-02-20 | chore(agent/llm): Update model alias `gpt-3.5-turbo` -> `gpt-3.5-turbo-0125` | Reinier van der Leer | 1 | -1/+2 |
| 2024-02-20 | fix(ci/benchmark): Install benchmark dependencies | Reinier van der Leer | 1 | -0/+2 |
| 2024-02-20 | fix(benchmark/reports): Make format.py executable | Reinier van der Leer | 1 | -0/+2 |
| 2024-02-20 | fix(agent/browser): Print descriptive error if ChromeDriver install fails | Reinier van der Leer | 1 | -7/+14 |
| 2024-02-20 | fix(agent/llm): Include `id` in tool_calls in prompt | Reinier van der Leer | 2 | -2/+4 |
| 2024-02-20 | fix(autogpt/llm): Omit `AssistantChatMessage.tool_calls` if no tool calls are... | Reinier van der Leer | 2 | -3/+6 |
| 2024-02-20 | fix(ci/benchmark): Specify poetry env path for report conversion step | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-20 | fix(benchmark/challenges): Improve spec and eval of TicTacToe challenge | Albert Örwall | 2 | -2/+2 |
| 2024-02-20 | fix(agent/setup): Fix revising constraints and best practices (#6777) | Thunder Drag | 1 | -3/+19 |
| 2024-02-20 | feat(frontend): Allow sending a message with the enter key (#6378) | Ethan Presberg | 1 | -0/+6 |
| 2024-02-20 | fix(ci/benchmark): Unbreak "Push reports to data branch" step | Reinier van der Leer | 1 | -1/+2 |
| 2024-02-19 | feat(ci/benchmark): Generate step summary from benchmark report | Reinier van der Leer | 1 | -0/+12 |
| 2024-02-19 | feat(benchmark): Add reports/format.py script to convert report.json to markdown | Reinier van der Leer | 1 | -0/+136 |
| 2024-02-19 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -2/+2 |
| 2024-02-19 | feat(benchmark): Include Steps in Report | Reinier van der Leer | 4 | -1/+16 |
| 2024-02-18 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -2/+2 |
| 2024-02-18 | debug(benchmark): Improve `TestResult` validation error output format | Reinier van der Leer | 1 | -5/+8 |
| 2024-02-17 | fix(ci/benchmark): Mitigate VCS conflicts with files in data branch | Reinier van der Leer | 1 | -0/+3 |
| 2024-02-17 | fix(ci/benchmark): Add `set +e` because we expect (some) challenges to fail | Reinier van der Leer | 1 | -0/+2 |
| 2024-02-17 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -2/+2 |
| 2024-02-17 | debug(benchmark): Add more debug code to pinpoint cause of rare crash | Reinier van der Leer | 2 | -15/+23 |
| 2024-02-17 | ci: Allow telemetry for non-push events, as long as it's on `master` | Reinier van der Leer | 5 | -8/+3 |
| 2024-02-17 | ci: Fix setting/passing `TELEMETRY_*` environment variables | Reinier van der Leer | 4 | -17/+11 |
| 2024-02-17 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -3/+3 |
| 2024-02-17 | ci: Update actions to newest versions | Reinier van der Leer | 10 | -36/+38 |
| 2024-02-17 | debug(benchmark): Make sure `TestResult` validator error output is sufficient... | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-17 | debug(benchmark): Add log statement to validator on `TestResult` | Reinier van der Leer | 1 | -0/+8 |
| 2024-02-17 | fix(ci/benchmark): Allow workflow to continue regardless of challenge outcomes | Reinier van der Leer | 1 | -0/+7 |
| 2024-02-16 | chore: Update agbenchmark dependency for agent and forge | Reinier van der Leer | 2 | -2/+2 |
| 2024-02-16 | fix(benchmark): Fix `TestResult.fail_reason` assignment condition | Reinier van der Leer | 1 | -1/+1 |
| 2024-02-16 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -2/+2 |
| 2024-02-16 | fix(benchmark): Unbreak `-N`/`--attempts` option | Reinier van der Leer | 3 | -4/+4 |
| 2024-02-16 | Rename autogpts-benchmark-nightly.yml to autogpts-benchmark.yml | Reinier van der Leer | 1 | -0/+0 |
| 2024-02-16 | feat(agent/serve): Report task cost through `Step.additional_output` | Reinier van der Leer | 3 | -10/+21 |
| 2024-02-16 | feat(benchmark): Get agent task cost from `Step.additional_output` | Reinier van der Leer | 3 | -0/+18 |
| 2024-02-16 | feat(benchmark/report): Add and record `TestResult.n_steps` | Reinier van der Leer | 4 | -0/+9 |
| 2024-02-16 | ci(benchmark): Add nightly benchmark workflow | Reinier van der Leer | 1 | -0/+71 |
| 2024-02-16 | lint(benchmark): Remove unnecessary `pass` statement in __main__.py | Reinier van der Leer | 1 | -1/+0 |
| 2024-02-16 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -2/+32 |
| 2024-02-16 | fix(benchmark): Include `WebArenaSiteInfo.additional_info` (e.g. credentials)... | Reinier van der Leer | 1 | -7/+19 |
| 2024-02-16 | feat(benchmark/cli): Add `challenge list`, `challenge info` subcommands | Reinier van der Leer | 5 | -6/+219 |
| 2024-02-16 | refactor(benchmark): `load_webarena_challenges` | Reinier van der Leer | 2 | -22/+43 |
| 2024-02-15 | chore: Update `agbenchmark` dependency for agent and forge | Reinier van der Leer | 2 | -3/+3 |
| 2024-02-15 | feat(benchmark): Make report output folder configurable | Reinier van der Leer | 6 | -9/+16 |
| 2024-02-15 | feat(agent/telemetry): Distinguish between `production` and `dev` environment... | Reinier van der Leer | 2 | -2/+55 |