aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/builtin-record.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-15perf record: Fix segfault with --overwrite and --max-sizeGravatar Yang Jihong 1-10/+6
When --overwrite and --max-size options of perf record are used together, a segmentation fault occurs. The following is an example: # perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1 [ perf record: Woken up 1 times to write data ] perf: Segmentation fault Obtained 12 stack frames. ./perf/perf(+0x197673) [0x55f99710b673] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f] ./perf/perf(+0x8eb40) [0x55f997002b40] ./perf/perf(+0x1f6882) [0x55f99716a882] ./perf/perf(+0x794c2) [0x55f996fed4c2] ./perf/perf(+0x7b7c7) [0x55f996fef7c7] ./perf/perf(+0x9074b) [0x55f99700474b] ./perf/perf(+0x12e23c) [0x55f9970a223c] ./perf/perf(+0x12e54a) [0x55f9970a254a] ./perf/perf(+0x7db60) [0x55f996ff1b60] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86] ./perf/perf(+0x7dfe9) [0x55f996ff1fe9] Segmentation fault (core dumped) backtrace of the core file is as follows: (gdb) bt #0 record__bytes_written (rec=0x55f99755a200 <record>) at builtin-record.c:234 #1 record__output_max_size_exceeded (rec=0x55f99755a200 <record>) at builtin-record.c:242 #2 record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 <record>) at builtin-record.c:263 #3 process_synthesized_event (tool=tool@entry=0x55f99755a200 <record>, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618 #4 0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658, from=from@entry=0) at util/synthetic-events.c:1895 #5 0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=<optimized out>, machine=machine@entry=0x55f997893658) at util/synthetic-events.c:1905 #6 0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 <record>) at builtin-record.c:1997 #7 0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 <record>) at builtin-record.c:2802 #8 0x000055f99700474c in cmd_record (argc=<optimized out>, argv=0x7ffc67551260) at builtin-record.c:4258 #9 0x000055f9970a223d in run_builtin (p=0x55f997564d88 <commands+264>, argc=10, argv=0x7ffc67551260) at perf.c:330 #10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384 #11 0x000055f996ff1b61 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:428 #12 main (argc=<optimized out>, argv=0x7ffc67551260) at perf.c:562 The reason is that record__bytes_written accesses the freed memory rec->thread_data, The process is as follows: __cmd_record -> record__free_thread_data -> zfree(&rec->thread_data) // free rec->thread_data -> record__synthesize -> perf_event__synthesize_id_index -> process_synthesized_event -> record__write -> record__bytes_written // access rec->thread_data We add a member variable "thread_bytes_written" in the struct "record" to save the data size written by the threads. Fixes: 6d57581659f72299 ("perf record: Add support for limit perf output file size") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20perf tools: Fix usage of the verbose variableGravatar Yang Jihong 1-2/+2
The data type of the verbose variable is integer and can be negative, replace improperly used cases in a unified manner: 1. if (verbose) => if (verbose > 0) 2. if (!verbose) => if (verbose <= 0) 3. if (XX && verbose) => if (XX && verbose > 0) 4. if (XX && !verbose) => if (XX && verbose <= 0) Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20221220035702.188413-3-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14perf evlist: Remove group option.Gravatar Ian Rogers 1-2/+0
The group option predates grouping events using curly braces added in commit 89efb029502d7f2d ("perf tools: Add support to parse event group syntax"). The --group option was retained for legacy support (in August 2012) but keeping it adds complexity. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shaomin Deng <dengshaomin@cdjrlc.com> Cc: Stephane Eranian <eranian@google.com> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213232651.1269909-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14perf build: Use libtraceevent from the systemGravatar Ian Rogers 1-0/+2
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command line variables. If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the build, don't compile in libtraceevent and libtracefs support. This also disables CONFIG_TRACE that controls "perf trace". CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles, HAVE_LIBTRACEEVENT is used in C code. Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the commands kmem, kwork, lock, sched and timechart are removed. The majority of commands continue to work including "perf test". Committer notes: Fixed up a tools/perf/util/Build reject and added: #include <traceevent/event-parse.h> to tools/perf/util/scripting-engines/trace-event-perl.c. Committer testing: $ rpm -qi libtraceevent-devel Name : libtraceevent-devel Version : 1.5.3 Release : 2.fc36 Architecture: x86_64 Install Date: Mon 25 Jul 2022 03:20:19 PM -03 Group : Unspecified Size : 27728 License : LGPLv2+ and GPLv2+ Signature : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4 Source RPM : libtraceevent-1.5.3-2.fc36.src.rpm Build Date : Fri 15 Apr 2022 10:57:01 AM -03 Build Host : buildvm-x86-05.iad2.fedoraproject.org Packager : Fedora Project Vendor : Fedora Project URL : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/ Bug URL : https://bugz.fedoraproject.org/libtraceevent Summary : Development headers of libtraceevent Description : Development headers of libtraceevent-libs $ Default build: $ ldd ~/bin/perf | grep tracee libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000) $ # perf trace -e sched:* --max-events 10 0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1) 0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1) 0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120) 1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120) 1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120) 0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2) 0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2) 0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120) 1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1) 1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120) # Had to tweak tools/perf/util/setup.py to make sure the python binding shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is present in CFLAGS. Building with NO_LIBTRACEEVENT=1 uncovered some more build failures: - Make building of data-convert-bt.c to CONFIG_LIBTRACEEVENT=y - perf-$(CONFIG_LIBTRACEEVENT) += scripts/ - bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y - The python binding needed some fixups and util/trace-event.c can't be built and linked with the python binding shared object, so remove it in tools/perf/util/setup.py and exclude it from the list of dependencies in the python/perf.so Makefile.perf target. Building without libtraceevent-devel installed uncovered more build failures: - The python binding tools/perf/util/python.c was assuming that traceevent/parse-events.h was always available, which was the case when we defaulted to using the in-kernel tools/lib/traceevent/ files, now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like the other parts of it that deal with tracepoints. - We have to ifdef the rules in the Build files with CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't detect libtraceevent-devel installed in the system. Simplification here to avoid these two ways of disabling builtin-trace.c and not having CONFIG_TRACE=y when libtraceevent-devel isn't installed is the clean way. From Athira: <quote> tools/perf/arch/powerpc/util/Build -perf-y += kvm-stat.o +perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o </quote> Then, ditto for arm64 and s390, detected by container cross build tests. - s/390 uses test__checkevent_tracepoint() that is now only available if HAVE_LIBTRACEEVENT is defined, enclose the callsite with ifder HAVE_LIBTRACEEVENT. Also from Athira: <quote> With this change, I could successfully compile in these environment: - Without libtraceevent-devel installed - With libtraceevent-devel installed - With “make NO_LIBTRACEEVENT=1” </quote> Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for consistency with other libraries detected in tools/perf/. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-05perf tools: Use dedicated non-atomic clear/set bit helpersGravatar Sean Christopherson 1-3/+3
Use the dedicated non-atomic helpers for {clear,set}_bit() and their test variants, i.e. the double-underscore versions. Depsite being defined in atomic.h, and despite the kernel versions being atomic in the kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move to the double-underscore versions so that the versions that are expected to be atomic (for kernel developers) can be made atomic without affecting users that don't want atomic operations. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: James Morse <james.morse@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: alexandru elisei <alexandru.elisei@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20221119013450.2643007-6-seanjc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf record: Use sig_atomic_t for signal handlersGravatar Ian Rogers 1-4/+4
This removes undefined behavior as described in: https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers Suggested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf tools: Make quiet mode consistent between toolsGravatar James Clark 1-1/+1
Use the global quiet variable everywhere so that all tools hide warnings in quiet mode and update the documentation to reflect this. 'perf probe' claimed that errors are not printed in quiet mode but I don't see this so remove it from the docs. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf record: Fix event fd racesGravatar Ian Rogers 1-16/+25
The write call may set errno which is problematic if occurring in a function also setting errno. Save and restore errno around the write call. done_fd may be used after close, clear it as part of the close and check its validity in the signal handler. Suggested-by: <gthelen@google.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anand K Mistry <amistry@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024011024.462518-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf tools: Add debug messages and comments for testingGravatar Adrian Hunter 1-0/+8
Add debug messages to enable scripts to track aspects of 'perf record' behaviour. The messages will be consumed after 'perf record' has run, with the exception of "perf record has started" which is consequently flushed. Put comments so developers know which messages are also being used by test scripts. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220912083412.7058-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Fix a segfault in record__read_lost_samples()Gravatar Namhyung Kim 1-0/+6
When it fails to open events record__open() returns without setting the session->evlist. Then it gets a segfault in the function trying to read lost sample counts. You can easily reproduce it as a normal user like: $ perf record -p 1 true ... perf: Segmentation fault ... Skip the function if it has no evlist. And add more protection for evsels which are not properly initialized. Fixes: a49aa8a54e861af1 ("perf record: Read and inject LOST_SAMPLES events") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220909235024.278281-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Read and inject LOST_SAMPLES eventsGravatar Namhyung Kim 1-0/+64
When there are lost samples, it can read the number of PERF_FORMAT_LOST and convert it to PERF_RECORD_LOST_SAMPLES and write to the data file at the end. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901195739.668604-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Update use of pthread mutexGravatar Ian Rogers 1-4/+9
Switch to the use of mutex wrappers that provide better error checking for synth_lock. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Colin Ian King <colin.king@intel.com> Cc: Dario Petrillo <dario.pk1@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fangrui Song <maskray@google.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavithra Gurushankar <gpavithrasha@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Weiguo Li <liwg06@foxmail.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: William Cohen <wcohen@redhat.com> Cc: Zechuan Chen <chenzechuan1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Cc: yaowenbin <yaowenbin1@huawei.com> Link: https://lore.kernel.org/r/20220826164242.43412-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Allow multiple recording time rangesGravatar Adrian Hunter 1-2/+22
AUX area traces can produce too much data to record successfully or analyze subsequently. Add another means to reduce data collection by allowing multiple recording time ranges. This is useful, for instance, in cases where a workload produces predictably reproducible events in specific time ranges. Today we only have perf record -D <msecs> to start at a specific region, or some complicated approach using snapshot mode and external scripts sending signals or using the fifos. But these approaches are difficult to set up compared with simply having perf do it. Extend perf record option -D/--delay option to specifying relative time stamps for start stop controlled by perf with the right time offset, for instance: perf record -e intel_pt// -D 10-20,30-40 to record 10ms to 20ms into the trace and 30ms to 40ms. Example: The example workload is: $ cat repeat-usleep.c int usleep(useconds_t usec); int usage(int ret, const char *msg) { if (msg) fprintf(stderr, "%s\n", msg); fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n"); return ret; } int main(int argc, char *argv[]) { unsigned long usecs; char *end_ptr; if (argc != 2) return usage(1, "Error: Wrong number of arguments!"); errno = 0; usecs = strtoul(argv[1], &end_ptr, 0); if (errno || *end_ptr || usecs > UINT_MAX) return usage(1, "Error: Invalid argument!"); while (1) { int ret = usleep(usecs); if (ret & errno != EINTR) return usage(1, "Error: usleep() failed!"); } return 0; } $ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500 Events disabled Events enabled Events disabled Events enabled Events disabled Events enabled Events disabled [ perf record: Woken up 5 times to write data ] [ perf record: Captured and wrote 0.204 MB perf.data ] Terminated A dlfilter is used to determine continuous data collection (timestamps less than 1ms apart): $ cat dlfilter-show-delays.c static __u64 start_time; static __u64 last_time; int start(void **data, void *ctx) { printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay"); return 0; } int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx) { __u64 delta; if (!sample->time) return 1; if (!last_time) goto out; delta = sample->time - last_time; if (delta < 1000000) goto out2;; printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0); out: start_time = sample->time; out2: last_time = sample->time; return 1; } int stop(void *data, void *ctx) { printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0); return 0; } The result shows the times roughly match the --delay option: $ perf script --itrace=qb --dlfilter dlfilter-show-delays.so Time Duration Delay 39215.302317300 9.7 20.5 39215.332480217 30.4 40.9 39215.403837717 49.8 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220824072814.16422-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_eventGravatar Adrian Hunter 1-14/+1
Patch "perf record: Fix way of handling non-perf-event pollfds" added a generic way to handle non-perf-event file descriptors like evlist->ctl_fd. Use it instead of handling evlist->ctl_fd separately. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220824072814.16422-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf record: Fix way of handling non-perf-event pollfdsGravatar Adrian Hunter 1-0/+80
perf record __cmd_record() does not poll evlist pollfds. Instead it polls thread_data[0].pollfd. That happens whether or not threads are being used. perf record duplicates evlist mmap pollfds as needed for separate threads. The non-perf-event represented by evlist->ctl_fd has to handled separately, which is done explicitly, duplicating it into the thread_data[0] pollfds. That approach neglects any other non-perf-event file descriptors. Currently there is also done_fd which needs the same handling. Add a new generalized approach. Add fdarray_flag__non_perf_event to identify the file descriptors that need the special handling. For those cases, also keep a mapping of the evlist pollfd index and thread pollfd index, so that the evlist revents can be updated. Although this patch adds the new handling, it does not take it into use. There is no functional change, but it is the precursor to a fix, so is marked as a fix. Fixes: 415ccb58f68a6beb ("perf record: Introduce thread specific data array") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220824072814.16422-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-21perf record: Fix cpu mask bit setting for mixed mmapsGravatar Adrian Hunter 1-0/+2
With mixed per-thread and (system-wide) per-cpu maps, the "any cpu" value -1 must be skipped when setting CPU mask bits. Prior to commit cbd7bfc7fd99acdd ("tools/perf: Fix out of bound access to cpu mask array") the invalid setting went unnoticed, but since then it causes perf record to fail with an error. Example: Before: $ perf record -e intel_pt// --per-thread uname Failed to initialize parallel data streaming masks After: $ perf record -e intel_pt// --per-thread uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.068 MB perf.data ] Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220915122612.81738-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-08perf record: Fix synthesis failure warningsGravatar Adrian Hunter 1-2/+6
Some calls to synthesis functions set err < 0 but only warn about the failure and continue. However they do not set err back to zero, relying on subsequent code to do that. That changed with the introduction of option --synth. When --synth=no subsequent functions that set err back to zero are not called. Fix by setting err = 0 in those cases. Example: Before: $ perf record --no-bpf-event --synth=all -o /tmp/huh uname Couldn't synthesize bpf events. Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ] $ perf record --no-bpf-event --synth=no -o /tmp/huh uname Couldn't synthesize bpf events. After: $ perf record --no-bpf-event --synth=no -o /tmp/huh uname Couldn't synthesize bpf events. Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ] Fixes: 41b740b6e8a994e5 ("perf record: Add --synth option") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220907162458.72817-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-06tools/perf: Fix out of bound access to cpu mask arrayGravatar Athira Rajeev 1-6/+20
The cpu mask init code in "record__mmap_cpu_mask_init" function access "bits" array part of "struct mmap_cpu_mask". The size of this array is the value from cpu__max_cpu().cpu. This array is used to contain the cpumask value for each cpu. While setting bit for each cpu, it calls "set_bit" function which access index in "bits" array. If we provide a command line option to -C which is greater than the number of CPU's present in the system, the set_bit could access an array member which is out-of the array size. This is because currently, there is no boundary check for the CPU. This will result in seg fault: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Segmentation fault (core dumped) <<>> Debugging with gdb, points to function flow as below: <<>> set_bit record__mmap_cpu_mask_init record__init_thread_default_masks record__init_thread_masks cmd_record <<>> Fix this by adding boundary check for the array. After the patch: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Failed to initialize parallel data streaming masks <<>> With this fix, if -C is given a non-exsiting CPU, perf record will fail with: <<>> ./perf record -C 50 ls Failed to initialize parallel data streaming masks <<>> Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220905141929.7171-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-12perf record: Improve error message of -p not_existing_pidGravatar Martin Liška 1-2/+9
When one uses -p $not_existing_pid, the output of --help is printed: $ perf record -p 123456789 2>&1 | head -n3 Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] Let's change it something similar what perf top -p $not_existing_pid prints: $ ./perf top -p 123456789 --stdio Error: Couldn't create thread/CPU maps: No such process Newly suggested error message: $ ./perf record -p 123456789 Couldn't create thread/CPU maps: No such process Signed-off-by: Martin Liška <mliska@suse.cz> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/8e00eda1-4de0-2c44-ce67-d4df48ac1f7c@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-23perf record: Add finished init eventGravatar Adrian Hunter 1-0/+27
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. This is needed to enable injecting events after the initial synthesized user events (that have an all zero id sample) but before regular events. Committer notes: Add entry about PERF_RECORD_FINISHED_INIT to tools/perf/Documentation/perf.data-file-format.txt. Committer testing: Before: # perf report -D | grep FINISHED 0 0x5910 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) # After: # perf record -- sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf report -D | grep FINISHED 0 0x5068 [0x8]: PERF_RECORD_FINISHED_INIT: unhandled! 0 0x5390 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) FINISHED_INIT events: 1 ( 0.5%) # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-23perf record: Add new option to sample identifierGravatar Adrian Hunter 1-0/+2
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Add an option to always include sample type PERF_SAMPLE_IDENTIFIER. Committer testing: # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # # perf record --sample-identifier sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.022 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220615052511.4441-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-23perf record: Always record id indexGravatar Adrian Hunter 1-7/+5
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Adjust the logic so that if there are IDs then the id index is recorded. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-23perf record: Always get text_poke events with --kcore optionGravatar Adrian Hunter 1-0/+3
kcore provides a copy of the running kernel including any modified code. A trace that benefits from that also benefits from text_poke events, so enable them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf record: Add cgroup support for off-cpu profilingGravatar Namhyung Kim 1-1/+1
This covers two different use cases. The first one is cgroup filtering given by -G/--cgroup option which controls the off-cpu profiling for tasks in the given cgroups only. The other use case is cgroup sampling which is enabled by --all-cgroups option and it adds PERF_SAMPLE_CGROUP to the sample_type to set the cgroup id of the task in the sample data. Example output. $ sudo perf record -a --off-cpu --all-cgroups sleep 1 $ sudo perf report --stdio -s comm,cgroup --call-graph=no ... # Samples: 144 of event 'offcpu-time' # Event count (approx.): 48452045427 # # Children Self Command Cgroup # ........ ........ ............... .......................................... # 61.57% 5.60% Chrome_ChildIOT /user.slice/user-657345.slice/user@657345.service/app.slice/... 29.51% 7.38% Web Content /user.slice/user-657345.slice/user@657345.service/app.slice/... 17.48% 1.59% Chrome_IOThread /user.slice/user-657345.slice/user@657345.service/app.slice/... 16.48% 4.12% pipewire-pulse /user.slice/user-657345.slice/user@657345.service/session.slice/... 14.48% 2.07% perf /user.slice/user-657345.slice/user@657345.service/app.slice/... 14.30% 7.15% CompositorTileW /user.slice/user-657345.slice/user@657345.service/app.slice/... 13.33% 6.67% Timer /user.slice/user-657345.slice/user@657345.service/app.slice/... ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220518224725.742882-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf record: Implement basic filtering for off-cpuGravatar Namhyung Kim 1-1/+1
It should honor cpu and task filtering with -a, -C or -p, -t options. Committer testing: # perf record --off-cpu --cpu 1 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.722 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.446 MB perf.data (7248 samples) ] # # perf script | head -20 perf 97164 [001] 38287.696761: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696764: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696765: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696767: 212 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696768: 5130 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696770: 123063 cycles: ffffffffb6e0011e syscall_return_via_sysret+0x38 (vmlinux) perf 97164 [001] 38287.696803: 2292748 cycles: ffffffffb636c82d __fput+0xad (vmlinux) swapper 0 [001] 38287.702852: 1927474 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97513 97513 [001] 38287.767207: 1172536 cycles: ffffffffb612ff65 newidle_balance+0x5 (vmlinux) swapper 0 [001] 38287.769567: 1073081 cycles: ffffffffb618216d ktime_get_mono_fast_ns+0xd (vmlinux) :97533 97533 [001] 38287.770962: 984460 cycles: ffffffffb65b2900 selinux_socket_sendmsg+0x0 (vmlinux) :97540 97540 [001] 38287.772242: 883462 cycles: ffffffffb6d0bf59 irqentry_exit_to_user_mode+0x9 (vmlinux) swapper 0 [001] 38287.773633: 741963 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97552 97552 [001] 38287.774539: 606680 cycles: ffffffffb62eda0a page_add_file_rmap+0x7a (vmlinux) :97556 97556 [001] 38287.775333: 502254 cycles: ffffffffb634f964 get_obj_cgroup_from_current+0xc4 (vmlinux) :97561 97561 [001] 38287.776163: 427891 cycles: ffffffffb61b1522 cgroup_rstat_updated+0x22 (vmlinux) swapper 0 [001] 38287.776854: 359030 cycles: ffffffffb612fc5e load_balance+0x9ce (vmlinux) :97567 97567 [001] 38287.777312: 330371 cycles: ffffffffb6a8d8d0 skb_set_owner_w+0x0 (vmlinux) :97566 97566 [001] 38287.777589: 311622 cycles: ffffffffb614a7a8 native_queued_spin_lock_slowpath+0x148 (vmlinux) :97512 97512 [001] 38287.777671: 307851 cycles: ffffffffb62e0f35 find_vma+0x55 (vmlinux) # # perf record --off-cpu --cpu 4 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.613 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.415 MB perf.data (6729 samples) ] # perf script | head -20 perf 97650 [004] 38323.728036: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728040: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728041: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728042: 208 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728044: 5026 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728046: 119970 cycles: ffffffffb6d0bebc syscall_exit_to_user_mode+0x1c (vmlinux) perf 97650 [004] 38323.728078: 2190103 cycles: 54b756 perf_tool__process_synth_event+0x16 (/home/acme/bin/perf) swapper 0 [004] 38323.783357: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.785352: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.797330: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.802350: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.806333: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97996 97996 [004] 38323.807145: 1418936 cycles: 7f5db9be6917 [unknown] ([unknown]) :97959 97959 [004] 38323.807730: 1445074 cycles: ffffffffb6329d36 memcg_slab_post_alloc_hook+0x146 (vmlinux) :97959 97959 [004] 38323.808103: 1341584 cycles: ffffffffb62fd90f get_page_from_freelist+0x112f (vmlinux) :97959 97959 [004] 38323.808451: 1227537 cycles: ffffffffb65b2905 selinux_socket_sendmsg+0x5 (vmlinux) :97959 97959 [004] 38323.808768: 1184321 cycles: ffffffffb6d1ba35 _raw_spin_lock_irqsave+0x15 (vmlinux) :97959 97959 [004] 38323.809073: 1153017 cycles: ffffffffb6a8d92d skb_set_owner_w+0x5d (vmlinux) :97959 97959 [004] 38323.809402: 1126875 cycles: ffffffffb6329c64 memcg_slab_post_alloc_hook+0x74 (vmlinux) :97959 97959 [004] 38323.809695: 1073248 cycles: ffffffffb6e0001d entry_SYSCALL_64+0x1d (vmlinux) # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220518224725.742882-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf record: Enable off-cpu analysis with BPFGravatar Namhyung Kim 1-0/+25
Add --off-cpu option to enable the off-cpu profiling with BPF. It'd use a bpf_output event and rename it to "offcpu-time". Samples will be synthesized at the end of the record session using data from a BPF map which contains the aggregated off-cpu time at context switches. So it needs root privilege to get the off-cpu profiling. Each sample will have a separate user stacktrace so it will skip kernel threads. The sample ip will be set from the stacktrace and other sample data will be updated accordingly. Currently it only handles some basic sample types. The sample timestamp is set to a dummy value just not to bother with other events during the sorting. So it has a very big initial value and increase it on processing each samples. Good thing is that it can be used together with regular profiling like cpu cycles. If you don't want to that, you can use a dummy event to enable off-cpu profiling only. Example output: $ sudo perf record --off-cpu perf bench sched messaging -l 1000 $ sudo perf report --stdio --call-graph=no # Total Lost Samples: 0 # # Samples: 41K of event 'cycles' # Event count (approx.): 42137343851 ... # Samples: 1K of event 'offcpu-time' # Event count (approx.): 587990831640 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. ......................... # 81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main 81.66% 0.00% sched-messaging perf [.] cmd_bench 81.66% 0.00% sched-messaging perf [.] main 81.66% 0.00% sched-messaging perf [.] run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220518224725.742882-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf tools: Allow all_cpus to be a superset of user_requested_cpusGravatar Adrian Hunter 1-6/+12
To support collection of system-wide events with user requested CPUs, all_cpus must be a superset of user_requested_cpus. In order to support all_cpus to be a superset of user_requested_cpus, all_cpus must be used instead of user_requested_cpus when dealing with CPUs of all events instead of CPUs of requested events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20220524075436.29144-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()Gravatar Adrian Hunter 1-18/+3
Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() in preparation for allowing system-wide events on all CPUs while the user requested events are on only user requested CPUs. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20220524075436.29144-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-05perf cpumap: Switch to using perf_cpu_map APIGravatar Ian Rogers 1-6/+7
Switch some raw accesses to the cpu map to using the library API. This can help with reference count checking. Some BPF cases switch from index to CPU for consistency, this shouldn't matter as the CPU map is full. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220503041757.2365696-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14perf record: Fix per-thread optionGravatar Alexey Bayduraev 1-5/+17
Per-thread mode doesn't have specific CPUs for events, add checks for this case. Minor fix to a pr_debug by Ian Rogers <irogers@google.com> to avoid an out of bound array access. Fixes: 7954f71689f90cb2 ("perf record: Introduce thread affinity and mmap masks") Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Alexey Bayduraev <alexey.bayduraev@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220414014642.3308206-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01perf evlist: Rename cpus to user_requested_cpusGravatar Ian Rogers 1-3/+3
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps of all evsels. For non-task targets, cpus is set to be cpus requested from the command line, defaulting to all online cpus if no cpus are specified. For an uncore event, all_cpus may be just CPU 0 or every online CPU. This causes all_cpus to have fewer values than the cpus variable which is confusing given the 'all' in the name. To try to make the behavior clearer, rename cpus to user_requested_cpus and add comments on the two struct variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-22perf data: Adding error message if perf_data__create_dir() failsGravatar Alexey Bayduraev 1-1/+3
Add proper return codes for all cases of data directory creation failure and add error message output based on these codes. Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220222091417.11020-1-alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Implement compatibility checksGravatar Alexey Bayduraev 1-3/+46
Implement compatibility checks for other modes and related command line options: asynchronous (--aio) trace streaming and affinity (--affinity) modes, pipe mode, AUX area tracing --snapshot and --aux-sample options, --switch-output, --switch-output-event, --switch-max-files and --timestamp-filename options. Parallel data streaming is compatible with Zstd compression (--compression-level) and external control commands (--control). CPU mask provided via -C option filters --threads specification masks. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/fadc1cf74057af4d5766248fcfe5cdde40732aa9.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Extend --threads command line optionGravatar Alexey Bayduraev 1-2/+316
Extend --threads option in perf record command line interface. The option can have a value in the form of masks that specify CPUs to be monitored with data streaming threads and its layout in system topology. The masks can be filtered using CPU mask provided via -C option. The specification value can be user defined list of masks. Masks separated by colon define CPUs to be monitored by one thread and affinity mask of that thread is separated by slash. For example: <cpus mask 1>/<affinity mask 1>:<cpu mask 2>/<affinity mask 2> specifies parallel threads layout that consists of two threads with corresponding assigned CPUs to be monitored. The specification value can be a string e.g. "cpu", "core" or "package" meaning creation of data streaming thread for every CPU or core or package to monitor distinct CPUs or CPUs grouped by core or package. The option provided with no or empty value defaults to per-cpu parallel threads layout creating data streaming thread for every CPU being monitored. Document --threads option syntax and parallel data streaming modes in Documentation/perf-record.txt. Suggested-by: Jiri Olsa <jolsa@kernel.org> Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/079e2619be70c465317cf7c9fdaf5fa069728c32.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce --threads command line optionGravatar Alexey Bayduraev 1-1/+47
Provide --threads option in perf record command line interface. The option creates a data streaming thread for each CPU in the system. Document --threads option in Documentation/perf-record.txt. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/01aeae43b047f428596c4ef9f9342ab94865cedd.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce data transferred and compressed statsGravatar Alexey Bayduraev 1-3/+22
Introduce bytes_transferred and bytes_compressed stats so they would capture statistics for the related data buffer transfers. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/b5d598034c507dfb7544d2125500280b7d434764.1642440724.git.alexey.v.bayduraev@linux.intel.com [ Use PRiu64 to print u64 values, fixing the build on 32-bit architectures ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce compressor at mmap buffer objectGravatar Alexey Bayduraev 1-7/+11
Introduce compressor object into mmap object so it could be used to pack the data stream from the corresponding kernel data buffer. Initialize and make use of the introduced per mmap compressor. Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Link: https://lore.kernel.org/r/80edc286cf6543139a7d5a91217605123aa0b50d.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce bytes written statsGravatar Alexey Bayduraev 1-3/+18
Introduce a function to calculate the total amount of data written and use it to support the --max-size option. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/3e2c69186641446f8ab003ec209bccc762b3394d.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce data file at mmap buffer objectGravatar Alexey Bayduraev 1-5/+32
Introduce data file objects into mmap object so it could be used to process and store data stream from the corresponding kernel data buffer. Initialize data files located at mmap buffer objects so trace data can be written into several data file located at data directory. Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Link: https://lore.kernel.org/r/177077f7734b63e5c999ccd75ac6dc3c694f0d0d.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Start threads in the beginning of trace streamingGravatar Alexey Bayduraev 1-1/+120
Start thread in detached state because its management is implemented via messaging to avoid any scaling issues. Block signals prior thread start so only main tool thread would be notified on external async signals during data collection. Thread affinity mask is used to assign eligible CPUs for the thread to run. Wait and sync on thread start using thread ack pipe. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/95784dd9f7c81ee408eab27b50b4c09ad4cf7be6.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Stop threads in the end of trace streamingGravatar Alexey Bayduraev 1-0/+31
Signal thread to terminate by closing write fd of msg pipe. Receive THREAD_MSG__READY message as the confirmation of the thread's termination. Stop threads created for parallel trace streaming prior their stats processing. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/55ef8cc5ec3a96360660d9dc1763573225325f8c.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce thread local variableGravatar Alexey Bayduraev 1-46/+94
Introduce thread local variable and use it for threaded trace streaming. Use thread affinity mask instead of record affinity mask in affinity modes. Use evlist__ctlfd_update() to propagate control commands from thread object to global evlist object to enable evlist__ctlfd_* functionality. Move waking and sample statistic to struct record_thread and introduce record__waking function to calculate the total number of wakes. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/0d127555219991c1dcd6c6bb76b24fa6b78d2932.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce thread specific data arrayGravatar Alexey Bayduraev 1-3/+244
Introduce thread specific data object and array of such objects to store and manage thread local data. Implement functions to allocate, initialize, finalize and release thread specific data. Thread local maps and overwrite_maps arrays keep pointers to mmap buffer objects to serve according to maps thread mask. Thread local pollfd array keeps event fds connected to mmaps buffers according to maps thread mask. Thread control commands are delivered via thread local comm pipes and ctlfd_pos fd. External control commands (--control option) are delivered via evlist ctlfd_pos fd and handled by the main tool thread. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/fc9f74af6f822d9c0fa0e145c3564a760dbe3d4b.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10perf record: Introduce thread affinity and mmap masksGravatar Alexey Bayduraev 1-0/+123
Introduce affinity and mmap thread masks. Thread affinity mask defines CPUs that a thread is allowed to run on. Thread maps mask defines mmap data buffers the thread serves to stream profiling data from. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/9042bf7daf988e17e17e6acbf5d29590bde869cd.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-15perf record: Disable debuginfod by defaultGravatar Jiri Olsa 1-0/+13
Fedora 35 sets DEBUGINFOD_URLS by default, which might lead to unexpected stalls in perf record exit path, when we try to cache profiled binaries. # DEBUGINFOD_PROGRESS=1 ./perf record -a ^C[ perf record: Woken up 1 times to write data ] Downloading from https://debuginfod.fedoraproject.org/ 447069 Downloading from https://debuginfod.fedoraproject.org/ 1502175 Downloading \^Z Disabling DEBUGINFOD_URLS by default in perf record and adding debuginfod option and .perfconfig variable support to enable id. Default without debuginfo processing: # perf record -a Using system debuginfod setup: # perf record -a --debuginfod Using custom debuginfd url: # perf record -a --debuginfod='https://evenbetterdebuginfodserver.krava' Adding single perf_debuginfod_setup function and using it also in perf buildid-cache command. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211209200425.303561-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Give CPUs their own typeGravatar Ian Rogers 1-1/+1
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-21perf tools: Record ARM64 LR register automaticallyGravatar Alexandre Truong 1-0/+8
On ARM64, automatically record the link register if the frame pointer mode is on. It will be used to do a dwarf unwind to find the caller of the leaf frame if the frame pointer was omitted. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-2-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07perf tools: Check vmlinux/kallsyms arguments in all toolsGravatar James Clark 1-0/+4
Only perf report checked the validity of these arguments so apply the same check to all tools that read them for consistency. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20211018134844.2627174-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-20perf tools: Add support for PERF_RECORD_AUX_OUTPUT_HW_IDGravatar Adrian Hunter 1-1/+1
The PERF_RECORD_AUX_OUTPUT_HW_ID event provides a way to match AUX output data like Intel PT PEBS-via-PT back to the event that it came from, by providing a hardware ID that is present in the AUX output. Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20210907163903.11820-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-17perf record: Add --synth optionGravatar Namhyung Kim 1-9/+39
Add an option to control the synthesizing behavior. --synth <no|all|task|mmap|cgroup> Fine-tune event synthesis: default=all This can be useful when we know it doesn't need some synthesis like in a specific usecase and/or when using pipe: $ perf record -a --all-cgroups --synth cgroup -o- sleep 1 | \ > perf report -i- -s cgroup Committer notes: Added a clarification to the man page entry for --synth that this is about pre-existing threads. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https //lore.kernel.org/r/20210811044658.1313391-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>