This document presents a set of guidelines for evaluating paper artifacts at conferences sponsored by ACM SIGMOBILE.
This document was prepared by the Research Artifact Evaluation Advisory Committee.
Encourage submissions: Conference websites should state that submitting an artifact for each paper is highly encouraged.
Award the hard work: Conferences should strive to create an annual “Best Artifact” award and a “Best Artifact Evaluator” award (both should be proposed by the artifact evaluation committee chairs to the main conference chairs); industry sponsorship of both awards is encouraged.
Consider artifacts for the best paper awards: The best paper award committee should consider artifacts for papers where they are relevant. The declared criteria for the best paper can include the requirement of at least an “artifact available” badge to be considered for the award; this would encourage the submission of artifacts for all accepted papers.
Give a podium to artifacts at the demo session: Demo sessions at conferences should by definition include artifact demonstration tracks, so that the authors of all accepted artifacts can show their demos and explain to the conference audience how to run them.
Help to make artifacts better, not to reject them: The conference chairs, as well as the artifact evaluation chairs, should make sure that the intention of the artifact evaluation is not to prove that the artifact is wrong or not working, but to help identify potential issues in the hardware and software implementations that may benefit the authors' future research. In other words, artifact evaluation is about inclusiveness, not exclusiveness (as the papers have already passed a very tough paper acceptance process).
Search for diversity: There should be at least two committee chairs from two different institutions on two different continents, preferably with diverse backgrounds (e.g., one man and one woman, each with a different ethnic background).
Give a chance to new people: Each year a new person should be invited for the role of artifact evaluation committee chair (i.e., a person who has never been an artifact co-chair of another SIGMOBILE-sponsored conference before). This way we will grow the group of people involved in the artifact assessment decision process.
Select chairs at the same time as the TPC chairs: The artifact evaluation process must be integral to each SIGMOBILE-sponsored conference, and the TPC chairs should be selected at the same time as, and stay in sync with, the artifact chairs.
Make at least one of the artifact chairs a program committee member: This way one committee chair will have access to every version of the accepted paper (submitted, shepherded, final) and will be able to see all the reviews. As a result, the artifact chairs will not need to bother the TPC chairs with questions about the status of each artifact-evaluated paper and will have instant access to the paper's current status.
Consult with the previous generation of chairs: Chairs of last year’s artifact evaluation committee should be consulted on matters related to organizing the artifact evaluation.
Four evaluators per artifact: Based on our experience we recommend at least four evaluators per artifact. This number helps ensure at least two completed evaluations per submitted artifact (as some artifact reviewers will not complete their promised work).
Two artifacts per evaluator: Based on our experience we recommend limiting the number of artifacts per evaluator to at most two. This helps regulate the evaluator's workload and ensures a reasonable amount of time per artifact.
Experience of evaluators: Artifact evaluation committee members can be selected from the PhD-student level and up.
Diversity of evaluators: Striving for diversity of committee members (gender, ethnicity, continent, etc.), as in the case of artifact evaluation chairs, is strongly encouraged.
Award the effort: To encourage participation in the artifact evaluation process, conferences should create an annual “Best Artifact Evaluator” award with two runner-up awards. The award should be given during the conference banquet.
Incentivize participation: Provide some incentive for students to participate in artifact evaluation. For example, give conference attendance discounts, or certificates, to students who served as artifact evaluators.
Way of selection: Selection of artifact evaluators can be done by nomination or self-nomination via an openly accessible online form. Therein, each candidate should provide a link to their website, an email address, and a GitHub profile (or another profile demonstrating the evaluator's software skills). After all candidates have indicated their availability, the artifact evaluation committee chairs decide which candidates to choose based on criteria of their own choosing. As stated before, we recommend choosing young people (PhD students or early postdocs) who have published at least one paper in the mobile systems domain and can demonstrate the ability to read and run code. The artifact evaluation committee chairs can also nominate the most suitable candidates themselves.
Start selection early: A link to the nomination form should be provided on the conference website well in advance, preferably at the start of the review phase for the main conference track's research papers.
Mentor: Recruit a small group of faculty to serve as mentors to student artifact evaluators during the artifact evaluation process.
Educate your reviewers: Host info sessions for artifact reviewers before the artifact review starts.
Start early: Artifact evaluation should start immediately after the notification of conditionally accepted papers.
Announce early: Artifact evaluation should be officially announced at the previous conference (during a conference banquet).
Make it a discussion between the authors and reviewers: Artifact evaluators can ask the artifact authors questions about the setup of the artifact during the whole duration of the artifact evaluation, so that small problems related to artifact setup, hardware choice, compilation, opening files, etc. do not stop the evaluators from assessing the artifact. In other words, the review process should be a dialog between the artifact authors and the reviewers. This way the review will focus on the content of the artifact, not on installation issues. It also allows the authors to update their artifact during the review and resubmit it for evaluation.
Strive to make the process double-blind: If possible, the names of the artifact evaluators should be anonymous to the authors of the artifact and vice versa. However, we know that this might be very hard (if part of the evaluation process involves a video session with the authors, the artifact evaluator's identity will be immediately known), so we suggest a “best effort” principle: make the review double-blind if possible (and make this the default option), but if someone's identity is exposed, do not reject the artifact for this.
Use HotCRP: We recommend HotCRP as the system where artifact evaluation takes place, since most (if not all) SIGMOBILE conferences will use HotCRP anyway.
Reviewers must access the latest version of the paper: The artifact evaluation committee should be able to see the paper to which the artifact refers, so as to be able to inspect which results are reproducible and which are not. This includes the shepherding plan, such that the version of the paper closest to the camera-ready version is evaluated.
Enable open access evaluation: Artifact chairs can opt in to open access evaluation of submitted artifacts: during such artifact evaluations the authors of artifacts can specify whether they want to open their artifact for evaluation to the outside world, i.e., to people who are not on the artifact evaluation committee. This means that the artifact chairs would have to open a dedicated website through which all information about the evaluated artifacts is shared and which explains how external people can contribute to the artifact evaluation.
Make artifact availability a priority above all other artifact types: Focus on artifact “availability” first: artifacts should be checked for whether they contain all the information, data, and code required for external parties to work with them. Artifact reusability and replicability should be encouraged, but they should be treated as a second order of priority due to the complexity of their evaluation, especially given the limited review time.
Avoid “Replicated” badges: Artifact evaluation time has its limits, and the majority of systems papers will not be replicable in the available time unless the artifact chairs decide on a year-long evaluation (until the next year's conference).
Define expectations: Artifact chairs should instruct the reviewers to read the accompanying paper before delving into the artifact. This reading establishes expectations about the paper's core metric that needs to be assessed/replicated, and about which aspects of the paper can be skipped during the artifact evaluation.
Define replication tolerance: Define what a reasonable tolerance for result replication is, i.e., how far a reproduced result may deviate from the value reported in the paper (see the sketch below).
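As a purely illustrative sketch (the metric names and the 10% threshold below are assumptions, not prescribed values), a tolerance check could be as simple as:

```python
# Hypothetical sketch: compare reproduced results against the paper's reported
# numbers using a relative tolerance. Metric names and the 10% threshold are
# illustrative assumptions; each committee should set its own values.

REPORTED = {"throughput_mbps": 42.0, "median_latency_ms": 8.5}    # from the paper
REPRODUCED = {"throughput_mbps": 39.7, "median_latency_ms": 9.1}  # from the artifact run
TOLERANCE = 0.10  # 10% relative deviation allowed

def within_tolerance(reported: float, reproduced: float, tol: float) -> bool:
    """Return True if the reproduced value deviates from the reported one by at most tol."""
    return abs(reproduced - reported) <= tol * abs(reported)

for metric, reported_value in REPORTED.items():
    ok = within_tolerance(reported_value, REPRODUCED[metric], TOLERANCE)
    print(f"{metric}: reported={reported_value}, reproduced={REPRODUCED[metric]}, "
          f"{'OK' if ok else 'OUTSIDE TOLERANCE'}")
```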
Be inclusive: Ideally, all papers should have artifacts available. Since this is hard to achieve, we suggest that chairs aim for 70-80% of papers having artifacts available.
Prepare installation guideline: When submitting your paper for artifact evaluation, please prepare, together with your paper, a short appendix on how to install and run your artifact, including a list of hardware and software requirements.
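As one hedged illustration of how the software requirements can be made easy to verify (the package names and versions below are placeholders, not a required format), an author could ship a small check script next to the appendix:

```python
# Hypothetical helper an author might ship next to the installation appendix:
# it checks that the software requirements listed in the appendix are installed.
# Package names and minimum versions are placeholders for illustration.
import importlib.metadata
import sys

REQUIREMENTS = {"numpy": "1.21", "scipy": "1.7"}  # placeholder requirements

def check() -> int:
    missing = []
    for package, minimum in REQUIREMENTS.items():
        try:
            installed = importlib.metadata.version(package)
            print(f"{package}: found {installed} (need >= {minimum})")
        except importlib.metadata.PackageNotFoundError:
            missing.append(package)
            print(f"{package}: NOT FOUND (need >= {minimum})")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(check())
```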
Include a minimum working example as part of your artifact: The artifact should contain a minimal working example demonstrating that the artifact works, so that the artifact evaluators know that the basic functionality of the artifact is easily reproducible.
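A minimal sketch of what such an example might look like is given below; the function, input, and expected value are illustrative assumptions, not part of any specific artifact:

```python
# Hypothetical "minimal working example" an artifact could include: it runs one
# core routine of the artifact on a tiny bundled input and prints the expected
# output, so evaluators can confirm basic functionality in seconds.

def core_routine(samples):
    """Stand-in for one core function of the artifact (here: a simple average)."""
    return sum(samples) / len(samples)

def main():
    samples = [1.0, 2.0, 3.0, 4.0]   # tiny bundled input, no downloads needed
    result = core_routine(samples)
    expected = 2.5                   # value the evaluator should see
    print(f"result={result}, expected={expected}")
    assert abs(result - expected) < 1e-9, "minimal example failed"
    print("Minimal working example passed.")

if __name__ == "__main__":
    main()
```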
Specify time to run your artifact: The artifact should specify how much time it takes to run each of its components and what the hardware requirements are to run it.
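As a purely illustrative sketch of how such per-component runtimes could be measured (the component names and sleep-based stand-ins are assumptions, not a required format):

```python
# Hypothetical sketch of how an author might measure and report per-component
# runtimes for the artifact appendix. Component names are placeholders.
import time

def run_data_preprocessing():
    time.sleep(0.1)  # stand-in for the real preprocessing step

def run_main_experiment():
    time.sleep(0.2)  # stand-in for the real experiment

COMPONENTS = {
    "data preprocessing": run_data_preprocessing,
    "main experiment": run_main_experiment,
}

for name, component in COMPONENTS.items():
    start = time.perf_counter()
    component()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.1f} s")
```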
Share your code: Source code of the artifact can be shared through openly accessible repositories such as http://github.com or http://gitlab.com, or through dedicated links provided directly to the artifact evaluators.
Add appendix to the camera-ready paper if possible: Artifact evaluation committee chairs can request that the authors provide an appendix in their accepted paper on how to use the prepared artifact. An example of such a research artifact appendix can be found here: https://github.com/ctuning/ck-artifact-evaluation/blob/master/wfe/artifact-evaluation/templates/ae.tex The process must be coordinated with the conference publication chairs. Moreover, in the case of an artifact evaluation after the main conference acceptance, i.e., during the artifact evaluation session at the conference, the camera-ready version with the artifact description appendix can be submitted to ACM after the end of the conference.
Create a single website aggregating all accepted artifacts for SIGMOBILE-sponsored conferences: If possible, a common website aggregating information about the research artifacts from all previous SIGMOBILE-sponsored conferences is encouraged (similar to https://sysartifacts.github.io), such that new generations of artifact evaluators can build on the knowledge base of previous research artifact sessions.
Option 1 - use the Demo session at the conference: If possible, artifact evaluators postpone the decision on the badge type until after the successful demonstration of the artifact at the conference (remark: this will violate the double-blind policy of artifact evaluation that we recommend).
Option 2 - use publicly available testbeds or emulators (paper-specific): Let the artifact evaluators remotely connect to the hardware setup to assess the artifact, either via a publicly available testbed (e.g., https://powderwireless.net/ or https://www.cosmos-lab.org/) or via a hardware emulator.
Option 3 - ship the hardware to artifact evaluators (not recommended): The authors of the artifact ship the hardware setup to the artifact evaluators, together with instructions on how to use it (remark: this will violate the double-blind policy of artifact evaluation that we recommend; it is also unclear who would pay for the cost of shipment, taxes, insurance, etc.).
ACM Artifact Review and Badging explanation: https://www.acm.org/publications/policies/artifact-review-and-badging-current
cTuning Foundation Artifact Evaluation Information: https://ctuning.org/ae/
Software Systems Conferences Artifacts Guideline: https://sysartifacts.github.io/chair-guide.html
Thoughts about Artifact Badging: https://eng.ox.ac.uk/media/5209/zilberman2020thoughts.pdf
SIGCOMM 2021 guidelines for evaluators: https://docs.google.com/document/d/15nju5WsnLOEIupk24skdGO5qYNNGXJ0fr0Z_azE_TgE/
Wei Gao, University of Pittsburgh
Inseok Hwang, POSTECH, South Korea
Przemysław Pawełczak (Committee Chair), TU Delft, Netherlands
Nirupam Roy, University of Maryland, College Park, USA
We would like to thank the following individuals for their valuable input in preparing this document (listed in alphabetical order).
Jonathan Bell, Northeastern University (Artifact co-chair of PLDI 2020)
Eva Darulova, Uppsala University (Artifact co-chair of ASPLOS 2022)
Eric Eide, University of Utah (Artifact co-chair of OSDI 2020)
Aaron Gember-Jacobson, Colgate University (Artifact co-chair of SIGCOMM 2021)
Anjo Vahldiek-Oberwagner, Intel Labs (Artifact co-chair of OSDI 2020, EuroSys 2022, SuperComputing 2021, and USENIX Security 2023)
Chengyu Zhang, ETH Zurich (Artifact co-chair of OSDI 2022 and ATC 2022)