This document presents a set of guidelines for evaluating paper artifacts at conferences sponsored by ACM SIGMOBILE.
This document was prepared by the Research Artifact Evaluation Advisory Committee.
Encourage submissions: Conference websites should state that submitting an artifact for each paper is highly encouraged.
Award the hard work: Conferences should strive to create an annual “Best Artifact” award and a “Best Artifact Evaluator” award (both should be proposed by the artifact evaluation committee chairs to the main conference chairs); industry sponsorship of both awards is encouraged.
Consider artifacts for the best paper awards: The best paper award committee should consider artifacts for papers where they are relevant. The declared criteria for the best paper can include the requirement of at least an “artifact available” badge to be considered for the award; this would encourage the submission of artifacts for all accepted papers.
Give a podium to artifacts at the demo session: Demo sessions at conferences should by definition include artifact demonstration tracks, so that the authors of all accepted artifacts can show their demos and explain to the conference audience how to run them.
Help to make artifacts better, not to reject them: The conference chairs, as well as the artifact evaluation chairs, should make sure that the intention of the artifact evaluation is not to prove that the artifact is wrong or not working, but to help identify potential issues in the hardware and software implementations that may benefit the authors' future research. In other words, artifact evaluation is about inclusiveness, not exclusiveness (as the papers have already passed a very tough paper acceptance process).
Search for diversity: There should be at least two committee chairs from two different institutions on two different continents, preferably with diverse backgrounds (e.g., one man and one woman, each with a different ethnic background).
Give a chance to new people: Each year a new person should be invited for the role of artifact evaluation committee chair (i.e., a person who has never been an artifact co-chair of another SIGMOBILE-sponsored conference before). This way we will grow the group of people involved in the artifact assessment decision process.
Select chairs at the same time as the TPC chairs: The artifact evaluation process must be integral to each SIGMOBILE-sponsored conference, and the TPC chairs should be selected at the same time as, and stay in sync with, the artifact chairs.
Make at least one of the artifact chairs a program committee member: This way one committee chair will have access to every version of the accepted paper (submitted, shepherded, final) and will be able to see all the reviews. As a result, the artifact chairs will not need to bother the TPC chairs with questions about the status of each artifact-evaluated paper and will have instant access to the paper's current status.
Consult with the previous generation of chairs: Chairs of last year’s artifact evaluation committee should be consulted on matters related to organizing the artifact evaluation.
Four evaluators per artifact: Based on our experience we recommend at least four evaluators per artifact. This number helps ensure at least two completed evaluations per submitted artifact (as some artifact reviewers will not complete their promised work).
Two artifacts per evaluator: Based on our experience we recommend limiting the number of artifacts per evaluator to at most two. This helps regulate the evaluator's workload and ensures a reasonable amount of time per artifact.
Experience of evaluators: Artifact evaluation committee members can be selected from the PhD-student level and up.
Diversity of evaluators: Striving for diversity of committee members (gender, ethnicity, continent, etc.), as in the case of artifact evaluation chairs, is strongly encouraged.
Award the effort: To encourage participation in the artifact evaluation process, conferences should create an annual “Best Artifact Evaluator” award with two runner-up awards. The award should be given during the conference banquet.
Incentivize participation: Provide some incentive for students to participate in artifact evaluation. For example, give conference attendance discounts, or certificates, to students who served as artifact evaluators.
Way of selection: Selection of artifact evaluators can be done by nomination or self-nomination via an openly accessible online form. Therein, each candidate should provide a link to their website, an email address, and a GitHub profile (or another profile demonstrating the evaluator's software skills). After all candidates have indicated their availability, the artifact evaluation committee chairs decide which candidates to choose based on criteria of their own choosing. As stated before, we recommend choosing young people (PhD students or early postdocs) who have published at least one paper in the mobile systems domain and can demonstrate the ability to read and run code. The artifact evaluation committee chairs can also nominate the most suitable candidates themselves.
Start selection early: A link to the nomination form should be provided on the conference website well in advance, preferably at the start of the review phase for the main conference track's research papers.
Mentor: Recruit a small group of faculty to serve as mentors to student artifact evaluators during the artifact evaluation process.
Educate your reviewers: Host info sessions for artifact reviewers before the artifact review starts.
Start early: Artifact evaluation should start immediately after the notification of conditionally accepted papers.
Announce early: Artifact evaluation should be officially announced at the previous conference (during a conference banquet).
Make it a discussion between the authors and reviewers: Artifact evaluators can ask the artifact authors questions about the setup of the artifact during the whole duration of the artifact evaluation, so that small problems related to artifact setup, hardware choice, compilation, opening files, etc. do not stop the evaluators from assessing the artifact. In other words, the review process should be a dialog between the artifact authors and the reviewers. This way the review will focus on the content of the artifact, not on installation issues. It also allows the authors to update their artifact during the review and resubmit it for evaluation.
Strive to make the process double-blind: If possible, the names of the artifact evaluators should be anonymous to the authors of the artifact and vice versa. However, we know that this might be very hard (if part of the evaluation process involves a video session with the authors, the artifact evaluator's identity will be immediately known), so we suggest a “best effort” principle: make the review double-blind if possible (and make this the default option), but if someone's identity is exposed, do not reject the artifact for this.
Use HotCRP: We recommend HotCRP as the system where artifact evaluation takes place, since most (if not all) SIGMOBILE conferences will use HotCRP anyway.
Reviewers must access the latest version of the paper: The artifact evaluation committee should be able to see the paper to which the artifact refers, so as to be able to inspect which results are reproducible and which are not. This includes the shepherding plan, such that the version of the paper closest to the camera-ready version is evaluated.
Enable open access evaluation: Artifact chairs can opt in to open access evaluation of submitted artifacts: during such artifact evaluations the authors of artifacts can specify whether they want to open their artifact for evaluation to the outside world, i.e., to people who are not on the artifact evaluation committee. This means that the artifact chairs would have to open a dedicated website through which all information about the evaluated artifacts is shared and which explains how external people can contribute to the artifact evaluation.
Make artifact availability a priority above all other artifact types: Focus on artifact “availability” first: artifacts should be checked for whether they contain all the information, data, and code required for external parties to work with them. Artifact reusability and replicability should be encouraged, but they should be treated as a second order of priority due to the complexity of their evaluation, especially given the limited review time.
Avoid “Replicated” badges: Artifact evaluation time has its limits, and the majority of systems papers will not be replicable in the available time unless the artifact chairs decide on a year-long evaluation (until the next year's conference).
Define expectations: Artifact chairs should instruct the reviewers to read the accompanying paper before delving into the artifact. This reading establishes expectations about the paper's core metric that needs to be assessed/replicated, and about which aspects of the paper can be skipped during the artifact evaluation.
Define replication tolerance: Define what a reasonable tolerance for result replication is, i.e., how far a reproduced result may deviate from the value reported in the paper (see the sketch below).
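As a purely illustrative sketch (the metric names and the 10% threshold below are assumptions, not prescribed values), a tolerance check could be as simple as:

```python
# Hypothetical sketch: compare reproduced results against the paper's reported
# numbers using a relative tolerance. Metric names and the 10% threshold are
# illustrative assumptions; each committee should set its own values.

REPORTED = {"throughput_mbps": 42.0, "median_latency_ms": 8.5}    # from the paper
REPRODUCED = {"throughput_mbps": 39.7, "median_latency_ms": 9.1}  # from the artifact run
TOLERANCE = 0.10  # 10% relative deviation allowed

def within_tolerance(reported: float, reproduced: float, tol: float) -> bool:
    """Return True if the reproduced value deviates from the reported one by at most tol."""
    return abs(reproduced - reported) <= tol * abs(reported)

for metric, reported_value in REPORTED.items():
    ok = within_tolerance(reported_value, REPRODUCED[metric], TOLERANCE)
    print(f"{metric}: reported={reported_value}, reproduced={REPRODUCED[metric]}, "
          f"{'OK' if ok else 'OUTSIDE TOLERANCE'}")
```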
Be inclusive: Ideally, all papers should have artifacts available. Since this is hard to achieve, we suggest that chairs aim for 70-80% of papers having artifacts available.
Prepare installation guideline: When submitting your paper for artifact evaluation, please prepare, together with your paper, a short appendix on how to install and run your artifact, including a list of hardware and software requirements.
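As one hedged illustration of how the software requirements can be made easy to verify (the package names and versions below are placeholders, not a required format), an author could ship a small check script next to the appendix:

```python
# Hypothetical helper an author might ship next to the installation appendix:
# it checks that the software requirements listed in the appendix are installed.
# Package names and minimum versions are placeholders for illustration.
import importlib.metadata
import sys

REQUIREMENTS = {"numpy": "1.21", "scipy": "1.7"}  # placeholder requirements

def check() -> int:
    missing = []
    for package, minimum in REQUIREMENTS.items():
        try:
            installed = importlib.metadata.version(package)
            print(f"{package}: found {installed} (need >= {minimum})")
        except importlib.metadata.PackageNotFoundError:
            missing.append(package)
            print(f"{package}: NOT FOUND (need >= {minimum})")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(check())
```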
Include a minimum working example as part of your artifact: The artifact should contain a minimal working example demonstrating that the artifact works, so that the artifact evaluators know that the basic functionality of the artifact is easily reproducible.
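A minimal sketch of what such an example might look like is given below; the function, input, and expected value are illustrative assumptions, not part of any specific artifact:

```python
# Hypothetical "minimal working example" an artifact could include: it runs one
# core routine of the artifact on a tiny bundled input and prints the expected
# output, so evaluators can confirm basic functionality in seconds.

def core_routine(samples):
    """Stand-in for one core function of the artifact (here: a simple average)."""
    return sum(samples) / len(samples)

def main():
    samples = [1.0, 2.0, 3.0, 4.0]   # tiny bundled input, no downloads needed
    result = core_routine(samples)
    expected = 2.5                   # value the evaluator should see
    print(f"result={result}, expected={expected}")
    assert abs(result - expected) < 1e-9, "minimal example failed"
    print("Minimal working example passed.")

if __name__ == "__main__":
    main()
```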
Specify time to run your artifact: The artifact should specify how much time it takes to run each of its components and what the hardware requirements are to run it.
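As a purely illustrative sketch of how such per-component runtimes could be measured (the component names and sleep-based stand-ins are assumptions, not a required format):

```python
# Hypothetical sketch of how an author might measure and report per-component
# runtimes for the artifact appendix. Component names are placeholders.
import time

def run_data_preprocessing():
    time.sleep(0.1)  # stand-in for the real preprocessing step

def run_main_experiment():
    time.sleep(0.2)  # stand-in for the real experiment

COMPONENTS = {
    "data preprocessing": run_data_preprocessing,
    "main experiment": run_main_experiment,
}

for name, component in COMPONENTS.items():
    start = time.perf_counter()
    component()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.1f} s")
```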
Share your code: Source code of the artifact can be shared through openly accessible repositories such as http://github.com or http://gitlab.com, or through dedicated links provided directly to the artifact evaluators.
Add appendix to the camera-ready paper if possible: Artifact evaluation committee chairs can request that the authors provide an appendix in their accepted paper on how to use the prepared artifact. An example of such a research artifact appendix can be found here: https://github.com/ctuning/ck-artifact-evaluation/blob/master/wfe/artifact-evaluation/templates/ae.tex The process must be coordinated with the conference publication chairs. Moreover, in the case of an artifact evaluation after the main conference acceptance, i.e., during the artifact evaluation session at the conference, the camera-ready version with the artifact description appendix can be submitted to ACM after the end of the conference.
Create a single website aggregating all accepted artifacts for SIGMOBILE-sponsored conferences: If possible, a common website aggregating information about the research artifacts from all previous SIGMOBILE-sponsored conferences is encouraged (similar to https://sysartifacts.github.io), such that new generations of artifact evaluators can build on the knowledge base of previous research artifact sessions.
Option 1 - use the Demo session at the conference: If possible, artifact evaluators postpone the decision on the badge type until after the successful demonstration of the artifact at the conference (remark: this will violate the double-blind policy of artifact evaluation that we recommend).
Option 2 - use publicly available testbeds or emulators (paper-specific): Let the artifact evaluators remotely connect to the hardware setup to assess the artifact, either via a publicly available testbed (e.g., https://powderwireless.net/ or https://www.cosmos-lab.org/) or via a hardware emulator.
Option 3 - ship the hardware to artifact evaluators (not recommended): The authors of the artifact ship the hardware setup to the artifact evaluators, together with instructions on how to use it (remark: this will violate the double-blind policy of artifact evaluation that we recommend; it is also unclear who would pay for the cost of shipment, taxes, insurance, etc.).
ACM Artifact Review and Badging explanation: https://www.acm.org/publications/policies/artifact-review-and-badging-current
cTuning Foundation Artifact Evaluation Information: https://ctuning.org/ae/
Software Systems Conferences Artifacts Guideline: https://sysartifacts.github.io/chair-guide.html
Thoughts about Artifact Badging: https://eng.ox.ac.uk/media/5209/zilberman2020thoughts.pdf
SIGCOMM 2021 guidelines for evaluators: https://docs.google.com/document/d/15nju5WsnLOEIupk24skdGO5qYNNGXJ0fr0Z_azE_TgE/
Wei Gao, University of Pittsburgh
Inseok Hwang, POSTECH, South Korea
Przemysław Pawełczak (Committee Chair), TU Delft, Netherlands
Nirupam Roy, University of Maryland, College Park, USA
We would like to thank the following individuals for their valuable input in preparing this document (listed in alphabetical order).
Jonathan Bell, Northeastern University (Artifact co-chair of PLDI 2020)
Eva Darulova, Uppsala University (Artifact co-chair of ASPLOS 2022)
Eric Eide, University of Utah (Artifact co-chair of OSDI 2020)
Aaron Gember-Jacobson, Colgate University (Artifact co-chair of SIGCOMM 2021)
Anjo Vahldiek-Oberwagner, Intel Labs (Artifact co-chair of OSDI 2020, EuroSys 2022, SuperComputing 2021, and USENIX Security 2023)
Chengyu Zhang, ETH Zurich (Artifact co-chair of OSDI 2022 and ATC 2022)