Automated Employment Decision Tools
Rule status: Proposed
Comment by date: October 24, 2022
Rule Full Text
The Department of Consumer and Worker Protection is proposing to add rules to implement new legislation related to the use of automated employment decision tools. The proposed rules would clarify the requirements for the use of automated employment decision tools within New York City, the notices to employees and candidates for employment regarding the use of the tool, the bias audit for the tool, and the required published results of the bias audit.
Attendees who need a reasonable accommodation for a disability, such as sign language interpretation, should contact the agency by calling 1 (212) 436-0396 or emailing Rulecomments@dcwp.nyc.gov by October 17, 2022.
Send comments by
- Email: Rulecomments@dcwp.nyc.gov

Public hearing: October 24, 2022, 11:00am - 12:00pm EDT
- To participate via phone, dial (646) 558-8656
- Meeting ID: 874 1701 0175; Password: 448584
Comments are now closed.
Online comments: 20
Fred Oswald, Rice University, workforce.rice.edu
Note that there are a number of different-yet-reasonable ways to calculate adverse impact statistics, other than the impact ratio. The 4/5ths Rule (impact ratio of < .80 as prima facie case for disparate impact) arises from the UGESP FAQ, but does not preclude these important alternatives. The 4/5ths Rule itself is asymmetric; that is, the 4/5ths Rule for protected-group selection operates differently than a 5/4ths Rule for protected-group rejection, even though conceptually, it is the same. Furthermore, the 4/5ths Rule is easier to violate at lower selection rates (e.g., 1.5% vs. 2% selection violates the 4/5ths Rule, even though this is a .5% selection rate difference) than at higher ones (e.g., 76% vs. 95% does not violate the 4/5ths rule, yet is a 19% selection rate difference).
For alternative adverse impact indices that are possible, see the Free Adverse Impact Resource (FAIR) at https://orgtools.shinyapps.io/FAIR/, and a related book chapter:
Oswald, F. L., Dunleavy, E., & Shaw, A. (2017). Measuring practical significance in adverse impact analysis. In S. B. Morris & E. M. Dunleavy (Eds.). Adverse impact analysis: Understanding data, statistics and risk (pp. 92-112). Routledge.
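The asymmetry described above is easy to verify numerically. Below is a minimal, illustrative sketch (not taken from the FAIR tool; the function names are invented) that computes impact ratios from raw counts using exact rational arithmetic, reproducing both of the selection-rate comparisons in the comment.

```python
from fractions import Fraction

def impact_ratio(selected, applicants, selected_ref, applicants_ref):
    """Impact ratio from raw counts: the protected group's selection
    rate divided by the reference group's selection rate."""
    rate = Fraction(selected, applicants)
    rate_ref = Fraction(selected_ref, applicants_ref)
    return rate / rate_ref

def violates_four_fifths(ratio):
    """Prima facie flag under the 4/5ths Rule: ratio below 0.80."""
    return ratio < Fraction(4, 5)

# Low selection rates: 1.5% vs 2% (e.g., 15/1000 vs 20/1000).
r_low = impact_ratio(15, 1000, 20, 1000)    # 3/4 = 0.75 -> violates
# High selection rates: 76% vs 95%.
r_high = impact_ratio(76, 100, 95, 100)     # 4/5 = 0.80 -> does not

print(float(r_low), violates_four_fifths(r_low))
print(float(r_high), violates_four_fifths(r_high))
```

A half-point gap at low selection rates trips the rule while a 19-point gap at high rates sits exactly on the 0.80 boundary, which is the scale-dependence the comment describes.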
The proposed rules define independent auditor as “a person or group that is not involved in using or developing an AEDT that is responsible for conducting a bias audit of such AEDT.” In the first example under Section 5-301 (Bias Audit), the proposed rules state “…The employer asks the vendor for a bias audit. The vendor uses historical data it has collected from employers on applicants selected for each category to conduct a bias audit…” Since the majority of AEDT vendors are also the developers of AEDTs, this seems to indicate that the vendor can conduct a bias audit for the employer as long as the person or group employed by the vendor is not the same person or group that developed the AEDT.
Similarly, the second example states “…The employer uses historical data on average “culture fit” score of applicants for each category to conduct a bias audit…” This also seems to indicate that the employer can conduct their own bias audit, as long as the person or group conducting the audit is not using the AEDT.
Put another way, the proposed rules do not require that the independent auditor be a separate legal entity from either the vendor or the employer. If that is the intent, it would be beneficial to clarify within the final rules.
The expanded definition of machine learning, statistical modelling, data analytics, or artificial intelligence in the proposed rules is helpful. However, it could still be interpreted to include a broad range of employment decision tools that fall outside the conventional definitions of machine learning and/or artificial intelligence (ML/AI). If the intent of LL 144 and the proposed rules is to regulate tools that do fall under the conventional definitions of ML/AI, I would recommend the following:
(1) Remove the terms “data analytics” and “statistical modelling”
(2) In part (ii) of the expanded definition, revise to read “for which a computer actively identifies the inputs….”
(3) Provide additional examples of what does not fall under the intended definition of an AEDT.
Hannah Wade, NYU Langone Health
NYU Langone Health Comments on Proposed Rule Amendment, Subchapter T: Automated Employment Decision Tools
RULE TITLE: Requirement for Use of Automated Employment Decisionmaking Tools
REFERENCE NUMBER: 2022 RG 061
RULEMAKING AGENCY: Department of Consumer and Worker Protection
On behalf of NYU Langone Health, please accept our comments on the proposed rules to implement Local Law 144 of 2021 related to the use of automated employment decision tools, or AEDTs. We appreciate the Department of Consumer and Worker Protection (DCWP) for the opportunity to comment.
As the healthcare system strives to recover from the COVID-19 pandemic, New York City hospitals are facing significant workforce challenges. The DCWP should consider providing an exemption for the healthcare field due to ongoing public health crises. These crises, including recovery from the COVID-19 pandemic, the monkeypox outbreak, and the recent influx of asylum seekers, have put significant stress on all New York City hospitals. We are deeply troubled by any additional measures that prevent us from fulfilling our mission to provide safe, quality care for our patients.
At NYU Langone Health, we are opposed to any additional barriers to fill urgently needed positions including nursing, allied health, clinical support and other support services. In particular, we have concerns about the potential hiring delays presented by the requirement (Section 5-303) to provide notice to candidates and employees 10 business days prior to the use of an automated employment decision tool, or AEDT. This requirement presents an unnecessary waiting period that will prolong staffing shortages and negatively impact patients in New York City.
During our Fiscal Year 2022, we received 368,536 applications for 12,796 posted positions, a volume that requires the use of data analytics to process effectively. Delays of 10 business days in processing time would pose an undue hardship on our healthcare system as we work to recruit and employ talent to best serve our patients.
Once again, thank you for the opportunity to comment. Please reach out to us with any questions or for additional information.
§ 5-303 of the proposed rules (Notice to Candidates and Employees) states: “(a) For the purpose of compliance with § 20-871(b)(1) and (2) of the Code, an employer or employment agency may provide notice to a candidate for employment who resides in the city by: (1) Including notice on the careers or jobs section of its website in a clear and conspicuous manner at least 10 business days prior to use of an AEDT”
If an employer were to provide a notice on their general careers/jobs site 10 business days before posting an opening for a specific job where an AEDT would be used, would this be in compliance with LL 144?
§ 5-303 of the proposed rules (Notice to Candidates and Employees) states: “(c) The notice required by § 20-871(b)(1) of the Code must include instructions for how to request an alternative selection process or accommodation.”
It also states:
(e) Nothing in this subchapter requires an employer or employment agency to provide an alternative selection process.
If an employer is not required to provide an alternative, why would they be required to include instructions on how to request one?
Clarification needed around lack of data on protected attributes.
Most vendors do not have existing data on many of the listed protected attributes, such as ethnicity or sex. Employers often choose not to share such data with the vendor, even when they do collect it, due to its sensitive nature. As a vendor trying to satisfy the requirements of the audit (not just for existing customers but also for any future customers the legislation may apply to in the next 12 months), this makes applying the suggested test to the output of any machine learning algorithm quite difficult without actually knowing the protected attributes of the sample population. What would be the suggested approach for vendors who do not currently receive or process data on protected attributes to satisfy the audit requirement?
Thank you for the opportunity to comment on recent guidance for this landmark law. As vendors in the model audit space, we provide tools, technology, and services to employers and HR vendors who offer AEDTs to their clients. From this perspective, alongside our work with other regulating bodies, we hope to share some insight from the front lines of our work with clients, policy-makers, and academia alike.
Setting the Standard
As activists and academics in this space, our team has been working on issues of AI fairness, explainability, accountability, and model accuracy for many years. Our clients come to us because they know we possess deep knowledge of the many ways in which models can go wrong. In the past year, we have performed a number of audits, prior to the release of this guidance, that sought to dig far deeper than the guidance requires. As many in our community have previously stated, the scope of the law and its proposed guidance, while desperately needed, may miss its intended effect of reducing discriminatory algorithmic behavior. Algorithms typically discriminate in a number of ways, from age to disability to race and gender, with many intersectional combinations thereof. Because the guidance as it currently stands is so limited in scope, our prospective clients may be less motivated to pursue thorough examinations of their technology, which will in turn shape the norms of practice in our field. The remainder of our comments seeks to illuminate some of the complexities and conflicts within the practice of algorithmic auditing, in order to display a few of the ways in which the guidance may fail to uncover discrimination where it continues to exist.
Vendors vs. Employers
The proposed guidance rests on a premise that may in many cases be untrue: that modeling its reporting on EEOC guidance will reduce the compliance burden on industry. However, employers are seldom the ones responsible for developing AEDTs; far more frequently, they rely on vendors to provide the AEDT for a hiring process. This creates significant tension between the types of data needed for analysis and the data each entity actually possesses. Questions remain as to whether employers may simply repurpose the highest-level reporting from the AEDT vendors, or whether they must also perform an audit on their own limited candidate supply. One of the many ways AEDTs can go wrong is through generalizability errors: vendor suitability may vary by employer (i.e., a tool that works well in healthcare may perform terribly and/or discriminate when applied to engineers). If the audit takes place at the global AEDT level, the guidance fails to address this kind of error, one of the many ways algorithms can be biased that will not be discovered in the course of a model audit that merely replicates this draft for public reporting. One of the major conflicts between employer and vendor analysis for compliance with this law concerns whether, and which, entity may possess demographic data.
Because the guidance assumes the report will be produced by employers, who may not have access to match scores at the scale their vendor does, many employers have requested copies of this report from their vendors for their own websites. This presents a catch-22 for the vendors, who largely and intentionally limit their collection of demographic candidate data in order to comply with conflicting laws, like the GDPR, that require data minimization and candidate privacy. In the absence of self-reported data at application time, demographic data is difficult (and costly) to obtain. Candidates are often reluctant to provide this data post hoc, as is well known in the financial industry, which is permitted to collect this data only after a credit application is submitted, for compliance with fair lending laws, and which sees very low survey response rates as a result. Even the simple act of retaining this data creates enhanced risk for AEDT vendors, in that its leakage or theft can result in fines or other legal penalties.
Availability of Intersectionality Supply
We applaud the guidance for seeking to apply concepts of intersectionality in dividing the report into combinations of race and gender. However, requiring that bias be investigated across multiple co-occurring protected categories splits each of the broader categories into smaller ones, sharply reducing the sample available for each subgroup in the audit. These rarer intersectional identities often bear the majority of bias and prejudice throughout their careers and experience a great deal of gatekeeping. In our own investigations, we have already encountered intersectional slices for minority categories in which the amount of data is so limited that the results are misleading, in that they do not represent statistically significant segments. Employers and AEDT vendors in this situation face a significant problem: releasing the numbers for these categories may imply discrimination when the evidence of discrimination is simply not there. In some cases, there may even be categories for which no data exists. Here we often see vendors turn to synthetic data or globally representative datasets, which may not be sufficiently connected to the candidates actually subjected to the AEDT. The science on this approach is fairly limited, yielding great uncertainty in the audit community and paving the way for insufficient audits that fail to represent reality.
Some good practices can conflict with each other. For example, data minimization and anonymization can protect our privacy, but also hide the very information we need to conduct an audit. Some industries (like the financial sector) use gender and race inference, or synthetic/global datasets to avoid the privacy issue, but this adds a thick layer of uncertainty to the audit that could cause some audits to under or overestimate bias, which we’ll elaborate on later.
Confounding Factors (i.e. Human Selection Bias)
In defining “ground truth” it may be tempting to use some signal of approval (e.g., a candidate was hired, or a candidate was moved forward in the hiring process). However, these signals are human in nature, and therefore carry their own potential for discriminatory behavior (or the lack thereof) to obscure the behavior of the algorithm itself. An exceedingly DEI-conscious hiring manager’s decision-making may cancel out a highly discriminatory algorithm, and of course the inverse is also true. A deeper examination of the tool’s training data, predictions, methods, structure, and behavior at scale, in the context of the system’s UX, can indeed illuminate this bias, but today’s guidance requires none of this type of reporting, and will therefore miss opportunities for improvement.
Mathification of Subjectivity
When assessing algorithmic discrimination, it is vital to have a definition of “ground truth”. In the case of hiring, this notion is quite subjective, where the definition of “good candidate to job fit” can differ from organization to organization, and even among the hiring managers within that organization. This makes the challenge of a model audit an inconsistent one, where these definitions will vary significantly by audit vendor. In short, it is entirely possible to “game the system”, allowing vendors to provide audits that reflect a lack of bias where bias truly exists. The guidance in its current form does make way for one method to avoid assessing the human factor, by allowing for analysis of adverse impact by match score alone. However, later on in these comments we will detail just a few scenarios in which this simplified reporting may miss many forms of bias that remain, despite “passing” metrics. In order to assess algorithmic discrimination, a combination of quantitative and qualitative analysis is required, in order to contextualize and fully situate the impact to candidates amid the totality of the system. Candidate positioning, ranking, and display qualities matter a great deal to a candidate’s likelihood of receiving an offer. In addition, there are many standardization practices that AEDT vendors can undertake to limit discrimination that can only be uncovered through an assessment of their risk and control practices. By neglecting the qualitative elements of the field of algorithmic impact, the city paves the way for these reports to be misleading, and ultimately to fail to reflect real-world discrimination where it exists.
As we’ve previously stated, employers may possess demographic data for their hired candidates, but the vendors who provide this technology often make an active effort not to collect this vital information. As a result, these AEDT vendors often turn to methods like BISG to infer race and gender characteristics. BISG, the most prevalent of these methods, was developed in healthcare research and has been employed at great scale within the financial sector. However, beyond concerns about accuracy, the methods themselves pose structural inequity. Race is a subjective attribute, and one which many have argued can never truly be inferred. These methods also only allow for analysis on a gender binary, obscuring discrimination that may occur against others along the gender spectrum. An unintended consequence of this guidance may be the proliferation of these techniques, which have received deep scrutiny and criticism for their lack of inclusivity and propensity for error. In fact, these error rates may in many cases be high enough to further obscure discrimination or the lack thereof. If a set of candidates is improperly associated with the incorrect protected group, the resulting accuracy may be low enough to make the report incorrect, and therefore misleading. Additionally, common inference methods like BISG can only be effective in regions where we can assume that redlining, white flight, and gentrification have homogenized the racial makeup of the area. This seems broadly inadequate for a city as diverse as New York, where there may be just as many Black John Smiths in a given zip code as there are white John Smiths. In our field, the broad consensus is that the only proper way to use demographic data in analysis is when it is volunteered by the candidates themselves. We recommend to our AEDT vendor clients that they engage in post-hoc surveys, despite our expectation that response rates will be low, because it will yield the greatest accuracy.
These surveys take time, however, and in many cases the clients who have only begun this analysis in the second half of 2022 will not have adequate time to complete this initiative sufficiently prior to the release of their public reports.
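For readers unfamiliar with how proxy methods like BISG work, the core idea is a Bayes-rule combination of surname-based and geography-based race probabilities, under the assumption that surname and location are independent given race. The sketch below is purely illustrative: the probability tables are invented, whereas the real method draws on Census surname lists and block-group demographics.

```python
# Illustrative BISG-style inference (hypothetical numbers, not Census data).
# P(race | surname, geo) ∝ P(race | surname) * P(geo | race), and by
# Bayes' rule P(geo | race) ∝ P(race | geo) / P(race), giving:
# P(race | surname, geo) ∝ P(race | surname) * P(race | geo) / P(race).

# Hypothetical priors for one surname and one ZIP code:
p_race_given_surname = {"black": 0.30, "white": 0.55, "other": 0.15}
p_race_given_zip     = {"black": 0.40, "white": 0.45, "other": 0.15}
p_race_overall       = {"black": 0.13, "white": 0.60, "other": 0.27}

def bisg_posterior(p_surname, p_zip, p_base):
    """Combine surname and geography evidence, then normalize."""
    unnormalized = {
        race: p_surname[race] * p_zip[race] / p_base[race]
        for race in p_surname
    }
    total = sum(unnormalized.values())
    return {race: v / total for race, v in unnormalized.items()}

posterior = bisg_posterior(p_race_given_surname, p_race_given_zip, p_race_overall)
print(posterior)
```

Note how the method depends on the geography term being informative: in a ZIP code whose demographics match the citywide base rates, the geography factor contributes almost nothing, which is precisely the commenter's concern for a city as mixed as New York.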
AEDT Snake Oil
At Parity, we are lucky enough to have the principle, and the luxury, of refusing to perform audits for technology that fails to deliver the matching capabilities it represents to employers. We have been approached by AEDT vendors seeking to provide technology that applies pseudo-scientific principles to the concept of job fit, and these vendors may be able to display reports that make them seem somehow fairer or more accurate than tools that avoid these forms of algorithmic phrenology. Stated another way, a tool that always guesses a candidate’s fit for the job incorrectly, or that approves or denies every group equally along vectors of race and gender, might appear fair. However, this fairness is only trivial: an algorithm that denies everyone for every job may be fair, but it is also not useful. These pseudo-scientific algorithms present far greater danger for candidates with disabilities, but the law in its current form will fail to capture this discrimination entirely.
Inaccurate Reporting of Discrimination/Lack of Discrimination
Finally, that this guidance is so high-level allows for many opportunities to obscure discrimination, or to report discrimination where it does not exist, leading to undeserved risk and scrutiny to the vendors/employers. There are many scenarios under which discrimination may be obscured, but to name a few:
1) Imagine an algorithm that performs quite fairly on nursing jobs, but discriminates to a great degree in engineering. If these numbers are high enough in some categories to cancel out the low scores from another category, then the reports will appear to “pass” the 4/5ths rule, but discrimination will remain.
2) Imagine a situation where an employer receives a set of applicants for a position wherein all of the female candidates for a position are, in fact, highly qualified for the job, and all of the male candidates are similarly unqualified. When one set of match scores are high and the other set is low, it can appear that adverse impact exists in favor of women against men, when the resulting metric may fail to represent a lack of discriminatory bias in the tool itself, but instead a feature of the applicant base. This is a direct result of the subjectivity of quantifying “job fit” in mathematical terms.
3) It may be the case that, due to systemic inequity and historical adversity, some intersectional slices of demographic pools may receive lower match scores than others. This may not be the result of a discriminatory tool, but instead a feature of the applicant population as it exists today. Correcting for bias under these circumstances is recommended by academia, but itself may pose a form of “disparate treatment” by virtue of adding weight or altering thresholds to cater to one disadvantaged group over the other.
4) Due to the lack of demographic data, some categories may have insufficient amounts of representation to be accurately quantified, leading to numbers that skew inaccurately in a way that would not reflect discrimination that exists at scale.
5) When demographic inference is employed, the error rates may be so high as to make the resulting metrics adequately inaccurate such that they will not reflect reality, be that in the form of discrimination or a lack thereof.
6) Imagine a tool that is simply of very poor quality. This tool may be trivially fair, in that it approves or denies all candidates equally because it simply does not work. Employers choosing vendors may be misled into thinking that the tool is worthy of use by virtue of this needed reporting, when in fact it simply “stabs in the dark” at job fit, and may present cases of individual discrimination, especially with regard to demographic categories not represented by the guidance as it stands today.
These may seem like toy examples, but from our work with clients and in research, we find them to be fairly common. Today’s guidance may miss these situations, but the field of algorithmic interrogation has provided myriad tools and methods to uncover these scenarios, and we would encourage the city to pursue guidance that is more closely in line with the latest our field has to offer.
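Scenario 1 above is an instance of aggregation masking (a Simpson's-paradox effect) and can be demonstrated with hypothetical counts; the numbers below are invented for illustration only.

```python
# Hypothetical (selected, applicants) counts per job family, showing how
# pooling can mask discrimination that exists within one family.

def rate(selected, applicants):
    """Selection rate for a group."""
    return selected / applicants

# Nursing: group B is actually selected slightly more often.
nursing     = {"group_a": (85, 100), "group_b": (95, 100)}
# Engineering: the tool strongly favors group A.
engineering = {"group_a": (50, 100), "group_b": (20, 100)}

# Within engineering, the impact ratio clearly fails the 4/5ths rule:
eng_ratio = rate(*engineering["group_b"]) / rate(*engineering["group_a"])
print(f"engineering impact ratio: {eng_ratio:.2f}")   # 0.40

# Pooled across both job families, the failure disappears:
a_sel = nursing["group_a"][0] + engineering["group_a"][0]   # 135 of 200
b_sel = nursing["group_b"][0] + engineering["group_b"][0]   # 115 of 200
pooled_ratio = rate(b_sel, 200) / rate(a_sel, 200)
print(f"pooled impact ratio: {pooled_ratio:.2f}")     # 0.85 -> "passes"
```

A global, vendor-level audit sees only the pooled figure; a per-employer (or per-job-family) analysis is what surfaces the engineering disparity.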
Conclusion, Recommendations, Next Steps
We’d like to reiterate our gratitude for the opportunity to provide comment. Significant questions remain on the scope of the law that would preferably be answered in advance of the compliance deadline:
1) Is a broad, global vendor analysis sufficient to each employer who uses the AEDT? Or should each employer tailor the report to the candidates they’ve assessed?
2) Will race/gender inference suffice for the analysis despite its possibility for decreased accuracy, and if not, what methods do you recommend when demographic information is unavailable?
3) When intersectional slices for rarer combinations of categories are present, and the amounts may not be statistically significant themselves, what sort of reporting does the city recommend?
4) Would universal or synthetic datasets suffice for the analysis, even if these datasets may not be representative of the candidates truly subjected to the system’s decisions?
5) Are other forms of screening models (e.g., “culture fit,” “engagement probability,” “geographic closeness,” etc.) within the scope of the law? Or is the scope limited to assessments of job-to-candidate fit?
6) Will the city consider some form of vendor certification moving forward in order to limit the ability for tools and employers to game the system by choosing unscrupulous providers?
7) Our field of algorithmic scrutiny is rapidly advancing. Will the guidance make room for advancements not currently included, and continue to evolve with the pace of science?
8) Will the city consider extending the compliance deadline in order to provide more time to employers and vendors to begin the arduous practice of collecting demographic information from candidates?
We would be happy to further engage with DCWP in order to clarify these questions or to improve guidance for this upcoming or future years, and look forward to your feedback.
Thank you,
The Parity Team
Please find attached my public comments on DCWP’s Proposed Rules on Automated Employment Decision Tools (Local Law 144).
Founder, AIethicist.org, and Lighthouse Career Consulting LLC
Holistic AI Team
Thank you for the opportunity to provide questions on this important matter.
We have requests for points of clarification regarding the Proposed Rules and their implementation. These are:
1. Intersectionality when conducting bias audits
Although neither the Law nor the Proposed Rules explicitly mandate calculating the ‘selection rates’ and ‘impact ratios’ for different protected categories in an intersectional manner, the Bias Audit examples and corresponding illustrative data provided by the DCWP in the Proposed Rules provide calculations on an intersectional basis.
In the example provided in the last update, ‘selection rate’ and ‘impact ratio’ figures are provided for males and females, broken down by their race/ethnicity (e.g., Hispanic or Latino, White, Black or African American etc.).
It would be useful for the DCWP to clarify its position on whether the calculations of ‘impact ratios’ and ‘selection rates’ should be performed in an intersectional manner. Is this mandatory, or is it being encouraged as best practice?
Furthermore, the DCWP should be mindful of issues relating to small sample sizes, if an intersectional approach is taken.
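To make concrete what an intersectional calculation involves, the sketch below (categories and counts are invented) computes selection rates per race-by-sex subgroup and impact ratios relative to the most-selected subgroup, flagging the small-sample cells that motivate the caution above.

```python
# Hypothetical (applicants, selected) counts per intersectional category.
counts = {
    ("Male",   "White"):                     (200, 60),
    ("Male",   "Black or African American"): (120, 30),
    ("Female", "White"):                     (180, 45),
    ("Female", "Black or African American"): ( 10,  2),  # tiny sample
}

# Selection rate per category, and impact ratio vs. the highest rate
# (the convention used in the DCWP examples).
rates = {cat: sel / n for cat, (n, sel) in counts.items()}
top = max(rates.values())
impact_ratios = {cat: r / top for cat, r in rates.items()}

for cat, ratio in impact_ratios.items():
    n = counts[cat][0]
    flag = "  <- small sample, interpret with caution" if n < 30 else ""
    print(f"{cat}: rate={rates[cat]:.3f}, impact_ratio={ratio:.2f}{flag}")
```

In this example the Female/Black or African American cell falls below 0.80, but with only 10 applicants the figure is dominated by noise; a single additional selection would change it substantially.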
2. Additional Metrics and Approaches
Impact ratios can be problematic if sample sizes are small; other metrics, like the two standard deviation rule, could be more suitable. The DCWP could clarify its position regarding the use of additional metrics to calculate bias.
Furthermore, there are potential issues regarding a lack of consideration of the distribution of scores. For example, in the presence of outliers or if the score distribution is bimodal, the mean score will not be informative. However, looking at the score distribution may provide better insights into how the tool performs for different sub-groups.
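The two standard deviation rule mentioned above is typically implemented as a Z-test on the difference between two selection rates using the pooled proportion; a minimal standard-library sketch (the counts are invented) shows why it responds to sample size in a way the impact ratio does not.

```python
import math

def two_sd_test(sel_a, n_a, sel_b, n_b):
    """Z-statistic for the difference between two selection rates,
    using the pooled proportion. |z| > 2 is the conventional
    'two standard deviations' flag for adverse impact."""
    p_a, p_b = sel_a / n_a, sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Large samples: a 60% vs 54% gap is flagged (|z| > 2).
print(two_sd_test(600, 1000, 540, 1000))

# Tiny samples: a 60% vs 50% gap is not statistically notable.
print(two_sd_test(6, 10, 5, 10))
```

The same underlying rate difference can clear or fail the threshold purely as a function of n, which is why small intersectional cells are better handled by a significance test than by a raw impact ratio.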
Full comment and questions attached. Please contact firstname.lastname@example.org for any further information or follow-up on this submission.
Dear Chair and members of the Department:
My name is Julia Stoyanovich. I hold a Ph.D. in Computer Science from Columbia University in the City of New York. I am an Associate Professor of Computer Science and Engineering at the Tandon School of Engineering, an Associate Professor of Data Science at the Center for Data Science, and the founding Director of the Center for Responsible AI at New York University. In my research and public engagement activities, I focus on incorporating legal requirements and ethical norms, including fairness, accountability, transparency, and data protection, into data-driven algorithmic decision making. I teach responsible data science courses to graduate and undergraduate students at NYU. Most importantly, I am a devoted and proud New Yorker.
I would like to commend New York City on taking on the ambitious task of overseeing the use of automated decision systems in hiring. I see Local Law 144 as an incredible opportunity for the City to lead by example, but only if this law is enacted in a way that is responsive to the needs of all key stakeholders. The conversation thus far has been dominated by the voices of commercial entities, especially by AEDT vendors and organizations that represent them, but also employers who use AEDT, and commercial entities wishing to conduct AEDT audits. However, as is evident from the fact that we are testifying in front of the Department of Consumer and Worker Protection, the main stakeholder group Local Law 144 aims to protect – from unlawful discrimination, and arbitrary and capricious decision-making – are job candidates and employees. And yet, their voices and the voices of their representatives are conspicuously missing from the conversation!
As an academic and an individual with no commercial interests in AEDT development, use, or auditing, I am making my best effort to speak today to represent the interests of the job candidates, employees, and the broader public. However, I cannot speak on behalf of this diverse group alone. Therefore, my main recommendation today is that New York City must ensure active participation of a diverse group of job seekers, employees and their representatives in both rule making and enactment of Local Law 144.
For background: I actively participated in the deliberations leading up to the adoption of Local Law 144 of 2021 and have carried out several public engagement activities around this law when it was proposed. Informed by my research and by the opinions of members of the public, I have written extensively on the auditing and disclosure requirements of this Law, including an opinion article in the New York Times and an article in the Wall Street Journal. I have also been teaching members of the public about the impacts of AI and about its use in hiring, most recently by offering a free in-person course at the Queens Public Library called “We are AI”. (Course materials are available online.) Based on my background and experience, I would like to make four recommendations regarding the enforcement of Local Law 144.
Please see my complete testimony and detailed recommendations in the attached document.
Please find attached the public comments from the BABL AI team on DCWP’s proposed rules for Local Law 144.
Shea Brown
Founder & CEO, BABL AI Inc.
The New York Civil Liberties Union (“NYCLU”) respectfully submits the following testimony regarding the proposed rules to implement Local Law 144 of 2021. The NYCLU, the New York affiliate of the American Civil Liberties Union, is a not-for-profit, non-partisan organization with eight offices throughout the state and more than 180,000 members and supporters. The NYCLU’s mission is to defend and promote the fundamental principles, rights, and values embodied in the Bill of Rights, the U.S. Constitution, and the Constitution of the State of New York. The NYCLU works to expand the right to privacy, increase the control individuals have over their personal information, and ensure civil liberties are enhanced rather than compromised by technological innovation.
The New York City Council enacted Local Law 144 of 2021 (“LL 144”) which laudably attempts to tackle bias in automated employment decision tools (“AEDT”). AEDT, similar to automated decision systems in other areas, are in urgent need of transparency, oversight, and regulation. These technologies all too often replicate and amplify bias, discrimination, and harm towards populations who have been and continue to be disproportionately impacted by bias and discrimination: women, Black, Indigenous, and all people of color, religious and ethnic minorities, LGBTQIA people, people living in poverty, people with disabilities, people who are or have been incarcerated, and other marginalized communities. And the use of AEDT is often accompanied by an acute power imbalance between those deploying these systems and those affected by them, particularly given that AEDT operate without transparency or even the most basic legal protections.
Unfortunately, LL 144 falls far short of providing comprehensive protections for candidates and workers. Worse, the rules proposed by the New York City Department of Consumer and Worker Protection (“DCWP” or “Department”) would stymie the law’s mandate and intent further by limiting its scope and effect.
The DCWP must strengthen the proposed rules to ensure broad coverage of AEDT, expand the bias audit requirements, and provide transparency and meaningful notice to affected people in order to ensure that AEDT do not operate to digitally circumvent New York City’s laws against discrimination. Candidates and workers should not need to worry about being screened by a discriminatory algorithm.
Re: § 5-300. Definitions
LL 144 defines AEDT as tools that “substantially assist” in decision making. The proposed rules by DCWP further narrow this definition to “one of a set of criteria where the output is weighted more than any other criterion in the set.” This definition departs from the law’s intent and meaning, risking coverage of only certain scenarios and a subset of AEDT. In the most absurd case, an employer could deploy two different AEDT, weighted equally, and neither would be subject to this regulation. More problematically, an employer could employ an AEDT in a substantial way that does not meet this threshold but still has a significant impact on candidates or workers. The Department should revise this definition to be consistent with the statute.
The proposed definition of “simplified output” would exclude “output from analytical tools that translate or transcribe existing text, e.g., convert a resume from a PDF or transcribe a video or audio interview.” However, existing transcription tools are known to have racial bias, and their outputs could very well be used as inputs to other AEDT systems, resulting in biased results.
Re: § 5-301 Bias Audit
The definition of the bias audit in LL 144, § 20-870, explicitly lists disparate impact calculation as a component but not the sole component (“include but not be limited to”). The examples given in section 5-301 of the proposed rules do not account for an AEDT’s impact on age and disability, or other forms of discrimination.
At a minimum, in addition to an evaluation of disparate impact of the AEDT, any evaluation that could properly qualify as a bias audit would need to include an assessment of:
• the risks of discriminatory outcomes that an employer should be aware of and control for with the specific AEDT, including risks that may arise in the implementation and use of the AEDT;
• the sources of any training/modeling data, and the steps taken to ensure that the training data and samples are accurate and representative in light of the position’s candidate pool;
• the attributes on which the AEDT relies and whether it engages in disparate treatment by relying on any protected attribute or any proxy for a protected attribute;
• what less discriminatory alternative inputs were considered and which were adopted;
• the essential functions for each position for which the AEDT will be used to evaluate candidates, whether the traits or characteristics that the AEDT measures are necessary for the essential functions, and whether the methods used by the AEDT are a scientifically valid means of measuring people’s ability to perform essential job functions.
Similar essential components are outlined in the federal EEOC guidance, which recommends including “information about which traits or characteristics the tool is designed to measure, the methods by which those traits or characteristics are to be measured, and the disabilities, if any, that might potentially lower the assessment results or cause screen out.”
The bias audit should clearly state the origin of the data used for the statistics reported. This includes where the data was gathered from, by whom, when, and how it was processed. It should also provide justification for why the source of the data for the bias audit model population is believed to be relevant to this specific deployment of the AEDT.
The proposed rules for the ratio calculations also make no mention of appropriate cutoffs when a specific candidate category (per EEO-1 Component 1) has a small or absent membership that could result in unrepresentative statistics.
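The instability the comment warns about is easy to demonstrate. The sketch below uses invented counts (not from the proposed rules or any real audit) to show how an impact ratio computed on a category with only a handful of members swings from a 4/5ths flag to no flag on a single additional selection:

```python
# Hypothetical illustration with invented counts: impact ratios computed
# on very small category memberships are unstable.
def impact_ratio(sel_a, n_a, sel_b, n_b):
    """Ratio of group A's selection rate to group B's (the reference group)."""
    rate_a = sel_a / n_a
    rate_b = sel_b / n_b
    return rate_a / rate_b

# Reference group: 50 of 100 selected (50% selection rate).
# A category with only 3 members flips across the 0.80 threshold
# depending on a single selection decision:
print(impact_ratio(1, 3, 50, 100))  # ≈ 0.67, flagged under the 4/5ths rule
print(impact_ratio(2, 3, 50, 100))  # ≈ 1.33, no flag
```

A minimum cell-size requirement (or a confidence interval around the ratio) would prevent a published result from reporting such noise as a finding.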
LL 144 mandates that any “tool has been the subject of a bias audit conducted no more than one year prior to the use of such tool.” The DCWP’s rules should clarify that this requires annual auditing for continued AEDT use and provide an example similar to the rate calculation scenarios.
Re: § 5-302 Published Results
The disclosure of the bias audit on employers’ and employment agencies’ websites should not be limited to the selection rates and impact ratios described in §5-301. It should include all the elements mentioned in our comments on the bias audit. The summary should describe the AEDT appropriately and include information on traits the tool is intended to assess, the methods used for this, the source and types of data collected on the candidate or employee, and any other variables and factors that impact the output of the AEDT. It should state whether any disabilities may impact the output of the AEDT.
Additionally, the published results should list the vendor of the AEDT, the specific version(s) of product(s) used, and the independent auditor that conducted the bias audit. The DCWP should provide examples that include such information.
The “distribution date” indicated in the proposed rules for the published results should also describe which particular part of the employment or promotion process the AEDT is used for on this date. It is insufficient to note “an AEDT with the bias audit described above will be deployed on 2023-05-21” unless there are already clear, public indicators that describe which specific employment or promotion decision-making process happened on that date. Any examples should be updated to include a reasonable deployment/distribution description.
Published results should include clear indicators about the parameters of the AEDT as audited and the testing conditions, and the regulations should clarify that employers may not use the AEDT in a manner that materially differs from the manner in which the bias audit was conducted. This includes how input data is gathered from candidates or employees compared to how the comparable input data was gathered from the model population used for the bias audit. For example, if testing of an AEDT used a specific cutoff or weighting scheme, the cutoff or weighting scheme used in the actual deployment should match it as closely as possible, and the publication should indicate any divergence and the reason for it. A tool may show no disparate impact when cutoffs or rankings are set at one level, yet show a disparate impact at other levels. Likewise, if one input variable is hours worked per week and the model population for the bias audit derives those figures from internal payroll data while candidate data will come from self-reporting, then the publication should indicate that divergence, provide commentary on the reason for it, and assess the impact it is likely to have on the relevance of the bias audit.
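The cutoff sensitivity described above can be made concrete. In this sketch, the score lists and group labels are invented; the point is that an identical score distribution passes the 4/5ths comparison at one cutoff and fails it at another:

```python
# Hypothetical illustration with invented scores: the same AEDT output
# can pass the 4/5ths comparison at one cutoff and fail at another.
def selection_rates(scores_a, scores_b, cutoff):
    """Fraction of each group scoring at or above the cutoff."""
    rate_a = sum(s >= cutoff for s in scores_a) / len(scores_a)
    rate_b = sum(s >= cutoff for s in scores_b) / len(scores_b)
    return rate_a, rate_b

scores_a = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # reference group
scores_b = [50, 55, 60, 65, 70, 72, 74, 76, 78, 80]   # comparison group

for cutoff in (60, 80):
    ra, rb = selection_rates(scores_a, scores_b, cutoff)
    print(f"cutoff={cutoff}: rates {ra:.2f} vs {rb:.2f}, ratio {rb / ra:.2f}")
# cutoff=60 yields ratio ≈ 0.89 (passes); cutoff=80 yields ratio 0.20 (fails)
```

This is why the published results should pin down the cutoff or weighting scheme under which the audit was run: an audit at one cutoff says little about a deployment at another.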
Lastly, the rules should clarify that the published results must be disclosed in machine readable and ADA compliant formats in order to be accessible to people with various assistive technologies.
Re: § 5-303 Notice to Candidates and Employees
Section 20-871(b)(2) of LL 144 requires the disclosure of the job qualifications and characteristics that the AEDT will use in the assessment. The rules should clarify that candidates or employees should be provided with as much information as possible to meaningfully assess the impact the AEDT has on them and whether they need to request an alternative selection process or accommodation.
The law also requires that the employer “allow a candidate to request an alternative selection process or accommodation.” The regulations should provide employers with parameters for how to provide alternative selection processes or accommodations, including what processes may be used to give equal and timely consideration to candidates who are assessed with accommodations or through alternative processes. By merely stating that “Nothing in this subchapter requires an employer or employment agency to provide an alternative selection process,” the regulations suggest that the law offers candidates an empty protection: the right to request an alternative without any obligation on the part of the employer to consider or honor that request.
In conclusion, the NYCLU thanks the Department of Consumer and Worker Protection for the opportunity to provide testimony. The Department’s rulemaking is instrumental in ensuring a productive implementation of Local Law 144 and making clear that discriminatory technology has no place in New York. We strongly urge the Department to amend and strengthen the proposed rules to deliver on the law’s promise to mitigate bias and provide people with the information they need.
Attached please find ORCAA’s comments on the proposed rules.
Thank you,
Chief Strategist, ORCAA
Robert T Szyba
Attached please find comments regarding the proposed rules on the use of Automated Employment Decision Tools under Local Law 144 of 2021 on behalf of Seyfarth Shaw LLP.
HireVue is a video interviewing and assessment platform. We’ve hosted more than 30 million interviews and 200 million chat-based candidate engagements for customers around the globe. HireVue supports both the candidate and employer interview experience in a broad range of industries. HireVue’s comments on the Bill are based on our extensive experience with the use of AI in the context of hiring.
We know that human decision making in hiring is not immune to bias, ranging from overt prejudice to subconscious bias. HireVue’s commitment to mitigating human bias in hiring has been rooted, from our beginning, in our use of scientifically validated methods implemented by our team of industrial-organizational (I-O) psychologists and data scientists with advanced degrees.
HireVue’s ongoing commitments to transparency, accountability, continual improvement, and engagement with stakeholders help drive a more equitable hiring process, offering a fairer and more flexible experience for our customers and their candidates.
With this experience in mind and turning to the proposed Bill, HireVue questions legislation that severely restricts good practices and beneficial innovation that pragmatically broadens access to employment, in the name of attempting to mitigate algorithmic bias. The Bill’s “one size fits all” approach will not necessarily address the Bill’s underlying goals effectively.
We would also call to the agency’s attention that some specific requirements of the Bill cannot be met in the employment context, such as running analysis on data (race, gender, marital status, etc.) that employers cannot require under pre-existing employment laws and/or data minimization principles in various privacy laws, and that service providers may not naturally possess. The audit requirement also fails to incorporate a threshold of statistically significant data before the audit should be performed, to ensure a degree of confidence in the outcome.
Lastly, with respect to any algorithm audit requirement, a thoughtful audit and any legislation that requires it should leave room for continuous development and improvement to build on the “good” and to identify and address concerning results. Based on its experience, HireVue offers the following points for the council’s consideration with respect to algorithmic audits:
First, audit criteria must be clearly defined. Much like audit standards in other industries such as privacy and finance, an audit of an AI tool should include reference to the relevant, well-established industry and legal standards against which the tool is tested. It should explain how a model works, its purposes and limitations, and the data it relies upon to make decisions, and it should do so in language understood by a layperson.
Second, the focus of the audit should be at the outset of product development, to ensure algorithmic tools are designed, developed, trained, and tested, including steps to identify and mitigate bias, before deployment. The algorithm should also be periodically monitored after deployment to identify any unexpected results.
Finally, vendors should be responsible for delivering an independent audit of the AI-based products they provide to their customers. Vendors differ in how their tools are developed and what sort of data they use, so the way audits are conducted will not be universal. Audits must always consider the industry and context in which the AI is being used.
To provide context: In HireVue’s case, it is against the EEOC Guidelines to deploy algorithms that treat job applicants differently from “day-to-day”, thus we have chosen to only deploy static algorithmic models (after auditing them and testing against established frameworks in hiring). This means our algorithms are “locked” and do not learn or continually change from real-time uncurated and unfiltered customer data – as this would be unfair to the job applicants. This approach prevents the risk of bias creeping into our pre-tested models.
We also suggest that customers using HireVue’s audited algorithmic tool should be able to rely on those audit results without needing to conduct an independent audit of the deployed model, though the model should continue to be periodically monitored to validate the deployment and use of the AI tool in their particular setting, e.g., with their live data. The current Bill fails to capture this distinction.
Ongoing dialogue between appropriate stakeholders is the key to creating legislation that protects candidates, companies, and innovation. HireVue welcomes legislation related to the potential impact of all hiring technologies as it encourages transparency and improves the process and fairness for all candidates. Legislation like this demands that all vendors meet the high standards that we have supported since the beginning.
Please find attached additional comments regarding the Law. Finally, given the ongoing nature of the rulemaking for the Law, HireVue recommends that enforcement be deferred until 6 months after issuance of any final rule or guidance, to allow employers and service providers to review and take action.
Thank you,
Naziol S. Nazarinia Scott, General Counsel and SVP of Legal, HireVue, Inc.
On Behalf of The Institute for Workplace Equality (“IWE” or “The Institute”), we submit the attached comments in response to the New York City (“NYC” or the “City”) Department of Consumer and Worker Protection’s (“DCWP” or the “Department”) invitation. The Department’s Notice of Proposed Rules is seeking to clarify the requirements set forth by NYC’s Local Law 144 that will regulate the use of automated employment decision tools (“AEDT”) wherein hiring decisions are made or substantially assisted by algorithmically-driven mechanisms.
C. Mitch Taylor
Please accept the attached comments on behalf of SHRM, the Society for Human Resource Management, regarding the proposed rules on Automated Employment Decision Tools (AEDT) under Local Law 144.
Dear committee members:
I am pleased to submit comments for Requirement for Use of Automated Employment Decisionmaking Tools (Ref. No. DCWP-21; “The Rules”). I am the owner of Responsible Artificial Intelligence LLC, a New York company offering algorithmic auditing and consultancy services. Previously, I was a Founder and Chief Technology Officer of Parity Technologies, Inc., a startup dedicated to modernizing model risk management and AI compliance; and also a Director of AI Research at JPMorgan Chase & Co., and a Senior Manager of Data Science at Capital One, where I started R&D teams for responsible AI and its applications to financial regulatory compliance such as the Equal Credit Opportunity Act; and also a Research Scientist at the Massachusetts Institute of Technology working on data science technologies. Since the start of 2022, I have spoken with multiple HR vendors, both startups and established companies, as well as a prominent legal firm, who have sought my input on how to establish compliance with the AEDT law (“The Law”). I would like to provide some comments on the real-world operational questions that have surfaced through my discussions with multiple data scientists, lawyers and vendors, as well as my thoughts on how best practices from federal regulators in employment and finance can be usefully translated into the context of the Law.
Vendor liability. One of the largest open questions around compliance for the Law is what liabilities vendors have. Most employers are unlikely to have in-house expertise to assess whether their use of such AEDTs is compliant, and thus will outsource compliance audits to third-party auditors. This is presumably the intent of the Law, to require such audits to be conducted. However, most employers also lack the ability to build their own AEDTs, and choose instead to purchase solutions from external vendors. In such situations, there is a separation of ownership between an employer’s real-world employment data on one side, and the vendor’s AEDT code and training data on the other. Auditors must therefore navigate challenging internal politics to ensure that both the employer’s data and the vendor’s AEDT are available for a successful audit, assuming that the vendor-client contract even permits such access.
Data ownership issues hinder testing of robustness. Modern data-intensive AEDTs that are built according to current best practices are not defined solely by their internal algorithmic rules, but also by the training data used to develop the AEDT, and also the development process of how models are built, selected against potential alternatives, and validated internally. It is therefore critical to assess the statistical similarity of the training data to the actual deployed use case. The performance of AI systems depends critically on the data fed into them – for example, most AI systems fed only men’s profiles will not pick up on the absence of women, even if they were assessed to not have discriminatory outcomes when used on a gender-balanced test population. Therefore, the assessment of discriminatory outcomes must be evaluated in the context of the data for a specific population, which makes it difficult to purely absolve users of vendor solutions. Conversely, it is difficult for vendors to attest or certify that an AEDT will not produce discriminatory outcomes without making strong assumptions on typical usage of their solutions. In short, when vendors own training data while downstream users own test data, the two parties must engage in some form of data sharing to test for robustness and out-of-sample generalization error. Such data sharing must be done carefully to protect the identities in the data sets, particularly when it comes to EEO-1 demographic data and other personally identifiable information which could compromise privacy if leaked.
The risk of ethics-washing. There is an unavoidable conflict of interest that arises when companies pay for audits of their own products. Even in this nascent stage of development for the algorithmic auditing industry, controversy has already arisen over how some companies heavily censor the audits before release, or use the ostensibly neutral platform of academic publications to obtain validation for their reviews in the form of peer review. On one hand, it is important to recognize that compliance audits usually happen under attorney-client privilege, so that clients can address and remediate any negative findings without incriminating themselves in the process. On the other hand, the pay-to-play nature of auditing necessarily creates a conflict of interest that incentivizes keeping the auditee happy in exchange for future business and building relationships. Such concerns are of course not new, and have plagued financial auditing for decades. The experience of the financial services industry clearly points to the need for independent verification of audits, which are usually manifested in the form of regulatory audits by government entities.
Reproducibility requires data and algorithmic governance. The very act of auditing an AEDT implies that an auditor can independently reproduce claims made by developers about the properties of an AEDT, such as its lack of discriminatory impact or expected performance on a given test set. However, the act of reproducing an AI/ML system itself – to set up a replica of the production environment with a replica of a data stream that resembles real world conditions – can itself be a major engineering challenge. Successful reproduction of a production system in an audit environment that does not affect production data streams will be necessary to ensure that the auditing process does not inadvertently pollute or affect the use of the AEDT itself.
Data quality issues in demographic label collection. A related issue is that of data quality – the most highly predictive AEDT is still likely to fail if fed erroneous data. In the context of algorithmic auditing, data quality issues extend not just to the data fed into an AEDT, but also the EEO-1 demographic data and other personally identifying information that needs to be collected in order to correctly classify people by race, gender, age, and other protected classes. In practice, EEO-1 demographic data is voluntarily provided by applicants and employees, which means that some people will decline to self-identify. Such refusal is not statistically random, but is disproportionately likely to occur when membership in a protected class, such as having a mental disability or being of a particular sexual orientation, carries social stigma or otherwise is likely to cause harm by “outing” someone as belonging to some group. This missing-not-at-random nature of demographic label quality ought to be considered when determining whether discriminatory outcomes can be measured with sufficient statistical power, particularly if an imputation method like Bayesian Improved Surname Geocoding (BISG) is used to fill in missing demographic labels, as is commonly done in compliance testing for consumer financial services.
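The missing-not-at-random concern can be shown in a toy simulation. All of the numbers below are invented for illustration: members of one group disclose their membership less often when they were not selected, so a selection rate computed only over disclosed labels is badly inflated relative to the true rate:

```python
# Hypothetical simulation with invented rates: non-random refusal to
# self-identify distorts the measured selection rate for a group.
import random

random.seed(0)
population = []
for _ in range(10_000):
    group = random.random() < 0.2  # 20% belong to group G
    selected = random.random() < (0.30 if group else 0.50)
    # Assumed disclosure behavior: G members who were NOT selected
    # disclose membership far less often than everyone else.
    if group and not selected:
        disclosed = random.random() < 0.40
    else:
        disclosed = random.random() < 0.90
    population.append((group, selected, disclosed))

true_rate = (sum(s for g, s, d in population if g)
             / sum(1 for g, s, d in population if g))
observed_rate = (sum(s for g, s, d in population if g and d)
                 / sum(1 for g, s, d in population if g and d))
print(round(true_rate, 3), round(observed_rate, 3))
# The observed rate overstates G's true selection rate substantially.
```

An audit that treats disclosed labels as representative would here conclude the tool is far more favorable to group G than it actually is; imputation methods such as BISG carry their own error that should be quantified rather than assumed away.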
Construct validity. The need for AEDTs is greatest when there is an inherent scaling challenge to the number of decisions that have to be made. In the employment context, this usually shows up in the early stages of recruiting to narrow the funnel of applicants that are shortlisted for subsequent rounds of interviews. However, it is unclear if data collected at early stages of an employment decision, such as receiving a resume or video recording from a job candidate, will contain enough predictive signal to accurately predict a candidate’s suitability for hiring. In practice, AEDTs cannot predict something abstract like “employability”, but instead compute metrics that purport to measure suitability scores or the like for such abstract concepts. An audit must necessarily assess the problem of construct validity, that a prediction target of an AEDT is indeed a valid and suitable quantification that operationalizes the employment decision being considered. Such considerations are of course of long-standing debate in federal employment laws; however, the algorithmic nature of decision-making and its use in making quantitative predictions bring such fundamental measurability concerns to the forefront of assessment. Many metrics purporting to quantify algorithmic bias implicitly assume that the prediction target of the AEDT is perfectly well-defined without any measurement ambiguity, which is unlikely to be true in practice. Therefore, the construct validity of the prediction target needs to be assessed critically to avoid false overprecision and overconfidence in the quantitative evidence for or against algorithmic bias.
The ethics of negative class sampling. A particularly thorny data quality problem goes by the name of reject inference in credit decisions, and is closely related to the problem of positive-unlabeled learning in other machine learning contexts. It is a problem for AEDTs that create a data asymmetry between positive and negative decision classes. For example, an employer incrementally collects more and more information about a candidate who passes multiple interview rounds. Conversely, a candidate not selected for an interview will have less data about them. This means that for hiring decisions, it is easier to assess false positives (a promising candidate who turned out to be a poor employee) than false negatives (a candidate who did not interview well but would have been a good employee). The counterfactual nature of the negative class makes such assessments difficult in practice – someone removed from the candidate pool is by definition someone who was never placed in the job, and hence there was no real measurement of whether or not they were good at the job. A critical assessment of an AEDT’s predictive value ought to include assessments of how well it classifies the negative decision class, but if this class is not measured in any data set, then expert review is needed to validate negative decisions, or otherwise an experimentation framework is needed in order to test counterfactual changes to the AEDT prediction. There are obvious ethical risks to deliberately altering an employment decision for the sake of algorithmic assessment, as well as high costs of incorrect classification which will hinder the collection of real-world validation data. A well-designed audit should recognize the importance of negative class sampling, while at the same time have procedures in place to effect the necessary counterfactual testing without undue cost.
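The asymmetry can be stated very compactly in code. In this sketch the records are invented, and the `would_have_performed_well` column for non-advanced candidates is exactly the counterfactual information that real hiring logs never contain:

```python
# Hypothetical sketch with invented records: outcome labels exist only
# for candidates the AEDT advanced, so false negatives are invisible in
# the logged data. The second field for non-advanced candidates is the
# unobservable counterfactual.
candidates = [
    # (advanced_by_aedt, would_have_performed_well)
    (True, True), (True, False), (True, True),
    (False, True), (False, False), (False, True),
]

# An employer's real data set records performance only for advanced candidates:
observed = [(adv, perf) for adv, perf in candidates if adv]
false_positives = sum(1 for adv, perf in observed if not perf)

# Counting false negatives needs the counterfactual column, absent in practice:
false_negatives = sum(1 for adv, perf in candidates if not adv and perf)
print(false_positives, false_negatives)  # 1 observable, 2 unobservable
```

Any claim about an AEDT's accuracy computed from `observed` alone silently conditions on the tool's own prior decisions, which is precisely why expert review or controlled experimentation is needed to assess the negative class.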
Intersectionality and subject privacy. The explicit call-out for intersectional testing across multiple protected classes is a welcome strengthening of current federal standards, which do not require testing of, say, race and gender, simultaneously. Nevertheless, intersectional concerns increase the risk of identification and hence loss of privacy for underrepresented groups. The more labels used to define a category, such as “people of gender A, race B, and age group C”, the fewer people are likely to belong to that exact category. Taken to its logical extreme of testing every single protected class defined under federal employment laws, there is a risk that the intersectional categories are so fine-grained that only a single person may belong to that category. When such categories exist, summary statistics can leak information about a single person. In practice, the granularity of intersectional categories must be balanced against privacy concerns. I have some very preliminary research that indicates that differential privacy is a promising mechanism for achieving these goals in an algorithmic audit, although field testing will be required to validate the theoretical work we have been able to publish.
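The shrinking-cell problem is simple combinatorics. The category counts below are invented for illustration (they are not the EEO-1 categories), but the arithmetic holds for any fully crossed intersectional breakdown:

```python
# Hypothetical illustration with invented category counts: fully crossed
# intersectional categories multiply, so expected cell sizes shrink fast.
import math

applicants = 500
levels = {"race/ethnicity": 7, "gender": 3, "age band": 5, "disability": 2}

cells = math.prod(levels.values())  # 7 * 3 * 5 * 2 = 210 cells
print(cells, applicants / cells)    # on average only a few applicants per cell
```

With roughly two applicants per cell on average, many cells will hold exactly one person, at which point a published per-cell selection rate is a disclosure about an identifiable individual rather than a statistic.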
The fallacy of the four-fifths rule. The literature on algorithmic bias has unfortunately perpetuated a misconception of the significance of the four-fifths rule which the current rules are at risk of perpetuating and codifying. It is often claimed that the Equal Employment Opportunity Act enshrines the disparate impact ratio as the only legitimate metric for measuring employment discrimination, and that when it exceeds 80%, there is no finding of employment discrimination. In reality, tracing the historical development of the four-fifths rule reveals that it was only ever meant to be a bureaucratic rule of thumb for prioritizing cases for further regulatory scrutiny, and in fact the 80% threshold was effectively set arbitrarily in a 1971 meeting of the California Fair Employment Practice Commission as a compromise between a 70% camp and a 90% camp, a compromise that seems to not have been revisited with much scrutiny ever since. The arbitrariness of the four-fifths rule has been recognized by multiple federal courts in multiple court cases: courts have found that the 80% threshold is neither necessary nor sufficient to make a determination of discriminatory outcomes, and have admitted other forms of statistical testing, such as hypothesis testing for equality of means, in actual court cases. In short, the 80% threshold is arbitrary and fails to capture less severe discriminatory outcomes, particularly when the sample size is small and when the membership of people in protected classes is unclear.
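The gap between the threshold and a statistical test is easy to exhibit. The counts below are invented; the sketch contrasts the impact ratio with a pooled two-proportion z-test, one of the alternative forms of statistical testing courts have admitted:

```python
# Hypothetical sketch with invented counts: an impact ratio can "pass"
# the 4/5ths threshold while a two-proportion z-test still finds the
# selection-rate gap highly significant.
import math

def impact_ratio(x1, n1, x2, n2):
    """Group 1's selection rate divided by group 2's (the reference)."""
    return (x1 / n1) / (x2 / n2)

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for H0: equal selection rates (pooled z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Large samples: 430/1000 vs 500/1000 selected.
print(impact_ratio(430, 1000, 500, 1000))    # 0.86, above the 0.80 threshold
print(two_proportion_p(430, 1000, 500, 1000))  # well below 0.05
```

With large samples a ratio of 0.86 clears the 4/5ths threshold even though the underlying rate difference is statistically unambiguous; with small samples the opposite failure occurs, which is why codifying a single numerical threshold is a poor substitute for tests of statistical validity.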
To address these operational challenges, I would like to make the following recommendations for your consideration.
Recommendation 1. The City should invest in their own auditors and regulators to assess if audits need to be themselves independently audited, adapting relevant best practices from financial regulators and auditors where helpful.
Recommendation 2. The Rules would benefit from clarification on governance requirements for AEDTs and their associated data sets.
Recommendation 3. The Rules should clarify how robustness and generalization ought to be tested, and if so, how data sharing between different owners can be effected for the purposes of compliance audits.
Recommendation 4. The Rules would benefit from clarification on what liability vendors have for selling AEDTs to downstream clients, and to what extent (if any) these downstream procurers of AEDTs are able to shift liability to the vendor.
Recommendation 5. Regulators should work with standards-setting bodies, such as the National Institute for Standards and Technology (NIST), to develop and curate test data sets that represent typical populations which may be affected by AEDTs, so as to enable high quality testing of AEDTs that affords apples-to-apples comparisons.
Recommendation 6. The regulators should favor companies that have voluntarily adopted the NIST AI Risk Management Framework (RMF) or similar best practices for building and using AI systems. The regulators should issue more specific guidance aligned with the AI RMF to streamline compliance reviews.
Recommendation 7. The Rules should not codify any specific metric or threshold for passing or failing, but rather accommodate a possible plurality of valid metrics, and insist on tests of statistical validity rather than simply passing a numerical threshold.
In closing, I would like to congratulate the City on its innovation in enacting the Law, the first of its kind for the employment industry. The comments above are not meant to detract from the significance of the Law, but rather to highlight implementation risks that ought to be managed in order for the Law to have its desired effect: to promote inclusivity and accessibility of job opportunities, improve transparency in high-stakes decision making, and reduce discrimination in employment decisions. Please do not hesitate to reach out if I may be able to provide further clarifications on these comments.
As avid supporters of Local Law #144, and as a company proudly headquartered in New York City, retrain.ai applauds the efforts put forth by lawmakers to protect candidates and employees, promote fairness and transparency, and ensure that AI in hiring can reach its full potential as a driver of diversity, equity and inclusion in the workplace.
Globally, the retrain.ai Talent Intelligence Platform is an industry leader in the Responsible AI space. Built on the understanding that any organization’s most valuable asset is its people, our platform is intentionally employee-centric, designed to help enterprises put the right people into best-fit roles to benefit both the individual and the employer, future-proofing the skills needed for the former to thrive and the latter to compete.
In addition to helping enterprises with talent acquisition and talent management solutions, we are members of the World Economic Forum, where we collaborate with public- and private-sector leaders to contribute new solutions to overcome the current workforce crisis and build future resiliency. retrain.ai also works with the Responsible Artificial Intelligence Institute (RAII), a leading non-profit organization building tangible governance tools for trustworthy, safe and fair artificial intelligence.
Our comments on the law, attached here, are being submitted based on our extensive expertise in AI, machine learning, Responsible AI and our vast experience with the use of AI in the context of hiring, promoting and developing workers in a wide spectrum of industries. Thank you for the opportunity to contribute to this important conversation.