Machine Intelligence

  • New paper: “Risks from learned optimization”
    Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have a new paper out: “Risks from learned optimization in advanced machine learning systems.” The paper’s abstract:

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.

The critical distinction presented in the paper is between what an AI system is optimized to do (its base objective) and what it actually ends up optimizing for (its mesa-objective), if it optimizes for anything at all. The authors are interested in when ML models will end up optimizing for something, as well as how the objective an ML model ends up optimizing for compares to the objective it was selected to achieve.

The distinction between the objective a system is selected to achieve and the objective it actually optimizes for isn’t new. Eliezer Yudkowsky has previously raised similar concerns in his discussion of optimization daemons, and Paul Christiano has discussed such concerns in “What failure looks like.”

The paper’s contents have also been released this week as a sequence on the AI Alignment Forum, cross-posted to LessWrong. As the authors note there:

We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date.
In particular, we plan to present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as deceptive alignment, which we posit may present one of the largest—though not necessarily insurmountable—current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning.

The post New paper: “Risks from learned optimization” appeared first on Machine Intelligence Research Institute.
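The base-objective/mesa-objective distinction lends itself to a deliberately artificial sketch. Everything below (the goal/marker scenario and all function names) is a hypothetical toy of my own construction, not code or an example from the paper: a policy is selected for scoring well on a base objective, but what it actually implements is a search over actions for a proxy (mesa-) objective that coincides with the base objective only on the training distribution.

```python
# Hypothetical toy, not from the paper: a "learned" policy that is itself
# an optimizer, but for a proxy (mesa-) objective rather than the base
# objective it was selected under.

def base_objective(action, goal):
    """The objective the training process scores policies on:
    closeness of the action to the true goal."""
    return -abs(action - goal)

def mesa_policy(observed_marker):
    """The selected policy is itself a search process -- but over the
    mesa-objective 'get close to the marker', not the base objective."""
    return max(range(11), key=lambda a: -abs(a - observed_marker))

# Training distribution: the marker always coincides with the goal, so
# optimizing the mesa-objective also optimizes the base objective, and
# the two objectives are behaviorally indistinguishable.
for g in range(11):
    assert base_objective(mesa_policy(g), g) == 0

# Off-distribution: marker and goal come apart. The policy still
# optimizes its mesa-objective perfectly while doing badly on the
# base objective it was selected for.
goal, marker = 2, 9
print(mesa_policy(marker))                        # 9 (pursues the marker)
print(base_objective(mesa_policy(marker), goal))  # -7 (bad for the goal)
```

The point of the toy is only that selection on the base objective cannot distinguish the two objectives while they agree on the training distribution.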
  • June 2019 Newsletter
    Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have released the first two (of five) posts on “mesa-optimization”:

The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned?

The sequence begins with Risks from Learned Optimization: Introduction and continues with Conditions for Mesa-Optimization. (LessWrong mirror.)

Other updates

- New research posts: Nash Equilibria Can Be Arbitrarily Bad; Self-Confirming Predictions Can Be Arbitrarily Bad; And the AI Would Have Got Away With It Too, If…; Uncertainty Versus Fuzziness Versus Extrapolation Desiderata
- We've released our annual review for 2018.
- Applications are open for two AI safety events at the EA Hotel in Blackpool, England: the Learning-By-Doing AI Safety Workshop (Aug. 16–19) and the Technical AI Safety Unconference (Aug. 22–25).
- A discussion of takeoff speed, including some very incomplete and high-level MIRI comments.

News and links

- Other recent AI safety posts: Tom Sittler's A Shift in Arguments for AI Risk and Wei Dai's “UDT2” and “against UD+ASSA”.
- Talks from the SafeML ICLR workshop are now available online.
- From OpenAI: “We’re implementing two mechanisms to responsibly publish GPT-2 and hopefully future releases: staged release and partnership-based sharing.”
- FHI's Jade Leung argues that “states are ill-equipped to lead at the formative stages of an AI governance regime,” and that “private AI labs are best-placed to lead on AI governance.”
  • 2018 in review
    Our primary focus at MIRI in 2018 was twofold: research—as always!—and growth. Thanks to the incredible support we received from donors the previous year, in 2018 we were able to aggressively pursue the plans detailed in our 2017 fundraiser post. The most notable goal we set was to “grow big and grow fast,” as our new research directions benefit a lot more from a larger team, and require skills that are a lot easier to hire for. To that end, we set a target of adding 10 new research staff by the end of 2019.

2018 therefore saw us accelerate the work we started in 2017, investing more in recruitment and shoring up the foundations needed for our ongoing growth. Since our 2017 fundraiser post, we've added 3 new research staff, including noted Haskell developer Edward Kmett. I now think that we're most likely to hit 6–8 hires by the end of 2019, though hitting 9–10 still seems quite possible to me, as we are still engaging with many promising candidates, and continue to meet more.

Overall, 2018 was a great year for MIRI. Our research continued apace, and our recruitment efforts increasingly paid dividends. Below I'll elaborate on our research progress and outputs; research program support activities, including more details on our recruitment efforts; outreach-related activities; and fundraising and spending.

2018 Research

Our 2018 update discussed the new research directions we're pursuing, and the nondisclosure-by-default policy we've adopted for our research overall. As described in the post, these new directions aim at deconfusion (similar to our traditional research programs, which we continue to pursue), and include the themes of “seeking entirely new low-level foundations for optimization,” “endeavoring to figure out parts of cognition that can be very transparent as cognition,” and “experimenting with some [relatively deep] alignment problems,” and require building software systems and infrastructure.
In 2018, our progress on these new directions and the supporting infrastructure was steady and significant, in line with our high expectations, albeit proceeding significantly slower than we'd like, due in part to the usual difficulties associated with software development. On the whole, our excitement about these new directions is high, and we remain very eager to expand the team to accelerate our progress.

In parallel, Agent Foundations work continued to be a priority at MIRI. Our biggest publication on this front was “Embedded Agency,” co-written by MIRI researchers Scott Garrabrant and Abram Demski. “Embedded Agency” reframes our Agent Foundations research agenda as different angles of attack on a single central difficulty: we don't know how to characterize good reasoning and decision-making for agents embedded in their environment.

Below are notable technical results and analyses we released in each research category last year.1 These are accompanied by predictions made last year by Scott Garrabrant, the research lead for MIRI's Agent Foundations work, and Scott's assessment of the progress our published work represents against those predictions. The research categories below are explained in detail in “Embedded Agency.”

The actual share of MIRI's research that was non-public in 2018 ended up being larger than Scott expected when he registered his predictions. The list below is best thought of as a collection of interesting (though not groundbreaking) results and analyses that demonstrate the flavor of some of the directions we explored in our research last year. As such, these assessments don't represent our model of our overall progress, and aren't intended to be a good proxy for that question. Given the difficulty of predicting what we'll disclose for our 2019 public-facing results, we won't register new predictions this year.
Decision theory

Predicted progress: 3 (modest). Actual progress: 2 (weak-to-modest).

Scott sees our largest public decision theory result of 2018 as Prisoners' Dilemma with Costs to Modeling, a modified version of open-source prisoners' dilemmas in which agents must pay resources in order to model each other. Other significant write-ups include:

- Logical Inductors Converge to Correlated Equilibria (Kinda): A game-theoretic analysis of logical inductors.
- New results in Asymptotic Decision Theory and When EDT=CDT, ADT Does Well represent incremental progress on understanding what's possible with respect to learning the right counterfactuals.

Additional decision theory research posts from 2018:

- From Alex Appel, a MIRI contractor and summer intern: (a) Distributed Cooperation; (b) Cooperative Oracles; (c) When EDT=CDT, ADT Does Well; (d) Conditional Oracle EDT Equilibria in Games
- From Abram Demski: (a) In Logical Time, All Games are Iterated Games; (b) A Rationality Condition for CDT Is That It Equal EDT (Part 1); (c) A Rationality Condition for CDT Is That It Equal EDT (Part 2)
- From Scott Garrabrant: (a) Knowledge is Freedom; (b) Counterfactual Mugging Poker Game; (c) (A → B) → A
- From Alex Mennen, a MIRI summer intern: When Wishful Thinking Works

Embedded world-models

Predicted progress: 3 (modest). Actual progress: 1 (limited).

Some of our relatively significant results related to embedded world-models included:

- Sam Eisenstat's untrollable prior, explained in illustrated form by Abram Demski, shows that there is a Bayesian solution to one of the basic problems which motivated the development of non-Bayesian logical uncertainty tools (culminating in logical induction). This informs our picture of what's possible, and may lead to further progress in the direction of Bayesian logical uncertainty.
- Sam Eisenstat and Tsvi Benson-Tilsen's formulation of Bayesian logical induction. This framework, which has yet to be written up, forces logical induction into a Bayesian framework by constructing a Bayesian prior which trusts the beliefs of a logical inductor (which must supply those beliefs to the Bayesian regularly). Sam and Tsvi's work can be viewed as evidence that “true” Bayesian logical induction is possible. However, it can also be viewed as a demonstration that we have to be careful what we mean by “Bayesian”—the solution is arguably cheating, and it isn't clear that you get any new desirable properties by doing things this way.

Scott assigns the untrollable prior result a 2 (weak-to-modest progress) rather than a 1 (limited progress), but is counting this among our 2017 results, since it was written up in 2018 but produced in 2017.

Other recent work in this category includes:

- From Alex Appel: (a) Resource-Limited Reflective Oracles; (b) Bounded Oracle Induction
- From Abram Demski: (a) Toward a New Technical Explanation of Technical Explanation; (b) Probability is Real, and Value is Complex

Robust delegation

Predicted progress: 2 (weak-to-modest). Actual progress: 1 (limited).

Our most significant 2018 public result in this category is perhaps Sam Eisenstat's logical inductor tiling result, which solves a version of the tiling problem for logically uncertain agents.2

Other posts on robust delegation:

- From Stuart Armstrong (MIRI Research Associate): (a) Standard ML Oracles vs. Counterfactual Ones; (b) “Occam's Razor is Insufficient to Infer the Preferences of Irrational Agents”
- From Abram Demski: Stable Pointers to Value II: Environmental Goals
- From Scott Garrabrant: Optimization Amplifies
- From Vanessa Kosoy (MIRI Research Associate): (a) Quantilal Control for Finite Markov Decision Processes; (b) Computing An Exact Quantilal Policy
- From Alex Mennen: Safely and Usefully Spectating on AIs Optimizing Over Toy Worlds

Subsystem alignment

Predicted progress: 2 (weak-to-modest). Actual progress: 2.

We achieved greater clarity on subsystem alignment in 2018, largely reflected in Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse,3 and Scott Garrabrant's forthcoming paper, “Risks from Learned Optimization in Advanced Machine Learning Systems.”4 This paper is currently being rolled out on the AI Alignment Forum, as a sequence on “Mesa-Optimization.”5 Scott Garrabrant's Robustness to Scale also discusses issues in subsystem alignment (“robustness to relative scale”), alongside other issues in AI alignment.

Other

Predicted progress: 2 (weak-to-modest). Actual progress: 2.

Some of the 2018 publications we expect to be most useful cut across all of the above categories:

- “Embedded Agency,” Scott and Abram's new introduction to all of the above research directions.
- Fixed Point Exercises, a set of exercises created by Scott to introduce people to the core ideas and tools in agent foundations research.

Here, other noteworthy posts include:

- From Scott Garrabrant: (a) Sources of Intuitions and Data on AGI; (b) History of the Development of Logical Induction

2018 Research Program Support

We added three new research staff to the team in 2018: Ben Weinstein-Raun, James Payor, and Edward Kmett. We invested a large share of our capacity into growing the research team in 2018, and generally into activities aimed at increasing the amount of alignment research in the world, including:

- Running eight AI Risk for Computer Scientists (AIRCS) workshops.
  This is an ongoing all-expenses-paid workshop series for computer scientists and programmers who want to get started thinking about or working on AI alignment. At these workshops, we introduce AI risk and related concepts, share some CFAR-style rationality content, and introduce participants to the work done by MIRI and other safety research teams. Our overall aim is to cause good discussions to happen, improve participants' ability to make progress on deciding whether and how to contribute, and in the process work out whether they may be interested in joining MIRI or other alignment groups. Of 2018 workshop participants, one joined MIRI full-time, four took on internships with us, and on the order of ten have good prospects of joining MIRI within a year, in addition to several who have since joined other safety-related organizations.

- Running a 2.5-week AI Summer Fellows Program (AISFP) with CFAR.6 Additionally, MIRI researcher Tsvi Benson-Tilsen and MIRI summer intern Alex Zhu ran a mid-year AI safety retreat for MIT students and alumni.

- Running a 10-week research internship program over the summer, reviewed in our summer updates. Interns also participated in AISFP and in a joint research workshop with interns from the Center for Human-Compatible AI. Additionally, we hosted three more research interns later in the year. We are hopeful that at least one of them will join the team in 2019.

- Making grants to two individuals as part of our AI Safety Retraining Program. In 2018 we received $150k in restricted funding from the Open Philanthropy Project, “to provide stipends and guidance to a few highly technically skilled individuals. The goal of the program is to free up 3–6 months of time for strong candidates to spend on retraining, so that they can potentially transition to full-time work on AI alignment.” We issued grants to two people in 2018, including Carroll Wainwright, who went on to become a Research Scientist at Partnership on AI.
In addition to the above, in 2018 we:

- Hired additional operations staff to ensure we have the required operational capacity to support our continued growth.
- Moved into new, larger office space.

2018 Outreach and Exposition

On the outreach, coordination, and exposition front, we:

- Released a new edition of Rationality: From AI to Zombies, beginning with volumes one and two, featuring a number of updates to the text and an official print edition. We also made Stuart Armstrong's 2014 book on AI risk, Smarter Than Us: The Rise of Machine Intelligence, available on the web for free at smarterthan.us.
- Released 2018 Update: Our New Research Directions, a lengthy discussion of our research, our nondisclosure-by-default policies, and the case for computer scientists and software engineers to apply to join our team.
- Produced other expository writing: Two Clarifications About “Strategic Background”; Challenges to Paul Christiano's Capability Amplification Proposal (discussion on LessWrong, including follow-up conversations); Comment on Decision Theory; The Rocket Alignment Problem (LessWrong link).
- Received press coverage in Axios, Forbes, Gizmodo, and Vox (1, 2), and were interviewed in Nautilus and on Sam Harris' podcast.
- Spoke at Effective Altruism Global in San Francisco and at the Human-Aligned AI Summer School in Prague.
- Presented on logical induction at the joint Applied Theory Workshop / Workshop in Economic Theory.
- Released a paper, “Categorizing Variants of Goodhart's Law,” based on Scott Garrabrant's 2017 “Goodhart Taxonomy.” We also reprinted Nate Soares' “The Value Learning Problem” and Nick Bostrom and Eliezer Yudkowsky's “The Ethics of Artificial Intelligence” in Artificial Intelligence Safety and Security.
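The Goodhart-taxonomy theme lends itself to a tiny self-contained demonstration of one variant, regressional Goodhart. This is an illustrative sketch of the general phenomenon, not code or an example from the paper; the setup (Gaussian true values with independent Gaussian measurement noise, and the function name `selected_gap`) is my own assumption.

```python
# Hypothetical toy: "regressional Goodhart". A proxy score equals the
# true value plus independent noise; picking the candidate with the best
# proxy score systematically overestimates its true value, and the gap
# grows with selection pressure.

import random

random.seed(0)  # deterministic for reproducibility

def selected_gap(n_candidates):
    """Proxy-minus-true gap for the candidate chosen by max proxy score."""
    best_true, best_proxy = 0.0, float("-inf")
    for _ in range(n_candidates):
        true = random.gauss(0, 1)          # quantity we actually care about
        proxy = true + random.gauss(0, 1)  # noisy measurement we select on
        if proxy > best_proxy:
            best_true, best_proxy = true, proxy
    return best_proxy - best_true

trials = 500
weak = sum(selected_gap(10) for _ in range(trials)) / trials
strong = sum(selected_gap(500) for _ in range(trials)) / trials

# Under mild selection the winner's proxy overstates its true value a
# little; under strong selection, by considerably more.
print(round(weak, 2), round(strong, 2))
```

The design point is that the overestimate comes purely from selecting on a noisy measure; no adversarial behavior is needed for the proxy and the target to come apart.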
Several MIRI researchers also received recognition from the AI Alignment Prize: Scott Garrabrant received first place in the first round and second place in the second round; MIRI Research Associate Vanessa Kosoy won first prize in the third round; and Scott and Abram Demski tied with Alex Turner for first place in the fourth round. MIRI senior staff also participated in AI research and strategy events and conversations throughout the year.

2018 Finances

Fundraising

2018 was another strong year for MIRI's fundraising. While the total raised of just over $5.1M was a 12% drop from the amount raised in 2017, the table below shows that our strong growth trend continued—with 2017, as I surmised in last year's review, looking like an outlier year driven by the large influx of cryptocurrency contributions during a market high in December 2017.7

Total contributions per year:

Year | Returning  | Unlapsed | New        | Total
2014 | $1,065,944 | $13,285  | $141,298   | $1,220,527
2015 | $826,243   | $50,292  | $626,679   | $1,503,214
2016 | $1,377,234 | $83,705  | $771,259   | $2,232,198
2017 | $4,085,948 | $334,140 | $1,429,613 | $5,849,701
2018 | $2,818,678 | $252,852 | $2,056,230 | $5,127,760

(In this table and those that follow, “Unlapsed” indicates contributions from past supporters who did not donate in the previous year.)

Highlights include:

- $1.02M, our largest ever single donation by an individual, from “Anonymous Ethereum Investor #2,” based in Canada, made through Rethink Charity Forward's recently established tax-advantaged fund for Canadian MIRI supporters.8
- $1.4M in grants from the Open Philanthropy Project: $1.25M in general support and $150k for our AI Safety Retraining Program.
- $951k during our annual fundraiser, driven in large part by MIRI supporters' participation in multiple matching campaigns during the fundraiser, including WeTrust Spring's Ethereum-matching campaign, Facebook's Giving Tuesday event, and, in partnership with Raising for Effective Giving (REG), the professional poker players' Double Up Drive.
- $529k from 2 grants recommended by the EA Funds Long-Term Future Fund.
- $115k from PokerStars, also through REG.

In 2018, we received contributions from 637 unique contributors, 16% fewer than in 2017.
This drop was largely driven by a 27% reduction in the number of new donors, partly offset by the continuing trend of steady growth in the number of returning donors:9

Number of contributors per year:

Year | Returning | Unlapsed | New | Total
2014 | 207       | 106      | 464 | 777
2015 | 208       | 54       | 242 | 504
2016 | 229       | 43       | 265 | 537
2017 | 244       | 71       | 428 | 743
2018 | 273       | 53       | 311 | 637

Total contributions per year, by contributor size:

Year | $0–$500 | $500–$5k | $5k–$50k | $50k+
2014 | $38,488 | $94,104  | $299,820 | $788,115
2015 | $34,314 | $156,714 | $484,218 | $827,968
2016 | $34,123 | $197,145 | $742,608 | $1,258,322
2017 | $46,536 | $311,586 | $843,296 | $4,648,382
2018 | $32,325 | $208,537 | $809,677 | $4,086,729

Donations of cryptocurrency were down in 2018 both in absolute terms (−$1.2M in value) and as a percentage of total contributions (23%, compared to 42% in 2017). It's plausible that if cryptocurrency values continue to rebound in 2019, we may see this trend reversed.

In 2017, donations received from matching initiatives increased dramatically, almost five-fold over the previous year. In 2018, our inclusion in two different REG-administered matching challenges, significantly increased engagement among MIRI supporters with Facebook's Giving Tuesday, and MIRI's win in WeTrust's Spring campaign offset a small decrease in corporate match dollars to improve slightly on 2017's matching total.
The following table shows the matching amounts received over the last 5 years:

Year | Corporate Matching | REG Challenges | Facebook Giving Tuesday | WeTrust Spring | Total
2014 | $78,863            | $0             | $0                      | $0             | $78,863
2015 | $82,191            | $28,250        | $0                      | $0             | $110,441
2016 | $98,268            | $0             | $0                      | $0             | $98,268
2017 | $173,022           | $295,899       | $11,372                 | $0             | $480,293
2018 | $151,823           | $274,779       | $40,072                 | $16,217        | $482,891

Spending

In our 2017 fundraiser post, I projected that we'd spend ~$2.8M in 2018. Towards the end of last year, I revised our estimate:

Following the amazing show of support we received from donors last year (and continuing into 2018), we had significantly more funds than we anticipated, and we found more ways to usefully spend it than we expected.
In particular, we've been able to translate the “bonus” support we received in 2017 into broadening the scope of our recruiting efforts. As a consequence, our 2018 spending, which will come in at around $3.5M, actually matches the point estimate I gave in 2017 for our 2019 budget, rather than my prediction for 2018—a large step up from what I predicted, and an even larger step from last year's [2017] budget of $2.1M.

The post goes on to give an overview of the ways in which we put this “bonus” support to good use. These included, in descending order by cost:

- Investing significantly more in recruiting-related activities, including our AIRCS workshop series, and scaling up the number of interns we hosted, with an increased willingness to pay higher wages to attract promising candidates to intern or trial with us.
- Filtering less on price relative to fit when choosing new office space to accommodate our growth, and spending more on renovations than we otherwise would have, in order to create a more focused working environment for research staff.
- Raising salaries for some existing staff, who were being paid well below market rates.

With concrete numbers now in hand, I'll go into more detail below on how we put those additional funds to work. Total spending came in just over $3.75M.
The table below compares our actual spending in 2018 with our projections, and with our spending in 2017.10

Category                            | Actual 2017 | Estimated 2018 Budget | Actual 2018
Research personnel (existing staff) | $840k       | $1,100k               | $1,300k
Research personnel (new staff)      | $280k       | $560k                 | $470k
General personnel                   | $566k       | $750k                 | $740k
Cost of doing business              | $341k       | $280k                 | $750k
Program activities                  | $57k        | $120k                 | $550k

At a high level, as expected, personnel costs in 2018 continued to account for the majority of our spending—though they represented a smaller share of total spending than in 2017, due to increased spending on recruitment-related activities along with one-time costs related to securing and renovating our new office space.

Our spending on recruitment-related activities is captured in the program activities category. The major ways we put additional funds to use, which account for the increase over my projections, break down as follows:

- ~$170k on internships: We hosted nine research interns for an average of ~2.5 months each. We were able to offer more competitive wages for internships, allowing us to recruit interns (especially those with an engineering focus) whom we otherwise would have had a much harder time attracting, given the other opportunities available to them. We are actively interested in hiring three of these interns, and have made formal offers to two of them. I'm hopeful that we'll have added at least one of them to the team by the end of this year.
- $54k on AI Safety Retraining Program grants, described above.
- The bulk of the rest of the additional funds in this category went towards funding our ongoing series of AI Risk for Computer Scientists workshops, described above.

Expenses related to our new office space are accounted for in the cost of doing business category. The surplus spending in this category resulted from:

- ~$300k for securing, renovating, and filling out our new office space.
Finding a suitable new space to accommodate our growth in Berkeley ended up being much more challenging and time-consuming than we expected.[11] We made use of additional funds to secure our preferred space ahead of when we were prepared to move, and to renovate the space to meet our needs; if we'd been operating with the budget I originally projected, we would almost certainly have ended up in a much worse space.

The remainder of the spending beyond my projection in this category comes from higher-than-expected legal costs to secure visas for staff, and from slightly higher-than-projected spending across many other subcategories.

Footnotes:

1. Our summaries of our more significant results below largely come from our 2018 fundraiser post.
2. Not to be confused with Nate Soares' forthcoming tiling agents paper.
3. Evan was a MIRI research intern, while Chris, Vladimir, and Joar are external collaborators.
4. This paper was previously cited in "Embedded Agency" under the working title "The Inner Alignment Problem."
5. The full PDF version of the paper will be released in conjunction with the last post of the sequence.
6. As noted in our summer updates: We had a large and extremely strong pool of applicants, with over 170 applications for 30 slots (versus 50 applications for 20 slots in 2017). The program this year was more mathematically flavored than in 2017, and concluded with a flurry of new analyses by participants. On the whole, the program seems to have been more successful at digging into AI alignment problems than in previous years, as well as more successful at seeding ongoing collaborations between participants, and between participants and MIRI staff.
The program ended with a very active blogathon, with write-ups including: Dependent Type Theory and Zero-Shot Reasoning; Conceptual Problems with Utility Functions (and follow-up); Complete Class: Consequentialist Foundations; and Agents That Learn From Human Behavior Can't Learn Human Values That Humans Haven't Learned Yet.
7. Note that amounts in this section may vary slightly from our audited financial statements, due to small differences between how we tracked donations internally and how we are required to report them in our financial statements.
8. A big thanks to Colm for all the work he's put into setting this up; have a look at our Tax-Advantaged Donations page for more information.
9. 2014 is anomalously high on this graph due to the community's active participation in our memorable SVGives campaign.
10. Note that these numbers will differ slightly from our forthcoming audited financial statements for 2018, due to subtleties of how certain types of expenses are tracked. For example, in the financial statements, renovation costs are considered a fixed asset that depreciates over time, and as such won't show up as an expense.
11. The number of options available in the relevant time frame was very limited, and most did not meet many of our requirements. Of the available spaces, the option that offered the best combination of size, layout, and location was looking for a tenant starting November 1st, 2018, while we weren't able to move until early January 2019. Additionally, the space was configured with a very open layout that wouldn't have met our needs, but that many other prospective tenants found desirable, such that we'd have to cover renovation costs.

The post 2018 in review appeared first on Machine Intelligence Research Institute.
  • May 2019 Newsletter
Updates

- A new paper from MIRI researcher Vanessa Kosoy, presented at the ICLR SafeML workshop this week: "Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help."
- New research posts: Learning "Known" Information When the Information is Not Actually Known; Defeating Goodhart and the "Closest Unblocked Strategy" Problem; Reinforcement Learning with Imperceptible Rewards.
- The Long-Term Future Fund has announced twenty-three new grant recommendations, and provided in-depth explanations of the grants. These include a $50,000 grant to MIRI, and grants to CFAR and Ought. LTFF is also recommending grants to several individuals with AI alignment research proposals whose work MIRI staff will be helping assess.
- We attended the Global Governance of AI Roundtable at the World Government Summit in Dubai.

News and links

- Rohin Shah reflects on the first year of the Alignment Newsletter.
- Some good recent AI alignment discussion: Alex Turner asks for the best reasons for pessimism about impact measures; Henrik Åslund and Ryan Carey discuss corrigibility as constrained optimization; Wei Dai asks about low-cost AGI coordination; and Chris Leong asks, "Would solving counterfactuals solve anthropics?"
- From DeepMind: Towards Robust and Verified AI: Specification Testing, Robust Training, and Formal Verification.
- Ilya Sutskever and Greg Brockman discuss OpenAI's new status as a "hybrid of a for-profit and nonprofit."
- Misconceptions about China and AI: Julia Galef interviews Helen Toner. (Excerpts.)

The post May 2019 Newsletter appeared first on Machine Intelligence Research Institute.
  • New paper: “Delegative reinforcement learning”
MIRI Research Associate Vanessa Kosoy has written a new paper, "Delegative reinforcement learning: Learning to avoid traps with a little help." Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads:

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states, and actions.

The goal of Kosoy's work on DRL is to put us on a path toward a deep understanding of learning systems with humans in the loop and formal performance guarantees, including safety guarantees. DRL tries to move us in this direction by providing models in which such performance guarantees can be derived. While these models still make many unrealistic simplifying assumptions, Kosoy views DRL as already capturing some of the most essential features of the problem, and she has a fairly ambitious vision of how this framework might be further developed.

Kosoy previously described DRL in the post Delegative Reinforcement Learning with a Merely Sane Advisor. One feature of DRL Kosoy described there, but omitted from the paper for space reasons, is DRL's application to corruption: given certain assumptions, DRL ensures that a formal agent will never have its reward or advice channel tampered with (corrupted).
As a special case, the agent's own advisor cannot cause the agent to enter a corrupt state. Similarly, the general protection from traps described in "Delegative reinforcement learning" also protects the agent from harmful self-modifications.

Another set of DRL results that didn't make it into the paper is Catastrophe Mitigation Using DRL. In this variant, a DRL agent can mitigate catastrophes that the advisor would not be able to mitigate on its own, something that isn't supported by the stricter assumptions about the advisor in standard DRL.

The post New paper: "Delegative reinforcement learning" appeared first on Machine Intelligence Research Institute.
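Kosoy's actual algorithm is a variant of posterior sampling RL with a formal regret analysis; as a rough intuition pump only, here is a minimal toy sketch of the delegation idea described in the abstract above, in which the agent defers to the advisor whenever its posterior assigns non-negligible probability to the candidate action being a trap. All names here (`ToyModel`, `drl_step`, the 0.05 threshold) are invented for illustration and do not appear in the paper.

```python
class ToyModel:
    """One hypothesis about the environment. `traps` is the set of
    (state, action) pairs this hypothesis considers irreversibly bad.
    Purely illustrative, not the paper's formalism."""
    def __init__(self, traps):
        self.traps = set(traps)

    def is_trap(self, state, action):
        return (state, action) in self.traps


def drl_step(posterior, state, candidate_action, advisor, risk_threshold=0.05):
    """Choose one action, delegating to the advisor when the posterior
    probability that the candidate action is a trap is non-negligible."""
    trap_prob = sum(p for p, model in posterior
                    if model.is_trap(state, candidate_action))
    if trap_prob > risk_threshold:
        return advisor(state)       # delegate: let the advisor act
    return candidate_action         # act autonomously


# A posterior over two hypotheses; the 10%-likely one thinks "jump" is a trap.
posterior = [(0.9, ToyModel([])), (0.1, ToyModel([("cliff", "jump")]))]
advisor = lambda s: "step_back"

assert drl_step(posterior, "cliff", "jump", advisor) == "step_back"  # risky: delegated
assert drl_step(posterior, "cliff", "walk", advisor) == "walk"       # safe: autonomous
```

Even this toy version captures the qualitative point: the agent never takes an action that any sufficiently probable hypothesis flags as a trap, at the cost of occasionally burdening the advisor.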
  • April 2019 Newsletter
Updates

- New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; A Concrete Proposal for Adversarial IDA.
- MIRI has received a set of new grants from the Open Philanthropy Project and the Berkeley Existential Risk Initiative.

News and links

- From the DeepMind safety team and Alex Turner: Designing Agent Incentives to Avoid Side Effects.
- From Wei Dai: Three Ways That "Sufficiently Optimized Agents Appear Coherent" Can Be False; What's Wrong With These Analogies for Understanding Informed Oversight and IDA?; and The Main Sources of AI Risk?
- Other recent write-ups: Issa Rice's Comparison of Decision Theories; Paul Christiano's More Realistic Tales of Doom; and Linda Linsefors' The Game Theory of Blackmail.
- OpenAI's Geoffrey Irving describes AI safety via debate on FLI's AI Alignment Podcast.
- A webcomic's take on AI x-risk concepts: Seed.

The post April 2019 Newsletter appeared first on Machine Intelligence Research Institute.

