OpenAI

  • Why Responsible AI Development Needs Cooperation on Safety
    We've written a policy research paper identifying four strategies that can be used today to improve the likelihood of long-term industry cooperation on safety norms in AI: communicating risks and benefits, technical collaboration, increased transparency, and incentivizing standards. Our analysis shows that industry cooperation on safety will be instrumental in ensuring that AI systems are safe and beneficial, but competitive pressures could lead to a collective action problem, potentially causing AI companies to under-invest in safety. We hope these strategies will encourage greater cooperation on the safe development of AI and lead to better global outcomes of AI. Read Paper It’s important to ensure that it’s in the economic interest of companies to build and release AI systems that are safe, secure, and socially beneficial. This is true even if we think AI companies and their employees have an independent desire to do this, since AI systems are more likely to be safe and beneficial if the economic interests of AI companies are not in tension with their desire to build their systems responsibly. This claim might seem redundant because developing and deploying products that do not pose a risk to society is generally in a company’s economic interest. People wouldn’t pay much for a car without functioning brakes, for example. But if multiple companies are trying to develop a similar product, they can feel pressure to rush it to market, resulting in less safety work prior to release. Such problems generally arise in contexts where external regulation is weak or non-existent. Appropriate regulation of goods and services provided in the marketplace can reduce corner-cutting on safety. This can benefit the users of goods and services as well as the sector itself—the airline sector as a whole benefits commercially from the fact that governments around the world are vigilant about safety, for example, and that when incidents occur, they are always investigated in detail. Conventional regulatory mechanisms may be less effective in dealing with AI, however, due to the rate at which the technology is developing and the large information asymmetries between developers and regulators. Our paper explores what factors might drive or dampen such a rush to deployment, and suggests strategies for improving cooperation between AI developers. Developers “cooperate” not by ceasing to compete but by taking appropriate safety precautions, and they are more likely to do this if they are confident their competitors will do the same. The Need for Collective Action on Safety If companies respond to competitive pressures by rushing a technology to market before it has been deemed safe, they will find themselves in a collective action problem. Even if each company would prefer to compete to develop and release systems that are safe, many believe they can’t afford to do so because they might be beaten to market by other companies. Problems like this can be mitigated by greater industry cooperation on safety. AI companies can work to develop industry norms and standards that ensure systems are developed and released only if they are safe, and can agree to invest resources in safety during development and meet appropriate standards prior to release. Some hypothetical scenarios: A company develops an image recognition model with very high performance and is in a rush to deploy it at scale, but the engineers at the company have not yet adequately evaluated the system's performance in the real world. 
The company also knows it lacks full testing standards to know the full "capability surface" of the model. Due to fears of being beaten to market by competitors in a particular niche, however, the company moves forward, gambling that their limited in-house testing will suffice to hedge against any major system failures or public blowback. A company wishes to deploy some semi-autonomous AI software onto physical robots, such as drones. This software has a failure rate that satisfies regulatory criteria, but because the company is racing to get the technology to market it knows that their product's popular "interpretability" feature gives misleading explanations that are intended more for reassurance than clarification. Due to limited expertise among regulators, this misbehavior falls through the cracks until a catastrophic incident, as does similar behavior by other companies racing to deploy similarly "interpretable" systems. Some collective action problems are more solvable than others. In general, a collective action problem is more solvable if the expected benefits of cooperating outweigh the expected benefits of not cooperating. The following interrelated factors increase the expected benefits of cooperating: High Trust Companies are more likely to cooperate on safety if they can trust that other companies will reciprocate by working towards a similar standard of safety. Among other things, trust that others will develop AI safely can be established by increasing transparency about resources being invested in safety, by publicly committing to meet a high standard of safety, and by engaging in joint work to find acceptable safety benchmarks. Shared Upside Companies have a stronger incentive to cooperate on safety if the mutual benefits from safe development are higher. The prospect of cooperation can be improved by highlighting the benefits of establishing good safety norms early, such as preventing incidents of AI failure and misuse, and establishing safety standards that are based on a shared understanding of emerging AI systems. Collaborative efforts like Risk Salon, which hosts events for people working in fraud, risk, and compliance, are a good example of this. These events facilitate open discussions between participants from different companies, and seem to be primarily motivated by the shared gain of improved risk mitigation strategies. Low Exposure Reducing the harms that companies expect to incur if another company decides not to cooperate on safety increases the likelihood that they themselves will abide by safety standards. Exposure can be reduced by discouraging violations of safety standards (e.g. reporting them) or by providing evidence of the potential risks associated with systems that don’t meet the relevant standards. When standards must be met to enter a market, for example, companies have little to lose if others don’t meet those standards. To comply with the RoHS directive, electronics manufacturers had to switch to lead-free soldering in order to sell their products in the EU. The possibility that one manufacturer would continue to use lead soldering would do little to affect cooperation with lead-reduction efforts, since their failure to comply would not be costly to other manufacturers. Low Advantage Reducing any advantages companies can expect to get by not cooperating on safety should increase overall compliance with safety standards. 
For example, companies producing USB connectors don’t expect to gain much from deviating from USB connector standards, because doing so will render their product incompatible with most devices. When standards have already been established and deviating from them is more costly than any benefits, advantage is low. In the context of AI, reducing the cost and difficulty of implementing safety precautions would help minimize the temptation to ignore them. Additionally, governments can foster a regulatory environment in which violating high-stakes safety standards is prohibited. Shared Downside Identifying the ways in which AI systems could fail if adequate precautions are not taken can increase the likelihood that AI companies will agree not to develop or release such systems. Shared downsides incentivize cooperation when failures are particularly harmful: especially if they are felt by the whole industry (e.g. by damaging public trust in the industry as a whole). After the Three Mile Island incident, for example, the nuclear power industry created and funded the INPO, a private regulator with the ability to evaluate plants and share the results of these evaluations within industry in order to improve operational safety. Collective action problems are susceptible to negative spirals where the loss of trust causes one party to stop cooperating, causing other parties to stop cooperating. At the same time, it is also possible to generate positive spirals where the development of trust causes some parties to cooperate, resulting in other parties cooperating. Cooperation Strategies We've found four strategies that can be used today to improve the likelihood of cooperation on safety norms and standards in AI. These are: 1. Promote accurate beliefs about the opportunities for cooperation Communicate the safety and security risks associated with AI, show that concrete steps can be taken to promote cooperation on safety, and make shared concerns about safety common knowledge. 2. Collaborate on shared research and engineering challenges Engage in joint interdisciplinary research that promotes safety and is otherwise conducive to fostering strong collaboration (e.g. work that involves combining complementary areas of expertise). 3. Open up more aspects of AI development to appropriate oversight and feedback Publicize codes of conduct, increase transparency about publication-related decision-making, and, provided that security and IP concerns are addressed, open up individual AI systems to greater scrutiny. 4. Incentivize adherence to high standards of safety Commend those that adhere to safety standards, reproach failures to ensure that systems are developed safely, and support economic, legal, or industry-wide incentives to adhere to safety standards. We think collective action problems may be a principal source of policy challenges as AI systems become increasingly powerful. This analysis focuses on the roles that industry can play in preventing such problems, but we anticipate that legal and political mechanisms will also play an important role in preventing and mitigating these issues. We also anticipate that identifying similar mechanisms to improve cooperation on AI safety between states and with other non-industry actors will be of increasing importance in the years to come. 
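One way to see how the five factors above interact is to write the cooperation condition as an explicit inequality. The notation below is our own illustrative shorthand, not the paper's: a developer prefers to cooperate on safety when

$$
\underbrace{p_{\text{trust}}\, U_{\text{shared}} \;-\; (1 - p_{\text{trust}})\, E_{\text{exposure}}}_{\text{expected benefit of cooperating}} \;>\; \underbrace{A_{\text{advantage}} \;-\; p_{\text{failure}}\, D_{\text{shared downside}}}_{\text{expected benefit of defecting}}
$$

where $p_{\text{trust}}$ is the probability that other developers reciprocate, $U_{\text{shared}}$ the shared upside of good safety norms, $E_{\text{exposure}}$ the harm suffered if others defect, $A_{\text{advantage}}$ the unilateral gain from cutting corners on safety, and $p_{\text{failure}}\, D_{\text{shared downside}}$ the expected share of an industry-wide failure. Each of the factors discussed above moves one of these terms in the direction that makes the inequality easier to satisfy.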
There is a great deal of uncertainty about the challenges that future AI systems may pose, but we believe that encouraging greater cooperation on the safe development of AI is likely to have a positive impact on the outcomes of AI development. While we acknowledge that such challenges exist, we advocate for a more thorough mapping of possible collaborations across organizational and national borders, with particular attention to research and engineering challenges whose solutions might be of wide utility. Areas to consider might include joint research into the formal verification of AI systems' capabilities and other aspects of AI safety and security with wide applications; various applied "AI for good" projects whose results might have wide-ranging and largely positive applications (e.g. in domains like sustainability and health); and joint development of countermeasures against global AI-related threats such as the misuse of synthetic media generation online. To achieve greater cooperation on safety, we need to make it common knowledge that such cooperation is in everyone’s interest, and that methods for achieving it can be identified, researched, and implemented today. Acknowledgments Thanks to those who provided helpful comments on earlier versions of this blog post: Helen Toner, Cullen O’Keefe, Larissa Schiavo, Danny Hernandez, Geoffrey Irving, Michael Page, Greg Brockman, Ashley Pilipiszyn, Adam Gleave, Chip Huyen, Paul Scharre, Jelena Luketina, Michael Horowitz, Rowan Zellers, Sarah Kreps, and Rebecca Crootof. Read more »
  • OpenAI Robotics Symposium 2019
    We hosted the first OpenAI Robotics Symposium on April 27, 2019. Robots that learn are an exciting path forward, yet there are differing approaches and opinions on how to make progress. The event brought together a diverse set of people from both robotics and machine learning communities as well as academics and industry leaders to create a platform to exchange ideas and address open questions in building complex robot systems. Why this event? Robots that learn are a development that will allow robots to become part of our everyday lives. While we have some ideas on how to get there, we think it is important to engage with people from other organizations and disciplines to exchange and discuss ideas. Creating these robots is inherently a multidisciplinary approach—it not only requires technical expertise, but also a deeper understanding of how these robots can be deployed safely and interact with humans in the real world. The participants We hosted ~80 external attendees at our office and ~200 people joined remotely via our livestream throughout the day. We had attendees from industry labs like Google, Facebook, and NVIDIA in addition to students, postdocs and professors from universities like Stanford, UC Berkeley, CMU and MIT. We also had hobbyists, artists, roboticists, and machine learning researchers in the crowd. The talks Learning Dexterity Wojciech Zaremba, OpenAI Wojciech talks about our recent research, "Learning Dexterity," which uses sim2real with domain randomization and large-scale reinforcement learning with memory-augmented policies. This approach leads to meta-learning that allows our policy to transfer to the physical robot without ever training on the robot. Watch talk View slides Learning From Play Pierre Sermanet, Google Brain Pierre describes how play can provide self-supervision for representation learning. This approach can be used to acquire a diverse set of skills that can be used and recombined to solve novel tasks without ever providing any labels or rewards. Watch talk View slides Doing for Our Robots What Nature Did for Us Leslie Kaelbling, MIT Leslie explains how we have to think about learning both in the "robot factory" (i.e., at engineering time) as well as "in the wild" (i.e., when deployed). Leslie describes her overall architecture for building intelligent robots and how it can be used to build robots that acquire new skills. Watch talk View slides Treating People as Optimizers in Human-Robot Interaction Anca Dragan, UC Berkeley Anca explores the question of what inductive bias is right when learning for human-robot interaction. She proposes a framework for predicting human actions that broadens the assumption that humans are noisy-rational and allows for strategic human behavior, as well as systematic sub-optimality (like not knowing the exact physics of the environment, or still learning about their preferences). Watch talk View slides Social-Emotional Intelligence in Human-Robot Interactions Jin Joo Lee, MIT / Amazon Jin Joo dives into the why and how of making robots lifelike and interactive through social-emotional intelligence. These social robots can read and understand our emotional expressions and also communicate back to us in the same way. Watch talk What Should Be Learned Chris Atkeson, CMU Chris critically discusses the gap between robot learning research and robot programming practice. He asks what would make learning robots truly useful and outlined his ideas on how to get there. 
Watch talk View slides Robots That Adapt Like Natural Animals Jeff Clune, Uber AI / University of Wyoming Jeff describes work he and his collaborators published in Nature on how to build robots that can rapidly adapt at runtime if they become damaged. The proposed approach could ultimately lead to robots that are much more able to adapt to damage or unexpected environmental conditions. Watch talk View slides Dexterity demo Since the event was hosted at our office, we took the opportunity to perform a live demo of our humanoid robot hand manipulating a block using vision and reinforcement learning. We were excited to show the hand to people and have the OpenAI Robotics team "on hand" to answer their questions! We hope to do this again in the future as it is a very different experience to see this in person. Next steps We were extremely pleased with the outcome of the event—this was an experimental format and our expectations were definitely exceeded. The talks during the day led to interesting discussions within our team and resulted in some new ideas (e.g., self-supervision) and perspectives (e.g., traditional robotics vs deep learning robotics). After chatting with the participants and speakers, it was clear everyone felt they benefited from this event and left with a shared understanding of the diversity in the different approaches to solving the same problems. Given this feedback, we intend to repeat this format in the future, possibly as an annual symposium. We'll share details about upcoming events at a later date. If you would like to help us do research on robots that learn, please get in touch! We’re hiring. Thanks to Loren Kwan, Diane Yoon, and Maddie Hall for co-organizing the event, to all the OpenAI staff volunteers, and to Blake Tucker for filming and photography. Read more »
  • OpenAI Scholars Spring 2019: Final Projects
    Our second class of OpenAI Scholars has concluded, with all eight scholars producing an exciting final project showcased at Scholars Demo Day at OpenAI. Over the past three months, we’ve seen how experienced engineers working in software, medicine, physics, child development and other fields can become machine learning practitioners with our combination of educational resources and mentorship. Rewatch Demo Day (OpenAI Scholars Demo Day on May 14, 2019) Fatma Tarlaci Jonathan Michaux Nancy Otero Elynn Chen Helen (Mengxin) Ji Yuhao Wan Janet Brown Edgar Barraza Fatma Tarlaci Fine-Tuning GPT-2 Small for Question Answering Mentor: Jonathan Raiman Working from Austin, TX Website Twitter Previous Role: Eric Roberts Fellow in Computer Science at Stanford University Despite the recent successes of powerful language models, reasoning remains a challenging task in Natural Language Understanding. Question Answering (QA) requires a comprehensive mix of language processing and reasoning skills within a single task. Evaluating a system’s successes and failures on QA tasks provides valuable insights into its reasoning mechanism. This project experiments with fine-tuning the GPT-2 small model for QA to analyze its performance on reasoning. Blog Post GitHub Repo The OpenAI Scholars program allowed me to build a solid foundation in deep learning and gain a thorough understanding of Natural Language Processing and Understanding. The program also allowed me to define my research interests in AI more clearly by providing me with the resources to experiment with various subfields of deep learning. Jonathan Michaux Using Intrinsic Motivation to Solve Robotic Tasks with Sparse Rewards Mentor: Feryal Behbahani Working from Chicago and San Francisco Website Twitter Previous Role: PhD student in Cell and Molecular Biology at the University of Chicago Many robotics problems are naturally formulated such that the extrinsic rewards to the agent are either sparse or missing altogether. These problems can be extremely difficult to solve as the environment provides limited feedback to guide the agent toward accomplishing its goal. Previous work has shown that agents that train using prediction error as an intrinsic reward are able to learn across a wide range of domains, including Atari games and continuous control tasks. In this project, I used curiosity-driven exploration to solve challenging robotics tasks with sparse rewards. I formulated the intrinsic reward as the error in the agent’s ability to predict its next state, given its current state and executed action. My results demonstrated that this approach is capable of solving several difficult robotic manipulation tasks in simulation. Blog Post GitHub Repo Before joining the Scholars program I had already undertaken a plan to self-study robotics. The OpenAI Scholars program gave me the opportunity to greatly enhance my self-study with a curriculum focused exclusively on Deep Reinforcement Learning. After spending 8 weeks reading papers and implementing core Deep RL algorithms, I was able to apply what I learned to solving a suite of challenging robotics problems.
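As a rough illustration of the curiosity-driven setup described in Jonathan's project above, the sketch below computes an intrinsic reward as the prediction error of a learned forward dynamics model. The module and function names, network sizes, and the reward-mixing coefficient are illustrative assumptions, not code from the project.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next observation from the current observation and action."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def intrinsic_reward(model, obs, act, next_obs):
    """Curiosity bonus: squared error of the forward model's next-state prediction."""
    with torch.no_grad():
        pred = model(obs, act)
    return ((pred - next_obs) ** 2).mean(dim=-1)

# In training, the agent would maximize extrinsic_reward + beta * intrinsic_reward,
# while the forward model itself is regressed toward the observed next states,
# so states the model predicts poorly keep yielding exploration bonuses.
```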
Nancy Otero CREATURE—Human Learning Powered by Machine Learning Mentor: Kai Arulkumaran Working from New York City and Mexico City Website Twitter Previous Roles: Software engineer at Palo Alto Networks; Founding Director of Learning Design and Research; Founded nonprofit based in Mexico; Stanford Education School Project-based learning is a very effective and enjoyable way to learn, but teachers often struggle to find appropriate projects for their students. Despite thousands of projects existing online, most are poorly labeled and thus difficult for teachers to find. Accurately labeling the thousands of online projects would be daunting and expensive on a case-by-case basis. CREATURE is a proof-of-concept model that labels online projects with 75–90% accuracy. Blog Post The OpenAI Scholars program demonstrated that given the right mentorship, trust, and financial support, learning ML to do a self-directed project is possible. I learned about language models, data collection and processing, model tuning, and how to integrate all that into a ready-to-use model for educational purposes. I'm excited to keep working on my project, dive deeper into the relationship between human intelligence and AI, and translate what I learned during this program into learning activities others can use. Elynn Chen Reinforcement Learning for Medical Applications Mentor: Lilian Weng Working from Princeton, NJ Website Previous Role: PhD student at Princeton University I developed a computer system that learns from historical electronic health records (EHR) and recommends optimal therapeutic treatment—dosage of IV fluids and vasopressor—based on a patient's vitals and lab values. I specifically considered policy iteration and tabular Q-learning with discrete state and action spaces. Results revealed that the optimal RL policies recommend lower doses of IV fluids and higher doses of vasopressors than the physician’s actual treatments. Off-policy evaluation showed that the optimal policy learned by Q-learning had a higher reward than the one learned by policy iteration. The system can be easily extended to deal with continuous state/action spaces and incorporate other off-policy RL algorithms. Blog Post GitHub Repo I learned about NNs, CNNs, RNNs, LSTMs and deep reinforcement learning. I implemented different NN architectures and most RL algorithms including DQN, VPG, TRPO, PPO, and DDPG. Before this program, I majored in Statistics and had no experience with deep learning. The OpenAI Scholars program provided me with the guidance and resources to learn core deep learning methods in a short amount of time. Helen (Mengxin) Ji Sentiment Analysis Using Reinforcement Learning Mentor: Azalia Mirhoseini Working from the Bay Area Website Twitter Previous Role: PhD student in Economics at UC Davis We proposed novel models that combine reinforcement learning (RL) methods and supervised NLP methods to predict sentence sentiment. We formulated the sentiment-analysis task as a sequential decision process with the goal of applying RL methods to sentiment analysis. For the model involving a policy network and classification network, we found that adding an RL method can improve the performance of the transformer model and produce comparable results on the pre-trained BERT model. We concluded that for concrete classification problems in a language model, a good reward function definition is an important component for RL training.
Blog Post This program gave me the opportunity to learn hands-on from current language models and gain a deeper understanding of RL methods to implement in my project. After these three months, I discovered my key interests in the field of AI and the Scholars program provided me with valuable resources to learn, practice and deploy interesting ideas in this space. Yuhao Wan Exploring Gamma: Discount of the Future, or Weight of the Past Mentor: Josh Achiam Working from the Bay Area Website Previous Role: REU-CAAR summer research group at Carleton College The role of the discount factor is often neglected in deep reinforcement learning (DRL). In this project, I discovered the dual role of the discount factor in deep Q-networks: it encodes intertemporal preference and confidence in bootstrapping. In light of this hypothesis, I designed a simple myopia scheme that improves Baselines performance in various customized Gridworld environments. The experimental results demonstrated that the time-varying scheme could be robust and effective in more general settings, beyond DQN and the discrete action/state framework. Blog Post GitHub Repo The Scholars program allowed me to quickly gain a range of important skill sets. Over the first two months of self-designed study, I learned about the theory of reinforcement learning and became acquainted with how to implement deep reinforcement learning algorithms from scratch. I also appreciated the freedom and support I received as I worked on my final project. At the end of the program, I now feel more confident and ready to embark on new challenges ahead. Janet Brown Visualizing & Evaluating Image Synthesis GANs using the Techniques of Activation Atlases Mentor: Christy Dennison Working from San Francisco Website Twitter Previous Roles: Atakote; Harvard Business School; McKinsey & Company More and more realistic imagery is being achieved by generative models—yet we still struggle to effectively evaluate and understand them. I focused on different ways to understand and evaluate image synthesis GANs, using the approach of Distill’s Activation Atlas—a GAN-tlas! Using this method we were able to not only measure the difference in numerical terms, but also in highly visual terms—seeing inside the black box of what a neural network sees when it encounters both real and fake images. Blog Post Before this program, I focused on applying simple DL models in the AR/VR space. This program gave me the time to dig into the foundations of DL and investigate the ‘black box’ of neural networks. Not only was the program an opportunity to do this, but to do so with access to leaders in the field who were willing to share their insights. Edgar Barraza Knowledge Distillation For Transformer Language Models Mentor: Susan Zhang Working from Ithaca, NY Website Twitter Previous Role: Physics at Cornell University With the advent of the transformer, neural networks have the power to generate language like a human, summarize text, answer questions and so much more! As they become more powerful, they also become larger in size, making them increasingly difficult to run on mobile devices. To make these tools more accessible, this project explored knowledge distillation with transformer language models by using a large, well-trained transformer as a teacher to a smaller untrained student network. Blog Post GitHub Repo The OpenAI Scholars program gave me the opportunity to learn the latest and greatest advancements in Natural Language Processing.
I was also given the resources to implement and explore a massive new computational idea, enabling me to quickly learn the skills to execute my ideas. Our Scholars demonstrate core technical skills across various expert domains, along with the self-motivation critical for a self-directed program like this one. They each entered the field of machine learning as relative newcomers, and we hope their progress shows how accessible machine learning is. To begin your learning journey, check out some of our educational materials. More information about the next class of Scholars and how to apply will be announced in July. Stay tuned! Thanks to AWS for providing compute credits to the scholars. Additional thank you to our dedicated community mentors for their time advising the scholars on their projects. Read more »
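To make the distillation idea from Edgar's project above concrete, here is a minimal sketch of the usual soft-target loss, in which the student matches the teacher's temperature-softened token distribution. The temperature, mixing weight, and function names are illustrative assumptions rather than details from the project.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions.

    Both logits tensors have shape (num_positions, vocab_size)."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def total_loss(student_logits, teacher_logits, targets, alpha=0.5):
    """Mix the soft-target loss with ordinary cross-entropy on the true tokens."""
    vocab = student_logits.size(-1)
    s = student_logits.view(-1, vocab)
    t = teacher_logits.view(-1, vocab)
    hard = F.cross_entropy(s, targets.view(-1))
    soft = distillation_loss(s, t)
    return alpha * soft + (1 - alpha) * hard
```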
  • OpenAI Fellows Fall 2018: Final Projects
    Our second class of OpenAI Fellows has wrapped up, with each Fellow going from a machine learning beginner to core OpenAI contributor in the course of a 6-month apprenticeship. We are currently reviewing applications on a rolling basis for our next round of OpenAI Fellows Summer 2019. Apply for Summer 2019 During this time, we’ve seen how expertise in other scientific fields like classical music, statistics, and mathematics can yield insights to push AI research forward. All 6 Fellows have completed projects investigating a novel research idea while embedded in an OpenAI research team. We’re also excited to welcome all 6 of our Fall Fellows to OpenAI as full-time members of our technical staff! Final Projects Christine Payne Team—Language Mentor—Alec Radford Previous Role: Pianist What I Learned: "The Fellows program provided a great balance of freedom and support. I enjoyed spending the first two months reading papers and learning to implement them, and I really appreciated having a mentor who helped me pick the best papers or ideas to pursue. I was also able to work on my own and experiment with different ideas, but Alec and others on the team were always very generous with their time when I was stuck or needed advice. At the start of 2019, we were asked to think "What do I need to do to make my work this coming year the best work of my life?" For me, a big part of the answer is to work at OpenAI, as part of such a uniquely talented and motivated team." Final Project: I created MuseNet, a MIDI music model based on the same transformer architecture that powers GPT-2. MuseNet generates 2–4 minute compositions in many different musical styles. To do this, I collected hundreds of thousands of MIDI files from the web, experimented with different tokenization schemes, developed a way to condition samples based on a particular style or composer, and developed a co-composer tool to enable joint human/AI compositions. Blog PostListen to concert What's Next: Joining the Language team at OpenAI, working to improve MuseNet and collaborate with musicians. Jacob Hilton Team—Reinforcement learning Mentor—John Schulman Previous Roles: Quantitative researcher/trader at Jane Street, PhD in Mathematics at Leeds. What I Learned: "The Fellows program has been a fantastic introduction to machine learning research. It has been intense—a bit like the first year of my PhD condensed into six months. It was reinvigorating to have the first couple of months set aside to just learn, following a nicely-curated curriculum of papers and programming exercises. Just as valuable to learn from has been conducting my first machine learning research project, including all the inevitable false starts and failed experiments. Throughout I've been surrounded by experts who have been eager to bounce around ideas with me, and my mentor's indispensable guidance has helped to hone my research intuition and kept my project on track." Final Project: I studied how to make bias-variance tradeoffs in reinforcement learning. There are several hyperparameters of reinforcement learning algorithms that can be viewed as making a tradeoff between bias (systematic error) and variance (random error). For example, the discount rate controls the amount of bias towards shorter-term rewards, which tend to have less variance. I developed a general method of choosing these hyperparameters by directly measuring the bias and variance of gradients. 
The method also works in other contexts involving stochastic gradient descent outside of reinforcement learning. [Figure: Bias and variance measurements for a CoinRun agent trained using PPO. Lower discount rates typically give lower-variance gradients, but become increasingly biased as training continues and the agent learns to model the longer-term effects of its actions.] What's Next: Joining the RL team at OpenAI, exploring new research directions such as interpretability for RL. Todor Markov Team—Multiagent Mentor—Igor Mordatch Previous Role: Software engineer at Blend; BS in Symbolic Systems and MS in Statistics at Stanford University. What I Learned: "The Fellows program has been great in terms of both providing a good overview of the current state of deep learning and reinforcement learning research, and also allowing me to get hands-on experience with doing research in the field. The mentorship aspect was also a crucial component, and it has been tremendously helpful for beginning to build a sense of research taste." Final Project: I worked on evaluating skill emergence in multiagent environments by creating several evaluation tasks and testing whether transfer learning occurs when an agent trained in the multiagent environment has to learn those evaluation tasks. I also tried to evaluate how much of the observed transfer is caused by useful behaviors being learned in the multiagent environment, and how much is caused by useful mental representations being learned. What's Next: Joining the Multiagent team at OpenAI, continuing to work on transfer learning. Mark Chen Team—Algorithms Mentor—Ilya Sutskever Previous Role: Quantitative trader What I Learned: "The Fellows program provided me with a structured and efficient path to becoming a productive AI researcher. Ilya and Alec always made time for mentorship and to help me refine my ideas. With Ilya's enthusiasm, it's hard not to be excited about the future of generative models research!" Final Project: I worked on scaling image transformers to generate coherent images at high resolution. First, I explored the space of multiscale architectures, which allow for faster training and inference. Next, I focused on scaling past GPU memory limits by pipelining the models and alternatively porting them to run on TPU. Finally, I was involved in a team effort to use these large scale models to see how representations learned by generative pretraining aid us in solving downstream supervised image tasks. What's Next: Joining the Algorithms team at OpenAI, continuing work on image transformers. Lei Zhang Team—Robotics Mentor—Matthias Plappert Previous Role: Software developer; PhD in Coding & Information Theory at the University of Toronto. What I Learned: "The Fellows program is very well-suited for bringing a researcher from another technical field up-to-date on the latest deep learning techniques. Mentorship was a significant factor in my growth as an AI researcher. I always felt that I could discuss ideas and received lots of feedback that helped calibrate my ideas. My experience with deep RL, meta-learning, and solving real-world problems in robotics definitely shaped my research interests and I look forward to exploring them in my future research." Final Project: I studied a transfer metric that can predict the performance of an RL policy trained in simulation when deployed on a physical robot. While training in simulation is highly scalable and efficient, simulators are not perfect models and policies often perform poorly in the real world.
The transfer metric does not require repeated rollouts on a physical robot. It helps to resolve the sim-to-real transfer problem by predicting which policy and training procedure will lead to better real-world performance. What's Next: Joining the Robotics team at OpenAI, continuing to work on improving sim-to-real transfer. Mikhail Pavlov Team—Hardware Mentor—Scott Gray Previous Role: Software developer What I Learned: "The Fellows program allowed me to get acquainted with the field of machine learning research. I think the curriculum-based learning and mentorship were two very important aspects of this program that helped me to do my research effectively. I also learned that doing research is quite challenging—not all ideas work as you expect, but if you continue formulating hypotheses and checking one thing at a time, eventually you will find a promising direction and get good results." Final Project: We studied techniques to learn sparsity patterns in deep neural networks and how structure in sparsity affects parameter efficiency. We developed an additive pruning approach for learning sparsity, in which, during training, we run a few cycles of adding and pruning blocks of weights. Specially designed kernels for block-sparse matrix multiplication and this additive pruning approach allowed us to explore more diverse topologies than had previously been possible. We showed that sparse models are more parameter efficient and give lower loss than dense networks for the same parameter budget. What's Next: Joining the Hardware team at OpenAI, continuing to investigate sparsity in neural networks. Next Steps We’d like to congratulate our Fall 2018 Fellows on their outstanding work and thank them for their contributions to OpenAI. We are excited to see their research continue! If you want to go from a beginner to producing world-class ML contributions, consider applying for our next round of OpenAI Fellows, starting July 2019. We are currently accepting applications and reviewing them on a rolling basis, so apply early! As part of our effort to educate more people like our class of Fellows, we recently open sourced part of their introductory curriculum. You can start your ML education today by completing our tutorial, “Spinning up in Deep RL.” Spinning up in Deep RL consists of examples of RL code, educational exercises, documentation, and tutorials that will help you become a skilled practitioner in RL. Read more »
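As a loose illustration of the block-sparsity idea in Mikhail's project above, the sketch below performs simple magnitude-based block pruning of a weight matrix. It is not the additive add-and-prune schedule or the custom block-sparse kernels described in the post; the block size, keep fraction, and names are illustrative assumptions.

```python
import numpy as np

def prune_blocks(weight, block=32, keep_fraction=0.25):
    """Zero out the lowest-magnitude blocks of a 2-D weight matrix.

    Returns the pruned matrix and a boolean mask over blocks (True = kept)."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0
    # Tile the matrix into (row_blocks, col_blocks, block, block).
    tiles = weight.reshape(rows // block, block, cols // block, block).transpose(0, 2, 1, 3)
    scores = np.abs(tiles).mean(axis=(2, 3))           # one importance score per block
    threshold = np.quantile(scores, 1.0 - keep_fraction)
    mask = scores >= threshold
    tiles = tiles * mask[:, :, None, None]
    pruned = tiles.transpose(0, 2, 1, 3).reshape(rows, cols)
    return pruned, mask

w = np.random.randn(256, 256).astype(np.float32)
w_sparse, block_mask = prune_blocks(w)
print(block_mask.mean())  # fraction of blocks kept, roughly keep_fraction
```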
  • MuseNet
    We've created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text. Samples Since MuseNet knows many different styles, we can blend generations in novel ways[1]. Here the model is given the first 6 notes of a Chopin Nocturne, but is asked to generate a piece in a pop style with piano, drums, bass, and guitar. The model manages to blend the two styles convincingly, with the full band joining in at around the 30 second mark: Try MuseNet We’re excited to see how musicians and non-musicians alike will use MuseNet to create new compositions[2]! In simple mode (shown by default), you'll hear random uncurated samples that we've pre-generated. Choose a composer or style, an optional start of a famous piece, and start generating. This lets you explore the variety of musical styles the model can create. In advanced mode you can interact with the model directly. The completions will take longer, but you'll be creating an entirely new piece. Some of MuseNet's limitations include: The instruments you ask for are strong suggestions, not requirements. MuseNet generates each note by calculating the probabilities across all possible notes and instruments. The model shifts to make your instrument choices more likely, but there's always a chance it will choose something else. MuseNet has a more difficult time with odd pairings of styles and instruments (such as Chopin with bass and drums). Generations will be more natural if you pick instruments closest to the composer or band’s usual style. Composer and instrumentation tokens We created composer and instrumentation tokens to give more control over the kinds of samples MuseNet generates. During training time, these composer and instrumentation tokens were prepended to each sample, so the model would learn to use this information in making note predictions. At generation time, we can then condition the model to create samples in a chosen style by starting with a prompt such as a Rachmaninoff piano start: Or prompted with the band Journey, with piano, bass, guitar, and drums: We can visualize the embeddings from MuseNet to gain insight into what the model has learned. Here we use t-SNE to create a 2-D map of the cosine similarity of various musical composer and style embeddings. Hover over a specific composer or style to see how it relates to others. Long-term structure MuseNet uses the recompute and optimized kernels of Sparse Transformer to train a 72-layer network with 24 attention heads—with full attention over a context of 4096 tokens. This long context may be one reason why it is able to remember long-term structure in a piece, like in the following sample imitating Chopin: It can also create musical melodic structures, as in this sample imitating Mozart: Music generation is a useful domain for testing the Sparse Transformer as it sits on a middle ground between text and images. 
It has the fluid token structure of text (in images you can look back N tokens and find the row above, whereas in music there’s not a fixed number for looking back to the previous measure). Yet we can easily hear whether the model is capturing long term structure on the order of hundreds to thousands of tokens. It’s much more obvious if a music model messes up structure by changing the rhythm, in a way that it’s less clear if a text model goes on a brief tangent. Dataset We collected training data for MuseNet from many different sources. ClassicalArchives and BitMidi donated their large collections of MIDI files for this project, and we also found several collections online, including jazz, pop, African, Indian, and Arabic styles. Additionally, we used the MAESTRO dataset. The transformer is trained on sequential data: given a set of notes, we ask it to predict the upcoming note. We experimented with several different ways to encode the MIDI files into tokens suitable for this task. First, a chordwise approach that considered every combination of notes sounding at one time as an individual "chord", and assigned a token to each chord. Second, we tried condensing the musical patterns by only focusing on the starts of notes, and tried further compressing that using a byte pair encoding scheme. We also tried two different methods of marking the passage of time: either tokens that were scaled according to the piece’s tempo (so that the tokens represented a musical beat or fraction of a beat), or tokens that marked absolute time in seconds. We landed on an encoding that combines expressivity with conciseness: combining the pitch, volume, and instrument information into a single token. bach piano_strings start tempo90 piano:v72:G1 piano:v72:G2 piano:v72:B4 piano:v72:D4 violin:v80:G4 piano:v72:G4 piano:v72:B5 piano:v72:D5 wait:12 piano:v0:B5 wait:5 piano:v72:D5 wait:12 piano:v0:D5 wait:4 piano:v0:G1 piano:v0:G2 piano:v0:B4 piano:v0:D4 violin:v0:G4 piano:v0:G4 wait:1 piano:v72:G5 wait:12 piano:v0:G5 wait:5 piano:v72:D5 wait:12 piano:v0:D5 wait:5 piano:v72:B5 wait:12 Sample encoding which combines pitch, volume, and instrument. During training, we: Transpose the notes by raising and lowering the pitches (later in training, we reduce the amount of transposition so that generations stay within the individual instrument ranges). Augment the volumes, turning up or turning down the overall volumes of the various samples. Augment timing (when using the absolute time in seconds encoding), effectively slightly slowing or speeding up the pieces. Use mixup on the token embedding space We also create an inner critic: the model is asked during training time to predict whether a given sample is truly from the dataset or if it is one of the model's own past generations. This score is used to select samples at generation time. Embeddings We added several different kinds of embeddings to give the model more structural context. In addition to the standard positional embeddings, we added a learned embedding that tracks the passage of time in a given sample. This way, all of the notes that sound at the same time are given the same timing embedding. We then add an embedding for each note in a chord (this mimics relative attention, since it will be easier for the model to learn that note 4 needs to look back at note 3, or else at note 4 of the previous chord). Finally, we add two structural embeddings which tell the model where a given musical sample is within the larger musical piece. 
One embedding divides the larger piece into 128 parts, while the second embedding is a countdown from 127 to 0 as the model approaches the (end) token. We’re excited to hear what people create! If you create a piece you like, you can upload it to a free service like Instaudio and then tweet us the link (the MuseNet demo has a tweet button to help with this). If you’re interested in learning more about OpenAI’s music work, consider applying to join our team. Please feel free to email us with suggestions for the MuseNet demo. We'd also love to hear from you if you're interested in composing with MuseNet in more depth, or if you have MIDI files you'd like to add to the training set. MuseNet played an experimental concert on April 25th, 2019, livestreamed on OpenAI’s Twitch channel, in which no human (including us) had heard the pieces before. Acknowledgments Thanks to Rewon Child and Scott Gray for their work on the Sparse Transformer, and Jeff Wu and Alec Radford for their work on GPT-2. We also thank the following for feedback on drafts of this post: Greg Brockman, Ilya Sutskever, Durk Kingma, Arvind Neelakantan, Tim Salimans, Rob Laidlow, Judith Finell, Moni Simeonov, Ray Iwazumi, Sam McCandlish, Miles Brundage, Jack Clark, Jonas Schneider, Chris Olah. Editor Ashley Pilipiszyn Design & Development Justin Jay Wang, Nicholas Benson, Eric Sigler Cover Artwork Ben Barry Footnotes If you're interested in other projects for creating AI-generated music using transformers, we recommend checking out Magenta's piano generation work. ↩︎ For use of outputs created by MuseNet, please cite this blog post as Payne, Christine. "MuseNet." OpenAI, 25 Apr. 2019, openai.com/blog/musenet Please note: We do not own the music output, but kindly ask that you not charge for it. While unlikely, we make no guarantee that the music is free from external copyright claims. ↩︎ Read more »
  • Generative Modeling with Sparse Transformers
    We've developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously. Read Paper / View Code One existing challenge in AI research is modeling long-range, subtle interdependencies in complex data like images, videos, or sounds. The Sparse Transformer incorporates an $O(N \sqrt{N})$ reformulation of the $O(N^2)$ Transformer self-attention mechanism, along with several other improvements, to apply it directly to these rich data types. Previously, models used on these data were specifically crafted for one domain or difficult to scale to sequences more than a few thousand elements long. In contrast, our model can model sequences with tens of thousands of elements using hundreds of layers, achieving state-of-the-art performance across multiple domains. At OpenAI, we're using it to help us build AI systems that possess a greater ability to understand the world. Deep Attention In Transformers, every output element is connected to every input element, and the weightings between them are dynamically calculated based upon the circumstances, a process called attention. While it is believed that this allows Transformers to be more flexible than models with fixed connectivity patterns, in practice it requires the creation of an $N\times N$ attention matrix for every layer and attention head, which can consume large amounts of memory when applied to data types with many elements, like images or raw audio.

Data type | Stored | Recomputed
1024 text tokens (several paragraphs) | 1.0 GB | 16 MB
32x32x3 pixels (CIFAR-10 image) | 9.6 GB | 151 MB
64x64x3 pixels (ImageNet 64 image) | 154 GB | 2.4 GB
24,000 samples (~2 seconds of 12 kHz audio) | 590 GB | 9.2 GB

Attention memory usage for a deep Transformer (64 layers and 4 heads) when matrices are stored in memory or recomputed during the backward pass. For reference, standard GPUs used for deep learning typically have memory of 12-32 GB. One way to reduce this is by recomputing the attention matrix from checkpoints during backpropagation, a well-established technique in deep learning for reducing memory usage at the cost of more computation. When done for the attention matrix in Transformers, it means the largest memory cost becomes independent of the number of layers, letting us train networks with substantially greater depth than possible previously. In practice, we found that Transformers with depth up to 128 layers outperformed shallower networks on benchmark tasks like CIFAR-10. To train these models with increased depth, we made several adjustments to the ordering of operations in the transformer and modified the initialization scheme. Full details can be seen in our paper. Sparse Attention Even computing a single attention matrix, however, can become impractical for very large inputs. We instead use sparse attention patterns, where each output position only computes weightings from a subset of input positions. When the subset is small relative to the full set of inputs (say, $\sqrt{N}$ elements instead of $N$ elements), the resulting attention computation becomes tractable even for very long sequences, with an algorithmic complexity of $O(N \sqrt{N})$ instead of $O(N^2)$.
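To give a feel for how a factorized sparse pattern reduces the $O(N^2)$ cost, here is a toy mask in the spirit of strided attention. It merges the "previous positions" and "strided look-back" heads into a single boolean mask and is our simplification for illustration, not the paper's kernels.

```python
import numpy as np

def strided_attention_mask(n, stride):
    """Causal mask where position i attends to the previous `stride` positions
    and to every position a multiple of `stride` steps behind it.

    With stride ~ sqrt(n) this keeps O(n * sqrt(n)) entries instead of O(n^2)."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            local = (i - j) < stride          # recent "row" positions
            column = (i - j) % stride == 0    # strided "column" positions
            mask[i, j] = local or column
    return mask

m = strided_attention_mask(n=64, stride=8)
print(int(m.sum()), "of", m.size, "entries attended")  # far fewer than the full lower triangle
```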
To assess the feasibility of the approach, we first visualized the learned attention patterns for deep Transformers on images, finding that many showed interpretable and structured sparsity patterns. Each of the below images shows which input pixels (highlighted in white) are attended to by a given attention head in order to predict the next value in the image. When the input portions are focused on small subsets and show a high degree of regularity, the layer is amenable to sparsification. A sampling of them is displayed here for a 128-layer model on CIFAR-10 images:

[Figure: Layers 19 and 20] Learned attention patterns (white highlight) for several layers of a 128-layer CIFAR-10 network. These layers learned to separate attention across two dimensions. Layer 19 summarizes information for each row, and layer 20 aggregates those summaries by column, leading to an efficient factorization of the full attention operation.

[Figure: Layers 6 and 36] Some layers learned to access a positional memory, often attending to similar locations regardless of the input data or timestep (layer 6). Other layers learned highly data-dependent access patterns (layer 36).

While many layers displayed sparse structure, some layers clearly display dynamic attention that stretches over the entirety of the image. In order to preserve the ability of our network to learn such patterns, we implemented a two-dimensional factorization of the attention matrix, where the network can attend to all positions through two steps of sparse attention.

[Figure: Normal transformer, strided attention, and fixed attention patterns.]

The first version, strided attention, is roughly equivalent to each position attending to its row and its column, and is similar to the attention pattern learned by the network above. (Note that the column attention can be equivalently formulated as attending to the row of the transposed matrix.) The second version, fixed attention, attends to a fixed column and the elements after the latest column element, a pattern we found useful when the data didn't fit into a two-dimensional structure (like text). For more details, we refer readers to our paper. Experimental results Sparse Transformers set new state-of-the-art scores for density estimation of CIFAR-10, Enwik8, and ImageNet 64.

CIFAR-10 (bits per dim):
PixelCNN++ (Salimans et al., 2017) | 2.92
Image Transformer (Parmar et al., 2018) | 2.90
PixelSNAIL (Chen et al., 2017) | 2.85
Sparse Transformer 59M (256W, 128L, 2H) | 2.80

Enwik8 (bits per byte):
Deeper Self-Attention (Al-Rfou et al., 2018) | 1.06
Transformer-XL 88M (Dai et al., 2018) | 1.03
Transformer-XL 277M (Dai et al., 2018) | 0.99
Sparse Transformer 95M (512W, 30L, 8H) | 0.99

ImageNet 64x64 (bits per dim):
Gated PixelCNN (van den Oord et al., 2016) | 3.57
Parallel Multiscale (Reed et al., 2017) | 3.7
SPN 150M (Menick & Kalchbrenner, 2018) | 3.52
Sparse Transformer 152M (512W, 48L, 16H) | 3.44

Density modeling performance in bits per byte (or dim) on a variety of benchmark datasets. M denotes millions of parameters used in the network, W the width of the network, L the number of layers, and H the number of heads. We also found that sparse attention achieved lower loss than full attention, in addition to being significantly faster (see our paper for comparisons). This may point to a useful inductive bias from our sparsity patterns, or an underlying optimization issue with dense attention. Generating images Transformers that use sparse attention seem to have a notion of global structure, which can be qualitatively evaluated by looking at image completions.
Here we visualize a model trained on $64\times 64$ ImageNet: [Figure: Prompt, completions, and ground truth.] We also generated fully unconditional samples with an unadjusted softmax temperature of 1.0. These models are trained using the maximum likelihood objective, which is well-known to cover all modes of the data (including potentially nonexistent ones) instead of increasing fidelity of a smaller portion of the data. Sampling from these models with unadjusted temperature lets us see the full distribution of images that the model believes exist in the world. As a result, some samples can appear strange. [Figure: Model samples and real data.] Generating raw audio waveforms Sparse Transformers can also be adapted to generate raw audio instead of images by simply changing the position embeddings. As deep learning expands to novel data types, we believe the ease of specifying inductive biases with this class of networks will be a useful tool. This model was trained on raw classical music clips and uses sparse attention to generate sequences of length 65,000. This corresponds to ~5 seconds of raw audio, and we have concatenated several samples together in each of the clips below. Code release Normally, implementing sparse attention would involve slicing query and key matrices in blocks, so to ease experimentation we implemented a set of block-sparse kernels which efficiently perform these operations on the GPU. We open-source these kernels and provide example sparse attention functions in this repository. Future work and limitations The sparse attention patterns we introduced are only preliminary steps in the direction of efficient modeling of long sequences. We think exploring different patterns and combinations of sparsity is useful, and that learning sparse patterns is a particularly promising avenue of research for the next generation of neural network architectures. Even with the improvements we described above, autoregressive sequence generation still seems impractical for very high resolution images or video. The optimized attention operations we have introduced, however, may be useful primitives to combine with other approaches to modeling high dimensional data, like multi-scale approaches. If you are interested in advancing AI capabilities and helping further our mission of ensuring they benefit humanity, we’re hiring! Acknowledgments Thanks to Ashish Vaswani for helpful discussions, and Johannes Otterbach, Mark Chen, Prafulla Dhariwal, David Luan, and Lukasz Kaiser for comments on the manuscript. Read more »
  • How to Train Your OpenAI Five
    OpenAI Five is the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team, OG, at Finals this weekend. Both OpenAI Five and DeepMind's AlphaStar had previously beaten good pros privately but lost their live pro matches, making this also the first time an AI has beaten esports pros on livestream. [Figure: OpenAI Five's record versus semi-pro team Lithium and pro teams SG esports, Alliance, and OG since our losses at The International.] [Photo: Team OG and the OpenAI dev team.] At OpenAI Five Finals, we also shared two surprises: OpenAI Five discovered a rudimentary ability to be a teammate with humans, even though our training process focuses exclusively on beating other bots. The ease with which we turned a competitive AI into a cooperative one makes us hopeful that future AI systems can be very beneficial for humans given active development effort. From April 18th–21st, we're scaling up OpenAI Five to play the Internet, whether as a competitor or teammate. This final test will let us answer an important research question—to what extent OpenAI Five is exploitable or can otherwise be reliably beaten—and be potentially the largest-ever deployment of a highly-competent deep reinforcement learning agent that people can knowingly interact with. Click to register to play OpenAI Five Watch OpenAI Five Finals Download replay files & OpenAI Five planning view Why Dota? We started OpenAI Five in order to work on a problem that felt outside of the reach of existing deep reinforcement learning [1] algorithms. We hoped that by working on a problem that was unsolvable by current methods, we'd need to make a big increase in the capability of our tools. We were expecting to need sophisticated algorithmic ideas, such as hierarchical reinforcement learning, but we were surprised by what we found: the fundamental improvement we needed for this problem was scale. Achieving and utilizing that scale wasn't easy and was the bulk of our research effort! OpenAI Five sees the world as a bunch of numbers that it must decipher. It uses the same general-purpose learning code whether those numbers represent the state of a Dota game (about 20,000 numbers) or a robotic hand (about 200). To build OpenAI Five, we created a system called Rapid which let us run PPO at previously unprecedented scale. The results exceeded our wildest expectations, and we produced a world-class Dota bot without hitting any fundamental performance limits. The surprising power of today's RL algorithms comes at the cost of massive amounts of experience, which can be impractical outside of a game or simulated environment. This limitation may not be as bad as it sounds—for example, we used Rapid to control a robotic hand to dexterously reorient a block, trained entirely in simulation and executed on a physical robot. But we think decreasing the amount of experience is a next challenge for RL. We are retiring OpenAI Five as a competitor today, but progress made and technology developed will continue to drive our future work. This isn't the end of our Dota work—we think that Dota is a much more intrinsically interesting and difficult (and now well-understood!) environment for RL development than the standard ones used today. Compute OpenAI Five's victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we'd drive further progress by increasing our training scale.
But after The International, we'd already dedicated the vast majority of our project's compute to training a single OpenAI Five model. So we increased the scale of compute in the only way available to us: training for longer. OpenAI Five's TrueSkill as we've applied additional training compute, with lines demarcating major system changes (moving to single courier; increasing LSTM size to 4096 units; upgrading to patch versions 7.20 and 7.21; and starting to learn buyback). The graph is roughly linear, meaning that OpenAI Five benefited continually from additional compute (note this is a log-log plot, since the x-axis is logarithm of compute and TrueSkill corresponds roughly to exponential progress). This graph evaluates all bots on the final game rules (1 courier, patch 7.21, etc)—even those trained on older ones. A steep slope after any of these indicates OpenAI Five adapting to that change; depending on the change the evaluation may be unfair to the versions before. In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months (up from about 10,000 years over 1.5 realtime months as of The International), for an average of 250 years of simulated experience per day. The Finals version of OpenAI Five has a 99.9% winrate versus the TI version [2]. Transfer learning The current version of OpenAI Five has been training continuously since June 2018, despite changes to the model size and the game rules (including some fairly large game patch updates and newly implemented features). In each case, we were able to transfer the model over and continue training—something that is an open challenge for RL in other domains. To the best of our knowledge, this is the first time an RL agent has been trained using such a long-lived training run. To make this work, we've continued to flesh out our surgery tooling so that we can start from trained parameters even across substantial architecture changes. More heroes We saw very little slowdown in training going from 5 to 18 heroes. We hypothesized the same would be true going to even more heroes, and after The International, we put a lot of effort into integrating new ones. We spent several weeks training with hero pools up to 25 heroes, bringing those heroes to approximately 5k MMR (about 95th percentile of Dota players). Although they were still improving, they weren't learning fast enough to reach pro level before Finals. We haven't yet had time to investigate why, but our hypotheses range from insufficient model capacity to needing better matchmaking for the expanded hero pool to requiring more training time for new heroes to catch up to old heroes. Imagine how hard it is for a human to learn a new hero when everyone else has mastered theirs! We believe these issues are fundamentally solvable, and solving them could be interesting in its own right. The Finals version plays with 17 heroes—we removed Lich because his abilities were changed significantly in Dota version 7.20. Cooperative mode It actually felt nice; my Viper gave his life for me at some point. He tried to help me, thinking "I'm sure she knows what she's doing" and then obviously I didn't. But, you know, he believed in me. I don't get that a lot with [human] teammates.—Sheever During Finals, we showcased OpenAI Five playing on a team alongside humans. 
This game featured Blitz and Sheever together with 3 agents controlled by Five on one team facing off against ODPixel and Capitalist playing with 3 agents controlled by a separate copy of Five. OpenAI Five's ability to play with humans presents a compelling vision for the future of human-AI interaction, one where AI systems collaborate and enhance the human experience. Our testers reported feeling supported by their bot teammates, that they learned from playing alongside these advanced systems, and that it was generally a fun experience overall. Note that OpenAI Five exhibits zero-shot transfer learning—it was trained to have all heroes controlled by copies of itself, but generalizes to controlling a subset of heroes, playing with or against humans. We were very surprised this worked as well as it did. In fact, we'd considered doing a cooperative match at The International but assumed it'd require dedicated training. Arena We’re launching OpenAI Five Arena, a public experiment where we'll let anyone play OpenAI Five in both competitive and cooperative modes. We'd known that our 1v1 bot would be exploitable through clever strategies; we don't know to what extent the same is true of OpenAI Five, but we're excited to invite the community to help us find out! Arena opens Thursday, April 18th at 6pm PST and will close 11:59pm PST on Sunday, April 21st. Please register so we can ensure there's enough server capacity in your region! Results of all games will be automatically reported to the Arena public leaderboard. We're incredibly grateful for all the support the Dota community has shown us over the past two years, and we hope that Arena will also serve as one small way of giving back. Have fun with it! What's next We will be releasing a more technical analysis of OpenAI Five once we've reviewed the outcomes of OpenAI Five Arena. Afterwards, we'll continue working with the Dota 2 environment within OpenAI. We've seen rapid progress in the past two years on RL capabilities, and we think that Dota 2 will continue to help us push forward what's possible—whether with achieving competent performance from less data or true human-AI cooperation. If you are interested in advancing AI capabilities and helping further our mission of ensuring they benefit humanity, we're hiring! Game replays & planning views OG Game 1: Replay, OpenAI Five planning view OG Game 2: Replay, OpenAI Five planning view Coop Game: Replay, OpenAI Five planning view Planning View Legend Footnotes Deep reinforcement learning is the idea of training a deep neural network to achieve goals using rewards and punishments ↩︎ Winrate evaluated on the current game patch. This biases the winrate towards the Finals version as the TI version was trained on an older patch, but currently we don't have another way to compare agents trained on different game versions. ↩︎ Read more »
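For readers curious what the PPO objective mentioned above actually optimizes, below is a minimal, self-contained sketch of the clipped surrogate loss in PyTorch. This is an illustration of the algorithm only, not the Rapid system or OpenAI Five's actual training code, and the tensors are made-up example values.

```python
import torch

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO, returned as a loss to minimize."""
    ratio = torch.exp(new_logp - old_logp)                      # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy example: four transitions with log-probabilities under the old and new policy.
old_logp = torch.tensor([-1.2, -0.7, -2.1, -0.3])
new_logp = torch.tensor([-1.0, -0.9, -2.0, -0.4])
advantages = torch.tensor([0.5, -0.2, 1.3, 0.1])
print(ppo_clipped_loss(new_logp, old_logp, advantages))
```

At OpenAI Five's scale, the hard part is not this loss but distributing rollouts and optimization across many machines, which is the role Rapid plays.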
  • OpenAI Five Finals
    true
    We'll be holding our final live event for OpenAI Five at 11:30a PT on April 13th. We'll showcase aspects of OpenAI Five which we think illustrate how humans and AI will interact in the future. We believe that AI's impact on the world will be driven by its competence, scalability, and ability to enhance what humans can do — and this event will use OpenAI Five to concretely demonstrate each of these. We hope Finals will help people better internalize AI progress and how it will affect the world. Request to attend in person | Watch on Twitch A sneak peek of a visualization of OpenAI Five's objectives. We started working with Dota 2 because we expected it to be a good testbed for developing general-purpose AI technologies. It has additionally turned out to be a great avenue for helping people experience modern AI — which we expect to become a high-stakes part of people's lives in the future, starting with systems like self-driving cars. As part of the event, we're honored to compete against the reigning Dota 2 world champions, OG, who will test OpenAI Five at the limits of human ability. We'll also be joined by Blitz, Capitalist, ODPixel, Purge, and Sheever. Games will be played with rules similar to those used for the OpenAI Five matches at The International 2018. Watch the event OpenAI Five Finals will be hosted in the Bay Area on April 13th. The event will run from 11:30a to about 4p (exact length depends on game duration). Doors will open at 11a. Last year's Benchmark — a taste of what Finals will be like. If you'd like to attend in person, please request an invite by Friday 3/29 at 9:00pm PT; invites will be sent by the end of Monday 4/1. Our venue has limited seating, so we'll be selecting invitees based on their answers to the request form. If you can't attend in person, please tune in on Twitch! Read more »
  • Implicit Generation and Generalization Methods for Energy-Based Models
    true
    We've made progress towards stable and scalable training of energy-based models (EBMs), resulting in better sample quality and generalization ability than existing models. Generation in EBMs spends more compute to continually refine its answers, and doing so can produce samples competitive with GANs at low temperatures[1], while also retaining the mode-coverage guarantees of likelihood-based models. We hope these findings stimulate further research into this promising class of models. Read Paper | View Code + Pre-trained Models Generative modeling is the task of observing data, such as images or text, and learning to model the underlying data distribution. Accomplishing this task leads models to understand high-level features in data and synthesize examples that look like real data. Generative models have many applications in natural language, robotics, and computer vision. Energy-based models represent probability distributions over data by assigning an unnormalized probability scalar (or “energy”) to each input data point. This provides useful modeling flexibility—any arbitrary model that outputs a real number given an input can be used as an energy model. The difficulty, however, lies in sampling from these models. Conditional ImageNet32x32 model samples. To generate samples from EBMs, we use an iterative refinement process based on Langevin dynamics. Informally, this involves performing noisy gradient descent on the energy function to arrive at low-energy configurations (see the paper for more details). Unlike GANs, VAEs, and Flow-based models, this approach does not require an explicit neural network to generate samples: samples are generated implicitly. The combination of EBMs and iterative refinement has the following benefits: Adaptive computation time. We can run sequential refinement for a long time to generate sharp, diverse samples, or for a short time to generate coarser, less diverse samples. In the limit of infinite time, this procedure is known to generate true samples from the energy model. Not restricted by a generator network. In both VAEs and Flow-based models, the generator must learn a map from a continuous space to a possibly disconnected space containing different data modes, which requires large capacity and may not be possible to learn. EBMs, by contrast, can easily learn to assign low energies to disjoint regions. Built-in compositionality. Since each model represents an unnormalized probability distribution, models can be naturally combined through products of experts or other hierarchical models. Generation We found energy-based models are able to generate qualitatively and quantitatively high-quality images, especially when running the refinement process for a longer period at test time. By running iterative optimization on individual images, we can auto-complete images and morph images from one class (such as truck) to another (such as frog). Image completions on a conditional ImageNet model (test and train images, showing originals, corruptions, and completions). Our models exhibit diversity in inpainting. Note that inputs are from the test distribution and are not model samples, indicating coverage of the test data. Cross-class implicit sampling on a conditional model. The model is conditioned on a particular class (such as frog, ship, truck, or deer) but is initialized with an image from a separate class.
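To make the iterative refinement described above concrete, here is a minimal PyTorch sketch of Langevin-style sampling (noisy gradient descent on the energy). The energy function, step count, step size, and noise scale are toy placeholders rather than the settings used in the paper.

```python
import torch

def langevin_sample(energy_fn, x_init, steps=60, step_size=10.0, noise_scale=0.005):
    """Refine samples by repeated noisy gradient descent on the energy."""
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(x).sum()
        grad, = torch.autograd.grad(energy, x)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
        x = x.detach().requires_grad_(True)      # drop the graph between steps
    return x.detach()

# Toy quadratic energy: low energy near the origin, so samples drift toward it.
toy_energy = lambda x: (x ** 2).sum(dim=1)
samples = langevin_sample(toy_energy, torch.randn(16, 2))
```

Running more refinement steps corresponds to the adaptive computation time benefit above: longer chains give sharper, more diverse samples.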
In addition to generating images, we found that energy-based models are able to generate stable robot dynamics trajectories across a large number of timesteps. EBMs can generate a diverse set of possible futures, while feedforward models collapse to a mean prediction. Top-down views of robot hand manipulation trajectories (ground truth, a fully connected baseline, and two EBM samples at T = 0 through T = 80), generated unconditionally from the same starting state (first frame). The fully connected network predicts a hand that does not move, while the EBM is able to generate distinctly different trajectories that are feasible. Generalization We tested energy-based models on classifying several different out-of-distribution datasets and found that energy-based models outperform other likelihood models such as Flow-based and autoregressive models. We also tested classification using conditional energy-based models, and found that the resultant classification exhibited good generalization to adversarial perturbations. Our model—despite never being trained for classification—performed classification better than models explicitly trained against adversarial perturbations. Lessons learned We found evidence that suggests the following observations, though we are by no means certain they are correct: We found it difficult to apply vanilla HMC to EBM training, as optimal step sizes and leapfrog simulation numbers differ greatly during training, though applying adaptive HMC would be an interesting extension. We found training ensembles of energy functions (sampling and evaluating on ensembles) helped a bit, but it was not worth the added complexity. We didn't find much success adding a gradient penalty term, as it seemed to hurt model capacity and sampling. More tips, observations, and failures from this research can be found in Section A.8 of the paper. Next steps We found preliminary indications that we can compose multiple energy-based models via a product of experts model. We trained one model on different-size shapes at a set position and another model on same-size shapes at different positions. By combining the resultant energy-based models, we were able to generate different-size shapes at different locations, despite never seeing examples where both varied. A 2D example of combining energy functions (Energy A, Energy B, and Energy A + B) through their summation and the resulting sampling trajectories. Compositionality is one of the unsolved challenges facing AI systems today, and we are excited about what energy-based models can do here. If you are excited to work on energy-based models, please consider applying to OpenAI! Acknowledgments Thanks to Ilya Sutskever, Greg Brockman, Bob McGrew, Johannes Otterbach, Jacob Steinhardt, Harri Edwards, Yura Burda, Jack Clark and Ashley Pilipiszyn for feedback on this blog post and manuscript. Footnotes See Equation 2 in this paper. ↩︎ Read more »
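Because a product of experts corresponds to summing energies, composing two trained EBMs like the shape-and-position example above is conceptually a one-liner. Here `energy_a` and `energy_b` are hypothetical stand-ins for the two trained models, and `langevin_sample` is the sketch shown earlier in this post.

```python
# Product of experts: p_A(x) * p_B(x) is proportional to exp(-(E_A(x) + E_B(x))).
combined_energy = lambda x: energy_a(x) + energy_b(x)   # energy_a / energy_b: hypothetical trained EBMs
samples = langevin_sample(combined_energy, torch.randn(16, 2))
```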
  • OpenAI Scholars Spring 2019
    true
    Our class of eight scholars (out of 550 applicants) brings together collective expertise in literature, philosophy, cell biology, statistics, economics, quantum physics, and business innovation. Our scholars are applying these specializations to current AI research and documenting their progress as they continue to grow as machine learning practitioners. Our Spring class of scholars with members of the OpenAI team. This is our second class of OpenAI Scholars. Their program began in February and will conclude with the completion of an open-source final project. Throughout the program, scholars share their progress with the research community through their blogs. Some applications our scholars are working towards are: Applying reinforcement learning to robotic manipulation Improving inference and reasoning in natural language processing Applying reinforcement learning algorithms to sentiment analysis Meet the scholars: Fatma Tarlaci Mentor: Jonathan Raiman Working from Austin, TX Website Twitter Fatma received her PhD in Comparative Literature from the University of Texas at Austin in 2016 and earned her Masters in Computer Science from Stanford University in 2018 as an Eric Roberts Fellow. Her knowledge of languages, cultures, and literature led her to explore the human dimension of AGI. Fatma is currently a computer science instructor at St. Edwards University and is interested in the intersection between natural language processing (NLP) and computer vision (CV). She is an avid advocate of diversity in AI and believes that a better representation in AI is critical as it permeates into all aspects of human life. As an OpenAI Scholar, Fatma works on NLP methodologies and aims to complete a project that explores ways of improving inference and reasoning in NLP. Jonathan Michaux Mentor: Feryal Behbahani Working from Chicago and San Francisco Website Twitter Jonathan is a cell biologist (PhD), mathematician (BA), and robotics enthusiast who is deeply interested in the movement and control of complex systems. At the cellular level, he studied the mechanisms that control cell-shape changes in embryonic cells. As an aspiring roboticist, he is applying reinforcement learning to robot manipulation. His long-term research objective is to combine tools from machine learning and optimization with insights from control theory to design algorithms for robotic locomotion and manipulation in real-world settings. Nancy Otero Mentor: Kai Arulkumaran Working from New York City and Mexico City Website Twitter Nancy has been researching learning for the past 10 years. Thinking about human construction of knowledge is her passion. With a background in software engineering, math, psychology and education from Stanford University, Nancy wants to use multidisciplinary approaches to develop AI prototypes that could improve education. She’s also interested in understanding how AI is redefining how, why and what humans will learn in the near future. She’s on the founding team of the Portfolio School, a project-based school in NYC, and the co-founder of a non-profit in Mexico. Elynn Chen Mentor: Lilian Weng Working from Princeton, NJ Website Elynn received her PhD in Statistics in 2018. Her PhD focused on spectral method and matrix/tensor factorization on high- and multi-dimensional data. Her research interests lie at the intersection of statistical learning theory, machine learning and optimization. At OpenAI, she works on deep RL and its applications to healthcare and business management. 
Helen (Mengxin) Ji Mentor: Azalia Mirhoseini Working from the Bay Area Website Twitter Helen is a PhD student in Resource Economics and a Masters student in Statistics at UC Davis. Her research interests focus on machine learning methods (both classical statistical learning and deep learning) and their application to Energy Economics, and heterogeneous causal inference. She was an applied research intern at Microsoft in 2018 and a 2017 research fellow with Data Science for Social Good at the University of Chicago. In 2018, she was awarded Twitter's Grace Hopper fellowship and also the Women in Quantitative Finance fellowship. As an OpenAI Scholar, Helen works on RL methodologies and plans to complete a project that can apply RL algorithms on sentiment analysis. Yuhao Wan Mentor: Josh Achiam Working from the Bay Area Website Twitter Yuhao recently graduated from Carleton College studying Mathematics and Philosophy. Fascinated by the structure and dynamics of our world, Yuhao also explored physics, law, and economics. She discovered her interest in research and problem solving through Budapest Semesters in Mathematics and REU in Combinatorics and Algorithms for Real Problems. At OpenAI, Yuhao studies machine learning with a focus on deep reinforcement learning. Currently, she is interested in understanding how learning methods exhibit certain degrees of generalization. Janet Brown Mentor: Christy Dennison Working from San Francisco, CA Website Twitter Janet has always been fascinated by the visual dimension & using spatial approaches to help augment analysis in traditionally non-visual problem domains. As an OpenAI Scholar, she investigates the possibilities of generative models & their ability to help identify the most critical features of data/images as part of generating reconstructions. Currently, Janet leads Atakote, where she works with technologies like augmented & virtual reality to transform traditional industries such as retail, manufacturing, and transportation. Previously, Janet studied at Harvard Business School & worked at major companies, such as McKinsey & Company, in 20+ countries. Edgar Barraza Mentor: Susan Zhang Working from Ithaca, NY Website Twitter Edgar is a recent graduate of Cornell University’s Physics program. Originally trained as an experimentalist working on hybrid-quantum systems, he dove into deep learning by applying techniques in computer vision to search for sub-atomic particles represented as images. He hopes to provide people with the resources they need by utilizing AI’s power to accomplish tasks that were once only possible by humans. To work towards this goal, Edgar spends his time as an OpenAI Scholar focusing on natural language understanding. Our Scholars demonstrate core technical skills across various expert domains and self-motivation—critical competences for a self-directed program like this one. They each entered the field of machine learning as relative newcomers, and we hope their progress shows how accessible machine learning is. To begin your learning journey, check out some of our educational materials. Thanks to AWS for providing compute credits to the scholars. Additional thank you to our dedicated community mentors for their time advising the scholars on their projects. Read more »
  • OpenAI LP
    true
    We've created OpenAI LP, a new "capped-profit" company that allows us to rapidly increase our investments in compute and talent while including checks and balances to actualize our mission. OpenAI team and their families at our November 2018 offsite. Our mission is to ensure that artificial general intelligence (AGI) benefits all of humanity, primarily by attempting to build safe AGI and share the benefits with the world. We’ve experienced firsthand that the most dramatic AI systems use the most computational power in addition to algorithmic innovations, and decided to scale much faster than we’d planned when starting OpenAI. We’ll need to invest billions of dollars in upcoming years into large-scale cloud compute, attracting and retaining talented people, and building AI supercomputers. We want to increase our ability to raise capital while still serving our mission, and no pre-existing legal structure we know of strikes the right balance. Our solution is to create OpenAI LP as a hybrid of a for-profit and nonprofit—which we are calling a "capped-profit" company. The fundamental idea of OpenAI LP is that investors and employees can get a capped return if we succeed at our mission, which allows us to raise investment capital and attract employees with startup-like equity. But any returns beyond that amount—and if we are successful, we expect to generate orders of magnitude more value than we’d owe to people who invest in or work at OpenAI LP—are owned by the original OpenAI Nonprofit entity. Going forward (in this post and elsewhere), “OpenAI” refers to OpenAI LP (which now employs most of our staff), and the original entity is referred to as “OpenAI Nonprofit.” The mission comes first We’ve designed OpenAI LP to put our overall mission—ensuring the creation and adoption of safe and beneficial AGI—ahead of generating returns for investors. The mission comes first even with respect to OpenAI LP’s structure. While we are hopeful that what we describe below will work until our mission is complete, we may update our implementation as the world changes. Regardless of how the world evolves, we are committed—legally and personally—to our mission. OpenAI LP’s primary fiduciary obligation is to advance the aims of the OpenAI Charter, and the company is controlled by OpenAI Nonprofit’s board. All investors and employees sign agreements that OpenAI LP’s obligation to the Charter always comes first, even at the expense of some or all of their financial stake. Our employee and investor paperwork start with big purple boxes like this. The general partner refers to OpenAI Nonprofit (whose legal name is “OpenAI Inc”); limited partners refers to investors and employees. Only a minority of board members are allowed to hold financial stakes in the partnership at one time. Furthermore, only board members without such stakes can vote on decisions where the interests of limited partners and OpenAI Nonprofit’s mission may conflict—including any decisions about making payouts to investors and employees. Another provision from our paperwork specifies that OpenAI Nonprofit retains control. As mentioned above, economic returns for investors and employees are capped (with the cap negotiated in advance on a per-limited partner basis). Any excess returns go to OpenAI Nonprofit. Our goal is to ensure that most of the value (monetary or otherwise) we create if successful benefits everyone, so we think this is an important first step. 
Returns for our first round of investors are capped at 100x their investment (commensurate with the risks in front of us), and we expect this multiple to be lower for future rounds as we make further progress. What OpenAI does Our day-to-day work is not changing. Today, we believe we can build the most value by focusing exclusively on developing new AI technologies, not commercial products. Our structure gives us flexibility for how to create a return in the long term, but we hope to figure that out only once we've created safe AGI. OpenAI LP currently employs around 100 people organized into three main areas: capabilities (advancing what AI systems can do), safety (ensuring those systems are aligned with human values), and policy (ensuring appropriate governance for such systems). OpenAI Nonprofit governs OpenAI LP, runs educational programs such as Scholars and Fellows, and hosts policy initiatives. OpenAI LP is continuing (at increased pace and scale) the development roadmap started at OpenAI Nonprofit, which has yielded breakthroughs in reinforcement learning, robotics, and language. Safety We are excited by the potential for AGI to help solve planetary-scale problems in areas where humanity is failing and there is no obvious solution today. However, we are also concerned about AGI's potential to cause rapid change, whether through machines pursuing goals misspecified by their operator, malicious humans subverting deployed systems, or an out-of-control economy that grows without resulting in improvements to human lives. As described in our Charter, we are willing to merge with a value-aligned organization (even if it means reduced or zero payouts to investors) to avoid a competitive race which would make it hard to prioritize safety. Who's involved OpenAI Nonprofit's board consists of OpenAI LP employees Greg Brockman (Chairman & CTO), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D'Angelo, Holden Karnofsky, Reid Hoffman, Sue Yoon, and Tasha McCauley. Elon Musk left the board of OpenAI Nonprofit in February 2018 and is not formally involved with OpenAI LP. We are thankful for all his past help. Our investors include Reid Hoffman's charitable foundation and Khosla Ventures, among others. We feel lucky to have mission-aligned, impact-focused, helpful investors! We are traveling a hard and uncertain path, but we have designed our structure to help us positively affect the world should we succeed in creating AGI—which we think will have as broad an impact as the computer itself and improve healthcare, education, scientific research, and many aspects of people's lives. If you'd like to help us make this mission a reality, we're hiring :)! Read more »
  • Introducing Activation Atlases
    true
    We've created activation atlases (in collaboration with Google researchers), a new technique for visualizing what interactions between neurons can represent. As AI systems are deployed in increasingly sensitive contexts, having a better understanding of their internal decision-making processes will let us identify weaknesses and investigate failures. Read Paper | View Code | Try Demo Modern neural networks are often criticized as being a “black box.” Despite their success at a variety of problems, we have a limited understanding of how they make decisions internally. Activation atlases are a new way to see some of what goes on inside that box. An activation atlas of the InceptionV1 vision classification network reveals many fully realized features, such as electronics, buildings, food, animal ears, plants, and watery backgrounds. Explore it ➞ Activation atlases build on feature visualization, a technique for studying what the hidden layers of neural networks can represent. Early work in feature visualization primarily focused on individual neurons. By collecting hundreds of thousands of examples of neurons interacting and visualizing those, activation atlases move from individual neurons to visualizing the space those neurons jointly represent. The process: collect a million activation vectors from different training examples, arrange them in 2D so that similar ones are close together, then impose a grid and use feature visualization on the average of each cell. Understanding what’s going on inside neural nets isn’t solely a question of scientific curiosity — our lack of understanding handicaps our ability to audit neural networks and, in high-stakes contexts, ensure they are safe. Normally, if one were going to deploy a critical piece of software, one could review all the paths through the code, or even do formal verification, but with neural networks our ability to do this kind of review is presently much more limited. With activation atlases, humans can discover unanticipated issues in neural networks — for example, places where the network is relying on spurious correlations to classify images, or where re-using a feature between two classes leads to strange bugs. Humans can even use this understanding to “attack” the model, modifying images to fool it. For example, a special kind of activation atlas can be created to show how a network tells apart frying pans and woks. Many of the things we see are what one expects.
Frying pans are more squarish, while woks are rounder and deeper. But the model also seems to have learned that frying pans and woks can be distinguished by the food around them — in particular, the presence of noodles supports a wok classification. Adding noodles to the corner of the image will fool the model 45% of the time! This is similar to work like adversarial patches, but based on human understanding. InceptionV1 partly relies on the presence of noodles to distinguish woks from frying pans. Adding noodles fools the model 45% of the time. More examples can be found in the paper. Other human-designed attacks based on the network overloading certain feature detectors are often more effective (some succeed as often as 93% of the time). But the noodle example is particularly interesting because it’s a case of the model picking up on something that is correlated with, but not causal for, the correct answer. This has structural similarities to types of errors we might be particularly worried about, such as fairness and bias issues. Activation atlases worked better than we anticipated and seem to strongly suggest that neural network activations can be meaningful to humans. This gives us increased optimism that it is possible to achieve interpretability in vision models in a strong sense. We’re excited to have done this work in collaboration with researchers at Google. We believe that working together on safety-relevant research helps us all ensure the best outcome for society as AI research progresses. Want to make neural networks not be a black box? Apply to work at OpenAI. Acknowledgments Thanks to our co-authors at Google: Shan Carter, Zan Armstrong and Ian Johnson. Thanks to Greg Brockman, Dario Amodei, Jack Clark and Ashley Pilipiszyn for feedback on this blog post. We also thank Christian Howard for his help in coordination from the Google side, Phillip Isola for being Distill’s acting editor and Arvind Satyanarayan for feedback on our paper. Read more »
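As a rough sketch of the three-step recipe described earlier in this post (collect activations, lay them out in 2D, average within each grid cell), the snippet below uses UMAP for the 2D layout. The reduction method, grid size, and the final feature-visualization rendering step are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
import umap  # assumed available (umap-learn); any 2D reduction would do for a sketch

def build_atlas_cells(activations, grid_size=20):
    """Average activation vectors within each cell of a 2D grid layout."""
    coords = umap.UMAP(n_components=2).fit_transform(activations)
    coords = (coords - coords.min(0)) / (coords.max(0) - coords.min(0) + 1e-8)
    cells = np.floor(coords * grid_size).clip(0, grid_size - 1).astype(int)
    atlas = {}
    for gx, gy in {(int(c[0]), int(c[1])) for c in cells}:
        mask = (cells[:, 0] == gx) & (cells[:, 1] == gy)
        atlas[(gx, gy)] = activations[mask].mean(axis=0)
    return atlas  # each averaged vector would then be rendered with feature visualization

atlas = build_atlas_cells(np.random.randn(10000, 512).astype(np.float32))
```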
  • Neural MMO: A Massively Multiagent Game Environment
    true
    We're releasing a Neural MMO, a massively multiagent game environment for reinforcement learning agents. Our platform supports a large, variable number of agents within a persistent and open-ended task. The inclusion of many agents and species leads to better exploration, divergent niche formation, and greater overall competence. Read Paper | View Code | 3D Client In recent years, multiagent settings have become an effective platform for deep reinforcement learning research. Despite this progress, there are still two main challenges for multiagent reinforcement learning. We need to create open-ended tasks with a high complexity ceiling: current environments are either complex but too narrow or open-ended but too simple. Properties such as persistence and large population scale are key, but we also need more benchmark environments to quantify learning progress in the presence of large population scales and persistence. The game genre of Massively Multiplayer Online Games (MMOs) simulates a large ecosystem of a variable number of players competing in persistent and extensive environments. To address these challenges, we built our Neural MMO to meet the following criteria: Persistence: Agents learn concurrently in the presence of other learning agents with no environment resets. Strategies must consider long time horizons and adapt to potentially rapid changes in the behaviors of other agents. Scale: The environment supports a large and variable number of entities. Our experiments consider up to 100M lifetimes of 128 concurrent agents in each of 100 concurrent servers. Efficiency: The computational barrier to entry is low. We can train effective policies on a single desktop CPU. Expansion: Similarly to existing MMOs, our Neural MMO is designed to update with new content. Current core features include procedural generation of tile-based terrain, a food and water foraging system, and a strategic combat system. There is an opportunity for open-source-driven expansion in the future. The Environment Players (agents) may join any available server (environment), each containing an automatically generated tile-based game map of configurable size. Some tiles, such as food-bearing forest tiles and grass tiles, are traversable. Others, such as water and solid stone, are not. Agents spawn at a random location along the edges of the environment. They must obtain food and water, and avoid combat damage from other agents, in order to sustain their health. Stepping on a forest tile or next to a water tile refills a portion of the agent's food or water supply, respectively. However, forest tiles have a limited supply of food, which regenerates slowly over time. This means that agents must compete for food tiles while periodically refilling their water supply from infinite water tiles. Players engage in combat using three combat styles, denoted Melee, Range, and Mage for flavor. Input: Agents observe a square crop of tiles centered on their current position. This includes tile terrain types and select properties (health, food, water, and position) of occupying agents. Output: Agents output action choices for the next game tick (timestep). Actions consist of one movement and one attack. Our platform provides a procedural environment generator and visualization tools for value functions, map tile visitation distribution, and agent-agent dependencies of learned policies. Baselines are trained with policy gradients over 100 worlds.
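The square crop of tiles described in the Input section above is easy to picture in code. A minimal sketch follows; the crop radius and terrain encoding are made up for illustration, and the released environment's actual observation format may differ.

```python
import numpy as np

def observe(tile_map, agent_pos, radius=7):
    """Return the square crop of terrain centered on the agent, padding past the map edge."""
    padded = np.pad(tile_map, radius, constant_values=-1)   # -1 marks out-of-bounds tiles
    r, c = agent_pos[0] + radius, agent_pos[1] + radius
    return padded[r - radius:r + radius + 1, c - radius:c + radius + 1]

tile_map = np.random.randint(0, 4, size=(64, 64))   # toy terrain types, e.g. grass/forest/water/stone
crop = observe(tile_map, agent_pos=(5, 60))
print(crop.shape)   # (15, 15)
```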
The Model As a simple baseline, we train a small, fully connected architecture using vanilla policy gradients, with a value function baseline and reward discounting as the only enhancements. Instead of rewarding agents for achieving particular objectives, agents optimize only for their lifetime (trajectory length): they receive reward 1 for each tick of their lifetime. We convert variable-length observations, such as the list of surrounding players, into a single fixed-length vector by computing the maximum across all players (OpenAI Five also utilized this trick). The source release includes our full distributed training implementation, which is based on PyTorch and Ray. Evaluation Results Maximum population size at train time varies in (16, 32, 64, 128). Policies are shared across groups of 16 agents for efficiency. At test time, we merge the populations learned in pairs of experiments and evaluate lifetimes at a fixed population size. We evaluate with foraging only, as combat policies are more difficult to compare directly. Agents trained in larger populations always perform better. Agents’ policies are sampled uniformly from a number of populations — agents in different populations share architectures, but only agents in the same population share weights. Initial experiments show that agent competence scales with increasing multiagent interaction. Increasing the maximum number of concurrent players magnifies exploration; increasing the number of populations magnifies niche formation — that is, the tendency of populations to spread out and forage within different parts of the map. Server Merge Tournaments: Multiagent Magnifies Competence There is no standard procedure among MMOs for evaluating relative player competence across multiple servers. However, MMO servers sometimes undergo merges where the player bases from multiple servers are placed within a single server. We implement “tournament” style evaluation by merging the player bases trained in different servers. This allows us to directly compare the policies learned in different experiment settings. We vary test-time scale and find that agents trained in larger settings consistently outperform agents trained in smaller settings. Increased Population Size Magnifies Exploration Population size magnifies exploration: agents spread out to avoid competition. The last few frames show the learned value function overlay. Refer to the paper (http://arxiv.org/abs/1903.00784) for additional figures. In the natural world, competition among animals can incentivize them to spread out to avoid conflict. We observe that map coverage increases as the number of concurrent agents increases. Agents learn to explore only because the presence of other agents provides a natural incentive for doing so. Increased Species Count Magnifies Niche Formation Species count (number of populations) magnifies niche formation. Visitation maps overlay the game map; different colors correspond to different species. Training a single population tends to produce a single deep exploration path. Training eight populations results in many shallower paths: populations spread out to avoid competition among species. Given a sufficiently large and resource-rich environment, we found different populations of agents separated across the map to avoid competing with others as the populations increased. As entities cannot out-compete other agents of their own population (i.e.
agents with whom they share weights), they tend to seek areas of the map that contain enough resources to sustain their population. Similar effects were also independently observed in concurrent multiagent research by DeepMind. Additional Insights Each square map shows the response of an agent, located at the square's center, to the presence of agents around it. We show foraging maps upon initialization and early in training; additional dependency maps correspond to different formulations of foraging and combat. We visualize agent-agent dependencies by fixing an agent at the center of a hypothetical map crop. For each position visible to that agent, we show what the value function would be if there were a second agent at that position. We find that agents learn policies dependent on those of other agents, in both the foraging and combat environments. Agents learn “bull's eye” avoidance maps to begin foraging more effectively after only a few minutes of training. As agents learn the combat mechanics of the environment, they begin to appropriately value effective engagement ranges and angles of approach. Next Steps Our Neural MMO resolves two key limitations of previous game-based environments, but there are still many left unsolved. This Neural MMO strikes a middle ground between environment complexity and population scale. We’ve designed this environment with open-source expansion in mind and for the research community to build upon. If you are excited about conducting research on multiagent systems, consider joining OpenAI. Acknowledgments Thanks to Clare Zhu for her substantial work on the 3D client. We also thank the following for feedback on drafts of this post: Greg Brockman, Ilya Sutskever, Jack Clark, Ashley Pilipiszyn, Ryan Lowe, Julian Togelius, Joel Liebo, Cinjon Resnick. Read more »
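The observation trick mentioned in the model description above, reducing a variable-length list of visible players to a fixed-size vector with an elementwise max, takes only a few lines. A minimal PyTorch sketch:

```python
import torch

def pool_agents(agent_features):
    """Collapse a variable number of per-agent feature vectors into one fixed-size vector."""
    if agent_features.shape[0] == 0:                 # no visible agents
        return torch.zeros(agent_features.shape[1])
    return agent_features.max(dim=0).values          # elementwise max across agents

few = pool_agents(torch.randn(3, 8))     # 3 visible agents, 8 features each
many = pool_agents(torch.randn(17, 8))   # 17 visible agents -> same output size
print(few.shape, many.shape)             # torch.Size([8]) torch.Size([8])
```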
  • Spinning Up in Deep RL: Workshop Review
    true
    On February 2nd, we held our first Spinning Up Workshop as part of our new education initiative at OpenAI. We hosted ~90 people at our office and engaged nearly 300 more through our livestream. Participants came from a wide range of backgrounds, including academia, software engineering, data science, ML engineering, medicine, and education. This workshop built off our Spinning Up in Deep RL resource package and took a deeper dive into RL algorithm design, robotics, and building safe AI systems. Livestream Recording | Participant Video | View Workshop Materials Building Educational Tools One of the goals for education at OpenAI is to help people develop the skills needed to participate in research and development in AI—especially in deep RL, a core area of research at OpenAI. From our experience working with Scholars and Fellows, we’ve found that the key ingredients for skill development are: a flexible curriculum that includes core material and a review of research frontiers, mentorship and discussions with experts, and having the students work on projects that are at the right level to help them grow. The challenge for education at OpenAI is to figure out how to deliver these at scale. While sharing a curriculum at scale is relatively easy, it isn’t obvious how to scale up mentorship and guidance on projects. Our working theory is that workshops might help us do just that. Our first Spinning Up workshop has given us several positive signs that this is a useful direction, and we’re excited to share what we learned. The Crowd We hosted around 90 people at our office and involved nearly 300 more through our livestream. Our guests came from a wide range of backgrounds, including academic research, software engineering, data science, ML engineering, medicine, and education. The level of ML experience varied quite significantly across the group, from “almost none” to “built their own Dota bot!” More than 500 people, from all around the world, applied to participate in this workshop. Although we sadly couldn’t invite everyone to this one because of space constraints, we want to continue engaging the community with future events. The Talks The workshop kicked off with three hours of talks. To start us off, Joshua Achiam laid out the conceptual foundations of reinforcement learning and gave an overview of different kinds of RL algorithms. If you’d like to study this material, check out Spinning Up in Deep RL. Matthias Plappert presented on OpenAI’s recent work training a dexterous robot hand in simulation to manipulate objects in the real world. Domain randomization, recurrent neural networks, and large-scale distributed training were necessary ingredients in bridging the “sim2real” gap for this task. Dario Amodei, the leader of the Safety Team at OpenAI, presented an overview of problems in AI safety and recent work in this space. He described the central safety problem: the fact that correctly specifying agent behavior is hard! It is easy to inadvertently give agents incentives to behave differently from what you intended, and when agents are very powerful, this could be dangerous. Dario also described work that OpenAI and collaborators at DeepMind have done to address this issue, in which reward functions are learned from human preferences instead of designed. The Afternoon The workshop continued into the afternoon with a semi-structured program of hacking and breakout sessions.
Participants were able to seek guidance on project ideas and research tips from our slate of volunteers, which included Amanda Askell, Alex Ray, Daniel Ziegler, Dylan Hadfield-Menell, Ethan Knight, Karl Cobbe, Matthias Plappert, and Sam McCandlish. The breakout sessions turned out to be the main highlight of the afternoon. Whereas the morning talks covered the conceptual foundations of RL, the breakout sessions were designed to help participants boost their implementation and research skills. In the first session, Karl Cobbe gave an introduction to TensorFlow, a key library used in deep learning research. In the second session, “Writing DQN Together,” Daniel Ziegler led participants step-by-step through the process of implementing a deep RL algorithm. In the third session, “Advanced RL Q&A,” Joshua Achiam described recent research frontiers in RL and took audience questions about doing RL research. Our Take-Aways This was our first experiment with the workshop format, and we were generally pleased with the outcome. In particular, we found it quite gratifying to work directly with such a capable and enthusiastic group of participants. The experience, along with feedback from the group, gave us a good sense of what to keep and what to change for future workshops. What worked: We asked our participants what their highlights were, and these responses are a fairly representative sample: “Learning A TON in a very safe, friendly environment where everyone was mainly on the same level in terms of learning.” “I thought the ability to get one-on-one help and to take on some 'paired programming'-like time with folks who really know what they're doing was incredibly helpful. The enthusiasm of the volunteers was also very high, and I felt very encouraged to ask for help.” Responses like these gave us a sense that the workshop format shined on delivering “mentorship and discussions with experts." What could be improved: We asked our participants what they thought we could have done differently to enhance their experience, and received responses like: “I would've liked a presentation section of potential projects that we could pursue based on our experience level.” “Extend the workshop to two days.” Many participants felt like they either 1) weren’t sure what to work on during the hackathon, or 2) didn’t have enough time to make significant progress on their hacking project. We think this kind of feedback is a good indicator that the 1-day workshop format isn’t enough to “have the students work on projects that are at the right level to help them grow” in RL. In the future, we’ll consider running longer events so we can meet that goal. This feedback also suggests that we should do more to create “shovel-ready” RL projects that participants can jump right in to. What else? Aside from the technical content of the workshop, creating a supportive and inclusive environment was top-of-mind for us, and participants told us this was important for their experience. One piece of feedback read: “This is the first non-female exclusive social event I've been to in Silicon Valley with ~50% women in the room. It was so shocking that I thought I was in the wrong room in the beginning. It was noticeably easier to socialize as a result of the gender balance, so thank you for that.” What's Next OpenAI’s charter gives us a mandate “to create a global community working together to address AGI’s global challenges,” and we’ll continue developing education at OpenAI to help serve that goal. 
This includes more work on resources like Spinning Up in Deep RL and more events like this Spinning Up Workshop. We are currently planning a second workshop with CHAI at Berkeley, which we expect to formally announce soon. If you would like to help us do research on RL or teach people about AI, please get in touch! We’re hiring. Thanks to Maddie Hall and Loren Kwan for co-organizing the event, to Ian Atha for livestreaming and recording the lectures, as well as helping participants with Python and Tensorflow issues, and to Blake Tucker for filming and photography! Read more »
  • AI Safety Needs Social Scientists
    true
    We've written a paper arguing that long-term AI safety research needs social scientists to ensure AI alignment algorithms succeed when actual humans are involved. Properly aligning advanced AI systems with human values requires resolving many uncertainties related to the psychology of human rationality, emotion, and biases. The aim of this paper is to spark further collaboration between machine learning and social science researchers, and we plan to hire social scientists to work on this full time at OpenAI. Read Paper The goal of long-term artificial intelligence (AI) safety is to ensure that advanced AI systems are aligned with human values — that they reliably do things that people want them to do. At OpenAI we hope to achieve this by asking people questions about what they want, training machine learning (ML) models on this data, and optimizing AI systems to do well according to these learned models. Examples of this research include Learning from Human Preferences, AI Safety via Debate, and Learning Complex Goals with Iterated Amplification. Unfortunately, human answers to questions about their values may be unreliable. Humans have limited knowledge and reasoning ability, and exhibit a variety of cognitive biases and ethical beliefs that turn out to be inconsistent on reflection. We anticipate that different ways of asking questions will interact with human biases in different ways, producing higher or lower quality answers. For example, judgments about how wrong an action is can vary depending on whether the word “morally” appears in the question, and people can make inconsistent choices between gambles if the task they are presented with is complex. We have several methods that try to target the reasoning behind human values, including amplification and debate, but do not know how they behave with real people in realistic situations. If a problem with an alignment algorithm appears only in natural language discussion of a complex value-laden question, current ML may be too weak to uncover the issue. To avoid the limitations of ML, we propose experiments that consist entirely of people, replacing ML agents with people playing the role of those agents. For example, the debate approach to AI alignment involves a game with two AI debaters and a human judge; we can instead use two human debaters and a human judge. Humans can debate whatever questions we like, and lessons learned in the human case can be transferred to ML. For the debate approach to AI alignment, our end goal is ML debaters and a human judge, but ML is too primitive for many interesting tasks. Therefore, we propose replacing the ML debaters with human debaters, learning how to best conduct debates in this human-only setting, and later applying what we learn to the ML/human case. These human-only experiments will be motivated by machine learning algorithms but will not involve any ML systems or require an ML background. They will require careful experimental design to build constructively on existing knowledge about how humans think. Most AI safety researchers are focused on machine learning, which we do not believe is sufficient background to carry out these experiments. To fill the gap, we need social scientists with experience in human cognition, behavior, and ethics, and in the careful design of rigorous experiments. 
Since the questions we need to answer are interdisciplinary and somewhat unusual relative to existing research, we believe many fields of social science are applicable, including experimental psychology, cognitive science, economics, political science, and social psychology, as well as adjacent fields like neuroscience and law. We believe close collaborations between social scientists and machine learning researchers will be necessary to improve our understanding of the human side of AI alignment. As a first step, several OpenAI researchers helped organize a workshop at Stanford University's Center for Advanced Study in the Behavioral Sciences (CASBS) led by Mariano-Florentino Cuéllar, Margaret Levi, and Federica Carugati, and we continue to meet regularly to discuss issues around social science and AI alignment. We thank them for their valuable insights and participation in these conversations. Our paper is a call for social scientists in AI safety. We are in the process of starting this research at OpenAI, and are hiring full time social science researchers to push these experiments forward. If you are interested in working in this area, please apply! Read more »