AI machine learning

  • Introducing Deep Learning Containers: Consistent and portable environments
    It’s easy to underestimate how much time it takes to get a machine learning project up and running. All too often, these projects require you to manage the compatibility and complexities of an ever-evolving software stack, which can be frustrating, time-consuming, and keep you from what you really want to do: spending time iterating on and refining your model. To help you bypass this setup and quickly get started with your project, we’re introducing Deep Learning Containers in beta today. Deep Learning Containers are pre-packaged, performance-optimized, and compatibility-tested, so you can get started immediately.

    Productionizing your workflow requires not only developing the code or artifacts you want to deploy, but also maintaining a consistent execution environment to guarantee reproducibility and correctness. If your development strategy involves a combination of local prototyping and multiple cloud tools, it can often be frustrating to ensure that all the necessary dependencies are packaged correctly and available to every runtime. Deep Learning Containers address this challenge by providing a consistent environment for testing and deploying your application across GCP products and services, like Cloud AI Platform Notebooks and Google Kubernetes Engine (GKE), making it easy to scale in the cloud or shift on-premises. In addition, we provide hardware-optimized versions of TensorFlow, whether you’re training on NVIDIA GPUs or deploying on Intel CPUs.

    In this blog post, we’ll cover some common scenarios when working with Deep Learning Containers, including how to select a container, develop locally, and create derivative containers for use in Cloud AI Platform Notebooks.

    Choose a container and develop locally

    All Deep Learning Containers have a preconfigured Jupyter environment, so each can be pulled and used directly as a prototyping space. First, make sure you have the gcloud tool installed and configured.
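A minimal sketch of that one-time setup, assuming the Cloud SDK is already installed (the project ID is a placeholder):

```shell
# Authenticate the gcloud CLI and pick a project ('my-project' is a placeholder):
gcloud auth login
gcloud config set project my-project

# Allow docker to pull images from gcr.io registries:
gcloud auth configure-docker
```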
    Then, determine the container that you would like to use. All containers are hosted in our Container Registry repository, and can be listed with a single gcloud command. Each container provides a Python 3 environment consistent with the corresponding Deep Learning VM, including the selected data science framework, conda, the NVIDIA stack for GPU images (CUDA, cuDNN, NCCL), and a host of other supporting packages and tools. Our initial release consists of containers for TensorFlow 1.13, TensorFlow 2.0, PyTorch, and R, and we are working to reach parity with all Deep Learning VM types.

    With the exception of the base containers, the container names are in the format <framework>-<cpu/gpu>.<framework version>. Let’s say you’d like to prototype on CPU-only TensorFlow. The following command will start the TensorFlow Deep Learning Container in detached mode, bind the running Jupyter server to port 8080 on the local machine, and mount /path/to/local/dir to /home in the container. The running JupyterLab instance can then be accessed at localhost:8080. Make sure to develop in /home, as any other files will be removed when the container is stopped.

    If you would like to use the GPU-enabled containers, you will need a CUDA 10 compatible GPU, the associated driver, and nvidia-docker installed. Then, you can run a similar command.

    Create derivative containers and deploy to Cloud AI Platform Notebooks and GKE

    At some point, you’ll likely need a beefier machine than what your local machine has to offer, but you may have local data and packages that need to be installed in the environment. Deep Learning Containers can be extended to include your local files, and these custom containers can then be deployed in a Cloud AI Platform Notebooks instance or on GKE.

    For example, imagine that you have a local Python package called mypackage that you are using as part of your PyTorch workflow.
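Before moving on, the local-prototyping commands described above might look like the following; the registry path and image tags are assumptions based on the <framework>-<cpu/gpu>.<version> naming scheme this post describes, so check the documentation for current names:

```shell
# List the available Deep Learning Containers:
gcloud container images list --repository="gcr.io/deeplearning-platform-release"

# Start the CPU-only TensorFlow container: detached (-d), Jupyter bound to
# localhost:8080 (-p), and a local directory mounted at /home (-v):
docker run -d -p 8080:8080 -v /path/to/local/dir:/home \
    gcr.io/deeplearning-platform-release/tf-cpu.1-13

# GPU variant (requires a CUDA 10 compatible GPU, drivers, and nvidia-docker):
docker run --runtime=nvidia -d -p 8080:8080 -v /path/to/local/dir:/home \
    gcr.io/deeplearning-platform-release/tf-gpu.1-13
```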
    Create a Dockerfile in the directory above mypackage. This simple Dockerfile will copy in the package files and install the package into the default environment. You can add additional RUN pip/conda commands, but you should not modify CMD or ENTRYPOINT, as these are already configured for AI Platform Notebooks. Build and upload this container to Google Container Registry.

    Then, create an AI Platform Notebooks instance using the gcloud CLI (custom container UI support is coming soon). Feel free to modify the instance type and accelerator fields to suit your workload needs. The image will take a few minutes to set up. If the container was loaded correctly, there will be a link to access JupyterLab written to the proxy-url metadata field, and the instance will appear as ready in the AI Platform > Notebooks UI on Cloud Console. You can also query the link directly by describing the instance metadata.

    Accessing this link will take you to your JupyterLab instance. Please note: only data saved to /home will be persisted across reboots. By default, the container VM mounts /home on the VM to /home in the container, so make sure you create new notebooks in /home; otherwise, that work will be lost if the instance shuts down.

    Deploying Deep Learning Containers on GKE with NVIDIA GPUs

    You can also take advantage of GKE to develop on your Deep Learning Containers. After setting up your GKE cluster with GPUs following the user guide, you just need to specify the container image in your Kubernetes pod spec. A pod spec can create a pod with one GPU from tf-gpu and an attached GCE persistent disk. Deploy and connect to your instance with kubectl; after the pod is fully deployed, your running JupyterLab instance can be accessed at localhost:8080.

    Getting Started

    If you’re not already a Google Cloud customer, you can sign up today for $300 of credit in our free tier.
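As a concrete sketch of the derivative-container workflow just described (the base image name, project, and image names are placeholders, not a definitive recipe):

```shell
# Dockerfile, created in the directory above mypackage/:
cat > Dockerfile <<'EOF'
FROM gcr.io/deeplearning-platform-release/pytorch-cpu
COPY mypackage /mypackage
RUN pip install /mypackage
EOF

# Build the derivative container and upload it to Google Container Registry:
docker build -t gcr.io/my-project/pytorch-custom .
docker push gcr.io/my-project/pytorch-custom
```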
Then, try out our quick start guides and documentation for more details on getting started with your project.
  • Analyze BigQuery data with Kaggle Kernels notebooks
    We’re happy to announce that Kaggle is now integrated into BigQuery, Google Cloud’s enterprise cloud data warehouse. This integration means that BigQuery users can execute super-fast SQL queries, train machine learning models in SQL, and analyze the results using Kernels, Kaggle’s free hosted Jupyter notebooks environment.

    Using BigQuery and Kaggle Kernels together, you can use an intuitive development environment to query BigQuery data and do machine learning without having to move or download the data. Once your Google Cloud account is linked to a Kernels notebook or script, you can compose queries directly in the notebook using the BigQuery API Client library, run them against BigQuery, and do almost any kind of analysis from there with the data. For example, you can import the latest data science libraries like Matplotlib, scikit-learn, and XGBoost to visualize results or train state-of-the-art machine learning models. Even better, you can take advantage of Kernels’ generous free compute, which includes GPUs, up to 16GB of RAM, and nine hours of execution time. Check out Kaggle’s documentation to learn more about the functionality Kernels offers.

    With more than 3 million users, Kaggle is where the world’s largest online community of data scientists comes together to explore, analyze, and share their data science work. You can quickly start coding by spinning up a Python or R Kernels notebook, or find inspiration by viewing more than 200,000 public Kernels written by others.

    For BigQuery users, the most distinctive benefit is that there is now a widely used Integrated Development Environment (IDE)—Kaggle Kernels—that can hold your querying and data analysis all in one place. This turns a data analyst’s fragmented workflow into a more seamless process, instead of the previous way, where you would first query data in the query editor, then export the data elsewhere to complete the analysis. In addition, Kaggle is a sharing platform that lets you easily make your Kernels public.
    Kaggle lets you disseminate your open-source work and also discuss data science with the world’s top-notch professional data scientists.

    Getting started with Kaggle and BigQuery

    To get started with BigQuery for the first time, enable your account under the BigQuery sandbox, which provides up to 10GB of free storage, 1 terabyte per month of query processing, and 10GB of BigQuery ML model-creation queries. (Find more details on tier pricing in BigQuery’s documentation.)

    To start analyzing your BigQuery datasets in Kernels, sign up for a Kaggle account. Once you’re signed in, click on “Kernels” in the top bar, followed by “New kernel” to immediately spin up your new IDE session. Kaggle offers Kernels in two types: scripts and notebooks. For this example, the notebooks option is selected.

    In the Kernels editor environment, link your BigQuery account to your Kaggle account by clicking “BigQuery” on the right-hand sidebar, then click “Link an account.” Once your account is linked, you can access your own BigQuery datasets using the BigQuery API Client library.

    Let’s try this out using the Ames Housing dataset that’s publicly available on Kaggle. This dataset contains 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, as well as their final sales price. Let’s compose a query to gain some insights from the data. We want to find out what different home types there are in this dataset, as well as how many do or do not have central air conditioning installed. Running this query, we quickly get a response showing that one-story homes are the most common home style in Ames and that, regardless of home style, most homes have central air conditioning. There are many more public datasets on Kaggle that you can explore in this way.

    Building ML models using SQL queries

    Aside from data analysis, BigQuery ML lets you create and evaluate machine learning models using SQL queries.
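The home-style query described above might look something like this; the table path is a placeholder, and the column names follow the Kaggle House Prices data dictionary:

```sql
SELECT
  HouseStyle,
  CentralAir,
  COUNT(*) AS num_homes
FROM
  `my-project.ames_housing.train`
GROUP BY
  HouseStyle, CentralAir
ORDER BY
  num_homes DESC
```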
    With a few queries, any data scientist can build and evaluate regression models without extensive knowledge of machine learning frameworks or programming languages. Let’s create a linear model that aims to predict the final sales price of real estate in Ames. This model will train on a few inputs: living area size, year built, overall condition, and overall quality. With just one such query, we can create a SQL-based ML model inside Kernels. You could continue using Kernels to create more advanced queries for analysis and optimize your model for better results. You may even choose to publish your Kernel to share publicly with the Kaggle community and the broader Internet after your analysis is complete.

    To see the rest of the workflow on obtaining training statistics and evaluating the model, visit the complete How to use BigQuery on Kaggle tutorial. This tutorial is publicly available as a Kernels notebook. You can also check out the Getting started with BigQuery ML Kernel that goes into greater depth on training and evaluating models.

    Learn more details on navigating the integration by visiting Kaggle’s documentation. Also, sign up for Kaggle’s new and updated SQL micro-course that teaches you all the basics of the SQL language using BigQuery. We hope you enjoy using this integration!
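A sketch of what the CREATE MODEL statement for the linear model described above could look like; the dataset and table names are placeholders, and the column names follow the Kaggle data dictionary:

```sql
CREATE OR REPLACE MODEL `my-project.ames_housing.price_model`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['SalePrice']
) AS
SELECT
  GrLivArea,    -- living area size
  YearBuilt,    -- year built
  OverallCond,  -- overall condition
  OverallQual,  -- overall quality
  SalePrice     -- label: final sales price
FROM
  `my-project.ames_housing.train`
```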
  • Up, up and away: immersive machine learning at Google
    Editor’s note: Today we’re hearing from team members from AirAsia, a Malaysia-based airline. They’re exploring how to apply machine learning (ML) to better serve their customers, and Bing Yuan, a data scientist, and Koh Jia Jun, a software engineer, along with colleagues, recently spent four weeks attending the Advanced Solutions Lab (ASL) immersive training at Google. Here’s how their journey went.

    Machine learning is a buzzword that has made its way across industries around the world. When we got the opportunity to work with Google and attend the ASL training to learn about TensorFlow and deep learning, we weren’t quite sure what to expect. Both of us had some prior exposure to ML and Python, whether from undergrad courses or self-teaching through Coursera and Kaggle. Both of us are data science enthusiasts and enjoy reading blog posts and watching videos to keep tabs on up-and-coming technologies.

    After our four weeks of immersive learning, we can safely say the experience has made us better data scientists. More than ever, we feel confident in our ability to bring machine learning into the real world.

    Bing Yuan (left) and Koh Jia Jun (right) at Google ASL

    Prior to attending, we had a general understanding of the theoretical concepts behind common ML architectures and algorithms (linear and logistic regression, decision tree models, neural networks, and image and sequence models). A lot of this knowledge came from self-study, and we expected there’d be knowledge gaps here and there. Our group of seven AirAsia team members had varying levels of machine learning experience, with members from diverse technical backgrounds across data, analytics, software engineering, and aircraft engineering.
    For some of our colleagues, this would be their first exposure to ML.

    In this post, we’ll share our experience of day-to-day life at the ASL, and how we are applying our skills and experience today at AirAsia.

    A day in the life at the ASL

    The whole team, AirAsia and Google, after a full-day session of TensorFlow and ML

    Class begins at 9am, but we arrived early to have a hearty meal at the Google Singapore campus. The choice of food is abundant, not to mention the fresh juice and coffee corners in the office.

    After fueling up, we started the day with the history of machine learning, then worked our way up to modern deep learning. We learned to appreciate the mathematical workings of neural networks and data processing best practices (data preparation and feature engineering are important, often underrated components of effective machine learning). Our instructors, both machine learning engineers at Google, infused our daily sessions with practical ML tips and tricks that they have learned.

    We took a break for lunch around 1pm. Lunch was a chatter session between students and instructors. The topics were wide-ranging each day, from food preferences and entertainment to technical topics like the promise of machine learning, or working through hypothetical questions by thinking out loud. And no, we never forgot to ask the million-dollar question: How is it done at Google? The lunch session was also a cultural exchange between east and west, since people came from different sides of the world. After lunch, energy waned a bit, but we were still going strong, picking up from where we left off during the morning. We usually wrapped up the session around 4 to 5pm.

    Finally, after a full day of content, we had evening outings to experience Singapore. We went to Gardens by the Bay and enjoyed the music-coordinated light shows. We also went to the Singapore Flyer to get a bird’s-eye view of the country.
    These were more than fun times; the immersive four weeks turned out to be an amazing networking experience that brought our team (located across different functions within AirAsia) together. This has translated to better group dynamics and higher productivity back at home.

    Our group excursion to Sentosa Island on one Saturday

    We also focused on productionizing machine learning systems. It’s one thing to build an accurate model, but the full value of ML comes from building and deploying machine learning at scale. Traditionally, it takes developers hours and days to build infrastructure to support model deployment at scale, let alone manage these moving gears solely on-prem. This leads to ever-increasing human effort and added complexity, as well as fragile logic that is prone to breaking. Thus, it was invaluable to get hands-on experience and training with tools like Cloud Dataflow for stream and batch data processing and AI Platform for building and deploying large-scale machine learning models. The combination of these tools on Google Cloud Platform (GCP) allows data scientists to focus on building robust models without brooding over DevOps, translating to fewer points of failure and increased productivity.

    A whiteboarding session diving into the math of recurrent neural networks for sequence modeling. This type of model was very valuable for our energy forecasting solution during the open-projects week. Whiteboarding helped tackle challenging topics.

    The first three weeks of the course were focused on learning and hands-on labs. Each session was highly interactive. There were a lot of questions and discussions going on throughout the sessions, so it was not a dull classroom lecture. We learned to use Google Cloud, and specifically TensorFlow, to train machine learning models at scale. Each lab took a bit of effort and tinkering to get working. But we were in good hands, as the awesome instructors were there to help get us unstuck.
    In the final week, we were let loose to build our open projects and deploy an ML model on Google Cloud.

    Putting our learning into practice with open projects

    Our final presentation demoing our capstone project

    Fast forward to the final week of ASL: the open-project week. The goal of the open-project week is to build ML systems that solidify our skills so we can apply them back at home. Our goal was to develop a solution, build and train machine learning models, and deploy them on Google Cloud. We focused on forecasting and recommendation systems—both important topics for our firm.

    This was a week of intensive development, always with coffees on the table. Though there were no more lectures or labs, the learning didn’t stop. We worked side-by-side with our Google peers to create projects on different topics pertinent to our industry. Our cohort was split into two groups. In our case, we built an energy forecasting model and a recommendation system, both prototypes built on public datasets that we could use as a foundation when we got back to work.

    Recommendation systems have always been an interesting topic for us: how an audience is clustered, how to calculate the similarity between items to recommend, how to verify a model’s performance, and so on. We first built our base model using content-based filtering to predict the rating a user would give a movie based on movie genres. Then, we moved to a more powerful method: collaborative filtering. Our group encountered several challenges in employing collaborative filtering, but we turned those challenges into learning opportunities, leaning heavily on our instructors to help us work through them: datasets that can’t fit into memory, how to represent sparse matrices in TensorFlow, pre-processing data in TensorFlow to extract keywords, and others.

    We did dry runs along the way to prepare for our final presentation on the last day.
    In attendance were our company’s CEO, our deputy group CEO, our head of software engineering, and our head of data science at AirAsia, as well as Googlers. Finally, after the final battle—er, presentation—we took part in a closing ceremony for graduation.

    Our entire cohort on graduation day, including other AirAsia team members

    How we’re using our ASL experience

    We came a long way in a month. The newfound knowledge acquired from ASL has had a lasting, positive impact on our day-to-day work. One of us (Bing Yuan) especially learned to appreciate the beauty of algorithms like batch normalization and the different variants of gradient descent, to name a few. And we’re putting the valuable skills we learned into practice. With a reinforced understanding of ML through ASL, we’re now more confident than ever in our ability to build robust ML models for the real world and use them to generate value for the business. To name a few of the high-impact projects: we are using this knowledge to optimize and increase ancillary sales, and to promote upselling and cross-selling across the AirAsia group of products. These products extend beyond our airline business to hotel bookings, travel experiences, debit services (BigPay), cargo, and more.

    It’s one thing to learn about neural networks in theory, but we got significantly more value from learning to productionize machine learning at scale with AI Platform. The ASL experience taught us common data science patterns and best practices. We learned about the common pitfalls (and how to avoid them).

    Furthermore, it’s impossible to quantify the value of the informal chats (often tangential and “off-topic”) with our instructors, which helped to solidify our understanding while provoking thoughts that challenge conventions. Aside from feeding our curious minds, these discussions helped us customize the training to our firm’s specific needs and business requirements.
    The student-instructor bond is key to creating this uniquely effective learning experience. We’re excited to use what we learned. We’ll make full use of our newly acquired end-to-end problem-solving skills to supercharge marketing by increasing product uptake, upselling, and cross-selling. We’re also working to optimize and reduce the money-burning segments of the business, such as aircraft fuel consumption, unscheduled aircraft maintenance, and preventable delays. The skills we’ve picked up from ASL will live on and continue to grow. Learn more about ASL.
  • How Penn State World Campus is leveraging AI to help their advisers provide better student services
    Recently we spoke to Dawn Coder, director of academic advising and student disability services at Penn State World Campus, which was established in 1998 to provide accessible, quality education for online learners. It has since grown to have the second-largest enrollment in the university, serving thirty thousand students all over the world. By building a virtual advising assistant to automate routine interactions, Coder and her department aim to serve more students more efficiently. Working with Google and Quantiphi, a Google Cloud Partner of the Year for Machine Learning, they plan to roll out the pilot program, their first using AI, in January 2020.

    How does Penn State World Campus support its students?

    Our goal is to help students graduate and pursue whatever their goals are. I supervise three key services here: academic advising for undergraduates, disability services and accommodations for undergraduate and graduate students, and military services for our veterans and students on active duty, as well as their spouses. Altogether our team has about sixty employees serving approximately 11,000 undergraduates who take classes online from anywhere in the world.

    Why turn to AI?

    Our strategic objectives include student retention and organizational optimization, so that’s where AI fits in. We want to make our organization as efficient as possible, make sure employees are not overworked and overwhelmed, and provide the best quality services for our students to set them up for success. Quantiphi is using Google Cloud AI tools like Dialogflow to build us a custom user interface that will take incoming emails from students and recognize keywords to sort those emails into categories, like requests for change of major, change of campus, re-enrollment, and deferment. For example, if a student emails us asking how to re-enroll to finish a degree, the virtual assistant can collect all the relevant information about that student for the adviser in seconds.
    It can even generate a boilerplate response that the adviser can customize. Our students are physically located all over the world; they can’t just stop by our office. This allows them to get answers more quickly, in a way that’s convenient for them.

    Why choose Google?

    Security was an important factor because we’re working with student data. That was the biggest decision-maker. We also wanted to work with a company that believes education is important, especially higher education, because if you aren’t aligned with the goal of who we are, it’s really difficult to build a strong, positive relationship. I felt as though the representatives from Google and Quantiphi were focused on higher ed and really understood it. That was another decision-maker for our team.

    What benefits do you hope to see?

    Using this new interface will provide advisers with the necessary student information in one place. Currently, academic advisers access many different screens in our student information system to gather all the student information needed to provide next steps. The AI-driven tool will centralize the process, and all the data will be displayed in one place. With the time that is saved, an adviser will have more quality time to assist students with special circumstances, career planning, and schedule planning. We want to scale our services to serve more students as World Campus grows. During peak times of the semester, it can take our advisers longer than we would like to respond. If AI can help us reduce the time it takes to a few minutes, that will be a huge success.

    What’s next for this project?

    If the project is successful, our hope is to expand AI to other World Campus departments, like admissions or the registrar and bursar’s offices. Our biggest goal is always providing quality, accurate services to students in a timely manner—more in real time than having to wait a long time.
    My hope is that technology can make the process more intuitive so students can make more decisions on their own, knowing that the academic advisers are always there to advocate for them. There’s so much more to academic advising than just scheduling courses!
  • Predictive marketing analytics using BigQuery ML machine learning templates
    Enterprises are collecting and generating more data than ever—to better understand their business landscape, their market, and their customers. As a result, data scientists and analysts increasingly need to build robust machine learning models that can forecast business trajectories and help leaders plan for the future. However, current machine learning tools make it difficult to quickly and easily create ML models, delaying time to insights.

    To address these challenges, we announced BigQuery ML, a capability inside BigQuery that allows data scientists and analysts to build and operationalize machine learning models in minutes on massive structured or semi-structured datasets. BigQuery ML, which is now generally available, democratizes predictive analytics so that users unfamiliar with programming languages like Python and Java can build machine learning models with basic SQL.

    To make it even easier for anyone to get started with BigQuery ML, we have open-sourced a repository of SQL templates for common machine learning use cases. The first of these, tailored specifically for marketing, were built in collaboration with SpringML, a premier Google Cloud Platform partner that helps customers successfully deploy BigQuery and BigQuery ML. Each template is tutorial-like in nature, and includes a sample dataset for Google Analytics 360 and CRM along with SQL code for the following steps of machine learning modeling: data aggregation and transformation (for feature and label creation), machine learning model creation, and surfacing predictions from the model on a dashboard. Here’s more on the three templates:

    Customer segmentation—By dividing a customer base into groups of individuals that are similar in specific ways, marketers can custom-tailor their content and media to unique audiences. With this template, users can implement a BigQuery ML k-means clustering model to build customer segmentations.

    Customer lifetime value (LTV) prediction—Many organizations need to identify and prioritize the customer segments that are most valuable to the company. To do this, LTV can be an important metric that measures the total revenue reasonably expected from a customer. This template implements a BigQuery ML multiclass logistic regression model to predict whether the LTV of a customer will be high, medium, or low.

    Conversion or purchase prediction—Many marketing use cases can benefit from predicting the likelihood of a user converting or making a purchase: for example, ads retargeting, where the advertiser can bid higher for website visitors that have a higher purchase intent, or email campaigns, where emails are sent to a subset of customers based on their likelihood to click on content or purchase. This template implements a BigQuery ML binary logistic regression model to build conversion or purchase predictions.

    To start using these open-source SQL templates and more, visit our repository—the code is licensed under Apache v2. We will also be adding templates for more use cases in the future. And to learn more about applying BigQuery ML for marketing analytics, watch this Google Cloud OnAir webinar.
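To illustrate the kind of SQL these templates contain, here is a hedged sketch of a k-means segmentation model; the dataset, table, and feature columns are placeholders, not the templates' actual code:

```sql
CREATE OR REPLACE MODEL `my-project.marketing.customer_segments`
OPTIONS (
  model_type = 'kmeans',
  num_clusters = 4
) AS
SELECT
  total_pageviews,        -- hypothetical engineered features
  total_purchases,
  days_since_last_visit
FROM
  `my-project.marketing.customer_features`
```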
  • How to run evolution strategies on Google Kubernetes Engine
    Reinforcement learning (RL) has become popular in the machine learning community as more and more people have seen its amazing performance in games, chess, and robotics. In previous blog posts we’ve shown you how to run RL algorithms on AI Platform, utilizing both Google’s powerful computing infrastructure and intelligently managed training services such as Bayesian hyperparameter optimization. In this blog, we introduce Evolution Strategies (ES) and show how to run ES algorithms on Google Kubernetes Engine (GKE).

    Evolution Strategies are an optimization technique based on ideas from evolution. Recently, ES has been shown (e.g., 1, 2) to be a good alternative to RL for tackling various challenging tasks. Specifically, two well-known benefits of ES are that it bypasses noisy gradient estimates for policy optimization, and that it naturally encourages distributed computing, which brings faster convergence. While ES, first developed in the ’60s, has the benefit of easy scalability, only recently did open-source projects (e.g., Salimans et al. 2017) in the research community demonstrate that scaling ES to a large number of machines can achieve results competitive with state-of-the-art RL algorithms. As a result, an increasing number of deep learning researchers have been exploring ways to incorporate evolution-based algorithms into recent research (e.g., 1, 2, 3, 4, 5).

    Evidence suggests that putting more effort into building better infrastructure to scale evolutionary computing algorithms will facilitate further progress in this area; however, few researchers are experts in large-scale systems development. Luckily, in the past few years, technologies such as Kubernetes have been developed to make it easier for non-specialist programmers to deploy distributed computing solutions. As a demonstration of how Kubernetes might be used to deploy scalable evolutionary algorithms, in this blog post we explore the use of Kubernetes as a platform for easily scaling up ES.
    We provide the code and instructions here and hope they serve as a quickstart for ML researchers to try out ES on GKE.

    For the record, AI Platform provides distributed training with containers, which works with any ML framework that supports a distributed structure similar to TensorFlow’s. It is primarily for asynchronous model training, whereas distributed computing in ES serves a different purpose, as you will see in the following section.

    Evolution Strategies 101

    ES is a class of black-box optimization; it’s powerful for ML tasks where gradient-based algorithms fail: when the underlying task or function has no gradient, when the complexity of computing the gradient is high, when the noise embedded in the gradient estimate prevents learning, and so on. As an illustration, imagine standing at a point on the terrain shown on the left in the following figure. Your task is to navigate your way to the lowest point of the terrain blindfolded. You are given some magic beads, and they are the only way you can interact with the environment.

    Figure 1. Sphere function (left) and Rastrigin function (right) (source: Wikipedia)

    Loosely speaking, with gradient-based algorithms, at every decision-making step you drop some beads and let them roll for some time. The beads report their speeds, and you walk a step along the direction in which most of the beads roll fastest (because it’s steep there). Following this rule, you will probably reach the goal after a few iterations. Now try the same strategy on the right terrain in the figure. Chances are you will fail the mission and get stuck at the bottom of a valley surrounded by mountains.

    ES works very differently. Every optimization step consists of many trials; a decision is made based on the settings of the trials with great fitness. (Fitness is a metric that defines how good a trial is; in our example it can be the altitude, the lower the better. It is analogous to the cumulative rewards of a trial in an RL environment.)
This process, in which the trials with poor fitness are eliminated and only the fittest survive, resembles evolution, hence the name. To give an example of how ES works in the previous context: instead of dropping the beads at each step, you launch the beads one by one with a pistol and let them spread over the nearby region. Each bead reports its position and altitude upon landing, and you move to a point where the estimated altitude seems to be low. This strategy works on both terrains in the figure (suppose our pistol is very powerful and can shoot over high mountains), and it is easy to see that parallel execution of the trials can speed up the process (e.g., replace the pistol with a shotgun).

The description in this section is meant to give you a very basic idea of what ES is and how it works. Interested readers are strongly encouraged to refer to this series of blog posts, which provides an excellent introduction and in-depth description.

Kubernetes 101

Kubernetes started at Google and was open-sourced in 2014. It is a platform for managing containerized workloads and services that facilitates both declarative configuration and automation. A thorough description of Kubernetes requires pages of documentation; in this section we will only scratch the surface and give an ES-centric introduction to Kubernetes.

From our previous discussion, it is easy to see that an implementation of ES falls into a controller-worker architecture, wherein at each iteration the controller commands the workers to run trials with given settings and performs optimization based on the workers’ feedback. With this implementation plan in mind, let’s give some definitions and a description of how ES is conducted on Kubernetes in our earlier lowest-point-finding example.

You are not given a gun or beads this time; instead you have a cellphone, and you can call someone to do the job of shooting the beads for you.
You need to specify what you are expecting before requesting any service (in this example, bead-shooting), so you write your specification on the “Note to service provider”. You also prepare a “Note to myself” as a memo. Submitting the specification to a service provider, you start your exciting adventure.

In this metaphor, the “What to do” section on the service provider’s note is the worker’s program and the other is the controller’s program. Together with some runtime libraries, we package them as container images. The service provider is Kubernetes, and the specification it receives is called a workload, which consists of the container images and some system configurations such as resources. For example, the 10 cars in our example correspond to 10 nodes / machines in a cluster; the 100 bead-shooters represent how many running containers (pods, in Kubernetes language) we wish to have, and Kubernetes is responsible for the availability of these pods.

You probably don’t want to call each of these 100 bead-shooters to collect results. Plus, some bead-shooters may take sick leave (e.g., failed containers due to machine reboots) and have delegated their jobs to other shooters (newly started containers) whose numbers you may not have. To cope with this, Kubernetes exposes a workload as a service that acts as a point of contact between the controller and the workers. The service is associated with the related pods; it always knows how to reach them, and it provides load balancing across the pods.

With Kubernetes as a platform, we get high availability (Kubernetes makes sure the number of running pods matches your expectations) and great scalability (Kubernetes allows adding / removing running pods at runtime). We think that’s what makes Kubernetes an ideal platform for ES.
And GKE extends Kubernetes’s availability and scalability to the node level, which makes it an even better platform!

ES on GKE

In this section, we describe our implementation of ES and instructions for running it on GKE. You can access the code and the instructions here.

Our implementation

As discussed in the previous sections, we adopt a controller-worker architecture in our implementation, and we use gRPC as the interprocess communication method. Each worker is an independent server and the controller is the client. Remote procedure call (RPC) is not as efficient as other options such as the message passing interface (MPI) in terms of data passing, but RPC’s user friendliness for data packaging and high fault tolerance make it a better candidate for cloud computing. The following code snippet shows our message definitions. Each rollout corresponds to a trial, and rollout_reward is the fitness reported from the rollout.

The ES algorithms we provide as samples are Parameter-exploring Policy Gradients (PEPG) (based on estool) and Covariance Matrix Adaptation (CMA) (based on pycma). You can play with them in Google Brain’s Minitaur Locomotion and OpenAI’s BipedalWalkerHardcore-v2, a particularly difficult continuous-control RL environment to solve. You can also easily extend the code there to add your own ES algorithms, or change the configs to try the algorithms in your own environments. To be concrete, we defined an interface in algorithm.solver.Solver; as long as your implementation conforms to that interface, it should run with the rest of the code.

Run ES on GKE

To run our code on GKE, you need a cluster on Google Cloud Platform (GCP); follow the instructions here to create yours.
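The message definitions referenced in the implementation section above did not survive into this page. Based on the described rollout / rollout_reward fields, a protocol buffer sketch of what they might look like follows; the field names other than rollout_reward are hypothetical, and the exact schema lives in the linked repository.

```proto
syntax = "proto3";

// Controller -> worker: the settings of one trial (a rollout).
message RolloutRequest {
  repeated double policy_weights = 1;  // candidate parameters to evaluate
  int32 num_rollouts = 2;              // how many trials to run with them
}

// Worker -> controller: the outcome of the trial.
message RolloutResponse {
  double rollout_reward = 1;  // fitness reported from the rollout
}
```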
We use the following command and configs to create our cluster; feel free to change these to suit your needs. When you have a cluster sitting there, running our sample ES code on GKE involves only three steps, each of which is a simple bash command:

1. Build container images for the controller and the workers.
2. Deploy the workers on the cluster.
3. Deploy the controller on the cluster.

Figure 2. Example of successful deployments in GCP console.

That’s all! ES should be training in your specified environment on GKE now. We provide three ways for you to check your training progress:

Stackdriver—In the GCP console, clicking the GKE Workloads page gives you a detailed status report of your pods. Go to the details of the es-master-pod and you can find “Container logs” that will direct you to Stackdriver Logging, where you can see training and test rewards.

HTTP Server—In our code, we start a simple HTTP server in the controller to make training logs easily accessible to you. You can access this by checking the endpoint in es-master-service, located in the GKE Services page.

Kubectl—Finally, you can use the kubectl command to fetch logs and models. The following commands serve as examples.

Run ES locally

For debugging, both training and test can be run locally; use the corresponding training and test scripts with the proper options to do so.

Experiments

To prove the benefits of running ES on GKE, we present two examples in this section: a 2D walker trained with CMA in OpenAI’s BipedalWalkerHardcore environment, and a quadruped robot in Google Brain’s MinitaurLocomotion environment. We consider the tasks solved if the agents can achieve an average reward greater than TargetReward in 100 consecutive test trials; both tasks are challenging (try solving them with RL). The following table summarizes our experimental settings.
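As an illustration of the kubectl workflow mentioned above, commands along these lines fetch logs and artifacts. The pod name and file paths here are hypothetical; use the listing commands to find the real names in your cluster.

```shell
# List the running pods and services to find the controller's name
kubectl get pods
kubectl get services

# Stream the controller's training log (pod name is hypothetical)
kubectl logs -f es-master-pod

# Copy a saved model out of the controller pod (path is hypothetical)
kubectl cp es-master-pod:/tmp/model.npz ./model.npz
```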
We also ran experiments on a standalone Google Compute Engine instance with 64 cores for the purpose of comparison; the number of workers on this Compute Engine instance was tuned to make sure its CPU utilization stayed above 90%. Our implementation is able to solve both tasks, and the results are presented below.

Although the exact ratio is task dependent, ES can get a significant speedup when run on GKE. In our examples, learning BipedalWalkerHardcore is 5 times faster, and learning a quadruped robot is more than 10 times faster. For ML researchers, this speedup brings opportunities to try out more ideas and allows for faster iteration in ML algorithm development.

Conclusion

ES is powerful for ML tasks where gradient-based algorithms do not give satisfactory solutions. Given its nature of encouraging parallel computation, ML researchers and engineers can get a significant speedup when ES is run on Kubernetes, and this allows faster iteration for trying out new ideas.

Due to the ease of scaling ES, we believe the applications that can benefit the most are those where cheap simulation environments exist for difficult problems. Recent works (e.g. 1, 2) demonstrate the effectiveness of training virtual robot controllers first in a simulation, before deploying the controller in the real-world environment. Simulation environments, rather than having to be hand-programmed, can also be learned from collected observations and represented as a deep learning model (e.g. 1, 2, 3). These types of applications might leverage the scaling of ES to learn from thousands of parallel simulation environments.

As evolutionary methods allow more flexibility in terms of what is being optimized, applications can span beyond traditional RL policy optimization. For instance, this recent work used ES in an RL environment to not only train a policy, but also learn a better design for the robot. We expect many more creative applications in the area of generative design using evolution.
In this recent research work, the authors demonstrated the possibility of finding minimal neural network architectures that can perform several RL tasks without weight training, using evolutionary algorithms. This result surprised many ML researchers and points to a brand new research field in which evolution plays the main role. Just as GPUs were the catalyst that enabled the training of large, deep neural networks and led to the deep learning revolution, we believe the ability to easily scale up evolutionary methods to large clusters of low-cost CPU workers will lead to the next computing revolution.

To learn more about GKE and Kubernetes for deep learning, visit:

Kubernetes Engine
End-to-end Kubeflow on GCP
  • 3 steps to gain business value from AI
    Many customers have asked us this profound question: how do we realize business value from artificial intelligence (AI) initiatives after a proof of concept (POC)? Enterprises are excited about the potential of AI, and some even create a POC as a first step. However, some are stymied by a lack of clarity on the business value or return on investment. As a result, we have heard the same question from data science teams that have created machine learning (ML) models that are under-utilized by their organizations.

At Google Cloud, we’re committed to helping organizations of all sizes transform themselves with AI, and we have worked with many of our customers to help them derive value from their AI investments. AI is a team sport that requires strong collaboration between business analysts, data engineers, data scientists, and machine learning engineers. As a result, we recommend discussing the following three steps with your team to realize the most business value from your AI projects:

Step 1: Align AI projects with business priorities and find a good sponsor.
Step 2: Plan for explainable ML in models, dashboards and displays.
Step 3: Broaden expertise within the organization on data analytics and data engineering.

Step 1: Align AI projects with business priorities and find a good sponsor

The first step to realizing value from AI is to identify the right business problem and a sponsor committed to using AI to solve that problem. Teams often get excited by the prospect of applying AI to a problem without deeply thinking about how that problem contributes to overall business value. For example, using AI to better classify objects might be less valuable to the bottom line than, say, a great chatbot. Yet many businesses don’t start with the critical step of aligning the AI project with the business challenges that matter most.

Identify the right business problem. To ensure alignment, start with your organization’s business strategy and key priorities.
Identify the business priorities that can gain the most from AI. The person doing this assessment needs to have a good understanding of the most common use cases for AI and ML; it could be a data science director, or a team of business analysts and data scientists. Keep a shortlist of the business priorities that can truly benefit from AI or ML. During implementation, work through this list starting with the most feasible. By taking this approach, you’re more likely to generate significant business value as you build a set of ML models that solve specific business priorities. Conversely, if a data science or machine learning team builds great solutions for problems that are not aligned with business priorities, the models they build are unlikely to be used at scale.

Find a business sponsor. We’ve also found that AI projects are more likely to be successful when they have a senior executive sponsor who will champion them with other leaders in your organization. Don’t start an AI project without completing this critical step. Once you identify the right business priority, find the senior executive who owns it, and work with their team to get their buy-in and sponsorship. The more senior and committed, the better: if your CEO cares about AI, you can bet most of your employees will.

Step 2: Plan for explainable ML in models, dashboards and displays

An important requirement from many business users is to have explanations from ML models. In many cases, it is not enough for an ML model to provide an outcome; it’s also important to understand why. Explanations help to build trust in the model’s predictions and offer useful factors on which business users can take action. In regulated industries such as financial services and healthcare, for example, there are regulations that require explanations of decisions.
For example, in the United States the Equal Credit Opportunity Act (ECOA), enforced by the Federal Trade Commission (FTC), gives consumers the right to know why their loan applications were rejected; lenders have to tell the consumer the specific reasons why. Regulators have been seeking more transparency around how ML predictions are made.

Choose new techniques for building explainable ML models. Until recently, most leading ML models have offered little or no explanation for their predictions. However, recent advances are emerging to provide explanations even for the most complex ML algorithms, such as deep learning. These include Local Interpretable Model-Agnostic Explanations (LIME), Anchors, Integrated Gradients, and Shapley values. These techniques offer a unique opportunity to meet the needs of business users, even in regulated industries, with powerful ML models.

Use the right technique to meet your users’ needs for model explanation. When you build ML models, be prepared to provide explanations both globally and locally. Global explanations provide the model’s key drivers: the strongest predictors in the overall model. For example, the global explanation from a credit default prediction model will likely show that the top predictors of default include variables such as number of previous defaults, number of missed payments, employment status, length of time with your bank, and length of time at your address. In contrast, local explanations provide the reasons why a specific customer is predicted to default, and the specific reasons will vary from one customer to another.

As you develop your ML models, build time into your plan to provide global and local explanations. We also recommend gathering user needs to help you choose the right technique for model explanation. For example, many financial regulators do not allow the use of surrogate models for explanations, which rules out techniques like LIME.
In this instance, the Integrated Gradients technique would be better suited to the use case. Also, be prepared to share the model’s explanations wherever you show the model’s results, whether on analytics dashboards, in embedded apps, or in other displays. This will help to build confidence in your ML models: business users are more likely to trust an ML model if it provides intuitive explanations for its predictions, and they are more likely to take action on the predictions if they trust the model. Similarly, with these explanations, your models are more likely to be accepted by regulators.

Step 3: Broaden expertise in data analytics and data engineering within your organization

To realize the full potential of AI, you need good people with the right skills. This is a big challenge for many organizations, given the acute shortage of ML engineers; many organizations really struggle to hire them. You can address this skills shortage by upskilling your existing employees and taking advantage of a new generation of products that simplify AI model development.

Upskill your existing employees. You don’t always need PhD ML engineers to be successful with ML. PhD ML engineers are great if your applications need research and development, for example if you were building driverless cars. But most typical applications of AI or ML do not require PhD experts. What you need instead are people who can apply existing algorithms or even pre-trained ML models to solve real-world problems. For example, there are powerful ML models for image recognition, such as ResNet50 or Inception V3, that are available for free in the open source community. You don’t need an expert in computer vision to use them.
Instead of searching for unicorns, start by upskilling your existing data engineers and business analysts, making sure they understand the basics of data science and statistics so they can use powerful ML algorithms correctly. At Google we provide a wealth of ML training, from Qwiklabs to Coursera courses (e.g. Machine Learning with TensorFlow on Google Cloud Platform Specialization or Machine Learning for Business Professionals). We also offer immersive training such as instructor-led courses and a four-week intensive machine learning training program at the Advanced Solutions Lab. These courses offer great avenues to train your business analysts, data engineers, and developers on machine learning.

Take advantage of products that simplify AI model development. Until recently, you needed sophisticated data scientists and machine learning engineers to build even the simplest of ML models, and this workforce required deep knowledge of core ML algorithms in order to choose the right one for each problem. However, that is quickly changing. Powerful but simple ML products such as Cloud AutoML from Google Cloud make it possible for developers with limited knowledge of machine learning to train high-quality models specific to their business needs. Similarly, BigQuery ML enables data analysts to build and operationalize machine learning models in minutes in BigQuery using simple SQL queries. With these two products, business analysts, data analysts, and data engineers can be trained to build powerful machine learning models with very little ML expertise.

Make AI a team sport. Machine learning teams should not exist in silos; they must be connected to analytics and data engineering teams. This will facilitate operationalization of models. Close collaboration between ML engineers and business analysts will help the ML team tie their models to important business priorities through the right KPIs.
It also allows business analysts to run experiments to demonstrate the business value of each ML model. Close collaboration between ML and data engineering teams also helps speed up data preparation and model deployment in production. The results of ML models need to be displayed in applications or in analytics and operational dashboards, and data engineers are critical in developing the data pipelines needed to operationalize models and integrate them into business workflows for the right end users.

It is very tempting to think that you have to hire a large team of ML engineers to be successful. In our experience, this is not always necessary or scalable. A more pragmatic approach to scale is to use the right combination of business analysts working closely with ML engineers and data engineers. A good rule of thumb is to have six business analysts and three data engineers for each ML engineer. More details on the recommended team structure are available in our Coursera course, Machine Learning for Business Professionals.

Conclusion

As many organizations start to explore AI and machine learning, they are confronted with the question of how to realize the business potential of these powerful technologies. Based on our experience working with customers across industries, we recommend the three steps in this blog post to realize business value from AI. To learn more about AI and machine learning on Google Cloud, visit our Cloud AI page.
  • Jupyter Notebook Manifesto: Best practices that can improve the life of any developer using Jupyter notebooks
    Many data science teams, both inside and outside of Google, find that it’s easiest to build accurate models when teammates can collaborate and suggest new hyperparameters, layers, and other optimizations. And notebooks are quickly becoming the common platform for the data science community, whether in the form of AI Platform Notebooks, Kaggle Kernels, Colab, or the notebook that started it all, Jupyter.

A Jupyter Notebook is an open-source web application that helps you create and share documents that contain live code, equations, visualizations, and narrative text. Because Jupyter Notebooks are a relatively recent tool, they don’t (yet) follow or encourage consensus-based software development best practices. Data scientists, typically collaborating on a small project that involves experimentation, often feel they don’t need to adhere to any engineering best practices. For example, your team may have the odd Python or shell script that has neither test coverage nor any CI/CD integration. However, if you’re using Jupyter Notebooks in a larger project that involves many engineers, you may soon find it challenging to scale your environment or deploy to production.

To set up a more robust environment, we established a manifesto that incorporates best practices that can help simplify and improve the life of any developer who uses Jupyter tools. It’s often possible to share best practices across multiple industries, since the fundamentals remain the same. Logically, data scientists, ML researchers, and developers using Jupyter Notebooks should carry over the best practices already established by the older fields of computer science and scientific research. Here is a list of best practices adopted by those communities, with a focus on those that still apply today.

Our Jupyter Notebooks development manifesto

0. There should be an easy way to use Jupyter Notebooks in your organization, where you can “just write code” within seconds.
1.
Follow established software development best practices: OOP, style guides, documentation.
2. You should institute version control for your Notebooks.
3. Reproducible Notebooks.
4. Continuous Integration (CI).
5. Parameterized Notebooks.
6. Continuous Deployment (CD).
7. Log all experiments automatically.

By following the guidelines in this manifesto, we want to help you achieve this outcome:

Note: Security is a critical part of software development practices. This post does not cover secure software development with Jupyter Notebooks; we will cover best practices for it in a future blog post, but it is something critical you must consider.

Principles

Easy access to Jupyter Notebooks

Creating and using a new Jupyter Notebook instance should be very easy. On Google Cloud Platform (GCP), we just launched a new service called AI Platform Notebooks: a managed service that offers an integrated JupyterLab environment, in which you can create instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks in a single click.

Follow established software development best practices

This is essential. Jupyter Notebook is just a new development environment for writing code, so all the best practices of software development should still apply:

Version control and code review systems (e.g. git, mercurial).
Separate environments: split production and development artifacts.
A comprehensive test suite (e.g.
unittest, doctest) for your Jupyter Notebooks.
Continuous integration (CI) for faster development: automate the compilation and testing of Jupyter notebooks every time a team member commits changes to version control.

Just as an Android developer would need to follow the above best practices to build a scalable and successful mobile app, a Jupyter Notebook focused on sustainable data science should follow them, too.

Using a version control system with your Jupyter Notebooks

Version control systems record changes to your code over time, so that you can revisit specific versions later. They also let you develop in separate branches in parallel, perform code reviews, and use the revision history to know who the expert is in certain code areas. To unblock effective use of a version control system like git, there should be a tool well integrated into the Jupyter UI that allows every data scientist on your team to effectively resolve conflicts for the notebook, view the history of each cell, and commit and push particular parts of the notebook to your notebook’s repository right from the cell.

Don’t worry, though: if you perform a diff operation in git and suddenly see that multiple lines have changed instead of one, this is the intended behavior, as of today. With Jupyter notebooks, there is a lot of metadata that can change with a simple one-line edit, including the kernel spec, execution info, and visualization parameters. To apply the principles and corresponding workflows of traditional version control to Jupyter notebooks, you need the help of two additional tools:

nbdime: a tool for diffing and merging Jupyter Notebooks.
jupyterlab-git: a JupyterLab extension for version control using git.

In this demo, we clone a GitHub repository and, after this step is completed, modify some minor parts of the code.
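For reference, a typical setup for the two tools above might look like this, assuming they are installed from PyPI; the notebook file names are placeholders.

```shell
pip install nbdime jupyterlab-git

# Content-aware diff of two notebooks (cell-by-cell, not raw JSON)
nbdiff old.ipynb new.ipynb

# Make git use nbdime when diffing and merging .ipynb files
nbdime config-git --enable
```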
If you execute a diff command, you would normally expect git to show only the lines that changed, but as we explained above, this is not true for Jupyter notebooks. nbdime allows you to perform a diff from the Jupyter notebook and also from the CLI, without the distraction of extraneous JSON output.

Reproducible notebooks

You and your team should write notebooks in such a way that anyone can rerun them on the same inputs and produce the same outputs. A notebook should be executable from top to bottom and should contain the information required to set up the correct, consistent environment.

How to do it? If you are using AI Platform Notebooks, for example on the TensorFlow M22 image, this platform information should be embedded in your notebook’s metadata for future use. Let’s say you create a notebook and install TensorFlow’s nightly version. If you execute the same notebook on a different Compute Engine instance, you need to make sure that this dependency is already installed. A notebook should have a notion of dependencies, appropriately tracked; this can be in the environment or in the notebook metadata.

In summary, a notebook is reproducible if it meets the following requirements:

The Compute Engine image and underlying hardware used for creating the notebook should be embedded in the notebook itself.
All dependencies should be installed by the notebook itself.
A notebook should be executable from top to bottom without any errors.

In this demo, we clone a GitHub repository that contains a few notebooks, and then activate the new Nova plugin, which allows you to execute notebooks directly from your Jupyter UI. Nova and its corresponding compute workload run on a separate Compute Engine instance using Nteract papermill. AI Platform Notebooks supports this plugin by default; to enable it, run the script.

Nova plugin

Continuous integration

Continuous integration is a software development practice that requires developers to integrate code into a shared repository.
Each check-in is verified by an automated build system, allowing teams to detect problems at early stages. Each change to a Jupyter notebook should be validated by a continuous integration system before being checked in; this can be done using different setups (a non-master remote branch, remote execution in a local branch, etc.).

In this demo, we modify a notebook so that it contains invalid Python code, and then we commit the results to git. This particular git repository is connected to Cloud Build. The notebook executes and the commit step fails, as the engine finds an invalid cell at runtime. Cloud Build creates a new notebook to help you troubleshoot your mistake. Once you correct the code, you’ll find that your notebook runs successfully, and Cloud Build can then integrate your code.

Parameterized Notebooks

Reusability of code is another software development best practice. You can think of a production-grade notebook as a function or a job specification: a notebook takes a series of inputs, processes them, and generates some outputs, consistently. If you’re a data scientist, you might start running grid search to find your model’s optimal hyperparameters for training, stepping through different parameters such as learning rate, num_steps, or batch_size. During notebook execution, you can pass different parameters to your models, and once results are generated, pick the best options using the same notebook.

For these execution steps, consider using Papermill and its ability to configure different parameters; these parameters will be used by the notebook during execution. This means you can override the default source of data for training, or submit the same notebook with different inputs (for example, a different learning rate, number of epochs, etc.).

In this demo, we execute a notebook, passing different extra parameters. Here we’re using information about bike rentals in San Francisco, with the bike rental data stored in BigQuery.
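As a sketch of the Papermill flow just described, parameters can be injected from the command line; the notebook and parameter names here are hypothetical.

```shell
# Run the same notebook twice with different hyperparameters; each run
# writes its own executed output notebook.
papermill train.ipynb out-lr-001.ipynb -p learning_rate 0.001 -p epochs 10
papermill train.ipynb out-lr-010.ipynb -p learning_rate 0.010 -p epochs 10
```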
This notebook queries the data and generates a top-ten list and a station map of the most popular bike rental stations, using the start and end dates as parameters. By tagging the cells with a parameters tag so Papermill can use these options, you can reuse your notebook without making any updates to it, and still generate a different dashboard.

Continuous deployment

Each version of a Jupyter Notebook that has passed all the tests should be used to automatically generate a new artifact and deploy it to staging and production environments. In this demo, we show you how to perform continuous deployment on GCP, incorporating Cloud Functions, Cloud Pub/Sub, and Cloud Scheduler. Now that you’ve established a CI system that generates a tested, reproducible, and parameterized notebook, let’s automate the generation of artifacts for a continuous deployment system.

Based on the previous CI system, there is an additional step in CI that uploads a payload to Cloud Functions when tests are successful. When triggered, this payload sends the same artifact build request, with parameters, to Cloud Build, spinning up the instance and storing the results. To add the automation, we orchestrate using Cloud Pub/Sub (message passing) and Cloud Scheduler (cron). The first time the cloud function is deployed, it creates a new Pub/Sub topic and subscribes to it; afterwards, any published message will start the cloud function. This notification is published using Cloud Scheduler, which sends messages based on time. Cloud Scheduler can use different interfaces, for example new data arriving in Cloud Storage or a manual job request.

Log all experiments

Every time you try to train a model, metadata about the training session should be automatically logged. You’ll want to keep track of things like the code you ran, hyperparameters, data sources, results, and training time.
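A minimal sketch of such automatic logging, assuming a plain JSON-lines file as the store; the field names and file name are illustrative, not a prescribed schema.

```python
import json
import time
from pathlib import Path

LOG_FILE = Path("experiments.jsonl")  # illustrative location

def log_experiment(params, metrics, data_source, code_version):
    """Append one training run's metadata as a JSON line."""
    record = {
        "timestamp": time.time(),
        "params": params,             # e.g. hyperparameters
        "metrics": metrics,           # e.g. final accuracy / loss
        "data_source": data_source,   # where the training data came from
        "code_version": code_version, # e.g. a git commit hash
    }
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example: log one run after training finishes
log_experiment({"learning_rate": 0.01, "epochs": 20},
               {"accuracy": 0.93}, "gs://my-bucket/data", "abc1234")
```

Call log_experiment at the end of every training cell (or from a Papermill-executed notebook) and the JSONL file becomes a searchable history of every run.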
This way, you remember past results and won't find yourself wondering whether you already ran that experiment.

Conclusion

By following the guidelines defined above, you can make your Jupyter notebook deployments more efficient. To learn more, read our AI Platform Notebooks overview.

Acknowledgements: Gonzalo Gasca Meza, Developer Programs Engineer, and Karthik Ramachandran, Product Manager, contributed to this post.
  • AI Platform Notebooks now supports R in beta
At Next ‘19, we announced the beta availability of AI Platform Notebooks, our managed service that offers an integrated environment to create JupyterLab instances that come pre-installed with the latest data science and machine learning frameworks. Today, we're excited to introduce support for R on AI Platform Notebooks. You can now spin up a web-based development environment with JupyterLab, IRkernel, xgboost, ggplot2, caret, rpy2, and other key R libraries pre-installed.

The R language is a powerful tool for data science, and has been popular with data engineers, data scientists, and statisticians everywhere since its creation in the early 1990s. It offers a sprawling collection of open source libraries that contain implementations of a huge variety of statistical techniques. For example, the Bioconductor library contains state-of-the-art tools for analyzing genomic data. Likewise, with the forecast package you can carry out sophisticated time series analysis using models like ARIMA, ARMA, AR, and exponential smoothing. Or, if you prefer building deep learning models, you could use TensorFlow for R.

R users can now leverage AI Platform Notebooks to create instances that can be accessed via the web or via SSH. This means you can install the libraries you care about, and you can easily scale your notebook instances up or down.

Getting started is easy

You can get started by navigating to AI Platform and clicking on Notebooks. Then:

1. Click on "New Instance" and select R 3.5.3 (the first option).
2. Give your instance a name and hit "Create".

In a few seconds your notebook instance will show up in the list of instances available to you. You can access the instance by clicking on "Open JupyterLab". This brings up the JupyterLab Launcher. From here you can do three things:

1. Create a new Jupyter notebook using IRkernel by clicking on the R button under Notebook.
2. Bring up an IPython-style console for R by clicking on the R button under Console.
3.
Open up a terminal by clicking on the terminal button under Other.

For fun, let's create a new R notebook and visualize the famous Iris dataset, which consists of size measurements of various parts of iris flowers, each labeled with the particular species of iris. It's a good dataset for trying out simple clustering algorithms.

1. Create a new R notebook by clicking on the R button under Notebook.
2. In the first cell, type in:

data(iris)
head(iris)

This will let you see the first six rows of the Iris dataset.

3. Next, let's plot Petal.Length against Sepal.Length:

library('ggplot2')
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length, colour = Species)) +
  geom_point() +
  ggtitle('Iris Species by Petal and Sepal Length')

Install additional R packages

As mentioned earlier, one of the reasons for R's popularity is the sheer number of open source libraries available. One popular package hosting service is the Comprehensive R Archive Network (CRAN), with over 10,000 published libraries. You can easily install any of these libraries from the R console. For example, if you wanted to install the widely popular igraph—a package for doing network analysis—you could do so by opening up the R console and running the install.packages command:

install.packages("igraph")

Scale up and down as you need

AI Platform Notebooks lets you easily scale your notebook instances up or down. To change the amount of memory and the number of CPUs available to your instance:

1. Stop your instance by clicking on the check box next to the instance and clicking the Stop button.
2. Click on the Machine Type column and change the number of CPUs and amount of RAM available.
3. Review your changes and hit Confirm.

AI Platform Notebooks is just one of the many ways that Google Cloud supports R users. (For example, check out this blog post and learn about SparkR support on Cloud Dataproc.) To learn more, and to get started with AI Platform Notebooks, check out the documentation here, or just dive in.
  • New Translate API capabilities can help localization experts and global enterprises
Whether they're providing global content for e-commerce, news, or video streaming, many businesses need to share information across many languages, and increasingly they're turning to automated machine translation to do it faster and more cost-effectively.

As one of our longest-standing AI products, Google Cloud Translation has substantially evolved over the years to meet the increasing needs of developers and localization providers around the world. Last year we launched AutoML Translation to help businesses with limited machine learning expertise build high-quality, production-ready custom translation models without writing a single line of code.

But that only met part of the need for customization. While AutoML Translation offers full customization flexibility, companies using Translation API wanted more granular control over specific words and phrases, such as a list of location names or street names. Many businesses also told us they use both custom and pre-trained models in the same translation project for different languages, so we wanted to make it easier for them to move between models when running translation predictions. Based on these learnings, we recently launched Translation API v3 to better serve the needs of our customers. Here's a look at the enhanced features in Translation API v3.

Define specific names and terms with a glossary

If you need to maintain and control specific terms such as brand names in translated content, creating a glossary can help. Simply define your company-specific names and vocabulary in your source and target languages, then save the glossary file to your translation project. Those words and phrases will be included in your copy when you apply the glossary in your translation request. Welocalize, a global leader in multilingual content and data transformation solutions, is already using the glossary feature to increase accuracy, efficiency, and fluency for its customers.
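In v3, a glossary is referenced by its resource name inside the translation request. As a rough sketch, the request can be assembled as a plain dictionary of the kind the v3 client accepts; the project ID, location, and glossary ID below are placeholders, and the exact field names should be checked against the client library you use:

```python
def glossary_translate_request(project_id, text, source_lang, target_lang, glossary_id):
    """Build a Translation API v3 translate-text request that applies a glossary."""
    # Glossaries are regional resources attached to a project location.
    parent = f"projects/{project_id}/locations/us-central1"
    return {
        "parent": parent,
        "contents": [text],
        "mime_type": "text/plain",
        "source_language_code": source_lang,
        "target_language_code": target_lang,
        # Points the request at the previously created glossary resource.
        "glossary_config": {"glossary": f"{parent}/glossaries/{glossary_id}"},
    }

request = glossary_translate_request(
    "my-project", "Cloud Storage is fast.", "en", "es", "my-brand-terms")
```

With the google-cloud-translate client, a dictionary like this would be passed to the translation service's `translate_text` method.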
"The new Google glossary feature will have a significant impact on our day-to-day business. We process hundreds of millions of words per year using machine translation in widely disparate enterprise client scenarios," said Olga Beregovaya, their Vice President of Language Services. "The ease of customization and API consumption allows us to enforce broad terminology coverage for both clients with voluminous data in Google AutoML Translation and clients with sparse data in Google Cloud Translation API. Our initial benchmarking in five languages shows a preference for translation with glossary of as much as 20% over the non-glossary output."

Select between custom and pre-trained models

You can now choose between Translation API's traditional pre-trained models and custom model translations, streamlining your workflow within the same client library.

Streamline your localization process with batch translations

You can now translate larger volumes of content in one translation request for text and HTML files stored on Google Cloud. This means you can use a single request to translate multiple files into multiple languages using multiple models. For example, if you wanted to translate an English product description on your website into Spanish, Japanese, and Russian, you could use your custom AutoML model for Spanish and pre-trained models for Japanese and Russian. You would simply upload your English HTML file to your Cloud Storage bucket and send a batch request pointing to your Spanish AutoML model and the pre-trained models for Japanese and Russian.
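A batch request along these lines can be sketched as a plain dictionary of the kind passed to the v3 batch-translate call; the bucket paths and AutoML model name are placeholders, and field names should be verified against the client library:

```python
def batch_translate_request(project_id, input_uri, output_uri, targets, models=None):
    """Build a Translation API v3 batch request: one HTML input file,
    several target languages, and optionally a custom model per language."""
    parent = f"projects/{project_id}/locations/us-central1"
    return {
        "parent": parent,
        "source_language_code": "en",
        "target_language_codes": targets,
        "input_configs": [{
            "gcs_source": {"input_uri": input_uri},
            "mime_type": "text/html",
        }],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri}},
        # Languages missing from this map fall back to the pre-trained model.
        "models": models or {},
    }

request = batch_translate_request(
    "my-project",
    "gs://my-bucket/product.html",
    "gs://my-bucket/translated/",
    ["es", "ja", "ru"],
    models={"es": "projects/my-project/locations/us-central1/models/my-automl-model"},
)
```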
Translation API v3 will then output your HTML to Cloud Storage in Spanish, Russian, and Japanese as three separate files.

Integrations with TMS/CAT

Technology partners are starting to integrate these new features into their TMS systems as well. "Based on feedback from our clients using Google translation technology in CAT tools, the most sought-after features are the ability to customize Google Cloud Translation output with a glossary and to make translation faster via batch translation," says Konstantin Savenkov, CEO of Intento. "Now both are available via our plugin to CAT tools (SDL AppStore, memoQ, and Matecat). Also, we may deliver Translation API v3 to enterprise TMS systems via our XLIFF-based connectors." You can learn more in Intento's recent blog post.

Introducing the Translation API v3 free tier

Starting with Translation API v3, businesses can take advantage of a free tier for the first 500,000 characters per month. Beyond that, the pricing remains on a simple and affordable per-character basis, so you only pay for what you use. You can learn more on our pricing page.

How to get started

If you're already using Translation API v2, you can begin migrating your applications to v3 using this guide. For more information on Translation API v3, visit our website.
  • How artificial intelligence in Google Drive helps you find files fast
For your organization to perform its best, your employees need to be able to find, access, and apply knowledge faster than the rate at which your business is changing. Outside of raw data, a lot of your organization's "knowledge" is likely contained within the content your employees create—think strategy docs, financial spreadsheets, customer presentations, and so on. The typical enterprise user has as many as 12,000 files saved in Google Drive. With so much information spread across your organization, it's critical for workers to find relevant information fast.

To help, we've long built features into Drive that surface relevant files for you, including one of the latest: Priority in Drive. Priority is located in the upper left of Drive's homepage. It surfaces files and suggests actions you might want to take, and lets you create dedicated workspaces to help you stay focused. The results thus far have been promising: on average, Priority helps people find files twice as fast. When research shows that people spend nearly 20% of their time looking for internal information, that kind of time savings can make a big difference for your business.

How it works

Priority in Drive helps users move quickly thanks to Google's advanced artificial intelligence (AI) and machine learning (ML). The intelligence that makes Priority so helpful is actually powered by your own day-to-day actions. Every time you open a document, edit it, share it, comment on it, rename it, and so on, Drive uses those signals to gauge its relative importance in the scope of all the content you're working on. So if a coworker recently replied to a comment in a document that you've been working on, that file shows up as something you might want to take a look at within Priority.
Priority takes into account things like recency, frequency, collaborators, and more to determine which files you're most likely to be interested in working on at a given moment. It uses these signals not only to surface the most relevant files, but also to intelligently suggest actions—things like replying to a comment, viewing an attachment, granting a contributor access, or approving a request. And in the workspaces section (located underneath suggested files), AI and ML help suggest working sets of files you might want to group together for easier access, based on indicators like similar titles, content, or contributors.

Like the other intelligent features within apps you might use, like Docs or Gmail, Drive only uses these context clues to surface information when you need it. The information isn't used in any other way, or shared with other individuals inside (or outside) of your organization—not even your IT administrators.

Using artificial intelligence to help you focus on what matters

Instead of spending your time searching for the right information in Drive, or even organizing and categorizing your content, you can now spend more time on work that actually matters, like creating new content to inform business decisions. To learn more about how Drive can help you save time and stay focused, check out our website.
  • How IFG and Google Cloud AI bring structure to unstructured financial documents
In the world of banking, commercial lenders often struggle to integrate the accounting systems of financial institutions with those of their clients. This integration allows financial service providers to instantly capture information regarding banking activity, balance sheets, income statements, accounts receivable, and accounts payable reports. Based on these, financial institutions can perform instant analysis, using decision engines to provide qualitative and quantitative provisions for credit limits and approval.

Today's commercial and consumer lending solutions depend on third-party data in order to offer funding opportunities to businesses. These integrations can facilitate tasks like originations, onboarding, underwriting, structuring, servicing, collection, and compliance. However, borrowers are reluctant to grant third parties access to internal data, which creates a barrier to adoption. Hence, clients must often submit unstructured financial documents such as bank statements and audited or interim financial statements via drag-and-drop interfaces on a client portal. Many lenders use OCR or virtual-printer technology in the background to extract data, but the results are still far from consistent. These processes still require manual intervention to achieve acceptable accuracy, which can introduce additional inconsistency and lead to an unsatisfactory outcome.

To address these challenges, the data science team at Interface Financial Group (IFG) turned to Google Cloud. IFG partnered with Google Cloud to develop a better solution using Document Understanding AI, which has become an invaluable tool for processing unstructured invoices. It lets the data science team at IFG build classification tools that capture layout and textual properties for each field of significance, and identify specific fields on an invoice.
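A classifier like this consumes both textual and layout signals for each token. As a simplified, hypothetical sketch (not IFG's actual feature set), the features for one OCR token might combine its text with its normalized position on the page:

```python
def token_features(text, box, page_width, page_height):
    """Combine textual and layout properties of one OCR token.
    `box` is (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    return {
        # Textual properties of the token.
        "is_numeric": text.replace(".", "", 1).replace(",", "").isdigit(),
        "has_currency": any(sym in text for sym in "$€£"),
        "is_upper": text.isupper(),
        "length": len(text),
        # Layout properties: fields like totals or invoice IDs tend to
        # cluster in consistent regions of the page.
        "x_center": (x_min + x_max) / (2 * page_width),
        "y_center": (y_min + y_max) / (2 * page_height),
    }

features = token_features("$1,250.00", (700, 40, 820, 60), 850, 1100)
```

A per-field classifier can then be trained on feature vectors like these to decide whether a token belongs to, say, the invoice total or an address block.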
With Google's tools they can tune feature selection, thresholds, and model comparisons, yielding 99% accuracy in early trials.

Invoice extraction will benefit the fast-growing e-invoicing industry and financiers such as trade finance, asset-based lending, and supply chain finance platforms, connecting buyers and suppliers in a synchronized ecosystem. This environment creates transparency, which is essential for regulators and tax authorities. Ecosystems would benefit from suppliers submitting financial documents in various formats via supplier portals—once the documents are converted and analyzed, the structured output can contribute to the organization's data feed almost instantly. This blog post explains the high-level approach for the document understanding project; you can find more details in the whitepaper.

What the project set out to achieve

IFG's invoice recognition project aims to build a tool that extracts all useful information from invoice scans regardless of their format. Most commercially available invoice recognition tools rely on invoices that have been directly rendered to PDF by software and that match one of a set of predefined templates. In contrast, the IFG project starts with images of invoices that could originate from scans or photographs of paper invoices or be generated directly by software. The machine learning models built into IFG's invoice recognition system recognize, identify, and extract 26 fields of interest.

How IFG built its invoice classification solution

The first step in any invoice recognition project is to collect or acquire images. Because many companies consider their supply chains—and their suppliers' resulting invoices—confidential, and others simply do not see a benefit to maintaining scans of their invoices, IFG found it challenging to locate a large, publicly available repository of invoice images. However, they were able to identify a robust public dataset of line-item data from invoices.
With this data, they were able to synthetically generate a set of 25,011 invoices with different styles, formats, logos, and address formats. From there, they used 20% of the invoices to train their models and validated the models on the remaining 80%. The synthetic dataset only covers a subset of the standard invoices that businesses use today, but because the core of the IFG system uses machine learning instead of templates, it was able to classify new types of invoices, regardless of format. IFG restricted the numbers in its sample set to U.S. standards for grouping, and restricted the addresses in its dataset to portions of the U.S.

The invoice recognition process IFG built consists of several distinct steps and relies on several third-party tools. The first step in processing an invoice is to translate the image into text using optical character recognition (OCR). IFG chose Cloud Document Understanding AI for this step. The APIs output text grouped into phrases, as well as individual words and numbers, each with their bounding boxes.

IFG's collaboration with the Google machine learning APIs team helped contribute to a few essential features in Document Understanding AI, most of which involve processing tabular data. IFG's invoice database thus became a source of data for the API, and should help other customers achieve reliable classification results as well. The ability to identify tables has the potential to solve a variety of issues in identifying data in the details table included in most invoices.

After preprocessing, the data is fed into several different neural networks that were designed and trained using TensorFlow—and IFG also used other, more traditional models in its pipeline via scikit-learn. The machine learning systems used are sequence-to-sequence, naive Bayes, and decision tree algorithms. Each system has its own strengths and weaknesses, and each is used to extract a different subset of the data IFG was interested in.
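As a toy illustration of one such component, here is a tiny character-bigram naive Bayes classifier of the kind that can separate token types (the training data below is invented, and this is far simpler than IFG's production models):

```python
from collections import Counter, defaultdict
import math

def bigrams(token):
    """Character bigrams with start/end markers, e.g. 'St' -> ^s, st, t$."""
    t = f"^{token.lower()}$"
    return [t[i:i + 2] for i in range(len(t) - 1)]

class CharNaiveBayes:
    """Tiny character-level naive Bayes, e.g. to separate street
    abbreviations ('St', 'Ave') from state abbreviations ('CA', 'NY')."""

    def fit(self, samples):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.totals = Counter()             # label -> sample count
        for token, label in samples:
            self.counts[label].update(bigrams(token))
            self.totals[label] += 1
        return self

    def predict(self, token):
        vocab = len({b for c in self.counts.values() for b in c})

        def score(label):
            n = sum(self.counts[label].values())
            s = math.log(self.totals[label] / sum(self.totals.values()))
            for b in bigrams(token):
                # Laplace smoothing so unseen bigrams don't zero out a class.
                s += math.log((self.counts[label][b] + 1) / (n + vocab))
            return s

        return max(self.counts, key=score)

train = [("St", "street"), ("Ave", "street"), ("Blvd", "street"), ("Rd", "street"),
         ("CA", "state"), ("NY", "state"), ("TX", "state"), ("WA", "state")]
model = CharNaiveBayes().fit(train)
```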
Using this ensemble allowed them to achieve higher accuracy than any individual model. Sequence-to-sequence (Seq2Seq) models use a recurrent neural network to map input sequences to output sequences of possibly different lengths. IFG implemented a character-level Seq2Seq model for invoice ID parsing, electing to parse the document at the character level because invoice numbers can be numeric, alphanumeric, or even include punctuation. IFG found that Seq2Seq performs very well at identifying invoice numbers: because invoice numbers can consist of virtually arbitrary sequences of characters, IFG abandoned tokenized input and focused on the text as a character string. When applied to the character stream, the Seq2Seq model matched invoice numbers with approximately 99% accuracy.

Because the Seq2Seq model was unable to distinguish street abbreviations from state abbreviations, IFG added a naive Bayes model to its pipeline. This hybrid model is able to distinguish state abbreviations from street abbreviations with approximately 97% accuracy. The naive Bayes model integrates n-grams to reconstruct the document and place the appropriate features in their appropriate fields at the end of the process. Even when an address is identified, it must still be associated with either the payor or the payee; what precedes the actual address text is of utmost importance here. And because neither the Seq2Seq nor the naive Bayes model was able to use bounding box information to distinguish nearly identical fields such as payor address and payee address, IFG added a decision tree model to its pipeline to tell these two address types apart.

Lastly, IFG used a Pandas data frame to compare the output to the test data, using cross-entropy as a loss function to gauge both accuracy and validity. Accuracy was correlated with the number of epochs used in training.
During testing, IFG found an optimal number of epochs that yields 99% or better element-recognition accuracy on most invoices.

Conclusion

Document Understanding AI performs exceptionally well when capturing raw data from an image. The collaboration between IFG and Google Cloud allowed the team to focus on training a high-accuracy machine learning model that processes a variety of business documents. Additionally, the team leaned on several industry-standard NLP libraries to help parse and clean the output of the APIs for use in the trained models. In the process, IFG found that sequence-to-sequence techniques provided enough flexibility to solve the document classification problem for a number of different markets. The full technical details are available in this whitepaper.

Going forward, IFG plans to take advantage of the growing number of capabilities in Document Understanding AI—as well as its growing training set—to properly process tabular data. Once all necessary fields are recognized and captured to an acceptable level of accuracy, IFG will extend the invoice recognition project to other types of financial documents. IFG ultimately expects to be able to process any sort of structured or unstructured financial document from an image into a data feed with enough accuracy to eliminate the need for constant human intervention in the process. You can find more details about Document Understanding AI here.

Acknowledgements: Ross Biro, Chief Technology Officer, and Michael Cave, Senior Data Scientist, The Interface Financial Group, drove implementation for IFG. Shengyang Dai, Engineering Manager, Vision API, Google Cloud, provided guidance throughout the project.
  • Forseti intelligent agents: an open-source anomaly detection module
Among security professionals, one way to identify a breach or a spurious entity is to detect anomalies and abnormalities in customers' usage trends. At Google, we use Forseti, a community-driven collection of open-source tools to improve the security of Google Cloud Platform (GCP) environments. Recently, we launched the "Forseti Intelligent Agents" initiative to identify anomalies, enable systems to take advantage of common usage patterns, and identify other outlier data points. In this way, we hope to help security specialists for whom it's otherwise cumbersome and time-consuming to manually flag these data points.

Anomaly detection is a classic solution implemented across multiple business domains. We tested several machine learning (ML) techniques for use in anomaly detection, analyzing existing data that had been used to create firewall rules and identifying outliers. The approach, the results of which you can find in this whitepaper, was experimental and based on static analysis. At a high level, our goal is to use Forseti inventory data to achieve the following:

- Detect unusual instances between snapshots.
- Alert users to unusual firewall rules, providing comparisons with expected behaviors.
- Provide potential remediation steps.

Below is our solution. Note that it uses static data for now, but we can adapt it to use dynamic data if needed.

The Forseti intelligent agents workflow

To build this solution, we took a multi-phase approach that imported firewall data into a BigQuery table, prepared and manipulated the data, then generated and evaluated a model. At the same time, we built "feature-level decision stumps" (i.e., decision trees built after considering one feature as the label and all the rest as regular features) and performed bucketing and sample detection. Figure 1 is a high-level depiction of our initial workflow. For pre-processing, we experimented with approaches such as penalizing subnets with a wider range.
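One simple pre-processing idea along these lines can be sketched by deriving a numeric "openness" penalty from each rule's source ranges using the CIDR prefix length. This is our own illustrative sketch, not the exact scoring used in the study, and the rule fields are simplified:

```python
import ipaddress

def range_penalty(cidr):
    """Penalize wider source ranges: a /0 (everything) scores 1.0,
    a /32 (single host) scores 0.0."""
    net = ipaddress.ip_network(cidr)
    return 1.0 - net.prefixlen / net.max_prefixlen

def rule_features(rule):
    """Flatten one (simplified) firewall rule into numeric features
    that a clustering or outlier-detection model can consume."""
    return {
        "source_penalty": max(range_penalty(r) for r in rule["source_ranges"]),
        "allows_all_ports": any(p == "all" for p in rule["ports"]),
        "is_ingress": rule["direction"] == "INGRESS",
    }

risky = rule_features({
    "source_ranges": ["0.0.0.0/0"],
    "ports": ["all"],
    "direction": "INGRESS",
})
# risky["source_penalty"] == 1.0: wide-open ingress stands out immediately.
```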
We also looked at supernets, an example of which is depicted below, along with some of the flattened firewall rules that we used to train the model. For unsupervised learning, we experimented with techniques including k-means clustering, decision stumps, and visualization in low-dimensional space, examining the feature weights for both principal components. Based on these results, we looked at a typical organization with thousands of firewall rules and, examining the resulting points and clusters, found several anomalies (model output has been anonymized for privacy and security).

We conducted these experiments with firewall rules to prototype different approaches; you can read about these approaches in detail in the whitepaper. A next step to follow up on this framework would be to use semi-supervised learning. Using some of the data points that our models can confidently flag as anomalous would also help in generating annotated data for more detailed analysis. Since we only used firewall rules in this initial study, we plan next to use other features, such as the hierarchical location of the firewall rules and network-related metadata.

If you're interested in contributing to the Forseti intelligent agents initiative, you can play around with any sample inventory data (or even your own), helping us generate broader anomaly detection mechanisms. By enlisting the community's help with intelligent agents, we hope to continue to expand the Forseti toolset to help ensure the security of your cloud environment. For more details about this initiative, check out the solution here.

Joe Cheuk, Cloud Application Engineer; Praneet Dutta, Cloud Machine Learning Engineer; and Nitin Aggarwal, Technical Program Manager, Cloud Machine Learning, contributed to this report.
  • Topping the tower: the Obstacle Tower Challenge AI Contest with Unity and Google Cloud
Ever since John McCarthy and his collaborators coined the term "artificial intelligence" in 1956, games have served as both a training ground and a benchmark for AI research. At the same time, in many cultures around the world, the ability to play games such as chess or Go has long been considered one of the hallmarks of human intelligence. And when computer science researchers started thinking about building systems that mimic human behavior, games emerged as a natural "playground" environment.

Over the last decade, deep learning has driven a resurgence in AI research, and games have returned to the spotlight. Perhaps most significantly, starting in 2015, AlphaGo, an autonomous Go bot built by DeepMind (an Alphabet subsidiary), emerged as the best player in the world at the traditional board game. Since then, the DeepMind team has built bots that challenge top competitors at a variety of other games, including StarCraft.

The competition

As games have become a prominent arena for AI, Google Cloud and Unity Technologies decided to collaborate on a game-focused AI competition: the Obstacle Tower Challenge, in which competitors create advanced AI agents in a game environment. The agents they create are AI programs that take as input the image data of the simulation, including obstacles, walls, and the main character's avatar, and output the next action the character takes in order to solve a puzzle or advance to the next level. The Unity engine runs the logic and graphics for the environment, which operates very much like a video game.

Unity launched the first iteration of the Obstacle Tower Challenge in February, and the reception from the AI research community has been very positive. The competition has received more than 2,000 entries from several hundred teams around the world, including both established research institutions and collegiate student teams.
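The observe-act loop that such agents implement can be sketched with a stub environment. This is a schematic illustration of the interface only, not the actual Obstacle Tower or Unity ML-Agents API; the frame size, rewards, and episode length here are invented:

```python
import random

class StubTowerEnv:
    """Minimal stand-in for a game environment: returns fake 'image'
    observations and a reward, and ends the episode after a few steps."""

    def __init__(self, steps=10):
        self.steps, self.t = steps, 0

    def reset(self):
        self.t = 0
        return [[0.0] * 84 for _ in range(84)]  # fake 84x84 frame

    def step(self, action):
        self.t += 1
        obs = [[random.random()] * 84 for _ in range(84)]
        reward = 1.0 if action == 0 else 0.0  # pretend action 0 makes progress
        done = self.t >= self.steps
        return obs, reward, done

def run_episode(env, policy):
    """Classic observe-act loop: the policy maps a frame to the next action."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

# A random policy; a competitor would substitute a trained model here.
score = run_episode(StubTowerEnv(), policy=lambda obs: random.randint(0, 3))
```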
The top batch of competitors, the highest-scoring 50 teams, will receive an award sponsored by Google Cloud and advance to the second round. Completing the first round was a significant milestone, since teams had to overcome a fairly difficult hurdle: advancing past several levels of increasing difficulty in the challenge. None of these levels were available to the researchers or their agents during training, so the agents had to learn complex behavior and generalize it to handle previously unseen situations.

The contest's second round features a set of additional levels. These new three-dimensional environments incorporate brand-new puzzles and graphical elements that force contestant research teams to develop more sophisticated machine learning models. New obstacles may stymie many of the agents that passed the levels from the first phase.

How Google Cloud can help

Developing complex game agents is a computationally demanding task, which is why we hope the availability of Cloud credits will help participating teams. Google Cloud offers the same infrastructure that trained AlphaGo's world-class machine learning models to any developer around the world. In particular, we recently announced the availability of Cloud TPU Pods; for more information, you can read this blog post.

All of us at Google Cloud AI would like to congratulate the first batch of successful contestants of the Unity AI challenge, and we wish them the best of luck as they enter the second phase. We are excited to learn from the winning strategies.
  • Sunny spells: How SunPower puts solar on your roof with AI Platform
Editor's note: Today's post comes from Nour Daouk, Product Manager at SunPower. She describes how SunPower uses AI Platform to provide users with useful models and proposals of solar panel layouts for their homes, with only a street address as user input.

Have you ever wondered what solar panels would look like on your roof? At SunPower, we're helping homeowners create solar designs from the comfort of their homes. Specifically, we use deep learning and high-resolution imagery as inputs to models that design and visualize solar power systems on residential roofs. Read on to learn how and why we built this technology for our customers, called SunPower Instant Design.

Homeowners typically spend a significant amount of time online researching solar panels and running calculations to understand their potential savings and the number of panels they need for their home. There are no quick answers, because every roof is different and every house requires a customized design. With SunPower Instant Design, homeowners can create their own designs in seconds, which improves their buying experience, reduces barriers to going solar, and ultimately increases solar adoption.

Instant Design's 3D model of a roof with obstructions in red (left), satellite image with panel layout (middle), and input satellite image (right)

How we help

Designing a solar power system for a home is a process that relies on factors unique to each home. First, we model the roof in three dimensions to account for obstructions such as chimneys and vents. Second, we lay out legally mandated access walkways and place solar panels on the roof segments. Finally, we model the angle and exposure of sunlight hitting the roof to calculate the system's potential energy production. With Instant Design, we replicate this same process by leveraging tools including machine learning and optimization.
Below, we'll explain how we used deep neural networks to obtain accurate three-dimensional models of residential roofs.

The data: guiding the design with both color and depth imagery

It is probably possible to design a three-dimensional model of a roof from satellite imagery alone, but design accuracy improves greatly with the use of a height map. For Instant Design, we partnered with Google Project Sunroof for access to both satellite and digital surface model (DSM) data. We used our database of manually generated designs as a base for our labeled data, and projected those designs onto the RGB and depth channels for the training, validation, and test sets. We also generated augmentations—including rotations and translations—to reduce overfitting.

Roof segmentation

To reconstruct a roof, we model each roof segment with its corresponding pitch and azimuth in three dimensions. We began identifying roof segments by applying image processing and edge detection on both the satellite and depth data, but we quickly realized that semantic segmentation would yield much better results, as similar edges had been detected successfully with that method in the research literature.

Image processing result (left), neural network-based result (middle), and input satellite image (right)

After some experimentation, we chose to perform semantic segmentation, and selected a version of a U-net that works well with our type of imagery at high speeds. The U-net architecture was a solid starting point, with a few tweaks for better results. For instance, we added batch normalization to each convolutional layer for regularization and selected the Wide Residual Network as our encoder for improved accuracy.
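Batch normalization, mentioned above, standardizes each channel of a feature map across the batch before applying a learned scale and shift. A stripped-down NumPy version of the training-time computation (in a real network, gamma and beta would be learned parameters rather than constants):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of feature maps per channel.
    x has shape (batch, height, width, channels)."""
    # Statistics are computed over batch and spatial axes, per channel.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learned scale and shift restore the layer's expressive power.
    return gamma * x_hat + beta

x = np.random.rand(8, 16, 16, 3) * 50 + 100  # activations with large mean/scale
y = batch_norm(x)
# Each channel of y now has roughly zero mean and unit variance.
```

Keeping activations in this well-scaled range is what makes the technique act as a regularizer and stabilizer during training.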
    We also created a domain-specific loss function to get the model to converge to meaningful outcomes.

    U-net diagram (click for source)

    What gets in the way: chimneys, vents, pipes, and skylights

    To avoid mistakenly placing panels on obstructions such as chimneys, vents, pipes, skylights, and previously-installed panels, our next step is to detect those obstructions as separate items on the roof. Our main challenge here was that we had to handle both the quantity and size of the obstructions, and address any imbalance in class representation. Indeed, there are far more roof pixels than obstruction pixels in our images. Due to the difference in shape and scale of the chosen classes, we decided to use a model separate from the segmentation model to detect obstructions, although both models are similar in structure.

    Roof with detected obstructions outlined in red

    Speed and scale via Cloud AI Platform

    Once we had built a satisfactory proof of concept, we quickly realized that we would need to iterate on our model in order to deliver an experience that was ready for homeowners. We needed to build a development pipeline that could quickly bring modeling ideas from conception to deployment, so we chose AI Platform to help us scale. Our initial training setup was on our own servers, and the training process was slow: training a new model took a week. In contrast, on AI Platform, we were able to train and test a new model in a single day. Moreover, we took full advantage of the ability to train multiple models simultaneously to conduct a vast hyperparameter search. For prediction, we used NVIDIA V100 GPU-enabled virtual machines on GCP with nvidia-docker, which helped us achieve prediction times of around one second.

    Conclusion

    SunPower empowers homeowners to understand the amount of energy they can generate with solar, now with just a few clicks. Our team was able to start work on this exciting project due to advances in aerial imagery and machine learning.
    And AI Platform helped us focus on the core design problem, achieve our goals faster, and create designs quickly.

    We are changing how we offer solar power to homeowners by giving them immediate answers to their questions. While we have more work to do, we are optimistic that SunPower Instant Design will transform the solar industry when our first product featuring this technology launches this summer.

    To learn more about how SunPower is using the cloud, read this blog post from Google Cloud CEO Thomas Kurian.
  • No deep learning experience needed: build a text classification model with Google Cloud AutoML Natural Language
    Modern organizations process greater volumes of text than ever before. Although certain tasks like legal annotation must be performed by experienced professionals with years of domain expertise, other processes require simpler types of sorting, processing, and analysis, with which machine learning can often lend a helping hand.

    Categorizing text content is a common machine learning task, typically called “content classification,” and it has all kinds of applications, from analyzing sentiment in a review of a consumer product on a retail site, to routing customer service inquiries to the right support agent. AutoML Natural Language helps developers and data scientists build custom content classification models without coding. Google Cloud’s Natural Language API helps you classify input text into a set of predefined categories. If those categories work for you, the API is a great place to start, but if you need custom categories, then building a model with AutoML Natural Language is very likely your best option.

    In this blog post, we'll guide you through the entire process of using AutoML Natural Language. We'll use the 20 Newsgroups dataset, which consists of about 20,000 posts, roughly evenly divided across 20 different newsgroups, and is frequently used for content classification and clustering tasks. As you'll see, this can be a fun and tricky exercise, since the posts typically use casual language and don't always stay on topic. Also, some of the newsgroups that we’ll use from the dataset overlap quite a bit; for example, two separate groups cover PC and Mac hardware.

    Preparing your data

    Let's start by downloading the data. I've included a link to a Jupyter notebook that will download the raw dataset and then transform it into the CSV format expected by AutoML Natural Language. AutoML Natural Language looks for the text itself or a URL in the first column, and the label in the second column.
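The two-column layout just described is easy to produce with a few lines of Python. The sample posts and labels below are invented for illustration; the notebook mentioned above does the equivalent for the full 20 Newsgroups dataset.

```python
import csv

# Hypothetical (post_text, newsgroup_label) pairs standing in for
# the real 20 Newsgroups data downloaded by the notebook.
samples = [
    ("The new video card flickers at startup", "comp.sys.mac.hardware"),
    ("Trade deadline rumors for the pennant race", "rec.sport.baseball"),
    ("How do I set the IDE jumpers on this 486 board?", "comp.sys.ibm.pc.hardware"),
]

with open("twenty_newsgroups.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for text, label in samples:
        # Column 1: the text itself (or a Cloud Storage URL); column 2: the label.
        writer.writerow([text, label])

with open("twenty_newsgroups.csv", newline="") as f:
    rows = list(csv.reader(f))
print(len(rows))  # 3
```

The resulting CSV is what you upload when creating the dataset in the next step.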
    In our example, we're assigning one label to each sample, but AutoML Natural Language also supports multiple labels. To download the data, you can simply run the notebook in the hosted Google Colab environment, or you can find the source code on GitHub.

    Importing your data

    We are now ready to access the AutoML Natural Language UI. Start by creating a new dataset: click the New Dataset button, create a name like twenty_newsgroups, and upload the CSV you downloaded in the earlier step.

    Training your model

    It will take several minutes for the endpoint to import your training text. Once complete, you'll see a list of the text items and each accompanying label. You can drill down into the text items for specific labels on the left side. After you’ve loaded your data successfully, you can move on to the next stage: training your model. It will take several hours to return the optimal model, and you’ll receive notification emails about the status of the training.

    Evaluating your model

    When model training is complete, you'll see a dashboard that displays a number of metrics. AutoML Natural Language generates these metrics by comparing predictions against the actual labels in the test set. If these metrics are new to you, I'd recommend reading more about them in the Google Machine Learning Crash Course. In short, recall represents how well the model found instances of the correct label (minimizing false negatives). Precision represents how well it did at avoiding labeling instances incorrectly (minimizing false positives).

    The precision and recall metrics from this example are based on a score threshold of 0.5. You can try adjusting this threshold to see how it impacts your metrics, and you'll see that there is a tradeoff between precision and recall. If the confidence required to apply a label rises from 0.5 to 0.9, for example, precision will go up because your model will be less likely to mislabel a sample.
    On the other hand, recall will go down, because samples with scores between 0.5 and 0.9 that were previously labeled will no longer be labeled.

    Just below these metrics, you’ll find a confusion matrix. This tool can help you more precisely evaluate the model’s accuracy at the label level. You'll see not only how often the model identified each label correctly, but also which labels it mistakenly identified, and you can drill down to see specific examples of false positives and negatives. This can prove to be very useful information, because it tells you whether you need to add more training data to help your model better differentiate between labels that it frequently failed to predict.

    Prediction

    Let's have some fun and try the model on some example text. By moving to the Predict tab, you can paste or type some text and see how your newly trained model labels it. Let's start with an easy example: I'll take the first paragraph of a Google article about automotive trends and paste it in. Woohoo! 100% accuracy. You can try some more examples yourself, entering text that might be a little tougher for the model to distinguish. At the bottom, you'll also see how to invoke a prediction using the API; for more details, the documentation provides examples in Python, Java, and Node.js.

    Conclusion

    Once you’ve created a custom model that organizes content into categories, you can use AutoML Natural Language’s robust evaluation tools to assess your model's accuracy. These will help you refine your threshold and potentially add more data to shore up any weaknesses. Try it out for yourself!
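The precision/recall tradeoff discussed above can be demonstrated with a small, self-contained calculation. The confidence scores and ground-truth labels below are made up for illustration, not AutoML output:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when a label is applied only at or above threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative model confidences; True means the label really applies.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30]
labels = [True, True, False, True, True, False, True, False]

p_low, r_low = precision_recall(scores, labels, 0.5)
p_high, r_high = precision_recall(scores, labels, 0.9)
print(p_low, r_low)    # at the default 0.5 threshold
print(p_high, r_high)  # raising the bar to 0.9 trades recall for precision
```

With these toy numbers, moving the threshold from 0.5 to 0.9 pushes precision from about 0.67 up to 1.0 while recall falls from 0.8 to 0.4, which is exactly the tradeoff the dashboard lets you explore interactively.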
  • Google’s scalable supercomputers for machine learning, Cloud TPU Pods, are now publicly available in beta
    To accelerate the largest-scale machine learning (ML) applications deployed today and enable rapid development of the ML applications of tomorrow, Google created custom silicon chips called Tensor Processing Units (TPUs). When assembled into multi-rack ML supercomputers called Cloud TPU Pods, these TPUs can complete ML workloads in minutes or hours that previously took days or weeks on other systems. Today, for the first time, Google Cloud TPU v2 Pods and Cloud TPU v3 Pods are publicly available in beta to help ML researchers, engineers, and data scientists iterate faster and train more capable machine learning models.

    A full Cloud TPU v3 Pod

    Delivering business value

    Google Cloud is committed to providing a full spectrum of ML accelerators, including both Cloud GPUs and Cloud TPUs. Cloud TPUs offer highly competitive performance and cost, often training cutting-edge deep learning models faster while delivering significant savings. If your ML team is building complex models and training on large data sets, we recommend that you evaluate Cloud TPUs whenever you require:

    Shorter time to insights: iterate faster while training large ML models
    Higher accuracy: train more accurate models using larger datasets (millions of labeled examples; terabytes or petabytes of data)
    Frequent model updates: retrain a model daily or weekly as new data comes in
    Rapid prototyping: start quickly with our optimized, open-source reference models in image segmentation, object detection, language processing, and other major application domains

    While some custom silicon chips can only perform a single function, TPUs are fully programmable, which means that Cloud TPU Pods can accelerate a wide range of state-of-the-art ML workloads, including many of the most popular deep learning models.
    For example, a Cloud TPU v3 Pod can train ResNet-50 (image classification) from scratch on the ImageNet dataset in just two minutes, or BERT (NLP) in just 76 minutes.

    Cloud TPU customers see significant speed-ups in workloads spanning visual product search, financial modeling, energy production, and other areas. In a recent case study, Recursion Pharmaceuticals iteratively tested the viability of synthesized molecules to treat rare illnesses. What took over 24 hours to train on their on-prem cluster completed in only 15 minutes on a Cloud TPU Pod.

    What’s in a Cloud TPU Pod

    A single Cloud TPU Pod can include more than 1,000 individual TPU chips, which are connected by an ultra-fast, two-dimensional toroidal mesh network, as illustrated below. The TPU software stack uses this mesh network to enable many racks of machines to be programmed as a single, giant ML supercomputer via a variety of flexible, high-level APIs.

    2D toroidal mesh network

    The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each one delivers more than 100 petaFLOPs of computing power.
    In terms of raw mathematical operations per second, a Cloud TPU v3 Pod is comparable to a top-5 supercomputer worldwide (though it operates at lower numerical precision).

    It’s also possible to use smaller sections of Cloud TPU Pods, called “slices.” We often see ML teams develop their initial models on individual Cloud TPU devices (which are generally available) and then expand to progressively larger Cloud TPU Pod slices via both data parallelism and model parallelism to achieve greater training speed and model scale. You can learn more about the underlying architecture of TPUs in this blog post or this interactive website, and you can learn more about individual Cloud TPU devices and Cloud TPU Pod slices here.

    Getting started

    It’s easy and fun to try out a Cloud TPU in your browser right now via this interactive Colab, which enables you to apply a pre-trained Mask R-CNN image segmentation model to an image of your choice. You can learn more about image segmentation on Cloud TPUs in this recent blog post. Next, we recommend working through our Cloud TPU Quickstart and then experimenting with one of the optimized, open-source Cloud TPU reference models listed below. We carefully optimized these models to save you time and effort, and they demonstrate a variety of Cloud TPU best practices.
    Benchmarking one of our official reference models on a public dataset on larger and larger pod slices is a great way to get a sense of Cloud TPU performance at scale.

    Image classification: ResNet (tutorial, code, blog post); AmoebaNet-D (tutorial, code); Inception (tutorial, code)
    Mobile image classification: MnasNet (tutorial, code, blog post); MobileNet (code)
    Object detection: RetinaNet (tutorial, code, blog post); TensorFlow Object Detection API (blog post, tutorial)
    Image segmentation: Mask R-CNN (tutorial, code, blog post, interactive Colab); DeepLab (tutorial, code, blog post, interactive Colab)
    Natural language processing: BERT (code, interactive Colab); Transformer (tutorial, Tensor2Tensor docs); Mesh TensorFlow (paper, code); QANet (code); Transformer-XL (code)
    Speech recognition: ASR Transformer (tutorial); Lingvo (code)
    Generative adversarial networks: Compare GAN library, including a reimplementation of BigGAN (blog post, paper, code); DCGAN (code)

    After you work with one of the above reference models on Cloud TPU, our performance guide, profiling tools guide, and troubleshooting guide can give you in-depth technical information to help you create and optimize machine learning models on your own using high-level TensorFlow APIs. Once you’re ready to request a Cloud TPU Pod or Cloud TPU Pod slices to accelerate your own ML workloads, please contact a Google Cloud sales representative.
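The data parallelism mentioned above rests on a simple identity: when a batch is split into equal-sized shards (one per core), averaging the per-shard gradients reproduces the full-batch gradient. Here is a minimal numpy sketch of that identity for a toy linear model; it is a conceptual illustration, not actual TPU code.

```python
import numpy as np

def grad(X, y, w):
    """Mean-squared-error gradient for the linear model y_pred = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = rng.normal(size=64)
w = rng.normal(size=4)

# (a) Gradient over the full batch of 64 examples.
full = grad(X, y, w)

# (b) Split the batch across 8 hypothetical cores and average shard gradients.
num_cores = 8
shards = zip(np.split(X, num_cores), np.split(y, num_cores))
averaged = np.mean([grad(Xs, ys, w) for Xs, ys in shards], axis=0)

print(np.allclose(full, averaged))  # True
```

This is why scaling from a single device to a larger pod slice can preserve the training mathematics while multiplying throughput; model parallelism, by contrast, splits the model itself across cores.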
  • Improving data quality for machine learning and analytics with Cloud Dataprep
    Editor’s note: Today’s post comes to us from Bertrand Cariou at Trifacta, and presents some steps you might take in Cloud Dataprep to clean your data for later use in your analytics or in training a machine learning model.

    Data quality is a critical component of any analytics and machine learning initiative, and unless you’re working with pristine, highly-controlled data, you’ll likely face data quality issues. To illustrate the process of turning unknown, inconsistent data into trustworthy assets, we will use the example of a forecast analyst in the retail (consumer packaged goods) industry. Forecast analysts must be extremely accurate in planning the right quantities to order. Supplying too much product results in wasted resources, whereas supplying too little means they risk losing profit. On top of that, an empty shelf also risks consumers choosing a competitor’s product, which can have a harmful, long-term impact on the brand.

    To strike the right balance between appropriate product stocking levels and razor-thin margins, forecast analysts must continually refine their analysis and predictions, leveraging their own internal data as well as third-party data, over which they have no control. Every business partner, including suppliers, distributors, warehouses, and other retail stores, may provide data (e.g., inventory, forecasts, promotions, or past transactions) in various shapes and levels of quality. One company may use pallets instead of boxes as a unit of storage, or pounds versus kilograms; may have different category nomenclatures and naming conventions; may use a different date format; or will most likely have product SKUs that are a combination of internal and other supplier IDs. Furthermore, some data may be missing or may have been incorrectly entered. Each of these data issues represents an important risk to reliable forecasting.
    Forecast analysts must absolutely clean, standardize, and gain trust in the data before they can report and model on it accurately. This post reviews key techniques for cleaning data with Cloud Dataprep and covers new features that may help improve your data quality with minimal effort.

    Basic concepts

    Cleaning data with Cloud Dataprep corresponds to a three-step iterative process:

    1. Assessing your data quality
    2. Resolving or remediating any issues uncovered
    3. Validating cleaned data, at scale

    Cloud Dataprep constantly profiles the data you’re working on, from the moment you open the grid interface and start preparing data. With Dataprep’s real-time Active Profiling, you can see the impact of each data cleaning step on your data. The profile result is summarized at the column header with basic data points that call out key characteristics of your data, in the form of an interactive visual profile. By clicking one of these profile column header bars, Cloud Dataprep suggests some transformations to remediate mismatched or missing values. You can always try a transformation, preview its impact, and select or tweak it. At any point, you can revert to a specific previous step if you don’t like the result. With these basic concepts in mind, let’s cover Cloud Dataprep’s data quality capabilities.

    1. Assessing your data quality

    As soon as you open a dataset in the grid interface, you can access data quality signals that help you assess data issues and guide your work in cleaning the data.

    Rapid profiling

    You’ll likely scan over your column headers to identify potential quality issues. Mismatched values (red bar) based on the inferred data types, missing values (black), and uneven value distribution (bars) can help you quickly identify which columns need your attention. In this particular case, our forecast analyst knows she’ll have to drill down on the `material` field, which includes some mismatched and missing values.
    How should these data defects impact her forecast and replenishment models?

    Intermediary data profiling

    If you click on a column header, you’ll see some extra statistics in the right panel of Dataprep. This is particularly useful if you expect a specific format standard for a field and want to identify the values that don’t comply with the standard. In the example below, you can see that Cloud Dataprep discovered three different format patterns for order_date. You might have follow-up questions: can empty order dates be leveraged in the forecast? Can mismatched dates be corrected, and how?

    Advanced profiling

    If you click “Show more”, or click the column header menu and then “Column details” in the main grid, you’ll land on a comprehensive data profiling page with details about mismatched values, value distribution, and outliers. You can also navigate to the pattern tab to explore the data structure within a specific column.

    These three data profiling capabilities are dynamic by nature, in the sense that Cloud Dataprep reprofiles the data in real time at each step of a transformation, so it always presents you with the latest information. This helps you clean your data faster and more effectively. The value for the forecast analyst is that she can validate immediately as she goes through the process of cleaning and transforming the data, so that it fits the format she expects for her downstream modeling and reporting.

    2. Resolving data quality issues

    Dynamic profiling helps you assess the data quality at hand, and it is also the point of entry to start cleaning the data. Graph profiles are interactive and offer transformation suggestions as soon as you interact with them.
    For example, clicking the missing-value space in the column header displays transformation suggestions such as deleting the values or setting them to a default.

    Resolving incorrect patterns

    You can efficiently resolve incorrect patterns in a column (such as the recurrent date formatting issue in the order_date column) by accessing the pattern tab in the column details screen. Cloud Dataprep shows you the most frequent patterns. Once you select a target conversion format, Cloud Dataprep displays some transformation suggestions in the right panel to convert all the data to fit the selected pattern. Watch the animation below, and try it for yourself.

    Highlighting data content

    Another interactive way to clean your data is to highlight some portion of a value in a cell. Cloud Dataprep will suggest a set of transformations based on your selection, and you can refine the selection by highlighting additional content from another cell. Here is an example that extracts the month from the order date in order to calculate the volume per month.

    Format, replace, conditional functions, and more

    You can find most of the functions you’ll use to clean up data in the Column menu, in the format or replace sections, or in the conditional formulas in the icon bar as shown below. These can be useful to convert all product or category names to uppercase, or to trim names that often have quotes after import from a CSV or Excel file.

    Format functions

    Extract functions

    The extract functions can be particularly useful to extract a subset of a value within a column. For example, you may want to extract each individual component from the product_id “Item: ACME_66979905111536979300 - PASTA RONI FETTUCINE ALFR” by splitting it on the “ - ” value.

    Conditional functions

    Conditional functions are useful for tagging values that are out of scope.
    For example, you can write a formula that will tag records when a quantity is over 10,000, which wouldn’t be valid for the order sizes you typically encounter. If none of the visual suggestions gives you what you require for cleaning your data, you can always edit a suggestion or manually add a new step in a Dataprep recipe: type what you want to do in the search box, and Cloud Dataprep will suggest some transformations you can then edit and apply to the dataset.

    Standardization

    Standardizing values is a way to group similar values into a single, consistent format. This problem is especially prevalent with free-form entries like products, product categories, and company names. You can access the standardization feature from the Column menu. Additionally, Cloud Dataprep can group similar values together by string similarity or by pronunciation.

    Tip: You can mix and match standardization algorithms. Some values may be standardized using spelling, while others are more sensibly standardized based on international pronunciation standards.

    3. Validation at scale

    The last, critical step of a typical data quality workflow in Cloud Dataprep is to validate, at scale, that not a single data quality issue remains in the dataset.

    Leveraging sampling to clean data

    Sometimes, the full volume of a dataset won’t fit into Cloud Dataprep via your browser tab (especially when leveraging BigQuery tables with hundreds of millions of records or more). In that case, Cloud Dataprep automatically samples the data from BigQuery to fit it in your local computer’s memory. That might lead you to question: how can you ensure you’ve standardized all the data from one column (e.g., product name, category, region, etc.)
    or cleaned all the date formats in another? You can adjust your sampling settings by clicking the sampling icon at the top right and choosing the sampling technique that fits your requirements:

    Select anomaly-based to keep all the data mismatched or missing for one or multiple columns
    Select stratified to retrieve every distinct value for a particular column (particularly useful for standardization)
    Select filter-based to retrieve all the data based on a particular formula (e.g., format does not match dd/mm/yyyy)

    Profiling the data at scale

    At this point, hopefully you’re confident that your recipe will produce a clean dataset, but until you run it at scale across the whole dataset, you can’t ensure all your data is valid. To do so, click the ‘Run Job’ button and check that Profile Results is enabled. If the job results still show some red, it most likely means you need to adjust your data quality rules and try again.

    Scheduling

    To ensure that the data quality rules you create are applied on a recurring basis, schedule your recipes to run automatically. In the case of forecasting, data may change on a weekly basis, so users must run the job every week to validate that all the profile results stay green over time. If not, you can simply reopen and adapt the recipe to address the new data inconsistencies you discovered. In the flow view, select Schedule Flow to define the parameters to run the job on a recurring basis.

    Conclusion

    Our example here is retail-specific, but regardless of your area of expertise or industry, you may encounter similar data issues. By following this process and leveraging Cloud Dataprep, you can become more effective and faster at cleaning up your data for analytics or feature engineering. We hope that by using Cloud Dataprep, the toil of cleaning up your data and improving your data quality is, well, not so messy.
    If you’re ready to start, log in to Dataprep via the Google Cloud Console to start using this three-step data quality workflow on your data.
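The kind of pattern profiling described in the “Intermediary data profiling” section above can be sketched outside of Dataprep in a few lines of Python. This is not Dataprep's implementation, just a rough illustration of how mixed date formats in an order_date column get surfaced; the sample values are invented.

```python
import re
from collections import Counter

# Invented order_date values exhibiting three different format patterns.
order_dates = ["2019-03-12", "12/03/2019", "03-12-19", "2019-03-14", "14/03/2019", ""]

def pattern_of(value):
    """Abstract a value into a shape pattern: digits -> 9, letters -> A."""
    if not value:
        return "<missing>"
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

profile = Counter(pattern_of(v) for v in order_dates)
print(profile.most_common())
# e.g. [('9999-99-99', 2), ('99/99/9999', 2), ('99-99-99', 1), ('<missing>', 1)]
```

A cleanup step would then pick one target pattern (say, 9999-99-99) and convert every value whose pattern differs, which is what Dataprep's pattern-tab suggestions automate.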
  • Empower your AI Platform-trained serverless endpoints with machine learning on Google Cloud Functions
    Editor’s note: Today’s post comes from Hannes Hapke at Caravel. Hannes describes how Cloud Functions can accelerate the process of hosting machine learning models in production for conversational AI, based on serverless infrastructure.

    At Caravel, we build conversational AI for digital retail clients, work that relies heavily on Google Cloud Functions. Our clients experience website demand fluctuations that vary by the day of the week or even by time of day. Because of the constant change in customer requests, Google Cloud Platform’s serverless endpoints help us handle fluctuating demand for our service. Unfortunately, serverless functions are limited in available memory and CPU cycles, which makes them an odd place to deploy machine learning models. However, Cloud Functions offer a tremendous ease of deploying API endpoints, so we decided to integrate machine learning models without deploying them to the endpoints directly.

    If your organization is interested in using serverless functions to help address its business problems, but you are unsure how you can use your machine learning models with your serverless endpoints, read on. We’ll explain how our team used Google Cloud Platform to deploy machine learning models on serverless endpoints. We’ll focus on our preferred Python solution and outline some ways you can optimize your integration. If you would prefer to build out a Node.js implementation, check out “Simplifying ML Prediction with Google Cloud Functions.”

    Architecture overview

    Figure 1: System architecture diagram.

    First, let’s start with the architecture. As shown in Figure 1, this example consists of three major components: a static page accessible to the user, a serverless endpoint that handles all user requests, and a model instance running on AI Platform.
    While other articles suggest loading the machine learning model directly onto the serverless endpoint for online predictions, we found that approach to have a few downsides:

    Loading the model will increase your serverless function’s memory footprint, which can accrue unnecessary expenses.
    The machine learning model has to be deployed with the serverless function code, meaning the model can’t be updated independently of a code deployment.

    For the sake of simplicity, we’re hosting the model for this example on an AI Platform serving instance, but we could also run our own TensorFlow Serving instance.

    Model setup

    Before we describe how you might run your inference workload from a serverless endpoint, let’s quickly set up the model instance on Cloud AI Platform.

    1. Upload the latest exported model to a Cloud Storage bucket. We exported our model from TensorFlow’s Keras API. Create a bucket for your models and upload the latest trained model into its own folder.

    2. Head over to AI Platform from the Console and register a new model.

    Set up a new model on AI Platform.

    3. After registering the model, set up a new model version, probably your V1. To start the setup steps, click ‘Create version.’ Note: Under Model URI, link to the Cloud Storage bucket where you saved the exported model. You can choose between different ML frameworks; in our case, our model is based on TensorFlow 1.13.1. For our demo, we disable model autoscaling.

    Once the creation of the instance is complete and the model is ready to serve, you’ll see a green icon next to the model’s version name.

    Inferring a prediction from a serverless endpoint

    Inferring a prediction with Python is fairly straightforward. You need to generate a payload that you would like to submit to the model endpoint, and then you submit it to that endpoint.
    We’ll cover the generation of the payload in the following sections, but for now, let’s focus on inferring an arbitrary payload.

    Google provides a Python library, google-api-python-client, that allows you to access its products through a generic API interface. You can install it with pip. Once installed, you need to “discover” your desired service; in our case, the service name is ml. However, you aren’t limited to just the prediction functionality: depending on your permissions (more on that later), you can access various API services of AI Platform. You can then execute any API request you have created. If you don’t encounter any errors, the response should contain the model’s response: its prediction.

    Permissions

    Cloud Functions on Google Cloud Platform execute all requests as the user with the id:

    By default, this account has Editor permissions for the entire project, and you should be able to execute online predictions. At the time of this blog post’s publication, you can’t control permissions per serverless function, but if you want to try out that functionality yourself, sign up for the Alpha Tester Program.

    Generating a request payload

    Before submitting your inference request, you need to generate a payload with the input data for the model. At Caravel, we trained a deep learning model to classify the sentiment of sentences. We developed our model on Keras and TensorFlow 1.13.1, and because we wanted to limit the amount of preprocessing required on the client side, we decided to implement our preprocessing steps with TensorFlow (TF) Transform. Using TF Transform has multiple advantages:

    Preprocessing can occur server-side.
    Because the preprocessing runs on the server side, you can update the preprocessing functionality without affecting the clients.
    If this weren’t the case, you could imagine a situation like the following: if you performed the preprocessing in a mobile client, you would have to update all clients whenever you implement changes, or provide new endpoints for every change (not scalable).
    The preprocessing steps are consistent between the training, validation, and serving stages. Changes to the preprocessing steps will force you to re-train the model, which avoids misalignment between these steps and already-trained models.

    You can transform the dataset nicely and train and validate your datasets efficiently, but at the time of writing, you still need to convert your Keras model to a TensorFlow Estimator in order to properly integrate TF Transform with Keras. With TensorFlow Transform, you can submit raw data strings as inputs to the model. The preprocessing graph, which runs in conjunction with the model graph, will convert your string characters first into character indices and then into embedding vectors.

    Connecting the preprocessing graph in TF Transform with our TensorFlow model

    Our AI Platform instance and any TensorFlow Serving instance both expect a payload dictionary that includes the key instances, which contains a list of input dictionaries, one per inference. You can submit multiple input dictionaries in a single request; the model server can infer all the predictions in a single request through the batching feature of TensorFlow Serving. That is the structure the payload for our sentence classification demo follows. We moved the payload generation step into its own helper function to allow for potential manipulation of the payload, for example when we want to lower-case or tokenize the sentences (here, however, we have not included such manipulations). _connect_service provides us access to the AI Platform service with the service name “ml”. At the time of writing this post, the current version was “v1”.
We have encapsulated the service discovery in its own function so that we can add more parameters, like account credentials, if needed. Once you generate a payload in the correct data structure and have access to the GCP service, you can infer predictions from the AI Platform instance.

Obtaining model meta-information from the AI Platform training instance

Something remarkable happens when the Cloud Function setup interacts with the AI Platform instance: the client can infer predictions without any knowledge of the model. You don’t need to specify the model version during inference, because the AI Platform Serving instance handles that for you. However, it’s generally very useful to know which version was used for a prediction. At Caravel, we track our models’ performance extensively, and our team considers knowing when each model was used and deployed to be essential information.

Obtaining the model meta-information from the AI Platform instance is simple, because the Serving API has its own endpoint for requesting model information. This helps a lot when you perform a large number of requests and only need to obtain the meta-information once. A little helper function can obtain the model information for any given model in a project. You’ll need to call one of two different endpoints, depending on whether you want to obtain the information for a specific model version or just for the default model; you can set the default version in the AI Platform console.

Conclusion

Serverless functions have proven very useful to our team, thanks to their scalability and ease of deployment.
The Caravel team wanted to demonstrate that both concepts can work together easily, and to share our best practices as machine learning becomes an essential component of a growing number of today’s leading applications.

In this blog post, we introduced the setup of a machine learning model on AI Platform and showed how to infer model predictions from a Python 3.7 Cloud Function. We also reviewed how you might structure your prediction payloads, as well as how you can request model metadata from the model server. By splitting your application between Cloud Functions and AI Platform, you can deploy your legacy applications in an efficient and cost-effective manner.

If you’re interested in ways to reduce network traffic between your serverless endpoints, we recommend our follow-up post on how to generate model request payloads with the ProtoBuf serialization format. To see this example live, check out our demo endpoint, and if you want to start with some source code to build your own, you can find it in the ML on GCP GitHub repository.

Acknowledgements: Gonzalo Gasca Meza, Developer Programs Engineer, contributed to this post.
  • Efficiently scale ML and other compute workloads on NVIDIA’s T4 GPU, now generally available
NVIDIA’s T4 GPU, now available in regions around the world, accelerates a variety of cloud workloads, including high performance computing (HPC), machine learning training and inference, data analytics, and graphics. In January of this year, we announced the availability of the NVIDIA T4 GPU in beta, to help customers run inference workloads faster and at lower cost. Earlier this month at Google Next ‘19, we announced the general availability of the NVIDIA T4 in eight regions, making Google Cloud the first major provider to offer it globally.

A focus on speed and cost-efficiency

Each T4 GPU has 16 GB of onboard memory, offers a range of precision (or data type) support (FP32, FP16, INT8, and INT4), and includes NVIDIA Tensor Cores for faster training and RTX hardware acceleration for faster ray tracing. Customers can create custom VM configurations that best meet their needs, with up to four T4 GPUs, 96 vCPUs, 624 GB of host memory, and optionally up to 3 TB of in-server local SSD. At the time of publication, prices for T4 instances are as low as $0.29 per hour per GPU on preemptible VM instances. On-demand instances start at $0.95 per hour per GPU, with sustained use discounts of up to 30%.

Tensor Cores for both training and inference

NVIDIA’s Turing architecture brings the second generation of Tensor Cores to the T4 GPU. Debuting in the NVIDIA V100 (also available on Google Cloud Platform), Tensor Cores support mixed precision to accelerate the matrix multiplication operations that are so prevalent in ML workloads. If your training workload doesn’t fully utilize the more powerful V100, the T4 offers the acceleration benefits of Tensor Cores at a lower price. This is great for large training workloads, especially as you scale up more resources to train faster or to train larger models. Tensor Cores also accelerate inference, or the predictions generated by ML models, for low latency or high throughput.
When Tensor Cores are enabled with mixed precision, T4 GPUs on GCP can run ResNet-50 inference over 10x faster with TensorRT than when running only in FP32. Considering its global availability and Google’s high-speed network, the NVIDIA T4 on GCP can effectively serve global services that require fast execution at an efficient price point. For example, Snap Inc. uses the NVIDIA T4 to create more effective algorithms for its global user base, while keeping costs low.

“Snap’s monetization algorithms have the single biggest impact to our advertisers and shareholders. NVIDIA T4-powered GPUs for inference on GCP will enable us to increase advertising efficacy while at the same time lower costs when compared to a CPU-only implementation.” —Nima Khajehnouri, Sr. Director, Monetization, Snap Inc.

The GCP ML infrastructure combines the best of Google and NVIDIA across the globe

You can get up and running quickly, training ML models and serving inference workloads on NVIDIA T4 GPUs, by using our Deep Learning VM images. These include all the software you’ll need: drivers, CUDA-X AI libraries, and popular AI frameworks like TensorFlow and PyTorch. We handle software updates, compatibility, and performance optimizations, so you don’t have to. Just create a new Compute Engine instance, select your image, click Start, and a few minutes later you can access your T4-enabled instance. You can also start with our AI Platform, an end-to-end development environment that helps ML developers and data scientists build, share, and run machine learning applications anywhere. Once you’re ready, you can use Automatic Mixed Precision to speed up your workload via Tensor Cores with only a few lines of code.

Performance at scale

NVIDIA T4 GPUs offer value for batch compute, HPC, and rendering workloads, delivering dramatic performance and efficiency that maximizes the utility of at-scale deployments.
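The Automatic Mixed Precision feature mentioned above really is only a few lines. A minimal sketch, assuming TensorFlow 1.14+ on an NVIDIA GPU image (check your framework version, as the mechanism differs across releases):

```python
import os

# Enable TensorFlow's automatic mixed precision graph rewrite
# (TF 1.14+ / NVIDIA GPU containers): matrix multiplications and
# convolutions run in FP16 on Tensor Cores, while numerically
# sensitive ops stay in FP32.
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

# Equivalent in-code form (TF 1.14+), wrapping your existing optimizer:
#   optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
```

No model code changes are required beyond this; the graph rewrite inserts the FP16 casts and loss scaling for you.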
A Princeton University neuroscience researcher had this to say about the T4’s unique price and performance:

“We are excited to partner with Google Cloud on a landmark achievement for neuroscience: reconstructing the connectome of a cubic millimeter of neocortex. It’s thrilling to wield thousands of T4 GPUs powered by Kubernetes Engine. These computational resources are allowing us to trace 5 km of neuronal wiring and identify a billion synapses inside the tiny volume.” —Sebastian Seung, Princeton University

Quadro Virtual Workstations on GCP

T4 GPUs are also a great option for running virtual workstations for engineers and creative professionals. With NVIDIA Quadro Virtual Workstations from the GCP Marketplace, users can run applications built on the NVIDIA RTX platform to experience the next generation of computer graphics, including real-time ray tracing and AI-enhanced graphics, video, and image processing, from anywhere.

“Access to NVIDIA Quadro Virtual Workstation on the Google Cloud Platform will empower many of our customers to deploy and start using Autodesk software quickly, from anywhere. For certain workflows, customers leveraging NVIDIA T4 and RTX technology will see a big difference when it comes to rendering scenes and creating realistic 3D models and simulations. We’re excited to continue to collaborate with NVIDIA and Google to bring increased efficiency and speed to artist workflows.” —Eric Bourque, Senior Software Development Manager, Autodesk

Get started today

Check out our GPU page to learn more about how the wide selection of GPUs available on GCP can meet your needs. You can learn about customer use cases and the latest updates to GPUs on GCP in our Google Cloud Next ‘19 talk, GPU Infrastructure on GCP for ML and HPC Workloads. Once you’re ready to dive in, try running a few TensorFlow inference workloads by reading our blog or our documentation and tutorials.