AI machine learning

  • TensorFlow 2.0 and Cloud AI make it easy to train, deploy, and maintain scalable machine learning models
    Since it was open-sourced in 2015, TensorFlow has matured into an entire end-to-end ML ecosystem that includes a variety of tools, libraries, and deployment options to help users go from research to production easily. This month at the 2019 TensorFlow Dev Summit we announced TensorFlow 2.0 to make machine learning models easier to use and deploy.TensorFlow started out as a machine learning framework and has grown into a comprehensive platform that gives researchers and developers access to both intuitive higher-level APIs and low-level operations. In TensorFlow 2.0, eager execution is enabled by default, with tight Keras integration. You can easily ingest datasets via pipelines, and you can monitor your training in TensorBoard directly from Colab and Jupyter Notebooks. The TensorFlow team will continue to work on improving TensorFlow 2.0 alpha with a general release candidate coming later in Q2 2019. Making ML easier to useThe TensorFlow team’s decision to focus on developer productivity and ease of use doesn’t stop at iPython notebooks and Colab, but extends to make API components integrate far more intuitively with tf.keras (now the standard high level API), and to TensorFlow Datasets, which let users import common preprocessed datasets with only one line of code. Data ingestion pipelines can be orchestrated with, pushed into production with TensorFlow Extended (TFX), and scaled to multiple nodes and hardware architectures with minimal code change using distribution strategies.The TensorFlow engineering team has created an upgrade tool and several migration guides to support users who wish to migrate their models from TensorFlow 1.x to 2.0. TensorFlow is also hosting a weekly community testing stand-up for users to ask questions about TensorFlow 2.0 and migration support. If you’re interested, you can find more information on the TensorFlow website.Upgrading a model with the tf_upgrade_v2 tool.Experiment and iterateBoth researchers and enterprise data science teams must continuously iterate on model architectures, with a focus on rapid prototyping and speed to a first solution. With eager execution a focus in TensorFlow 2.0, researchers have the ability to use intuitive Python control flows, optimize their eager code with tf.function, and save time with improved error messaging. Creating and experimenting with models using TensorFlow has never been so easy.Faster training is essential for model deployments, retraining, and experimentation. In the past year, the TensorFlow team has worked diligently to improve training performance times on a variety of platforms including the second-generation Cloud TPU (by a factor of 1.6x) and the NVIDIA V100 GPU (by a factor of more than 2x). For inference, we saw speedups of over 3x with Intel’s MKL library, which supports CPU-based Compute Engine instances.Through add-on extensions, TensorFlow expands to help you build advance models. For example, TensorFlow Federated lets you train models both in the cloud and on remote (IoT or embedded) devices in a collaborative fashion. Often times, your remote devices have data to train on that your centralized training system may not. We also recently announced the TensorFlow Privacy extension, which helps you strip personally identifiable information (PII) from your training data. Finally, TensorFlow Probability extends TensorFlow’s abilities to more traditional statistical use cases, which you can use in conjunction with other functionality like estimators.Deploy your ML model in a variety ofenvironments and languagesA core strength of TensorFlow has always been the ability to deploy models into production. In TensorFlow 2.0, the TensorFlow team is making it even easier. TFX Pipelines give you the ability to coordinate how you serve your trained models for inference at runtime, whether on a single instance, or across an entire cluster. Meanwhile, for more resource-constrained systems, like mobile or IoT devices and embedded hardware, you can easily quantize your models to run with TensorFlow Lite. Airbnb, Shazam, and the BBC are all using TensorFlow Lite to enhance their mobile experiences, and to validate as well as classify user-uploaded content.Exploring and analyzing data with TensorFlow Data Validation.JavaScript is one of the world’s most popular programming languages, and TensorFlow.js helps make ML available to millions of JavaScript developers. The TensorFlow team announced TensorFlow.js version 1.0. This integration means you can not only train and run models in the browser, but also run TensorFlow as a part of server-side hosted JavaScript apps, including on App Engine. TensorFlow.js now has better performance than ever, and its community has grown substantially: in the year since its initial launch, community members have downloaded TensorFlow.js over 300,000 times, and its repository now incorporates code from over 100 contributors.How to get startedIf you’re eager to get started with TensorFlow 2.0 alpha on Google Cloud, start up a Deep Learning VM and try out some of the tutorials. TensorFlow 2.0 is available through Colab via pip install, if you’re just looking to run a notebook anywhere, but perhaps more importantly, you can also run a Jupyter instance on Google Cloud using a Cloud Dataproc Cluster, or launch notebooks directly from Cloud ML Engine, all from within your GCP project.Using TensorFlow 2.0 with a Deep Learning VM and GCP Notebook Instances.Along with announcing the alpha release of TensorFlow 2.0, we also announced new community and education partnerships. In collaboration with O’Reilly Media, we’re hosting TensorFlow World, a week-long conference dedicated to fostering and bringing together the open source community and all things TensorFlow. Call for proposals is open for attendees to submit papers and projects to be highlighted at the event. Finally, we announced two new courses to help beginners and learners new to ML and TensorFlow. The first course is’s Course 1 - Introduction to TensorFlow for AI, ML and DL, part of the TensorFlow: from Basic to Mastery series. The second course is Udacity’s Intro to TensorFlow for Deep Learning.If you’re using TensorFlow 2.0 on Google Cloud, we want to hear about it! Make sure to join our Testing special interest group, submit your project abstracts to TensorFlow World, and share your projects in our #PoweredByTF Challenge on DevPost. To quickly get up to speed on TensorFlow, be sure to check out our free courses on Udacity and Read more »
  • NVIDIA’s RAPIDS joins our set of Deep Learning VM images for faster data science
    If you’re a data scientist, researcher, engineer, or developer, you may be familiar with Google Cloud’s set of Deep Learning Virtual Machine (VM) images, which enable the one-click setup machine learning-focused development environments. But some data scientists still use a combination of pandas, Dask, scikit-learn, and Spark on traditional CPU-based instances. If you’d like to speed up your end-to-end pipeline through scale, Google Cloud’s Deep Learning VMs now include an experimental image with RAPIDS, NVIDIA’s open source and Python-based GPU-accelerated data processing and machine learning libraries that are a key part of NVIDIA’s larger collection of CUDA-X AI accelerated software. CUDA-X AI is the collection of NVIDIA's GPU acceleration libraries to accelerate deep learning, machine learning, and data analysis.The Deep Learning VM Images comprise a set of Debian 9-based Compute Engine virtual machine disk images optimized for data science and machine learning tasks. All images include common machine learning (typically deep learning, specifically) frameworks and tools installed from first boot, and can be used out of the box on instances with GPUs to accelerate your data processing tasks. In this blog post you’ll learn to use a Deep Learning VM which includes GPU-accelerated RAPIDS libraries.RAPIDS is an open-source suite of data processing and machine learning libraries developed by NVIDIA that enables GPU-acceleration for data science workflows. RAPIDS relies on NVIDIA’s CUDA language allowing users to leverage GPU processing and high-bandwidth GPU memory through user-friendly Python interfaces. It includes the DataFrame API based on Apache Arrow data structures called cuDF, which will be familiar to users of pandas. It also includes cuML, a growing library of GPU-accelerated ML algorithms that will be familiar to users of scikit-learn. Together, these libraries provide an accelerated solution for ML practitioners to use requiring only  minimal code changes and no new tools to learn.RAPIDS is available as a conda or pip package, in a Docker image, and as source code.Using the RAPIDS Google Cloud Deep Learning VM image automatically initializes a Compute Engine instance with all the pre-installed packages required to run RAPIDS. No extra steps required!Creating a new RAPIDS virtual machine instanceCompute Engine offers predefined machine types that you can use when you create an instance. Each predefined machine type includes a preset number of vCPUs and amount of memory, and bills you at a fixed rate, as described on the pricing page.If predefined machine types do not meet your needs, you can create an instance with a custom virtualized hardware configuration. Specifically, you can create an instance with a custom number of vCPUs and amount of memory, effectively using a custom machine type.In this case, we’ll create a custom Deep Learning VM image with 48 vCPUs, extended memory of 384 GB, 4 NVIDIA Tesla T4 GPUs and RAPIDS support.Notes:You can create this instance in any available zone that supports T4 GPUs.The option install-nvidia-driver=True Installs NVIDIA GPU driver automatically.The option proxy-mode=project_editors makes the VM visible in the Notebook Instances section.To define extended memory, use 1024*X where X is the number of GB required for RAM.Using RAPIDSTo put the RAPIDS through its paces on Google Cloud Platform (GCP), we focused on a common HPC workload: a parallel sum reduction test. This test can operate on very large problems (the default size is 2TB) using distributed memory and parallel task processing.There are several applications that require the computation of parallel sum reductions in high performance computing (HPC). Some examples include:Solving linear recurrencesEvaluation of polynomialsRandom number generationSequence alignmentN-body simulationIt turns out that parallel sum reduction is useful for the data science community at large. To manage the deluge of big data, a parallel programming model called “MapReduce” is used for processing data using distributed clusters. The “Map” portion of this model supports sorting: for example, sorting products into queues. Once the model maps the data, it then summarizes the output with the “Reduce” algorithm—for example, count the number of products in each queue. A summation operation is the most compute-heavy step, and given the scale of data that the model is processing, these sum operations must be carried out using parallel distributed clusters in order to complete in a reasonable amount of time.But certain reduction sum operations contain dependencies that inhibit parallelization. To illustrate such a dependency, suppose we want to add a series of numbers as shown in Figure 1.From the figure 1 on the left, we must first add 7 + 6 to obtain 13, before we can add 13 + 14 to obtain 27 and so on in a sequential fashion. These dependencies inhibit parallelization. However, since addition is associative, the summation can be expressed as a tree (figure 2 on the right). The benefit of this tree representation is that the dependency chain is shallow, and since the root node summarizes its leaves, this calculation can be split into independent tasks.Speaking of tasks, this brings us to the Python package Dask, a popular distributed computing framework. With Dask, data scientists and researchers can use Python to express their problems as tasks. Dask then distributes these tasks across processing elements within a single system, or across a cluster of systems. The RAPIDS team recently integrated GPU support into a package called dask-cuda. When you import both dask-cuda and another package called CuPY, which allows data to be allocated on GPUs using familiar numpy constructs, you can really explore the full breadths of models you can build with your data set. To illustrate, shown in Figures 3 and 4 show side-by-side comparisons of the same test run. On the left, 48 cores of a single system are used to process 2 terabytes (TB) of randomly initialized data using 48 Dask workers. On the right, 4 Dask workers process the same 2 TB of data, but dask-cuda is used to automatically associate those workers with 4 Tesla T4 GPUs installed in the same system.Running RAPIDSTo test parallel sum-reduction, perform the following steps:1. SSH into the instance. See Connecting to Instances for more details.2. Download the code required from this repository and upload it to your Deep Learning Virtual Machine Compute Engine instance. These two files are of particular importance as you profile helper bash shell summation Python scriptYou can find the sample code to run these tests, based on this blog, GPU Dask Arrays, below.3. Run the tests:Run test on the instance’s CPU complex, in this case specifying 48 vCPUs (indicated by the -c flag):Now, run the test using 4 (indicated by the -g flag) NVIDIA Tesla T4 GPUs:Figure 3.c: CPU-based solution. Figure 4.d: GPU-based solutionHere are some initial conclusions we derived from these tests:Processing 2 TB of data on GPUs is much faster (an ~12x speed-up for this test)Using Dask’s dashboard, you can visualize the performance of the reduction sum as it is executingCPU cores are fully occupied during processing on CPUs, but the GPUs are not fully utilizedYou can also run this test in a distributed environmentIn this example, we allocate Python arrays using the double data type by default. Since this code allocates an array size of (500K x 500K) elements, this represents 2 TB  (500K × 500K × 8 bytes / word). Dask initializes these array elements randomly via normal Gaussian distribution using the dask.array package.Running RAPIDS on a distributed clusterYou can also run RAPIDS in a distributed environment using multiple Compute Engine instances. You can use the same code to run RAPIDS in a distributed way with minimal modification and still decrease the processing time. If you want to explore RAPIDS in a distributed environment please follow the complete guide here.ConclusionAs you can see from the above example, the RAPIDS VM Image can dramatically speed up your ML workflows. Running RAPIDS with Dask lets you seamlessly integrate your data science environment with Python and its myriad libraries and wheels, HPC schedulers such as SLURM, PBS, SGE, and LSF, and open-source infrastructure orchestration projects such as Kubernetes and YARN. Dask also helps you develop your model once, and adaptably run it on either a single system, or scale it out across a cluster. You can then dynamically adjust your resource usage based on computational demands. Lastly, Dask helps you ensure that you’re maximizing uptime, through fault tolerance capabilities intrinsic in failover-capable cluster computing.It’s also easy to deploy on Google’s Compute Engine distributed environment. If you’re eager to learn more, check out the RAPIDS project and open-source community website, or review the RAPIDS VM Image documentation.Acknowledgements: Ty McKercher, NVIDIA, Principal Solution Architect; Vartika Singh, NVIDIA, Solution Architect; Gonzalo Gasca Meza, Google, Developer Programs Engineer; Viacheslav Kovalevskyi, Google, Software Engineer Read more »
  • Cloud AI helps you train and serve TensorFlow TFX pipelines seamlessly and at scale
    Last week, at the TensorFlow Dev Summit, the TensorFlow team released new and updated components that integrate into the open source TFX Platform (TensorFlow eXtended). TFX components are a subset of the tools used inside Google to power hundreds of teams’ wide-ranging machine learning applications. They address critical challenges to successful deployment of machine learning (ML) applications in production, such as:The prevention of training-versus-serving skewInput data validation and quality checksVisualization of model performance on multiple slices of dataA TFX pipeline is a sequence of components that implements an ML pipeline that is specifically designed for scalable, high-performance machine learning tasks. TFX pipelines support modeling, training, serving/inference, and managing deployments to online, native mobile, and even JavaScript targets.In this post, we‘ll explain how Google Cloud customers can use the TFX platform for their own ML applications, and deploy them at scale.Cloud Dataflow as a serverless autoscaling execution engine for (Apache Beam-based) TFX componentsThe TensorFlow team authored TFX components using Apache Beam for distributed processing. You can run Beam natively on Google Cloud with Cloud Dataflow, a seamless autoscaling runtime that gives you access to large amounts of compute capability on-demand. Beam can also run in many other execution environments, including Apache Flink, both on-premises and in multi-cloud mode. When you run Beam pipelines on Cloud Dataflow—the execution environment they were designed for—you can access advanced optimization features such as Dataflow Shuffle that groups and joins datasets larger than 200 terabytes. The same team that designed and built MapReduce and Google Flume also created third-generation data runtime innovations like dynamic work rebalancing, batch and streaming unification, and runner-agnostic abstractions that exist today in Apache Beam.Kubeflow Pipelines makes it easy to author, deploy, and manage TFX workflowsKubeflow Pipelines, part of the popular Kubeflow open source project, helps you author, deploy and manage TFX workflows on Google Cloud. You can easily deploy Kubeflow on Google Kubernetes Engine (GKE), via the 1-click deploy process. It automatically configures and runs essential backend services, such as the orchestration service for workflows, and optionally the metadata backend that tracks information relevant to workflow runs and the corresponding artifacts that are consumed and produced. GKE provides essential enterprise capabilities for access control and security, as well as tooling for monitoring and metering.Thus, Google Cloud makes it easy for you to execute TFX workflows at considerable scale using:Distributed model training and scalable model serving on Cloud ML EngineTFX component execution at scale on Cloud DataflowWorkflow and metadata orchestration and management with Kubeflow Pipelines on GKEFigure 1: TFX workflow running in Kubeflow PipelinesThe Kubeflow Pipelines UI shown in the above diagram makes it easy to visualize and track all executions. For deeper analysis of the metadata about component runs and artifacts, you can host a Jupyter notebook in the Kubeflow cluster, and query the metadata backend directly. You can refer to this sample notebook for more details.At Google Cloud, we work to empower our customers with the same set of tools and technologies that we use internally across many Google businesses to build sophisticated ML workflows. To learn more about using TFX, please check out the TFX user guide, or learn how to integrate TFX pipelines into your existing Apache Beam workflows in this video.Acknowledgments:Sam McVeety, Clemens Mewald, and Ajay Gopinathan also contributed to this post. Read more »
  • New study: The state of AI in the enterprise
    Editor’s note: Today we hear from one of our Premier partners, Deloitte.  Deloitte’s recent report, The State of AI in the Enterprise, 2nd Edition, examines how businesses are thinking about—and deploying—AI services.From consumer products to financial services, AI is transforming the global business landscape. In 2017, we began our relationship with Google Cloud to help our joint customers deploy and scale AI applications for their businesses. These customers frequently tell us they’re seeing steady returns on their investments in AI, and as a result, they’re interested in more ways to increase those investments.We regularly conduct research on the broader market trends for AI, and in November of 2018, we released our second annual “State of AI in the Enterprise” study. It showed that industry trends at large reflect what we hear from our customers: the business community remains bullish on AI’s impact.In this blog post, we’ll examine some of the key takeaways from our survey of 1,100 IT and line-of-business executives and discuss how these findings are relevant to our customers.Enterprises are doubling down on AI—and seeing financial benefitsMore than 95 percent of respondents believe that AI will transform both their businesses and their industries. A majority of survey respondents have already made large-scale investments in AI, with 37 percent saying they have committed $5 million or more to AI-specific initiatives. Nearly two-thirds of respondents (63 percent) feel AI has completely upended the marketplace and they need to make large-scale investments to catch up with rivals—or even to open a narrow lead.A surprising 82 percent of our respondents told us they’ve already gained a financial return from their AI investments. But that return is not equal across industries. Technology, media, and telecom companies, along with professional services firms, have made the biggest investments and realized the highest returns. In contrast, the public sector and financial services, with lower investments, lag behind. With 88 percent of surveyed companies planning to increase AI spending in the coming year, there’s a significant opportunity to increase both revenue and cost savings across all industries. However, like past transformative technologies, selecting the right AI use cases will be key to recognizing near and long-term benefits.Enterprises are using a broad range of AI technologies, increasingly in the cloudOur findings show that enterprises are employing a wide variety of AI technologies. More than half of respondents say their businesses are using statistical machine learning (63 percent), robotic process automation (59 percent), or natural language processing and generation (53 percent). Just under half (49 percent) are still using expert or rule-based systems, and 34 percent are using deep learning.When asked how they were accessing these AI capabilities, 59 percent said they relied on enterprise software with AI capabilities (much of which is available in the cloud) and 49 percent said, “AI as a service” (again, presumably in the cloud). Forty-six percent, a surprisingly high number, said they were relying upon automated machine learning—a set of capabilities that are only available in the cloud. It’s clear, then, that the cloud is already having a major effect on AI use in these large enterprises.These trends suggest that public cloud providers can become the primary way businesses access AI services. As a result, we believe this could lower the cost of cloud services and enhance its capabilities at the same time. In fact, our research shows that AI technology companies are investing more R&D dollars into enhancing cloud native versions of AI systems. If this trend continues, it seems likely that enterprises seeking best-of-breed AI solutions will increasingly need to access them from cloud providers.There are still challenges to overcomeGiven the enthusiasm surrounding AI technologies, it is not surprising that organizations also need to supplement their investments in talent. Although 31 percent of respondents listed “lack of AI skills” as a top-three concern—below such issues as implementation, integration, and data—HR teams need to look beyond technology skills to understand their organization’s pain points and end goals. Companies should try to secure teams that bring a mix of business and technology experience to help fully realize their AI project potential.Our respondents also had concerns about AI-related risks. A little more than half are worried about cybersecurity issues around AI (51 percent), and are concerned about “making the wrong strategic decisions based on AI recommendations” (43 percent). Companies have also begun to recognize ethical risks from AI, the most common being “using AI to manipulate information and create falsehoods” (43 percent).In conclusionDespite some challenges, our study suggests that enterprises are enthusiastic about AI, have already seen value from their investments, and are committed to expanding those investments. Looking forward, we expect to see substantial growth in AI and its cloud-based implementations, and that businesses will increasingly turn to public cloud providers as their primary method of accessing them.Deloitte was proud to be named Google Cloud’sGlobal Services Partner of the Year for 2017–in part due to our joint investments in AI. To learn more about how we can help you accelerate your organization’s AI journey, contact used in this document, “Deloitte” means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see for a detailed description of our legal structure. Certain services may not be available to attest clients under the rules and regulations of public accounting. Read more »
  • Everyday AI: beyond spell check, how Google Docs is smart enough to correct grammar
    Written communication is at the heart of what drives businesses. Proposals, presentations, emails to colleagues—this all keeps work moving forward. This is why we’ve built features into G Suite to help you communicate effectively, like Smart Compose and Smart Reply, which use machine learning smarts to help you draft and respond to messages quickly. More recently, we’ve introduced machine translation techniques into Google Docs to flag grammatical errors within your documents as you draft them.If you’ve ever questioned whether to use “a” versus “an” in a sentence, or if you’re using the correct verb tense or preposition, you’re not alone. Grammar is nuanced and tricky, which makes it a great problem to solve with the help of artificial intelligence. Here’s a look at how we built grammar suggestions in Docs.The gray areas of grammarAlthough we generally think of grammar as a set of rules, these rules are often complex and subjective. In spelling, you can reference a resource that tells you whether a word exists or how it’s spelled: dictionaries (Remember those?).Grammar is different. It’s a harder problem to tackle because its rules aren’t fixed. It varies based on language and context, and may change over time, too. To make things more complicated, there are many different style books—whether it be MLA, AP or some other style—which makes consistency a challenge.Given these nuances, even the experts don’t always agree on what’s correct. For our grammar suggestions, we worked with professional linguists to proofread sample sentences to get a sense of the true subjectivity of grammar. During that process, we found that linguists disagreed on grammar about 25 percent of the time. This raised the obvious question: how do we automate something that doesn’t run on definitive rules?Where machine translation makes a markMuch like having someone red-line your document with suggestions on how to replace “incorrect” grammar with “correct” grammar, we can use machine translation technology to help automate that process. At a basic level, machine translation performs substitution and reorders words from a source language to a target language, for example, substituting a “source” word in English (“hello!”) for a “target” word in Spanish (¡hola!). Machine translation techniques have been developed and refined over the last two decades throughout the industry, in academia and at Google, and have even helped power Google Translate.Along similar lines, we use machine translation techniques to flag “incorrect” grammar within Docs using blue underlines, but instead of translating from one language to another like with Google Translate, we treat text with incorrect grammar as the “source” language and correct grammar as the “target.”Working with the expertsBefore we could train models, we needed to define “correct” and "incorrect” grammar. What better way to do so than to consult the experts? Our engineers worked with a collection of computational and analytical linguists, with specialties ranging from sociology to machine learning. This group supports a host of linguistic projects at Google and helps bridge the gap between how humans and machines process language (and not just in English—they support over 40 languages and counting).For several months, these linguists reviewed thousands of grammar samples to help us refine machine translation models, from classic cases like “there” versus “their” versus “they’re,” to more complex rules involving prepositions and verb tenses. Each sample received close attention—three linguists reviewed each case to identify common patterns and make corrections. The third linguist served as the “tie breaker” in case of disagreement (which happened a quarter of the time).Once we identified the samples, we then fed them into statistical learning algorithms—along with “correct” text gathered from high-quality web sources (billions of words!)—to help us predict outcomes using stats like the frequency at which we’ve seen a specific correction occur. This process helped us build a basic spelling and grammar correction model.We iterated over these models by rolling them out to a small portion of people who use Docs, and then refined them based on user feedback and interactions. For example, in earlier models of grammar suggestions, we received feedback that suggestions for verb tenses and the correct singular or plural form of a noun or verb were inaccurate. We’ve since adjusted the model to solve for these specific issues, resulting in more precise suggestions. Better grammar. No ifs, ands or buts.So if you’ve ever asked yourself “how does it know what to suggest when I write in Google Docs,” these grammar suggestion models are the answer.  They’re working in the background to analyze your sentence structure, and the semantics of your sentence, to help you find mistakes or inconsistencies. With the help of machine translation, here are some mistakes that Docs can help you catch:Evolving grammar suggestions, just like languageWhen it comes to grammar, we’re constantly improving the quality of each suggestion to make corrections as useful and relevant as possible. With our AI-first approach, G Suite is in the best position to help you communicate smarter and faster, without sweating the small stuff. Learn more. Read more »
  • Let Deep Learning VMs and Jupyter notebooks burn the midnight oil for you: robust and automated training with Papermill
    In the past several years, Jupyter notebooks have become a convenient way of experimenting with machine learning datasets and models, as well as sharing training processes with colleagues and collaborators. Often times your notebook will take a long time to complete its execution. An extended training session may cause you to incur charges even though you are no longer using Compute Engine resources.This post will explain how to execute a Jupyter Notebook in a simple and cost-efficient way.We’ll explain how to deploy a Deep Learning VM image using TensorFlow to launch a Jupyter notebook which will be executed using the Nteract Papermill open source project. Once the notebook has finished executing, the Compute Engine instance that hosts your Deep Learning VM image will automatically terminate.The components of our system:First, Jupyter NotebooksThe Jupyter Notebook is an open-source web-based, interactive environment for  creating and sharing IPython notebook (.ipynb) documents that contain live code, equations, visualizations and narrative text. This platform supports data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.Next, Deep Learning Virtual Machine (VM) imagesThe Deep Learning Virtual Machine images are a set of Debian 9-based Compute Engine virtual machine disk images that are optimized for data science and machine learning tasks. All images include common ML frameworks and tools installed from first boot, and can be used out of the box on instances with GPUs to accelerate your data processing tasks. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch, or scikit-learn, and even add Cloud TPU and GPU support with a single click.And now, PapermillPapermill is a library for parametrizing, executing, and analyzing Jupyter Notebooks. It lets you spawn multiple notebooks with different parameter sets and execute them concurrently. Papermill can also help collect and summarize metrics from a collection of notebooks.Papermill also permits you to read or write data from many different locations. Thus, you can store your output notebook on a different storage system that provides higher durability and easy access in order to establish a reliable pipeline. Papermill recently added support for Google Cloud Storage buckets, and in this post we will show you how to put this new functionality to use.InstallationSubmit a Jupyter notebook for executionThe following command starts execution of a Jupyter notebook stored in a Cloud Storage bucket:The above commands do the following:Create a Compute Engine instance using TensorFlow Deep Learning VM and 2 NVIDIA Tesla T4 GPUsInstall the latest NVIDIA GPU driversExecute the notebook using PapermillUpload notebook result (with all the cells pre-computed) to Cloud Storage bucket in this case: “gs://my-bucket/”Terminate the Compute Engine instanceAnd there you have it! You’ll no longer pay for resources you don’t use since after execution completes, your notebook, with populated cells, is uploaded to the specified Cloud Storage bucket. You can read more about it in the Cloud Storage documentation.Note: In case you are not using a Deep Learning VM, and you want to install Papermill library with Cloud Storage support, you only need to run:Note: Papermill version 0.18.2 supports Cloud Storage.And here is an even simpler set of bash commands:Execute a notebook using GPU resourcesExecute a notebook using CPU resourcesThe Deep Learning VM instance requires several permissions: read and write ability to Cloud Storage, and the ability to delete instances on Compute Engine. That is why our original command has the scope “” defined.Your submission process will look like this:Note: Verify that you have enough CPU or GPU resources available by checking your quota in the zone where your instance will be deployed.Executing a Jupyter notebookLet’s look into the following code:This command is the standard way to create a Deep Learning VM. But keep in mind, you’ll need to pick the VM that includes the core dependencies you need to execute your notebook. Do not try to use a TensorFlow image if your notebook needs PyTorch or vice versa.Note: if you do not see a dependency that is required for your notebook and you think should be in the image, please let us know on the forum (or with a comment to this article).The secret sauce here contains two following things:Papermill libraryStartup shell scriptPapermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.Papermill lets you:Parameterize notebooks via command line arguments or a parameter file in YAML formatExecute and collect metrics across the notebooksSummarize collections of notebooksIn our case, we are just using its ability to execute notebooks and pass parameters if needed.Behind the scenesLet’s start with the startup shell script parameters:INPUT_NOTEBOOK_PATH: The input notebook located Cloud Storage bucket.Example: gs://my-bucket/input.ipynbOUTPUT_NOTEBOOK_PATH: The output notebook located Cloud Storage bucket.Example: gs://my-bucket/input.ipynb.PARAMETERS_FILE: Users can provide a YAML file where notebook parameter values should be read.Example: gs://my-bucket/params.yamlPARAMETERS: Pass parameters via -p key value for notebook execution.Example: -p batch_size 128 -p epochs 40.The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface. This sample script supports two different ways to pass parameters to Jupyter notebook, although Papermill supports other formats, so please consult Papermill’s documentation.The above script performs the following steps:Creates a Compute Engine instance using the TensorFlow Deep Learning VM and 2 NVIDIA Tesla T4 GPUsInstalls NVIDIA GPU driversExecutes the notebook using Papermill toolUploads notebook result (with all the cells pre-computed) to Cloud Storage bucket in this case: gs://my-bucket/Papermill emits a save after each cell executes, this could generate “429 Too Many Requests” errors, which are handled by the library itself.Terminates the Compute Engine instanceConclusionBy using the Deep Learning VM images, you can automate your notebook training, such that you no longer need to pay extra or manually manage your Cloud infrastructure. Take advantage of all the pre-installed ML software and Nteract’s Papermill project to help you solve your ML problems more quickly! Papermill will help you automate the execution of yourJupyter notebooks and in combination of Cloud Storage and Deep Learning VM images you can now set up this process in a very simple and cost efficient way. Read more »
  • Why so tense? Let grammar suggestions in Google Docs help you write even better
    If you're working against deadlines to create documents daily (how’s that for alliteration?), having correct grammar probably isn’t the first thing on your mind. And when it is, it seems there’s almost always a contested debate about what is correct (or “which?”). Even professional linguists have a hard time agreeing on grammatical suggestions—our own research found that one in four times linguists disagree on whether a suggestion is correct.We first introduced spell check in Google Docs to help folks catch errors seven years ago, and have since improved these features so that you can present your best work. Today we’re taking that a step further by using machine translation techniques to help you catch tricky grammatical errors, too, with grammar suggestions in Docs (which we first introduced at Google Cloud Next last year).G Suite Basic, Business, and Enterprise customers will start to see inline, contextual grammar suggestions in their documents as they type, just like spellcheck. If you’ve made a grammar mistake, a squiggly blue line will appear under the phrase as you write it. You can choose to accept the suggestion by right-clicking it.“Affect” versus “effect,” “there” versus “their,” or even more complicated rules like how to use prepositions correctly or the right verb tense, are examples of errors that grammar suggestions can help you catch. Because this technology is built right into in Docs, you don’t have to rely on third-party apps to do the work.How it worksWhen it comes to spelling, you can typically look up whether a word exists in the dictionary. Grammar is different. It’s a more complex set of rules that can vary based on the language, region, style and more. Because it’s subjective, it can be a harder problem to tackle using a fixed set of rules. To solve the problem, we use machine translation to build a model that can incorporate the complexity and nuances of grammar correction.Using machine translation, we are able to recognize errors and suggest corrections as work is getting done. We worked closely with linguists to decipher the rules for the machine translation model and used this as the foundation of automatic suggestions in your Docs, all powered by AI. Learn more about that process in detail in this blog post.In doing so, machine translation techniques can catch a range of different corrections, from simple grammatical rules such as how to use “a” versus “an” in a sentence, to more complex grammatical concepts such as how to use subordinate clauses correctly.Using artificial intelligence to make work easierGoogle’s machine intelligence helps individuals collaborate more efficiently everyday in G Suite. If you’ve ever assigned action items or used the Explore feature to search for relevant content to add to your Docs, you’ve tried the power of AI first hand.Happy writing! Read more »
  • Train fast on TPU, serve flexibly on GPU: switch your ML infrastructure to suit your needs
    When developing machine learning models, fast iteration and short training times are of utmost importance. In order for you or your data science team to reach higher levels of accuracy, you may need to run tens or hundreds of training iterations in order to explore different options.A growing number of organizations use Tensor Processing Units (Cloud TPUs) to train complex models due to their ability to reduce the training time from days to hours (roughly a 10X reduction) and the training costs from thousands of dollars to tens of dollars (roughly a 100X reduction). You can then deploy your trained models to CPUs, GPUs, or TPUs to make predictions at serving time. In some applications for which response latency is critical—e.g., robotics or self-driving cars—you might need to make additional optimizations. For example, many data scientists frequently use NVIDIA’s TensorRT to improve inference speed on GPUs. In this post, we walk through training and serving an object detection model and demonstrate how TensorFlow’s comprehensive and flexible feature set can be used to perform each step, regardless of which hardware platform you choose.A TensorFlow model consists of many operations (ops) that are responsible for training and making predictions, for example, telling us whether a person is crossing the street. Most of TensorFlow ops are platform-agnostic and can run on CPU, GPU, or TPU. In fact, if you implement your model using TPUEstimator, you can run it on a Cloud TPU by just setting the use_tpu flag to True, and run it on a CPU or GPU by setting the flag to False.NVIDIA has developed TensorRT (an inference optimization library) for high-performance inference on GPUs. TensorFlow (TF) now includes TensorRT integration (TF-TRT) module that can convert TensorFlow ops in your model to TensorRT ops. With this integration, you can train your model on TPUs and then use TF-TRT to convert the trained model to a GPU-optimized one for serving. In the following example we will train a state-of-the-art object detection model, RetinaNet, on a Cloud TPU, convert it to a TensorRT-optimized version, and run predictions on a GPU.Train and save a modelYou can use the following instructions for any TPU model, but in this guide, we choose as our example the TensorFlow TPU RetinaNet model. Accordingly, you can start by following this tutorial to train a RetinaNet model on Cloud TPU. Feel free to skip the section titled "Evaluate the model while you train (optional)"1.For the RetinaNet model that you just trained, if you look inside the model directory (${MODEL_DIR} in the tutorial) in Cloud Storage you’ll see multiple model checkpoints. Note that checkpoints may be dependent on the architecture used to train a model and are not suitable for porting the model to a different architecture.TensorFlow offers another model format, SavedModel, that you can use to save and restore your model independent of the code that generated it. A SavedModel is language-neutral and contains everything you need (graph, variables, and metadata) to port your model from TPU to GPU or CPU.Inside the model directory, you should find a timestamped subdirectory (in Unix epoch time format, for example, 1546300800 for 2019-01-01-12:00:00 GMT) that contains the exported SavedModel. Specifically, your subdirectory contains the following files:saved_model.pbvariables/ training script stores your model graph as saved_model.pb in a protocol buffer (protobuf) format, and stores in the variables in the aptly named variables subdirectory. Generating a SavedModel involves two steps—first, to define a serving_input_receiver_fn and then, to export a SavedModel.At serving time, the serving input receiver function ingests inference requests and prepares them for the model, just as at training time the input function input_fn ingests the training data and prepares them for the model. In the case of RetinaNet, the following code defines the serving input receiver function:The  serving_input_receiver_fn returns a tf.estimator.export.ServingInputReceiver object that takes the inference requests as arguments in the form of receiver_tensors and the features used by model as features. When the script returns a ServingInputReceiver, it’s telling TensorFlow everything it needs to know in order to construct a server. The features arguments describe the features that will be fed to our model. In this case, features is simply the set of images to run our detector on. receiver_tensors specifies the inputs to our server. Since we want our server to take JPEG encoded images, there will be a tf.placeholder for an array of strings. We decode each string into an image, crop it to the correct size and return the resulting image tensor.To export a SavedModel, call the export_saved_model method on your estimator shown in the following code snippet:Running export_saved_model generates a `SavedModel` directory in your FLAGS.model_dir directory. The SavedModel exported from TPUEstimator contains information on how to serve your model on CPU, GPU and TPU architectures.InferenceYou can take the SavedModel that you trained on a TPU and load it on CPU(s), GPU(s) or TPU(s), to run predictions. The following lines of code restore the model and run inference.model_dir is your model directory where the SavedModel is stored. loader.load returns a MetaGraphDef protocol buffer loaded in the provided session. model_outputs is the list of model outputs you’d like to predict, model_input is the name of the placeholder that receives the input data, and input_image_batch is the input data directory2.With TensorFlow, you can very easily train and save a model on one platform (like TPU) and load and serve it on another platform (like GPU or CPU). You can choose from different Google Cloud Platform services such as Cloud Machine Learning Engine, Kubernetes Engine, or Compute Engine to serve your models. In the remainder of this post you’ll learn how to optimize the SavedModel using TF-TRT, which is a common process if you plan to serve your model on one or more GPUs.TensorRT optimizationWhile you can use the SavedModel exported earlier to serve predictions on GPUs directly, NVIDIA’s TensorRT allows you to get improved performance from your model by using some advanced GPU features. To use TensorRT, you’ll need a virtual machine (VM) with a GPU and NVIDIA drivers. Google Cloud’s Deep Learning VMs are ideal for this case, because they have everything you need pre-installed.Follow these instructions to create a Deep Learning VM instance with one or more GPUs on Compute Engine. Select the checkbox "Install NVIDIA GPU driver automatically on first startup?" and choose a "Framework" (for example, "Intel optimized TensorFlow 1.12" at the time of writing this post) that comes with the most recent version of CUDA and TensorRT that satisfy the dependencies for the TensorFlow with GPU support and TF-TRT modules. After your VM is initialized and booted, you can remotely log into it by clicking the SSH button next to its name on the Compute Engine page on Cloud Console or using the gcloud compute ssh command. Install the dependencies (recent versions of TensorFlow include TF-TRT by default) and clone the TensorFlow TPU GitHub repository3.Now run tpu/models/official/retinanet/ and provide the location of the SavedModel as an argument:In the preceding code snippet, SAVED_MODEL_DIR is the path where SavedModel is stored (on Cloud Storage or local disk). This step converts the original SavedModel to a new GPU optimized SavedModel and prints out the prediction latency for the two models.If you look inside the model directory you can see that has converted the original SavedModel to a TensorRT-optimized SavedModel and stored it in a new folder ending in _trt. This step was done using the command.In the new SavedModel, the TensorFlow ops have been replaced by their GPU-optimized TensorRT implementations. During conversion, the script converts all variables to constants, and writes out to saved_model.pb, and therefore the variables folder is empty. TF-TRT module has an implementation for the majority of TensorFlow ops. For some ops, such as control flow ops such as Enter, Exit, Merge, and Switch, there are no TRT implementation, therefore they stay unchanged in the new SavedModel, but their effect on prediction latency is negligible.Another method to convert the SavedModel to its TensorRT inference graph is to use the saved_model_cli tool using the following command:In the preceding command MY_DIR is the shared filesystem directory and SAVED_MODEL_DIR is the directory inside the shared filesystem directory where the SavedModel is also loads and runs two models before and after conversion and prints the prediction latency. As we expect, the converted model has lower latency. Note that for inference, the first prediction often takes longer than subsequent predictions. This is due to startup overhead and for TPUs, the time taken to compile the TPU program via XLA. In our example, we skip the time taken by the first inference step, and average the remaining steps from the second iteration onwards.You can apply these steps to other models to easily port them to a different architecture, and optimize their performance. The TensorFlow and TPU GitHub repositories contain a diverse collection of different models that you can try out for your application including another state of the art object detection model, Mask R-CNN. If you’re interested in trying out TPUs, to see what they can offer you in terms of training and serving times, try this Colab and quickstart.1. You can skip the training step altogether by using the pre-trained checkpoints which are stored in Cloud Storage under gs://cloud-tpu-checkpoints/retinanet-model.2. Use loader.load(sess, [tag_constants.SERVING], saved_model_dir).signature_def to load the model and return the signature_def which contains the model input(s) and output(s). sess is the Session object here.3. Alternatively you can use a Docker image with a recent version of TensorFlow and the dependencies. Read more »
  • Enabling connected transformation with Apache Kafka and TensorFlow on Google Cloud Platform
    Editor’s note: Many organizations depend on real-time data streams from a fleet of remote devices, and would benefit tremendously from machine learning-derived, automated insights based on that real-time data. Founded by the team that built Apache Kafka, Confluent offers a a streaming platform to help companies easily access data as real-time streams. Today, Confluent’s Kai Waehner describes an example describing a fleet of connected vehicles, represented by Internet of Things (IoT) devices, to explain how you can leverage the open source ecosystems of Apache Kafka and TensorFlow on Google Cloud Platform and in concert with different Google machine learning (ML) services.Imagine a global automotive company with a strategic initiative for digital transformation to improve customer experience, increase revenue, and reduce risk. Here is the initial project plan:The main goal of this transformation plan is to improve existing business processes, rather than to create new services. Therefore, cutting-edge ML use cases like sentiment analysis using Recurrent Neural Networks (RNN) or object detection (e.g. for self-driving cars) using Convolutional Neural Networks (CNN) are out of scope and covered by other teams with longer-term mandates.Instead, the goal of this initiative is to analyze and act on critical business events by improving existing business processes in the short term, meaning months, not years, to achieve some quick wins with machine learning:All these business processes are already in place, and the company depends on them. Our goal is to leverage ML to improve these processes in the near term. For example, payment fraud is a consistent problem in online platforms, and our automotive company can use a variety of data sources to successfully analyze and help identify fraud in this context. In this post, we’ll explain how the company can leverage an analytic model for continuous stream processing in real-time, and use IoT infrastructure to detect payment fraud and alert them in the case of risk.Building a scalable, mission-critical, and flexible ML infrastructureBut before we can do that, let’s talk about the infrastructure needed for this project.If you’ve spent some time with TensorFlow tutorials or its most popular wrapper framework, Keras, which is typically even easier to use, you might not think that building and deploying models is all that challenging. Today, a data scientist can build an analytic model with only a few lines of Python code that run predictions on new data with very good accuracy.However, data preparation and feature engineering can consume most of a data scientist’s time. This idea may seem to contradict what you experience when you follow tutorials, because these efforts are already completed by the tutorial’s designer. Unfortunately, there is a hidden technical debt inherent in typical machine learning systems:You can read an in-depth analysis of the hidden technical debt in ML systems here.Thus, we need to ask the fundamental question that addresses how you’ll add real business value to your big data initiatives: how can you build a scalable infrastructure for your analytics models? How will you preprocess and monitor incoming data feeds? How will you deploy the models in production, on real-time data streams, at scale, and with zero downtime?Many larger technology companies faced these challenges some years before the rest of the industry. Accordingly, they have already implemented their own solutions to many of these challenges. For example, consider:Netflix’ Meson: a scalable recommendation engineUber’s Michelangelo: a platform and technology independent ML frameworkPayPal’s real time ML pipeline for fraud detectionAll of these projects use Apache Kafka as their streaming platform. This blog post explains how to solve the above described challenges for your own use cases leveraging the open source ecosystem of Apache Kafka and a number of services on Google Cloud Platform (GCP).Apache Kafka: the rise of a streaming platformYou may already be familiar with Apache Kafka, a hugely successful open source project created at LinkedIn for big data log analytics. But today, this is just one of its many use cases. Kafka evolved from a data ingestion layer to a feature-rich event streaming platform for all the use cases discussed above. These days, many enterprise data-focused projects build mission-critical applications around Kafka. As such, it has to be available and responsive, round the clock. If Kafka is down, their business processes stop working.The practicality of keeping messaging, storage, and processing in one distributed, scalable, fault-tolerant, high volume, technology-independent streaming platform is the primary reason for the global success of Apache Kafka in many large enterprises, regardless of industry. For example, LinkedIn processes over 4.5 trillion messages per day1 and Netflix handles over 6 petabytes of data on peak days2.Apache Kafka also enjoys a robust open source ecosystem. Let’s look at its components:Kafka Connect is an integration framework for connecting external sources / destinations into Kafka.Kafka Streams is a simple library that enables streaming application development within the Kafka framework. There are also additional Clients available for non-Java programming languages, including C, C++, Python, .NET, Go, and several others.The REST Proxy provides universal access to Kafka from any network connected device via HTTP.The Schema Registry is a central registry for the format of Kafka data—it guarantees that all data is in the proper format and can survive a schema evolution. As such, the Registry guarantees that the data is always consumable.KSQL is a streaming SQL engine that enables stream processing against Apache Kafka without writing source code.All these open source components build on Apache Kafka’s core messaging and storage layers, leveraging its high scalability, high volume and throughput, and failover capabilities. Then, if you need coverage for your Kafka deployment, we here at Confluent offer round-the-clock support and enterprise tooling for end-to-end monitoring, management of Kafka clusters, multi-data center replication, and more, with Confluent Cloud on GCP. This  Kafka ecosystem as a fully managed service includes a 99.95% service level agreement (SLA), guaranteed throughput and latency, and commercial support, while out-of-the-box integration with GCP services like Cloud Storage enable you to build out your scalable, mission-critical ML infrastructure.Apache Kafka’s open-source ecosystem as infrastructure for Machine LearningThe following picture shows an architecture for your ML infrastructure leveraging Confluent Cloud for data ingestion, model training, deployment, and monitoring:Now, with that background, we’re ready to build scalable, mission-critical ML infrastructure. Where do we start?Replicating IoT data from on-premises data centers to Google CloudThe first step is to ingest the data from the remote end devices. In the case of our automotive company, the data is already stored and processed in local data centers in different regions. This happens by streaming all sensor data from the cars via MQTT to local Kafka Clusters that leverage Confluent’s MQTT Proxy. This integration from devices to a local Kafka cluster typically is its own standalone project, because you need to handle IoT-specific challenges like constrained devices and unreliable networks. The integration can be implemented with different technologies, including low-level clients in C for microcontrollers, a REST Proxy for HTTP(S) communication, or an integration framework like Kafka Connect or MQTT Proxy. All of these components integrate natively with the local Kafka cluster so that you can leverage Kafka's features like high scalability, fault-tolerance and high throughput.The data from the different local clusters then needs to be replicated to a central Kafka Cluster in GCP for further processing and to train analytics models:Confluent Replicator is a tool based on Kafka Connect that replicates the data in a scalable and reliable way from any source Kafka cluster—regardless of whether it lives on premise or in the cloud—to the Confluent Cloud on GCP.GCP also offers scalable IoT infrastructure. If you want to ingest MQTT data directly into Cloud Pub/Sub from devices, you can also use GCP’s MQTT Bridge. Google provides open-source Kafka Connect connectors to get data from Cloud Pub/Sub into Kafka and Confluent Cloud so that you can make the most of KSQL with both first- and third-party logging integration.Data preprocessing with KSQLThe next step is to preprocess your data at scale. You likely want to do this in a reusable way, so that you can ingest the data into other pipelines, and to preprocess the real-time feeds for predictions in the same way once you’ve deployed the trained model.Our automotive company leverages KSQL, the open source streaming SQL engine for Apache Kafka, to do filtering, transformation, removal of personally identifiable information (PII), and feature extraction:This results in several tangible benefits:High throughput and scalability, failover, reliability, and infrastructure-independence, thanks to the core Kafka infrastructurePreprocessing data at scale with no codeUse SQL statements for interactive analysis and at-scale deployment to productionLeveraging Python using KSQL’s REST interfaceReusing preprocessed data for later deployment, even at the edge (outside of the cloud, possibly on embedded systems)Here’s what a continuous query looks like:You can then deploy this stream to one or more KSQL server instances to process all incoming sensor data in a continuous manner.Data ingestion with Kafka ConnectAfter preprocessing the data, you need to ingest it into a data store to train your models. Ideally, you should format and store in a flexible way, so that you can use it with multiple ML solutions and processes. But for today, the automotive company focuses on using TensorFlow to build neural networks that perform anomaly detection with autoencoders as a first use case. They use Cloud Storage as scalable, long-term data store for the historical data needed to train the models.In the future, the automotive company also plans to build other kinds of models using open source technologies like or for algorithms beyond neural networks. Deep Learning with TensorFlow is helpful, but it doesn’t fit every use case. In other scenarios, a random forest tree, clustering, or naïve Bayesian learning is much more appropriate due to simplicity, interpretability, or computing time.In other cases, you might be able to reduce efforts and costs a lot by using prebuilt and managed analytic models in Google’s API services like Cloud Vision for image recognition, Cloud Translate for translation between languages, or Cloud Text-to-Speech for speech synthesis. Or if you need to build custom models, Cloud AutoML might be the ideal solution to easily build out your deployment without the need for a data scientist.You can then use Kafka Connect as your ingestion layer because it provides several benefits:Kafka’s core infrastructure advantages: high throughput and scalability, fail-over, reliability, and infrastructure-independenceOut-of-the-box connectivity to various sources and sinks for different analytics and non-analytics use cases (for example, Cloud Storage, BigQuery, Elasticsearch, HDFS, MQTT)A set of out-of-the-box integration features, called Simple Message Transformation (SMT), for data (message) enrichment, format conversion, filtering, routing, and error-handlingModel training with Cloud ML Engine and TensorFlowAfter you’ve ingested your historical data into Cloud Storage, you’re now able to train your models at extreme scale using TensorFlow and TPUs on Google ML Engine. One major benefit of running your workload on a public cloud is that you can use powerful hardware in a flexible way. Spin it up for training and stop it when finished. The pay-as-you-go principle allows you to use cutting-edge hardware while still controlling your costs.In the case of our automotive company, it needs to train and deploy custom neural networks that include domain-specific knowledge and experience. Thus, they cannot use managed, pre-fabricated ML APIs or Cloud AutoML here. Cloud ML Engine provides powerful API and an easy-to-use web UI to train and evaluate different models:Although Cloud ML Engine supports other frameworks, TensorFlow is a great choice because it is open source and highly scalable, features out-of-the-box integration with GCP, offers a variety of tools (like TensorBoard for Keras), and has grown a sizable community.Replayability with Apache Kafka: a log never forgetsWith Apache Kafka as the streaming platform in your machine learning infrastructure, you can easily:Train different models on the same dataTry out different ML frameworksLeverage Cloud AutoML if and where appropriateDo A/B testing to evaluate different modelsThe architecture lets you leverage other frameworks besides TensorFlow later—if appropriate. Apache Kafka allows you to replay the data again and again over time to train different analytic models with the same dataset:In the above example, using TensorFlow, you can train multiple alternative models on historical data stored in Cloud Storage. In the future, you might want or need to use other machine learning techniques. For example, if you want to offer AutoML services to less experienced data scientists, you might train Google AutoML on Cloud Storage, or experiment with alternative, third party AutoML solutions like DataRobot or H2O Driverless, which leverage HDFS as storage on Cloud Dataproc, a managed service for Apache Hadoop and Spark.Alternative methods for model deployment and serving (inference)The automotive company is now ready to deploy its first models to do real-time predictions at scale. Two alternatives exist for model deployment:Option 1: RPC communication for model inference on your model serverCloud ML Engine allows you to deploy your trained models directly to a model server (based on TensorFlow Serving).Pros of using a model server:Simple integration with existing technologies and organizational processesEasier to understand if you come from the non-streaming (batch) worldAbility to migrate to true streaming down the roadModel management built-in for different models, versioning and A/B testingOption 2: Integrate model inference natively into your streaming applicationHere are some challenges you might encounter as you deploy your model natively in your streaming application:Worse latency: classification requires a remote call instead of local inferenceNo offline inference: on a remote or edge device, you might have limited or no connectivityCoupling the availability, scalability, and latency/throughput of your Kafka Streams application with the SLAs of the RPC interfaceOutliers or externalities (e.g., in case of failure) not covered by Kafka processingFor each use case, you have to assess the trade-offs and decide whether you want to deploy your model in a model server or natively in the application.Deployment and scalability of your client applicationsConfluent Cloud running in conjunction with GCP services ensures high availability and scalability for the machine learning infrastructure described above. You won’t need to worry about operations, just use the components to build your analytic models. However, what about the deployment and dynamic scalability of the Kafka clients, which use the analytic models to do predictions on new incoming events in real-time?You can write these clients using any programming language like (Java, Scala, .NET, Go, Python, JavaScript), Confluent REST Proxy, Kafka Streams or KSQL applications. Unlike on a Kafka server, clients need to scale dynamically to accommodate the load. Whichever option you choose for writing your Kafka clients, Kubernetes is a more and more widely adopted solution that handles deployment, dynamic scaling, and failover. Although it would be out of scope to introduce Kubernetes in this post, Google Kubernetes Engine Quickstart Guide can help you set up your own Kubernetes cluster on GCP in minutes. If you need to learn more details about the container orchestration engine itself, Kubernetes’ official website is a good starting point.The need for local data processing and model inferenceIf you’ve deployed analytics models on Google Cloud, you’ll have noticed that the service (and by extension, GCP) takes over most of the burden of deployment and operations. Unfortunately, migrating to the cloud is not always possible due to legal, compliance, security, or more technical reasons.Our automotive company is ready to use the models it built for predictions, but all the personally identifiable information (PII) data needs to be processed in its local data center. However, this demand creates a challenge, because the architecture (and some future planned integrations) would be simpler if everything were to run within one public cloud.Self-managed on-premise deployment for model serving and monitoring with KubernetesOn premises, you do not get all the advantages of GCP and Confluent Cloud—you need to operate the Apache Kafka cluster and its clients yourself.What about scaling brokers, external clients, persistent volumes, failover, and rolling upgrades? Confluent Operator takes over the challenge of operating Kafka and its ecosystem on Kubernetes, with automated provisioning, scaling, fail-over, partition rebalancing, rolling updates, and monitoring.For your clients, you face the same challenges as if you deploy in the cloud. What about dynamic load-balancing, scaling, and failover? In addition, if you use a model server on premise, you also need to manage its operations and scaling yourself.Kubernetes is an appropriate solution to solve these problems in an on-premises deployment. Using it both on-premises and on Google Cloud allows you to re-use past lessons learned and ongoing best practices.Confluent schema registry for message validation and data governanceHow can we ensure that every team in every data center gets the data they’re looking for, and that it’s consistent across the entire system?Kafka’s core infrastructure advantages: high throughput and scalability, fail-over, reliability, and infrastructure-independenceSchema definition and updatesForward- and backward-compatibilityMulti-region deploymentA mission-critical application: a payment fraud detection systemLet’s begin by reviewing the implementation of our first use case in more detail, including some code examples. We now plan to analyze historical data about payments for digital car services (perhaps for a car’s mobile entertainment system or paying for fuel at a gas station) to spot anomalies indicating possible fraudulent behavior. The model training happens in GCP, including preprocessing that anonymizes private user data. After building a good analytic model in the cloud, you can deploy it at the edge in a real-time streaming application, to analyze new transactions locally and in real time.Model training with TensorFlow on TPUsOur automotive company trained a model in Cloud ML Engine. They used Python and Keras to build an autoencoder (anomaly detection) for real-time sensor analytics, and then trained this model in TensorFlow on Cloud ML Engine leveraging Cloud TPUs (Tensor Processing Units):Google Cloud’s documentation has lots of information on how to train a model with Cloud ML Engine, including for different frameworks and use cases. If you are new to this topic, the Cloud ML Engine Getting Started guide is a good start to build your first model using TensorFlow. As a next step, you can walk through more sophisticated examples using Google ML Engine, TensorFlow and Keras for image recognition, object detection, text analysis, or a recommendation engine.The resulting, trained model is stored in Cloud Storage and thus can either deployed on a “serving” instance for live inference, or downloaded for edge deployment, as in our example above.Model deployment in KSQL streaming microservicesThere are different ways to build a real-time streaming application in the Kafka ecosystem. You can build your own application or microservice using any Kafka Client API (Java, Scala, .NET, Go, Python, node.js, or REST). Or you might want to leverage Kafka Streams (writing Java code) or KSQL (writing SQL statements)—two lightweight but powerful stream-processing frameworks that natively integrate with Apache Kafka to process high volumes of messages at scale, handle failover without data loss, and dynamically adjust scale without downtime.Here is an example of model inference in real-time. It is a continuous query that leverages the KSQL user-defined function (UDF) ‘applyFraudModel’, which embeds an autoencoder:You can deploy this KSQL statement as a microservice to one or more KSQL servers. The called model performs well and scales as needed, because it uses the same integration and processing pipeline for both training and deploying the model, leveraging Kafka Connect for real-time streaming ingestion of the sensor data, and KSQL (with Kafka Streams engine under the hood) for preprocessing and model deployment.You can build your own stateless or stateful UDFs for KSQL easily. You can find the above KSQL ML UDF and a step-by-step guide on using MQTT sensor data.If you’d prefer to leverage Kafka Streams to write your own Java application instead of writing KSQL, you might look at some code examples for deployment of a TensorFlow model in Kafka Streams.Monitoring your ML infrastructureOn GCP, you can leverage tools like Stackdriver, which allows monitoring and management for services, containers, applications, and infrastructure. Conventionally, organizations use Prometheus and JMX notifications for notifications and updates on their Kubernetes and Kafka integrations. Still, there is no silver bullet for monitoring your entire ML infrastructure, and adding an on-premises deployment creates additional challenges, since you have to set up your own monitoring tools instead of using GCP.Feel free to use the monitoring tools and frameworks you and your team already know and like. Ideally, you need to monitor your data ingestion, processing, training, deployment, accuracy, A/B testing workloads all in a scalable, reliable way. Thus, the Kafka ecosystem and GCP can provide the right foundation for your monitoring needs. Additional frameworks, services, or UIs can help your team to effectively monitor its infrastructure.Scalable, mission-critical machine learning infrastructure with GCP and Confluent CloudIn sum, we’ve shown you how you might build scalable, mission-critical, and even vendor-agnostic machine learning infrastructure. We’ve also shown you how you might leverage the open source Apache Kafka ecosystem to do data ingestion, processing, model inference, and monitoring. GCP offers all the compute power and extreme scale to run the Kafka infrastructure as a service, plus out-of-the-box integration with platform services such as Cloud Storage, Cloud ML Engine, GKE, and others. No matter where you want to deploy and run your models (in the cloud vs. on-premises; natively in-app vs. on a model server), GCP and Confluent Cloud are a great combination to set up your machine learning infrastructure.If you need coverage for your Kafka deployment, Confluent offers round-the-clock support and enterprise tooling for end-to-end monitoring, management of Kafka clusters, multi-data center replication, and more. If you’d like to learn more about Confluent Cloud, check out:The Confluent Cloud site, which provides more information about Confluent’s Enterprise and Professional offerings.The Confluent Cloud Professional getting started video.Our instructions to spin up your Kafka cloud instance on GCP here.Our community Slack group, where you can post questions in the #confluent-cloud channel.1. Read more »
  • Making AI-powered speech more accessible—now with more options, lower prices, and new languages and voices
    The ability to recognize and synthesize speech is critical for making human-machine interaction natural, easy, and commonplace, but it’s still too rare. Today we're making our Cloud Speech-to-Text and Text-to-Speech products more accessible to companies around the world, with more features, more voices (roughly doubled), more languages in more countries (up 50+%), and at lower prices (by up to 50% in some cases).Making Cloud Speech-to-Text more accessible for enterprisesWhen creating intelligent voice applications, speech recognition accuracy is critical. Even at 90% accuracy, it's hard to have a useful conversation. Unfortunately, many companies build speech applications that need to run on phone lines and that produce noisy results, and that data has historically been hard for AI-based speech technologies to interpret.For these situations with less than pristine data, we announced premium models for video and enhanced phone in beta last year, developed with customers who opted in to share usage data with us via data logging to help us refine model accuracy. We are excited to share today that the resulting enhanced phone model now has 62% fewer transcription errors (improved from 54% last year), while the video model, which is based on technology similar to what YouTube uses for automatic captioning, has 64% fewer errors. In addition, the video model also works great in settings with multiple speakers such as meetings or podcasts.The enhanced phone model was initially available only to customers participating in the opt-in data logging program announced last year. However, many large enterprises have been asking us for the option to use the enhanced model without opting into data logging. Starting today, anyone can access the enhanced phone model, and customers who choose the data logging option pay a lower rate, bringing the benefits of improved accuracy to more users.In addition to the general availability of both premium models, we’re also announcing the general availability of multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels (e.g., different people in a conversation), which is very useful for doing call or meeting analytics and other use cases involving multiple participants. With general availability, all these features now qualify for an SLA and other enterprise-level guarantees.Cloud Speech-to-Text at LogMeInLogMeIn is an example of a customer that requires both accuracy and enterprise scale: Every day, millions of employees use its GoToMeeting product to attend an online meeting. Cloud Speech-to-Text lets LogMeIn automatically create transcripts for its enterprise GoToMeeting customers, enabling users to collaborate more effectively. “LogMeIn continues to be excited about our work with Google Cloud and its market-leading video and real-time speech to text technology. After an extensive market study for the best Speech-to-Text video partner, we found Google to be the highest quality and offered a useful array of related technologies. We continue to hear from our customers that the feature has been a way to add significant value by capturing in-meeting content and making it available and shareable post-meeting. Our work with Google Cloud affirms our commitment to making intelligent collaboration a fundamental part of our product offering to ultimately add more value for our global UCC customers.” - Mark Strassman, SVP and General Manager, Unified Communications and Collaboration (UCC) at LogMeIn.Making Cloud Speech-to-Text more accessible through lower pricing (up to 50% cheaper)Lowering prices is another way we are making Cloud Speech-to-Text more accessible. Starting now:For standard models and the premium video model, customers that opt-in to our data logging program will now pay 33% less for all usage that goes through the program.We’ve cut pricing for the premium video model by 25%, for a total savings of 50% for current video model customers who opt-in to data logging.Making Cloud Text-to-Speech accessible across more countriesWe’re also excited to help enterprises benefit from our research and experience in speech synthesis. Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and easier than is typical in the industry: Since our update last August, we’ve made dramatic progress on Cloud Text-to-Speech, roughly doubling the number of overall voices, WaveNet voices, and WaveNet languages, and increasing the number of supported languages overall by ~50%, including:Support for seven new languages or variants, including Danish, Portuguese/Portugal, Russian, Polish, Slovakian, Ukrainian, and Norwegian Bokmål (all in beta). This update expands the list of supported languages to 21 and enables applications for millions of new end-users.31 new WaveNet voices (and 24 new standard voices) across those new languages. This gives more enterprises around the world access to our speech synthesis technology, which based on mean opinion score has already closed the quality gap with human speech by 70%. You can find the complete list of languages and voices here.20 languages and variants with WaveNet voices, up from nine last August--and up from just one a year ago when Cloud Text-to-Speech was introduced, marking a broad international expansion for WaveNet.In addition, the Cloud Text-to-Speech Device Profiles feature, which optimizes audio playback on different types of hardware, is now generally available. For example, some customers with call center applications optimize for interactive voice response (IVR), whereas others that focus on content and media (e.g., podcasts) optimize for headphones. In every case, the audio effects are customized for the hardware.Get started todayIt’s easy to give Cloud Speech products a try—check out the simple demos on the Cloud Speech-to-Text and Cloud Text-to-Speech landing pages. If you like what you see, you can use the $300 GCP credit to start testing. And as always, the first 60 minutes of audio you process every month with Cloud Speech-to-Text is free. Read more »
  • AI in depth: monitoring home appliances from power readings with ML
    As the popularity of home automation and the cost of electricity grow around the world, energy conservation has become a higher priority for many consumers. With a number of smart meter devices available for your home, you can now measure and record overall household power draw, and then with the output of a machine learning model, accurately predict  individual appliance behavior simply by analyzing meter data. For example, your electric utility provider might send you a message if it can reasonably assess that you left your refrigerator door open, or if the irrigation system suddenly came on at an odd time of day.In this post, you’ll learn how to accurately identify home appliances’ (e.g. electric kettles and washing machines, in this dataset) operating status using smart power readings, together with modern machine learning techniques such as long short-term memory (LSTM) models. Once the algorithm identifies an appliance’s operating status, we can then build out a few more applications. For example:Anomaly detection: Usually the TV is turned off when there is no one at home. An application can send a message to the user if the TV turns on at an unexpected or unusual time.Habit-improving recommendations: We can present users the usage patterns of home appliances in the neighborhood at an aggregated level so that they can compare or refer to the usage patterns and optimize the usage of their home appliances.We developed our end-to-end demo system entirely on Google Cloud Platform, including data collection through Cloud IoT Core, a machine learning model built using TensorFlow and trained on Cloud Machine Learning Engine, and real-time serving and prediction made possible by Cloud Pub/Sub, App Engine and Cloud ML Engine. As you progress through this post, you can access the full set of source files in the GitHub repository here.IntroductionThe growing popularity of IoT devices and the evolution of machine learning technologies have brought new opportunities for businesses. In this post, you’ll learn how home appliances’ (for example, an electric kettle and a washing machine) operating status (on/off) can be inferred from gross power readings collected by a smart meter, together with state-of-the-art machine learning techniques. An end-to-end demo system, developed entirely on Google Cloud Platform (as shown in Fig. 1), includes:Data collection and ingest through Cloud IoT Core and Cloud Pub/SubA machine learning model, trained using Cloud ML EngineThat same machine learning model, served using Cloud ML Engine together with App Engine as a front endData visualization and exploration using BigQuery and ColabFigure 1. System architectureThe animation below shows real-time monitoring, as real-world energy usage data is ingested through Cloud IoT Core into Colab.Figure 2. Illustration of real-time monitoringIoT extends the reach of machine learningData ingestionIn order to train any machine learning model, you need data that is both suitable and sufficient in quantity. In the field of IoT, we need to address a number of challenges in order to reliably and safely send the data collected by smart IoT devices to remote centralized servers. You’ll need to consider data security, transmission reliability, and use case-dependent timeliness, among other factors.Cloud IoT Core is a fully managed service that allows you to easily and securely connect, manage, and ingest data from millions of globally dispersed devices. The two main features of Cloud IoT Core are its device manager and its protocol bridge. The former allows you to configure and manage individual devices in a coarse-grained way by establishing and maintaining devices’ identities along with authentication after each connection. The device manager also stores each device’s logical configuration and is able to remotely control the devices—for example, changing a fleet of smart power meters’ data sampling rates. The protocol bridge provides connection endpoints with automatic load balancing for all device connections, and natively supports secure connection over industry standard protocols such as MQTT and HTTP. The protocol bridge publishes all device telemetry to Cloud Pub/Sub, which can then be consumed by downstream analytic systems. We adopted the MQTT bridge in our demo system and the following code snippet includes MQTT-specific logic.Data consumptionAfter the system publishes data to Cloud Pub/Sub, it delivers a message request to the “push endpoint,” typically the gateway service that consumes the data. In our demo system, Cloud Pub/Sub pushes data to a gateway service hosted in App Engine which then forwards the data to the machine learning model hosted in the Cloud ML Engine for inference, and at the same time stores the raw data together with received prediction results in BigQuery for later (batch) analysis.While there are numerous business-dependent use cases you can deploy based on our sample code, we illustrate raw data and prediction results visualization in our demo system. In the code repository, we have provided two notebooks:EnergyDisaggregationDemo_Client.ipynb: this notebook simulates multiple smart meters by reading in power consumption data from a real world dataset and sends the readings to the server. All Cloud IoT Core-related code resides in this notebook.EnergyDisaggregationDemo_View.ipynb: this notebook allows you to view raw power consumption data from a specified smart meter and our model's prediction results in almost real time.If you follow the deployment instructions in the README file and in the accompanying notebooks, you should be able to reproduce the results shown in Figure 2. Meanwhile, if you’d prefer to build out your disaggregation pipeline in a different manner, you can also use Cloud Dataflow and Pub/Sub I/O to build an app with similar functionality.Data processing and machine learningDataset introduction and explorationWe trained our model to predict each appliance’s on/off status from gross power readings, using the UK Domestic Appliance-Level Electricity (UK-DALE, publicly available here1) dataset  in order for this end-to-end demo system to be reproducible. UK-DALE records both whole-house power consumption and usage from each individual appliance every 6 seconds from 5 households. We demonstrate our solution using the data from house #2, for which the dataset includes a total of 18 appliances’ power consumption. Given the granularity of the dataset (a sample rate of ⅙ Hz), it is difficult to estimate appliances with relatively tiny power usage. As a result, appliances such as laptops and computer monitors are removed from this demo. Based on a data exploration study shown below, we selected eight appliances out of the original 18 items as our target appliances: a treadmill, washing machine, dishwasher, microwave, toaster, electric kettle, rice cooker and “cooker,” a.k.a., electric stovetop.The figure below shows the power consumption histograms of selected appliances. Since all the appliances are off most of the time, most of the readings are near zero. Fig. 4 shows the comparisons between aggregate power consumption of selected appliances (`app_sum`) and the whole-house power consumption (`gross`). It is worth noting that the input to our demo system is the gross consumption (the blue curve) because this is the most readily available power usage data, and is even measurable outside the home.Figure 3. Target appliances and demand histogramsFigure 4. Data sample from House #2 (on 2013-07-04 UTC)The data for House #2 spans from late February to early October 2013. We used data from June to the end of September in our demo system due to missing data at both ends of the period. The descriptive summary of selected appliances is illustrated in Table 1. As expected, the data is extremely imbalanced in terms of both “on” vs. “off” for each appliance and power consumption scale of each appliance, which introduces the main difficulty of our prediction task.Table 1. Descriptive summary of power consumptionPreprocessing the dataSince UK-DALE did not record individual appliance on/off status, one key preprocessing step is to label the on/off status of each appliance at each timestamp. We assume an appliance to be “on” if its power consumption is larger than one standard deviation from the sample mean of its power readings, given the fact that appliances are off most of the time and hence most of the readings are near zero. The code for data preprocessing can be found in the notebook provided, and you can also download the processed data from here.With the preprocessed data in CSV format, TensorFlow’s Dataset class serves as a convenient tool for data loading and transformation—for example, the input pipeline for machine learning model training. For example, in the following code snippet lines 7 - 9 load data from the specified CSV file and lines 11 - 13 transform data into our desired time-series sequence.In order to address the data imbalance issue, you can either down-sample the majority class or up-sample the minority class. In our case, we propose a probabilistic negative down-sampling method: we’ll preserve the subsequences in which at least one appliance remains on, but we’ll filter the subsequences with all appliances off, based on a certain probability and threshold. The filtering logic integrates easily with the API, as in the following code snippet:Finally, you’ll want to follow best practices from Input Pipeline Performance Guide to ensure that your GPU or TPU (if they are used to speed up training process) resources are not wasted while waiting for the data to load from the input pipeline. To maximize usage, we employ parallel mapping to parallelize data transformation and prefetch data to overlap the preprocessing and model execution of a training step, as shown in the following code snippet:The machine learning modelWe adopt a long short-term memory (LSTM) based network as our classification model. Please see Understanding LSTM Networks for an introduction to recurrent neural networks and LSTMs. Fig. 5 depicts our model design, in which an input sequence of length n is fed into a multilayered LSTM network, and prediction is made for all m appliances. A dropout layer is added for the input of LSTM cell, and the output of the whole sequence is fed into a fully connected layer. We implemented this model as a TensorFlow estimator.Figure 5. LSTM based model architectureThere are two ways of implementing the above architecture: TensorFlow native API (tf.layers and tf.nn) and Keras API (tf.keras). Compared to TensorFlow’s native API, Keras serves as a higher level API that lets you train and serve deep learning models with three key advantages: ease of use, modularity, and extensibility. tf.keras is TensorFlow's implementation of the Keras API specification. In the following code sample, we implemented the same LSTM-based classification model using both methods, so that you can compare the two:Model authoring using TensorFlow’s native API:Model authoring using the Keras API:Training and hyperparameter tuningCloud Machine Learning Engine supports both training and hyperparameter tuning. Figure 6 shows the average (over all appliances) precision, recall and f_score for multiple trials with different combinations of hyperparameters. We observed that hyperparameter tuning significantly improves model performance.Figure 6. Learning curves from hyperparameter tuning.We selected two experiments with optimal scores from hyperparameter tunings and report their performances in Table 2.Table 2. Hyper-parameter tuning of selected experimentsTable 3 lists the precision and recall of each individual appliance. As mentioned in the previous “Dataset introduction and exploration” section, the cooker and the treadmill (“running machine”) are difficult to predict, because their peak power consumptions were significantly lower than other appliances.Table 3. Precision and recall of predictions for individual appliancesConclusionWe have provided an end-to-end demonstration of how you can use machine learning to determine the operating status of home appliances accurately, based on only smart power readings. Several products including Cloud IoT Core,  Cloud Pub/Sub,  Cloud ML Engine, App Engine and BigQuery are orchestrated to support the whole system, in which each product solves a specific problem required to implement this demo, such as data collection/ingestion, machine learning model training, real time serving/prediction, etc. Both our code and data are available for those of you who would like to try out the system for yourself.We are optimistic that both we and our customers will develop ever more interesting applications at the intersection of more capable IoT devices and fast-evolving machine learning algorithms. Google Cloud provides both the IoT infrastructure and machine learning training and serving capabilities that make newly capable smart IoT deployments both a possibility and a reality.1. Jack Kelly and William Knottenbelt. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific Data 2, Article number:150007, 2015, DOI:10.1038/sdata.2015.7. Read more »
  • How Box Skills can optimize your workflow with the help of Cloud AI
    Have you ever had to manually upload and tag a lot of files? It’s no fun. Increasingly though, machine learning algorithms can help you or your team classify and tag large volumes of content automatically. And if your company uses Box, a popular Cloud Content Management platform, you can now apply Google ML services to your enterprise content with just a few lines of code using the Box Skills framework. Box Skills makes it easy to connect an ML service.With technologies like image recognition, speech-to-text transcription, and natural language understanding, Google Cloud makes it easy to enrich your Box files with useful metadata. For example, if you have lots of images in your repository, you can use the Cloud Vision API to understand more about the image, such as objects or landmarks in an image, or in documents, —or even parse their contents and identify elements that determine the document’s category. If your needs extend beyond functionality provided by Cloud Vision, you can point your Skill at a custom endpoint that serves your own custom-trained model. The metadata applied to files in Box via a Box Skill can power other Box functionality such as search, workflow, or retention.An example integration in actionNow, let’s look at an example. Many businesses use Box to store images of their products. With the Box Skills Kit and the product search functionality in the Cloud Vision API, you can automatically catalog these products. When a user uploads a new product image into Box, the product search feature within the Vision API helps identify similar products in the catalog, as well as the maximum price for such a product.Configuring and deploying a product search Box SkillLet’s look at how you can use the Box Skills Kit to implement the use case outlined above.1.Create an endpoint for your Skill   a. Follow this QuickStart guide.   b. You can use this API endpoint to call a pre-trained machine learning model to classify new data.   c. Create a Cloud Function to point your Box Skill at the API endpoint created above.   d. Clone the following repository.   e. Next, follow the instructions to deploy the function to your project.   f. Make a note of the endpoint’s URI.2.Configure a Box Custom Skills App in Box, then configure it to point to the Cloud Function created above.   a. Follow the instructions.   b. Then these instructions.And there you have it. You now have a new custom Box Skill enabled by Cloud AI that’s ready to use. Try uploading a new image to your Box drive and notice that maximum retail price and information on similar products are both displayed under the “skills” console.Using your new SkillNow that you’re all set up, you can begin by uploading an image file of household goods, apparel, or toys into your Box drive. The upload triggers a Box Skill event workflow, which calls a Cloud Function you deployed in Google Cloud and whose endpoint you will specify in the Box Admin Console. The Cloud Function you create then uses the Box Skills kit's FileReader API to read the base64-encoded image string, automatically sent by Box when the upload trigger occurs. The Function then calls the product search function of Cloud Vision, and creates a Topics Card with data returned from the product search function. Next, it creates a Faces card in which to populate a thumbnail that it scaled from the original image. Finally, the function persists the skills card within Box using the skillswriter API. Now, you can open the image in Box drive and click on the "skills" menu (which expands, when you click the “magic wand” icon on the right), and you’ll see product catalog information, with similar products and maximum price populated.What’s next?Over the past several years, Google Cloud and Box have built a variety of tools to make end users more productive. Today, the Box Skills integration opens the door to a whole new world of advanced AI tools and services: in addition to accessing pre-trained models via the Vision API, Video Intelligence API or Speech-to-Text API, data scientists can train and host custom models written in TensorFlow, sci-kit learn, Keras, or PyTorch on Cloud ML Engine. Lastly, Cloud AutoML lets you train a model on your dataset without having to write any code. Whatever your levels of comfort with code or data science, we’re committed to making it easy for you to run machine learning-enhanced annotations on your data.You can find all the code discussed in this post and its associated documentation in its GitHub repository. Goodbye, tedious repetition! Hello, productivity. Read more »
  • Making the machine: the machine learning lifecycle
    As a Googler, one of my roles is to educate the software development community on machine learning (ML). The first introduction for many individuals is what is referred to as the ‘model’. While building models, tuning them, and evaluating their predictive abilities has generated a great deal of interest and excitement, many organizations still find themselves asking more basic questions, like how does machine learning fit into their software development lifecycle?In this post, I explain how machine learning (ML) maps to and fits in with the traditional software development lifecycle. I refer to this mapping as the machine learning lifecycle. This will help you as you think about how to incorporate machine learning, including models, into your software development processes. The machine learning lifecycle consists of three major phases: Planning (red), Data Engineering (blue) and Modeling (yellow).PlanningIn contrast to a static algorithm coded by a software developer, an ML model is an algorithm that is learned and dynamically updated. You can think of a software application as an amalgamation of algorithms, defined by design patterns and coded by software engineers, that perform planned tasks. Once an application is released to production, it may not perform as planned, prompting developers to rethink, redesign, and rewrite it (continuous integration/continuous delivery).We are entering an era of replacing some of these static algorithms with ML models, which are essentially dynamic algorithms. This dynamism presents a host of new challenges for planners, who work in conjunction with product owners and quality assurance (QA) teams.For example, how should the QA team test and report metrics? ML models are often expressed as confidence scores. Let’s suppose that a model shows that it is 97% accurate on an evaluation data set. Does it pass the quality test? If we built a calculator using static algorithms and it got the answer right 97% of the time, we would want to know about the 3% of the time it does not.Similarly, how does a daily standup work with machine learning models? It’s not like the training process is going to give a quick update each morning on what it learned yesterday and what it anticipates learning today. It’s more likely your team will be giving updates on data gathering/cleaning and hyperparameter tuning.When the application is released and supported, one usually develops policies to address user issues. But with continuous learning and reinforcement learning, the model is learning the policy. What policy do we want it to learn? For example, you may want it to observe and detect user friction in navigating the user interface and learn to adapt the interface (Auto A/B) to reduce the friction.Within an effective ML lifecycle, planning needs to be embedded in all stages to start answering these questions specific to your organization.Data engineeringData engineering is where the majority of the development budget is spent—as much as 70% to 80% of engineering funds in some organizations. Learning is dependent on data—lots of data, and the right data. It’s like the old software engineering adage: garbage in, garbage out. The same is true for modeling: if bad data goes in, what the model learns is noise.In addition to software engineers and data scientists, you really need a data engineering organization. These skilled engineers will handle data collection (e.g., billions of records), data extraction (e.g., SQL, Hadoop), data transformation, data storage and data serving. It’s the data that consumes the vast majority of your physical resources (persistent storage and compute). Typically due to the magnitude in scale, these are now handled using cloud services versus traditional on-prem methods.Effective deployment and management of data cloud operations are handled by those skilled in data operations (DataOps). The data collection and serving are handled by those skilled in data warehousing (DBAs). The data extraction and transformation are handled by those skilled in data engineering (Data Engineers), and data analysis are handled by those skilled in statistical analysis and visualization (Data Analysts).ModelingModeling is integrated throughout the software development lifecycle. You don’t just train a model once and you’re done. The concept of one-shot training, while appealing in budget terms and simplification, is only effective in academic and single-task use cases.Until fairly recently, modeling was the domain of data scientists. The initial ML frameworks (like Theano and Caffe) were designed for data scientists. ML frameworks are evolving and today are more in the realm of software engineers (like Keras and PyTorch). Data scientists play an important role in researching the classes of machine learning algorithms and their amalgamation, advising on business policy and direction, and moving into roles of leading data driven teams.But as ML frameworks and AI as a Service (AIaaS) evolve, the majority of modeling will be performed by software engineers. The same goes for feature engineering, a task performed by today’s data engineers: with its similarities to conventional tasks related to data ontologies, namespaces, self-defining schemas, and contracts between interfaces, it too will move into the realm of software engineering. In addition, many organizations will move model building and training to cloud-based services used by software engineers and managed by data operations. Then, as AIaaS evolves further, modeling will transition to a combination of turnkey solutions accessible via cloud APIs, such as for Cloud Vision and Cloud Speech-to-Text, and customizing pre-trained algorithms using transfer learning tools such as AutoML.Frameworks like Keras and PyTorch have already transitioned away symbol programming into imperative programming (the dominant form in software development), and incorporate object-oriented programming (OOP) principles such as inheritance, encapsulation, and polymorphism. One should anticipate that other ML frameworks will evolve to include object relational models (ORM), which we already use for databases, to data sources and inference (prediction). Common best practices will evolve and industry-wide design patterns will become defined and published, much like how Design Patterns by the Gang of Four influenced the evolution of OOP.Like continuous integration and delivery, continuous learning will also move into build processes, and be managed by build and reliability engineers. Then, once your application is released, its usage and adaptation in the wild will provide new insights in the form of data, which will be fed back to the modeling process so the model can continue learning.As you can see, adopting machine learning isn’t simply a question of learning to train a model, and you’re done. You need to think deeply about how those ML models will fit into your existing systems and processes, and grow your staff accordingly. I, and all the staff here at Google, wish you the best in your machine learning journey, as you upgrade your software development lifecycle to accommodate machine learning. To learn more about machine learning on Google Cloud here, visit our Cloud AI products page. Read more »
  • Spam does not bring us joy—ridding Gmail of 100 million more spam messages with TensorFlow
    1.5 billion people use Gmail every month, and 5 million paying businesses use Gmail in the workplace as a part of G Suite. For consumers and businesses alike, a big part of Gmail’s draw is its built-in security protections.Good security means constantly staying ahead of threats, and our existing ML models are highly effective at doing this—in conjunction with our other protections, they help block more than 99.9 percent of spam, phishing, and malware from reaching Gmail inboxes. Just as we evolve our security protections, we also look to advance our machine learning capabilities to protect you even better.That’s why we recently implemented new protections powered by TensorFlow, an open-source machine learning (ML) framework developed at Google. These new protections complement existing ML and rules-based protections, and they’ve successfully improved our detection capabilities. With TensorFlow, we are now blocking around 100 million additional spam messages every day.Where did we find these 100 million extra spam messages? We’re now blocking spam categories that used to be very hard to detect. Using TensorFlow has helped us block image-based messages, emails with hidden embedded content, and messages from newly created domains that try to hide a low volume of spammy messages within legitimate traffic.Given we’re already blocking the majority of spammy emails in Gmail, blocking millions more with precision is a feat. TensorFlow helps us catch the spammers who slip through that less than 0.1 percent, without accidentally blocking messages that are important to users. One person’s spam is another person’s treasureML makes catching spam possible by helping us identify patterns in large data sets that humans who create the rules might not catch; it makes it easy for us to adapt quickly to ever-changing spam attempts.ML-based protections help us make granular decisions based on many different factors. Consider that every email has thousands of potential signals. Just because some of an email’s characteristics match up to those commonly considered “spammy,” doesn’t necessarily mean it’s spam. ML allows us to look at all of these signals together to make a determination.Finally, it also helps us personalize our spam protections to each user—what one person considers spam another person might consider an important message (think newsletter subscriptions or regular email notifications from an application).Using TensorFlow to power MLBy complementing our existing ML models with TensorFlow, we’re able to refine these models even further, while allowing the team to focus less on the underlying ML framework, and more on solving the problem: ridding your inbox of spam!Applying ML at scale can be complex and time consuming. TensorFlow includes many tools that make the ML process easier and more efficient, accelerating the speed at which we can iterate. As an example, TensorBoard allows us to both comprehensively monitor our model training pipelines and quickly evaluate new models to determine how useful we expect them to be.TensorFlow also gives us the flexibility to easily train and experiment with different models in parallel to develop the most effective approach, instead of running one experiment at a time.As an open standard, TensorFlow is used by teams and researchers all over the world (There have been 71,000 forks of the public code and other open-source contributions!). This strong community support means new research and ideas can be applied quickly. And, it means we can collaborate with other teams within Google more quickly and easily to best protect our users.All in all, these benefits allow us to scale our ML efforts, requiring fewer engineers to run more experiments and protect users more effectively.This is just one example of how we’re using machine learning to keep users and businesses safe, and just one application of TensorFlow. Even within Gmail, we’re currently experimenting with TensorFlow in other security-related areas, such as phishing and malware detection, as part of our continuous efforts to keep users safe.And you can use it, too. Google open-sourced TensorFlow in 2015 to make ML accessible for everyone—so that many different organizations can take advantage of the technology that powers critical capabilities like spam prevention in Gmail and more.To learn more about TensorFlow and the companies that are using it, visit To learn more about the security benefits of G Suite, download this eBook. Read more »
  • Exoplanets, astrobiological research, and Google Cloud: What we learned from NASA FDL’s Reddit AMA
    Are we alone in the universe? Does intelligent life exist on other planets? If you’ve ever wondered about these things, you’re not the only one. Last summer, we partnered with NASA's Frontier Development Lab (FDL) to help find answers to these questions—you can read about some of this work in this blog post. And as part of this work we partnered with FDL researchers to host an AMA (“ask me anything”) to answer all those burning questions from Redditlings far and wide. Here are some of the highlights:Question: What can AI do to detect intelligent life on other planets?Massimo Mascaro, Google Cloud Director of Applied AI: AI can help extract the maximum information from the very faint and noisy signals we can get from our best instruments. AI is really good at detecting anomalies and in digging through large amounts of data and that's pretty much what we do when we search for life in space.Question: About how much data is expected to be generated during this mission? Are we looking at the terabyte, 10s of terabytes, or 100s of terabytes of data?Megan Ansdell, Planetary Scientist with a specialty in exoplanets: The TESS mission will download ~6 TB of data every month as it observes a new sector of sky containing 16,000 target stars at 2-minute cadence. The mission lifetime is at least 2 years, which means TESS will produce on the order of 150 TB of data. You can learn more about the open source deep learning models that have been developed to sort through the data here.Question: What does it mean to simulate atmospheres?Giada Arney, Astronomy and astrobiology (mentor): Simulating atmospheres for me involves running computer models where I provide inputs to the computer on gases in the atmosphere, “boundary conditions”, temperature and more. These atmospheres can then be used to simulate telescopic observations of similar exoplanets so that we can predict what atmospheric features might be observable with future observatories for different types of atmospheres.Question: How useful is a simulated exoplanet database?Massimo Mascaro: It's important to have a way to simulate the variability of the data you could observe, before observing it, to understand your ability to distinguish patterns, to plan on how to build and operate instruments and even to plan how to analyze the data eventually.Giada Arney: Having a database of different types of simulated worlds will allow us to predict what types of properties we’ll be able to observe on a diverse suite of planets. Knowing these properties will then help us to think about the technological requirements of future exoplanet observing telescopes, allowing us to anticipate the unexpected!Question: Which off-the-shelf Google Cloud AI/ML APIs are you using?Massimo Mascaro, Google Cloud Director of Applied AI: We've leveraged a lot of Google Cloud’s infrastructure, in particular Compute Engine and GKE, to both experiment with data and to run computation on large scale (using up to 2500 machines simultaneously), as well as TensorFlow and PyTorch running on Google Cloud to train deep learning models for the exoplanets and astrobiology experiments.Question: What advancements in science can become useful in the future other than AI?Massimo Mascaro: AI is just one of the techniques science can benefit in our times. I would put in that league definitely the wide access to computation. This is not only helping science in data analysis and AI, but in simulation, instrument design, communication, etc.Question: What do you think are the key things that will inspire the next generation of astrophysicists, astrobiologists, and data scientists?Sara Jennings, Deputy Director, NASA FDL: For future data scientists, I think it will be the cool problems like the ones we tackle at NASA FDL, which they will be able to solve using new and ever increasing data and techniques. With new instruments and data analysis techniques getting so much better, we're now at a moment where asking question such as whether there's life outside our planet is not anymore preposterous, but real scientific work.Daniel Angerhausen, Astrophysicist with expertise spanning astrobiology to exoplanets (mentor): I think one really important point is that we see more and more women in science. This will be such a great inspiration for girls to pursue careers in STEM. For most of the history of science we were just using 50 percent of our potential and this will hopefully be changed by our generation.You can read the full AMA transcript here. Read more »
  • AI in Depth: Cloud Dataproc meets TensorFlow on YARN: Let TonY help you train right in your cluster
    Apache Hadoop has become an established and long-running framework for distributed storage and data processing. Google’s Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simple, cost-efficient way. With Cloud Dataproc, you can set up a distributed storage platform without worrying about the underlying infrastructure. But what if you want to train TensorFlow workloads directly on your distributed data store?This post will explain how to install a Hadoop cluster for LinkedIn open-source project TonY (TensorFlow on YARN). You will deploy a Hadoop cluster using Cloud Dataproc and TonY to launch a distributed machine learning job. We’ll explore how you can use two of the most popular machine learning frameworks: TensorFlow and PyTorch.TensorFlow supports distributed training, allowing portions of the model’s graph to be computed on different nodes. This distributed property can be used to split up computation to run on multiple servers in parallel. Orchestrating distributed TensorFlow is not a trivial task and not something that all data scientists and machine learning engineers have the expertise, or desire, to do—particularly since it must be done manually. TonY provides a flexible and sustainable way to bridge the gap between the analytics powers of distributed TensorFlow and the scaling powers of Hadoop. With TonY, you no longer need to configure your cluster specification manually, a task that can be tedious, especially for large clusters.The components of our system:First, Apache HadoopApache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.  Hadoop services provides for data storage, data processing, data access, data governance, security, and operations.Next, Cloud DataprocCloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Cloud Dataproc’s automation capability helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.And now TonYTonY is a framework that enables you to natively run deep learning jobs on Apache Hadoop. It currently supports TensorFlow and PyTorch. TonY enables running either single node or distributed training as a Hadoop application. This native connector, together with other TonY features, runs machine learning jobs reliably and flexibly.InstallationSetup a Google Cloud Platform projectGet started on Google Cloud Platform (GCP) by creating a new project, using the instructions found here.Create a Cloud Storage bucketThen create a Cloud Storage bucket. Reference here.Create a Hadoop cluster via Cloud Dataproc using initialization actionsYou can create your Hadoop cluster directly from Cloud Console or via an appropriate `gcloud` command. The following command initializes a cluster that consists of 1 master and 2 workers:When creating a Cloud Dataproc cluster, you can specify in your TonY initialization actions script that Cloud Dataproc should run on all nodes in your Cloud Dataproc cluster immediately after the cluster is set up.Note: Use Cloud Dataproc version 1.3-deb9, which is supported for this deployment. Cloud Dataproc version 1.3-deb9 provides Hadoop version 2.9.0. Check this version list for details.Once your cluster is created. You can verify that under Cloud Console > Big Data > Cloud Dataproc > Clusters, that cluster installation is completed and your cluster’s status is Running.Go to  Cloud Console > Big Data > Cloud Dataproc > Clusters and select your new cluster:You will see the Master and Worker nodes.Connect to your Cloud Dataproc master server via SSHClick on SSH and connect remotely to Master server.Verify that your YARN nodes are activeExampleInstalling TonYTonY’s Cloud Dataproc initialization action will do the following:Install and build TonY from GitHub repository.Create a sample folder containing TonY examples, for the following frameworks:TensorFlowPyTorchThe following folders are created:TonY install folder (TONY_INSTALL_FOLDER) is located by default in:TonY samples folder (TONY_SAMPLES_FOLDER) is located by default in:The Tony samples folder will provide 2 examples to run distributed machine learning jobs using:TensorFlow MNIST examplePyTorch MNIST exampleRunning a TensorFlow distributed jobLaunch a TensorFlow training jobYou will be launching the Dataproc job using a `gcloud` command.The following folder structure was created during installation in `TONY_SAMPLES_FOLDER`, where you will find a sample Python script to run the distributed TensorFlow job.This is a basic MNIST model, but it serves as a good example of using TonY with distributed TensorFlow. This MNIST example uses “data parallelism,” by which you use the same model in every device, using different training samples to train the model in each device. There are many ways to specify this structure in TensorFlow, but in this case, we use “between-graph replication” using tf.train.replica_device_setter.DependenciesTensorFlow version 1.9Note: If you require a more recent TensorFlow and TensorBoard version, take a look at the progress of this issue to be able to upgrade to latest TensorFlow version.Connect to Cloud ShellOpen Cloud Shell via the console UI:Use the following gcloud command to create a new job. Once launched, you can monitor the job. (See the section below on where to find the job monitoring dashboard in Cloud Console.)Running a PyTorch distributed jobLaunch your PyTorch training jobFor PyTorch as well, you can launch your Cloud Dataproc job using gcloud command.The following folder structure was created during installation in the TONY_SAMPLES_FOLDER, where you will find an available sample script to run the TensorFlow distributed job:DependenciesPyTorch version 0.4Torch Vision 0.2.1Launch a PyTorch training jobVerify your job is running successfullyYou can track Job status from the Dataproc Jobs tab: navigate to Cloud Console > Big Data > Dataproc > Jobs.Access your Hadoop UILogging via web to Cloud Dataproc’s master node via web: http://<Node_IP>:8088 and track Job status. Please take a look at this section to see how to access the Cloud Dataproc UI.Cleanup resourcesDelete your Cloud Dataproc clusterConclusionDeploying TensorFlow on YARN enables you to train models straight from your data infrastructure that lives in HDFS and Cloud Storage. If you’d like to learn more about some of the related topics mentioned in this post, feel free to check out the following documentation links:Machine Learning with TensorFlow on GCPHyperparameter tuning on GCPHow to train ML models using GCPAcknowledgements: Anthony Hsu, LinkedIn Software Engineer; and Zhe Zhang, LinkedIn Core Big Data Infra team manager. Read more »
  • How we built a derivatives exchange with BigQuery ML for Google Next ‘18
    Financial institutions have a natural desire to predict the volume, volatility, value or other parameters of financial instruments or their derivatives, to manage positions and mitigate risk more effectively. They also have a rich set of business problems (and correspondingly large datasets) to which it’s practical to apply machine learning techniques.Typically, though, in order to start using ML, financial institutions must first hire data scientist talent with ML expertise—a skill set for which recruiting competition is high. In many cases, an organization has to undertake the challenge and expense of bootstrapping an entire data science practice. This summer, we announced BigQuery ML, a set of machine learning extensions on top of our scalable data warehouse and analytics platform. BigQuery ML effectively democratizes ML by exposing it via the familiar interface of SQL—thereby letting financial institutions accelerate their productivity and maximize existing talent pools.As we got ready for Google Cloud Next London last summer, we decided to build a demo to showcases BigQuery ML’s potential for the financial services community. In this blog post, we’ll walk through how we designed the system, selected our time-series data, built an architecture to analyze six months of historical data, and quickly trained a model to outperform a 'random guess' benchmark—all while making predictions in close to real time.Meet the Derivatives ExchangeA team of Google Cloud solution architects and customer engineers built the Derivatives Exchange in the form of an interactive game, in which you can opt to either rely on luck, or use predictions from a model running in BigQuery ML, in order to decide which options contracts will expire in-the-money. Instead of using the value of financial instruments as the “underlying” for the options contracts, we used the volume of Twitter posts (tweets) for a particular hashtag within a specific timeframe. Our goal was to show the ease with which you can deploy machine learning models on Google Cloud to predict an instrument’s volume, volatility, or value.The Exchange demo, as seen at Google Next ‘18 LondonOur primary goal was to translate an existing and complex trading prediction process into a simple illustration to which users from a variety of industries can relate. Thus, we decided to:Use the very same Google Cloud products that our customers use daily.Present a time-series that is familiar to everyone—in this case, the number of hashtag Tweets observed in a 10-minute window as the “underlying” for our derivative contracts.Build a fun, educational, and inclusive experience.When designing the contract terms, we used this Twitter time-series data in a manner similar to the strike levels specified in weather derivatives.Architectural decisionsSolution architecture diagram: the social media options marketWe imagined the exchange as a retail trading pit where, using mobile handsets, participants purchase European binary range call option contracts across various social media single names (what most people would think of as hashtags). Contracts are issued every ten minutes and expire after ten minutes. At expiry, the count of accumulated #hashtag mentions for the preceding window is used to determine which participants were holding in-the-money contracts, and their account balances are updated accordingly. Premiums are collected upon opening interest in a contract, and are refunded if the contract strikes in-the-money. All contracts pay out 1:1.We chose the following Google Cloud products to implement the demo:Compute Engine served as our job server:The implementation executes periodic tasks for issuing, expiring, and settling contracts. The design also requires a singleton process to run as a daemon to continually ingest tweets into BigQuery. We decided to consolidate these compute tasks into an ephemeral virtual machine on Compute Engine. The job server tasks were authored with node.js and shell scripts, using cron jobs for scheduling, and configured by an instance template with embedded VM startup scripts, for flexibility of deployment. The job server does not interact with any traders on the system, but populates the “market operational database” with both participant and contract status.Cloud Firestore served as our market operational database:Cloud Firestore is a document-oriented database that we use to store information on market sessions. It serves as a natural destination for the tweet count and open interest data displayed by the UI, and enables seamless integration with the front end.Firebase and App Engine provided our mobile and web applications:Using the Firebase SDK for both our mobile and web applications’ interfaces enabled us to maintain a streamlined codebase for the front end. Some UI components (such as the leaderboard and market status) need continual updates to reflect changes in the source data (like when a participant’s interest in a contract expires in-the-money). The Firebase SDK provides concise abstractions for developers and enables front-end components to be bound to Cloud Firestore documents, and therefore to update automatically whenever the source data changes.Choosing App Engine to host the front-end application allowed us to focus on UI development without the distractions of server management or configuration deployment. This helped the team rapidly produce an engaging front end.Cloud Functions ran our backend API services:The UI needs to save trades to Cloud Firestore, and Cloud Functions facilitate this serverlessly. This serverless backend means we can focus on development logic, rather than server configuration or schema definitions, thereby significantly reducing the length of our development iterations.BigQuery and BigQuery ML stored and analyzed tweetsBigQuery solves so many diverse problems that it can be easy to forget how many aspects of this project it enables. First, it reliably ingests and stores volumes of streaming Twitter data at scale and economically, with minimal integration effort. The daemon process code for ingesting tweets consists of 83 lines of Javascript, with only 19 of those lines pertaining to BigQuery.Next, it lets us extract features and labels from the ingested data, using standard SQL syntax. Most importantly, it brings ML capabilities to the data itself with BigQuery ML, allowing us to train a model on features extracted from the data, ultimately exposing predictions at runtime by querying the model with standard SQL.BigQuery ML can help solve two significant problems that the financial services community faces daily. First, it brings predictive modeling capabilities to the data, sparing the cost, time and regulatory risk associated with migrating sensitive data to external predictive models. Second, it allows these models to be developed using common SQL syntax, empowering data analysts to make predictions and develop statistical insights. At Next ‘18 London, one attendee in the pit observed that the tool fills an important gap between data analysts, who might have deep familiarity with their particular domain’s data but less familiarity with statistics; and data scientists, who possess expertise around machine learning but may be unfamiliar with the particular problem domain. We believe BigQuery ML helps address a significant talent shortage in financial services by blending these two distinct roles into one.Structuring and modeling the dataOur model training approach is as follows:First, persist raw data in the simplest form possible: filter the Twitter Enterprise API feed for tweets containing specific hashtags (pulled from a pre-defined subset), and persist a two-column time-series consisting of the specific hashtag as well as the timestamp of that tweet as it was observed in the Twitter feed.Second, define a view in SQL that sits atop the main time-series table and extracts features from the raw Twitter data. We selected features that allow the model to predict the number of tweet occurrences for a given hashtag within the next 10-minute period. Specifically:#Hashtag#fintech may have behaviors distinct from #blockchain and distinct from #brexit, so the model should be aware of this as a feature.Day of weekSunday’s tweet behaviors will be different from Thursday’s tweet behaviors.Specific intra-day windowWe sliced a 24-hour day into 144 10-minute segments, so the model can inform us on trend differences between various parts of the 24-hour cycle.Average tweet count from the past hourThese values are calculated by the view based upon the primary time-series data.Average tweet velocity from the past hourTo predict future tweet counts accurately, the model should know how active the hashtag has been in the prior hour, and whether that activity was smooth (say, 100 tweets consistently for each of the last six 10-minute windows) or bursty (say, five 10-minute windows with 0 tweets followed by one window with 600 tweets).Tweet count rangeThis is our label, the final output value that the model will predict. The contract issuance process running on the job server contains logic for issuing options contracts with strike ranges for each hashtag and 10-minute window (Range 1: 0-100, Range 2: 101-250, etc.) We took the large historical Twitter dataset and, using the same logic, stamped each example with a label indicating the range that would have been in-the-money. Just as equity option chains issued on a stock are informed by the specific stock’s price history, our exchange’s option chains are informed by the underlying hashtag's volume history.Train the model on this SQL view. BigQuery ML makes model training an incredibly accessible exercise. While remaining inside the data warehouse, we use a SQL statement to declare that we want to create a model trained on a particular view containing the source data, and using a particular column as a label.Finally, deploy the trained model in production. Again using SQL, simply query the model based on certain input parameters, just as you would query any table.Trading options contractsTo make the experience engaging, we wanted to recreate a bit of the open-outcry pit experience by having multiple large “market data” screens for attendees (the trading crowd) to track contract and participant performance. Demo participants used Pixel 2 handsets in the pit to place orders using a simple UI, from which they could allocate their credits to any or all of the three hashtags. When placing their order, they chose between relying on their own forecast, or using the predictions of a BigQuery ML model for their specific options portfolio, among the list of contracts currently trading in the market. Once the trades were made for their particular contracts, they monitored how their trades performed compared to other “traders” in real-time, then saw how accurate the respective predictions were when the trading window closed at expiration time (every 10 minutes).ML training processIn order to easily generate useful predictions about tweet volumes, we use a three-part process, First, we store tweet time series data to a BigQuery table. Second, we layer views are layered on top of this table to extract the features and labels required for model training. Finally, we use BigQuery ML to train and get predictions from the model.The canonical list of hashtags to be counted is stored within a BigQuery table named “hashtags”. This is joined with the “tweets” table to determine aggregates for each time window.Example 1: Schema definition for the “hashtags” table1. Store tweet time series data The tweet listener writes tags, timestamps, and other metadata to a BigQuery table named “tweets” that possesses the schema listed in example 2:Example 2: Schema definition for the “tweets” table2. Extract features via layered viewsThe lowest-level view calculates the count of each hashtag’s occurrence, per intraday window. The mid-level view extracts the features mentioned in the above section (“Structuring and modeling the data”). The top-level view then extracts the label (i.e., the “would-have-been in-the-money” strike range) from that time-series data. Lowest-level view The lowest-level view is defined by the SQL in example 3. The view definition contains logic to aggregate tweet history into 10-minute buckets (with 144 of these buckets per 24-hour day) by hashtag.Example 3: low-level view definitionb. Intermediate viewThe selection of some features (for example: hashtag, day-of-week or specific intraday window) is straightforward, while others (such as average tweet count and velocity for the past hour) are more complex. The SQL in example 4 illustrates these more complex feature selections.Example 4: intermediate view definition for adding featuresc. Highest-level viewHaving selected all necessary features in the prior view, it’s time to select the label. The label should be the strike range that would have been in-the-money for a given historical hashtag and ten-minute-window. The application’s “Contract Issuance” batch job generates strike ranges for every 10-minute window, and its “Expiration and Settlement” job determines which contract (range) struck in-the-money. When labeling historical examples for model training, it’s critical to apply this exact same application logic.Example 5: highest level view3. Train and get predictions from modelHaving created a view containing our features and label, we refer to the view in our BigQuery ML model creation statement:Example 6: model creationThen, at the time of contract issuance, we execute a query against the model to retrieve a prediction as to which contract will be in-the-money.Example 7: SELECTing predictions FROM the modeImprovementsThe exchange was built with a relatively short lead time, hence there were several architectural and tactical simplifications made in order to realistically ship on schedule. Future iterations of the exchange will look to implement several enhancements, such as:Introduce Cloud Pub/Sub into the architectureCloud Pub/Sub is an enabler for refined data pipeline architectures, and it stands to improve several areas within the exchange’s solution architecture. For example, it would reduce the latency of reported tweet counts by allowing the requisite components to be event-driven rather than batch-oriented.Replace VM `cron` jobs with Cloud SchedulerThe current architecture relies on Linux `cron`, running on a Compute Engine instance, for issuing and expiring options contracts, which contributes to the net administrative footprint of the solution. Launched in November of last year (after the version 1 architecture had been deployed), Cloud Scheduler will enable the team to provide comparable functionality with less infrastructural overhead.Reduce the size of the code base by leveraging Dataflow templatesOften, solutions contain non-trivial amounts of code responsible for simply moving data from one place to another, like persisting Pub/Sub messages to BigQuery. Cloud Dataflow templates allow development teams to shed these non-differentiating lines of code from their applications and simply configure and manage specific pipelines for many common use cases. Expand the stored attributes of ingested tweetsStoring the geographical tweet origins and the actual texts of ingested tweets could provide a richer basis from which future contracts may be defined. For example, sentiment analysis could be performed on the Tweet contents for particular hashtags, thus allowing binary contracts to be issued pertaining to the overall sentiment on a topic.Consider BigQuery user-defined functions (UDFs) to eliminate duplicate code among batch jobs and model executionCertain functionality, such as the ability to nimbly deal with time in 10-minute slices, is required by multiple pillars of the architecture, and resulted in the team deploying duplicate algorithms in both SQL and Javascript. With BigQuery UDFs, the team can author the algorithm once, in Javascript, and leverage the same code assets in both the Javascript batch processes as well as in the BigQuery ML models.A screenshot of the exchange dashboard during a trading sessionIf you’re interested in learning more about BigQuery ML, check out our documentation, or more broadly, have a look at our solutions for the financial services industry, or check out this interactive BigQuery ML walkthrough video. Or, if you’re able to attend Google Next ‘19 in San Francisco, you can even try out the exchange for yourself. Read more »
  • Build an AI-powered, customer service virtual agent with Chatbase
    These days, most people don’t tolerate more than one or two bad customer service experiences. For contact centers drowning in customer calls and live chats, an AI-powered customer service virtual agent can reduce that risk by complementing humans to provide personalized service 24/7, without queuing or waiting. But the status-quo approach to designing those solutions (i.e., intuition and brainstorming) is slow, based on guesswork, and just scratches the surface on functionality -- usually, causing more harm than good because the customer experience is poor.Built within Google’s internal incubator called Area 120, Chatbase is a conversational AI platform that replaces the risky status quo approach with a data-driven one based on Google’s world-class machine learning and search capabilities. The results include faster development (by up to 10x) of a more helpful and versatile virtual agent, and happier customers!Lessons learned along the journeyInitially, Chatbase provided a free-to-use analytics service for measuring and optimizing any AI-powered chatbot. (That product is now called Chatbase Virtual Agent Analytics.) After analyzing hundreds of thousands of bots and billions of messages in our first 18 months of existence, we had two revelations about how to help bot builders in a more impactful way: one, that customer service virtual agents would become the primary use case for the technology; and two, that using ML to glean insights from live-chat transcripts at scale would drastically shorten development time for those agents while creating a better consumer experience. With those lessons learned, Chatbase Virtual Agent Modeling (currently available via an EAP) was born.Virtual Agent Modeling explainedVirtual Agent Modeling (a component in the Cloud Contact Center AI solution) uses Google’s core strengths in ML and search to analyze thousands of transcripts, categorizing customer issues into “drivers” and then digging deeper to find specific intents (aka customer requests) per driver. For complex intents, Chatbase models simple yet rich flows developers can use to build a voice or chat virtual agent that handles up to 99% of interactions, responds helpfully to follow-up questions, and knows exactly when to do a hand-off to a live agent.In addition, the semantic search tool finds potentially thousands of training phrases per intent. When this analysis is complete, developers can export results to their virtual agent (via Dialogflow) -- cutting weeks, months, or even years from development time.Don’t settle for the slow status quoAccording to one Fortune 100 company upgrading its customer service virtual agent with Virtual Agent Modeling, “This new approach to virtual agent development moves at 200 mph, compared to 10-20 mph with current solutions.” Furthermore, it expects to nearly double the number of interactions its virtual agent can handle, from 53% of interactions to 92%.If you have at least 100,000 English-language live-chat transcripts available and plan to deploy or enhance either a voice or chat customer service virtual agent in 2019, Virtual Agent Modeling can help you get into that fast lane. Request your personal demo today! Read more »
  • Introducing Feast: an open source feature store for machine learning
    To operate machine learning systems at scale, teams need to have access to a wealth of feature data to both train their models, as well as to serve them in production. GO-JEK and Google Cloud are pleased to announce the release of Feast, an open source feature store that allows teams to manage, store, and discover features for use in machine learning projects.Developed jointly by GO-JEK and Google Cloud, Feast aims to solve a set of common challenges facing machine learning engineering teams by becoming an open, extensible, unified platform for feature storage. It gives teams the ability to define and publish features to this unified store, which in turn facilitates discovery and feature reuse across machine learning projects.“Feast is an essential component in building end-to-end machine learning systems at GO-JEK,” says Peter Richens, Senior Data Scientist at GO-JEK, “we are very excited to release it to the open source community. We worked closely with Google Cloud in the design and development of the product,  and this has yielded a robust system for the management of machine learning features, all the way from idea to production.”For production deployments, machine learning teams need a diverse set of systems working together. Kubeflow is a project dedicated to making these systems simple, portable and scalable and aims to deploy best-of-breed open-source systems for ML to diverse infrastructures. We are currently in the process of integrating Feast with Kubeflow to address the feature storage needs inherent in the machine learning lifecycle.The motivationFeature data are signals about a domain entity, e.g: for GO-JEK, we can have a driver entity and a feature of the daily count of trips completed. Other interesting features might be the distance between the driver and a destination, or the time of day. A combination of multiple features are used as inputs for a machine learning model.In large teams and environments, how features are maintained and served can diverge significantly across projects and this introduces infrastructure complexity, and can result in duplicated work.Typical challenges:Features not being reused: Features representing the same business concepts are being redeveloped many times, when existing work from other teams could have been reused.Feature definitions vary: Teams define features differently and there is no easy access to the documentation of a feature.Hard to serve up to date features: Combining streaming and batch derived features, and making them available for serving, requires expertise that not all teams have. Ingesting and serving features derived from streaming data often requires specialised infrastrastructure. As such, teams are deterred from making use of real time data.Inconsistency between training and serving: Training requires access to historical data, whereas models that serve predictions need the latest values. Inconsistencies arise when data is siloed into many independent systems requiring separate tooling.Our solutionFeast solves these challenges by providing a centralized platform on which to standardize the definition, storage and access of features for training and serving. It acts as a bridge between data engineering and machine learning.Feast handles the ingestion of feature data from both batch and streaming sources. It also manages both warehouse and serving databases for historical and the latest data. Using a Python SDK, users are able to generate training datasets from the feature warehouse. Once their model is deployed, they can use a client library to access feature data from the Feast Serving API.Feast provides the following:Discoverability and reuse of features: A centralized feature store allows organizations to build up a foundation of features that can be reused across projects. Teams are then able to utilize features developed by other teams, and as more features are added to the store it becomes easier and cheaper to build models.Access to features for training: Feast allows users to easily access historical feature data. This allows users to produce datasets of features for use in training models. ML practitioners can then focus more on modelling and less on feature engineering.Access to features in serving: Feature data is also available to models in production through a feature serving API. The serving API has been designed to provide low latency access to the latest feature values.Consistency between training and serving: Feast provides consistency by managing and unifying the ingestion of data from batch and streaming sources, using Apache Beam, into both the feature warehouse and feature serving stores. Users can query features in the warehouse and the serving API using the same set of feature identifiers.Standardization of features: Teams are able to capture documentation, metadata and metrics about features. This allows teams to communicate clearly about features, test features data, and determine if a feature is useful for a particular model.KubeflowThere is a growing ecosystem of tools that attempt to productionize machine learning. A key open source ML platform in this space is Kubeflow, which has focused on improving packaging, training, serving, orchestration, and evaluation of models. Companies that have built successful internal ML platforms have identified that standardizing feature definitions, storage, and access, was critical to that success.For this reason, Feast aims to be both deployable on Kubeflow and to integrate seamlessly with other Kubeflow components.  This includes a Python SDK for use with Kubeflow's Jupyter notebooks, as well as Kubeflow Pipelines.There is a Kubeflow GitHub issue here that allows for discussion of future Feast integration.How you can contributeFeast provides a consistent way to access features that can be passed into serving models, and to access features in batch for training. We hope that Feast can act as a bridge between your data engineering and machine learning teams, and we would love to hear your feedback via our GitHub project. For additional ways to contribute:Find the Feast project on GitHub repository hereJoin the Kubeflow community and find us on SlackLet the Feast begin! Read more »
  • A simple blueprint for building AI-powered customer service on GCP
    As a Google Cloud customer engineer based in Amsterdam, I work with a lot of banks and insurance companies in the Netherlands. All of them have this common requirement: to help customer service agents (many of whom are poorly trained interns due to the expense of hiring) handle large numbers of customer calls, especially at the end of the year when many consumers want to change or update their insurance plan.Most of these requests are predictable and easily resolved with the exchange of a small amount of information, which is a perfect use case for an AI-powered customer service agent. Virtual agents can provide non-queued service around the clock, and can easily be programmed to handle simple requests as well as do a hand-off to well-trained live agents for more complicated issues. Furthermore, a well-designed solution can help ensure that consumer requests, regardless of the channel in which they are received (phone, chat, IoT), are routed to the correct resource. As a result, in addition to the obvious customer satisfaction benefits, research says that virtual agents could help businesses in banking and healthcare alone trim costs collectively by $11 billion a year.In this post, I’ll provide an overview of a simple solution blueprint I designed that may inspire you to meet these objectives using GCP. Similar solutions that integrate with existing call center systems can be obtained through Cloud Contact Center AI partners, as well.Requirements and solutionAll businesses have the goal of making customer service effortless. With an AI-powered approach, a system can be designed that can accommodate consumers however they choose to reach out, whether by telephone, web chat, social media, mobile apps, or smart speaker.The particular approach described here covers three channels: web chat, the Google Assistant (on a Google Home), and telephone (through a telephony gateway). It also meets a few other requirements:Ability to optimize over time. If you know what questions consumers ask and how their sentiment changes during a conversation, the virtual agent (and thus customer satisfaction) can be improved over time.Protection of consumer privacy. Per GDPR, sensitive personal information can’t be revealed or stored.An easy deployment and management experience. It goes without saying that any company adopting cloud wants to avoid maintaining VMs, networks, and operating systems, as well as monolithic architecture. Thus the solution should take advantage of the ability to easily/automatically build, deploy, and publish updates.With Google Cloud, meeting these requirements is as easy as stitching a few components together. Let’s have a closer look.Technology stackThe diagram below provides a high-level overview; I’ll explain each piece in turn.DialogflowDialogflow Enterprise Edition, an emerging standard for building AI-powered conversational experiences across multiple channels, is the “brains” of this solution. My customers love it because it doesn’t require special natural language understanding skills; a team of content experts and UX designers are all you need to build a robust virtual agent for a simple use case. It also integrates well with other Google Cloud components, offers error reporting and debug information out of the box, and is available along with Google Cloud Support and SLA.As you can see in the architectural diagram, Dialogflow is integrated with the website channel through the Dialogflow SDK. Integration with the Google Assistant or the Phone Gateway simply requires flipping a switch during configuration.ChannelsWebsite: The website front-end and back-end are split into two separate Kubernetes containers. The website front-end is build with Angular, and the back-end container is based on Node.js with integration. Dialogflow has a Node.js client library, so text messages from the Angular app are passed to the Node.js server app via WebSocket in order to send it to the Dialogflow SDK.The Google Assistant: Actions on Google is a framework for creating software applications (a.k.a., “actions”) for the Google Assistant. Actions on Google is nicely integrated in Dialogflow: Just log in with your Google account and you can easily deploy your agent to the Google Assistant, enabling interactions on Android apps, via the Google Assistant app on iOS, or on Google Home.Phone: As mentioned in the introduction, if your plan is to integrate your virtual agent with an existing contact center call system, Google Cloud partners like Genesys, Twilio, and Avaya can help integrate Cloud Contact Center AI with their platforms. (For an overview, see this video from Genesys.) For startups and SMBs, the Dialogflow Phone Gateway feature (currently in beta) integrates a virtual agent with a Google Voice telephone number with just a few clicks, creating an “instant” customer service voice bot.AnalyticsWhether you’re building a full customer service AI system, a simple action for the Google Assistant, or anything in between, it’s important to know which questions/journeys are common, which responses are most satisfying, and if and when the virtual agent isn’t programmed to respond beyond a default “fallback” message. The diagram below shows the solution analytics architecture for addressing this need.Cloud Pub/Sub: Cloud Pub/Sub, a fully-managed, real-time publish/subscribe messaging service that sends and receives messages between independent applications, is the “glue” that holds the analytic components together. All transcripts (from voice calls or chats) are sent to Cloud Pub/Sub as a first step before analysis.Cloud Functions: Google Cloud Functions is a lightweight compute platform for creating single-purpose, standalone functions that respond to events without the need to manage a server or runtime environment. In this case, the event will be triggered by Cloud Pub/Sub: Every time a message arrives there through the subscriber endpoint, a cloud function will run the message through two Google Cloud services (see below) before storing it in Google BigQuery.Cloud Natural Language: This service reveals the structure of a text message; you can use it to extract information about people, places, or in this case, to detect the sentiment of a customer conversation. The API returns a sentiment level between 1 and -1.Cloud Data Loss Prevention: This service discovers and redacts any sensitive information such as addresses and telephone numbers remaining in transcripts before storage.BigQuery: BigQuery is Google Cloud’s serverless enterprise data warehouse, supporting super-fast SQL queries enabled by the massive processing power of Google's infrastructure. Using BigQuery you could combine your website data together with your chat logs. Imagine you can see that your customer browsed through one of your product webpages, and then interacted with a chatbot. Now you can answer them proactively with targeted deals.. Naturally, this analysis can be done through a third-party business intelligence tool like Tableau, with Google Data Studio, or through a homegrown web dashboard like the one shown below.Another use case would be to write a query that returns all chat messages that have a negative sentiment score:SELECT * from `chatanalytics.chatmessages` where SCORE < 0 ORDER BY SCORE ASCThis query also returns the session ID,  so you could then write a query to get the full chat transcript and explore why this customer became unhappy:SELECT * from `chatanalytics.chatmessages` where SESSION = '6OVkcIQg7QFvdc5EAAAs' ORDER BY POSTEDDeployment: Finally, you can use Cloud Build to easily build and deploy these containers to Google Kubernetes Engine with a single command in minutes. A simple YAML file in the project will specify how this all works. As a result, each component/container can be independently modified as needed.Chatbase (optional): It’s not included in this blueprint, but for a more robust approach, Chatbase Virtual Agent Analytics (which powers Dialogflow Analytics and is free to use) is also an option. In addition to tracking health KPIs, it provides deep insights into user messages and journeys through various reports combined with transcripts. Chatbase also lets you report across different channels/endpoints.ConclusionRecently, it took me just a couple of evenings to build a full demo of this solution.And going forward, I don’t need to worry about installing operating systems, patches, or software, nor about scaling for demand: whether I have 10 or hundreds of thousands of users talking to the bot, it will just work. If you’re exploring improving customer satisfaction with an AI-powered customer service virtual agent, hopefully this blueprint is a thought-provoking place to start! Read more »
WordPress RSS Feed Retriever by Theme Mason

Leave a Reply