
ChatGPT Beginners Cheat Sheet

https://www.joinsuperhuman.ai/c/cheat-sheet-beginners?utm_source=www.joinsuperhuman.ai&utm_medium=newsletter

...

OpenAI Pricing Guide

https://openai.com/pricing#language-models

ChatGPT Prompting Guide

https://www.joinsuperhuman.ai/c/prompts-cheat-sheet?utm_source=www.joinsuperhuman.ai&utm_medium=newsletter

...

Once you've registered, here are links to the three keynotes I recommend watching:
 
Microsoft Build opening:
https://lnkd.in/eJkWk8ep
 
The era of the AI Copilot:
https://lnkd.in/e8Q3qTKm
 
Next generation AI for developers with the Microsoft Cloud:
https://lnkd.in/eUNmD9-w
 
Finally, here is a link to the complete Book of News of Microsoft Build 2023:
https://lnkd.in/eWfubc6h

CANVA - visual report generator

Graphic design output generator

https://chat.openai.com/g/g-alKfVrz9K-canva

Effortlessly design anything: presentations, logos, social media posts and more.

SORA - OpenAI Video Generator

First look at the Sora text-to-video generator (Vox review)

https://www.vox.com/future-perfect/24080195/sora-openai-sam-altman-ai-generated-videos-disinformation-midjourney-dalle

Sora is a generative AI model that produces videos based on a simple prompt. It's not available to the public yet.

Users replied with short prompts: “a monkey playing chess in a park,” or “a bicycle race on ocean with different animals as athletes.” It’s uncanny, mesmerizing, weird, beautiful — and prompting the usual cycle of commentary.

Deepfakes?

(AI generations are relatively cheap, but if you’re going for something specific and convincing, that’s much pricier. A tsunami of deepfakes implies a scale that spammers mostly can’t afford at the moment.)

DALL-E image generator from text

DALL-E 2, a model that could produce still images from a text prompt. The high-resolution fantastical images it produced were quickly all over social media, as were the takes on what to think of it: Real art? Fake art? A threat to artists? A tool for artists? A disinformation machine? Two years later, it’s worth a bit of a retrospective if we want our takes on Sora to age better.

SORA will progress like DALL-E did

Many of the things DALL-E 2 couldn't do, DALL-E 3 could. And if DALL-E 3 couldn't, a competitor often could. That's a perspective that's crucial to keep in mind when you read prognosticating on Sora — you're likely looking at early steps into a major new capability, one that could be used for good or malicious purposes, and while it's possible to oversell it, it's also very easy to sell it short.

SORA beta enrollment

https://www.youtube.com/watch?v=sBo-D3TzBcc

https://openai.com/sora

https://openai.com/research/video-generation-models-as-world-simulators

SORA - creates 1-minute videos from text

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.

Similar to DALL·E 3, we also leverage GPT to turn short user prompts into longer detailed captions that are sent to the video model. This enables Sora to generate high quality videos that accurately follow user prompts.
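
A minimal sketch of that recaptioning step, assuming GPT is reached through the standard chat-completions API; the model name and system prompt here are illustrative assumptions, not OpenAI's published internals:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def expand_prompt(short_prompt: str) -> str:
    """Turn a terse user prompt into a long, descriptive video caption."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "Rewrite the user's prompt as a single, richly "
                        "detailed video caption: describe the subjects, "
                        "setting, lighting, camera motion, and style."},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

# The expanded caption, not the raw prompt, is what the video model would see.
caption = expand_prompt("a monkey playing chess in a park")
```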

SORA input = text, video, images

All of the results above and in our landing page show text-to-video samples. But Sora can also be prompted with other inputs, such as pre-existing images or video. This capability enables Sora to perform a wide range of image and video editing tasks—creating perfectly looping video, animating static images, extending videos forwards or backwards in time, etc.

SORA can extend videos backward or forward in time

Sora is also capable of extending videos, either forward or backward in time. Below are four videos that were all extended backward in time starting from a segment of a generated video. As a result, each of the four videos starts different from the others, yet all four videos lead to the same ending.
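
OpenAI has not said how extension is implemented. One plausible reading, sketched below under stated assumptions, is the standard diffusion inpainting recipe (as in RePaint): treat the known ending as fixed, re-noise it to the sampler's current step, and let the model fill in the earlier frames. Here `eps_model` and `alphas_cumprod` are hypothetical stand-ins for a trained denoiser and its noise schedule.

```python
import torch

def extend_backward(ending, n_new, eps_model, alphas_cumprod, prompt, T=1000):
    """ending: (frames, C, H, W) fixed final segment; prepends n_new new frames."""
    video = torch.randn(n_new + ending.shape[0], *ending.shape[1:])
    known = torch.zeros(video.shape[0], dtype=torch.bool)
    known[n_new:] = True  # the last frames are the given ending
    for t in range(T - 1, 0, -1):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        eps = eps_model(video, t, prompt)                       # predicted noise
        x0_hat = (video - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # estimated clean video
        video = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM step, eta = 0
        # Re-impose the known ending at this step's noise level (RePaint-style).
        video[known] = a_prev.sqrt() * ending + (1 - a_prev).sqrt() * torch.randn_like(ending)
    return video
```

Running the same loop with the known segment placed at the start would extend forward instead.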

Video-to-video editing

Diffusion models have enabled a plethora of methods for editing images and videos from text prompts. Below we apply one of these methods, SDEdit, to Sora. This technique enables Sora to transform the styles and environments of input videos zero-shot.
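
SDEdit itself is a published method (Meng et al.): rather than sampling from pure noise, partially noise the input and then denoise it under the new prompt, so the overall structure survives while style and environment change. A compact sketch, reusing the same assumed denoiser and DDIM update as the extension sketch above:

```python
import torch

def sdedit(video, edit_prompt, eps_model, alphas_cumprod, t0=600):
    """Noise `video` up to step t0, then denoise it under `edit_prompt`."""
    a_t0 = alphas_cumprod[t0]
    # Forward process: jump straight to noise level t0, i.e. sample q(x_t0 | x_0).
    x = a_t0.sqrt() * video + (1 - a_t0).sqrt() * torch.randn_like(video)
    # Reverse process from t0 (not from pure noise), conditioned on the edit.
    for t in range(t0, 0, -1):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        eps = eps_model(x, t, edit_prompt)
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM step, eta = 0
    return x
```

A lower t0 preserves more of the input video; a higher t0 edits more aggressively.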

3D simulations

We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world. These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale.

3D consistency. Sora can generate videos with dynamic camera motion. As the camera shifts and rotates, people and scene elements move consistently through three-dimensional space.

Interacting with the world. Sora can sometimes simulate actions that affect the state of the world in simple ways. For example, a painter can leave new strokes along a canvas that persist over time, or a man can eat a burger and leave bite marks.

Simulating digital worlds. Sora is also able to simulate artificial processes; one example is video games. Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning “Minecraft.”

These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them.

Current SORA limitations

The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

SORA Safety

We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model.

We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora. We plan to include C2PA metadata in the future if we deploy the model in an OpenAI product.

SORA policy enforcement

Our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user.
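
From the outside, that is a two-stage gate: screen the prompt before generation, then screen every output frame before display. A rough sketch; the moderation endpoint is a real OpenAI API, while `frame_classifier` stands in for the unpublished internal image classifiers:

```python
from openai import OpenAI

client = OpenAI()

def prompt_allowed(prompt: str) -> bool:
    """Stage 1: reject violating text prompts before any video is generated."""
    result = client.moderations.create(input=prompt)
    return not result.results[0].flagged

def video_allowed(frames, frame_classifier) -> bool:
    """Stage 2: every generated frame must pass review before the video is shown."""
    return all(frame_classifier(frame) for frame in frames)
```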

SORA reward and risk

Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.

How SORA works

We represent videos and images as collections of smaller units of data called patches, each of which is akin to a token in GPT. By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions and aspect ratios.
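
A minimal sketch of that idea: carve a video latent into spacetime patches and flatten each into one token, the video analogue of ViT image patches. The shapes and patch sizes are illustrative assumptions, not Sora's actual configuration.

```python
import torch

def patchify(latent, pt=2, ph=4, pw=4):
    """latent: (T, C, H, W) video latent -> (num_patches, patch_dim) token sequence."""
    T, C, H, W = latent.shape
    x = latent.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
    x = x.permute(0, 3, 5, 2, 1, 4, 6)       # group by (time, height, width) block
    return x.reshape(-1, C * pt * ph * pw)   # one flat token per spacetime patch

tokens = patchify(torch.randn(16, 8, 64, 64))  # -> torch.Size([2048, 256])
```

Because any duration, resolution, or aspect ratio reduces to a variable-length token sequence, one transformer can train on all of them.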

Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully.

Potential Value Opportunities


ChatGPT workflow - potential governance points

...