...

Once you have registered, here are the links to the three keynotes I recommend watching:
 
Microsoft Build opening:
https://lnkd.in/eJkWk8ep
 
The era of the AI Copilot:
https://lnkd.in/e8Q3qTKm
 
Next generation AI for developers with the Microsoft Cloud:
https://lnkd.in/eUNmD9-w
 
Finally, here is a link to the complete Book of News of Microsoft Build 2023:
https://lnkd.in/eWfubc6h

SORA - OpenAI Video Generator

First-look review of Sora, OpenAI's text-to-video generator

https://www.vox.com/future-perfect/24080195/sora-openai-sam-altman-ai-generated-videos-disinformation-midjourney-dalle

Sora is a generative AI model that produces videos from a simple text prompt. It is not yet available to the public.

When OpenAI's Sam Altman took prompt requests on X, users replied with short ideas: “a monkey playing chess in a park,” or “a bicycle race on ocean with different animals as athletes.” The results are uncanny, mesmerizing, weird, and beautiful, and they are prompting the usual cycle of commentary.

Deepfakes?

(AI generations are relatively cheap, but if you’re going for something specific and convincing, that’s much pricier. A tsunami of deepfakes implies a scale that spammers mostly can’t afford at the moment.)

DALL-E image generator from text

In 2022, OpenAI released DALL-E 2, a model that could produce still images from a text prompt. The high-resolution fantastical images it produced were quickly all over social media, as were the takes on what to think of it: Real art? Fake art? A threat to artists? A tool for artists? A disinformation machine? Two years later, it's worth a bit of a retrospective if we want our takes on Sora to age better.

SORA will progress like DALL-E did

Many of the things DALL-E 2 couldn't do, DALL-E 3 could. And if DALL-E 3 couldn't, a competitor often could. That perspective is crucial to keep in mind when you read prognostications about Sora: you're likely looking at early steps into a major new capability, one that could be used for good or malicious purposes, and while it's possible to oversell it, it's also very easy to sell it short.

SORA beta enrollment

https://www.youtube.com/watch?v=sBo-D3TzBcc

https://openai.com/sora

https://openai.com/research/video-generation-models-as-world-simulators

SORA - creates 1-minute videos from text

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
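
The report does not publish code, but the core idea of "spacetime patches" can be sketched. In the minimal Python/NumPy illustration below, the latent shape and the patch sizes are assumptions chosen for clarity; OpenAI has not disclosed the actual values.

import numpy as np

# Hypothetical latent video from a visual encoder:
# (frames, height, width, channels). Shapes are illustrative.
T, H, W, C = 16, 32, 32, 4
latents = np.random.randn(T, H, W, C).astype(np.float32)

# Patch size along (time, height, width). These values are assumed;
# the Sora report does not state its real patch dimensions.
pt, ph, pw = 2, 4, 4

# Cut the latent volume into non-overlapping spacetime patches and
# flatten each patch into one token vector for the transformer.
tokens = (
    latents
    .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    .transpose(0, 2, 4, 1, 3, 5, 6)   # patch indices first, patch contents last
    .reshape(-1, pt * ph * pw * C)    # (num_tokens, token_dim)
)
print(tokens.shape)  # (512, 128)

Because every video, whatever its duration, resolution, or aspect ratio, reduces to a sequence of identically sized tokens, a single transformer can be trained on mixed-size videos and images, which is what makes the variable durations and aspect ratios above workable.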

Similar to DALL·E 3, we also leverage GPT to turn short user prompts into longer detailed captions that are sent to the video model. This enables Sora to generate high quality videos that accurately follow user prompts.
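
OpenAI has not released this recaptioning pipeline. Below is a minimal sketch of the idea using the public OpenAI Python SDK; the model name and the instruction wording are assumptions.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def expand_prompt(short_prompt: str) -> str:
    """Rewrite a terse user prompt as a detailed video caption.

    The report only says GPT performs this step; the model and
    instruction below are illustrative assumptions.
    """
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any capable chat model would do
        messages=[
            {"role": "system",
             "content": "Expand the user's short video idea into one "
                        "detailed caption describing subjects, setting, "
                        "lighting, camera motion, and visual style."},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(expand_prompt("a monkey playing chess in a park"))

The expanded caption matches the dense, descriptive captions the video model was trained on, which is why generations follow short user prompts more faithfully.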

SORA input = text, video, images

All of the results above and in our landing page show text-to-video samples. But Sora can also be prompted with other inputs, such as pre-existing images or video. This capability enables Sora to perform a wide range of image and video editing tasks—creating perfectly looping video, animating static images, extending videos forwards or backwards in time, etc.

SORA can extend videos backward or forward in time

Sora is also capable of extending videos, either forward or backward in time. Below are four videos that were all extended backward in time starting from a segment of a generated video. As a result, each of the four videos starts different from the others, yet all four videos lead to the same ending.

Video-to-video editing

Diffusion models have enabled a plethora of methods for editing images and videos from text prompts. Below we apply one of these methods, SDEdit, to Sora. This technique enables Sora to transform the styles and environments of input videos zero-shot.
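
Sora's internals are not public, but SDEdit itself is a simple, model-agnostic recipe: partially noise the input, then denoise it under the new text condition. A minimal sketch follows, in which denoise_step, the strength value, and the noise mix are all illustrative stand-ins rather than Sora's actual components.

import numpy as np

def sdedit_video(denoise_step, video_latents, edit_prompt,
                 strength=0.6, num_steps=50):
    # Enter the diffusion process at an intermediate timestep: higher
    # strength means more noise and looser adherence to the input video.
    start_step = int(num_steps * strength)
    noise = np.random.randn(*video_latents.shape)
    # Variance-preserving mix of signal and noise (illustrative; a real
    # scheduler maps start_step to its exact noise level).
    x = np.sqrt(1.0 - strength) * video_latents + np.sqrt(strength) * noise
    # Run only the remaining reverse steps, conditioned on the edit prompt.
    for t in reversed(range(start_step)):
        x = denoise_step(x, t, edit_prompt)
    return x

# Dummy one-step denoiser so the sketch runs end to end; a real model
# would predict and subtract noise conditioned on the prompt.
def dummy_denoise_step(x, t, prompt):
    return x * 0.98

edited = sdedit_video(dummy_denoise_step,
                      np.random.randn(16, 32, 32, 4),
                      "make it a snowy winter scene")
print(edited.shape)  # (16, 32, 32, 4)

Because the input is only partially destroyed, the denoised result keeps the original's layout and motion while the prompt steers style and environment. That is what "zero-shot" means here: no fine-tuning, just a different starting point for sampling.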

3D simulations

We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world. These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale.

3D consistency. Sora can generate videos with dynamic camera motion. As the camera shifts and rotates, people and scene elements move consistently through three-dimensional space.

Interacting with the world. Sora can sometimes simulate actions that affect the state of the world in simple ways. For example, a painter can leave new strokes along a canvas that persist over time, or a man can eat a burger and leave bite marks.

Simulating digital worlds. Sora is also able to simulate artificial processes–one example is video games. Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning “Minecraft.”

These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them.


Potential Value Opportunities

...