How will Tesla give its cars depth perception?


Since there is very little overlap between cameras, how is Tesla going to give its cars depth perception? The radar only points forward and won't detect people, soccer balls, a mattress falling off the roof of the car in front of you, etc. Ultrasonic sensors don't see far enough away to be useful for avoiding obstacles while you're at speed.

Millions of years of evolution have created a pretty consistent trend. Animals (mainly predators) that need good depth perception have more than one eye pointing in the direction of the object they need to locate and the distance between the eyes is proportional to the distance to the objects that are most important for them to locate. Teslas have more of a prey-animal-configuration for their cameras -- they're pointing in all directions, giving the cars a huge "visual field," but there are only very small areas where cameras overlap. If you watch the cartoons on your car's display, the car does a pretty good job of determining the bearing of other cars, but not the distance. They'll bounce toward and away from the cartoon of your car while all cars are stopped (excluding the car in front of you which is located with radar). Seems like it might be a hard problem to solve. Thoughts?

RedPillSucks | 14 November 2018

There's radar. Tesla and others have years of research on this. The cameras aren't the only thing used.

ReD eXiLe ms us | 14 November 2018

Parallel Axis animation, AKA 'parallax' probably serves some function in examination of otherwise 2D imagery in motion. Determine the horizon and the vanishing point relative to the camera's position within a still frame, then begin your distance calculation to each object. Then update the result when comparing frames: before, after, expected. At least, that's how I would approach the problem in software. Smarter folks might also use analysis of shadows and light sources in real time to provide more data. But I've only worked with computer graphics for the past 35 years or so and I'm mostly self-taught as a hobbyist and videogame enthusiast, so there may be advances and techniques I'm not readily aware of... Basically, it is possible, don't worry about it.
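
The horizon-based step described above can be sketched roughly as follows. This is only an illustration under simple assumptions (an ideal pinhole camera, a perfectly flat road, a level camera), not how Tesla actually does it. The function name, the focal length in pixels, the camera height, and the horizon row are all made-up values for the example.

```python
def ground_distance_m(row_px, horizon_row_px, focal_px, camera_height_m):
    """Distance along a flat road to the point imaged at row_px.

    Image rows are assumed to grow downward, so road points sit below
    the horizon row. All parameters are hypothetical example inputs.
    """
    rows_below_horizon = row_px - horizon_row_px
    if rows_below_horizon <= 0:
        raise ValueError("point is at or above the horizon, so it is not on the road")
    # Similar triangles: camera_height / distance = rows_below_horizon / focal
    return focal_px * camera_height_m / rows_below_horizon


if __name__ == "__main__":
    # Assumed setup: 1000 px focal length, camera 1.2 m above the road,
    # horizon at image row 400.
    for row in (700, 500, 420):
        print(row, round(ground_distance_m(row, 400, 1000.0, 1.2), 1))
```

Note how a point only 20 rows below the horizon is already tens of meters away, so a one-pixel error in locating the horizon matters a lot at long range.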

ColoDriver | 14 November 2018

There are also three forward-facing cameras with different focal lengths. The size and distance of objects can be calculated using two different lenses, e.g. wide angle and telephoto.

neylus | 14 November 2018

Yep, 3 forward facing cameras directly behind the rear-view mirror.

Magic 8 Ball | 14 November 2018

This is all alien born tech and we will never know how it works ; ). To give you an idea of the complexity of the issue, and how perception can be tricked easily, visit the works of M.C. Escher.

SteveWin1 | 14 November 2018

@RedPill, yeah... read my second sentence.

@ReD, great answer, thanks!

@ColoDriver & neylus, How does that work? If the cameras were separated by a significant distance (one on each A-pillar), I would imagine it would be easy to use the different cameras to determine the distance of things. Since they're so close together, the benefit of having different cameras is limited. Does zoom/wide-angle actually allow you to determine distance? A 2x zoom camera would see the moon as twice as big and would see the car in front of you as twice as big. Wide angle would also distort near and far objects by the same amount, right? **not a camera expert at all**
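
The zoom question can be checked with a small sketch. It assumes an ideal pinhole camera model (no lens distortion) and two cameras sitting at exactly the same point, which real cameras never quite do. The function names and the focal lengths in pixels are illustrative only.

```python
def image_height_px(object_height_m, distance_m, focal_px):
    """Projected height of an object under the ideal pinhole model."""
    return focal_px * object_height_m / distance_m


def size_ratio(distance_m, focal_a_px, focal_b_px, object_height_m=1.5):
    """Size seen by camera B divided by size seen by camera A.

    Both cameras are assumed to share one viewpoint (zero baseline).
    """
    size_a = image_height_px(object_height_m, distance_m, focal_a_px)
    size_b = image_height_px(object_height_m, distance_m, focal_b_px)
    return size_b / size_a


if __name__ == "__main__":
    # Hypothetical 1000 px "wide" and 2000 px "2x zoom" cameras.
    for distance in (5.0, 50.0, 500.0):
        print(distance, size_ratio(distance, 1000.0, 2000.0))
```

Under these assumptions the ratio is just the ratio of the focal lengths at every distance, so the zoom difference alone carries no distance information. Any depth signal from such a pair would have to come from the small physical offset between the cameras or from other cues.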

nachilau | 14 November 2018

I think it is a very interesting question. After upgrading to Version 9 with all cameras on, I notice that the car icons shown on the screen beside me keep popping toward and away from my car, which means that at one moment a car appears very close to mine and at the next moment it does not. I know this doesn't affect Autopilot performance, as Tesla is probably using the distance (sonic and radar) sensors to determine the actual distance between cars. But it might show that using just the cameras is not that accurate at all.

Magic 8 Ball | 14 November 2018

@nachilau You do realize those are not actual images of objects, don't you? Position determined by "sonic and radar" can have a graphical representation just like you are seeing on your screen ; ).

SteveWin1 | 14 November 2018

@nachilau, exactly. That's what had me wondering how they're going to fix that. Whatever they're doing currently isn't working that well except for the car in front of you (due to radar). It doesn't even seem like they're combining the camera data with the sonar data. The sonar should invalidate any camera-based depth perception that puts a car so close that it's sitting inside your car (happens pretty frequently)...but it doesn't.
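
The kind of cross-check suggested here could look something like the sketch below. It is purely hypothetical and says nothing about what Tesla's software does. The function name, the 5 m ultrasonic range, and the assumption that the ultrasonic sensor covers the direction in question are all invented for illustration.

```python
from typing import Optional

# Assumed usable ultrasonic range; not a Tesla specification.
ULTRASONIC_MAX_RANGE_M = 5.0


def fused_distance_m(camera_m: float, ultrasonic_m: Optional[float]) -> float:
    """Combine a camera distance guess with an ultrasonic reading.

    ultrasonic_m is None when the sensor reports no echo in range.
    """
    if ultrasonic_m is not None:
        # A direct range measurement overrides the camera guess.
        return ultrasonic_m
    # No echo means nothing is inside ultrasonic range in that direction,
    # so a camera estimate closer than that range cannot be right.
    return max(camera_m, ULTRASONIC_MAX_RANGE_M)


if __name__ == "__main__":
    print(fused_distance_m(0.2, None))  # camera says "inside my car", sonar says no
    print(fused_distance_m(3.0, 1.1))   # sonar has an echo, so use it
```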

Atom12 | 14 November 2018

@SteveWin1: Are you suggesting that a person with only one functioning eye can't get a driver's license?

SteveWin1 | 14 November 2018

@Atom12, no, I'm not. I am 100% positive that they have a worse visual field and worse depth perception than binocular drivers, however.

ColoDriver | 14 November 2018

The outer two of the forward facing cameras are about the same distance apart as human eyes. That should give plenty of parallax for depth perception.
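
To put rough numbers on the parallax idea, here is a minimal stereo sketch. It assumes two identical, rectified pinhole cameras (the real forward cameras have different focal lengths, so their images would need rescaling first), a 6.5 cm baseline as a stand-in for "about the same distance apart as human eyes", and a made-up 1000 px focal length.

```python
def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Depth from the horizontal pixel shift between two rectified views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px=0.25):
    """First-order depth uncertainty caused by a small disparity error.

    Follows from differentiating depth = focal * baseline / disparity.
    """
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


if __name__ == "__main__":
    focal_px, baseline_m = 1000.0, 0.065  # assumed example values
    print(stereo_depth_m(6.5, focal_px, baseline_m))
    for depth in (10.0, 25.0, 50.0):
        print(depth, round(depth_error_m(depth, focal_px, baseline_m), 2))
```

The uncertainty grows with the square of the distance, which is why a short baseline helps most at close range and much less at highway distances.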

As for the focal length question (please chime in any experts in trig!): imagine two photographs made with lenses that have significantly different focal lengths like a 24mm wide angle and a 200mm telephoto. Identify an object that appears the same size on both photos, call the real height X and the height in the image Y (Y24 and Y200). There is only one distance at which any object of height X will render the same size Y (Y24=Y200), so the distance Z can be calculated. If for a given X, Y24 is larger than Y200 then the object is closer than Z, if Y24 is smaller then the object is further than Z. Hopefully that makes sense.

I'm not saying that's how Tesla does it, but it should work.

Haggy | 14 November 2018

"Animals (mainly predators) that need good depth perception have more than one eye pointing in the direction of the object they need to locate and the distance between the eyes is proportional to the distance to the objects that are most important for them to locate."

That's a big part of it. If I were to design a bird with an eye on each side of its head, and I wanted to give it depth perception, I'd make it so that each time it took a step, it would bob its head back and forth, so essentially for each step it would have two views for each eye. It would walk like a pigeon. Since pigeons happen to be birds, and must move that way for a reason, my assumption is that that's part of it.

I've noticed that this sort of thing works with humans too. If I look across the room at objects at slightly different depths, I can perceive the depth with both eyes. If I close one, the depth perception goes away. But if I keep one eye closed, and move my head side to side the right way, I can perceive depth.

Humans have a good idea of depth when they are moving, but it depends on the movement. Changes in size can help as long as the person is familiar with the object. However, when looking at a road, it would be possible to know the distance in a lane by observing the lane alone, since lane lines would converge in a specific way relative to their width close to the car. With a camera at a fixed height, it would be possible to do a lot. The visible part of the lane between the car and the one in front of it could tell a car a lot.
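
The lane-width cue can be sketched as follows, assuming an ideal pinhole camera looking straight down a flat, straight lane. The 3.7 m lane width is an assumed typical value, and the function name and focal length are made up for the example.

```python
def distance_from_lane_width_m(lane_width_px, focal_px, lane_width_m=3.7):
    """Distance to the image row where the lane appears lane_width_px wide.

    A lane of known real width shrinks in the image in proportion to
    1 / distance, so the measured pixel width gives the distance.
    """
    if lane_width_px <= 0:
        raise ValueError("lane width in pixels must be positive")
    return focal_px * lane_width_m / lane_width_px


if __name__ == "__main__":
    # Measure the lane width at the row where the lead car's tires
    # touch the road; that row's distance is the distance to the car.
    for width_px in (370.0, 185.0, 74.0):
        print(width_px, round(distance_from_lane_width_m(width_px, 1000.0), 1))
```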

Plus, the outer two cameras address it for the front. For the rear and adjacent lanes, the combination of data from the side cameras and the rear camera can be used. For the rear alone, I think that the data can be more useful from one camera than some people think.

darshi | 19 August 2019

A deep learning model can be trained to predict depth, even from images from a monocular camera.

Tronguy | 19 August 2019

And, if you want to learn more, go and watch the YouTube video from Autonomy Day, earlier this year. The computer in the car generates a real-time, 3D model of the space that it's traveling through; it's why it has all those neural network computer algorithms running. Just like your brain.
No question that the radar in the car is helping; from the Autonomy Day info, they explicitly state that the various cameras/videos/sonic/what-all are cross-checked against each other to come up with a consensus environment. Lots of fun.

Daryl | 19 August 2019

I'm blind in one eye, yet I have an excellent driving record. I can easily see where cars and people are relative to my car based on many cues, including their position on the road, relative size, etc. The only (minor) disadvantage I have is a reduction in peripheral vision on the blind side, so I have to turn my head more. But the Model 3 has cameras all around and avoids this problem.

There's no reason machine learning could not duplicate and improve on my skills.

SteveWin1 | 19 August 2019

@Tronguy, the graphic they showed where we flew through the city is not something that is active in our cars. They were just showing what you can potentially do with cameras only. It's one thing to drive past a car (building, sign, etc) with different cameras and reconstruct it as a 3D object. It's another to figure out how far it is before you drive past it, or hit it. As others have pointed out, when you're moving side to side or passing something, you get more depth info than when you're driving straight toward it.

@Daryl, Good points, although binocular depth perception is much better than monocular.

majassow | 19 August 2019

@stevewin1: question: how far are your eyes separated? How's your depth perception? I'm guessing ok.

Binocular depth perception is "better"? or "easier"? With motion based parallax stereo imaging, you can have much longer image separations than just pillar to pillar. Yes: much harder to implement, but that's why we pay Tesla the big bucks.

With Tesla's existing hardware they can do stereo imaging, motion parallax imaging, ML inference of distances and radar. Plenty of data to make FSD happen. Easy? hell no.

ReD eXiLe ms us | 19 August 2019

Eight cameras, twelve ultrasonic sensors, one forward facing radar, GPS. Working in concert with onboard supercomputers and a remotely hosted neural network. That'll do, pig. That'll do.

carlk | 19 August 2019

Tesla has many cameras, but even with just a single camera (eye) you can get a good idea of the distance. Try using only one eye; you'd be surprised at how well you can do it. Your brain can take monocular cues like size and proportion to do a pretty decent distance estimation.

In the 4/22 Autonomy Investor Day presentation Karpathy showed a video to demonstrate Tesla's depth recognition capability using the neural net. He also mentioned they are using radar data to confirm the result and then using that to further train and improve the neural net. This is all done automatically in shadow mode.
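
As a toy picture of "use radar to train the vision estimate", the sketch below fits a one-parameter model in place of a neural net: distance = k / pixel_height, with k learned from radar-labeled examples by least squares. This is a deliberately simplified stand-in, not the method from the presentation, and the sample numbers are synthetic values invented for the example.

```python
def fit_scale(pixel_heights, radar_distances_m):
    """Least-squares fit of k in the model distance = k / pixel_height."""
    inverse_heights = [1.0 / h for h in pixel_heights]
    numerator = sum(z * x for z, x in zip(radar_distances_m, inverse_heights))
    denominator = sum(x * x for x in inverse_heights)
    return numerator / denominator


def predict_distance_m(pixel_height, k):
    """Camera-only distance estimate once k has been learned."""
    return k / pixel_height


if __name__ == "__main__":
    # Synthetic training pairs: apparent height of a lead car in pixels,
    # and the distance radar reported for it at the same moment.
    heights_px = [150.0, 75.0, 50.0, 30.0]
    radar_m = [10.0, 20.0, 30.0, 50.0]
    k = fit_scale(heights_px, radar_m)
    # Now estimate distance for a car that radar cannot see.
    print(round(predict_distance_m(100.0, k), 1))
```

The same idea scales up: radar supplies the labels, and the vision model learns to reproduce them from pixels alone.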

carlk | 19 August 2019

BTW that's the reason why Elon says Lidar is not needed. Camera has much better spatial resolution than Lidar. It can also get distance accuracy with a neural net and a HUGE amount of data.

SteveWin1 | 20 August 2019

@majassow, binocular is always better. Two eyes can move sideways just like one can. The thing with motion-based-parallax is that it doesn't really work when you're moving straight toward something. The things coming straight at us are the most important. I don't care as much about the pedestrian on the sidewalk that I'm already passing as I do about the child standing in the road on a box in front of me. The child is small and on a box. The size and the location may trick a monocular system that doesn't know what size a child should be into thinking it is farther away than it really is.
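
The point about driving straight at something can be illustrated with a small sketch, assuming an ideal pinhole camera moving forward along its own optical axis. The function name and all numbers are made up for the example.

```python
def forward_motion_shift_px(lateral_offset_m, depth_m, travel_m, focal_px):
    """Horizontal image shift of a static point after the camera moves forward.

    lateral_offset_m is how far the point sits to the side of the
    camera's line of travel; depth_m is its distance ahead before moving.
    """
    if travel_m >= depth_m:
        raise ValueError("camera would reach or pass the point")
    before = focal_px * lateral_offset_m / depth_m
    after = focal_px * lateral_offset_m / (depth_m - travel_m)
    return after - before


if __name__ == "__main__":
    # Camera advances 1 m toward points 20 m ahead (assumed 1000 px focal length).
    for offset in (0.0, 0.5, 3.0):
        print(offset, round(forward_motion_shift_px(offset, 20.0, 1.0, 1000.0), 2))
```

A point dead ahead does not shift at all, while a point off to the side shifts noticeably, which matches the concern above. An object straight ahead does still grow in apparent size as you approach, but turning that into a distance requires knowing its real size.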

@carlk, the example from that presentation was using the forward radar (the only one that exists) to train on images from the forward cameras. Since we already have forward radar, having our cars become great at guessing vehicle distance in front of us is only important for redundancy. The front of a car and the back of a car look different. The rear camera is wide angle, so all cars will look slightly different to that camera. I'm not sure how well the radar-backed training will translate to the other cameras or to objects radar can't see.

Since I first started this thread it does seem like Tesla is doing much better at determining the distance from cars around me. They still spin around like tops, but they no longer look like they're crashing into me. Keep it up Tesla!

leo33 | 20 August 2019

There are three cameras in front of the rear view mirror, with two of them about as far apart as human eyes, so the HW is there for the NN to be able to determine distance from both binocular as well as motion based parallax. (Dash display quirks are not a metric for Tesla's progress on autopilot and self driving capabilities. That suggestion is kind of fuddy.)

finman100 | 20 August 2019

Babe! Nice work Red on the "herding" pig movie.

rsingh05 | 20 August 2019

People, can we please actually read the responses? As Leo said above, the car has THREE front facing cameras.

Can we close the thread already?

Atom12 | 20 August 2019

@SteveWin1: “Good points, although binocular depth perception is much better than monocular.”

Are you suggesting that those drivers with only one eye have higher insurance payouts (accidents) per mile driven than those with two eyes?

Daryl | 20 August 2019

"@Daryl, Good points, although binocular depth perception is much better than monocular."

Certainly there are big advantages to having two eyes vs one, and depth perception is a major example. But note two caveats:

1) Human binocular vision only gives depth perception cues out to about 6 meters (about 20 feet). After that the convergence of the eyes and difference in image seen is too small. Most of the interactions we have on the streets are greater than 20 feet, so farther out than that we both see the same world.

2) When I am out hiking with my brother he will sometimes close one eye to simulate what I see. He immediately starts stumbling and mis-stepping. But I lost my eye as a child and have learned to compensate and can pick up on lots of depth cues that he misses, and so I walk smoothly over the same areas with no trouble. My point is that a two-eyed person cannot simply close one eye and assume that this is how a one-eyed person (or one-camera car) would see the world.
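
The first caveat, that the binocular cue fades with distance, can be put into rough numbers with the sketch below. It only computes the angle between the two eyes' lines of sight when both fixate the same point; the 6.5 cm eye spacing is an assumed typical value, and the sketch does not try to model where human perception actually stops using the cue.

```python
import math


def vergence_angle_deg(distance_m, eye_spacing_m=0.065):
    """Angle between the two lines of sight when both eyes fixate one point."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan((eye_spacing_m / 2.0) / distance_m))


if __name__ == "__main__":
    for distance in (0.5, 2.0, 6.0, 30.0):
        print(distance, round(vergence_angle_deg(distance), 3))
```

The angle falls off roughly in proportion to 1 / distance, so the difference between the two views gets small quickly at street distances.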

Of course two eyes have undeniable advantages. It's why I can never be a tennis player -- the ball is too small and coming too fast for me to accurately swat at it. But driving down the road is a very different and easy to adapt to challenge.

Daryl | 20 August 2019

And I have a pilot's license, though I haven't flown in years. Again, flying a plane there is very little within 20 feet that you need to be concerned with.

RoadDevil | 20 August 2019

I believe multiple eyes were created more for the purpose of redundancy and area of coverage than for estimating distance. Brains estimate distance by comparing the relative positions of objects within the field of view, which is why one eye works as well as two eyes in distance estimation.

stephenclabaugh | 20 August 2019

You need to watch Elon's 2019 Autonomy Day presentation. This is discussed in depth.

carlk | 20 August 2019

@SteveWin1 Accurate distance info is only important for the driving direction, which is forward. You certainly need to see cars coming from the sides or back, but you don't need to know their speed or distance that accurately. Primates have two eyes looking forward, as opposed to on the sides of the head as birds and other mammals do, because they need fast and accurate distance estimates when jumping from tree to tree. The same goes for driving a car fast. Radar has little or no spatial resolution. It's not very useful other than to tell you the distance. It looks like Tesla is mostly using it to train the vision NN, which makes sense. Elon said one day they could even get rid of radar.