Dev Diary of Graeme Devine

February 1 2021

Well GPUs are back!

 

I tried various virtual machines today since pricing varies drastically. It’s not such a big deal when you’re live with a project, but for testing it’s kind of a big deal to have a cloud server you might accidentally leave running and not actually be making $$$ off. The thing that irks me the most is that Amazon charges almost $150/month just for the Windows license on the virtual machine. So I looked at Microsoft; they charge $209/month for the same license (although they do charge less per virtual CPU, which makes the total about the same). Amazon has the better GPUs (graphics processors), since Microsoft’s are basically all geared toward machine learning, but the kicker is that Microsoft virtual machines don’t have sound cards. The big lesson here? You want to work out how to use Linux on a server (which I will, just not right now).

 

If we’re all thinking wow, all of this is a bunch of $$$ for a server, then think about 8th Wall, who make just an SDK that does about the same thing (well, it does pose) and charge per page view. Mt Resilience, an AR app for an Australian TV show (https://www.mtresilience.com/), costs about $2,000 a month just in fees to 8th Wall, not even including hosting costs for the content. We’re also paying for on-demand virtual compute (i.e. I can turn it off when I remember to), and once we actually have a 24/7 service we can reduce costs by around 75% by paying for a whole year of server time (well, actually, just committing to that year).

 

I also learnt that Amazon will charge you for an hour of virtual compute time even if you just test the machine for a minute. They round up rather generously in their favor. So every time I created, tested, deleted, and remade a stack, that counted as an hour. Yeah, don’t do that.

 

So. I’m a cloud expert all of a sudden.

 

But, I got more bugs fixed. A few remain: streaming right now is 640x480, which is terrible since even the camera feed going upstream is 1280x720; something in the AWS WebRTC doohickey is downsizing it from HD right away. And then it runs at 60Hz until it tries to decode video, when it falls over and slows down a lot due to a threading issue (where one thread is waiting for another thread and no one is doing anything for a while). This is a bug, not a wall, because software decoding still ran at over 100fps. Hopefully I’ll have those dealt with tomorrow; my biggest remaining issue is to work out how to make the servers deal with many connections at once (as in, I give you a URL to test and you all try it at the same time).
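
On the phone side the knobs I actually know about are the capture constraints and the RTCRtpSender encoding parameters, so that’s where I’m looking first (the AWS end is its own mystery). A minimal sketch in TypeScript using the standard browser WebRTC API; the function name and the targets are just mine, not the project’s real code:

```typescript
// Sketch: force 720p capture on the uplink and make sure nothing in the sender's
// encoding parameters is quietly scaling the frame down. Assumes an existing
// RTCPeerConnection; names and numbers here are illustrative.
async function forceHdUplink(pc: RTCPeerConnection): Promise<void> {
  // Ask for 1280x720 explicitly instead of whatever the browser defaults to.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: { ideal: 1280 }, height: { ideal: 720 }, frameRate: { ideal: 60 } },
    audio: false,
  });
  const track = stream.getVideoTracks()[0];
  const sender = pc.addTrack(track, stream);

  // Check the encodings for a sneaky scaleResolutionDownBy.
  const params = sender.getParameters();
  for (const enc of params.encodings ?? []) {
    if (enc.scaleResolutionDownBy && enc.scaleResolutionDownBy > 1) {
      enc.scaleResolutionDownBy = 1; // send at the captured resolution
    }
  }
  await sender.setParameters(params);

  console.log("capture settings:", track.getSettings()); // should report 1280x720
}
```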

 

Andy, I watched the movie Freaks, which was pretty good, had some nice plot twists.

 

It’s rained a whole bunch here over the last week, so there’s a few frogs outside (listen to the audio on the video).

 

Cricket video…. 

January 31 2021

Firstly, I’m really behind on email and probably Slack. Apologies. Little bit of pressure to get this to a demo.

 

Well I made it to a remote web server demo in January.

 

So AWS. It’s actually a chore to deploy to AWS. You need to make a build, zip up that build, and upload that zip file to Amazon S3, the storage facility for AWS (quick aside: S3 means Simple Storage Service and AWS means Amazon Web Services). In my case the build is 9GB, so it takes a bit to even zip that sucker up, and an upload is a walk of the dog. If I was on my Texas ISP an upload would be an overnight sleepover.
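
The upload step at least is scriptable; here’s a minimal sketch using the multipart upload helper from the AWS SDK for JavaScript (v3), where the bucket name, key, region, and file path are all made up for illustration:

```typescript
// Sketch: multipart upload of the build zip to S3 (AWS SDK for JavaScript v3).
// Bucket, key, region, and file path are placeholders, not the real project names.
import { createReadStream } from "node:fs";
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

async function uploadBuild(): Promise<void> {
  const s3 = new S3Client({ region: "us-west-2" }); // placeholder region

  // The Upload helper handles multipart chunking, which matters for a ~9GB zip.
  const upload = new Upload({
    client: s3,
    params: {
      Bucket: "my-demo-builds",          // placeholder bucket
      Key: "builds/server-build.zip",    // placeholder key
      Body: createReadStream("./server-build.zip"),
    },
    queueSize: 4,                 // parallel part uploads
    partSize: 64 * 1024 * 1024,   // 64MB parts
  });

  upload.on("httpUploadProgress", (p) => {
    console.log(`uploaded ${p.loaded ?? 0} of ${p.total ?? "?"} bytes`);
  });

  await upload.done();
}

uploadBuild().catch(console.error);
```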

 

In all, it’s about an hour per deployment.

 

Once there, you need to tell AWS about three things: a template JSON, a bootstrap PS1 file (a PowerShell script), and that zip file. Each of these has a unique URL you get from Amazon S3.

 

Plugging those in, you hit Create Stack (a deployment on AWS, your virtual computer plus everything around it, is called a Stack) and wait. It takes a bit.
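
You can also kick the Stack off from code instead of the console; a sketch with the same SDK’s CloudFormation client, where the stack name, template URL, and parameter keys are placeholders for whatever the real template actually wants:

```typescript
// Sketch: create a CloudFormation stack from the template and build URLs.
// Stack name, URLs, and parameter keys are illustrative placeholders.
import {
  CloudFormationClient,
  CreateStackCommand,
  waitUntilStackCreateComplete,
} from "@aws-sdk/client-cloudformation";

async function createDemoStack(): Promise<void> {
  const cfn = new CloudFormationClient({ region: "us-west-2" }); // placeholder region

  await cfn.send(
    new CreateStackCommand({
      StackName: "unreal-streaming-demo",
      TemplateURL: "https://my-demo-builds.s3.amazonaws.com/template.json",
      Parameters: [
        { ParameterKey: "BuildZipUrl", ParameterValue: "https://my-demo-builds.s3.amazonaws.com/builds/server-build.zip" },
        { ParameterKey: "BootstrapScriptUrl", ParameterValue: "https://my-demo-builds.s3.amazonaws.com/bootstrap.ps1" },
      ],
      Capabilities: ["CAPABILITY_IAM"],
    })
  );

  // Poll until the stack finishes (or fails) instead of refreshing the console.
  await waitUntilStackCreateComplete(
    { client: cfn, maxWaitTime: 3600 },
    { StackName: "unreal-streaming-demo" }
  );
  console.log("stack is up");
}

createDemoStack().catch(console.error);
```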

 

And then, it fails. Because default Amazon AWS accounts have a quota of zero vCPUs (virtual CPUs) for this kind of machine. I asked Amazon for an account upgrade to go to 32 virtual CPUs in a Stack and they said, hold on a sec, you aren’t Google, how about 16? I was happy with that since most people get 4 virtual CPUs, and I didn’t realize how cheeky I was being asking for 32. Of course, this was all with Steve in Amazon AWS customer service, who was really nice and totally not an AI chatbot, because the approval process isn’t exactly instant (it can take up to 36 hours).

 

Then, you wonder, how do I restart? Turns out you don’t. You delete the failed one and remake the Stack from scratch. That’s right. That’s the ONLY option on a failed deploy. 2021 folks. Yes I checked.
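
So the “restart” is really delete, wait, and recreate; something like this, with the same hypothetical stack name as the sketch above:

```typescript
// Sketch: the only "retry" on a failed deploy is delete the stack, wait, recreate.
import {
  CloudFormationClient,
  DeleteStackCommand,
  waitUntilStackDeleteComplete,
} from "@aws-sdk/client-cloudformation";

async function deleteAndRetry(stackName: string): Promise<void> {
  const cfn = new CloudFormationClient({ region: "us-west-2" }); // placeholder region

  await cfn.send(new DeleteStackCommand({ StackName: stackName }));
  await waitUntilStackDeleteComplete(
    { client: cfn, maxWaitTime: 1800 },
    { StackName: stackName }
  );

  // ...then run the same create step from the earlier sketch, from scratch.
}

deleteAndRetry("unreal-streaming-demo").catch(console.error);
```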

 

Then you test! And it works! Yay! 16 Xeon vCPUs, an NVIDIA Titan GPU! Yay! I should take screenshots, but this is a sure thing now, right? I should work out how much this is costing? Oh, they have an estimator? Uptime on a Stack like that 24/7 (meaning someone is using it 24/7) is around $1,047 a month. Not being sure that I have the “switch off if no one is connected” code right, I delete the Stack when I’m not using it so I’m not spending $36 a day testing.

 

Let me tell you the number of times I’ve gone to bed over the last year leaving all the servers running when I meant to close them down because electricity is expensive. It’s over 300. Even now, I’d bet my local instance is still running even though I’m testing true cloud instances.

 

So today, bright and early, I fixed some bugs and redeployed. BUT, now my virtual server has no GPU. Amazon CloudFormation deployment is all out of GPUs! (Ummm, crypto folks are a real pain, they even rent stuff on AWS).


 

I’ve a feeling it’s just a “Sunday” temporary thing because WTF. But it does make me want to fix the “make sure it goes off when I’m not using it” code for sure when I do get one back so I don’t just delete the next Stack that actually has a GPU assigned to it.
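
The “goes off when I’m not using it” code is conceptually tiny, which is probably why I keep not writing it. Something along these lines, where getActiveConnectionCount(), the instance ID, the region, and the 15-minute limit are all placeholders for whatever the real server ends up tracking:

```typescript
// Sketch: stop the instance after N minutes with no connected viewers.
// The connection counter, instance ID, and region are placeholders.
import { EC2Client, StopInstancesCommand } from "@aws-sdk/client-ec2";

const IDLE_LIMIT_MS = 15 * 60 * 1000;        // 15 minutes with nobody connected
const INSTANCE_ID = "i-0123456789abcdef0";   // placeholder instance ID

// Placeholder: the real server would report its live WebRTC peer count here.
function getActiveConnectionCount(): number {
  return 0; // stub
}

let lastActivity = Date.now();

async function idleWatchdog(): Promise<void> {
  if (getActiveConnectionCount() > 0) {
    lastActivity = Date.now();
    return;
  }
  if (Date.now() - lastActivity > IDLE_LIMIT_MS) {
    const ec2 = new EC2Client({ region: "us-west-2" }); // placeholder region
    await ec2.send(new StopInstancesCommand({ InstanceIds: [INSTANCE_ID] }));
  }
}

setInterval(() => idleWatchdog().catch(console.error), 60 * 1000); // check every minute
```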

 

I’m working on two demo scenes. One from Metropolis that shows the tracking we can do on your face / body and one for the hut scene from Ship of the Dead.

 

Sunset here tonight was a sky on fire. I think I know who got that GPU; it’s those Government sky generators at it again.

 

Hope you are all fantastic. I’m going to have a glass of whisky now and watch some terrible movie.

 

Graeme.

January 10 2021

Here it is on my phone. While Samsung Gallery decides to super slowly upload to my Google Drive, here’s a video on my Android device directly using its camera. We won’t show the live video in the actual app; it’s there for “truth” right now (all of this is just “test”). I’ll try with my Microsoft Duo later over LTE to see what that does.

Safari is still giving me an SSL issue, I love web development.

 

Graeme.

January 4 2021

At first it was awesome. I had two-way communication, all trusted, and I was jubilant. I even tweeted so.

 

Then I worked out that my uplink was trying to send about 400MB/sec, and that’s megabytes a second, not kilobytes a second. You see, with HTTP you’ve got to encapsulate the message, you can’t just send a binary message, it’s TEXT. At first I was sending a JSON dictionary (as in { 1: “244”, 2: “120”, 3: “0”, 4: “255” … } except for zillions of pixels), because when you are debugging stuff you absolutely want to use JSON because it’s easy to debug and readable, right? And I need to sync the accelerometer data to the frame as well and send it as data in a dictionary format like this. Well, that’s good theory…
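
To give you an idea of how daft that was, here’s roughly the shape of it, reconstructed as a sketch rather than the actual code: every frame became a giant dictionary of stringified bytes plus the accelerometer sample, then got JSON.stringify’d on the way out.

```typescript
// Reconstructed sketch of the naive version, not the real code: every RGBA byte
// becomes a quoted string under a quoted index, plus the accelerometer sample,
// all JSON.stringify'd per frame. Readable, and enormous.
interface AccelSample { x: number; y: number; z: number; t: number; }

function naiveFramePayload(pixels: Uint8ClampedArray, accel: AccelSample): string {
  const dict: Record<string, string> = {};
  for (let i = 0; i < pixels.length; i++) {
    dict[String(i + 1)] = String(pixels[i]); // every byte, alpha included, as text
  }
  return JSON.stringify({ frame: dict, accel });
}

// Back-of-the-envelope: a 1280x720 RGBA frame is 1280*720*4 = 3,686,400 bytes raw.
// As '"3686400":"255",' style text each byte costs roughly 15 characters, so a
// single frame is tens of megabytes of JSON, and any usable frame rate lands you
// in the hundreds of megabytes per second.
```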

 

But a few things: I was being an idiot (that’s a given), it was huge, I was sending even the alpha channel, and it was sending the indexes, spaces even, and even though I naively thought to myself when I wrote the code “it’ll be fine, I’ll optimize that later because it’s so readable right now”, it’s surprising just how many home networks get clogged when you try to stuff 400MB/sec over them.

 

I fixed that. At first I took out the alpha channel and just sent the pixels as one big hex string. At most that’s 2x as bad as just pure binary frames. But I kept on thinking, I’ve got a video feed here, surely I can just send the video frame, right? That’s when I delved into how to pipe media streams into base64 and boy, yes, you can do that, and it works, and now I’m not clogging the network.
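
In a browser the base64 route mostly boils down to bouncing the video element through a canvas; a sketch of the general idea, where the JPEG quality and the usage line are just illustrative:

```typescript
// Sketch: grab a frame from a live <video> element and get it as base64 JPEG.
// The quality value and the usage line are illustrative, not the project's real values.
function frameToBase64(video: HTMLVideoElement, quality = 0.7): string {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("no 2d context");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  // Compressed JPEG as a base64 data URL; strip the prefix if the server wants raw base64.
  return canvas.toDataURL("image/jpeg", quality);
}

// Usage: const payload = frameToBase64(document.querySelector("video")!);
```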

 

The end result? I have a streaming Unreal video coming into my phone with the Main Street from Metropolis being rendered in Unreal while you can see my lovely camera upload on a cube floating in the street the same way a huge test cube in Unreal floats. It’s the smoke test. It’s like Apollo 10.

 

One funny thing. It did take me a week to notice that the red and blue channels were crossed over in the video streams. It wasn’t until I had a Starbucks holiday cup (because my coffee machine broke January 1st!!!!!) that I noticed the cup was blue on the stream and not red, and realized I had the channels swapped…
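
For the record, the fix for that class of bug is a short walk over the pixel buffer, something like this if the swap lives in your own RGBA data rather than somewhere deeper in the video pipeline:

```typescript
// Sketch: swap the red and blue channels in an RGBA buffer (BGRA <-> RGBA), in place.
function swapRedBlue(pixels: Uint8ClampedArray): void {
  for (let i = 0; i < pixels.length; i += 4) {
    const r = pixels[i];
    pixels[i] = pixels[i + 2]; // red <- blue
    pixels[i + 2] = r;         // blue <- old red
    // pixels[i + 1] (green) and pixels[i + 3] (alpha) stay put
  }
}
```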

 

So let me just say. It’s been productive. I’m really close to showing you all something awesome. This is the most complicated thing I’ve ever coded. I’m used to a video game having a lot of ground to cover before seeing a single pixel jump around but I’m not used to having to go build a moon before we get to send astronauts to it.

 

2020 was shit. But we launched QXR in October and that was the beginning of the turnaround. It took me a long time, longer than I thought, to work through my Mom dying suddenly last year, I kept on thinking what if this is it, I can’t write or code anymore? Living in the USA in this pandemic has been a descent into the trench of despairs I thought I had shaken many years ago, but you’ve been family to me throughout it all and I’m optimistic for 2021 (don’t jinx it Graeme, don’t jinx it.…).

 

Anyway. You are all awesome. Especially you dear reader who reached the bottom of this email and actually read the whole thing.

 

Graeme.

So.

 

It’s been a productive week. It was really the week between Christmas and New Year’s when the meetings dropped off, but more than that, I think I had been learning enough about how to actually do what I said I could do that it became practical.

 

We use a method called WebRTC to communicate between the device (client, phone, you) and the server (cloud, AWS, my house). We use that because basically there’s no easy way to send data back and forth between the various mobile providers (AT&T, Mint, are there any more?) and servers; HTTP and HTTPS are basically the only trusted protocols that aren’t totally blocked. Stadia, for example, uses WebRTC, and Google has their own implementation of it.
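
In browser terms the gist is an RTCPeerConnection plus a data channel, with the offer and answer riding over plain HTTPS since that’s the one protocol that gets through. A sketch, where the signaling URL is made up and the server half isn’t shown:

```typescript
// Sketch: client side of a WebRTC connection whose offer/answer is exchanged over
// HTTPS. The signaling URL is a placeholder; the server half is not shown here.
async function connectToServer(): Promise<RTCDataChannel> {
  const pc = new RTCPeerConnection();
  const channel = pc.createDataChannel("frames"); // two-way data once connected

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Wait for ICE gathering so the offer we POST already contains the candidates
  // (no trickle ICE in this simple sketch).
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") return resolve();
    pc.addEventListener("icegatheringstatechange", () => {
      if (pc.iceGatheringState === "complete") resolve();
    });
  });

  // HTTPS is the protocol nobody blocks, so signaling rides on a plain POST.
  const resp = await fetch("https://example.com/signal", {   // placeholder URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(pc.localDescription),
  });
  const answer: RTCSessionDescriptionInit = await resp.json();
  await pc.setRemoteDescription(answer);

  channel.onopen = () => console.log("data channel open");
  return channel;
}
```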

 

These connections can use intermediary servers to act as Switzerland as you establish a peer-to-peer connection (peer-to-peer here means direct connection). You can either host these on your own or reflect off several, I’m sure highly overused, public ones. This is basically how FaceTime works when you call from one device to another inside a Starbucks. I should know that, I was at Apple when we invented this stuff (but it was a lot buggier then, and boy, Starbucks Wi-Fi, let me tell you some stories about that Cupertino Starbucks and testing…).
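
Those Switzerland boxes are what WebRTC calls STUN and TURN servers, and wiring them in is just client configuration. A sketch using one of the well-known public STUN servers; the TURN entry is a placeholder you’d swap for your own:

```typescript
// Sketch: pointing the peer connection at STUN/TURN servers. The Google STUN
// server is one of the public ones people lean on; the TURN entry is a placeholder.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" },   // public STUN (address discovery only)
    {
      urls: "turn:turn.example.com:3478",       // placeholder self-hosted TURN relay
      username: "demo",
      credential: "not-a-real-secret",
    },
  ],
});
```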

 

For a long time I was stuck, it was easy to get the connection downstream trusted but the uplink wouldn’t connect. Then my dog Bella had an idea to use a socket connector on the Unreal server and that fixed almost everything. ALMOST.