Startup Idea: Augmented Reality Internet

Tino Sambora
9 min read · Jun 23, 2018


Photo by NeONBRAND on Unsplash

I came up with this idea when my friend, who's also my regular hackathon teammate, got invited to compete at the F8 hackathon in San Jose in May 2018. I was credited in his LinkedIn post about the event.

We have competed in about a dozen hackathons together, but when registration for this hackathon opened I had just moved to Singapore to start a new job, so I chose not to compete and to help only by coming up with an idea.

So after hours of brainstorming, pages of idea lists (ranging from so-so to okay), and UI sketches, this one idea found its way into my head and stood out from the rest. We went with it and, spoiler alert, I put him and the other team members into development hell with it.

The Idea

As described in the title, the idea is Augmented Reality Internet. Let's explore it in a practical context: an application. In this hypothetical app, a user can open a camera view and place a digital asset into the surroundings captured in the view. The asset can be a 3D model, text, a GIF, a picture, and potentially a lot more. Whatever the user places is then posted to a centralised augmented reality for other users to interact with.

With AR as the medium, a user can immerse other users in a story more deeply than other media like photos or videos can. For example, a user can “secretly decorate” a room with sweet words and paintings for his/her significant other. That user can then tell his/her SO to go into the room, open the app, and find that the room has been digitally decorated.

That’s just one of many applications that can be built on top of this tech. We can also use it to reinvent advertising, to build games, or to create a new platform for expressing criticism. The way I see it, this can be a new frontier of social interaction.

Functional Requirements

For this writing-only, Medium version, there are at minimum two functions that make up the AR internet:

  1. The user has to be able to open a camera view and place an “AR object”.
  2. The user has to be able to see AR objects posted by other users.

To better explain the functional requirements, let's translate them into a more visual form: a user journey. In a UX sense, a user will walk through two activities across the various applications of this tech: viewing and posting. Let's translate those into two screens: an augmented reality screen and a create post screen.

The augmented reality screen is where we can see users' “AR posts”. We can experience this screen in many ways using different devices, depending on the application: the old-school way by holding our phone in our hand, through a cardboard goggle, or in the future perhaps through smart glasses.

The other one is the create post screen. To give you a picture, this screen functions like the AR screen, but enhanced with the capability to snap a digital asset into the augmented reality. If the object is a wall and a user wants to tag some text onto it, then while being placed, the text should snap nicely to the wall as if it were attached to it. Of course we could merge these two screens into one for a more seamless experience, but I separate them here so I can explain the functions more clearly.

Let’s now translate the functional requirements into something closer to implementation.

Engineering

I will try to explain the technical details of the idea as best I can. That being said, my knowledge of the technologies we need to put together to make this idea come to life is limited. I might use some terms that aren't common in certain contexts because I'm not proficient enough, and one of the technologies I describe might have features that would do the job better which I simply haven't heard of, among other things. I want to apologise beforehand if that happens, and I hope you can still follow how I want to put things together despite my limited knowledge.

Now let’s jump into the exciting technical details!

I will separate the technical design of this idea into two interconnected sides: the client and the server. If you're not from a software engineering background, client and server are a separation commonly used in computer networking. The client side, or front end, is the part of an internet application that users view and interact with; it can be a website that runs in a browser or a mobile app. The server side, or back end, is the part responsible for serving the website that runs in the browser, and for storing and sending the data that needs to be displayed or processed on the client side. Since the idea we're talking about is an internet application, modelling the technical design this way makes sense.

We will now explore, for each side, what needs to be done and how to do it to satisfy the requirements.

Client Side

The client must be able to detect areas within the camera's view that we can snap AR objects to. An area can be a surface: vertical, horizontal, or sloped. It might also be a shape, like a face, or something more complex, like the contour of an object. Which types need to be detected depends on the application, but in any case the client needs to implement some kind of area detection algorithm.

Different types of areas require different detection algorithms: surface detection differs quite significantly from face detection, and 3D object detection is much more complex than either of those.

Implementing an area detection algorithm is a time-consuming task, especially if you just want to explore the possible applications you can build on top of this idea. It takes reading papers and turning the complex algorithms in those papers into working code. But here's the good news: there are already some mature AR tools that provide APIs for area detection and other utilities for building an AR client app. Here are some of them:

  1. ARKit and ARCore. These are platforms for building AR applications on mobile devices. They give developers listeners for detectable surfaces so that they can write handlers for when a desirable surface appears (see the sketch after this list).
  2. Vuforia. This is a platform for making a database of detectable surfaces for the client side. First we give Vuforia an image of the surface we want to register in our app. Vuforia then processes the image into a unique, programmable file format that we can integrate into our application.
  3. WebXR. This provides the same functionality as ARKit and ARCore, but for browsers.
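To make the first option more concrete, here is a minimal Kotlin sketch of how an ARCore-based client might listen for detected surfaces and, on a tap, anchor an asset to one of them. It only uses ARCore calls I know of (getUpdatedTrackables, hitTest, createAnchor); the function names, the tap handling, and everything around rendering are my own assumptions, not a complete implementation.

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Frame
import com.google.ar.core.Plane
import com.google.ar.core.TrackingState

// Called once per rendered frame with the current ARCore Frame.
// Surfaces ARCore has detected or updated show up as Plane trackables.
fun onNewFrame(frame: Frame) {
    for (plane in frame.getUpdatedTrackables(Plane::class.java)) {
        if (plane.trackingState == TrackingState.TRACKING) {
            // A usable surface: the app could highlight it so the user
            // knows an AR object can be placed there.
        }
    }
}

// Hypothetical tap handler: hit-test the frame at screen coordinates (x, y)
// and anchor the asset to the first detected plane that was hit, so the
// object stays "snapped" to that surface as the camera moves.
fun placeObjectAt(frame: Frame, x: Float, y: Float): Anchor? {
    for (hit in frame.hitTest(x, y)) {
        val trackable = hit.trackable
        if (trackable is Plane && trackable.isPoseInPolygon(hit.hitPose)) {
            return hit.createAnchor()
        }
    }
    return null
}
```

ARKit exposes the same ideas through its own delegate callbacks and hit-testing APIs, and WebXR does so through hit test sources in the browser.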

Server Side

This is where the problem becomes more interesting. What data should we store on the server so that when a user opens the app, she can see the objects other users have put at a specific point?

One way that I can think of is to store the coordinate of the post, an image of the covered surface, and the orientation of the posted object.

The coordinate is the information about where the AR object is placed. It can be a GPS location or a simple address, depending on the level of precision needed. The coordinate is important for narrowing the search for markers, which will be explained shortly.

The covered surface is the area covered by the AR object a user posted. If a user places a ball on top of a carpet, the covered surface is the circular area with the diameter of the ball and the pattern of the carpet. This information has to be saved in some format, and an image seems like a reasonable one.

Orientation is self-explanatory. A user might place an object with a certain orientation on top of a surface, like placing a character facing a specific direction. That is why we have to store the orientation data: so we can remember which way objects are oriented in the augmented reality.
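To make those three pieces of data concrete, here is a hypothetical Kotlin sketch of what an AR post record on the server might look like. All field names are my own, not from any particular framework.

```kotlin
// Hypothetical shape of an "AR post" record on the server: the coordinate,
// the covered-surface image (the marker), the orientation, and a reference
// to the asset that was placed.
data class ArPost(
    val id: String,
    val latitude: Double,        // coordinate of the post
    val longitude: Double,
    val markerImageUrl: String,  // image of the covered surface, used as the marker
    val orientation: FloatArray, // rotation relative to the marker, e.g. a quaternion (x, y, z, w)
    val assetUrl: String         // the 3D model, text, GIF, or picture being placed
)
```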

Let’s put those three things together to try answering the question above:

We use this image of the covered surface as the anchor point for the object, or what’s commonly known as a marker in AR engineering. So when a user places an object with a certain orientation onto a surface area, that covered area will be copied from the camera view and uploaded to the server along with the orientation data relative to the marker.

When another user opens the client app, it will send the content of the camera view to the server so the server can search for markers posted within some radius of that user's location. If a marker is found in the camera view, the server will send the client the position of the marker within the camera view and tell it to render the AR object associated with that marker at that position.
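As a sketch of how the coordinate narrows that search, here is a hypothetical Kotlin snippet (reusing the ArPost record from the earlier sketch) that filters stored posts to those near the viewer before any image matching is attempted. The radius value and the function names are assumptions on my part.

```kotlin
import kotlin.math.asin
import kotlin.math.cos
import kotlin.math.pow
import kotlin.math.sin
import kotlin.math.sqrt

// Great-circle (haversine) distance in metres between two GPS points.
fun distanceMetres(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double {
    val earthRadius = 6_371_000.0
    val dLat = Math.toRadians(lat2 - lat1)
    val dLon = Math.toRadians(lon2 - lon1)
    val a = sin(dLat / 2).pow(2) +
            cos(Math.toRadians(lat1)) * cos(Math.toRadians(lat2)) * sin(dLon / 2).pow(2)
    return 2 * earthRadius * asin(sqrt(a))
}

// Narrow the candidate markers to posts near the viewer's reported location,
// before running the expensive image matching against the camera frames.
fun candidatePosts(all: List<ArPost>, lat: Double, lon: Double, radiusMetres: Double): List<ArPost> =
    all.filter { distanceMetres(it.latitude, it.longitude, lat, lon) <= radiusMetres }
```

Only the posts that survive this filter would need their marker images compared against the incoming camera view.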

Known Problems

The solution I proposed comes with some tough technical challenges. Here are some of the problems that arise with my current solution:

  1. Performance. The obvious caveat of this solution is that the constant back and forth between the client and the server will be severely expensive. The client needs to send a continuous stream of camera-view images to the server. The server has to search, within some radius of a given location, for markers that appear in the view and, when one is found, send the client the data of the marker and of the AR object associated with it, which is not cheap, as we have to implement some kind of object detection and image recognition algorithm. One way to improve performance is to eager load the server data, meaning the client downloads some marker and object data upfront and uses it to render objects without connecting to the server for certain locations and for a period of time (see the sketch after this list).
  2. Multiple features detected at one point. Take a case where the surface a user places an object on is a carpet with a repeating pattern, and then another user places another object on the same carpet. With our current system the server might draw both objects at both points of the carpet, because the features detected by the server are similar. We can mitigate this issue by making the coordinate data of the posted object more precise, though finding a more precise location tracking technology can be a challenge, too.
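As a rough illustration of the eager-loading idea in point 1, here is a hypothetical Kotlin sketch of a client-side cache that refetches nearby posts only when the user has moved some distance from the last fetch point. Every name and number in it (the class, the radius, the refresh rule) is an assumption, reusing the ArPost and distanceMetres helpers sketched earlier.

```kotlin
// The client prefetches nearby posts once and renders from this cache,
// instead of asking the server on every camera frame.
class NearbyPostCache(
    private val fetchNearby: (lat: Double, lon: Double, radiusMetres: Double) -> List<ArPost>
) {
    private var cached: List<ArPost> = emptyList()
    private var lastLat = 0.0
    private var lastLon = 0.0
    private var hasFetched = false

    fun postsFor(lat: Double, lon: Double, radiusMetres: Double = 200.0): List<ArPost> {
        // Refresh only on the first call, or when the user has moved far enough
        // from the last fetch point that the cached posts may be stale.
        val movedFar = hasFetched &&
            distanceMetres(lastLat, lastLon, lat, lon) > radiusMetres / 2
        if (!hasFetched || movedFar) {
            cached = fetchNearby(lat, lon, radiusMetres)
            lastLat = lat
            lastLon = lon
            hasFetched = true
        }
        return cached
    }
}
```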

Social Impact

Setting aside the engineering challenges of building it, I think we can agree that this idea has a lot of potential. I explained that we can use it to reinvent social media, advertising, and gaming, but there is more we can achieve with this tech. We've seen sci-fi films where the characters have a visor that gives contextual information about their surroundings; this tech can be the core engine for scanning the surroundings and providing that interactive, contextual information. We might also be able to reinvent digital navigation by using AR objects as guides. Those are just some ideas off the top of my head, and I'm sure you can think of other interesting ideas achievable with this tech.

Beyond the positive potential, I also want to point out the risks of this tech should it be implemented at a large scale. The risks depend on what application we are building, but there are some generic problems we should be aware of. One is that some areas might accumulate too many objects and obscure the user's vision; in crowded places this can be harmful, so we would have to block posts in those areas. Another problem is content filtering: users should be able to choose or limit what appears in their augmented reality, because an obtrusive amount of information could harm them.

Conclusion

For entrepreneurs

If you are interested in what I (hopefully) thoroughly described and you want to create an AR internet application, now might be the right time to start. You can start looking for ideas that exploit this emerging tech so you'll have a creative edge when the technology matures in the coming years.

Here’s my subjective list of some areas in which this idea might flourish:

  1. Social Media
  2. Navigation
  3. Gaming
  4. AR Server as a Service. Because implementing the server side of an AR internet application is challenging, a service focused on solving the server-side problems could catalyse the ecosystem around this idea.

For engineers

If you are a software engineer looking for a challenge or something new to explore, this tech is worth tinkering with. The idea can be built, but there are still a lot of problems to solve. You will benefit from having a technical edge when the tech has matured and demand for applications of this idea has emerged.

P.S.: Special thanks to my friends Aji, Rubiano, and Jerome for giving me a bunch of valuable input while writing this piece.

