The following was written as a reflection piece on my experience, so far, with Augmented Reality (AR) apps. This is being submitted to satisfy some of the requirements for a course in AR, at George Mason University.
My responsibilities in the “Palimpsest” Augmented Reality (AR) Project (=> see my “in a nutshell” post on AR) were largely focused on the technology involved in the development of AR experiences. After viewing several AR apps in Google Play, I began considering which might work best. Reviewing the ratings for each of these, I realized how much frustration users experienced when attempting to use the apps. This discouraged me from trying some of the apps available, and narrowed the list of most viable apps to Augment 3D, Layar, Zappar, and Aurasma.
Narrowing the app options further
Though I had already reduced the list to four possible apps, I continued to weed out apps that demand higher technical skills, the point being to learn the basics of AR creation without complicating that with other external tasks. As such, Augment 3D was ruled out rather quickly, as it requires more advanced graphics-creation effort (Photoshop use, for example). Additionally, I soon realized that both Layar and Zappar require either a QR code or the creation of a “layar code” or “zapcode”, respectively. While QR codes were an option for use in the project, there appeared to be other “green” (no requirement for placing a QR-type sticker on a physical feature) AR-prompting apps available (=> see Aurasma, below). Zappar was also of limited use because of its 30-day trial before requiring payment.
Aurasma – its basic components, and the steps of the Aurasma AR experience
Of all of the apps sampled, Aurasma offers what seems to be the most user-friendly “plug and play” opportunity. Specifically, there are only two elements necessary to create a basic “aura” (“auras” being the AR experience received by a user when using the app): a “trigger image” and “overlay features”. A “trigger image” is simply a photograph of an object (the physical object, itself, being the thing upon which you wish to project the AR experience). The “overlay feature” is the actual AR experience, which is projected onto the physical trigger when the app matches it to the “trigger image” uploaded by the designer.
The way in which the AR is experienced is simple.
- Install the app on a compatible smart phone or tablet
- Subscribe to the “aura” page of the designer who created the auras you wish to see
- Be at the site where the “physical trigger” is located (remember, the photo/“trigger image” of this physical trigger was uploaded into the app by the designer)
- Enable the app on your device
- Point the device’s optical recognition feature (camera) at the physical trigger, and give it time to recognize the physical image via the designer-installed trigger image
- Wait for the app to recognize (usually under a minute) the physical trigger as a match to the trigger image/photograph used by the designer
- Begin the AR experience (overlays) created by the designer.
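To make the two components and the viewing steps a little more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not Aurasma's software or API: the `Aura` class, the `view` function, and the file names are all hypothetical, and a plain string comparison stands in for the app's actual image recognition.

```python
from dataclasses import dataclass, field


@dataclass
class Aura:
    """Hypothetical model of an aura: one trigger image plus its overlays."""
    trigger_image: str
    overlays: list[str] = field(default_factory=list)


def view(camera_sees: str, subscribed_auras: list[Aura]) -> list[str]:
    """Return the overlays of the first subscribed aura whose trigger matches.

    Assumption: string equality stands in for the app's optical recognition
    of the physical trigger against the designer's trigger image.
    """
    for aura in subscribed_auras:
        if aura.trigger_image == camera_sees:
            return aura.overlays
    return []  # nothing recognized, so no AR experience begins


# Example: a user subscribed to one designer's auras points the camera
# at the physical trigger.
statue_aura = Aura("mason_statue_documents.jpg", ["declaration_video.mp4"])
overlays_shown = view("mason_statue_documents.jpg", [statue_aura])
```

The sketch also shows why subscribing comes before pointing the camera: if the aura is not in the subscribed list, there is nothing for the camera view to match against.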
While the design of a basic overlay is simply a matter of a trigger image prompting the appearance of another image on top of the trigger image, adding more dynamic AR experiences (such as audio and video overlays) proved more challenging. In fact, I wanted to create AR experiences featuring both, but as separate experiences. Though I created a sound bite to test as an upload, I quickly learned, regretfully, that while videos can be uploaded, sound bites cannot be uploaded to Aurasma as a layer. With no video immediately available for my needs, I was determined to create my own. Yet, this being a test effort, it didn’t need to be a complex video, perhaps something less than a minute in length.
With the thought that I would use a picture of the documents in the hands of George Mason (the statue at GMU) as my trigger image, I wanted to add a short video clip giving a brief definition of the Virginia Declaration of Rights (at this time, I also began to think more about the user experience, and the need for brevity in the interaction between the AR and the user => see Interaction and Theory, below). While I found a video, on YouTube, featuring the original document, I needed a text-to-speech application on the Web to create my personalized audio. Though I successfully created a sound clip, the program sampler I used only permitted a limited number of characters for conversion to audio.
Playing the YouTube video with the volume turned down, and, at the same time, playing the audio sample I had created on another page, I filmed the brief video with my smart phone. This was subsequently uploaded to the “aura” I created for the Virginia Declaration of Rights trigger image.
Developing multi-tiered overlays in Aurasma
After becoming familiar with the basics of aura creation, I looked at further developing auras with multiple layers/tiers of overlays. I was successful in creating a two-tier experience whereby a trigger image would prompt a single image overlay, and either single-tapping or double-tapping the screen of the smart phone would take the user, next, to a video (either loaded directly to an overlay in the aura, or via a URL taking the user to a YouTube video). However, I was bothered by what seemed to be the lack of intuitiveness behind the tapping of the screen. How, for example, does a user know that he/she is to tap the screen at all (let alone twice) to experience the next layer of the AR?
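The two-tier arrangement can be pictured as a small lookup from gestures to the next overlay. The sketch below is only my own illustration under stated assumptions: the `Overlay` class, the `next_overlay` function, and the gesture names are hypothetical and are not how Aurasma stores or names these actions.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Hypothetical overlay with optional tap actions leading to another overlay."""
    name: str
    # Maps a gesture ("single_tap" or "double_tap") to the next overlay's name.
    on_gesture: dict[str, str] = field(default_factory=dict)


def next_overlay(overlays: dict[str, Overlay], current: str, gesture: str) -> str:
    """Return the overlay shown after a gesture; stay put if it is unmapped."""
    return overlays[current].on_gesture.get(gesture, current)


# Tier one is an image overlay; a single tap moves the user to a video.
tiers = {
    "image": Overlay("image", {"single_tap": "video"}),
    "video": Overlay("video"),
}
after_tap = next_overlay(tiers, "image", "single_tap")
```

In this model an unmapped gesture simply leaves the user where they are, with no feedback, which is the same discoverability problem raised above: nothing on the screen tells the user which gesture, if any, leads somewhere.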
After developing the two-tier overlays, I looked at development of multi-tiered (three or more) overlays, but was unsuccessful. My efforts to create them usually resulted in either a total failure of the AR experience (nothing worked correctly when the trigger image was read by the smart phone) or a single layer that worked while the others did not.
Interaction, Theory, and Beyond
It was during the video upload that I began to think more about limitations of the user experience. More specifically, I began to consider what would be effective and what would not. I have yet to tie the theory together with it, but feel certain that the AR presentation must be much like that which is presented on the Web, brevity (specifically, “chunking” came to mind) being a key element in capturing the user for the short amount of time he/she is available in a mobile setting (=> see AR and “on the fly immersion”). In a project such as the one we’ve created on our team, after all, the user is actually moving to more than one location.
Furthermore, the creation of an interactive experience via AR made me consider how AR apps might be developed for some sort of mobile social media experience. In such a situation, I see the ability to “follow” another person’s auras, and to see other sites through their perspectives. As the apps now exist, however, I don’t see this as a seamless experience quite yet. As one of my colleagues (also a team member) suggested, it seems the companies that have released these AR apps are providing us with “crumbs” of what an experience could be like, reserving the best for themselves at a later time (paid subscription services).
Additionally, while near the end of this project, I had a chance to consider Blippar and its value as an AR app. While, in its present form, it provides an open interaction experience, I can see more possibilities for it as a multi-tiered (don’t think of multi-tier in the way I used it earlier in the Aurasma app) interaction opportunity. I see this as another topic worthy of a post/paper unto itself.
Lastly, using the George Mason statue on campus as the focus of my project auras, I’ve also thought about how AR interpretation at historic sites could open an unusual opportunity to counter or supplement interpretation (especially static interpretation). Specifically, anybody and everybody who has the ability to use the AR app effectively (at least one which proves capable, reliable, and offers that seamless experience that I mentioned above) has an opportunity to project their voice/interpretation/perspective on those sites. Through this, I see that the administrative entities which, traditionally, controlled static interpretive platforms stand to be challenged. As with the mention of Blippar, above, this subject also warrants a post/paper of its own.
I look forward to the next two projects in which I can experiment further with the apps and plug in relevant theory.
*To see how my auras work, subscribe to my page (cenantua) at Aurasma.