Well, good afternoon. Welcome to the Sermon Audio Technology Lecture Series. I am so glad that you are here. And it is a wonderful opportunity of mine to introduce Dr. Steven Schaub, who has been my colleague almost as long as I've been here. You came back in 96. I was showing some students this afternoon a paper that was published in October of 92. And I told him that that's my first semester here. So there was a few years I had to survive computer science without him. But anyways, he has been helpful. And sometimes when I have crazy ideas, he kind of straightens them all out. And so he's been very influential in our computer science department. And obviously, he now works for Sermon Audio. And it's my privilege to introduce him and have him speak.
But before we do that, let's pray. Our Heavenly Father, we thank you for your mercies, which are new each day. And even as we talk about technology and how we can use it to reach people around this globe, we thank you for the fact that you sent Christ here to reach us and to redeem us. And even as Steven presents, we would ask for your blessing on him and ideas on how we can serve you through language and technology. So we ask for your blessing now in this time and ask it in Christ's name. Amen.
Thank you, Jim. Well, welcome everybody. Thank you for coming out on a Tuesday at a busy time of the semester. I've been looking forward to giving this talk for a while, partly because I've spent a lot of time working with apps that are aimed at the international market. I remember when I first came on board at Sermon Audio, we were talking about the need to be able to reach more audiences. We primarily are a US company and have users that listen to the sermons that are published by US publishers and also publishers in the United Kingdom. But we also have missionaries around the world that are interested in the content that is published on our platform. And they have users that don't speak English. And we want to be able to make that available to them.
So I'm going to talk about some of the issues that are involved in creating apps and websites that can be used by people that are in more than one people group. So let's start by just, and my target, let me just mention this, my target audience is, this is more of a technical type of talk. So, this is geared towards people who might one day be working on websites or producing applications that have to have more than one language in them. I'm going to be showing bits of code. It's not going to be real complicated code. I'm going to be talking about those kinds of things. I'll try to keep it as understandable as possible, but it is going to get a little bit technical.
So, put on your maybe dreaming cap and you're thinking about the fact that maybe one of these days you'll have an application or website that you have created for a US audience. Now, you want users in other markets to be able to use it. I'd like you to think maybe about some of the issues that people are going to experience when they try to use your application. Maybe they're not US users. Let's start with other English speaking countries like the United Kingdom, like Australia. Can you think of any surprises that they might have in using your application? Even though it's in English, and they speak English, something that they see in the application catches them off guard, or maybe it's a surprise or difficult for them to use. Anybody, any thoughts there? Yes. Obviously, the way you worded that implies any references to America specifically. If you say here in the US or use an American flag or anything like that, perhaps it should automatically adjust. Time zones are another thing. Have it automatically adjust all of those sorts of things to wherever the person is using it from.
Okay. So, you know, time zones, of course, are an issue here in the United States. That's a good point. And if you're in Australia, you're even further away. Any other thoughts? Yes. The way dates are formatted. Dates are a good one. OK. So we're going to look at dates in just a minute. But in the United Kingdom, they don't write dates the same way that we write them in the United States. All right. Any other thoughts? Yes? Not just time zones, but like the way time is written out, because I know that in the United Kingdom and in the U.S. military, say, 10 a.m. is actually written as 1000. Okay, so the way time is written, very good. Currencies, a few other things. So we're gonna look at some of those things, we're gonna talk about some of the details, and maybe you'll learn some interesting things here.
So we're going to talk about four things. We're going to talk about postal addresses briefly, because that's often a gotcha. Date and number formatting. The fact that, of course, if you're going to want to reach people in other languages, you're going to have to display strings in your application in other languages. And then I want to talk just a little bit about languages that are not read left to right, but that are read right to left. So we're going to talk about those things. And then maybe towards the end of the talk, I want to talk a little bit about a system that we created at Sermon Audio to help us with some of these challenges.
So what I have up here on the screen is the Sermon Audio app, our current Sermon Audio app that people around the world are using. On the left-hand side here is the English version of the application. And you can see that everything about this is in English. The middle illustration shows you, I believe, the French version of the application. And you can see that some of the text here is still in English. For example, this is the title of the sermon up at the top, Busyness and Spiritual Discipline, and the conference in which it was presented. But the text on the button is not in English. The text on these widgets is not in English. Over here on the right, you have a language that's even further removed. This is the Khmer language that is spoken in Cambodia. Khmer is a very interesting language. It's probably an even more difficult language than Chinese for English speakers to learn. One of the things that is difficult about Khmer is that they don't use any spaces in between words. So you have to figure out where one word ends and another one begins. If you're looking here, play audio, clearly two words. French, three words. This is multiple words, but you have no idea where one ends and the next begins unless you know Khmer. So that's an interesting challenge.
All right, let's talk just a little bit about postal addresses. So here is a familiar address in the United States. You have a person's name, you have the company where the mail is going, you have a street address with a city and a state and a zip code, and then the country there on the last line. That should look very familiar to U.S. people. This is what addresses look like in the Netherlands. Now this is still a U.S. address, but I've written it in the Netherlands order, so the company comes first, and then the person to which we've addressed it at the company. The street number comes at the end, not at the beginning. You see the zip code comes first, and then the state, and then the city, and then the country. So completely different ordering. All the same components are there, but different ordering.
Now, this example, this is one of my favorites. So this is an Irish address, and it's written in Gaelic, which is the language of Ireland, but I've translated it for you over here. So this is the person's name. There is no street. There's no street number. Instead, we've got two cross streets. So the Hill of the Thorn is one of the streets. The Cross Street is the flagstone of the storm. So this is a house that's located at the intersection of two streets. The City of the Bees is the name of the city. The County of the Plain of the Ewes is the county. And then in 2015, which is only nine years ago, Ireland introduced a postal code system. Before that, they didn't have zip codes. So all you had to identify addresses was cross streets and cities, and in fact, I read that the postal workers basically had to memorize which families lived in which locations because there was no kind of numerical system used in a third of the addresses.
So the challenge is, okay, let's say that you are creating, say, an application to come to Bob Jones as a student, and you ask them to enter their mailing address. And you've got to deal with mailing addresses like this one, right? And so you can't have a standardized set of fields for street and city and state and zip code when you've got all these different address formats that are used by these different countries. So how are you going to deal with that? Well, at Sermon Audio, when we ask for an address, we don't have different fields. We have one big box, and they just type it in, and we can take whatever they put in. But then, of course, if you're interested in, well, what was their zip code or those other pieces, then you might have to parse that and try to figure it out. It's an interesting challenge.
Let's go back to dates for just a little bit. So two countries that are closely related in a lot of cultural ways use completely different date formats. So this is April the 8th, 2024, if you are a U.S. date reader. But if you are a U.K. date reader, this is the 4th of August, 2024, because in the United Kingdom, the day comes first, and then the month, and then the year. Which is also how the Germans do it, but they put a dot in between the numbers. And in Japan, they put the year first. And I think actually it makes sense to put the year first because if you put the year first and then you sort it, it sorts very nicely, okay? The one that makes the least sense, I think, is the United States. There's a lot of things about the US that probably make the least sense. I'm not going to go down that rabbit trail. I was going to say a word about the metric system, but I'm going to refrain.
And then this is Arabic. So you want an app or a site that displays a date in a way that makes sense to the folks in the United Kingdom. You don't want to display a sermon date in the U.S. format if you've got a U.K. reader who is looking at the information there. How do you go about doing that easily? Well, here's where we're going to get into just a little bit of code. So if you're working in JavaScript, which is the language that you're going to be working in in the web, there is a really nice DateTimeFormat API that you can use. And I'm going to go ahead and switch over to show you what is involved in working with DateTimeFormat. And that is really in a bad spot there on that line. So I'm going to see if I can scroll up at all here and get that further up here.
So here's how you create a date object in JavaScript. And I've got that on another line right there. And right here is what's involved in converting that into the format that we use here in the United States with the month first, and then the day, and then the year. And so if you run this little bit of code, you get 12-19-2020. If you want to display it for a UK user, you would just change the locale indicator here to indicate that this is for English language UK users. And when you run that form, then you get the day first, and then the month, and then the year.
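The slide code is not captured in this transcript, so the following is only a minimal sketch of the kind of call being described, assuming the standard `Intl.DateTimeFormat` API and the December 19, 2020 date mentioned above. The variable names are made up for illustration.

```javascript
// Months are zero-based in the Date constructor, so 11 means December.
const sermonDate = new Date(2020, 11, 19);

// Only the locale tag changes from line to line; the ordering and
// separators come from the locale data built into the JavaScript runtime.
const usDate = new Intl.DateTimeFormat('en-US').format(sermonDate); // month, day, year
const ukDate = new Intl.DateTimeFormat('en-GB').format(sermonDate); // day, month, year
const deDate = new Intl.DateTimeFormat('de-DE').format(sermonDate); // day, month, year with dots
const jaDate = new Intl.DateTimeFormat('ja-JP').format(sermonDate); // year, month, day

console.log(usDate, ukDate, deDate, jaDate);
```

In a real app the locale tag would come from the user's settings rather than being written into the code.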
So with a nice API like this for formatting dates, it makes your job as a developer much easier. When the user starts their app, they can indicate whether they're a UK user or a German user or whatever, and then you take that information and you plug it into the API call and it spits out just the right date format. It's an easy way to fix those kinds of issues.
Numbers are also interesting. In the US, we use commas to separate groups of digits and then we use a decimal point to separate the fractional part from the whole part. In France, they use spaces to separate groups of digits and they use a comma to separate the whole number from the fractional part. The Germans do it exactly the opposite from us. They use dots to separate groups of digits and they use a comma, like the French do, to separate the whole part from the fractional part.
I keep throwing in Arabic just because I think it looks cool. Arabic uses commas for both the decimal separator as well as the groupings. And then I threw in Hindi because I wanted you to see the position of the commas here. So it looks a lot like US English in that they've got the decimal separator there and they've got the first comma after the first group of three digits. But after that first group of three digits, they switch from groups of three digits to groups of two digits to format the numbers. So this is 1,234,567 in Hindi. Isn't that wild? Absolutely fascinating.
So you wouldn't want to format a number like this if it's being displayed to a person in India. How do you deal with that? Well, again, there's a nice API in JavaScript for dealing with that. It's NumberFormat. And I wanted to show that to you as well. So here's a number that we're placing into a number variable. So this is 123,456.789. And then I've got it formatted three different ways. I've got it formatted as euro currency for Germans, and the output of that looks like this. I've got it formatted as Japanese currency for Japanese users, and you can see the yen symbol comes to the left there instead of afterwards. And because we're doing currency, the Japanese yen rounds things to the nearest whole number. It doesn't have fractional parts. So the formatting system took care of that and eliminated the decimal part. And here's our Hindi example with the commas there, the way that they do it in India. And I know all this looks slightly complicated. So if I took out the style currency business right here and here, because we weren't dealing with currency, we're just dealing, let's say, with just regular numbers, it looks a whole lot simpler. And when you run it, then you don't get the currency symbols in there. But the main point is, with just a few lines of code with the right language and locale indicator, you can get JavaScript to give you the right output without a lot of effort.
It's not difficult to handle dates and times. You just have to know about these APIs and use them in your programs. Just a word about right to left languages. So, most of the languages on the planet are left to right, but we have a few that are right to left. What comes to mind, right to left languages? Hebrew is one that many of us are familiar with. And there are a couple of others, I think. Anybody know any of the others? Pardon? Russian, right to left? I'm not sure about Russian. Arabic, I think, is right-to-left. The Semitic languages, I think, tend to be right-to-left.
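As with the dates, the code on the slide is not in the transcript, so this is a hedged reconstruction using the standard `Intl.NumberFormat` API and the same 123,456.789 value. The variable names are illustrative, and the exact spacing around currency symbols depends on the runtime's locale data.

```javascript
const amount = 123456.789;

// Currency formatting: the locale decides the separators and symbol position,
// and the currency decides how many fractional digits are kept.
const euros = new Intl.NumberFormat('de-DE', { style: 'currency', currency: 'EUR' }).format(amount);
const yen = new Intl.NumberFormat('ja-JP', { style: 'currency', currency: 'JPY' }).format(amount);

// Plain numbers: drop the style and currency options.
const german = new Intl.NumberFormat('de-DE').format(amount);
const hindi = new Intl.NumberFormat('hi-IN').format(amount);

console.log(euros);  // dots between groups, comma before the cents, euro sign after
console.log(yen);    // rounded to a whole number, since yen has no fractional part
console.log(german); // dots between groups, comma as the decimal separator
console.log(hindi);  // first group of three digits, then groups of two
```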
So on the left here, you have a left-to-right kind of layout. I want you to look at the right-hand version. This is an Arabic example. And I want you to notice that not only is the name right-aligned and is written right-to-left here, but the labels are all right-aligned. That's not the only thing that has changed, though. In the left-to-right, notice the position of the little hamburger menu is on the left-hand side. For right-to-left languages, it's switched to the other side. If you look at the layout here, we have the phone number on the left and the label next to it. It's sort of the exact mirror image. You see how these two have completely flipped.
So it's not just a matter of putting the writing right justified instead of left justified and making it flow in the right direction. Entire layout elements of the application have to be mirrored because people are used to starting at the right-hand side and scanning to the left-hand side. And writing an application that can be switched at the flip of an option is hard to make work. I mean, most developers find it challenging just to get the layout looking correct in one direction. To make it look correct in both directions is a significant challenge.
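One small piece of that switch can be sketched in code: deciding which direction a given user needs. This is an illustration only, not how the Sermon Audio app does it. The `RTL_LANGUAGES` list and the `textDirection` helper are hypothetical, and the list is deliberately incomplete.

```javascript
// Illustrative, incomplete list of language codes that are written right to left.
const RTL_LANGUAGES = new Set(['ar', 'he', 'fa', 'ur']);

// Returns 'rtl' or 'ltr' for a locale tag such as 'ar-EG' or 'en-US'.
function textDirection(localeTag) {
  const language = new Intl.Locale(localeTag).language;
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}

console.log(textDirection('ar-EG'));
console.log(textDirection('en-US'));

// In a browser, the result could be applied to the page root so that the
// layout engine mirrors the page:
// document.documentElement.dir = textDirection(userLocale);
```

Setting the direction is the easy part. Checking that every screen still looks right once it is mirrored is where the effort described above goes.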
Now, where I want to spend most of our time today is talking about supporting multiple languages. So far, I've been talking about matters of formatting. But now we're getting into the meat of the presentation, which is how do you support English and Russian and Portuguese and all that kind of thing. So let me just start by saying that when you're writing your code, you can't display a dialogue box with a line of code that looks like this. Here's a typical API call to display a dialogue box. This is the title of the dialogue. Here's the message that appears inside the dialogue. And then maybe you have two buttons labeled yes and no. And of course, that's going to look just fine for English speakers, but it's not going to look good at all for French speakers or Spanish speakers.
So if you're going to make this work well for multiple languages, then you can't put the actual text into your source code. You have to separate the text out into what are called language resource files. And that looks something like this. So here we have a file named AppResources.resx. This is a Microsoft approach. We use Microsoft technology at the moment for our mobile applications. And so you can see that the words alert and yes and no, and this action will reset your church selection, all of that has been pulled out into a separate file, which is separate from the main source code of the program itself. And then over in the source code of the program itself, we're referring to different fragments of text using this app resources mechanism.
So at runtime, what happens is these references here to various constants in the app resources class get replaced with the text that is read from the resource file. That replacement happens at runtime. So if it's an English user, then they're going to see a dialogue that says this action will reset your church selection. But if you are a French user, then you're going to see this text right here because we have a different file that has all the translations for French. And if you're a Khmer user, then you're going to see all of this with no spaces in between any of the words. And it will look perfectly natural to you in that dialogue box.
OK, so the idea is you have a separate file with the translation for each one of those languages. And your source code refers to the constants in the file. So that's the basic idea. We call it externalizing the text strings. And every application framework has its own way of doing this. So what I'm showing you here is Microsoft's approach that they use with their Xamarin technology. But there's a different approach that is used, say, with Vue, which is the technology that we use on the web. The resource files look entirely different. For Vue, they're written in JSON notation instead of XML notation. But the concept is the same. There's a little key that's associated with each fragment that you need to have tied to your source code file. And at runtime, the keys match up and the bits of text are displayed there. So it's not too complicated.
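The key-to-text idea can be sketched without any particular framework. The `resources` table, the key names, and the `t` helper below are all hypothetical; real frameworks load these tables from per-language files and have their own lookup functions.

```javascript
// One table of strings per language, keyed by the same identifiers.
// The French table is intentionally missing one entry to show the fallback.
const resources = {
  en: {
    alertTitle: 'Alert',
    resetChurch: 'This action will reset your church selection.',
    yes: 'Yes',
    no: 'No',
  },
  fr: {
    alertTitle: 'Alerte',
    yes: 'Oui',
    no: 'Non',
  },
};

// Look up a key for a language, falling back to English, then to the key itself.
function t(language, key) {
  const table = resources[language] ?? resources.en;
  return table[key] ?? resources.en[key] ?? key;
}

console.log(t('fr', 'yes'));         // text from the French table
console.log(t('fr', 'resetChurch')); // not translated yet, so the English text
```

The source code only ever mentions keys such as `yes` and `resetChurch`. Which text appears is decided at runtime by the user's language.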
Well, what if you need to insert a little value into one of those strings, like you need to insert a number into one of those strings or maybe a piece of text? So maybe you want to display a message like this. You want to display the words, there was a problem following. And then you want to substitute in whatever it was that they were trying to follow, like the name of a sermon right there. So these systems allow you to define little markers like this curly brace zero marker. And then in your code, you can substitute a specific value. In this case, it's the name of a sermon in place of that marker by using a mechanism in the API call to do that substitution.
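A marker substitution like that can be sketched in a few lines. The `formatMessage` helper here is hypothetical. Each framework ships its own equivalent, and this version only handles simple numbered markers such as `{0}` and `{1}`.

```javascript
// Replace {0}, {1}, ... in a template with the corresponding argument.
// A marker with no matching argument is left in place so the gap is visible.
function formatMessage(template, ...values) {
  return template.replace(/\{(\d+)\}/g, (marker, digits) => {
    const index = Number(digits);
    return index < values.length ? String(values[index]) : marker;
  });
}

const message = formatMessage(
  'There was a problem following {0}.',
  'Busyness and Spiritual Discipline'
);
console.log(message);
```

Because the marker lives inside the translatable string, a translator can move it to wherever the grammar of the target language needs it.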
Now, let's talk about plurals. This is one of my favorite little things when talking about languages because if you're an English speaker and you think about plurals, you think, well, we have one banana or two bananas or 10 bananas or zero bananas. You've got two forms of the word banana, one that's singular and one that's plural, bananas, okay? And so you would tend to write code like this. If you're wanting to say I have one sermon or I have three sermons, then you might have an if statement that checks a variable like numSermons. That's the number of sermons that you want to talk about. And if it's the value one, then you are going to have the value one followed by the word sermon for whatever language that you're working with. And if it's something other than one, like five or 10 or zero, then you're going to use the plural form of the word sermons right there. And of course that works just fine for English, but it doesn't work well for a number of other languages. For example, if you're working in Japanese, Vietnamese, or Korean, those languages don't distinguish at all between singular or plural. The word sermon is the same word for both sermon and sermons in those languages. If you're working in English, Greek, or Hebrew, you have two forms, the singular and the plural. And the singular is used for when we have just one of something. Okay, so that's the same for those three languages.
In French and Brazilian Portuguese, there are also two forms for the plural, but the singular form is used for both zero and one, and then the plural is used for two or more. So in terms of our banana example, you would talk about in French, I have zero banana. It would be the singular form with the word zero. So if you were proud of yourself because you externalized the word sermon and sermons and you translated sermon to singular in French and sermons to plural in French, but if you wrote your code this way, if we had zero sermons, it would read zero sermons in French, and they would lower their eyeglasses at your poor grammar.
It's even better in Arabic and Russian. Arabic has six plural forms. Russian has four plural forms. And Polish. Polish has one form for amounts from two to four. And then from numbers 22 to 24, and 32 to 34, and 42 to 44, any number that ends with a two, three, or four has one form in the plural. And everything else has another form. Try to imagine writing code to get that right. Amazing what the Tower of Babel did to us when God separated out and confused the languages.
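Nobody has to write those rules by hand. As a rough illustration, the standard `Intl.PluralRules` API reports which plural category a number falls into for a given language. The helper names below are made up, and the category names (`one`, `few`, `many`, `other`, and so on) come from the runtime's locale data.

```javascript
// Which plural category does this count use in this language?
function pluralCategory(locale, count) {
  return new Intl.PluralRules(locale).select(count);
}

// How many plural categories does this language have in total?
function pluralCategoryCount(locale) {
  return new Intl.PluralRules(locale).resolvedOptions().pluralCategories.length;
}

for (const count of [0, 1, 2, 5, 12, 23]) {
  console.log(
    count,
    'en:', pluralCategory('en', count),
    'fr:', pluralCategory('fr', count),
    'ja:', pluralCategory('ja', count),
    'pl:', pluralCategory('pl', count)
  );
}

console.log('Arabic categories:', pluralCategoryCount('ar'));
console.log('Russian categories:', pluralCategoryCount('ru'));
```

One detail shows up when running this: in the locale data, the Polish form for numbers ending in two, three, or four does not apply to 12 through 14, which is exactly the kind of exception that is easy to miss in hand-written code.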
So how do we deal with this in any kind of reasonable way? Well, if you're using a good system for localizing your applications, you're going to be able to write little specifications like this, and I don't have the time to go into the detail of what this means, but you can just tell that for English, I've got the word sermon here and the word sermons there with one and other. So everything that is not one uses the word sermons for the plural form. In Spanish, we have one sermon and everything else is sermons because Spanish has the same kind of plural system as English.
But in French, excuse me, you can see that we have the zero case and the one case and the other case. I'm not even attempting to show you what it's going to look like for Polish or Arabic, but they can be handled with similar kinds of mechanisms. So the main takeaway from this is that you don't want to write your code this way if you want your application or your website to make sense to people in other languages. You want something that's a little more sophisticated and can handle the range of you know, options that are out there.
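The specifications on the slide are not in the transcript, so here is only a sketch of the same idea: one set of forms per language, chosen by plural category instead of by an if statement. The `sermonCount` table and `pluralize` helper are hypothetical, and treating an explicit `zero` entry as an exact match for a count of zero is an assumption of this sketch, not necessarily how the system shown in the talk behaves.

```javascript
// Plural forms per language, keyed by plural category.
const sermonCount = {
  en: { one: '{0} sermon', other: '{0} sermons' },
  es: { one: '{0} sermón', other: '{0} sermones' },
  fr: { zero: '{0} sermon', one: '{0} sermon', other: '{0} sermons' },
};

function pluralize(locale, count) {
  const forms = sermonCount[locale];
  // An explicit zero form wins when the count is exactly zero.
  if (count === 0 && forms.zero !== undefined) {
    return forms.zero.replace('{0}', String(count));
  }
  // Otherwise ask the runtime which category applies, falling back to "other".
  const category = new Intl.PluralRules(locale).select(count);
  const template = forms[category] ?? forms.other;
  return template.replace('{0}', String(count));
}

console.log(pluralize('en', 0));
console.log(pluralize('fr', 0));
console.log(pluralize('es', 3));
```

The calling code never checks whether the count equals one. Adding a language means adding a row to the table, not another branch.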
Now we have to get to the elephant in the room, which is how do you actually get those translations done, right? Somebody has to create those different translation resource files with the different translations in them. And of course, you could have a person sit down and do that. Okay, you could have somebody who knows French sit down and write down all the various translations for you to put into your French resource file. These days, you can throw it at Google Translate. You could give Google Translate a file of text and ask it to translate all those words to, say, Portuguese or Chinese or whatever, OK? And maybe it'll do a good job, and maybe it'll come up with absolute gibberish. And the problem is you won't know, right? You won't know unless you are a speaker of that language, or unless you know somebody that knows the language and can review it. And that's this third option, right? You can have Google make a first pass at it, and then you can have a human review it. So that is probably the best option because you can leverage computer technology to make a first pass and then you can use humans to double check.
Now we have used both of those approaches at Sermon Audio, okay? So when I first came on board, what I would do if I wanted something translated for our app, is that I would extract from, say, the English resource file all of the various pieces of text that needed to be translated, and I'd put them in a spreadsheet, and I would email that spreadsheet off to the translator, who would then sit there and work through all that and fill in the translations, and then he would email it back to me, and then I would extract all of the translations that he had done into the resource file, and that was fairly labor-intensive.
So one of the first things that we did when I joined Sermon Audio was that Mr. Lee said, look, we've got to have a better way. We want to support more languages, and it is just not feasible for us to have this whole spreadsheet emailing kind of approach for getting translations. So we sat down and brainstormed, and we decided that we wanted to automate the translation with the help of human translation volunteers.
So, we thought, well, here's how we would like things to work. Okay, so I'm gonna talk about how we thought. This is sort of an interesting part of the talk because I'm talking about how you go from concept, right, to an actual product, and then, did it work out the way that you hoped it would? So the idea was that you have a developer who is working on the application and he's updating his English resource file with new strings that he's using in his app, right, that haven't been translated yet. So then we said, well, we want an automated system that's going to know when the developer has added a new string or set of strings to that resource file and can use machine translation to translate those into Portuguese and Spanish and Khmer and all of the languages that we support our app in. We want machine translation for that first pass. But then we want a human translator to review that first pass and make a change to it if needed. And then once the translator is done with that, then we want that corrected translation to be automatically brought into our code, into our non-English resource files, so that we're not having to email any spreadsheets around. This work of getting strings to the translators and then from the translators back is all as seamless as possible.
So this is the architecture of the system that we designed to make this happen, and it has three major components to it. I want to talk about this briefly, and then we'll be finished. So over here on the left, you have application developers who are putting their source code into a Git source code repository. Many of you are familiar with the idea of a Git version-controlled source code repository. And this would include, then, resource files that have both English strings and non-English strings. When the developer makes a change to one of those English resource files, then we want the system to identify, hey, this word is new. It wasn't in that resource file before. This is another phrase that wasn't in that resource file before. It needs to be translated. And so we want a human translator to be able to see a list of strings that need to be translated using a web application. We didn't want to have to email spreadsheets back and forth. We wanted the translator to just be able to open a web browser and go to a system, log in, and see, oh, here's all the new strings that have been translated to Spanish that I need to review, machine translated to Spanish.
So you have to have a database that's sitting in the middle that holds the work that is in progress, the work that the translators are looking at, that they're making changes to, until they approve it and say, yeah, that is the right translation for the word sermons. And then when they approve it, we want that word to make its way back to the source code repository and update the resource file so that you know, none of our sort of office people are having to be in the middle of this kind of exchange.
So here's how things actually worked out when we got through all of this. The process works like this. You've got a developer. He's got the project open. You can imagine yourself as a developer sitting in front of a screen with a resource file, and you're making changes to it. So you commit that, and that goes to the Git repository. See if you can follow this mentally. Okay, we're getting towards the end here. Y'all do it real well.
Then we have what's called a continuous integration pipeline. That's a fancy term for a system that runs whenever the developer makes changes to the code in the source code repository. And when it runs, it looks for new English strings and puts them into the database so that they can be viewed by a human translator. It creates draft translations of those strings using machine translation and puts those translations into the database. And then it also scans the database for work that has been approved by the human translators and gets those non-English translations that have been approved and brings them back into the source code. And we want the developer to be able to review that before it actually gets merged into the source code.
So that's why you have this idea of a Git branch. Those of you who've worked with Git are familiar with the idea of a Git branch. So we put the translator work into the branch where the programmer can sort of do the sanity check on it. You know, does that look like that might be reasonable? Does it have the right markers in it that are supposed to be in it and everything before it actually makes its way into the project itself?
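To make those three steps concrete, here is a toy in-memory sketch. It is not the Sermon Audio implementation. The `Map` stands in for the database, `machineTranslate` is a hypothetical stub for a real translation service, and the record fields are invented for illustration.

```javascript
// Hypothetical stand-in for a machine translation service.
function machineTranslate(text, language) {
  return `[${language} draft] ${text}`;
}

// Steps 1 and 2: for every English string not yet in the database,
// store a row per language holding an unapproved machine-translated draft.
function queueNewStrings(englishResources, database, languages) {
  for (const [key, english] of Object.entries(englishResources)) {
    for (const language of languages) {
      const id = `${language}:${key}`;
      if (!database.has(id)) {
        const translation = machineTranslate(english, language);
        database.set(id, { key, language, english, translation, approved: false });
      }
    }
  }
}

// Step 3: gather approved translations, grouped by language, ready to be
// written back to the non-English resource files on a review branch.
function collectApproved(database) {
  const merged = {};
  for (const row of database.values()) {
    if (!row.approved) continue;
    if (!merged[row.language]) merged[row.language] = {};
    merged[row.language][row.key] = row.translation;
  }
  return merged;
}

const database = new Map();
queueNewStrings({ yes: 'Yes', back: 'Back' }, database, ['fr', 'es']);

// A human translator corrects and approves one draft.
const reviewed = database.get('fr:yes');
reviewed.translation = 'Oui';
reviewed.approved = true;

// Running the pipeline again must not overwrite reviewed work.
queueNewStrings({ yes: 'Yes', back: 'Back' }, database, ['fr', 'es']);

const approved = collectApproved(database);
console.log(approved);
```

Only the approved French string comes back out. The unreviewed drafts stay in the database until a translator signs off on them.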
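Part of that sanity check can be automated. The sketch below is hypothetical and assumes the numbered curly-brace markers described earlier. It only confirms that a translation carries the same markers as the English string, and says nothing about whether the translation itself is good. The French text is a placeholder, not a reviewed translation.

```javascript
// Extract numbered markers such as {0} and {1}, sorted for comparison.
function markers(text) {
  return (text.match(/\{\d+\}/g) ?? []).sort();
}

// True when both strings contain exactly the same markers, in any order.
function sameMarkers(english, translation) {
  const expected = markers(english);
  const actual = markers(translation);
  return expected.length === actual.length &&
    expected.every((marker, i) => marker === actual[i]);
}

const englishText = 'There was a problem following {0}.';
console.log(sameMarkers(englishText, 'Problème avec {0}.')); // marker preserved
console.log(sameMarkers(englishText, 'Problème.'));          // marker dropped
```

A check like this could run in the pipeline before the branch is offered for review, so the developer only has to look at strings that pass.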
So this is what the web interface looks like that the translators use. Here we're looking at the mobile application project strings. You can see the English strings on the left here. There are 401 strings, of which only 23 need to be reviewed by a human translator. Right here the translator is editing this particular translation. Maybe it's not quite the right word, so he can make a change to it, and save it, and so forth.
So we've got a nice web interface. Works on mobile, works on desktop. You can use it in an iPad, although it has to be a fairly fast iPad in order to... When you're dealing with 400 strings in one interface, it is taxing in a web browser. We've got projects that have 900 strings. So how do we bring all this to fruition? Well, that's where the partnership between Sermon Audio and BJU comes into the picture.
So Sermon Audio has been here at Bob Jones for a couple of years now. In the spring of 2022, we sat down and had a meeting at Sermon Audio and envisioned this system. We said, this is what we want. We designed in sort of big picture detail how the flow needed to work. And we designed the database. We decided what tables needed to be there and what they would contain and so forth. And then it laid dormant for several months until January of 2023, when we had a team of BJU interns that joined us for a semester. And they basically completed in about four months of work, about 10 hours a week, the web application and the piece of the tool that does the extract and merge between the resource files and the database.
And then they left, which is what happens to interns, right? They go off and they leave and then we're left and we get to finish whatever didn't get finished. We were very pleased with what they got finished because all we had to do when they were done was to integrate it into our automated pipeline so that things didn't have to be run at the command line, but it was all happening automatically.
So for the past year, we've been actually using this in production. And we've been using it on our mobile application. We were able to add support for French and Farsi and Khmer using that system. And then on the new website that we are hoping to release later this year, which you can try out if you want to, beta.sermonaudio.com, I think we have support for French right now. We're closing in on Spanish on that, and we'll have other languages that we add over time.
So we had two BJU interns that helped us with this. Peyton did the front end design work and the extract and merge tool, and Emily also assisted with the back end API implementation. And again, this was an example of a very successful internship. This was not a traditional type of intern project. I don't know if you know anything about internships, but often interns are stuck, I shouldn't use that word, have the privilege of doing bug fixes and other types of things like this. This was a very useful project and very rewarding, I think they would say, to work on, because their work is contributing directly to the ability of additional people groups to be able to access material that they otherwise wouldn't easily have access to.
Because you see, what happens is, say, missionaries in Cambodia will publish sermons that they have preached in Khmer, but then the people in Cambodia need to be able to find it and access it. And so if they're having to use an app in English to search for Khmer sermons, it's very difficult to do. But now that they have the app interface all in Khmer, it's much easier for them to navigate and find the Khmer information that is available.
All right. So our remaining challenges. One of the big things that a translator needs to know when they're translating the word, say, back, is where does that word back show up? And are we talking about navigating backwards? Or are we talking about somebody's back? Right? And right now in the interface, there is nothing to tell that translator. You know, the word bookmarked and bookmarks is probably pretty easy. But when you get to some of those more generic terms, they're back to emailing us and saying, now, where does this word back appear, and what does it mean? So we need to be able to give additional information to the translators to help them make the translations.
Well, I promised a 45-minute talk, and we're at 45 minutes.