HCT World Championship – Between Two Hearths – Life of a Developer

Hello, ladies and gentlemen. Rdu here, professional Hearthstone player
for Team G2 Esports. With me, I have Seyil
from the Hearthstone development team. Tell us more about yourself. Hey, morning, guys. Hi, my name is Seyil Yoon. I’m the lead server engineer
on the Hearthstone team. And my team is basically responsible
for making sure that the servers are up and that everyone can play
at any time of day. So we make sure that any time
you want to log in to the Hearthstone client
on your phone and play, the game is there,
available for you to play. So basically, Hearthstone
has multiple development teams. People that do different things
to make the game come together. -Right?
-Yeah, so… On the team, most of the people
can be divided up into four disciplines. And so, the four disciplines are
Production, Art, Design and Engineering. And so, I’m part of
the engineering part of the team. And the engineering team, overall, is responsible
for implementing a lot of the ideas that the designers have, obviously. And then, Production is responsible for all the scheduling and making sure
that the releases are in good shape and launch on time. And then, obviously,
Art is pretty self-explanatory. So those are
the four main disciplines on the team. So, what is the most important thing that the server engineers
have to do on a daily basis? So, from the perspective of day-to-day, obviously we need to make sure that our service is up
and running all the time. And if there are any problems,
we’ll respond to them and try to figure out what the problem is
as quickly as possible. And then, of course, we have the cadence
of major releases that we have. So, we basically release, like,
nine big patches every year. We have our three main content patches,
so our three expansions. And then we have three point releases
that come out a little bit after that. And usually there’s a little bit
of content associated with those as well. And then we have the three major patches
that launch before the next expansion, which is when we release
a little bit more content and there’s pre-sales and so on. So, we kind of have a daily and a weekly
and a monthly type of cadence in terms of the things that
we need to be responsible for. That sounds like
a lot of pressure on the team. What happens when the servers go down?
What are the priorities? So, generally speaking, the most important thing for us,
as a game, is to make sure that people can play. And so, sometimes we will prioritize
that above all other things. So, to give you an example, for the Un’Goro and
Knights of the Frozen Throne launch, I don’t know if people recall, but immediately after the launch
of each of these expansions, there was a short amount of time during which card acknowledgments
were turned off. And what I mean by that is when you open a pack,
you get some new cards, and then when you go to Collection Manager and you see those cards
in your Collection Manager, there’s a little pop-up that says “New.” Basically says,
“These are new cards in your collection, “and you’ve never seen them before.” And we want to highlight those
because we want people to know what new cards have been added
as a result of opening these packs. But, if some people will recall, for a few days after the launch
of these two expansions, we made it so that even if
you moused over the card, and the little “New” window went away, the next time you log back in,
it would show up as new again. Yeah, I encountered this at some point. Yeah. So the reason we did that was because checking off that “New” flag
for cards in your collection, it requires a little bit of work
on the server side. And at the time, as it always happens
with a new expansion launch, we had a lot of people playing,
a lot of people opening packs. And so, we wanted to make sure
that people were able to continue to open packs and play games, and we were willing to sacrifice
some other parts of the service like card acknowledgments. So you value people playing the game
all the time, and then, while they’re playing the game, you’re trying to slowly
but surely fix the problems. Yes, exactly. Right. And so, there’s lots of different
strategies that we can use for this. In the Hearthstone service,
we kind of have prioritization of what we consider high priority tasks, and we have things
we consider low priority tasks. So, high priority tasks
are always going to be things like the ability to play a game. So, one of the things that came up during Kobolds and Catacombs launch,
for example, is because of the Dungeon Runs, every time you beat a boss
in a Dungeon Run, we save that information so that the next time you play
another Dungeon Run game, it remembers and takes you
to the next boss. -And that’s high priority, right?
-Yeah. That’s super high priority. We want people
to be able to always play the game. But, as a result of that, other lower priority tasks
tended to fall behind a little bit. So one of the unfortunate things
with the Kobolds and Catacombs launch was that there was a problem
with Daily Quests. So, a lot of players
experienced this problem where they would not get Daily Quests
actually for a very long amount of time. And the reason for that was because we considered the granting
of Daily Quests to be a lower priority than the work that’s needed to be done in order for you
to be able to play the games. But because so many people
were playing Dungeon Runs, these tasks for granting Daily Quests
were falling behind more and more as time went on. And it turned into a really pretty poor
experience unfortunately. And so, a few days after launch, we started tweaking
some things on the servers to make sure that Daily Quests were
being granted in a more timely fashion. It was still a little bit delayed, but we got the granting
of Daily Quests from 30 minutes down to five seconds or so,
which is not perfect but it’s kind of good enough for us. Like I said, it was still
more important to us that everyone be able to play the game. As long as people could play the game,
that was our ultimate goal. Yeah. Talking about the priorities, there was some talk in the community
some time ago that if Kripparrian
would press his disenchant button, the servers would go crazy. Can you tell us what happens when somebody wants
to disenchant their cards, server-wise? Yeah, so, mass disenchant
is another one of these tasks that we consider lower priority
than gameplay. But, when I say “low priority,” I don’t mean that
we’ll disenchant your cards, and then the dust will disappear
into the air and you won’t get them. That’s not what I mean at all. What it means is that the work
that actually occurs on the servers to remove the cards from your collection
and add the dust, that ends up getting done later
than other higher priority tasks. So at the time when Kripp was trying
to reach his subscriber goal, I forget exactly what the number was, and he was saying
that once he hit this goal, he was going to press the button… I mean, we were keeping an eye on things. We also knew that… So, first of all, I recall Brode made
a post on Reddit about… If we had done this
a couple of years ago or something, there would have been major problems. I don’t know that there
would have been major problems, but one of the unfortunate things was… So first of all,
when Kripp pressed the button, the experience was a little confusing because he pressed the button, and then
the client just sat there doing nothing. And it sat there and it sat there,
and then it actually disconnected Kripp. But then, he logged back in
and he had his dust. So why is that? Yeah, so basically
what happened there was that at the time when the client
pressed the button, it sent a message to the server saying,
“I want to do a mass disenchant. “Please do the work for me.” And so the server was there chugging away, doing the work required
to do the mass disenchant, but because that operation took so long, it was not able to send a message
back to the client saying “Hey, I’m done.” The client just was like, “Hey, the server
doesn’t seem to be talking to me. “Maybe there’s a problem.” And in some cases like this, the client will just disconnect. But the server was still doing the work. And so, by the time
Kripp logged back in to his client, the work had been completed, and all the updated information
was sent back to his client. Which is why he actually saw the correct
number of dust on his collection. So that’s basically
what happened in that case. So it still worked out pretty smoothly. I mean, we can always do better. Ultimately, obviously, in a case like
mass disenchant, the most important thing is we don’t do the wrong thing, right? We don’t want to take away the cards
and give you the wrong amount of dust. Or we don’t want
to give you too much dust. Or we don’t want to… There’s lots of things that can go wrong
in that sort of instance. And so ultimately, the goal here is
we have to do the right thing. Unfortunately, the client experience
of that is not as nice as we would like. And so, we’ve actually
made some improvements to that as well, so that there’s
a little bit more feedback now when you press the mass disenchant button. If it’s a big mass disenchant, there’ll be animations going around, and you’ll have an indication
that something is happening, but it can still take
a little bit of time. Tell me more about releases, because that’s what I find
an interesting subject as a player. I know back in the day,
when I was playing, the release was kind of laggy
at the beginning, but in Kobolds, it was just super smooth. And it was also simultaneous
for Europe and America. Tell us something about that. What happened, what changed server-wise? Yeah, so, I don’t know
if a lot of people remember this, but there was a time
a couple of years ago, I think it was during
The Grand Tournament, or somewhere around that time frame, where we used to launch Europe
much earlier than we did later on. And so we would launch U.S., and then
shortly after, we would launch Europe, and then we would launch China and Korea. So there’s a lot of factors
that go into deciding when we want to launch
a particular region. And one of the things we do on purpose,
which I think some people find frustrating but this is a good tenet
of service architecture, is we want to launch
when we are not at peak, right? And there’s a couple of reasons for that. Number one,
whenever we launch a new expansion, lots of new people come to the game,
and they log in and they try to play, and so our concurrency levels tend to be
much higher during expansion launch. So, if something goes wrong,
it’s going to impact more people, right? And so, we don’t want that.
I mean, that’s a bad experience. And the second part of it is we also want to make sure that we launch
in a way that is sustainable for our team. So to give you an example
of something that’s not sustainable, during the launch of
Knights of the Frozen Throne, we… Again, if people will recall,
we had login queues for a couple of days, I think, during the
Knights of the Frozen Throne launch. Again, because a lot of people were
logging in, a lot of people were playing. And it was
a little bit laggy at that time. And so, we used this thing
called login queues to control the rate at which
people logged into the game. Because the more people that are playing
the game, the more laggy it is. But as a result,
we had people from my team basically online for 48 hours straight after the
Knights of the Frozen Throne launch, hand-tweaking stuff, and making
sure the service remained stable. So, obviously we don’t want to do that.
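The login queue described here can be pictured as an admission gate that lets only a fixed number of waiting players in per interval. The following is a minimal sketch under that assumption, not the Hearthstone service's actual implementation; the class name `LoginQueue`, the `per_tick` parameter, and the player names are all hypothetical.

```python
from collections import deque


class LoginQueue:
    """Hypothetical admission gate: admit at most `per_tick` players per tick."""

    def __init__(self, per_tick):
        self.per_tick = per_tick
        self.waiting = deque()  # players waiting, in arrival order
        self.online = []        # players already admitted

    def request_login(self, player):
        # Every login attempt joins the back of the line.
        self.waiting.append(player)

    def tick(self):
        # Admit up to `per_tick` players from the front of the line.
        admitted = []
        for _ in range(min(self.per_tick, len(self.waiting))):
            admitted.append(self.waiting.popleft())
        self.online.extend(admitted)
        return admitted


gate = LoginQueue(per_tick=2)
for player in ["a", "b", "c", "d", "e"]:
    gate.request_login(player)

first = gate.tick()   # the first two arrivals get in
second = gate.tick()  # the next two get in one tick later
print(first, second, list(gate.waiting))
```

Lowering `per_tick` slows how fast concurrency can grow, at the cost of longer waits. That is roughly the kind of knob a team might end up tuning by hand during a launch.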
That’s just not sustainable, right? People burn out,
and it’s just not good work-life balance. Yeah. So, we spent a lot of time trying to improve the way our service
would respond to big spikes in concurrency. And, so, we made a lot of
improvements to EU. And also, a very legitimate complaint
that people had in EU was that EU would launch so much later
than all the other regions, right? And, as a result, a lot of players from EU would log in to a U.S. region, just to see what’s going on and
look at the new cards and stuff like that. But that’s not an ideal experience either. You want to play on your own account,
right? You don’t want to just… create this new account on EU just for the
purposes of looking at stuff. And, so, for us, we really prioritized trying to get EU out in a more
reasonable time frame. Now, we actually did end up launching
EU pretty close to peak. And that was a decision that we were
hemming and hawing pretty late into the development cycle
of Kobolds and Catacombs. But we have made a number of improvements
to our service. And at that point we felt like,
“This is worth the risk.” And we think people
will be happy with it, you know? Yeah. As a player, it felt super smooth. And I wanted to ask, how was it
from your perspective? Were you guys a bit afraid
that it might go wrong? We were very afraid. Generally speaking, we’re afraid
every expansion because… I mean, obviously, we have
a great QA staff, and we have a great automation team
that spends a lot of time doing the best they can to make sure
that the game is going to be stable when it launches. But, you just can’t replicate what it is like when you launch
in production and all these people log in
to your game at the same time. It is impossible for us to replicate
that sort of environment. And so, every time we launch something,
we’re a little worried. But we have a lot of contingency plans,
we have a lot of people on our team who are very invested in making sure
that things go smoothly, and we spend a lot of hours just
making sure things are okay. And so, I’m… I feel like the EU launch
did go pretty well. And I think a lot of people
were pretty happy with it. So, I guess that was a success. Yeah. I assume that you guys also
have somebody always there in case the server goes down. ‘Cause the server can go down at 4:00 a.m.
and it’ll be up in like 10 minutes. So, I cannot get my head around how
you guys are so good at problem solving, and always have somebody on site,
it’s incredible. So, we have a lot of
awesome people on the team. And while some of the technology that allows the service
to come back up quickly is automated, a lot of it is still manual. And what I mean by that is, oftentimes the best time to collect
information about what’s going wrong is at the time when something bad happens. And the best way to
collect that information is for someone to be available
and make decisions based on what they see,
as far as what information to pull out. And so, we’re on call. I’ve gotten some late night phone calls
at times. Not nearly as much as we used to
a few years ago, but it still happens from time to time. Well, I think we covered quite some things and I want to thank you
and your team for always watching over the servers and making
sure that we can play the game at any hour during the day or night. And thank you for joining me to talk. -It was very nice.
-Thank you very much. -And thank you, everyone.
-Thank you.
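
The split between high and low priority tasks described in the interview, where gameplay work such as saving Dungeon Run progress runs ahead of Daily Quest grants and mass disenchants, can be sketched with a simple priority queue. This is an illustration under assumed names (`TaskQueue`, `HIGH`, `LOW`, and the task labels are hypothetical), not a description of how the Hearthstone servers are actually built.

```python
import heapq
import itertools

# Hypothetical priority levels; the lower number runs first.
HIGH = 0  # e.g. saving Dungeon Run progress so the next game can start
LOW = 1   # e.g. granting a Daily Quest or processing a mass disenchant


class TaskQueue:
    def __init__(self):
        self._heap = []
        # The counter breaks ties so tasks of equal priority run in arrival order.
        self._counter = itertools.count()

    def submit(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def run(self, budget):
        """Run up to `budget` tasks, highest priority first, and return their names."""
        done = []
        while self._heap and len(done) < budget:
            _, _, name = heapq.heappop(self._heap)
            done.append(name)
        return done

    def pending(self):
        # Names of tasks still waiting, in the order they would run.
        return [name for _, _, name in sorted(self._heap)]


queue = TaskQueue()
queue.submit(LOW, "grant_daily_quest")
queue.submit(HIGH, "save_dungeon_run_1")
queue.submit(LOW, "mass_disenchant")
queue.submit(HIGH, "save_dungeon_run_2")

completed = queue.run(budget=2)  # capacity for only two tasks this round
print(completed)
print(queue.pending())
```

With capacity for only two tasks, both Dungeon Run saves finish and both low priority tasks stay queued. If high priority work keeps arriving faster than it drains, the low priority tasks keep waiting, which matches the Daily Quest delays described for the Kobolds and Catacombs launch.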
