User:RheingoldRiver/Blog/Tournament Cargo

From Leaguepedia | League of Legends Esports Wiki

Leaguepedia Tournament Data Storage & Display Revamp

This isn't gonna be a super technical writeup (I don't think). For links to code & specific accounts of what I edited, see 2018 Dev Blog and 2019 Dev Blog. You can probably skip to #Initial Planning if you want.

Background

So when I joined Leaguepedia (/Esportspedia/EsportsWikis) in January/February 2014 the wiki was already established for three-ish years, and there were relatively well-defined procedures already in place for a lot of things. I didn't touch the wiki period until about August 2014, and then when I did start editing the wiki I built a lot of new stuff (match history pages, player stats, roster swap portals, VODs tabs, Match Details tabs, etc etc there were a lot of things) but I didn't really touch tournament coverage, just added some new facets to it. In fact it wasn't until about January 2018 that I really started to think about our tournament coverage and I realized it was kinda....really really awful/inefficient, and I'd eventually want to do something about it.

Basically under the old system, after a game was over you had to update:

  • Standings
  • Matches section
  • Crossbox
  • Schedule section
  • Frontpage featured leagues

And later on additional sections/pages:

  • Match Details/VODs
  • Timeline

This really sucks because it's...the same data. It should only need to be entered one time. So this post is about how I spent 6 weeks rewriting everything from scratch so it only needs to be entered one time. (Technically the front page is still separate, but I'll get to that soon™)

SMW, Cargo, & Lua

So also of relevance is that there are two databasing extensions we use, SMW (Semantic MediaWiki) and Cargo. I spent about the first half of 2018 transitioning from SMW to Cargo, and everything except our scoreboard data is now in Cargo (scoreboard data is a scary big database and I actually wrote most of the code to transition it but I didn't want to think about it at all until after our 1.31 upgrade / transition to a newer version of Cargo, and then I had to do this project in the offseason so yeah we'll see when that part happens). Cargo has a lot of advantages over SMW, and it's a lot easier to do the type of thing needed for this project in Cargo than in SMW.

Also this year, I learned Lua, which is supported by the Scribunto extension and lets you write significantly more sane code than what you can do in "regular" MediaWiki (MediaWiki (MW) is the software that the wiki runs on, the same software that Wikipedia uses). Quotes because you're already using ParserFunctions and other extensions when writing MW code, but Lua is completely separate from MW markup. So writing everything in Lua with Cargo was 100% essential to this project.

Initial Planning

It was around June 2018 that I started to think seriously about unifying our tournament data entry. I knew it was going to be an unreal enormous project, and I also knew I had 0 desire to do it during the season, it had to happen during 2019 preseason. That said, it was going to require a ton of pre planning to decide what to do, so I started thinking about it. There were a couple main things I knew I would have to figure out:

  • What database table(s) to build and what to put in them
  • What tabs to keep/remove/combine (like, why in god's name were MH links separated from VODs???????? there is no good answer to this except that I had decided to start tracking these things at different points in time)
  • What visual elements to put on the main page of a tournament - in particular, what do we do about Matches & Schedule sections? the Schedule section was an absolute disaster, with like 70% of the information in any instance of it completely irrelevant, but it was also a super ubiquitous part of our coverage. How could I possibly delete something like this???????

Match Details & VODs Tab

Also around this time I was like, ok if nothing else at least I'm getting rid of this stupid division between MD & VODs, so in like 3 days I wrote some pretty terrible Lua code and put it live, and that was the MDV tab. I should probably mention that while I took a Python class in college & had written a bunch of cute programs in TI-BASIC in high school for math team, I really didn't have coding experience prior to this. So I had to do a lot of learning things from scratch, and I still don't really consider myself "a developer" (whatever that means). The code for MDV was something that I wanted to just get done and have live before start of summer split and I accomplished that so I guess it was fine.

MDV tab also stored to a Cargo table that I used on literally one single occasion, which was to make Chloe a list of all Rift Rivals Match Histories on a single page. I also have a page that queries it called "quick spoiler free list" but I think that was just a sandbox so yeah that was kinda a waste of development time I guess, but on the other hand literally everything I wrote all year was a learning experience so there's that.

Matches Section OMG

Okay so like I said above what to do about Matches & Schedule was one of my biggest concerns. I tend to think about UI design whenever I'm stuck afk, so mostly like showering or walking places. So I spent probably a full month where this was like the only thing on my mind. Also in June I had laryngitis the entire month. That's not an exaggeration. I kept looking at how many days it had been, and it was over 4 weeks that I couldn't speak period and it was the worst. So I had like no option other than to just sit silently and think about this shit. There were like three questions:

  1. How do I get VODs & extended info to display in MD section?
  2. How do I get dates/start times to display in MD section?
  3. How do I make the display in Schedule suck less & let you see all games at a time, while still retaining the ability to show a lot of information simultaneously?

It was sort of an either-or, either I do 1 and 2, or I do 3.

VODs

There were a couple of things I 100% knew I DIDN'T want to do:

  • Absolutely do not show links only one series at a time by clicking a game, because then you can't easily open a bunch of MH links or VODs at a time, and also that's just annoying.
  • Absolutely do not want to keep both of these sections on the page.

Other than that I was really unsure. I had a bunch of ideas that I ended up not going with:

  • Each week of Matches has two toggles: one to expand/contract, and one that shows a bunch of extra columns to the right that contain the extra information
  • Each match of Matches is divided into 2 lines, one with result & one with links
  • Kill schedule completely & rely on just the "spoiler free schedule" link (which could also be made not spoiler free if needed).

Later on in December I had another idea which I actually coded completely and showed to people but ended up deleting because it seemed too confusing - this idea was to have a toggle that hides the team TEXT in favor of displaying only icon + MH/VOD links, MH on one side and VOD on the other. It was cute af and I loved it but yeah, kinda confusing. Like, really confusing lol.

So then I was going to just not display them at all maybe and make you click to MDV page, but one person on reddit was like "no pls" in a really convincing way and I was like k, so I put MDV page on overview page instead. So there's actually still kiiiiinda a schedule-ish section on overview, but it just contains links and it's actually great, I love it.

Dates & Times

Having the dates be above each match was a thing that was already in existing code, and was initially something Liquipedia did, but I had a huge problem with it: We want to show date/time in user local time but that's handled in JavaScript which means that I don't actually know which matches are happening on the same date until after the page is loaded, and then JS figures that out. So I had to write some JS to make this happen, which when I did it was np but when I was still deciding things in June I didn't know enough JS to have done this (to be clear, this was a really easy thing to do in the end, I just didn't know literally any JS then). So that was a struggle at the time.

Also how to do times?? Like I said before, I was spending a lot of time thinking about this, and getting increasingly frustrated. So at one point I remember thinking, "okay I'm just going to go through literally every UI element and see what I can squish down or move to make room for start times." Team 1, no, Team 2, no, score......OH WAIT NO ONE CARES START TIME AFTER THE SCORE IS KNOWN and I was SO HAPPY when I thought of it and like that one realization was what made me go from "lol this would be nice" to "hey I might actually do this."

So I was really unsure how I was going to code the local time zone part of this but whatever at least I knew what I was gonna (attempt to) do.

Database

So by the end of June I knew (mostly) what I was gonna do with Matches section & Schedule section (well at least I knew what I was gonna do with times, and I was (kinda) confident (not at all) that I'd be able to figure out SOMETHING to do with VODs later, which in the end I didn't decide until like the 2nd or 3rd week of December, so this was still on my radar as a pain point through all of this for months). But the display is like if anything the smallest part of this. I needed to design a database structure ahhhhhhhhhh

Joining Tables

I'm gonna keep this super not-at-all technical, so all you need to know is that I have a bunch of separate tables of data (think of a table as an Excel sheet) and I need some way to know what game in one table is the same as a game in another table. Each table corresponds to one "type" of data - start time & match details link & vods links & etc from Schedule needs to be able to be associated to the pick/ban order from that game, as well as the scoreboard. (Like I said at the start scoreboards are still in SMW not Cargo, so not scoreboards yet, but eventually.)

This is not an easy thing to make happen.

The natural way to associate stuff together would be to use a unique ID per game. Great! Wait not great. Our scoreboards are auto-generated, but most of our stuff is entered by hand. So even if we had a Riot MH link for every game (fuck you LPL) then requiring that to be entered for every game in every place where data is stored is suddenly a ton of extra data entry. I don't want to make people do that. And ofc Riot MH links are NOT there for every game, so.....rip. We'd have to come up with some kind of internal indexing and uhhhhhh. With dozens of regular editors, probably hundreds of one-time/occasional editors, and a lot of people not native English speakers, anything like this will have to be 100% automated by my code. I absolutely can't make people enter something non-obvious like this. |winner= is clear, |uniquegameid= is....not.

So ideally what I want is something like <pagename>_<number of game within the page>. Of course this doesn't work because (a) not all games are always ordered identically within displays (think 2-stream events, or organized by group vs by day of play) (but actually we can, and will, start mandating consistent ordering) but more importantly (b) How do I know the order of games when games are stored across multiple pages??????

One option for (b) would be to perform a query on previous pages to figure out how many games were played before then, but that would be a DISASTER because queries only refresh when you edit a page (or "blank edit" / "null edit" (same thing) i.e. press edit and then save without making any changes). This is bad for a lot of reasons, but the biggest reason is what happens when you add more matches to Group A's page, and the editor doesn't know to then re-blank-edit Group B, Group C, and Group D oh god. And tiebreakers uhhhh nty.

So I need two things with different requirements:

  1. A way to associate games together that doesn't involve the name of the page or the number of the game within the page.
  2. A way to order games that DOES involve both name of the page (and order within the "group" of pages) and also the number of the game within the page.

I also struggled with this for a while. I don't remember any particular aha! moment but here's what I ultimately decided on, probably sometime in September-October ish:

  • Break up events into "Rounds" or "Phases" or some smaller unit, each given a distinct name. For example, "Week 1" or "Quarterfinals."
  • Do not permit these units to be split across pages.
  • Denote the event name by the title of the overview page, for example "NA LCS/2018 Season/Spring Season" but not "NA LCS/2018 Season/Spring Season/Picks and Bans"
  • Number matches/games by their number within the phase/round/whatever
  • Join based on the overview page, the NAME of the phase/round/whatever, and the number of the match within that phase/round/whatever
  • Order based on page number and number of phase/round/whatever within page and number of match within phase/round/whatever

If this sounds kinda sketch to you that's because it really is but like I'm pretty confident this is the best way to do it because I literally cannot rely on any kind of user input for this. It will of course require that phase/round/whatever is named consistently across pages within events, but that's relatively easy to police.
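To make the join-vs-order split above concrete, here's a minimal Python sketch. It is not the wiki's actual Lua/Cargo code: the class name `MatchRow`, its field names, and the `join_rows` helper are all hypothetical, and the sample data is made up. The point is only that the join key never mentions the page, while the order key does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRow:
    # All field names here are hypothetical, chosen for illustration only.
    overview_page: str   # e.g. "NA LCS/2018 Season/Spring Season"
    tab: str             # name of the round/phase unit, e.g. "Week 1"
    n_in_tab: int        # match number within that tab
    n_page: int          # position of the data page within the event
    n_tab_in_page: int   # position of the tab within its page

    def join_key(self):
        """Identify the same match across tables; no page numbers involved."""
        return (self.overview_page, self.tab, self.n_in_tab)

    def order_key(self):
        """Sort matches into event order across all pages of one event."""
        return (self.n_page, self.n_tab_in_page, self.n_in_tab)


def join_rows(schedule_rows, extra_by_key):
    """Pair each schedule row with data from another table, in event order."""
    ordered = sorted(schedule_rows, key=MatchRow.order_key)
    return [(row, extra_by_key.get(row.join_key())) for row in ordered]


# Made-up sample data: three schedule rows, one pick/ban entry.
event = "NA LCS/2018 Season/Spring Season"
rows = [
    MatchRow(event, "Week 2", 1, 1, 2),
    MatchRow(event, "Week 1", 2, 1, 1),
    MatchRow(event, "Week 1", 1, 1, 1),
]
picks_and_bans = {(event, "Week 1", 2): "draft for Week 1, match 2"}
joined = join_rows(rows, picks_and_bans)
```

Because the join key uses the tab's name rather than its position, adding matches to one page never changes the identity of matches on another page, which is the failure mode the query-the-previous-pages idea would have had.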

Phase/Round/Whatever

Yeah so like....what do I actually call this shit? I had a lot of trouble with this. And it's important because you might want to know a team is eliminated in "Quarterfinals" but if they happen to not play in Week 10, that doesn't mean they were eliminated in "Week 9" - even though these objects need to be categorized the same way. Ahhhhhhhhhhhhhhhhhhhhhhhhhhh

I went back and forth a lot between saying "phase" and saying "round." In the end I said fuck it and called it "Tab" instead. So we can still define a Phase as when you were eliminated - RR, QF, SF, etc; and we can also still define a Round to display in Top Schedule and other places, and we also have Tab. Kinda confusing sure but this is the kind of thing that most people don't even need to know about as a thing, and this avoids overloading a term.

Matches & Games

So by the time I was properly thinking about this, it was probably November already. In November I did two major projects, the first of which was rewriting all of our infoboxes in Lua, and the second of which was redesigning our brackets from scratch. That said brackets was one of the most collaborative things I've done on the wiki, by which I mean Ema (kittymmeow) did everything for me (i.e. designed the HTML & CSS using grid) and I just wrote a couple hundred lines of really easy Lua. But the point is I still wasn't working on this HUGE GIANT PROJECT THAT NEEDS TO BE DONE BY THE START OF THE 2019 SEASON OH GOD. But this other stuff had to happen too, and it turns out it was REALLY GOOD that we redid brackets because I ended up making them able to query Cargo too. And that is absolutely definitely not something I could've done with the old code in any remotely sane way.

Anyway so it's November, I still hadn't decided on the phase/round/whatever thing but I knew how I was going to do the join, but I've arrived at another decision: What database tables do I want, and what goes in them? In the old schedules we only had one line which represented a match, but I need a lot of game-specific data too. Honestly I'm not even sure what to say about this, just that it was a problem I had to figure out, and I did. There's probably two super relevant things I decided:

  • Every BO1 is both a match and a game, even though this will feel very inconvenient and I'm sure editors will complain to me (tbf it does seem kinda zzz that for formats without side selection, you have to say |team1=A |team2=B and |blue=A |red=B in the same line but too bad sorry).
  • For fields like MVP, interview, etc, I will provide columns in both charts, and if I don't know the format I'll try to grab preferentially from game, if not provided there then try from match, otherwise give up.
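The game-then-match fallback in the second bullet can be sketched in a few lines of Python. This is an illustration only, not the actual Lua: the function name `resolve_field`, the dict-shaped rows, and the sample values are assumptions.

```python
def resolve_field(field, game_row, match_row):
    """Return a field such as MVP: game value first, then match value, else None."""
    for row in (game_row, match_row):
        value = row.get(field)
        if value:  # treat a missing key and an empty string the same way
            return value
    return None


# Made-up rows: the game has no MVP entered, the match does.
game = {"MVP": "", "Interview": "game-level interview link"}
match = {"MVP": "PlayerA", "Interview": "match-level interview link"}

mvp = resolve_field("MVP", game, match)              # falls back to the match row
interview = resolve_field("Interview", game, match)  # game row wins
vod = resolve_field("VOD", game, match)              # neither has it, so give up
```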

Actually one thing that I decided later which would have alleviated a lot of my pain over this issue was that I'd just require the user to specify what columns they want displayed when printing match details stuff (MVP, interview, Match History, etc etc). But my initial thoughts were that, similar to old MDV tab, I'd be printing everything and just with some empty columns, so it seemed extremely important to make it easy to know where individual pieces of information were supposed to come from (game vs series). And in the end it became super unimportant gg.

The one thing I did do was prohibit a match link (as opposed to game link) for MH link, fuck you LPL go die. So for LPL it needs to have the same link input per game in a match, and on the page it will display the same way, the same link 2-5 times, but also I don't care I want that always in the same order.

Huh I guess I did have a bunch to say about this.

Coding!?!?!

So on December 1 remember I was still working on all this other stuff and I'm like ok I need to start this shit oh my god. So I just started writing code. I decided by December 15 I was going to have some kind of database set up even if it wasn't finalized, and I'd have Crossbox, Timeline, Standings, and Matches done. So...I kinda just did that. Probably I spent 10+ hours a day writing Lua code and every couple days running into another disaster that I had to solve. It was around this time that I decided to go with Tab instead of Round or Phase, and figured out what to do about VODs (actually one thing that helped a lot with "what to do about VODs" was the realization that VODs are no more important than Match Histories, so a solution for VODs wasn't really any kind of solution if it didn't incorporate MH too), and some other things. I'll just quickly go over a few.

Local Times

Actually this ended up being really easy because I no longer know 0 JS. But basically for static timezones, I printed them only if either (a) the line was the first in its tab or (b) it was on a different day from the line before. For dynamic timezone/countdown, I printed it after every single line, but when you load the page there's JS that hides it above all lines where those criteria aren't met. ez.
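Here's a rough Python sketch of that date-header rule, mostly to show why the local-time version can't be decided on the server. The real thing is Lua for the static timezones plus JS in the browser; the function name `needs_date_header`, the `(tab, utc_start)` row shape, and the sample times are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_date_header(rows, utc_offset_hours):
    """For each (tab, utc_start) row, decide whether a date line goes above it.

    A header is shown when the row is the first in its tab, or falls on a
    different calendar day (in the given timezone) from the row before it.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    flags = []
    prev_tab, prev_day = None, None
    for tab, utc_start in rows:
        day = utc_start.astimezone(tz).date()
        flags.append(tab != prev_tab or day != prev_day)
        prev_tab, prev_day = tab, day
    return flags


# Made-up schedule: two matches in Week 1, one in Week 2.
rows = [
    ("Week 1", datetime(2019, 1, 26, 22, 0, tzinfo=timezone.utc)),
    ("Week 1", datetime(2019, 1, 27, 1, 0, tzinfo=timezone.utc)),
    ("Week 2", datetime(2019, 1, 27, 3, 0, tzinfo=timezone.utc)),
]
utc_flags = needs_date_header(rows, 0)       # second match is on a new UTC day
pacific_flags = needs_date_header(rows, -8)  # same two matches share a day at UTC-8
```

The second match gets its own date line for a UTC viewer but not for a UTC-8 viewer, so the same rows need different headers depending on who's looking. That's why the dynamic version prints a date on every line and hides the extras after the page loads.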

Toggling Timezones

Not that interesting but I had to write some new JS for this because our existing toggle JS wouldn't work overlapped with the show/hide for individual weeks. I still want to replace the existing old toggle JS with something more modern / easier to use / does less on page load, but it works for now. So, eventually.

VODs & Match Links

Like I said before literally one reddit comment convinced me to do this. Like I guess I kinda always wanted to, but I wasn't going to, but then someone on reddit described how they use the wiki and I was like ok this sounds really valid. I'm not gonna make you click to another page. So this is all on overview page now and I'm really happy with it.

Brackets

A happy little coincidence that I realized was that I could very easily associate bracket titles with tab names, or let you specify a UniqueMatch per match in the bracket, and query Cargo data to print brackets too! Yay these are automated too now.

Timelines

One thing I decided to do for timelines was actually remove some data - in BO2s it only prints points now, instead of also record, and for BO3 with tiebreaker points, it's just one cell that has points in parentheses after the record. This made my life waaaaay easier, and I don't think it's that important to see tons of detail in timelines anyway. Also it makes the timeline narrower which is nice because they're really wide.
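As a tiny illustration of those cell rules, here's a Python sketch. The function name `timeline_cell` and its parameters are hypothetical, and the exact separator and spacing in the real cells are a guess. How points are computed is not covered here, so they're just passed in.

```python
def timeline_cell(best_of, wins, losses, points=None):
    """Format one timeline cell under the simplified rules described above."""
    if best_of == 2:
        if points is None:
            raise ValueError("BO2 timelines show points, so points are required")
        return str(points)  # BO2: points only, no record
    record = f"{wins}-{losses}"
    if points is not None:
        return f"{record} ({points})"  # record with points in parentheses
    return record  # assumption: with no points in play, just show the record


bo2_cell = timeline_cell(2, 1, 1, points=4)
bo3_points_cell = timeline_cell(3, 2, 1, points=5)
bo3_plain_cell = timeline_cell(3, 2, 1)
```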

Footnotes

Footnotes are really hard to code because they require an extra layer of processing in between normal data processing & printing so that you can group identical footnotes and also order them properly. And this layer of processing needs to know where all of the footnotes are in your entire HTML object. I ended up making a utility module to handle footnotes and set up a global table and the result is that footnotes are now really easy. But figuring that out took me a couple days.
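The "extra layer" idea can be sketched roughly like this in Python. To be clear, this is not the utility module itself (that's Lua, and I'm not reproducing it here); the class name `Footnotes`, its methods, and the `[n]` marker format are assumptions chosen to show the grouping and ordering behavior.

```python
class Footnotes:
    """Collect footnotes during rendering; identical texts share one number."""

    def __init__(self):
        self._numbers = {}  # footnote text -> assigned number

    def mark(self, text):
        """Register a footnote and return the marker to print in the cell."""
        number = self._numbers.setdefault(text, len(self._numbers) + 1)
        return f"[{number}]"

    def render(self):
        """Return the footnote list, ordered by first appearance."""
        return [f"{n}. {text}" for text, n in self._numbers.items()]


# Made-up table cells; two of them carry the same footnote.
notes = Footnotes()
cells = [
    "2-0" + notes.mark("Forfeit"),
    "1-1",
    "0-2" + notes.mark("Tiebreaker played"),
    "2-0" + notes.mark("Forfeit"),
]
footer = notes.render()
```

Every cell registers its note with the one shared collector as it's built, so the footer can be printed last without anything having to search the finished HTML for footnotes.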

Implementation

This is where we are as of writing this post so I can't say "yay we did this and it worked" but there's a lot of moving parts here so I'll write a bit anyway. The tl;dr of it is I wrote some Python code.

Data Namespace

So pages are now at like Data:LCK/2018 Season/Spring Season. If you're logged into the wiki, there's a JS function that runs on tournament pages and displays a [view data] link at the top of the page (similar to [edit]) and takes you there. Creating these is kind of a three-step process:

  1. Apply a regex to GameSchedule data to partially turn it into MatchSchedule & change some commented tab names into template arguments
  2. Run a Python script on the page to change some more arguments
    • Potentially run 1-3 additional Python scripts depending if VODs/Match Details/MDV tabs exist for that event
  3. Change over the templates used on the tournament page to the new stuff

Actually the 1-3 additional Python scripts are a bit more complicated than that - my Python code uses the library mwparserfromhell to parse wiki pages, and that's really good at finding arguments of templates but not so good at some other things. Also MD & VODs tabs have varying formats. So basically a human needs to check each page by hand, run a regex on it to turn it into using templates, and then I can run Python code to pull the data to the Data tab. Another complication with this is that our old VODs pages were spoiler-free which is not fun to deal with now lol. Especially because it means I can't trust blue-red sides for teams there (though I can use teams for correct-match validation). But match details are nice because I can use those to pull the actual game data from Riot and get per-game blue/red and winner.
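For a feel of what step 1 looks like, here's a hedged Python sketch using only the standard `re` module. The input wikitext is invented, and while `GameSchedule` and `MatchSchedule` are the names used above, the `MatchSchedule/Start` template and its `tab` argument are hypothetical stand-ins for whatever the real markup is. The actual regexes were run per page with a human checking the result.

```python
import re

# Hypothetical old wikitext where the tab name only exists as a comment.
OLD = """<!-- Week 1 -->
{{GameSchedule|team1=A|team2=B}}
<!-- Week 2 -->
{{GameSchedule|team1=C|team2=D}}"""

# A whole line consisting of one HTML comment; the comment text is the tab name.
TAB_COMMENT = re.compile(r"^<!--\s*(.+?)\s*-->$", re.MULTILINE)


def migrate(wikitext):
    """Turn commented tab names into template arguments and rename the template."""
    text = TAB_COMMENT.sub(r"{{MatchSchedule/Start|tab=\1}}", wikitext)
    return text.replace("{{GameSchedule|", "{{MatchSchedule|")


NEW = migrate(OLD)
```

A regex like this only gets the page into template form. Pulling arguments back out of the templates afterwards is the part a parser like mwparserfromhell is good at.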

Next Steps

So we have to finish all the data migration which will probably take most of the rest of this month to do. I also have some more JS I need to write for toggles, and I want to improve a couple small things here and there. And I have some unrelated projects I need to do, such as creating data tables for individual achievements for players as well as competitive rulings (they'll be similar in structure so I'll do them back to back probably). And eventually I want to also update all of our brackets to use the new syntax, which is gonna be another huge project that will probably be a mix of human & automated work. But regarding the tournament data, the next cool thing that I want to do is make team pick-ban-order history pages based on joining pick-ban history with schedule so I can pull a timestamp from schedule yay.