PVP Leaderboards

queue38 · September 2013

I have had a few thoughts about the leaderboards. Feel free to ignore them if you don't like them.

I don't know if you can, but I think you should count hull damage and shield damage differently, with a high percentage of the score coming from hull damage. The reason is that hull damage gets the kills, and it is easier to rack up high numbers with shield damage. I also think hull healing should count a little more, because there is more passive shield healing, which makes shield healing easier to get.

The base score would be off of damage and healing while the kills and deaths would modify the scores. I made an example to better explain what I am trying to say.

https://docs.google.com/spreadsheet/ccc?key=0AgHlChA__jqNdHllSFJ0YktFZWN6UWlHWWpHTFBRVGc&usp=sharing
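
To make the suggestion concrete, here is a minimal Python sketch of that kind of scoring. It is not taken from the spreadsheet: the weights, the kill/death modifier and all function names are made-up assumptions, chosen only to show hull damage and hull healing counting for more than their shield counterparts.

```python
# Hypothetical weights: the thread gives no actual numbers.
HULL_DAMAGE_WEIGHT = 0.7
SHIELD_DAMAGE_WEIGHT = 0.3
HULL_HEAL_WEIGHT = 0.6
SHIELD_HEAL_WEIGHT = 0.4


def base_score(hull_dmg, shield_dmg, hull_heal, shield_heal):
    """Base score from damage and healing, with hull values weighted higher."""
    damage = HULL_DAMAGE_WEIGHT * hull_dmg + SHIELD_DAMAGE_WEIGHT * shield_dmg
    healing = HULL_HEAL_WEIGHT * hull_heal + SHIELD_HEAL_WEIGHT * shield_heal
    return damage + healing


def final_score(base, kills, deaths, kill_bonus=0.05, death_penalty=0.05):
    """Kills raise and deaths lower the base score (assumed linear modifier)."""
    modifier = 1.0 + kill_bonus * kills - death_penalty * deaths
    return base * max(modifier, 0.0)  # never let the modifier go negative
```

With these assumed weights, 1000 points of hull damage is worth more than 1000 points of shield damage, and a player with more kills than deaths ends up above their base score.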

A few other things: I think your top ten are averages, so why don't you just use everyone's lifetime averages with no minimum plays? Then have a true top ten with people's highest single-game run, also with no minimums. Or maybe just the top record, so people can see the highest score anyone has ever got. Like a Guinness book of galactic records.

antoniosalieri · September 2013

naz4 wrote: »

Hilbert, for TD, working better and better bud.

This can't be said enough. I hear your Match up tool is working better and better Hilbert. The effort is super appreciated. Thanks for all your work on this stuff.

mancom · September 2013

Balancing should be even easier now. I have added an autocomplete option (for users that have not enabled privacy settings) that will eliminate the need to get everyone's handles a second time.

g0h4n4 · September 2013

vegie0 wrote: »

Look, I am not sure if this has been said before, but I do not qualify for the Leaderboards, as I am not on Hilbert's Friends list. In fact, he has me on ignore, and routinely tries to taunt me with that fact. Do not know why I should care, but the point I am going for, if it is not obvious enough, is that this is a list derived from one person's rules, and maintained by one person who is not associated with the development process. The list is in my opinion worthless, as there are some players who are surprisingly good that never really care for the queues. So I personally find the Leaderboard laughable at best. Not due to the fact that I do not appear on it, but due to the fact that his friends are more likely to achieve higher ranks due to his relationship with them.

Just an observation.

Incorrect, I was number 1 after the No BS tourney and I never knew Hilbert then. So that goes out of the window.

Since Cryptic aren't bothered with helping PVP develop, the community has had to do it and Hilbert has stepped in, using his time and expertise to form some kind of leaderboard, albeit limited as the combat log does not show everything.

Perhaps you care to join in and make your own leaderboard? Before being critical of someone's work, if you cannot make a better one or improve it, I suggest you keep your bitterness to yourself.

masterkeychnk5 · September 2013

Just put on the Valdore heal console and your stats will go up automagically! No healer prerequisites!

g0h4n4 · September 2013

masterkeychnk5 wrote: »

Just put on the Valdore heal console and your stats will go up automagically! No healer prerequisites!

lolz trollz

masterkeychnk5 · September 2013

g0h4n4 wrote: »

lolz trollz

Haha that was funny, wasn't it.

But in all honesty, the Hilbert leaderboard is awesome, but it has big flaws in it at the same time.

A lot of stuff that gets penalized is penalized because it's pay2win, and pay2win doesn't automagically mean it's OP or should be penalized. I know Hilbert's stance on p2w and how he would like the game to go back to pre-free2play, and sure, everyone has their opinion on stuff.

But IMO the leaderboard is way too subjective at the moment. It's a great attempt to make something for the PvP community, I just think the balance is far from right atm.

A lot of stuff that IS actually in the combat logs is not counted in any way, shape or form, like the Valdore console I mentioned above. I do not personally think it is overpowered, but a lot of other people do. It can achieve massive passive healing, sometimes as good as the old BFI doffs if it procs in your favor.

Stuff like TBR is only counted for damage, whereas it's a great 'utility' to push people out of a group to get them killed and stop them from getting heals, or to push them out of the extenders' range.

I'm sure it's not finished yet, but that's exactly what we want to do: give feedback and definitely approach this without subjective opinions, especially not from the creator himself.

Just ask yourself this: a full-out cheese spam build with EWP almost always on global, theta in between (using the doff to reduce all Exotic **** when using grav well), black hole, AMS, graviton pulse, backstep, nadeon detonator, I can't even think of them all, topped out in the top 5 of the leaderboard. *scratches head*
I know for one the black hole can be detected because it does damage in the logs afaik, perhaps add some prediction formula like

When using
{ 'black hole console' }
the probability of using other pay2win TRIBBLE increases, etc. Still hard to do, but it's probably the best bet.

The current utility detection is flawed too: it simply rewards spamming out Tractor Beam etc. as much as possible, even if the uses are badly timed against Polarize Hull or Omega 99% of the time.

And the pets, c'mon, I know you don't like pets, but really? A few fleet credits and some dilithium, my god... Tractor beam pets are way worse than the other stuff around.

Are TB mines penalized too? They are like 3 times as strong as the highest-specced bridge officer version.

mancom · September 2013

The algorithm has flaws, no doubt. I think the rating of players within a team is somewhat sensible; I have a different algorithm in internal testing that I feel gives better "in-team" results, but it provides worse overall averages. A major issue with the current ranking is that it favours matches with lots of kills in a short timeframe, which leads to high scores in pugstomps that are not sufficiently balanced out in the averaging process via the match balance factor. But there is also the thought that I don't want to punish players for playing queue matches and winning.
Another problem is that it's usually better for the score to do a lot of damage and a lot of kills than just a lot of kills in a super efficient low-damage way.

The impact of the style rating is currently toned down, due to the fact that many things like TIF, GravPulse, AMS cannot be detected and it would be easy to game the system and have super annoying stuff and still not get any penalty at all. (Yes, TB mines get penalised.)

The utility rating is problematic in its current form since you can get really high utility scores by simply using as many different sci abilities as possible. Also, as you noted correctly, it currently favours spamming the ability over using it only in decisive moments. I think that this could be improved by taking into account the situation and e.g. giving a bonus to tractor beams on a target that died a couple of seconds later or something like that. The main obstacle here is that coding this detection is not a lot of fun, and so I'm probably not going to do it in the foreseeable future.
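
A rough Python sketch of the situational idea mentioned above (a bonus for a tractor beam on a target that dies shortly afterwards) could look like this. It is only an illustration under assumptions: the `AbilityUse` record, the `death_times` mapping and the 5-second window are all hypothetical and do not reflect the actual parser or database.

```python
from dataclasses import dataclass


@dataclass
class AbilityUse:
    time: float   # seconds since match start (assumed log format)
    target: str   # handle of the player the ability was used on


def decisive_uses(uses, death_times, window=5.0):
    """Return the uses whose target died within `window` seconds afterwards.

    `death_times` maps a target handle to a list of that player's death times.
    """
    decisive = []
    for use in uses:
        for died_at in death_times.get(use.target, []):
            if 0 <= died_at - use.time <= window:
                decisive.append(use)
                break  # count each use at most once
    return decisive
```

A utility score could then give extra weight to the uses returned here instead of counting every activation equally, which would reduce the reward for pure spamming.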

I'm not really sure what you are referring to regarding pets. I only put penalties on runabouts, interceptors, drone ships and nimbus pirates (SNB from a pet? really?), which seems like it is in line with your "tractor beam pets" comment...

If someone really wants to attempt to devise a new rating algorithm that works with the currently collected data, I probably could provide a database dump so that one could experiment and maybe find a better ranking system.

Regarding the cheese team at the top of the list: I have already mentioned why style has a low impact at the moment, and other than this highly subjective component, it appears that it was a very successful team in terms of winning matches and that would obviously put it near the top of the list, especially considering that it was used in the queues and that the system currently has a bias towards one-sided matches.

(The question here is: Should winners in one-sided matches be punished for the fact that their opponents were not good enough, i.e. receive low match scores despite winning decisively, or should such matches only receive lower weights in the averaging process to reduce their impact on the average score?)

hurleybird · September 2013

Here's what I think you need to do.

Add up all the player ratings prior to a match, and if a player does not yet have a rating assign him an average one.
Assign a bonus for the team with the lower rating, and a penalty for the team with the higher rating. Therefore it does not lower your rating as much to lose against better opponents, and it's harder to farm weaker players for rating.
After a certain threshold where one team has an overwhelming rating advantage, do not even let the results of that match influence player ratings. For example, if my team is playing against pugs that we are rated 2-3 times higher than, it's possible that even if we win 15-0 we would lose rating because of the penalty. Counting these kinds of matches would be of no benefit to either the losing or the winning team.
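
The three steps above can be sketched in Python as follows. This is one possible reading of the proposal, not an existing implementation: the function names, the ratio-based bonus and penalty, and the `max_ratio` cutoff of 2.0 are assumptions made for illustration.

```python
from statistics import mean


def team_rating(players, ratings, default):
    """Sum the ratings of a team; unrated players count as `default`."""
    return sum(ratings.get(p, default) for p in players)


def match_factors(team_a, team_b, ratings, max_ratio=2.0):
    """Return (factor_a, factor_b), or None if the match should be ignored.

    Assumes non-empty teams and positive ratings. The lower-rated team
    gets a factor above 1 (bonus), the higher-rated team a factor below 1
    (penalty).
    """
    # Unrated players are assigned the average of all known ratings.
    default = mean(ratings.values()) if ratings else 1.0
    a = team_rating(team_a, ratings, default)
    b = team_rating(team_b, ratings, default)
    if max(a, b) / min(a, b) >= max_ratio:
        return None  # overwhelming advantage: match does not affect ratings
    return b / a, a / b
```

Multiplying each player's rating change by their team's factor makes a loss against stronger opponents cost less and a win against weaker opponents pay less.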

mancom · September 2013

My rather simplistic auto-balancing algorithm seems to be fairly successful. I think I could adopt this as my new match-balance factor, and then possibly exclude results below a certain threshold.

hurleybird · September 2013

Go for it.

xtremenoob1 · September 2013

mancom wrote: »

If someone really wants to attempt to devise a new rating algorithm that works with the currently collected data, I probably could provide a database dump so that one could experiment and maybe find a better ranking system.

If you are offering up a DB dump I wouldn't mind checking it out.

I have been working on a parser for personal use. Would be interesting to see how you are storing and summarizing data.

You could upload it on Pandas TS in Pheo's room if that is easiest and the offer still stands.

mancom · September 2013

The match balance rating has been changed. Now a balance rating of 1 indicates a perfectly balanced match, values below 1 indicate progressively worse balance, 0 means totally unbalanced.

For the ranking lists only matches with a rating above 0.25 are counted and in the averaging process match ratings are weighted with the square of the match balance, thus further reducing the impact of unbalanced games.

masterkeychnk5 · September 2013

Just out of curiosity: I had pretty much equal scores on all the factors (kills, dmg, utility, style, etc.), but I had a lot less rating than others with the same individual scores (comparing against other matches). How does this work? Do others on the team also snoop off of your own rating?

hurleybird · September 2013

mancom wrote: »

For the ranking lists only matches with a rating above 0.25 are counted and in the averaging process match ratings are weighted with the square of the match balance, thus further reducing the impact of unbalanced games.

Can you give the exact formula, because the way I'm looking at it would mean it would be impossible to get a higher rating with, say, a 0.26 team balanced factor no matter how well you do (=0.26 * 0.26 * score?). Also, do you then get an inverse bonus factor for being on the weaker team?

mancom · September 2013

masterkeychnk5 wrote: »

Just out of curiosity: I had pretty much equal scores on all the factors (kills, dmg, utility, style, etc.), but I had a lot less rating than others with the same individual scores (comparing against other matches). How does this work? Do others on the team also snoop off of your own rating?

When you are essentially equal in terms of kills/dmg/heal/util/style and on the same team (there is a rather small bonus for winning), your score can still differ: there is a "grace under pressure" modifier - the more damage you take relative to everyone in the match, the higher it gets. It is calculated as

1 + (damageInRaw / totalDamageRaw) / 2

which means that in the worst case (you take no damage at all) the multiplier is 1, in the best case (you are great at holding aggro and tanking and take all the damage in the entire match) you get a multiplier of 1.5. Since the total damage is divided between both teams and a big part of the raw damage is "wasted" against pets, it usually should be much closer to 1 than to the theoretical 1.5 maximum.
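
As a small Python illustration of the modifier above (the function name and the guard for a match with no recorded damage are my own assumptions, not part of the formula as stated):

```python
def grace_under_pressure(damage_in_raw, total_damage_raw):
    """Multiplier between 1.0 and 1.5 based on the share of damage taken."""
    if total_damage_raw <= 0:
        return 1.0  # assumed fallback when no damage was logged at all
    return 1 + (damage_in_raw / total_damage_raw) / 2
```

A player who takes a quarter of all raw damage in the match would get a multiplier of 1.125 under this formula.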

I had introduced this factor to reduce the effects of being picked on by a superior team, but now that you mention it, it's probably also something to encourage players to stay in the fight and not cloak and hide all the time.

mancom · September 2013

hurleybird wrote: »

Can you give the exact formula, because the way I'm looking at it would mean it would be impossible to get a higher rating with, say, a 0.26 team balanced factor no matter how well you do (=0.26 * 0.26 * score?). Also, do you then get an inverse bonus factor for being on the weaker team?

With the new match balance factor, I removed the bonus for being on a weaker team. I was never really happy with the way it previously worked because it occasionally showed a massive advantage for the losing side that did not really make sense. So currently the score is independent of the team's performance and is only rated relative to all players in the match in addition to the absolute parts.

After excluding all matches below 0.25 balance score, I calculate the average score as

SUM(matchScore * matchWeight^2) / SUM(matchWeight^2)

where the sum ranges over all of a player's matches (>0.25 balance) in the selected timeframe.

This means that scores in unbalanced matches still influence the average (and improve it in case of good scores), but have a rather small impact on the final result compared to well-balanced ones.
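
The averaging described above can be written as a short Python sketch. It assumes that matchWeight is the match balance rating and that each match is available as a (score, balance) pair; the function name and data layout are hypothetical.

```python
def average_score(matches, min_balance=0.25):
    """Weighted average of match scores, weighted by balance squared.

    `matches` is a list of (match_score, match_balance) pairs. Matches at or
    below `min_balance` are dropped before averaging.
    """
    counted = [(score, bal) for score, bal in matches if bal > min_balance]
    if not counted:
        return None  # no qualifying matches in the timeframe
    numerator = sum(score * bal ** 2 for score, bal in counted)
    denominator = sum(bal ** 2 for _, bal in counted)
    return numerator / denominator
```

Because the weights are divided out again, a player whose only match has a balance of 0.26 still gets that match's full score as their average; the low weight only matters once it is mixed with better-balanced matches.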

dontdrunkimshoot · September 2013

So far every match I've been in that was balanced by this has been great, all within 5 or 7 points of each other at most, and lasted a good long time. No short one-sided shutouts so far. If only math like this was part of queued matchmaking.

antoniosalieri · September 2013

dontdrunkimshoot wrote: »

So far every match I've been in that was balanced by this has been great, all within 5 or 7 points of each other at most, and lasted a good long time. No short one-sided shutouts so far. If only math like this was part of queued matchmaking.

Then you would have the issue of balancing teams and teams of 2-3 people, etc.

Wouldn't it be fantastic though if Cryptic created a Tyler-style queue with their own internal data? One that didn't allow teaming... and auto-balanced teams... making sure no team had 5 sci and no healers, etc... and then using their own internal scoring system to balance teams. If every player had a behind-the-scenes score for dmg / healing / kill-death ratios / win-loss ratios, etc... they could learn a lot from what Mancom is doing.

They need to sit down with Mancom and pick his brain for a while... and do just that really. They don't even have to do a leaderboard... just create a new queue that does automated balancing based on metrics and team makeups.

naz4 · September 2013

antoniosalieri wrote: »

Then you would have the issue of balancing teams and teams of 2-3 people, etc.

Wouldn't it be fantastic though if Cryptic created a Tyler-style queue with their own internal data? One that didn't allow teaming... and auto-balanced teams... making sure no team had 5 sci and no healers, etc... and then using their own internal scoring system to balance teams. If every player had a behind-the-scenes score for dmg / healing / kill-death ratios / win-loss ratios, etc... they could learn a lot from what Mancom is doing.

They need to sit down with Mancom and pick his brain for a while... and do just that really. They don't even have to do a leaderboard... just create a new queue that does automated balancing based on metrics and team makeups.

Would definitely be good for the pug pew pewers as well.

antoniosalieri · September 2013

naz4 wrote: »

Would definitely be good for the pug pew pewers as well.

It would be the pug player's wet dream. Imagine queuing for 10 games in an evening where 6-7 of them were epic 15-10+ type games... with the others being OK at least.

I would say they could even set it up so people could queue as a team... but just not end up teamed together, if you know what I mean.

So 5 guys could queue up teamed... and at least know they would end up in the same match (for the guys that are always saying "I just want to play with my friends"). I know I enjoy shooting at my friends now and then... and getting killed by them now and then too. I may not like getting blown up... but if a friend does something great and makes me go boom hard... it can still be pretty darn fun.

masterkeychnk5 · September 2013

Yeah, that would be great; that way Cryptic would finally support PvP somehow.
They would have to make those combat logs a whole lot more accurate though, and also show abilities that do not do damage.

The last TD match I was in ended up 15-1 or so with the new Hilbert algorithm, unfortunately.
Well, it might have been a needle in a haystack; time will tell.

mancom wrote: »

"Another problem is that it's usually better for the score to do a lot of damage and a lot of kills then just a lot of kills in a super efficient low-damage way."

Isn't this inherently wrong then? Someone with massive AoE FaW, TBR, and all the other goodies like transphasic still gets more rating (even if they don't kill as efficiently, or at all) than someone making quick and swift kills.
Now with a nerfed beam overload stack build it feels a lot harder to do that anyway (except for pay2win toys like Wing cannon overload, Ionized particle beam from the Mogai, etc.).

I'd say increase the kill efficiency rating and nerf the usage of those instashot wonders; balancing the DPS and efficient kills out properly would do a lot of good.

You should only penalize extreme damage dealers which are considered OP, or the Valdore healing console (which a lot of people find OP). (Hell, my Scimitar with 0 heal abilities outhealed Trinity and everyone else on the board yesterday; how crazy is that?)

Base the rating for the majority of your system on Spike vs DPS balance vs Heal balance, and only penalize the worst consoles, but do NOT overdo it on those. Utility rating is good to have, but if one half of the sci boff powers aren't logged and the other half is, stuff will in fact get out of balance. Isn't it super unfair if one sci has an awesome and effective build (shield stripper or such) and the other sci has gravwell/EWP and aceton, and the latter gets a much higher utility score?

Again, the same goes for Black hole (logged by dmg) versus TIF (not logged), or EMP Dkora (not logged) versus Isocharge (logged).

It simply brings too many fluctuations.

99% of the logs consist of damage/heals, so that's where you will eventually have to seek the ultimate sweet spot, and remove any excessive heals/damage from p2w consoles or other damage consoles.

Just my last example, which I already mentioned before: my full-out cheese spam build with all the possible p2w toys got into the top 5 at some point for a few days. The only reason was that I used logged CC like EWP, Graviton gens, 2x Tractors and Chronitons. I had VERY little healing on that Wells, and not much dmg either other than the stuff that's affected by particle gens. Reason for the high rating? It couldn't be damage and it wasn't my healing; it was my utility. So half my genuinely legit toys got rated high, and the other half of my CC (most of the pay2win) wasn't loggable and wasn't rated at all. If the system hadn't counted my 'utility' abilities, my rating on that full-out pay2win ship would have been much lower. It's a simple example of how the current algorithm puts too much rating on one side and none on the other, which totally skews the results.

Sorry if my post is a bit unclear, but I'm sure you'll understand, just trying to help make this thing work better.

mancom · September 2013

masterkeychnk5 wrote: »

I'd say increase the kill efficiency rating and nerf the usage of those instashot wonders; balancing the DPS and efficient kills out properly would do a lot of good.

Putting even more emphasis on kills than damage means that damage dealers that lose matches by a wide margin would get super low scores since they don't get any kills at all in such matches. And not every good match ends 15-14. Some end 15-0 and still were by no means easy wins.

And kill efficiency has an inherent problem too: With increasing match length (which correlates with match balance) kill efficiency goes down if one takes damage per kill or kills per time as a metric. Maybe I can improve the current rating if I focus more on a relative-to-match kill efficiency than on an absolute efficiency, but this still doesn't solve the problem that an emphasis on kills / kill efficiency punishes players that do suppression damage to open up opportunities for the other players.

I suppose one could attempt to detect how many heals such a behaviour forces onto off-targets, but that requires a lot of code. But I'll keep it in mind, maybe something can be done.

masterkeychnk5 wrote: »

Utility rating is good to have, but if one half of the sci boff powers aren't logged and the other half is, stuff will in fact get out of balance. Isn't it super unfair if one sci has an awesome and effective build (shield stripper or such) and the other sci has gravwell/EWP and aceton, and the latter gets a much higher utility score?

So you consider it fairer if no sci builds get points than when at least the ones that can be detected get points? (And yes, I know that utility rating would benefit from taking more situational data into consideration to reduce the current emphasis on spamming.)

I'm not convinced that removing utility leads to an improved scoring. In fact I'm afraid it would mainly lead to nerfing sci build scores in favour of engineers (who simply have better healing due to power levels and RSF/MW).

adaephondelat · September 2013

masterkeychnk5 wrote: »

Isn't this inherently wrong then? Someone with massive AoE FaW, TBR, and all the other goodies like transphasic still gets more rating (even if they don't kill as efficiently, or at all) than someone making quick and swift kills.

Add Elachi-Disruptors to that list. Ships equipped with those do huge amounts of damage on the scoreboard simply by bypassing shields, without being particularly effective most of the time.

g0h4n4 · September 2013

adaephondelat wrote: »

Add Elachi-Disruptors to that list. Ships equipped with those do huge amounts of damage on the scoreboard simply by bypassing shields, without being particularly effective most of the time.

There should be a distinction between scoreboard damage and actual damage that assists in a kill (damage dealt within, let's say, 10 seconds of the kill, or the killing damage itself), with more points for doing the most damage that leads to the kill, for example.
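
One way to picture that distinction is the Python sketch below, which totals only the damage each attacker dealt to a target in the 10 seconds before that target died. The tuple layouts for damage events and deaths are assumed for illustration and are not the actual combat log format.

```python
from collections import defaultdict


def kill_damage(damage_events, deaths, window=10.0):
    """Sum, per attacker, the damage that landed shortly before a kill.

    `damage_events` holds (time, attacker, target, amount) tuples and
    `deaths` holds (time, target) tuples; both layouts are hypothetical.
    """
    totals = defaultdict(float)
    for time, attacker, target, amount in damage_events:
        led_to_kill = any(
            died == target and 0 <= died_at - time <= window
            for died_at, died in deaths
        )
        if led_to_kill:
            totals[attacker] += amount
    return dict(totals)
```

Comparing these totals with the plain scoreboard damage would show who contributed to kills and who mostly padded their numbers.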

Again thanks Hilbert for your time spent setting this up

g0h4n4 · September 2013

hurleybird wrote: »

Here's what I think you need to do.

Add up all the player ratings prior to a match, and if a player does not yet have a rating assign him an average one.

Assign a bonus for the team with the lower rating, and a penalty for the team with the higher rating. Therefore it does not lower your rating as much to lose against better opponents, and it's harder to farm weaker players for rating.

After a certain threshold where one team has an overwhelming rating advantage, do not even let the results of that match influence player ratings. For example, if my team is playing against pugs that we are rated 2-3 times higher than, it's possible that even if we win 15-0 we would lose rating because of the penalty. Counting these kinds of matches would be of no benefit to either the losing or the winning team.

This would mean higher rated players wouldn't risk their rating playing against lower rated players and would simply play against higher or equal rated groups. Not a bad way to go about it. But there will be fewer matches with a new player against seasoned veterans due to fear of losing points.

masterkeychnk5 · September 2013

mancom wrote: »

And kill efficiency has an inherent problem too: With increasing match length (which correlates with match balance) kill efficiency goes down if one takes damage per kill or kills per time as a metric.

I do not 100% agree with this assessment.

I've seen plenty of matches (including TDs) that were decently balanced (15-9) where both sides made their kills quite fast. In fact, these days I think matches are a lot shorter than in the early days anyway.

What about rating people partially based on being on a winning group? Not sure if this would be interesting, just throwing it out there.

Yes, the sci versus utility issue will stay problematic (through no fault of your own, of course) as long as the other half of the sci abilities isn't logged.

I just thought sacrifices had to be made for the greater good so to speak.

daybre4k · October 2013

I'm a bit late to comment, sorry! Just want to say the Hilbert Guide Leaderboard is, in my opinion, a positive influence on PvP in STO. It's not perfect, but it's nice to have something like this around.

drkfrontiers · October 2013

Mancom,

Hi there mate - would you like to be featured in my magazine, perhaps giving some insight into the leaderboard? The how, why and where, and perhaps how to set it up, etc.

I would really appreciate it.

snoge00f · October 2013

And why is my Eng listed as a Tac?
