
Improving STO Under Wine (Linux)'s Performance (2014 edition)

Header: I originally wrote this guide about 3.25 years ago. It has since gone into the archive (which is fine) - and this thread updates a few details which have changed since then.

Disclaimer: Cryptic / Perfect World does not provide support for this platform/application, so do not ask them to help you with setting it up. It's on an AS-IS basis.

Document version: 20140701-2000

Color key:
Green colored items are generally either safe commands or preferred options. In the case of diagnostics, as long as you do not run them as root, you have no chance of damaging your system; in most cases they are information gatherers only and do no damage by themselves.

Yellow colored items are generally moderate severity. Only change them if you are reasonably certain you know what you're doing. For preferences (especially in the video post section), yellow colored items are less than ideal, but may work just fine for you. Always write down the original values before you make the changes so you can restore them later if you need to do so.

Orange colored items are generally considered only slightly less severe or bad than red. (They rarely appear in this documentation for the simple fact that it's a quick and slippery slope from yellow to red.)

Red colored items are generally dangerous if not used properly. Only change them if nothing else works and if you've confirmed your hardware supports the configuration you're trying. As with the moderate severity items above, be sure to write down pre-change values to restore them in the event you need to do so.

Comments

  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    There are a few 'hidden' settings within the kernel that can improve your performance, especially since they are somewhat undocumented unless you know where to look. Note also: some distributions ship custom-tailored and patched kernels, which may offer options that the default generic Linux kernel implements differently or not at all. If in doubt, check your distribution documentation to be sure.

    The parameters I include here will be for the default generic Linux kernel that most, if not all, should have available.
    • Linux Kernel Parameters - Note, as with all of these kernel parameters, you will need to add them to your boot loader configuration to make them permanent. Most distributions come with GRUB these days, so check your distribution documentation on how to do this.

      If you want to test any of these options prior to making them permanent in your config, you can do as follows (THESE ARE TEMPORARY AND DO NOT APPLY AFTER REBOOT); a sketch for making them permanent follows these steps:

      Reboot your system.
      Enter the GRUB menu (LILO is less flexible here, but putting append="parameters" after your kernel name sometimes works)
      Go to the version of the kernel you wish to boot with (using the arrow keys)
      Press E to 'edit'
      Go to the end of the line and add these parameters
      Press Enter to end editing
      Press B to 'boot' with the parameters in question
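
      If you decide to keep any of these parameters, making them permanent is distribution specific. As a minimal sketch for GRUB 2 based systems (the file path and regeneration command are assumptions that vary by distribution; Debian-style systems use update-grub instead of grub-mkconfig):

      # grep CMDLINE /etc/default/grub
      GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=memaper=3"
      # grub-mkconfig -o /boot/grub/grub.cfg

      Add your parameters to the GRUB_CMDLINE_LINUX_DEFAULT line (the example value above reuses a parameter discussed later in this post), then regenerate the config as shown.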
      • iommu The behavior of this kernel command line value depends entirely on your distribution. Most honor the default, which is to enable it. Note, incorrect settings can cause your system to crash during kernel boot.

        Intel AND AMD systems: iommu=on is usually enabled by default.

        If you're sure that you have IOMMU enabled in your BIOS but you're not seeing it mentioned in dmesg, first confirm that your kernel was built with IOMMU support.

        (For example, in my distribution of Gentoo, I was able to confirm it this way):

        mars ~ # grep IOMMU /etc/kernels/kernel-config-x86_64-3.14.4-ck
        CONFIG_GART_IOMMU=y
        # CONFIG_CALGARY_IOMMU is not set
        CONFIG_IOMMU_HELPER=y
        CONFIG_IOMMU_API=y
        CONFIG_IOMMU_SUPPORT=y
        CONFIG_AMD_IOMMU=y
        CONFIG_AMD_IOMMU_STATS=y
        CONFIG_AMD_IOMMU_V2=y
        CONFIG_INTEL_IOMMU=y
        # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
        CONFIG_INTEL_IOMMU_FLOPPY_WA=y
        # CONFIG_IOMMU_DEBUG is not set
        # CONFIG_IOMMU_STRESS is not set

        (Your distribution may have this file located in /boot instead and under another name, often as /boot/config-KERNELVERSIONHERE (replace KERNELVERSIONHERE with the version on your system). If in doubt, just: "ls /boot/config*" If that also fails, some distributions build it into the proc filesystem. Check to see if /proc/config.gz exists. If it does, just do a: "zgrep IOMMU /proc/config.gz")

        The CONFIG_AMD_IOMMU and CONFIG_INTEL_IOMMU lines (bolded in the original post) are the relevant ones here.

        If you're not sure which values are supported, make sure you have the kernel sources installed for your distribution and check this file (it may vary depending on your distribution): /usr/src/linux/Documentation/kernel-parameters.txt

        Another possible value you can use, especially if you know your vendor (ASUS, for example) enables IOMMU by default but doesn't give you the option to change how much memory is allocated to it:

        iommu=memaper=3

        Make sure you enter it exactly like that. A value of 3 means 256MB for the IOMMU aperture, 2 means 128MB, and 1 means 64MB.

        Verify it's right in the dmesg:

        [ 0.000000] Your BIOS doesn't leave a aperture memory hole
        [ 0.000000] Please enable the IOMMU option in the BIOS setup
        [ 0.000000] This costs you 256 MB of RAM
        [ 0.000000] Mapping aperture over 262144 KB of RAM @ 20000000

        [ 0.818077] PCI-DMA: Disabling AGP.
        [ 0.818526] PCI-DMA: aperture base @ 20000000 size 262144 KB
        [ 0.818644] PCI-DMA: using GART IOMMU.
        [ 0.818740] PCI-DMA: Reserving 256MB of IOMMU area in the AGP aperture
    • Video
      • Video drivers - One of the most common ways to improve your performance is to use the vendor-supplied drivers. For both of the vendors listed here, make certain you follow the video driver manufacturer's instructions on how to properly install them and how to include that configuration for X.

        Note: Whatever your preferences for video cards are, there is one truth at the moment: Nvidia has the best 'Unix' experience for video card configuration and drivers (and has forums dedicated to Linux/FreeBSD/Solaris questions with respect to their drivers), regardless of your distribution or Unix flavor. As such, it is my experience and strongest recommendation that if you use a video card for gaming under Wine, use Nvidia (GeForce in particular is their flagship line for gaming).
        • ATI - fglrx drivers are available for Linux, depending on your hardware and Linux distribution. Generally, consult with your distribution's documentation to determine the best method of installing them for your system.

          Sometimes, the distribution (as is the case with most Redhat/Fedora based systems) tells you to obtain the drivers from the website directly.

          Other distributions, like Gentoo, will package them for you. (Either edit your /etc/make.conf to include VIDEO_CARDS="fglrx" and/or emerge ati-drivers)
        • Nvidia - Nvidia drivers are also available for Linux. Consult your distribution's documentation to determine the best method of installing them for your system.

          Sometimes, the distribution (as is the case with most Redhat/Fedora based systems) tells you to obtain the drivers from the website directly.

          Other distributions, like Gentoo, will package them for you. (Either edit your /etc/make.conf to include VIDEO_CARDS="nvidia" and/or emerge nvidia-drivers)
      • Video settings
        • OpenGL (xorg/X11 video drivers) Make certain you have OpenGL enabled for your platform before starting X, referring, of course, to your distribution's documentation on this. (Gentoo users: making sure 'eselect opengl list' shows nvidia as active will help.) In most cases, OpenGL is otherwise enabled by default.
        • DirectDrawRenderer (wine application setting) - The other side to the opengl configuration is, oddly enough, going to depend on your card and your video drivers. Some users find that GDI (now the current standard in Wine) is better for them (as I do now). Others will find that "opengl" is a better choice in wine.
          • GDI for the directdraw renderer method, in this context, is the wine implementation of their version of 'DirectX'. It's not to say it is replacing DirectX (there's no native platform support for DirectX on Unix). It's a new implementation from the ground up. As such, you'll find it is sometimes missing features you'd have in a native Windows environment. Not to worry though, this is rarely permanent.
          • OpenGL for the directdraw renderer method, in this context, means to use OpenGL natively. This is usually a 'safe' choice but has a few drawbacks. While it can be faster in some cases, it is weaker and slower in others.

          The choice is largely up to you and what your hardware supports. I'd say go with the default (opengl) first. If that doesn't work for you, try gdi in your wine settings (winetricks handles this - see the example at the end of this post). Go with the one that works best for your situation. At the moment, opengl appears to be winning for STO.
        • Multisampling: Off - If you have the option in your video card settings for X11/xorg to turn this off, do so. It can sometimes introduce graphical anomalies and usually also incurs a performance penalty. Even on a top-of-the-line video card, this generally isn't worth the headache.
        • Composite: Off - Unless you're using compiz or a similar window manager that has 3D effects, there's little reason to use this otherwise. It can significantly drop your performance if enabled (if you have it enabled but do NOT have a 3D window manager {e.g. you use metacity instead which is default for Gnome}, don't worry, it's not going to harm your performance in that case).
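
        Regarding the DirectDrawRenderer setting mentioned above: a minimal sketch of switching it with winetricks (this assumes winetricks is installed and operates on your default Wine prefix; the ddr= names are winetricks' own setting names):

        $ winetricks ddr=gdi
        $ winetricks ddr=opengl

        The same value can also be set by hand in the Wine registry under HKEY_CURRENT_USER\Software\Wine\Direct3D (value name: DirectDrawRenderer), but winetricks is the easier route.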
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    One easy way to improve your game loading is to cache constant - or, more accurately, predictable - reads from your hard drive. Some distributions do this nicely, and some make one's eyes cross trying to determine why the setup is the way it is.
    • Disk Access
      • Readahead - As the name implies, this is for a buffer that reads ahead of your normal requests and caches the information in memory. There are various methods for doing this.
        • Services - Programs you can run via 'services' to achieve improved reads.
          • readahead-early or readahead - This service is more for the system bootup rather than the game. It improves your bootup speed by placing frequently used files during the bootup process in a cache. Impact to the game: none.
          • readahead-later - This service is for anything after initial bootup. This is what you want to enable if your distribution supports it. It may also be simply called 'readahead'. Impact to the game: Good/Ideal.
          • prelink - Some distributions have this as a service, some have it as a cron job. Others have it running every time a new program is installed. What it does is link (or add) libraries a program requires within the program itself. It effectively cuts down the need to read various files all over the disk when a program starts. Impact to the game: initial execution of the game is faster.
          • preload - Most distributions don't include this by default, but it's a neat little service. This one, unlike the others, analyzes your read patterns over time and loads in advance the most frequently used files you access on your system. This way, the data is already in memory by the time you go to read the disk. Impact to the game: Good/ideal. (A sketch of enabling these services follows this list.)
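
          Enabling these services varies widely by distribution and init system, so the package and service names here are examples that may not match yours. A rough sketch for a Debian/Ubuntu-style system of the era:

          # apt-get install preload prelink
          # service preload start

          Redhat-style systems typically use "chkconfig <service> on" to enable a service at boot and "service <service> start" to start it immediately.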
        • Buffers - While the name is slightly less intuitive, the goal here is to do the caching at the filesystem or partition level rather than at the application level (which is what the services above do). Depending on your filesystem layout, there are a few options.

          Most of these buffers are specified in 512-byte sectors. For edification, 1024 bytes = 1k and 512 bytes = 0.5k, so you have to double your kilobyte value to get the right sector count. By default, most read-ahead buffers use 128k (256 sectors). So, 4M would be 8192 sectors (4096 * 2).

          In order to determine what the optimal usage for your system is, I'd recommend the Phoronix Test Suite (google that). There are performance benchmarks you can do that will help you figure out what ideal usage is. For most purposes, I found a 4M readahead buffer (per stripe group) to be ideal (mdadm apparently shares this belief).
          • LVM - Logical Volume Manager - Most Redhat derivatives these days use LVM, so changing the value here is pretty easy; determining the optimal value for your usage is another story. Once you've settled on the best value for your system, first check what the current one is. To do this with LVM: # lvdisplay

            Locate the volume you use wine on. On this test CentOS (redhat derivative) system, I will give you an example:

            [root@hivemind ~]# mount | grep Vol
            /dev/mapper/VolGroup00-LogVol00 on / type ext4 (rw,noatime)

            [root@hivemind ~]# lvdisplay /dev/mapper/VolGroup00-LogVol00 | grep Read
            Read ahead sectors 256

            Note where it says "Read ahead sectors". Remember, this is again in 512 byte increments. So, 256 / 2 = 128k. That's not ideal at all. To change it immediately and make this change persistent on reboot, it's one command:

            [root@hivemind ~]# lvchange -r 8192 /dev/VolGroup00/LogVol00
            Logical volume "LogVol00" changed

            Now, check it again:

            [root@hivemind ~]# lvdisplay /dev/mapper/VolGroup00-LogVol00 | grep Read
            Read ahead sectors 8192

            That's it. You're done.
          • md software raid - Sometimes, folks (like myself) use software raid. And, sometimes, distributions properly set the raids up and sometimes they don't. While the raids function, they don't necessarily function optimally. In fact, the default chunk size for an md raid is 64k (or 128). As you can imagine, this is pretty crappy with modern hardware.

            Most distributions will at least bump this value, if you use the automated partitioning/install tools, to 128k. Still, quite crappy. Ideal benchmarks suggest one of two possible values are useful: 256k (for smaller files) or 1M (for larger files - like the game uses). Unfortunately, newer distributions can also place the value right in the dead zone for performance: 512k. So, you have a few options on combatting this.

            You can A.) reboot to distribution rescue cd, backup your file system, unmount the filesystem, stop the md device, zero the superblocks on the partitions, create a new md device with the right chunk value, install a new file system on the recreated raid partition, restore the filesystem, adjust appropriate config files, and reboot normally (it's a pain!) or B.) go with a hack to work around the problem and deal with it the next time you setup your system.

            For most folks, I recommend option B. It isn't intrusive and it is pretty easy to do with a LOT less negative impact. In fact, you can find the value that works for you. If you're not happy, you can just set it back to the original value and no harm done.

            To find out what the readahead is for your md raid partition: # blockdev --getra /dev/md2 (Replace /dev/md2 with your raid partition!)

            [root@unimatrix-01 ~]# blockdev --getra /dev/md0
            256

            That's less than ideal. You can change the readahead on the fly without a reboot, but note the change is NOT permanent. If you find a value that works for you, you'll need to put this in a startup script (a sketch for persisting it follows the example below).

            [root@unimatrix-01 ~]# blockdev --setra 8192 /dev/md0
            [root@unimatrix-01 ~]# blockdev --getra /dev/md0
            8192

            Voila. 4M readahead.
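
            To persist a value you're happy with, one minimal approach (assuming your distribution still runs /etc/rc.local at boot, and that /dev/md0 is your device) is to add the line below to that file, before any final 'exit 0':

            blockdev --setra 8192 /dev/md0

            Distributions that no longer use rc.local generally accept the same command from a udev rule or a small init script instead.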
          • partition - If you don't use LVM or mdadm, you can still look at your readahead value and change it accordingly. Use the above section for blockdev's syntax. The same applies.

            [root@unimatrix-01 ~]# blockdev --getra /dev/sda3
            256

            [root@unimatrix-01 ~]# blockdev --setra 8192 /dev/sda3
            [root@unimatrix-01 ~]# blockdev --getra /dev/sda3
            8192
          • hardware raid - If you're using this option, you generally know where to look for the readahead values and shouldn't bother changing them within the OS. Note, however, if you're using an onboard raid chip on your motherboard, MOST of them are software raid, NOT hardware!
      • Raid Levels - This option isn't a service or an immediate setting you can change. This requires planning and consideration. I'll tell you the benefits and drawbacks of each. Make your own choice from there.
        • Raid 0 - Striping. This one has the benefit of being the fastest raid - but with the biggest penalty: This stores pieces of your file system distributed evenly across the number of drives in your system (and in the array). So, if you have 2 disks in Raid 0, this means the chunks go 1 on drive A, 2 on drive B, 3 on drive A, 4 on drive B. If a disk goes bad in this configuration, you have an incomplete filesystem and cannot recover from it. Only use this level if you do regular backups or don't care about the data.
        • Raid 1 - Mirroring. This one is slower on reads than raid 0, but faster reads than just a simple partition. (You can read from either side of the mirror.) However, whenever a change is made on disk A, it is immediately made on disk B. You get all the benefits of a backup and can usually substitute drive A for drive B if drive A goes bad. 2 disks are required to mirror. But you can function if 1 of the 2 is bad/offline.
        • Raid 5 - Striping and Parity. This one is slower than 0 and 1, but has the benefits of being able to reconstruct the array if one of the disks goes bad. Ideally, you want Raid 5 to consist of 3 active drives and 1 hot spare. While Raid 5 can continue to function if 1 of the 3 active drives goes bad, you will want a replacement disk to be on hot standby to repair itself.
        • Raid 0+1 - Striping and Mirroring. This one has the benefits of raid 0 and raid 1. However, unlike raid 0 and raid 1, it requires a minimum of 4 for optimal configuration. If 1 of the 4 drives goes bad, you can usually replace it and rebuild the array. In fact, you can lose 1 drive on each side of the mirror and still function. (So, you can lose 50% of your drives in this configuration) It works like this: (Disk Group 1 (Disk A and Disk B stripe)) <-- mirrors --> (Disk Group 2 (Disk C and Disk D stripe)).
        • Raid 1+0 (or Raid 10) - Mirroring and Striping. Like Raid 0+1, this has the benefits of raid 0 and raid 1, but differs from 0+1 in a key place. Note, just like 0+1, you can lose 1 drive on each side of the mirror and still function. It works like this (Disk Group 1 (Disk A and Disk B mirror)) <-- Stripe --> (Disk Group 2 (Disk C and Disk D mirror)). As you can see, it looks similar, but slightly different, compared to raid 0+1. Most benchmarking suggests this is the best performance and redundancy. (This is my personal preference, but Raid 5 is also valid with less performance if you prefer)
        • JBOD - Concatenated disks. This means you go in sequence through all the blocks on Disk A before using blocks on Disk B. This is probably just as bad as the Raid 0 option in terms of redundancy/failures. There is no redundancy and a very hard road ahead for recovery of your file system. Only use this if you have regular backups and don't care about the data. This has no performance improvements on read or write.
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    One of the lesser tried performance tweaks involves the kernel (and this is different from the kernel parameters section for a good reason). In fact, using a different kernel entirely might be able to help you. Check with your distribution to see if it supports the ck kernel. (Gentoo users, ck-sources has the ~amd64 keyword, so you can unmask and try accordingly.)

    Reference URLs: http://en.wikipedia.org/wiki/Con_Kolivas and http://users.on.net/~ckolivas/kernel/

    This tends to make the system a bit more responsive under heavier loads and/or video.

    ---

    There is one other option outside of the kernel: schedtool. Sometimes distributions include this and some do not. This particular tool has various methods for scheduling a program. Indeed, the documentation suggests some applications benefit more or less from different levels of priorities. Also note, you should not use this system-wide. Only use it on applications you want to make more interactive, or less. (batch/cron jobs, for example, require a lot less interactivity.)

    Interactive is used for gaming, hd movies, etc. For this reason, I recommend using Interactive mode for Wine. Example usage: schedtool -I -e wine Star\ Trek\ Online.exe

    In my tests, it improved glxgears scores about 10-15%. And, of course, Star Trek Online felt fairly responsive as well.

    --

    Filesystems are also a consideration. You might not think so, but reading and writing files can incur penalties depending on the filesystem you use. For the purposes of this documentation, I will color the higher performance ones which are generally stable and available in most distributions.

    Note #1: All filesystems (regardless of OS) are subject to fragmentation. Unix (and by extension the filesystem) does a pretty good job of managing this most of the time. However, it is not immune to the problem - even though it does occur much less frequently and certainly less noticeably so than a comparable Windows environment.

    Note #2: On systems with SSD drives, you will not want to defragment with xfs. This is not a limitation of the filesystem (as it will do that just fine). More to the point, SSD drives generally do NOT need to be defragmented as it does not improve their performance and only shortens the life of those drives. Magnetic/standard rotational disks, however, will see measurable and perceivable benefit from defragmentation as the technology is significantly different.
    • ext2 This is the antiquated version of the extended filesystem and does not have journaling to let the filesystem know where it last wrote, so reconstruction and repair can get quite messy. In addition, the performance is only marginally better than ext3. No distribution in the last 4-6 years uses this by default.
    • ext3 This is the previous generation of extended filesystem for Linux. It is generally recommended that you upgrade to ext4 if possible. See your distribution documentation for more information.
    • ext4 This is the latest generation of the extended filesystem for Linux. Because most distributions allow you to boot from it and the performance is rather competitive (as well as having the journaling of ext3), it is recommended you use this one. Ext4 also can detect raid striping and adjust parameters accordingly during filesystem creation for better performance.
    • reiserfs3 This is the current stable generation of the reiser filesystem. It has some performance options and is okay, but many distributions don't allow it by default or during the installation process. In fact, you sometimes have to install it manually.
    • reiserfs4 This is the latest generation of the reiser filesystem. It is not enabled by default in most distributions as it is currently considered unstable.
    • jfs This is a high performance filesystem. It, like reiserfs3, is not included by default in many distributions. However, it is quite robust and resilient and has journaling. Because installers do not allow you to select this option during installation, it is currently marked as a 'yellow' severity.
    • xfs This is a high performance filesystem. In addition to journaling, many distributions also include the xfs_fsr program (sometimes in the xfsdump suite) that allows you to defragment your filesystem without taking the system offline and/or booting from CD/DVD. Note, however, if you plan to use this on Redhat derivatives (Fedora, CentOS, Scientific Linux, Mandriva, Mandrake, RHEL, etc), you'll want to create a separate partition for /home and use an xfs filesystem there. Redhat derivatives do not currently support ROOT and BOOT partitions as xfs, but any other partition can be xfs if you like. Like ext4, xfs can detect raid stripe settings for better performance during filesystem creation.
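
    As a side note for xfs users, a hedged example of running the defragmenter mentioned above (the mount point is hypothetical; -v just makes it verbose):

    # xfs_fsr -v /home

    Run with no arguments, xfs_fsr instead walks all mounted xfs filesystems for a bounded amount of time.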

    --

    Another place you can improve your performance in a very small way, but appreciable nonetheless, is through compiler options. I am not going to propose or post any 'aggressive' optimizations. I am going to post 'what works' and tell you the caveats.

    All distributions share one common flag for stability: -O2

    Do not use -O3 (or higher - while gcc accepts higher numbers, it stops at -O3) for any reason as it contains aggressive and dangerous flags that will break some applications horribly (even if they compile OK) and in unexpected ways.

    If you want to compile a package to make certain the instruction sets are optimized for your CPU, there are two flags (and 3 if you like) you can use. But before addressing them, I'll explain why they're present and why distributions don't use them in general. Every distribution tries to make architecture-specific packages. For the sake of 64 bit systems, I'll say amd64 is the arch (and it is for 64 bit systems - even Intel), so they compile the binaries to work on the widest possible range of CPUs in that architecture. So, -mtune=generic is used during compilation; these are optimizations that work on all platforms in that architecture.

    This can miss some key optimizations both for 32 and 64 bit binaries/libraries. While not earth shattering when done on a single package, when done system-wide, it's a noticeable improvement when you can use optimizations native to your system. However, doing so is extremely tedious from an end-user perspective. And distributions out there do not like to create branchpoints and downloads for every possible cpu combination gcc provides an option to optimize for.

    In addition to this, once you optimize your programs for your CPU type, they're generally not compatible on older models/classes of your processor or that of the competitor. So, if you optimize programs for an AMD Phenom II, for example, you cannot use them on an Intel i7 or even an AMD FX processor. You could, however, use it on any of the Barcelona class AMD processors (or what they consider amdfam10 in gcc).

    So, to compile or optimize for processor instructions that are available on your class CPU: -march=native -mtune=native. If you're compiling for a 32 bit binary, add -m32. For a 64 bit binary -m64. Note, you should generally leave it up to your distribution to determine how to compile for 32bit vs 64bit. (Gentoo users, just put your march and mtune options in the /etc/make.conf, emerge will take care of determining whether to make a binary 32 or 64 bit. Leave the m32 and m64 options out.)
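
    To make the Gentoo case concrete, a minimal sketch of the relevant make.conf lines (a sketch only; the exact file location and any additional flags depend on your setup):

    # /etc/make.conf (or /etc/portage/make.conf on newer installs)
    CFLAGS="-O2 -march=native -mtune=native -pipe"
    CXXFLAGS="${CFLAGS}"

    -pipe is not an optimization flag; it just tells gcc to use pipes between compilation stages instead of temporary files.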

    --

    Another less-considered option for SSD drives is to change your I/O scheduler. Note, this only applies if you have SSD drives. Doing this on standard rotational / magnetic drives will make your performance worse.

    You can change your I/O scheduler to either deadline or noop. Right now, there's something of a discussion going on as to which is better. Some environments suggest one is better than the other. Either way, BOTH are significantly better than CFQ (the current default scheduler in most distributions). I currently use BFQ (as I use the CFS kernel). However, this should be relatively the same performance-wise as those two options. If you're not sure what you're using, do this to find out:

    (For /dev/sda, for example):

    mars ~# cat /sys/block/sda/queue/scheduler
    noop deadline [bfq]

    That says that bfq is my default scheduler. As I only use SSD drives in my system, it didn't make sense to compile anything else in. Your kernel and distribution may not give you the option. However, if you do see deadline or noop in that scheduler list, you can do this to enable it:

    mars ~# echo "deadline" > /sys/block/sda/queue/scheduler

    (Do that for each SSD device you have in your system, sdb, sdc, sdd, etc)

    To make it persistent upon boot (only if you can confirm that this has indeed helped you), add the following argument to your grub kernel line: "elevator=deadline" When you reboot, this will by default use the deadline scheduler.

    If you're not sure which is better, set it via the echo command as listed above. Then, run a benchmark - as well as use it in practice when playing the game and/or using your system. If it feels better and more responsive, great. If not, try another one. Generally speaking, deadline and noop should always be better ONLY for SSD drives.
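
    If you would rather have the scheduler applied automatically to every non-rotational disk instead of globally via the kernel command line, a udev rule is another option. A sketch (the rule filename is arbitrary, and your kernel must actually offer the deadline scheduler):

    # /etc/udev/rules.d/60-ssd-scheduler.rules
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"

    The rotational attribute is 0 for SSDs, so rotating disks keep whatever scheduler the kernel chose for them.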

    --

    And lastly, if you use the nvidia drivers for Linux and have the GL shader disk cache option enabled (most distributions use it by default), check to see if you have a ".nv" folder in your home directory. If you do, it can hit your hard drive(s) and/or raid arrays each time it writes or reads these cached compiled-shader data files from disk. While the cache is intended to be faster, disk I/O is almost never faster than memory. So, I did something more interesting on my system. Since I'm the only one who uses my system, this only applies to me. If you have other people/accounts using X11 on your system, then you should decide whether you want to do this for every one of them or just one.

    mars ~ # tail -1 /etc/fstab
    tmpfs /home/janeway/.nv tmpfs norelatime,uid=1000,gid=1000,size=200M,nosuid,mode=0700 1 2

    As you can see, I mount /home/janeway/.nv from fstab using the tmpfs filesystem. This stores the contents in memory, rather than on disk, in the default path that the nvidia drivers look to use. However, unlike before, this is no longer hitting my hard drive(s) and/or raid partitions each time it either writes to the directory or reads from it. Instead, this is all in memory - and has the benefit of resetting everytime my system reboots (which I think is a sane choice).

    mars ~ # df -h /home/janeway/.nv
    Filesystem Size Used Avail Use% Mounted on
    tmpfs 200M 45M 156M 23% /home/janeway/.nv

    (If you use a distribution that can't do this except after a reboot, make the changes to /etc/fstab and reboot. But make sure you originally had a ~/.nv directory first - and make sure the uid/gid values apply to you as listed in the above example. If you don't know what your uid or gid are, that is outside the scope of this documentation - if you don't know, I can't tell you.)
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    Factors to consider: When given a choice by your distribution, choose the 32 bit version of wine for STO. It may seem counter-intuitive on a 64 bit OS - indeed, my system is 64 bit - but there are a few reasons why you should choose the 32 bit version where possible.

    STO (like most games) is currently released as a 32 bit binary and installer. Wine comes in 2 versions: 64 bit and 32 bit. Since the game only ships 32-bit software at the moment, either version will run it; however, I am informed that 32-bit wine has the fewest installation problems.
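
    A minimal sketch of creating a fresh 32-bit prefix for the game (the prefix path here is arbitrary - pick whatever you like):

    $ WINEARCH=win32 WINEPREFIX=$HOME/.wine-sto wineboot

    Export or prepend the same WINEPREFIX whenever you run the launcher so it keeps using that prefix.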

    --

    One frequently asked question, particularly with regards to games of the 3D variety is "why does the game crash after a few hours of playing?" The answer is multi-fold. Your video settings play into this. Max settings will speed this up and it will occur more frequently. But, here is the answer in parts:

    First, the STO binary in and of itself is not large address aware. Which is to say that even though you have a 64 bit OS, the game runs in a 32 bit space - and is restricted to 4G of RAM (more accurately, about 3.2 - 3.5). Even if you have 16G of system memory, your game will never use more than 3.5-4G. It will run into an invisible wall with respect to performance if you have your settings turned up at some point or another. Note, STO is not the only game that suffers from this. Believe it or not, WoW (and a lot of other 3D heavy MMOs) does as well.

    The solution is for STO (and similar games) to eventually make the games large address aware - with the caveat earlier OSes (like WinXP) will probably not be able to run STO. For the purposes of this documentation, I will not include methods which will alter the binary. I won't include material that will violate the EULA (don't ask in game or via mail either, I will not respond (except maybe to laugh)).

    Second, which plays into the first reason, the game continues to use more memory over the course of time. This happens every time you change zones and/or encounter more people/objects (NPCs/etc) in a zone. The higher your graphics settings and the more objects around you to buffer, the more memory the game uses.

    Windows 7 gets around this by managing its memory differently - a hack to make 32 bit binaries compatible - which doesn't honor the way 32bit and 64bit binaries should behave in general. It's not necessarily a problem wine is creating so much as honoring the 'right'[tm] way 32bit vs 64bit should work.
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    There are three ways of approaching video settings in game. You can start with the defaults and adjust, set everything to maximum and pare down, or start with the minimum settings and work upward until you find a happy spot for your settings.

    My personal preference is to go with the "start with the defaults" option and work from there. Game developers are pretty smart in their choice of "recommended" settings; they assume middle-of-the-road system setups. If you know your hardware and setup are better than average, you can tweak upward from there. Conversely, if you know your setup is less than ideal, you can pare your settings down from the defaults.

    As such, my recommendation is to go with the recommended default settings for video. With a few additional rules.
    • Max Shadowed Lights - For some reason, GDI doesn't seem to do well with more than 3 or 4. The maximum setting causes weird issues on ground missions where explosions from mines and grenades make the screen flicker black around the detonation point. So, 3 shadows is about 'right', 4 seems to be pushing it, and 5 is usually very bad. And, of course, the opengl DirectDrawRenderer method tends not to be able to show shadows at all.
    • Framerate Stabilizer - This should be set to On. It tends to work the best.
    • Auto-stabilize Framerate - This should be set to On. It tends to work the best.

    If you prefer a barebones approach, this may help: STO Performance and Frame Rate Guide v1.0 . The recommended settings in this link will pare the game down to the lowest possible values and make it otherwise playable for you.

    If you prefer a maximum settings approach, I still recommend my previous list above - with the other settings at maximum. The issue is with shadowed lights and frame stabilization. Middle of the road or maximum settings share those 3 options. The rest is flexible.
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    In the Disk Access post above, I mentioned very basic numbers on how to improve the readahead buffer size for your disks (whether through LVM, mdadm, or direct file system access). Now, I'm going to go into this a bit further to explain how to derive some of the numbers. Again, if you find a different value works better for you, use it. This is a guide for general purpose use with respect to the game.

    As mentioned in the previous post, a 4M readahead buffer (per stripe group) works best in general. The caveat in parentheses may confuse some, so I'll explain it a little better here. Note, however, that the actual value may depend on the stripe size of your raid. But first, a word for the LVM and flat partition users.
    • Disk types
      • LVM and Flat Partition - Users of these types typically don't have raids associated with them. This is usually because LVM can't contain the /boot partition.

        The exception to this rule is those folks using on-board raid chips on their motherboard. You're using a software raid at that point (sorry, but it's sadly true). It then causes the kernel and installation method to choose a 'dmraid' (device mapper) which then creates a software version of raid - just as if you had done it manually - for you. So, if your on-board raid chip is set to do any level of raid, you want to go to the mdadm section and read further.

        LVM and flat partition users, your life is somewhat easier. You don't have to worry about striping or chunk sizes. In your case, you'll just use a flat value of 8192 (or 4M) for your readahead. Noting, of course, that if you find a value that works better for your configuration, you can skip the rest of this post as it does not apply to you.
      • software raid users (mdadm OR onboard raid chips) - This section is fairly complicated. In order to determine the 'right value' for you, you're going to have to know a few things. * Onboard raid chip users, please check your onboard raid setup in BIOS and confirm what the chunk size is. ** mdadm users, you must select the chunk size at md device creation. If you don't know, check /proc/mdstat and it will tell you. By default, if not specified (and manually created) on the command line, it is 64k. (Which, as you know, is pretty awful.)
        • Raid level
        • Chunk size
        • Number of disks in the array you're looking at.
        The value of your readahead is going to change depending on your answers. As such, the following section describes useful values for you. (A small helper script at the end of this post ties the arithmetic together.)
        • Raid 1 - which is mirroring, has no striping at all. It just copies data from one disk (or raid partition) to the other as an identical copy. Whatever exists on one, exists on the other. Your value is easy too: 8192 (or 4M).
        • Raid 0 - this one is somewhat tricky. It depends on your chunk size. So, if you chose a chunk size of 256k (I think most onboard raid chipsets choose 64k or 128k at most) and have 4 disks in your raid 0:

          (256k * 4) = 1M
          1M = 1024k
          1024k * 2 = 2048 readahead

          If you chose 512k as your chunk size:

          (512k * 4) = 2M
          2M = 2048k
          2048k * 2 = 4096 readahead

          Typical onboard raid controllers have these as their values, however:

          64k value

          (64k * 4) = 256k
          256k * 2 = 512 readahead

          or 128k value

          (128k * 4) = 512k
          512k * 2 = 1024 readahead
        • Raid 5 - I haven't done any substantial testing on this to give a good answer as to how the values are calculated. However, if it follows the logic, it should be ((number of active disks - 1) * chunk size). I'll have to get some confirmation and update this post later.
        • Raid 10 - This one is fairly interesting. The formula is (stripes * mirrors * chunk size). In a 4 disk raid 10 configuration, assuming a 1024k chunk size (meaning that this has 4 mirrors and 2 stripes):

          1024k * 4 {mirrors} * 2 {stripes} = 8M
          8M readahead = 8192k
          8192k * 2 = 16384 readahead

          512k chunk and 4 disks:
          512k * 4 {mirrors} * 2 {stripes} = 4M
          4M readahead = 4096
          4096 * 2 = 8192 readahead

          256k chunk and 4 disks:
          256 * 4 {mirrors} * 2 {stripes} = 2M
          2048 * 2 = 4096 readahead

          128k chunk and 4 disks:
          128k * 4 {mirrors} * 2 {stripes} = 1M
          1024 * 2 = 2048 readahead

          64k chunk and 4 disks:
          64k * 4 {mirrors} * 2 {stripes} = 512k
          512k * 2 = 1024 readahead
          • Layouts for raid 10. If you decide to use this raid, you also have an option (if you select the mdadm choice to create your raid) of layout. There is a 'near' method (used by Redhat distributions) by default and a 'far' method. I'll explain what that means.
            • near - This method means that your disks will write data at identical places on each of the mirrors. Essentially, they will be in identical spots. This has a benefit of making it easier to 'dd' a disk partition while it's offline and copying it. It also could potentially be bad if your disks have a bad spot in exactly the same place on each disk.
            • far - This method means that your disks will write data at offset places on each of the mirrors. It makes copying the layout via 'dd' a bad idea. (Then again, you should be using mdadm to reconstruct your arrays anyway.) This has the benefit of making disk seeks slightly different on each of the mirrors and reducing the chance you could run into identical spots on two different drives that are bad (noting, of course, that this is only typical if you receive drives from the same manufacturer in the same batch - which can happen if you buy them all at the same time from the same vendor).
        • Raid 0+1 works similarly, in fact. The numbers are slightly different. Instead of 4 mirrors, you have 4 stripes and 2 mirrors. But, the math works out the same:

          1024k * 2 {mirrors} * 4 {stripes} = 8M
          8M readahead = 8192k
          8192k * 2 = 16384 readahead

          512k chunk and 4 disks:
          512k * 2 {mirrors} * 4 {stripes} = 4M
          4M readahead = 4096
          4096 * 2 = 8192 readahead

          256k chunk and 4 disks:
          256 * 2 {mirrors} * 4 {stripes} = 2M
          2048 * 2 = 4096 readahead

          128k chunk and 4 disks:
          128k * 2 {mirrors} * 4 {stripes} = 1M
          1024 * 2 = 2048 readahead

          64k chunk and 4 disks:
          64k * 2 {mirrors} * 4 {stripes} = 512k
          512k * 2 = 1024 readahead
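
    To tie the arithmetic above together, here is a tiny helper sketch (plain POSIX shell; the numbers you feed it are your own chunk size in KB and the multiplier for your raid level - number of disks for raid 0, mirrors * stripes for raid 10 / 0+1 - and it only prints the value you would hand to blockdev --setra):

    #!/bin/sh
    # usage: readahead.sh <chunk_kb> <multiplier>
    CHUNK_KB=${1:-512}
    MULTIPLIER=${2:-8}
    READAHEAD_KB=$((CHUNK_KB * MULTIPLIER))
    # blockdev --setra takes 512-byte sectors, so double the KB value
    echo "readahead: ${READAHEAD_KB}k = $((READAHEAD_KB * 2)) sectors"

    For example, "sh readahead.sh 512 8" prints 8192 sectors, matching the raid 10, 512k chunk, 4 disk case above.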
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    If your Linux distribution gives you the option to specify layout, keep in mind a few considerations:
    • How much redundancy do I need?
    • How much space do I need?
    • How important is speed vs. redundancy?
    • What is my primary use? (Gaming, development, just web browsing)
      • Development / databases - I'm not going to cover those since it's a fairly light usage system for the most part and partition layout is wildly different depending on which of those two you go with.
      • Desktop gaming (also for browsing)


    Raid considerations - As you can see in the previous post, different raid configurations require different settings. So, how do you choose what's best for you?
    • Simple - If you don't like complex configurations, then a carte blanche onboard raid configuration system-wide is probably a better choice for you. Less to manage, less to think about. While not as optimal, it's easier to remember.
    • Complex - This is if your need for speed is greater than your need for simplicity. I'll give you some possible options to consider and you can see how/which of these options to apply in your setup.

      For my system, I have four 1 Terabyte Samsung EVO SSD SATA drives (with 64M onboard cache). I didn't like the idea of using the onboard software RAID controller (and really still don't) because it doesn't let me choose configurations which might be a bit wiser for my usage. I'll explain why below.

      Linux distributions don't like /boot to be on any partition they can't read immediately, with minimal overhead and kernel options. So, how do I accomplish redundancy and still manage to give the kernel what it wants?

      Simple. I partitioned each of the 4 drives accordingly:

      partition 1 {boot} is a 500M partition on all 4 drives.
      partition 2 {swap} is a 6.5G partition on all 4 drives. (I'll get to why in a second)
      partition 3 {root} is a 250G partition on all 4 drives.
      partition 4 {home} is a roughly 0.7T partition on all 4 drives

      I then configured each partition as a raid autodetect (type: fd in fdisk).

      For partition 1, I wanted it to meet the Linux kernel requirements but still provide redundancy in the event of drive failure. So, I made partition 1 on all drives a raid 1 (mirroring) array. This means if I make changes to one drive, all of the others see it. Indeed, you can then install your boot loader on each of the disks and still boot your system with no additional configuration.

      Because the chunk size doesn't matter on the boot partition (it's fairly small anyway), I used:

      # mdadm -C /dev/md0 -l 1 -n 4 --bitmap=internal -e 0.90 /dev/sd[a-d]1

      Voila. If you're using Redhat derivatives (Fedora, CentOS, Scientific Linux, Mandriva, Mandrake), you can do this in the GUI by specifying your own custom layout during initial installation of the OS, creating each of the partitions as 'software raid' but with similar layouts to the above.

      Note: If you're using Redhat derivatives, make certain that you use ext4 (for newer releases) or ext3 (for older releases) on your boot partition as the filesystem. Since I don't use a redhat derivative, I chose xfs for mine. See the above post regarding filesystems and the "optimal" ones.

      For partition 2, this is the swap partition. I know a few people are scratching their heads going "why 6.5G"? That's easy. It's not going to be 6.5G in the end. With a swap partition, you don't want all of your I/O on one disk if you can avoid it, especially with a raid. And since this data is disposable (in this context, that means it doesn't matter if it becomes corrupt; you can always reformat it with zero loss to you), it made raid 0 the perfect choice for this configuration.

      # mdadm -C /dev/md1 -l 0 -n 4 -e 0.90 /dev/sd[a-d]2 --chunk=128k

      Now, when you go to use the swap space, it will spread I/O out across the disks rather than a single one. You'll thank yourself later for doing that. In addition, that 6.5G per drive becomes 26G in the end, because it's 6.5+6.5+6.5+6.5 in raid 0. So, instead of 6.5G of swap space, I actually have 26G (more than sufficient). (I used a 128k chunk because swap shouldn't be in large chunks.)
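
      For completeness, once md1 exists you would format and activate it as swap; a sketch (the fstab line is an example - adjust the device name to whatever your distribution prefers):

      # mkswap /dev/md1
      # swapon /dev/md1
      # echo '/dev/md1 none swap sw 0 0' >> /etc/fstab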

      For partition 3, this is the root partition. This is where most of your OS (but not your personal files) will sit. I decided that raid 10 was ideal. However, I didn't necessarily need massive chunks as I would with the home partition later. I also wanted it to be set up in 'near' mode (as mentioned in my previous advanced raid configuration post). So, I set it up like this:

      # mdadm -C /dev/md2 -l 10 -n 4 -p n2 --bitmap=internal --chunk=1024k -e 0.90 /dev/sd[a-d]3

      256k-512k is the sweet spot if you're using smaller files. On my system, I use a mix of small and large files in the root partition. Smaller chunks are better for that partition than the home partition, due to the large amount of smaller files I access frequently. Thus, the home partition has a larger chunk size.

      That 250G on the 4 drives in raid 10 config became 500G for the root partition.

      Note: If you're using Redhat derivatives, make certain that you use ext4 (for newer releases) or ext3 (for older releases) on your root partition as the filesystem. See the above post regarding filesystems and the "optimal" ones.

      For partition 4, this is the home partition. This is where most of your personal files will sit. I decided that raid 10 was also ideal here. However, unlike the root partition, I would have significantly larger files here. The .hogg files in this game are particularly huge and I use them frequently. Unlike the root partition, I wanted this one in 'far' mode for raid 10.

      # mdadm -C /dev/md3 -l 10 -n 4 -p f2 --bitmap=internal --chunk=2048k -e 0.90 /dev/sd[a-d]5

      Note, for all of these, I also used "--assume-clean" so that it wouldn't attempt to rebuild the array after I created it. Whether or not you use it is up to you. The 'bitmap' flag is important too (except in the case of md1, the swap partition - you don't care if data there is permanent). What it says is "if you need to rebuild, start from the stopping point kept in the metadata" when you stop and restart the raid array. This way, you don't have to rebuild the array from the start every time you stop it in the middle of a rebuild.

      That 0.7T partition became 1.4T for the home partition in raid 10.

      Note: You may use whatever filesystem is supported by your distribution for this. Either ext4 or xfs is an ideal candidate.

      As you can see from the configurations I made, I chose different optimizations and layouts depending on use. This allows me to run each partition at its optimal configuration without sacrificing performance for the uses they serve.
  • auntkathy Member Posts: 14 Arc User
    edited July 2014
    One of the truths about determining whether you 'need' to make performance improvements is knowing how to diagnose problems, determining what your current status is, and seeing just how the changes you make impact the system.

    So, this will cover a few useful and, to a greater extent, lesser known utilities. (If you're an expert Linux user, you already know this so I don't mean you.)

    Note, as with all of my instructions, whether or not a package or command is available for your distribution depends entirely on the distribution and/or whether or not you installed it. Research may be necessary on your part to determine what package, if any, is available for your particular distribution to obtain access to the commands.
    • Useful diagnostic tools
      • lsof As the name implies, this is "listing" "open files". (ls meaning list, of course). This helps you determine, for example, if a program has a series of files open on your system and whether or not that file is being read vs written to.

        For example, as root (you may need to run this by prefixing the command with 'sudo' depending on your distribution): # lsof | grep bash

        That will give you a list of files either opened by a bash process and/or any processes matching the word 'bash'.

        Use in STO's case: Determine if a program, like wine, has files open even though the program may not be on your screen. Example: Wine crashes but you can't start it up and don't see any error messages. Example #2: Wine seemingly stalls during the installation process of STO but is in fact running normally (during the directx phase of the install is a prime target).

      • grep effectively means 'grab' (so think of it as a filter for matching text.) You can also pass it the -i flag to tell it to do a case insensitive search like so: # lsof | grep -i bash

        You can also use grep on a file instead of grabbing data passed to it through stdin (often referred to in longer form as 'standard in'): # grep root /etc/passwd

        Use in STO's case: Filter through the results of lsof, as listed above, to make it easier to search for open files.

      • watch This program does pretty much what its name implies. It runs the command you tell it to at regular intervals (default of 2 seconds) and displays the results on your screen. (Control-C will break out of it.)

        Example usage: # watch "grep MHz /proc/cpuinfo"

        Use in STO's case: Use it to watch the files being opened, written to, and otherwise used during the installation process. (Not the only usage mind you, but it's a good one.) Used in combination with the above lsof and grep commands, this is particularly useful.
      • dmesg This command allows you to see 'diagnostic' messages printed by the kernel, either from when it booted or ones that have shown up since the system has been online.

        Example usage (which may or may not require root depending on your distribution): # dmesg

        In this example, I am only including a small snippet of dmesg on my system (as it's otherwise quite large):

        [ 0.000000] Linux version 2.6.36-ck-r5 (root@unimatrix-01) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #11 SMP Mon Mar 14 21:04:30 PDT 2011
        [ 0.000000] Command line: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/md2 amd_iommu=on iommu=memaper=3 hpet=disable
        [ 0.000000] BIOS-provided physical RAM map:
      • /proc/cpuinfo This file lets you know how many CPUs you have in your system, what the features are for them, what the current clock speed is, and what instruction sets are available. Note, in my example, my clock speed is fairly low since I have my system, when not gaming, set to 'ondemand' (Redhat derivatives can do the same thing through the 'cpuspeed' service):

        Example usage: # cat /proc/cpuinfo

        I'm only including a part of one of my 8 cpus just for viewing so you know what to look for:

        processor : 7
        vendor_id : AuthenticAMD
        cpu family : 21
        model : 1
        model name : AMD FX(tm)-8150 Eight-Core Processor
        stepping : 2
        microcode : 0x600063d
        cpu MHz : 1400.000
        cache size : 2048 KB
        physical id : 0
        siblings : 8
        core id : 7
        cpu cores : 4
        apicid : 23
        initial apicid : 7
        fpu : yes
        fpu_exception : yes
        cpuid level : 13
        wp : yes
        flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
        bogomips : 7224.48
        TLB size : 1536 4K pages
        clflush size : 64
        cache_alignment : 64
        address sizes : 48 bits physical, 48 bits virtual
        power management: ts ttp tm 100mhzsteps hwpstate cpb

        At the top, processor 7 means as such: All system CPUs start at 0 and count upward. So, consider 0 actually the 'first CPU'. In this case, 7 means the 8th CPU in my system.
      • glxgears This program is not always included on systems by default. Whether it is or not depends on your distribution and package selection. It is, however, usually a pretty good indicator of whether or not you have OpenGL graphics working on your system. Running it will usually report a greater number of frames than your system is capable of producing in game or in any 3D application.

        Note: If you get values higher than the refresh rate of your monitor, it does not actually mean you will get framerates that high. In fact, it means nothing useful to you at all. You'll want the framerate to match that of your monitor. In my case, that is 60FPS as my High Def LCD TV does 60FPS (sorry, even if it reported 15,000 or 1200, you'd never see that on ANY device in the world).

        janeway@mars ~ $ glxgears
        Running synchronized to the vertical refresh. The framerate should be
        approximately the same as the monitor refresh rate.
        298 frames in 5.0 seconds = 59.428 FPS
        301 frames in 5.0 seconds = 60.001 FPS
        300 frames in 5.0 seconds = 59.995 FPS
        301 frames in 5.0 seconds = 60.002 FPS
        301 frames in 5.0 seconds = 60.006 FPS
        300 frames in 5.0 seconds = 59.998 FPS
        ^C

        Looks good so far.

        As I mention in the video section, your CPU speed does have an impact on your framerate in OpenGL. glxgears is OpenGL - NOT DirectX so it is CPU bound and single threaded (which is to say it only runs on one processor).
      • glxinfo This command lets you see the OpenGL extensions available via your drivers. It is also useful for determining what your card(s) are, if you are uncertain.

        janeway@mars ~ $ glxinfo | grep renderer
        OpenGL renderer string: GeForce GTX 680/PCIe/SSE2
      • dmidecode This command lets you see information contained in your BIOS within Linux (without rebooting). Many distributions include this by default, many do not. Check with your distribution.
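
        Example usage (run as root; the -t baseboard filter is optional and simply narrows the output to the motherboard section shown below): # dmidecode -t baseboard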

        Taking a snippet out from the output:

        Handle 0x0002, DMI type 2, 15 bytes
        Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: M5A99FX PRO R2.0

        That is the correct board.
      • dstat This particular utility is a fair combination of a few others in one handy spot. This one is not traditionally included with most distributions, so check with yours to see where/how to acquire it.

        The default configuration shows total CPU usage, disk reads/writes in aggregate, network send and receives in aggregate, swap file paging (in and out), as well as system interrupt and waits. This is particularly useful if you're trying to determine what, if anything, is going on. A particularly good usage of this command is during the STO install (again during the DirectX portion of the install) where you're not sure what's going on because it seemingly stalls but hasn't.

        Example: # dstat
      • top This particular utility is the bread and butter of determining what is going on with your system. It is traditionally included with most Unix (and, by extension, Linux) distributions. Default configuration is somewhat hard on the eyes to sort through. Hitting z while in top (make sure this is a lowercase z) will change the colors for you if your terminal supports color. Capital z will allow you to customize the colors for your preferences.

        You can also sort by any of the columns - using the "?" question mark key for help on the syntax and keystrokes needed to do this.

        Note, some distributions allow you to save top settings. Though, don't necessarily count on it. Not all do.
      • htop This particular utility is the 'cooler' version of top. Unlike top, however, this is colorized by default and easier to intuit. CPUs are all shown with bars colored and graphed according to usage (blue bars are niced processes usage on CPU, red is system IO, and green is standard processes using the CPU). In addition, this program supports mouse clicks, unlike top, so you can click on an application and it will remain highlighted for ease of tracking.

        Like top, it can sort based on preference. Though, you don't need to fish through commands to do this. Just click on the column headers and it will sort accordingly. Like top, you can also kill and renice processes through this utility.
    • BIOS settings
      • IOMMU - Depending on your video card (PCIE cards, not AGP), this is particularly important to remember to have activated in your BIOS. Generally speaking (as is the case with some ASUS motherboards), if you do not see the option, do not have an AGP card (and use PCIE), it's probably turned on. However, you must verify this with your motherboard vendor's documentation. (Later, in the kernel parameters, I'll mention how you can also set this if you do NOT have the option in your BIOS)

      • AGP Aperture - Depending on your video card (AGP cards, not usually PCIE), this is particularly important to remember to have activated in your BIOS. Generally speaking, you want an aperture of about a minimum of 64MB (I'd say closer to 128MB or 256MB to be safe). Note, however, that this does somewhat eat into your system's RAM, so if you're short on RAM, you may need to make a choice of more RAM or more resources to your video card.