Mechanica on multiple processors

hammerpe said:
That is really cool. How did you create the graph? More
specifically how did you tie the number of threads to
each part of the analysis? Is it simply just cpu
time/elapsed time = # of threads being used?

The graph is from the *.stt file generated by Mechanica
for each job. It gives a CPU Time and an
Elapsed Time for each step - I've just divided the
two and plotted them up.

I take it the Opteron Quad Core doesn't
hyperthread?


Edited by: moriarty
 
I'm cool with having this thread hijacked.. I'm really
more interested in what are the important computer options
for Mechanica (and WF)
 
Here's the effect of SOLRAM on the benchmark - it just
hogs more memory with no performance improvement!

The *.pas file says "Minimum recommended solram for
direct solver : 606"

Can anyone shed any light on this behaviour?

View attachment 4580

WF5M040
Win7x64
Xeon DP5050 3GHz (1 processor, 2 cores, 4 threads)
RAM 11 GB PC2-5300
SATA2
Edited by: moriarty
 
Here's another datapoint on my new machine:

WF4 M100
Win7 64bit

Dell Precision T5500
MB Dell
North Bridge Intel 5520 Rev 22
South Bridge Intel 82801JR Rev 22
Single Processor 6-core Xeon X5680 3.33 GHz Westmere
6x32 KB L1 Data Cache
6x32 KB L1 Instructions Cache
6x256 KB L2 Cache
12.288 MB L3 Cache
12 GB DDR3-1333 PC3-10600 (1333 MHz) ECC

Elapsed Time 979 seconds
CPU Time 2355 seconds
Memory Usage 7740462 kb
Work Dir Disk Usage 11408209 kb

Interestingly, if I turn on Hyperthreading, I get:
Elapsed Time 958 seconds
CPU Time 4140 seconds
Memory Usage 698972 kb
Work Dir Disk Usage 11408209 kb
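
As a rough check, the CPU time / elapsed time ratio described earlier in the thread can be applied to the two runs above. This is only a sketch: the function name is made up for illustration, and the ratio is an average over the whole job, so it says nothing about which phase was multithreaded.

```python
def average_threads(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Rough average number of busy threads: CPU time divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# (CPU seconds, elapsed seconds) totals posted above for the 6-core Xeon X5680.
runs = {
    "hyperthreading off": (2355, 979),
    "hyperthreading on": (4140, 958),
}

for label, (cpu, elapsed) in runs.items():
    print(f"{label}: {average_threads(cpu, elapsed):.2f} threads on average")
# hyperthreading off: 2.41 threads on average
# hyperthreading on: 4.32 threads on average
```

With hyperthreading on, the ratio nearly doubles while the elapsed time barely moves, so the extra threads are accumulating CPU time without finishing the job noticeably sooner.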
 
moriarty said:
Can anyone shed any light on this
behaviour?

That seems to coincide with my attempt to speed up the
analysis by using a RAMdisk - it didn't affect the analysis
time.

I'm going to try again, now that I have more RAM to spare
 
antran7 said:
moriarty said:
Can anyone shed any light on this behaviour?

That seems to coincide with my attempt to speed up the
analysis by using a RAMdisk - it didn't affect the analysis
time.

I'm going to try again, now that I have more RAM to spare

Are you using a 64 bit OS? If so I fail to see how these could possibly speed up anything now that your OS can directly access all the RAM you can stuff in it. I think these are obsolete workarounds from when the OS could not take advantage of more than 2 or 4 gigs of RAM.
 
dr_gallup could be right - in WF5 the documentation in
HELP on this topic seems VERY obsolete - it talks about
machines with 128MB of RAM.


Maybe SOLRAM reserves RAM for the SOLVER Phase so that
other phases can't swamp it and cause SOLVER to page?

I guess if you are short of RAM you would want SOLVER to
be using most of it - this is the big computational phase
so you definitely don't want it paging.

In a modern machine the recommendations about setting
this to half the available RAM seem wrong unless you are
running a job requiring large amounts of RAM.
Edited by: moriarty
 
Hammerpe

Your graph is interesting - it exhibits different
multitasking behavior to my machine.

Maybe it is because it's a twin processor - can you turn
one off and try it again?

Maybe because it's AMD - anyone out there with twin
processor INTEL?

Maybe it has to do with Mechanica settings?
 
What are the settings for making it run on just one
processor?

You can tell Mechanica to use a set number of processors in
config.pro. Maybe you can turn a processor off in the BIOS?
Edited by: moriarty
 
I've gotten some nice increases in performance by upgrading the priority of the msengine.exe process in the task manager. I usually set it to 'high' or 'realtime', depending on what the privileges allow me.
 
I am going to run the benchmark on my new laptop and old laptop and post the results. However, all of my previous posts got deleted and I don't remember the exact Static Analysis Definition settings. I think it was set to multi-pass adaptive with a max polynomial order of 9, but I don't know what the percent convergence limits were set to or what it was converging on. Does anyone have this info? I posted a screenshot of this before, but like I said, somehow all my posts got deleted.
 
moriarty,

If you have a single-core processor with hyper-threading turned 'on', you should see no benefit; in fact it may be worse. The floating point math crunched by the matrix solvers for FEA can't use hyper-threading, or so I've read in an article on this topic (if I find it I will post it). I have true multiple processors on an x64 box and it does reduce run times. The performance is not as good as the way ANSYS implements multiple cores, but it does make a very noticeable difference. The type of problem is important as well - smaller problems will not show much improvement and slow hard drives will not help, as you and others have pointed out. Bump the solve RAM some and run the job on a true multi-core machine and see what you get.
 
I posted this in PlanetPTC a while back - here's a graph of
how a hex Core i7 multithreads in Mechanica - mainly in
Solver and some more in Post - and that's
about it. Room for plenty of improvement there.
 
And another demonstrating how Mechanica doesn't use
hyperthreading at all!

The machine was set up to run 1 core with and without
H/T, 2 cores ... 6 cores with and without H/T.

Blue lines - the elapsed time is worse with H/T than
without for all cores.
Pink lines - best possible performance; actual performance
(blue lines) becomes slightly less efficient with more
cores.
Red lines - CPU time, double for H/T, indicating
computational thrash.

4 cores looks like the best compromise.

View attachment 5065
 
I'll also include these comments off PlanetPTC from Tad Doxsee from PTC - he explains how SOLRAM works in WF5. It's the only time I've ever seen a description of this - see http://communities.ptc.com/message/162032#162032.

Hi All,

I've been reading this discussion and thought I'd try to clarify a few points.

Hyper-threading

First, concerning hyper-threading, Burt's graphs clearly show that there is no benefit to using hyper-threading. We found similar results through our own testing and therefore recommend that users do not use hyper-threading.

Parallel Processing

For very large models, the most time consuming part of a Mechanica analysis is solving the global stiffness matrix equations. For this part of the analysis, Mechanica uses, by default, all of the available CPU cores for multiprocessing, up to a limit of 64 cores. Today, there are a few other parts of the analysis where Mechanica uses multiple cores and we plan to expand multiprocessing to other parts of the analysis in the future.

RAM and solram

The biggest influences on performance are the amount of RAM in your machine and how that RAM is used by Mechanica.

The amount of memory that is used during an analysis depends on several factors, including the complexity of the model, the desired accuracy of the solution, and the type of analysis or design study you are running. You can see how much total memory an analysis takes by looking at the bottom of the Summary tab of the Run Status dialog. The line you're looking for looks like this:

Maximum Memory Usage (kilobytes): XXXX

If the maximum memory usage of Mechanica plus the memory used by the OS and the other applications exceeds the amount of RAM in your machine, then the operating system (OS) will swap data between RAM and the hard disk, which seriously degrades the performance of your applications. Thus, to achieve maximum performance, you want to make sure that the maximum memory usage is less than the amount of RAM in your machine.

For very large models, the thing that requires the most memory during an analysis is the global stiffness matrix. You can see how big the global stiffness matrix is by looking on the Checkpoints tab of the Run Status dialog box (also in the .pas file in the study directory). The line you're looking for is

Size of global matrix profile (mb):

Mechanica allows you to limit the amount of memory that the global stiffness matrix will consume by setting the Memory Allocation field in the Solver Settings area of the Run Settings dialog. We often call this Memory Allocation setting "solram". With this setting, you allocate a fixed amount of memory in which to hold slices of the global stiffness matrix that the linear equation solver works with at any one time. If the global stiffness matrix is too big to fit in solram, then Mechanica will swap part of the matrix back and forth between disk and RAM using a specialized swapping algorithm that is more efficient than the general swapping algorithm used by the OS.

To explain these concepts in more detail, I describe three different scenarios of how Mechanica uses memory during an analysis.

Scenario I

Mechanica runs most efficiently when the entire global stiffness matrix fits in solram and when the total memory used by Mechanica fits in RAM.

For example, suppose you have a machine with 4 GB of RAM and 4 GB of disk allocated to swap space. You run an analysis which needs 1 GB for the global stiffness matrix, K, and 2 GB for everything else, which I'll call DB. If you set solram to 1.5 GB, then, ignoring the RAM used by the operating system and other applications, the memory usage looks like this.


Available:
|--------RAM----------------|------------SWAP-----------|

Used by Mechanica:

******DB*****(####K####----)           Ideal
             solram

DB + solram < RAM    good (no OS swapping)
K < solram           good (no matrix eqn swapping)

In the above, the memory used by DB is shown as ****, the memory used by K is shown as ###, and the memory allocated to solram is inside parentheses (###--). Because K is smaller than solram, there is some memory that is allocated to solram that is unused, shown as ---. This is the ideal situation because K < solram and DB + solram < RAM and hence, no swapping will occur.

Scenario II

The next most efficient scenario is when the entire amount of memory used by Mechanica still fits in RAM, but the global stiffness matrix does not fit in solram.


Available:
|--------RAM----------------|------------SWAP-----------|

Used by Mechanica:

*****DB******(####K#)###
             solram

DB + solram < RAM    good (no OS swapping)
K > solram           not so good (matrix eqns will be swapped)

In this case, the part of K which does not fit in solram, shown above as ###, will be swapped to disk with specialized, efficient Mechanica code.

In this scenario, the size of solram has some, but not a large, effect on the performance of the analysis. In general, the larger solram is, the faster the global stiffness matrix equations will be solved, as long as the total memory used fits in RAM.

Scenario III

The worst case scenario is when the total memory used by Mechanica does not fit in RAM. If the total memory allocated by Mechanica (and all of the other processes running on your machine) exceeds the total RAM of your machine, then the operating system will swap data.


Available:
|--------RAM----------------|------------SWAP-----------|

Used by Mechanica:

**********DB*******(#####K#######---)
                   solram

DB + solram > RAM    BAD (OS will swap data)
K < solram           doesn't really matter

In this scenario, the analysis will run slowly because the operating system will swap data. If this occurs, it's better to decrease solram so that the memory that Mechanica uses remains in RAM, as shown below.


Available:
|--------RAM----------------|------------SWAP-----------|

Used by Mechanica:

**********DB*******(#####K#)######
                   solram

DB + solram < RAM    good (no OS swapping)
K > solram           not so good (matrix eqns will be swapped)

This is the same as scenario II above.
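
The three scenarios above boil down to two comparisons, which can be sketched in a few lines of Python. This is just an illustration of the logic in the post, not anything Mechanica exposes: the function and argument names are invented, and like the diagrams it ignores the RAM used by the OS and other applications.

```python
def classify(db_gb: float, k_gb: float, solram_gb: float, ram_gb: float) -> str:
    """Return "I", "II" or "III" for the memory scenarios described above.

    db_gb     -- memory for everything except the global stiffness matrix (DB)
    k_gb      -- size of the global stiffness matrix (K)
    solram_gb -- memory allocated to solram
    ram_gb    -- physical RAM (OS and other applications ignored, as in the post)
    """
    os_swaps = db_gb + solram_gb > ram_gb  # DB + solram > RAM
    matrix_swaps = k_gb > solram_gb        # K > solram
    if os_swaps:
        return "III"  # worst case: the OS pages, K vs solram hardly matters
    if matrix_swaps:
        return "II"   # Mechanica swaps part of K with its own code
    return "I"        # ideal: nothing is swapped


# The example from Scenario I: 4 GB RAM, K = 1 GB, DB = 2 GB, solram = 1.5 GB.
print(classify(db_gb=2.0, k_gb=1.0, solram_gb=1.5, ram_gb=4.0))  # I
# Hypothetical variations on the same machine, for illustration only:
print(classify(db_gb=2.0, k_gb=1.0, solram_gb=0.5, ram_gb=4.0))  # II
print(classify(db_gb=2.0, k_gb=1.0, solram_gb=2.5, ram_gb=4.0))  # III
```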

There are a few other things to keep in mind.

- If you use a 32-bit Windows OS, the maximum amount of memory that any one application can use is 3.2 GB.
- Solram is currently limited to a maximum of 8 GB. This maximum will be increased in a future release of Mechanica.

Here are some guidelines that you can follow to improve performance.

1. Run on a machine with a 64-bit OS and lots of RAM.
2. Exit other applications, so that Mechanica can use as much RAM as possible.
3. Set solram low enough so that the total memory used by Mechanica is less than your total amount of RAM.
4. If possible, set solram high enough so that the global stiffness matrix fits in solram (but don't violate guideline #3)
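
Guidelines 3 and 4 can be read as "take the smallest of three numbers". The sketch below is one possible reading under stated assumptions, not a PTC formula: the function name and the `other_gb` allowance for the OS and other applications are invented for illustration, and the sizes of DB and K are only known approximately until a job has been run once.

```python
SOLRAM_CAP_GB = 8.0  # current solram maximum mentioned in the post


def suggest_solram(ram_gb: float, db_gb: float, k_gb: float,
                   other_gb: float = 0.0) -> float:
    """Suggest a solram value (GB) that follows guidelines 3 and 4.

    ram_gb   -- physical RAM
    db_gb    -- Mechanica memory other than the global stiffness matrix
    k_gb     -- size of the global stiffness matrix
    other_gb -- assumed allowance for the OS and other applications
    """
    headroom = ram_gb - db_gb - other_gb  # guideline 3: keep the total inside RAM
    if headroom <= 0:
        raise ValueError("no RAM left for solram; free memory or add RAM")
    # Guideline 4: cover K if possible, but never exceed the headroom or the cap.
    return min(k_gb, headroom, SOLRAM_CAP_GB)


# Scenario I machine: 4 GB RAM, DB = 2 GB, K = 1 GB -> K fits.
print(suggest_solram(ram_gb=4.0, db_gb=2.0, k_gb=1.0))  # 1.0
# Hypothetical larger matrix on the same machine -> limited by RAM (Scenario II).
print(suggest_solram(ram_gb=4.0, db_gb=2.0, k_gb=3.0))  # 2.0
```

The result is the minimum value that holds K; the Scenario I example in the post uses 1.5 GB for a 1 GB matrix, so leaving some margin above K is reasonable as long as guideline 3 still holds.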

Disk Usage

The other major factor that influences performance is disk usage. During an analysis, Mechanica writes all of its results to disk. Also, Mechanica temporarily stores on disk intermediate data that is required during the analysis. Although we haven't done detailed studies to determine their actual impact, the following guidelines should help improve performance.

1. Make sure you are not using any drives that are mounted across the network.
2. Use drives that have a generous amount of empty space on them.
3. Occasionally defragment your disks so that data can be written and read in large contiguous blocks.
4. Use fast hard drives.
5. Use disk striping with a redundant array of independent disks (RAID) to increase IO performance.
6. Use a RAM disk instead of a hard disk.
7. Use a solid-state drive instead of a hard disk drive.

Sorry for the length of this note. It's more than I had originally intended to write, but I wanted to explain in detail how to get the maximum performance from your hardware for Mechanica. I would be curious to know if there are users out there who are already following these guidelines and what their real-world experiences are.

Tad Doxsee
PTC
Edited by: moriarty
 
I don't know whether this is still relevant for you.

There is a setting in the config.pro to enable multiple cpus for Mechanica.

cpus_to_use 2
sim_run_num_threads all

sim_max_memory_usage 0
sim_solver_memory_allocation 512

cpus_to_use: the number of CPUs in your PC. I have a dual core, hence the 2.

The last setting, "sim_solver_memory_allocation", is the amount of RAM that the Pro/Mechanica solver is allowed to use. Use the task manager to see how much free RAM you have available and change this setting accordingly.

On my 4 GB Windows Vista 64-bit I set it to 1536 most of the time.

Best regards,

John Bijnens
 
