Sharing

Monday, May 21, 2012

How to Check PCI Express Speed on a Server

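PCIe 2.0 is currently the version most often found in enterprise servers. Each lane has a raw rate of 5.0 GT/s, but every 8 bits of data are sent as a 10-bit symbol (8b/10b encoding, which keeps the signal DC-balanced and lets the receiver recover the clock), so the usable bandwidth per lane, per direction, is 80% of the raw rate: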

5.0GT/s  * 0.8  =   4Gb/s  =  0.5GB/s

Then multiply by the number of lanes:

PCIe x2   =   8 Gb/s   = 1 GB/s
PCIe x4   = 16 Gb/s   = 2 GB/s
PCIe x8   = 32 Gb/s   = 4 GB/s
PCIe x16 = 64 Gb/s   = 8 GB/s
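
The lane arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `usable_gbps` and the constants are made up for illustration, and the figures ignore protocol overhead (packet headers, flow control), so real throughput will be somewhat lower.

```python
# Usable PCIe 2.0 bandwidth per direction for common link widths.
# Names below are illustrative, not part of any tool or library.
RAW_GT_PER_S = 5.0            # PCIe 2.0 raw rate per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 bits on the wire


def usable_gbps(lanes, raw=RAW_GT_PER_S, efficiency=ENCODING_EFFICIENCY):
    """Return usable bandwidth in Gb/s for a link with the given lane count."""
    return lanes * raw * efficiency


for lanes in (1, 2, 4, 8, 16):
    gbps = usable_gbps(lanes)
    # Divide by 8 to convert gigabits to gigabytes.
    print(f"PCIe 2.0 x{lanes:<2} = {gbps:g} Gb/s = {gbps / 8:g} GB/s")
```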


Version  Data rate per lane  Per lane, one direction  x16, both directions  Raw transfer rate  Power  Release date
1.0      2 Gb/s              250 MB/s                 8 GB/s                2.5 GT/s           -      2002-07-22
1.0a     2 Gb/s              250 MB/s                 8 GB/s                2.5 GT/s           -      2003-04-15
1.1      2 Gb/s              250 MB/s                 8 GB/s                2.5 GT/s           77 W   2005-03-28
2.0      4 Gb/s              500 MB/s                 16 GB/s               5.0 GT/s           225 W  2006-12-20
2.1      4 Gb/s              500 MB/s                 16 GB/s               5.0 GT/s           -      2009-03-04
3.0      8 Gb/s              1 GB/s                   32 GB/s               8.0 GT/s           -      2010-11-10
4.0      -                   -                        -                     16.0 GT/s          -      2014-2015


System Bus   Version  Data Rate  Encoding   x1 lane   x4 lane    x8 lane    x16 lane
PCI Express  1.0      2.5 GT/s   8b/10b     2.0 Gb/s  8.0 Gb/s   16.0 Gb/s  32.0 Gb/s
PCI Express  2.0      5.0 GT/s   8b/10b     4.0 Gb/s  16.0 Gb/s  32.0 Gb/s  64.0 Gb/s
PCI Express  3.0      8.0 GT/s   128b/130b  7.9 Gb/s  31.5 Gb/s  63.0 Gb/s  126.0 Gb/s


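So for a 10 Gb network card with two ports:

10 Gb/s x 2 = 20 Gb/s  =>  needs PCIe 2.0 x8 (x4 provides only 16 Gb/s). When sizing bandwidth, also check that the PCIe slot meets the requirement; otherwise an expensive high-end network card ends up throttled by the slot, which would be a waste.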
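
The same sizing check can be written as a small helper that picks the narrowest link able to carry a required throughput. This is a sketch under stated assumptions: the names `GENERATIONS`, `lane_gbps` and `min_width` are hypothetical, the rates and encodings are the ones from the table above, and protocol overhead is ignored, so treat the result as a lower bound on the width you need.

```python
# Raw rate per lane (GT/s) and line encoding (payload bits, coded bits),
# as listed in the table above. All names here are illustrative.
GENERATIONS = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
}
WIDTHS = (1, 2, 4, 8, 16)


def lane_gbps(gen):
    """Usable Gb/s of one lane, one direction, after encoding overhead."""
    raw, payload_bits, coded_bits = GENERATIONS[gen]
    return raw * payload_bits / coded_bits


def min_width(required_gbps, gen):
    """Smallest standard link width that meets required_gbps, or None."""
    for width in WIDTHS:
        if width * lane_gbps(gen) >= required_gbps:
            return width
    return None


# Dual-port 10 Gb network card: 2 x 10 Gb/s = 20 Gb/s.
required = 2 * 10
for gen in GENERATIONS:
    print(f"PCIe {gen}: needs x{min_width(required, gen)}")
```

For the dual-port card this gives x16 on PCIe 1.0, x8 on PCIe 2.0 and x4 on PCIe 3.0, which matches the x8 conclusion above for a PCIe 2.0 server.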



The following uses an LSI RAID card as an example of how to look up the PCIe speed with lspci.

# First find the LSI card's slot address; the number at the start of each line is its address
wistor@ubuntu:~$ sudo lspci
04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
07:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
07:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)

# Then use that address to get more detailed information. Remember to use sudo, otherwise the Speed information is not shown
wistor@ubuntu:~$ sudo lspci -vvv -s 04:00
04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
        Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9260-4i
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: I/O ports at d000 [size=256]
        Region 1: Memory at fbdfc000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at fbd80000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fbd40000 [disabled] [size=256K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Connection timed out
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003800
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [138 v1] Power Budgeting <?>
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas


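LnkCap is the maximum speed the card itself supports; for the LSI card above that is 5GT/s (PCIe v2), Width x8.
LnkSta is the speed the link is actually running at. Normally it should match LnkCap so that the card gets its full bandwidth.
If LnkSta shows a lower speed, check whether the slot itself is the limit.
Likewise, to find out whether the card is using the slot's full bandwidth, you first need to know which slot it is in. How do you find that out?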

# Show how the PCI devices are connected, as a tree
wistor@ubuntu:~$ sudo lspci -vt
 \-[0000:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
             +-01.0-[07]--+-00.0  Intel Corporation 82575EB Gigabit Network Connection
             |            \-00.1  Intel Corporation 82575EB Gigabit Network Connection
             +-03.0-[06]--
             +-04.0-[05]--
             +-05.0-[04]----00.0  LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator]
             +-07.0-[03]--


In the example above, the LSI card is attached to Root Port 5 (device 05.0) of the 5520 I/O Hub.
The same method shows that port's bandwidth:

wistor@ubuntu:~$ sudo lspci -vvv -s 00:05.0
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fbd00000-fbdfffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: Intel Corporation Device 0000
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
                Address: 00000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise+ LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #6, PowerLimit 25.000W; Interlock- NoCompl-

                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd Off, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range BCD, TimeoutDis+ ARIFwd+
                DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [e0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [150 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: pcieport
        Kernel modules: shpchp


Here LnkCap is 5GT/s, Width x8 and LnkSta is also 5GT/s, Width x8. Good: no bandwidth is wasted.
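
Comparing LnkCap and LnkSta by eye gets tedious on a machine with many cards, so the check can be scripted. The sketch below parses text in the format printed by `lspci -vvv` above; the helper names `parse_links` and `is_degraded` are made up for illustration, and the regular expression only assumes the `Speed ...GT/s` and `Width x...` fields seen in that output.

```python
import re

# Matches the "LnkCap:" and "LnkSta:" lines (not LnkCap2/LnkSta2, which have
# a digit before the colon) and captures speed in GT/s and link width.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_links(lspci_text):
    """Return {'LnkCap': (speed, width), 'LnkSta': (speed, width)} for one device."""
    links = {}
    for match in LINK_RE.finditer(lspci_text):
        name, speed, width = match.groups()
        # Keep the first occurrence of each field for this device.
        links.setdefault(name, (float(speed), int(width)))
    return links


def is_degraded(links):
    """True if the negotiated link is slower or narrower than the capability."""
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


# Two lines copied from the LSI card's output above.
sample = """\
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""
links = parse_links(sample)
print(links, "degraded" if is_degraded(links) else "running at full speed")
```

To use it on a live system, one option is to feed it the output of `sudo lspci -vvv -s 04:00.0`, one device at a time, since the helper assumes the text describes a single device.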

# Another very useful command
wistor@ubuntu:~$ sudo lshw -businfo
Bus info          Device     Class      Description
===================================================
pci@0000:04:00.0  scsi6      storage    MegaRAID SAS 2108 [Liberator]
pci@0000:07:00.0  eth0       network    82575EB Gigabit Network Connection
pci@0000:07:00.1  eth1       network    82575EB Gigabit Network Connection
pci@0000:00:1f.2  scsi0      storage    82801JI (ICH10 Family) SATA AHCI Controller
scsi@0:0.0.0      /dev/sda   disk       250GB ST9250610NS
scsi@0:0.0.0,1    /dev/sda1  volume     56GiB EXT4 volume
scsi@0:0.0.0,2    /dev/sda2  volume     175GiB Extended partition
                  /dev/sda5  volume     175GiB Linux swap / Solaris partition




Reference:
https://noc.sara.nl/wiki/Server_Performance_Tuning
http://benjr.tw/node/663
