Sharing

Monday, August 19, 2013

Duplicate definition found in icinga/nagios

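I have been using icinga to monitor our systems lately. I want every service definition to be as reusable as possible, and I want a newly added node to need only a minimum of settings while still getting the full set of monitoring checks. That means making heavy use of Object Inheritance: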
http://docs.icinga.org/latest/en/objectinheritance.html

There is also an article that explains how to combine Object Inheritance with hostgroups:
http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

Inheriting a service from a hostgroup

The most common case is attaching a service to a hostgroup. As soon as a new host joins the hostgroup, the monitoring service is applied to that host automatically.

define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!200.0,20%!500.0,60%
}

define host {
        use             generic-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
}


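Overriding a hostgroup service for a single host

Sometimes, though, a newly added host should not reuse the original service parameters. For example, suppose the first argument of check_ping should be 100.0 instead: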

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.100
}

define service {
        use                        generic-service
        host_name                  test-server2
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}

In this case icinga prints the message "Warning: Duplicate definition found for service", but the single-host definition overrides the hostgroup definition, so this works fine.


Overriding a hostgroup service for another hostgroup (FAIL)

Recently, however, I ran into a strange case where the override did not work. My configuration looked like this:

define host {
        use             generic-server
        host_name       web-server1
        hostgroups      linux-servers, web-servers
        address         192.168.100.100
}

define host {
        use             generic-server
        host_name       web-server2
        hostgroups      linux-servers, web-servers
        address         192.168.100.101
}

define service {
        use                        generic-service
        hostgroup_name             web-servers
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}


I wanted ordinary servers to use 200.0 as the first ping argument, and the hosts in web-servers to use 100.0.
The result was odd: only one of the two web servers picked up the new parameter.

$ grep PING -A 2 -B 1 /var/cache/icinga/objects.cache 
        host_name       web-server1
        service_description     PING
        check_command   check_ping!100.0,20%!500.0,60%
--
        host_name       web-server2
        service_description     PING
        check_command   check_ping!200.0,20%!500.0,60%


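Icinga source code analysis

This made me curious about how icinga actually handles duplicate definitions. I had assumed that a definition in a subdirectory would always override one in the parent directory, but that turned out not to be the case. After a lot of digging I ended up reading the source code.

File parsing order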


icinga-core/xdata/xodtemplate.c: config files are loaded in DFS (depth-first) order
/* process all files in a specific config directory (abridged excerpt) */
int xodtemplate_process_config_dir(char *dirname, int options) {
 /* ... */

 /* process all files in the directory... */
 while ((dirfile = readdir(dirp)) != NULL) {
  /* ... stat() the entry, then branch on its file type ... */

  switch (stat_buf.st_mode & S_IFMT) {

  case S_IFREG:
   /* process the config file */
   result = xodtemplate_process_config_file(file, options);
   break;

  case S_IFDIR:
   /* recurse into subdirectories... */
   result = xodtemplate_process_config_dir(file, options);
   break;

  default:
   /* everything else we ignore */
   break;
  }
 }
 /* ... */
}

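The interesting part is that readdir does not sort its results, so the same directory layout can be loaded in a different order on different machines: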
http://www.wretch.cc/blog/awaysu/24060729
http://stackoverflow.com/questions/8977441/does-readdir-guarantee-an-order

root@ops-buildmonitor1:/etc/icinga/conf.d# ls -fl
total 72
-rw-r--r-- 1 root root    3515 Aug 16 18:05 common-commands.cfg
-rw-r--r-- 1 root root    1630 Jun 14 15:15 common-timeperiods.cfg
-rw-r--r-- 1 root root    1514 Aug 14 09:11 common-hostgroups.cfg
-rw-r----- 1 root nagios  3075 Jun 14 15:15 common-contacts.cfg
-rw-r----- 1 root nagios 10596 Aug 19 06:47 common-services.cfg
drwxr-xr-x 5 root root    4096 Aug 19 08:18 .
drwxr-xr-x 2 root root    4096 Aug 19 08:20 hosts
drwxr-xr-x 3 root root    4096 Aug 19 04:46 safesync
-rw-r--r-- 1 root root    5043 Aug 16 05:23 services.cfg
drwxr-xr-x 8 root root    4096 Aug 19 04:46 ..
-rw-r--r-- 1 root root    2242 Jun 14 15:15 generic.cfg
-rw-r--r-- 1 root root     221 Jun 14 15:15 smokeping-services.cfg
drwxr-xr-x 2 root root    4096 Jul 31 02:57 eventhandlers
-rw-r--r-- 1 root root    6122 Aug 13 13:06 common-hosts.cfg
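
To see the effect on your own machine, here is a minimal Python sketch (not Icinga code) of the same depth-first walk. It assumes, as a simplification of mine, that hidden entries are skipped and only files ending in .cfg are processed; the names walk_config and demo and the ordered flag are made up for illustration. os.listdir, like readdir, returns entries in arbitrary order, so sorting is the only thing that makes the result reproducible.

```python
import os
import tempfile


def walk_config(dirname, ordered=False):
    """Depth-first walk; returns .cfg paths in the order they would be processed."""
    names = os.listdir(dirname)  # arbitrary order, like readdir()
    if ordered:
        names.sort()  # sorting makes the order reproducible
    files = []
    for name in names:
        if name.startswith("."):
            continue  # assumption: hidden entries are skipped
        path = os.path.join(dirname, name)
        if os.path.isdir(path):
            files.extend(walk_config(path, ordered))  # recurse right away (DFS)
        elif name.endswith(".cfg"):
            files.append(path)  # assumption: only *.cfg files count
    return files


def demo():
    """Build a throwaway tree and return (as-listed, sorted) relative paths."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "hosts"))
        for parts in (("services.cfg",), ("generic.cfg",), ("hosts", "web.cfg")):
            open(os.path.join(root, *parts), "w").close()
        as_listed = [os.path.relpath(p, root) for p in walk_config(root)]
        in_order = [os.path.relpath(p, root) for p in walk_config(root, ordered=True)]
    return as_listed, in_order


if __name__ == "__main__":
    as_listed, in_order = demo()
    print("as listed:", as_listed)
    print("sorted:   ", in_order)
```

Both lists always contain the same files. Only the sorted one is guaranteed to come back in the same order on every machine.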


Service object generation


xdata/xodtemplate.c. Below are my notes. They are a bit messy; I hope they are readable, and if not, just skip to the conclusion.
xodtemplate_read_config_data
       xodtemplate_process_config_dir
       xodtemplate_process_config_file
             xodtemplate_add_object_property
                 case XODTEMPLATE_SERVICE:
                        #register service into xodtemplate_service_list
                        xod_begin_def(service);

        xodtemplate_duplicate_services
        1) expand hostgroups and hosts
            temp_memberlist = xodtemplate_expand_hostgroups_and_hosts(temp_service->hostgroup_name, temp_service->host_name, temp_service->_config_file, temp_service->_start_line);
        2) add the expanded services to xodtemplate_service_list
             a) first member: reuse the existing entry (old memory space)
                 /* if this is the first duplication, use the existing entry */
             b) other members: allocate a new entry and append it to the tail of xodtemplate_service_list
                 result = xodtemplate_duplicate_service(temp_service, this_memberlist->name1);

                 ex: service A (group A) -> service B (group B)
                       => service A (A1) -> service B (B1) -> service A (A2) -> service A (A3) -> service B (B2)

        3) create xobject_skiplist
              a) move single-host services into xobject_skiplist and check for duplicates
              b) move hostgroup services into xobject_skiplist and check for duplicates
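
The list reordering in step 2 is easier to follow when simulated. The Python sketch below is my reading of these notes, not a port of the Icinga code. expand_services and resolve_duplicates are hypothetical names, every hostgroup is assumed to have at least one member, and the member order of each group is taken as given. The duplicate check is modelled as a parameter (keep the first or the last entry seen) because the notes do not say which entry survives.

```python
def expand_services(definitions, hostgroups):
    """definitions: list of (label, hostgroup) in load order.

    The first member of each group reuses the definition's own slot;
    every other member becomes a new entry appended at the tail.
    """
    in_place = []
    tail = []
    for label, group in definitions:
        members = hostgroups[group]  # assumption: never empty
        in_place.append((label, members[0]))
        tail.extend((label, host) for host in members[1:])
    return in_place + tail


def resolve_duplicates(entries, keep):
    """Pick one label per host; keep is 'first' or 'last' (an assumed policy)."""
    winners = {}
    for label, host in entries:
        if keep == "last" or host not in winners:
            winners[host] = label
    return winners


# Hypothetical member order for the failing PING example.
HOSTGROUPS = {
    "linux-servers": ["test-server1", "web-server1", "web-server2"],
    "web-servers": ["web-server1", "web-server2"],
}
DEFINITIONS = [("200.0", "linux-servers"), ("100.0", "web-servers")]

if __name__ == "__main__":
    entries = expand_services(DEFINITIONS, HOSTGROUPS)
    print(entries)
    print(resolve_duplicates(entries, "first"))
    print(resolve_duplicates(entries, "last"))
```

With this member order the two PING definitions end up interleaved, so both policies give the two web servers different first arguments. Under keep="first", web-server1 gets 100.0 and web-server2 gets 200.0, which is the split seen in objects.cache. That match depends on the assumed member order.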

Conclusion

  • Single-host service vs. single-host service
    • In the same file, the later definition wins.
    • In different files, the winner depends on the file loading order. This is unsafe, because the order can change when files are modified.
  • Single-host service vs. hostgroup service
    • The single-host definition wins; file loading order makes no difference.
  • Hostgroup service vs. hostgroup service
    • Neither wins. The resulting configuration is a mess no matter which file is loaded first.
  • Do not list two hosts in the same "define". Define them separately; see the Tricky Note for why.
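
The risky cases in this list can be checked before a reload. Below is a rough Python sketch of such a check, with several assumptions of mine. The parser only understands plain define blocks with one directive per line. Hostgroup membership is read only from each host's hostgroups directive, so templates and hostgroup members are ignored. The names parse_objects and find_risky_services are invented here. It reports host/service pairs that receive the same service_description from more than one hostgroup, and service definitions that list several hosts in host_name.

```python
import re
from collections import defaultdict

BLOCK = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)

SAMPLE = """
define host {
        host_name       web-server1
        hostgroups      linux-servers, web-servers
}
define host {
        host_name       web-server2
        hostgroups      linux-servers, web-servers
}
define service {
        hostgroup_name       linux-servers
        service_description  PING
}
define service {
        hostgroup_name       web-servers
        service_description  PING
}
"""


def parse_objects(text):
    """Return [(type, {directive: value})]; full-line '#' comments are dropped."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    objects = []
    for kind, body in BLOCK.findall("\n".join(lines)):
        directives = {}
        for line in body.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        objects.append((kind, directives))
    return objects


def split_list(value):
    return [item.strip() for item in value.split(",") if item.strip()]


def find_risky_services(text):
    """Return (hostgroup clashes, multi-host service definitions)."""
    objects = parse_objects(text)
    members = defaultdict(list)  # hostgroup -> hosts
    for kind, d in objects:
        if kind == "host" and "host_name" in d:
            for group in split_list(d.get("hostgroups", "")):
                members[group].append(d["host_name"])
    sources = defaultdict(set)  # (host, description) -> defining hostgroups
    multi_host = []
    for kind, d in objects:
        if kind != "service":
            continue
        desc = d.get("service_description", "")
        for group in split_list(d.get("hostgroup_name", "")):
            for host in members[group]:
                sources[(host, desc)].add(group)
        hosts = split_list(d.get("host_name", ""))
        if len(hosts) > 1:
            multi_host.append((desc, hosts))
    clashes = {k: sorted(g) for k, g in sources.items() if len(g) > 1}
    return clashes, multi_host


if __name__ == "__main__":
    print(find_risky_services(SAMPLE))
```

For the sample, both web servers are flagged because PING reaches them through linux-servers and web-servers at the same time.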

Tricky Note


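If you expect a single definition to override the hostgroup's settings for two hosts at once, you may be disappointed. Personally I consider this a bug.
In the code, the second host is treated as if it came from a hostgroup, so it turns into a hostgroup vs. hostgroup fight and everything gets mixed up.
The workaround is to define the services separately: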
define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!200.0,20%!500.0,60%
}

define host {
        use             generic-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
}

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.101
}

#### will make a mess ##################################################
#define service {
#        use                        generic-service
#        host_name                  test-server1,test-server2
#        service_description        PING
#        check_command              check_ping!100.0,20%!500.0,60%
#}
#

#### define them separately; inconvenient, but it works #################
define service {
        use                        generic-service
        host_name                  test-server1
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}

define service {
        use                        generic-service
        host_name                  test-server2
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}




Custom object variables

A more involved approach is to use custom object variables to differentiate hosts:

http://docs.icinga.org/latest/en/customobjectvars.html
http://docs.icinga.org/latest/en/objectinheritance.html#objectinheritance-customobjectvariables

define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!$_HOSTPINGPARA$,20%!500.0,60%
}

define host {
        name            generic-linux-server
        hostgroups      linux-servers
        register        0
        _pingpara       200
}

define host {
        use             generic-linux-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
        _pingpara       100
}

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.101
        _pingpara       100
}
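
How the macro gets its value can be sketched in a few lines of Python. This is an illustration, not Icinga's macro processor. It assumes a single use template per host and treats custom variable names case-insensitively (per the docs linked above, _pingpara on a host is referenced as $_HOSTPINGPARA$). Leaving an unknown macro untouched is a choice of this sketch. collect_custom_vars and expand_check_command are made-up names.

```python
import re

MACRO = re.compile(r"\$_HOST([A-Za-z0-9_]+)\$")


def collect_custom_vars(host, templates):
    """Merge custom variables along the 'use' chain; the host's own values win."""
    merged = {}
    parent = templates.get(host.get("use"))
    if parent is not None:
        merged.update(collect_custom_vars(parent, templates))
    for key, value in host.items():
        if key.startswith("_"):
            merged[key[1:].upper()] = value  # "_pingpara" -> "PINGPARA"
    return merged


def expand_check_command(command, host, templates):
    """Replace $_HOST<NAME>$ macros with the host's custom variable values."""
    custom = collect_custom_vars(host, templates)

    def lookup(match):
        # Unknown macros are left as-is (a choice of this sketch).
        return custom.get(match.group(1).upper(), match.group(0))

    return MACRO.sub(lookup, command)


TEMPLATES = {"generic-linux-server": {"name": "generic-linux-server", "_pingpara": "200"}}
COMMAND = "check_ping!$_HOSTPINGPARA$,20%!500.0,60%"

if __name__ == "__main__":
    override = {"use": "generic-linux-server", "host_name": "test-server1", "_pingpara": "100"}
    default = {"use": "generic-linux-server", "host_name": "test-server3"}
    print(expand_check_command(COMMAND, override, TEMPLATES))
    print(expand_check_command(COMMAND, default, TEMPLATES))
```

A host that sets _pingpara overrides the template's value, and a host that does not falls back to it. A misspelled name such as _pingprar never matches the macro, so the override is silently lost.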

