Changeset 940 for trunk/Monitoring
- Timestamp: 08/26/13 09:45:18 (11 years ago)
- Location: trunk/Monitoring
- Files: 4 edited
trunk/Monitoring/Broker/README
r930 r940
        1 - requirements
        2 - CentOS 6
        3 - outgoing internet access (port 80)
        4 - Forge svn repository access
   1    5 - Broker installation procedure (as root)
   2    6 - check if EPEL repository is configured, if not, do steps below
   …    …
   5    9 - wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
   6   10 - rpm -ivh epel-release-6-8.noarch.rpm
       11 - add user in /etc/passwd
       12 - adduser -m -d /home/csbroker -s /sbin/nologin csbroker
       13 - note
       14 - csbroker stands for "Climate Simulation broker"
   7   15 - RabbitMQ server
   8   16 yum install nc
   …    …
  29   37 - mv -i rabbitmqadmin /usr/sbin
  30   38 - chmod +x /usr/sbin/rabbitmqadmin
       39 - change owner (to run rabbitmq as non-root)
       40 - chown -R csbroker /var/log/rabbitmq
       41 - chown -R csbroker /var/lib/rabbitmq
       42 - install startup file in /etc/init.d
       43 <---
       44 #!/bin/bash
       45 # chkconfig: 2345 20 80
       46 # description: AMQP service
       47
       48 RABBITMQ_SBIN=/opt/rabbitmq-server-3.0.2/sbin
       49
       50 case "$1" in
       51 start)
       52 sudo -H -u csbroker -- $RABBITMQ_SBIN/rabbitmq-server -detached
       53 ;;
       54 stop)
       55 sudo -H -u csbroker $RABBITMQ_SBIN/rabbitmqctl stop
       56 ;;
       57 status)
       58 sudo -H -u csbroker $RABBITMQ_SBIN/rabbitmqctl status
       59 ;;
       60 *)
       61 echo $"Usage: $0 {start|stop}"
       62 esac
       63
       64 exit 0
       65 --->
       66 - chmod +x /etc/init.d/rabbitmq
       67 - create startup symlinks with command below
       68 - chkconfig --add rabbitmq
  31   69 - conf
  32   70 - vi /opt/rabbitmq-server-3.0.2/sbin/rabbitmq-defaults
   …    …
  47   85 sasl log : /var/log/rabbitmq/rabbit-sasl.log
  48   86 database dir: /var/lib/rabbitmq/mnesia/rabbit
  49      - run
  50      - to start the daemon, use command below as root
  51      - cd /opt/rabbitmq-server-3.0.2/sbin
  52      - ./rabbitmq-server -detached
  53      - to stop the daemon, use one of those
  54      - kill -TERM $(pidof epmd)
  55      - ./rabbitmqctl stop
       87 - usage
       88 - to start the daemon, run command below (as root)
       89 - /etc/init.d/rabbitmq start
       90 - note
       91 - the warning message below is normal
       92 - Warning: PID file not written; -detached was passed.
       93 - to stop the daemon, use
       94 - /etc/init.d/rabbitmq stop
  56   95 - note
  57   96 - it's normal to have the erlang process below running after stop (it's an erlang generic registry stuff)
   …    …
  68  107 - optional
  69  108 - ./rabbitmqadmin list queues
      109 # vim: set ts=4:
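The init script added to Broker/README maps each action (start, stop, status) to a command run as the unprivileged csbroker user. As a rough illustration only, the same mapping can be written as a small Python 3 function that returns the argument list for each action. The function name `build_command` and the constant names are hypothetical; the paths, user name, and flags are copied from the script.

```python
# Path and user copied from the init script in Broker/README.
RABBITMQ_SBIN = "/opt/rabbitmq-server-3.0.2/sbin"
RUN_AS = "csbroker"


def build_command(action):
    """Return the argv list for 'start', 'stop' or 'status', else None.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    prefix = ["sudo", "-H", "-u", RUN_AS]
    commands = {
        "start": prefix + ["--", RABBITMQ_SBIN + "/rabbitmq-server", "-detached"],
        "stop": prefix + [RABBITMQ_SBIN + "/rabbitmqctl", "stop"],
        "status": prefix + [RABBITMQ_SBIN + "/rabbitmqctl", "status"],
    }
    return commands.get(action)


# Example: show what "status" would execute.
print(" ".join(build_command("status")))
```

Note that the script handles three actions, while its usage string lists only start and stop.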
trunk/Monitoring/CNClient/README
r866 r940
        1 - requirements
        2 - CentOS 6
        3 - outgoing internet access (port 80)
        4 - Forge svn repository access
        5 - add Unix user
        6 - adduser -m -d /home/cscompute -s /sbin/nologin cscompute
        7 - check if EPEL repository is configured, if not, do steps below
        8 - for CENTOS 6
        9 (from http://www.tecmint.com/how-to-enable-epel-repository-for-rhel-centos-6-5/)
       10 - wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
       11 - rpm -ivh epel-release-6-8.noarch.rpm
       12 - note
       13 -
       14 - if you get error below
       15 Could not parse metalink https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=x86_64 error was
       16 No repomd file
       17 - then, manually copy files below from another machine
       18 - repomd.xml
       19 - metalink.xml
       20 - tools
       21 - yum install gcc screen
   1   22 - this program uses rabbitmq-c library (v0.3.0)
   2   23 - https://github.com/alanxz/rabbitmq-c
   …    …
   4   25 - using system package
   5   26 - install using commands below
   6      - aptitude install librabbitmq0
   7      - aptitude install librabbitmq-dev
       27 - yum install librabbitmq0
       28 - yum install librabbitmq-dev
   8   29 - note that it is likely that system packages versions are out of date (we need v0.3.0)
   9      so better use installation from source
       30 in this case, use installation from source
  10   31 - from source
  11   32 - retrieve source
   …    …
  35   56 (because of the above error (AAA), you need to run make twice)
  36   57 - make install
  37      - compilation (static)
  38      - gcc -static -I/usr/local/include -L/usr/local/lib -Wall -o sendAMQPMsg send_AMQP_msg.c -lrabbitmq
  39      - we get warning below during compilation
  40      -
  41      <---
  42      /usr/local/lib/librabbitmq.a(librabbitmq_librabbitmq_la-amqp_socket.o): In function `amqp_open_socket':
  43      rabbitmq-c-rabbitmq-c-v0.3.0/librabbitmq/amqp_socket.c:66: warning: Using 'getaddrinfo' in statically
  44      linked applications requires at runtime the shared libraries from the glibc version used for linking
  45      --->
  46      - it means that you may need to be sure all computing node have the same glibc version !!!!
  47      - also means that a different binary must be use in each computing center
       58 - sendAMQPMsg installation
       59 - svn co svn+ssh://<login here>@forge.ipsl.jussieu.fr/ipsl/forge/projets/libigcm/svn/trunk/Monitoring/CNClient
       60 - compilation (static)
       61 - gcc -static -I/usr/local/include -L/usr/local/lib -Wall -o sendAMQPMsg send_AMQP_msg.c -lrabbitmq
       62 - we get warning below during compilation
       63 -
       64 <---
       65 /usr/local/lib/librabbitmq.a(librabbitmq_librabbitmq_la-amqp_socket.o): In function `amqp_open_socket':
       66 rabbitmq-c-rabbitmq-c-v0.3.0/librabbitmq/amqp_socket.c:66: warning: Using 'getaddrinfo' in statically
       67 linked applications requires at runtime the shared libraries from the glibc version used for linking
       68 --->
       69 - it means that you may need to be sure all computing node have the same glibc version !!!!
       70 - also means that a different binary must be use in each computing center
  48   71 - usage
  49   72 - to send a message in the queue, do
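The static-link warning quoted in CNClient/README means the glibc version should match across computing nodes. One possible way to compare nodes, sketched here in Python 3 and not part of the changeset, is to print the C library version on each node with the standard `platform.libc_ver()` call. It reports the library the Python interpreter itself is linked against, which is used here only as a proxy for the node's glibc; the function name `glibc_version` is hypothetical.

```python
import platform


def glibc_version():
    """Return (library, version) for the running Python interpreter.

    Both fields may be empty strings on systems that do not use glibc.
    """
    lib, version = platform.libc_ver()
    return lib, version


# Run this on each computing node and compare the printed values.
lib, version = glibc_version()
print("libc: %s %s" % (lib or "unknown", version or "unknown"))
```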
trunk/Monitoring/Watch/watch
r937 r940
  22   22 from email.mime.text import MIMEText
  23   23 import datetime
       24 import logging
  24   25
  25   26 # line below is to include "smon" package in the search path
   …    …
  31   32 CSTE_BROKER_HOST='cstest-broker.ipsl.jussieu.fr' # cstest
  32   33 #CSTE_BROKER_HOST='localhost' # vesg4
       34 CSTE_LOG_DIR='/var/log/cssupervisor'
       35 CSTE_LOG_FILENAME_MAIN='supervisor.log'
       36 CSTE_LOG_FILENAME_DEBUG='debug.log'
       37 CSTE_LOG_FILENAME_MSG='message.log' # log AMQP msgs
       38 CSTE_LOG_FILE_MAIN="%s/%s"%(CSTE_LOG_DIR,CSTE_LOG_FILENAME_MAIN)
       39
       40 # logger init.
       41 logging.basicConfig(filename=CSTE_LOG_FILE_MAIN,level=logging.INFO,)
       42
  33   43
  34   44 class Mail():
   …    …
 116  126 simulation=smon.types.Simulation(name=message.simuid,status="running")
 117  127
 118
 119  128 repo_io.create_simulation(simulation)
 120  129
   …    …
 124  133
 125  134 @classmethod
 126      def print_stdout(cls,message):
 127      # used for debug
      135 def log_debug(cls,line):
      136 cls.log(CSTE_LOG_FILENAME_DEBUG,line)
      137
      138 @classmethod
      139 def log(cls,filename,line):
      140 with open("%s/%s"%(CSTE_LOG_DIR,filename), "a") as log_file:
      141 log_file.write("%s %s\n"%(datetime.datetime.now().strftime('%Y%m%d_%H%M%S'), line))
      142
      143 @classmethod
      144 def log_msg(cls,message):
      145 line="%s %s %s %s"%(message.code,message.jobid,message.timestamp,message.command)
      146 cls.log(CSTE_LOG_FILENAME_MSG,line)
 128  147
 129  148 """
 130  149 if message.file is not None:
 131      print "%s %s %s %s %s\n"%(message.code,message.jobid,message.command,message.timestamp,message.file)
      150 "%s %s %s %s %s\n"%(message.code,message.jobid,message.command,message.timestamp,message.file)
 132  151 else:
 133      print "%s %s %s %s\n"%(message.code,message.jobid,message.command,message.timestamp)
 134      """
 135
 136      print "%s %s %s %s\n"%(message.code,message.jobid,message.command,message.timestamp)
 137      #pass
 138
 139      @classmethod
 140      def log(cls,message):
 141      with open("/opt/supervisor/log/supervisor.log", "a") as log_file:
 142      log_file.write("%s %s %s %s %s\n"%(datetime.datetime.now().strftime('%Y%m%d_%H%M%S'), message.code,message.jobid,message.timestamp,message.command))
      152 "%s %s %s %s\n"%(message.code,message.jobid,message.command,message.timestamp)
      153 """
 143  154
 144  155 @classmethod
   …    …
 163  174 # TAG0001: note that crea_sim must be BEFORE store_msg in the list (because when we insert the msg, we need the simu_id)
 164  175 #
 165      mapping = { "0000":["crea_sim", "log", "store_msg", "print_stdout"],
 166      "0100":["log", "store_msg", "print_stdout", "set_sim_status_to_complete"],
 167      "1000":["log", "store_msg", "print_stdout"],
 168      "1100":["log", "store_msg", "print_stdout"],
 169      "2000":["log", "store_msg", "print_stdout"],
 170      "3000":["log", "store_msg", "print_stdout"],
      176 mapping = { "0000":["crea_sim", "log_msg", "store_msg"],
      177 "0100":["log_msg", "store_msg", "set_sim_status_to_complete"],
      178 "1000":["log_msg", "store_msg"],
      179 "1100":["log_msg", "store_msg"],
      180 "2000":["log_msg", "store_msg"],
      181 "3000":["log_msg", "store_msg"],
 171  182 "8888":["cleanup"],
 172      "9000":["log", "store_msg", "print_stdout"],
 173      "9999":["log", "store_msg", "print_stdout", "set_sim_status_to_error"] }
      183 "9000":["log_msg", "store_msg"],
      184 "9999":["log_msg", "store_msg", "set_sim_status_to_error"] }
 174  185
 175  186 # prod
   …    …
 178  189 #
 179  190 """
 180      mapping = { "0000":["crea_sim", "log", "store_msg"],
 181      "0100":["log", "store_msg", "set_sim_status_to_complete"],
 182      "1000":["log", "store_msg"],
 183      "1100":["log", "store_msg"],
 184      "2000":["log", "store_msg"],
 185      "3000":["log", "store_msg"],
      191 mapping = { "0000":["crea_sim", "log_msg", "store_msg"],
      192 "0100":["log_msg", "store_msg", "set_sim_status_to_complete"],
      193 "1000":["log_msg", "store_msg"],
      194 "1100":["log_msg", "store_msg"],
      195 "2000":["log_msg", "store_msg"],
      196 "3000":["log_msg", "store_msg"],
 186  197 "8888":["cleanup"],
 187      "9000":["log", "store_msg", "mail"],
 188      "9999":["log", "store_msg", "set_sim_status_to_error", "mail"] }
      198 "9000":["log_msg", "store_msg", "mail"],
      199 "9999":["log_msg", "store_msg", "set_sim_status_to_error", "mail"] }
 189  200 """
 190  201
   …    …
 218  229 self.channel = connection.channel()
 219  230
 220
 221      print ' [*] Waiting for messages. To exit press CTRL+C'
      231 logging.info("[*] Waiting for messages")
 222  232
 223  233 def callback(ch, method, properties, raw_msg):
   …    …
 237  247
 238  248 # debug
 239      # print " [x] Received %s" % field
      249 #logging.debug(" [x] Received %s"%field)
 240  250
 241  251 splitted_field=field.split(":")
   …    …
 248  258
 249  259 # debug
 250      # print " [x] Received %s (encoded)" % l__tmp_dic["body"]
      260 #logging.debug(" [x] Received %s (encoded)" % l__tmp_dic["body"])
 251  261
 252  262
   …    …
 256  266
 257  267 # debug
 258      # print " [x] Received %s" % raw_msg
 259      # print " [x] Received %s (uudecoded)" % base64_decoded_msg
 260      # print " [x] Received %s (uudecoded)" % base64_decoded_msg
      268 #logging.debug(" [x] Received %s" % raw_msg)
      269 #logging.debug(" [x] Received %s (uudecoded)" % base64_decoded_msg )
      270 #logging.debug(" [x] Received %s (uudecoded)" % base64_decoded_msg )
 261  271
 262  272
   …    …
 268  278
 269  279 # non working
 270      # print message.type
      280 #logging.debug("DEB003 - %s"%message.type)
 271  281
 272  282 # working
 273      # print message.code
      283 #logging.debug("DEB009 - %s"%message.code)
 274  284
 275  285
 276  286
 277  287 except Exception,e:
 278      print "ERR009 - exception occurs (exception=%s,msg=%s)"%(str(e),base64_decoded_msg)
 279
 280      traceback.print_exc()
      288
      289 logging.exception("ERR009 - exception occurs (exception=%s)"%(str(e),))
      290
      291 Actions.log_debug("DEB021 - %s"%base64_decoded_msg)
      292
 281  293 raise
 282  294
   …    …
 301  313
 302  314 except Exception,e:
 303      print "ERR019 - exception occurs (exception=%s)"%(str(e))
 304      #print "ERR019 - exception occurs (exception=%s,msg=%s)"%(str(e),base64_decoded_msg)
 305
 306      traceback.print_exc()
      315 logging.exception("ERR019 - exception occurs (exception=%s)"%(str(e),))
      316
      317 Actions.log_debug("DEB020 - %s"%base64_decoded_msg)
 307  318
 308  319 raise
   …    …
 324  335
 325  336 def signal_handler(signal, frame):
 326      print 'You pressed Ctrl+C!'
      337 logging.info("TERM signal received: exiting.")
 327  338 Watcher.channel.stop_consuming()
 328  339 Watcher.stop()
   …    …
 331  342 if __name__ == '__main__':
 332  343
      344 signal.signal(signal.SIGTERM, signal_handler)
 333  345 signal.signal(signal.SIGINT, signal_handler)
 334  346
   …    …
 348  360
 349  361 sys.exit(1)
      362 # vim: set ts=4 sw=4 :
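The watch changes replace `print_stdout` with a generic `log(filename, line)` helper that appends timestamped lines under the log directory, plus a `log_msg` action referenced by name from the code-to-actions mapping. The sketch below shows that pattern in Python 3 (the changeset itself is Python 2). It rests on assumptions: the `Message` tuple, the temporary log directory, the reduced `MAPPING`, and the `dispatch` function with its `getattr` lookup are hypothetical stand-ins, since the changeset does not show how the mapping is consumed.

```python
import datetime
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the parsed AMQP message object.
Message = namedtuple("Message", "code jobid timestamp command")

LOG_DIR = tempfile.mkdtemp()      # stand-in for CSTE_LOG_DIR
LOG_FILENAME_MSG = "message.log"  # same file name as CSTE_LOG_FILENAME_MSG


class Actions(object):
    @classmethod
    def log(cls, filename, line):
        # Append one timestamped line to the named file under LOG_DIR.
        stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        with open(os.path.join(LOG_DIR, filename), "a") as log_file:
            log_file.write("%s %s\n" % (stamp, line))

    @classmethod
    def log_msg(cls, message):
        # Same field order as log_msg in the changeset.
        line = "%s %s %s %s" % (message.code, message.jobid,
                                message.timestamp, message.command)
        cls.log(LOG_FILENAME_MSG, line)


# Reduced code-to-actions table; only log_msg is implemented in this sketch.
MAPPING = {"1000": ["log_msg"], "2000": ["log_msg"]}


def dispatch(message):
    # Assumed dispatch: resolve each action name on Actions and call it.
    for name in MAPPING.get(message.code, []):
        getattr(Actions, name)(message)


dispatch(Message("1000", "job42", "20130826_094518", "start"))
with open(os.path.join(LOG_DIR, LOG_FILENAME_MSG)) as f:
    logged = f.read()
print(logged, end="")
```

Keeping the file name as a parameter of `log` is what lets `log_debug` and `log_msg` share one writer while targeting separate files.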
trunk/Monitoring/doc/README
r937 r940
        1 - requirements
        2 - CentOS 6
        3 - outgoing internet access (port 80)
        4 - Forge svn repository access
   1    5 - installation instructions
   2    6 - LibIGCM AMQP agent installation
   …    …
   5    9 - see Broker/README
   6   10 - Supervisor installation
       11 - add user in /etc/passwd
       12 - adduser -m -d /home/cssupervisor -s /sbin/nologin cssupervisor
       13 - create directories
       14 - mkdir /var/log/cssupervisor
       15 - mkdir /var/lib/cssupervisor
       16 - set owner
       17 - chown -R cssupervisor /var/log/cssupervisor
       18 - chown -R cssupervisor /var/lib/cssupervisor
       19 - copy startup script below in /etc/init/cssupervisor.conf
       20 -
       21 description "Climate Simulation Supervisor"
       22
       23 start on runlevel [2345]
       24 stop on runlevel [016]
       25
       26 chdir /home/cssupervisor
       27
       28 # debug
       29 #script
       30 # exec >>/tmp/deb 2>&1
       31 # exec sudo -u cssupervisor /opt/python2.6_ve/bin/python /opt/supervisor/Monitoring/Watch/watch
       32 #end script
       33
       34 #need upstart 1.4+ (for gid/uid support)
       35 #exec sudo -u cssupervisor /opt/python2.6_ve/bin/python /opt/supervisor/Monitoring/Watch/watch
       36
       37 exec su -s /bin/sh -c 'exec "$0" "$@"' cssupervisor -- /opt/python2.6_ve/bin/python /opt/supervisor/Monitoring/Watch/watch
       38
       39 respawn
       40 - reload upstart
       41 - initctl reload-configuration
   7   42 - check if EPEL repository is configured, if not, do steps below
   8   43 - for CENTOS 6
   …    …
  75  110 - sys.path.append("<snapshot_dir>/src")
  76  111 - smon/repo_io.py
  77      - sys.path.append("<snapshot_dir>/src")
  78      - test
      112 - sys.path.append("<snapshot_dir>/src")
      113 - usage (as root)
      114 - start
      115 - start cssupervisor
      116 - stop
      117 - stop cssupervisor
      118 - status
      119 - status cssupervisor
      120 - unit test
      121 - TODO
      122 - integration test
  79  123 - connectivity test
  80  124 - run command below on supervisor
  81  125 - telnet pp-db-dev.private.ipsl.fr 5432
      126 # vim: set ts=4 :
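The connectivity test in doc/README uses telnet to check that the supervisor can reach the database port. Where telnet is not installed, an equivalent check could be scripted. The following is a minimal Python 3 sketch, not part of the changeset; the function name `can_connect` and the default timeout are assumptions.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    conn.close()
    return True
```

For the host and port given in the README, the call would be `can_connect("pp-db-dev.private.ipsl.fr", 5432)`. A True result only shows that the port accepts TCP connections, which is the same thing the telnet test shows.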