Nginx High Availability with Heartbeat (style 2.x)

December 9, 2009, admin, no comments

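Heartbeat 1.x cannot monitor the state of a resource. To monitor it, you can write your own monitoring script or use Mon to watch the service; whenever the resource (Nginx) is found to be down, run service heartbeat stop to bring heartbeat down, which triggers a failover. Alternatively, use Heartbeat style 2.x and configure the CRM (Cluster Resource Manager) to manage the resources.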

I. Configure Heartbeat in 1.x style (see "Nginx High Availability with Heartbeat (style 1.x)")

II. Convert the 1.x configuration to 2.x

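1. Add the following lines to ha.cf
# Enable the cluster resource manager, i.e. use heartbeat 2.x mode
crm on
# respawn lists the commands to be run and monitored.
# respawn makes Heartbeat run the process as the given userid (hacluster in this example) and watch it;
# if the process dies, it is restarted.
# The ipfail plugin detects network failures and reacts sensibly, failing cluster resources over if necessary.
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
respawn hacluster /usr/lib/heartbeat/cibmon -d
apiauth cibmon   uid=hacluster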

2. Convert the haresources resource file into a cib.xml file
Run the following commands:
mv /etc/ha.d/haresources /etc/ha.d/haresources.bak
/usr/lib/heartbeat/haresources2cib.py /etc/ha.d/haresources.bak
This generates cib.xml under /var/lib/heartbeat/crm.

Once heartbeat has run, the files cib.xml.last, cib.xml.sig and cib.xml.sig.last are created in /var/lib/heartbeat/crm. To edit cib.xml again after that, first delete those three files: rm -rf /var/lib/heartbeat/crm/cib.xml.*
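
The cleanup step can be wrapped in a small helper. This is a minimal sketch: the function name clean_cib_sidecars is hypothetical, and running it only while heartbeat is stopped is an assumption rather than something the article states. The demonstration works on a scratch directory, not on the live /var/lib/heartbeat/crm.

```shell
#!/bin/bash
# Hypothetical helper: remove the cib.xml.* companion files so that a
# hand-edited cib.xml can be used. Assumption: heartbeat is stopped first.
clean_cib_sidecars() {
    local crm_dir="${1:-/var/lib/heartbeat/crm}"
    # The glob matches cib.xml.last, cib.xml.sig and cib.xml.sig.last,
    # but not cib.xml itself.
    rm -f "$crm_dir"/cib.xml.*
}

# Demonstration against a scratch directory instead of the live one.
demo_dir=$(mktemp -d)
touch "$demo_dir"/cib.xml "$demo_dir"/cib.xml.last \
      "$demo_dir"/cib.xml.sig "$demo_dir"/cib.xml.sig.last
clean_cib_sidecars "$demo_dir"
ls "$demo_dir"
```

Only cib.xml should remain in the scratch directory afterwards.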

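The CRM supports two resource types, ocf and lsb:
An LSB script must support status; it has to accept the three arguments start, stop and status.
An OCF script must support the three arguments start, stop and monitor.
The status and monitor arguments are what the CRM uses to monitor the resource, so they matter a great deal.
For an LSB-style script, when ./nginxd status is run, output containing OK or running means the resource is healthy, and output containing No or stopped means it is not.
For an OCF-style script, when ./nginxd monitor is run, an exit code of 0 means the resource is healthy and 7 means the resource has a problem.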
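
The OCF convention above can be illustrated with a tiny helper that interprets a monitor exit code. This is only a sketch: the function name ocf_monitor_meaning is hypothetical, and only the codes 0 and 7 come from the text; every other value is simply reported as an error.

```shell
#!/bin/bash
# Hypothetical helper: translate the exit code of "<agent> monitor"
# into a readable state. Only 0 and 7 are taken from the text above.
ocf_monitor_meaning() {
    case "$1" in
        0) echo "running" ;;
        7) echo "not running" ;;
        *) echo "error (rc=$1)" ;;
    esac
}

# Example: interpret a code captured from a monitor call.
rc=7
echo "monitor returned $rc: $(ocf_monitor_meaning "$rc")"
```

In practice rc would be captured right after calling the agent, for example with rc=$? following ./nginxd monitor.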

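OCF scripts live under /usr/lib/ocf/resource.d/heartbeat.
LSB scripts usually live under /etc/init.d/.
For example, IPaddr is controlled by an OCF script at /usr/lib/ocf/resource.d/heartbeat/IPaddr

Modify the nginxd script from style 1.x so that it accepts the monitor argument and can therefore be used as an OCF script: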
[root@HA1 ~]# cat /usr/lib/ocf/resource.d/heartbeat/nginxd

#!/bin/sh

# source function library
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

RETVAL=0
prog="nginx"

nginxDir=/usr/local/nginx
nginxd=$nginxDir/sbin/nginx
nginxConf=$nginxDir/conf/nginx.conf
nginxPid=$nginxDir/nginx.pid

nginx_check()
{
    if [[ -e $nginxPid ]]; then
        ps aux |grep -v grep |grep -q nginx
        if (( $? == 0 )); then
            echo "$prog already running..."
            exit 1
        else
            rm -rf $nginxPid &> /dev/null
        fi
    fi
}

start()
{
    nginx_check
    if (( $? != 0 )); then
        true
    else
        echo -n $"Starting $prog:"
        daemon $nginxd -c $nginxConf
        RETVAL=$?
        echo
        [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
        return $RETVAL
    fi
}

stop()
{
    echo -n $"Stopping $prog:"
    killproc $nginxd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx $nginxPid
}

reload()
{
    echo -n $"Reloading $prog:"
    killproc $nginxd -HUP
    RETVAL=$?
    echo
}

monitor()
{
    status $prog &> /dev/null
    if (( $? == 0  )); then
        RETVAL=0
    else
        RETVAL=7
    fi
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                start
                ;;
        reload)
                reload
                ;;
        status)
                status $prog
                RETVAL=$?
                ;;
        monitor)
                monitor
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart|reload|status|monitor}"
                RETVAL=1
esac
exit $RETVAL

Check how the nginxd resource is configured in cib.xml:

<primitive class="ocf" id="nginxd_2" provider="heartbeat" type="nginxd">
    <operations>
        <op id="nginxd_2_mon" interval="20s" name="monitor" timeout="10s"/>
    </operations>
</primitive>

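Adjust the following values:
interval="20s"
timeout="10s"
That is, the resource is checked every 20 seconds and each check is given up to 10 seconds. If a check finds the resource gone, the CRM tries to start it, and if the start does not succeed the resource is moved to the other node. These values can be made smaller still; with the default of 2 minutes it can feel as though a downed service is neither being restarted nor failed over.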
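
The edit can also be scripted. The following is a rough sketch using GNU sed on a scratch copy of the snippet; the helper name set_monitor_timing is hypothetical, and the starting values 120s and 60s in the sample are placeholders rather than values from the article.

```shell
#!/bin/bash
# Hypothetical helper: rewrite interval/timeout on the monitor <op> line.
# Arguments: file, new interval, new timeout. Requires GNU sed (-i).
set_monitor_timing() {
    local file="$1" interval="$2" timeout="$3"
    sed -i \
        -e "/name=\"monitor\"/s/interval=\"[^\"]*\"/interval=\"$interval\"/" \
        -e "/name=\"monitor\"/s/timeout=\"[^\"]*\"/timeout=\"$timeout\"/" \
        "$file"
}

# Demonstration on a scratch file; 120s/60s are placeholder start values.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<primitive class="ocf" id="nginxd_2" provider="heartbeat" type="nginxd">
    <operations>
        <op id="nginxd_2_mon" interval="120s" name="monitor" timeout="60s"/>
    </operations>
</primitive>
EOF
set_monitor_timing "$sample" 20s 10s
grep 'name="monitor"' "$sample"
```

Only lines containing name="monitor" are touched, so other operations in the same file keep their timing.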

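3. Create the user and group

heartbeat requires the haclient group and the hacluster user; perform this step if they were not created at build time. Do the same on both nodes and make sure the haclient and hacluster IDs are identical on each.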

groupadd -g 500 haclient

useradd -u 500 -g haclient hacluster

Fix the ownership of the heartbeat directories:
find / -type d -name "heartbeat" -exec chown -R hacluster {} \;
find / -type d -name "heartbeat" -exec chgrp -R haclient {} \;

Without these accounts, starting heartbeat produces the error below and the system is rebooted:
EMERG: Rebooting system.  Reason: /usr/lib/heartbeat/cib

If nginxd is set to start at boot, disable that:
chkconfig --level 2345 nginxd off

Start heartbeat on both nodes:
service heartbeat start

Start the nginxd resource on HA1:
crm_resource -r nginxd_2 -p target_role -v started

CRM monitoring output:
crm_mon -i1
Refresh in 1s…

============
Last updated: Sun Nov  8 03:20:15 2009
Current DC: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f)
2 Nodes configured.
1 Resources configured.
============

Node: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f): online
Node: ha1 (ad69968f-2db6-40a0-b71b-7433a689aab9): online

Resource Group: group_1
IPaddr_192_168_2_100        (ocf::heartbeat:IPaddr):        Started ha1
nginxd_2    (ocf::heartbeat:nginxd):        Started ha1

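III. CRM Management

Start/stop a resource
#crm_resource -r nginxd_2 -p target_role -v started
#crm_resource -r nginxd_2 -p target_role -v stopped
Show which node a resource is running on
#crm_resource -W -r nginxd_2
Move a resource from its current node to another node
#crm_resource -M -r nginxd_2
Move a resource to a specific node
#crm_resource -M -r nginxd_2 -H HA1
Allow a resource to return to its normal node
#crm_resource -U -r nginxd_2
Delete a resource from the CRM
#crm_resource -D -r nginxd_2 -t primitive
Stop the CRM from managing a resource
#crm_resource -p is_managed -r nginxd_2 -t primitive -v off
Put a resource back under CRM management
#crm_resource -p is_managed -r nginxd_2 -t primitive -v on
Restart a resource (clean up its state on the node)
#crm_resource -C -H HA1 -r nginxd_2
Re-check all nodes for resources started outside the CRM
#crm_resource -P
Re-check a specific node for resources started outside the CRM
#crm_resource -P -H HA1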

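IV. Testing

1. Manually stop nginx on HA1; heartbeat will try to restart it.
service nginxd stop

2. Rename the nginx configuration file on HA1; heartbeat's restart attempt fails and it fails over automatically.
mv /usr/local/nginx/conf/nginx.conf /usr/local/nginx/conf/nginx.conf.bak
service nginxd stop

# The resources failed over automatically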
crm_mon -i1
Refresh in 1s…

============
Last updated: Sun Nov  8 03:37:59 2009
Current DC: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f)
2 Nodes configured.
1 Resources configured.
============

Node: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f): online
Node: ha1 (ad69968f-2db6-40a0-b71b-7433a689aab9): online

Resource Group: group_1
IPaddr_192_168_2_100        (ocf::heartbeat:IPaddr):        Started ha2
nginxd_2    (ocf::heartbeat:nginxd):        Started ha2

Failed actions:
nginxd_2_monitor_20000 (node=ha1, call=7, rc=7): complete
nginxd_2_start_0 (node=ha1, call=9, rc=1): complete

On HA1, bring the resources back to their normal node:

mv /usr/local/nginx/conf/nginx.conf.bak /usr/local/nginx/conf/nginx.conf
service heartbeat restart

3. Unplug HA1's eth1 network cable and see whether the resources fail over automatically

Check the resources on HA2:
crm_mon -i1
Refresh in 1s…

============
Last updated: Sun Nov  8 04:02:01 2009
Current DC: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f)
2 Nodes configured.
1 Resources configured.
============

Node: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f): online
Node: ha1 (ad69968f-2db6-40a0-b71b-7433a689aab9): OFFLINE

Resource Group: group_1
IPaddr_192_168_2_100        (ocf::heartbeat:IPaddr):        Started ha2
nginxd_2    (ocf::heartbeat:nginxd):        Started ha2

The resources failed over automatically from HA1 to HA2.

Plug HA1's eth1 cable back in, and the resources automatically move back to HA1.
crm_mon -i1
Refresh in 1s…

============
Last updated: Sun Nov  8 04:05:16 2009
Current DC: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f)
2 Nodes configured.
1 Resources configured.
============

Node: ha2 (cc3f9eb0-22be-4b1a-b0c7-706ea75d932f): online
Node: ha1 (ad69968f-2db6-40a0-b71b-7433a689aab9): online

Resource Group: group_1
IPaddr_192_168_2_100        (ocf::heartbeat:IPaddr):        Started ha1
nginxd_2    (ocf::heartbeat:nginxd):        Started ha1

Troubleshooting: if errors occur, check the heartbeat log to resolve them.



Nginx High Availability with Heartbeat (style 1.x)

December 8, 2009, admin, 1 comment

I. Preparation

1. System: two CentOS 5.4 virtual machines
2. Hostnames: HA1, HA2
3. IP addresses: HA1   eth0:192.168.2.10   eth1:192.168.10.1
                 HA2   eth0:192.168.2.20   eth1:192.168.10.2
4. VIP: 192.168.2.100   (the IP that moves on failover)

II. Installation

1. Build and install Nginx
tar xzvf pcre-7.9.tar.gz
cd pcre-7.9
./configure
make
make install
cd ..

tar xzvf nginx-0.7.63.tar.gz
cd nginx-0.7.63
./configure --user=nobody --group=nobody --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module
make
make install

The Nginx configuration itself is omitted here.

2. Build and install Heartbeat

tar xzvf libnet-1.1.2.1.tar.gz
cd libnet
./configure
make
make install
cd ..

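Create the user and group

heartbeat requires the haclient group and the hacluster user. Do the same on both nodes and make sure the haclient and hacluster IDs are identical on each.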

groupadd -g 500 haclient

useradd -u 500 -g haclient hacluster

tar jxvf STABLE-2.1.4.tar.bz2
cd Heartbeat-STABLE-2-1-STABLE-2.1.4/
./ConfigureMe configure
make
make install
# Copy the configuration files into place
cp doc/ha.cf /etc/ha.d/
cp doc/haresources /etc/ha.d/
cp doc/authkeys /etc/ha.d/
cd !$   # change to the /etc/ha.d/ directory

III. Configure Heartbeat

Configure in the /etc/ha.d/ directory:
1. vi authkeys   # node authentication method; the first one, crc, is used here
auth 1
1 crc
# Set the authkeys permissions to 600
chmod 600 authkeys
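
A quick way to confirm the permission step took effect is sketched below. The helper name authkeys_mode_ok is hypothetical, the check relies on GNU stat, and the demonstration uses a scratch file standing in for /etc/ha.d/authkeys.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file's mode is exactly 600.
# Uses GNU stat (-c %a prints the octal permission bits).
authkeys_mode_ok() {
    [ "$(stat -c %a "$1")" = "600" ]
}

# Demonstration on a scratch file standing in for /etc/ha.d/authkeys.
keys=$(mktemp)
printf 'auth 1\n1 crc\n' > "$keys"
chmod 644 "$keys"
authkeys_mode_ok "$keys" || echo "mode too open, fixing"
chmod 600 "$keys"
authkeys_mode_ok "$keys" && echo "authkeys mode is 600"
```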

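2. Edit /etc/ha.d/ha.cf:
[root@HA1 ha.d]# cat ha.cf |sed '/^#/d'
# Enable the HA debug log; turning it off once debugging is done is recommended
debugfile /var/log/ha-debug
# Enable the HA log
logfile    /var/log/ha-log
# Syslog facility used for logging
logfacility    local0
# Interval between heartbeats, in seconds
keepalive 2
# How long heartbeats may be missing before the peer is considered dead, in seconds
deadtime 30
# How long heartbeats may be missing before a warning is logged, in seconds
warntime 10
# Time reserved for services to come up after a restart; the peer is not declared dead during this period
initdead 120
# The default port is UDP 694; I changed it to 695. If someone else on the LAN is running Heartbeat with broadcast, you had better change the port,
# otherwise authentication may fail
udpport    695
# Use unicast; on HA2 change this to: ucast    eth1 192.168.10.1
ucast    eth1 192.168.10.2
# Whether resources switch back once the primary node recovers
auto_failback on
# Configure the watchdog
# A watchdog can be implemented as a hardware circuit or a software timer, and reboots the system automatically when it fails.
# In the Linux kernel the watchdog works as follows: once it is started (i.e. the /dev/watchdog device is opened),
# if nothing is written to /dev/watchdog within a set interval,
# the hardware watchdog circuit or software timer reboots the system.
watchdog /dev/watchdog
# Node list, primary node first; do not get the order wrong
node    HA1
node    HA2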
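
The timing directives depend on one another, so a small consistency check can be useful. The sketch below is not part of Heartbeat: the function names are hypothetical, and the rule that initdead should be at least twice deadtime is a rule of thumb assumed here, not something stated in this article.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the first non-comment line whose
# first word is the given directive.
get_directive() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical check: keepalive < warntime < deadtime, and initdead at least
# twice deadtime (an assumed rule of thumb).
check_timing() {
    local conf="$1" keepalive warntime deadtime initdead
    keepalive=$(get_directive "$conf" keepalive)
    warntime=$(get_directive "$conf" warntime)
    deadtime=$(get_directive "$conf" deadtime)
    initdead=$(get_directive "$conf" initdead)
    if (( keepalive >= warntime )); then
        echo "keepalive should be below warntime"; return 1
    elif (( warntime >= deadtime )); then
        echo "warntime should be below deadtime"; return 1
    elif (( initdead < 2 * deadtime )); then
        echo "initdead should be at least twice deadtime"; return 1
    fi
    echo "timing looks consistent"
}

# Demonstration with the values from the listing above.
conf=$(mktemp)
printf '%s\n' 'keepalive 2' 'deadtime 30' 'warntime 10' 'initdead 120' > "$conf"
check_timing "$conf"
```

Against the real file the call would be check_timing /etc/ha.d/ha.cf.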

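3. [root@HA1 ha.d]# cat haresources

# Each line describes one resource group; resources in a group start from left to right and stop from right to left.
# Resources within a group are separated by spaces; different resource groups are independent of one another.
# The first column of a resource group is one of the nodes listed in ha.cf, and it should be the node intended to be the primary.
# Each resource is a script, which can live in /etc/init.d or in /usr/local/etc/ha.d/resource.d.
# These scripts must support the start and stop arguments.
# Script arguments are separated with ::.
# Primary node   VIP      Resource name
HA1    192.168.2.100    nginxd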
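
The left-to-right start order and right-to-left stop order can be made concrete with a short sketch. The function resource_order is hypothetical and only splits a haresources-style line on whitespace; it does not interpret the :: argument syntax.

```shell
#!/bin/bash
# Hypothetical helper: print the start or stop order for one haresources line.
# Usage: resource_order start|stop "<line>"
resource_order() {
    local mode="$1" line="$2" i
    local -a fields res
    read -r -a fields <<< "$line"
    res=("${fields[@]:1}")            # drop the node name in column one
    if [ "$mode" = "start" ]; then
        for (( i = 0; i < ${#res[@]}; i++ )); do echo "${res[i]}"; done
    else
        for (( i = ${#res[@]} - 1; i >= 0; i-- )); do echo "${res[i]}"; done
    fi
}

line="HA1    192.168.2.100    nginxd"
echo "start order:"; resource_order start "$line"
echo "stop order:";  resource_order stop  "$line"
```

For the line used in this article, the VIP comes up before nginxd, and nginxd is stopped before the VIP is released.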

4. Write the nginxd resource script and place it in /etc/rc.d/init.d/ and /etc/ha.d/resource.d/

#!/bin/sh

# source function library
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

RETVAL=0
prog="nginx"

nginxDir=/usr/local/nginx
nginxd=$nginxDir/sbin/nginx
nginxConf=$nginxDir/conf/nginx.conf
nginxPid=$nginxDir/nginx.pid

nginx_check()
{
    if [[ -e $nginxPid ]]; then
        ps aux |grep -v grep |grep -q nginx
        if (( $? == 0 )); then
            echo "$prog already running..."
            exit 1
        else
            rm -rf $nginxPid &> /dev/null
        fi
    fi
}

start()
{
    nginx_check
    if (( $? != 0 )); then
        true
    else
        echo -n $"Starting $prog:"
        daemon $nginxd -c $nginxConf
        RETVAL=$?
        echo
        [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
        return $RETVAL
    fi
}

stop()
{
    echo -n $"Stopping $prog:"
    killproc $nginxd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx $nginxPid
}

reload()
{
    echo -n $"Reloading $prog:"
    killproc $nginxd -HUP
    RETVAL=$?
    echo
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                start
                ;;
        reload)
                reload
                ;;
        status)
                status $prog
                RETVAL=$?
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart|reload|status}"
                RETVAL=1
esac
exit $RETVAL

5. Set up hosts
[root@HA1 ha.d]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        vpc localhost.localdomain localhost
::1        localhost6.localdomain6 localhost6
192.168.10.1    HA1
192.168.10.2    HA2

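Note: perform steps II and III (installing and configuring heartbeat) on both HA1 and HA2.

6. Start heartbeat
Note: keep the clocks of the primary and backup servers in sync; if they differ too much, heartbeat may malfunction.

service heartbeat restart
Check heartbeat's startup messages in the log (the log is very helpful for troubleshooting):
tail -100 /var/log/ha-log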
heartbeat[13821]: 2009/11/07_19:41:27 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[13822]: 2009/11/07_19:41:27 info: heartbeat: version 2.1.4
heartbeat[13822]: 2009/11/07_19:41:27 info: Heartbeat generation: 1257517561
heartbeat[13822]: 2009/11/07_19:41:27 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat[13822]: 2009/11/07_19:41:27 info: glib: ucast: bound send socket to device: eth1
heartbeat[13822]: 2009/11/07_19:41:27 info: glib: ucast: bound receive socket to device: eth1
heartbeat[13822]: 2009/11/07_19:41:27 info: glib: ucast: started on port 695 interface eth1 to 192.168.10.2
heartbeat[13822]: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13822]: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13822]: 2009/11/07_19:41:27 notice: Using watchdog device: /dev/watchdog
heartbeat[13822]: 2009/11/07_19:41:27 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[13822]: 2009/11/07_19:41:27 info: Local status now set to: 'up'
heartbeat[13822]: 2009/11/07_19:41:29 info: Link ha2:eth1 up.
heartbeat[13822]: 2009/11/07_19:41:29 info: Status update for node ha2: status up
harc[13828]:    2009/11/07_19:41:29 info: Running /etc/ha.d/rc.d/status status
heartbeat[13822]: 2009/11/07_19:41:30 info: Comm_now_up(): updating status to active
heartbeat[13822]: 2009/11/07_19:41:30 info: Local status now set to: 'active'
heartbeat[13822]: 2009/11/07_19:41:30 info: Status update for node ha2: status active
harc[13845]:    2009/11/07_19:41:30 info: Running /etc/ha.d/rc.d/status status
heartbeat[13822]: 2009/11/07_19:41:45 info: local resource transition completed.
heartbeat[13822]: 2009/11/07_19:41:45 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[13900]:    2009/11/07_19:41:45 INFO:  Resource is stopped
heartbeat[13864]: 2009/11/07_19:41:45 info: Local Resource acquisition completed.
harc[13939]:    2009/11/07_19:41:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[13939]:    2009/11/07_19:41:45 received ip-request-resp 192.168.2.100 OK yes
ResourceManager[13960]:    2009/11/07_19:41:45 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr[13987]:    2009/11/07_19:41:45 INFO:  Resource is stopped
ResourceManager[13960]:    2009/11/07_19:41:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr[14063]:    2009/11/07_19:41:46 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr[14063]:    2009/11/07_19:41:46 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr[14063]:    2009/11/07_19:41:46 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr[14046]:    2009/11/07_19:41:46 INFO:  Success
heartbeat[13822]: 2009/11/07_19:41:46 info: remote resource transition completed.

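Check the network interfaces: the VIP has been configured on HA1.
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:35:6F:D0
inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interrupt:67 Base address:0x2000
Check that nginx has started.

If you see the following log, someone on the same network segment may be running a broadcast heartbeat on UDP port 694; switching to a different port may solve the problem.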

heartbeat[9966]: 2009/11/07_00:18:53 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[9967]: 2009/11/07_00:18:53 info: heartbeat: version 2.1.4
heartbeat[9967]: 2009/11/07_00:18:53 info: Heartbeat generation: 1257517538
heartbeat[9967]: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[9967]: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[9967]: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[9967]: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[9967]: 2009/11/07_00:18:53 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[9967]: 2009/11/07_00:18:53 info: Local status now set to: 'up'
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: process_status_message: bad node [master] in message
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG: Dumping message with 12 fields
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[0] : [t=status]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[1] : [st=active]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[2] : [dt=7530]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[3] : [protocol=1]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[4] : [src=master]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[5] : [(1)srcuuid=0x9696e70(36 27)]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[6] : [seq=1fed7]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[7] : [hg=4aee4ce7]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[8] : [ts=4af3a3d5]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[9] : [ld=0.11 0.03 0.01 1/107 30681]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[10] : [ttl=3]
heartbeat[9967]: 2009/11/07_00:18:55 ERROR: MSG[11] : [auth=1 ba81b6cc]
heartbeat[9967]: 2009/11/07_00:18:55 info: Link ha1:eth1 up.
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: process_status_message: bad node [slave] in message
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG: Dumping message with 12 fields
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[0] : [t=status]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[1] : [st=active]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[2] : [dt=7530]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[3] : [protocol=1]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[4] : [src=slave]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[5] : [(1)srcuuid=0x9696dc8(36 27)]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[6] : [seq=1f94b]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[7] : [hg=4aee4cf3]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[8] : [ts=4af3a3d6]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[9] : [ld=0.00 0.00 0.00 1/105 870]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[10] : [ttl=3]
heartbeat[9967]: 2009/11/07_00:18:56 ERROR: MSG[11] : [auth=1 bcd3be0a]

IV. Testing
1. Check that manual switchover works
Run /usr/share/heartbeat/hb_standby on HA1 and see whether the VIP moves to HA2.
Check the heartbeat log:
tail -100 /var/log/ha-log
heartbeat[13822]: 2009/11/07_19:44:33 info: ha1 wants to go standby [all]
heartbeat[13822]: 2009/11/07_19:44:33 info: standby: ha2 can take our all resources
heartbeat[14194]: 2009/11/07_19:44:33 info: give up all HA resources (standby).
ResourceManager[14207]:    2009/11/07_19:44:34 info: Releasing resource group: ha1 192.168.2.100 nginxd
ResourceManager[14207]:    2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/nginxd  stop
ResourceManager[14207]:    2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 stop
IPaddr[14295]:    2009/11/07_19:44:34 INFO: ifconfig eth0:0 down
IPaddr[14278]:    2009/11/07_19:44:34 INFO:  Success
heartbeat[14194]: 2009/11/07_19:44:34 info: all HA resource release completed (standby).
heartbeat[13822]: 2009/11/07_19:44:34 info: Local standby process completed [all].
heartbeat[13822]: 2009/11/07_19:44:36 WARN: 1 lost packet(s) for [ha2] [83:85]
heartbeat[13822]: 2009/11/07_19:44:36 info: remote resource transition completed.
heartbeat[13822]: 2009/11/07_19:44:36 info: No pkts missing from ha2!
heartbeat[13822]: 2009/11/07_19:44:36 info: Other node completed standby takeover of all resources.
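On HA2, the VIP is now configured and nginx has started.

2. Cut the heartbeat link between the primary and backup nodes and see whether the VIP moves
Bring down eth1 on HA1 and check the heartbeat log on HA2:
[root@HA2 ~]# tail -100 /var/log/ha-log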
heartbeat[3753]: 2009/11/07_19:59:36 WARN: node ha1: is dead
heartbeat[3753]: 2009/11/07_19:59:36 WARN: No STONITH device configured.
heartbeat[3753]: 2009/11/07_19:59:36 WARN: Shared disks are not protected.
heartbeat[3753]: 2009/11/07_19:59:36 info: Resources being acquired from ha1.
heartbeat[3753]: 2009/11/07_19:59:36 info: Link ha1:eth1 dead.
harc[4255]:    2009/11/07_19:59:36 info: Running /etc/ha.d/rc.d/status status
heartbeat[4256]: 2009/11/07_19:59:36 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ha2] to acquire.
mach_down[4276]:    2009/11/07_19:59:36 info: Taking over resource group 192.168.2.100
ResourceManager[4310]:    2009/11/07_19:59:36 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr[4337]:    2009/11/07_19:59:37 INFO:  Resource is stopped
ResourceManager[4310]:    2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr[4413]:    2009/11/07_19:59:37 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr[4413]:    2009/11/07_19:59:37 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr[4413]:    2009/11/07_19:59:37 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr[4396]:    2009/11/07_19:59:37 INFO:  Success
ResourceManager[4310]:    2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/nginxd  start
mach_down[4276]:    2009/11/07_19:59:38 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[4276]:    2009/11/07_19:59:38 info: mach_down takeover complete for node ha1.
heartbeat[3753]: 2009/11/07_19:59:38 info: mach_down takeover complete.
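The resources moved from HA1 to HA2.

Bring eth1 on HA1 back up, and the resources automatically move from HA2 back to HA1.

3. Shut down HA1, or stop heartbeat on HA1, and see whether the VIP moves to HA2
The resources moved from HA1 to HA2.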

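V. HA Management

Start/stop heartbeat:
service heartbeat start/stop

Check heartbeat status:
[root@HA2 ~]# service heartbeat status
heartbeat OK [pid 4724 et al] is running on ha2 [ha2]…

Manual switchover (hand the local resources to the remote host):
[root@HA1 ~]# /usr/share/heartbeat/hb_standby
2009/11/07_20:11:03 Going standby [all].

Manual takeover (take the resources over locally):
[root@HA2 ~]# /usr/share/heartbeat/hb_takeover

Summary: with the configuration above, when one node goes down the other node takes over its resources. When nginx itself goes down, however, there is no automatic failover; to get that you must configure heartbeat style 2.x, see "Nginx High Availability with Heartbeat (style 2.x)".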



HA Architecture and Internal Processing Flow

December 8, 2009, admin, no comments

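1. Architecture
This section gives a brief overview of the High Availability architecture. It describes the components of the architecture and how they work together.

1.1 Architecture Layers

High Availability has a layered architecture. The "Architecture" figure shows the different layers and their associated components.

[Figure: Architecture]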

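Messaging and infrastructure layer
The primary, or first, layer is the messaging/infrastructure layer, also known as the OpenAIS layer. It contains the components that send out messages carrying "I'm alive" signals, along with other information. The High Availability program itself resides in this messaging/infrastructure layer.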

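Resource allocation layer
The next layer is the resource allocation layer. It is the most complex layer and consists of the following components:
Cluster Resource Manager (CRM)
Every action taken in the resource allocation layer passes through the Cluster Resource Manager. If other components of the resource allocation layer (or components in a higher layer) need to communicate, they do so through the local CRM. On every node, the CRM maintains the Cluster Information Base (CIB), which contains the definitions of all cluster options, nodes, resources, their relationships and their current state. One CRM in the cluster is elected as the Designated Coordinator (DC), which means that it holds the master CIB. All other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC is the only entity in the cluster that can decide that a cluster-wide change needs to be made, such as fencing a node or moving resources.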

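Cluster Information Base (CIB)
The Cluster Information Base is an in-memory XML representation of the entire cluster configuration and current state. It contains the definitions of all cluster options, nodes, resources and constraints, and the relationships among them. The CIB also synchronizes updates to all cluster nodes. There is one master CIB in the cluster, maintained by the DC. All other nodes hold a replica of the CIB.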

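Policy Engine (PE)
Whenever the Designated Coordinator needs to make a cluster-wide change (in reaction to a new CIB), the Policy Engine calculates the next state of the cluster from the current state and configuration. The PE also produces a transition graph containing the list of (resource) actions and dependencies needed to reach the next cluster state. The PE runs on every node to speed up DC failover.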

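Local Resource Manager (LRM)
The LRM calls the local resource agents (see the "Resource layer" section) on behalf of the CRM. It can therefore perform start/stop/monitor operations and report the results to the CRM. It also hides the differences between the script standards that resource agents support (OCF, LSB, Heartbeat V1). The LRM is the authoritative source for all resource-related information on its local node.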

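Resource layer
The highest layer is the resource layer. It includes one or more resource agents (RA). A resource agent is a program, usually a shell script, written to start, stop and monitor a certain kind of service (a resource). Resource agents are called only by the LRM. Third parties can place their own agents in a defined location in the file system and so provide out-of-the-box cluster integration for their own software.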

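1.2 Processing Flow
High Availability uses Pacemaker as its CRM. The CRM runs as a daemon (crmd) with one instance on each cluster node. Pacemaker centralizes all cluster decision-making by electing one of the crmd instances as the master. If the elected crmd process (or the node it runs on) fails, a new one is established.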

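A CIB is kept on each node, reflecting the cluster's configuration and the current state of all resources in the cluster. The contents of the CIB are automatically kept in sync across the entire cluster.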

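Many actions performed in the cluster cause a cluster-wide change. These include adding or removing a cluster resource, changing resource constraints, and so on. It is important to understand what happens in the cluster when you perform such an action.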

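For example, suppose you want to add a cluster IP address resource. To do so, you can use one of the command-line tools or the GUI to modify the CIB. You do not have to do this on the DC; you can use either tool on any node in the cluster, and the change is relayed to the DC. The DC then replicates the CIB change to all cluster nodes.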

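Based on the information in the CIB, the PE then computes the ideal state of the cluster and how to reach it, and passes a list of instructions to the DC. The DC sends commands through the messaging/infrastructure layer, where they are received by the crmd peers on the other nodes. Each crmd uses its LRM (implemented as lrmd) to carry out the resource changes. The lrmd is not cluster-aware and interacts directly with the resource agents (scripts).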

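All peer nodes report the results of their operations back to the DC. Once the DC concludes that all necessary operations have been carried out successfully in the cluster, the cluster returns to the idle state and waits for further events. If any operation did not go as planned, the PE is invoked again with the new information recorded in the CIB.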

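In some cases it may be necessary to power off nodes in order to protect shared data or complete resource recovery. For this, Pacemaker comes with a fencing subsystem, stonithd. STONITH is an acronym for "Shoot The Other Node In The Head" and is usually implemented with a remote power switch. In Pacemaker, STONITH devices are modeled as resources (and configured in the CIB) so that they can be monitored for failure; stonithd, however, takes care of understanding the STONITH topology, so that its clients only have to request that a node be fenced and it does the rest.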

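LVM + MySQL Master-Slave Replication

December 7, 2009, admin, no comments

Two virtual machines running CentOS 5.4, each with three disks used to build the logical volume.

IP assignment and disks:
HA1 eth0:192.168.0.77 eth1:192.168.10.1   /dev/sdc /dev/sdd /dev/sde
HA2 eth0:192.168.0.69 eth1:192.168.10.2   /dev/sdc /dev/sdd /dev/sde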

I. Configure the logical volumes
Check the disks:
[root@HA1 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM

Disk /dev/sdb: 6442 MB, 6442450944 bytes
255 heads, 63 sectors/track, 783 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sde doesn't contain a valid partition table

Partition the disks:
[root@HA1 ~]# fdisk /dev/sdc
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): m # show help
Command action
a   toggle a bootable flag
b   edit bsd disklabel
c   toggle the dos compatibility flag
d   delete a partition
l   list known partition types
m   print this menu
n   add a new partition
o   create a new empty DOS partition table
p   print the partition table
q   quit without saving changes
s   create a new empty Sun disklabel
t   change a partition's system id
u   change display/entry units
v   verify the partition table
w   write table to disk and exit
x   extra functionality (experts only)

Command (m for help): n # create a new partition
Command action
e   extended
p   primary partition (1-4)
p # create a primary partition
Partition number (1-4): 1 # enter the partition number
First cylinder (1-512, default 1):     # press Enter to accept the default
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512):     # press Enter to use the whole disk
Using default value 512

Command (m for help): t # set the partition type
Selected partition 1
Hex code (type L to list codes): L # list the partition types

0  Empty           1e  Hidden W95 FAT1 80  Old Minix       bf  Solaris
1  FAT12           24  NEC DOS         81  Minix / old Lin c1  DRDOS/sec (FAT-
2  XENIX root      39  Plan 9          82  Linux swap / So c4  DRDOS/sec (FAT-
3  XENIX usr       3c  PartitionMagic  83  Linux           c6  DRDOS/sec (FAT-
4  FAT16 <32M      40  Venix 80286     84  OS/2 hidden C:  c7  Syrinx
5  Extended        41  PPC PReP Boot   85  Linux extended  da  Non-FS data
6  FAT16           42  SFS             86  NTFS volume set db  CP/M / CTOS / .
7  HPFS/NTFS       4d  QNX4.x          87  NTFS volume set de  Dell Utility
8  AIX             4e  QNX4.x 2nd part 88  Linux plaintext df  BootIt
9  AIX bootable    4f  QNX4.x 3rd part 8e  Linux LVM       e1  DOS access
a  OS/2 Boot Manag 50  OnTrack DM      93  Amoeba          e3  DOS R/O
b  W95 FAT32       51  OnTrack DM6 Aux 94  Amoeba BBT      e4  SpeedStor
c  W95 FAT32 (LBA) 52  CP/M            9f  BSD/OS          eb  BeOS fs
e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi ee  EFI GPT
f  W95 Ext'd (LBA) 54  OnTrackDM6      a5  FreeBSD         ef  EFI (FAT-12/16/
10  OPUS            55  EZ-Drive        a6  OpenBSD         f0  Linux/PA-RISC b
11  Hidden FAT12    56  Golden Bow      a7  NeXTSTEP        f1  SpeedStor
12  Compaq diagnost 5c  Priam Edisk     a8  Darwin UFS      f4  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       a9  NetBSD          f2  DOS secondary
16  Hidden FAT16    63  GNU HURD or Sys ab  Darwin boot     fb  VMware VMFS
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fd  Linux raid auto
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fe  LANstep
1c  Hidden W95 FAT3 75  PC/IX           be  Solaris boot    ff  BBT
Hex code (type L to list codes): 8e # set the partition type to Linux LVM
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w # save and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@HA1 ~]# fdisk /dev/sdd
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-512, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512):
Using default value 512

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@HA1 ~]# fdisk /dev/sde
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-512, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512):
Using default value 512

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
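
The three interactive sessions above type the same answers each time, so the sequence can be written down once. The sketch below only prints the keystrokes and the commands it would run; feeding them to fdisk through a pipe is an assumption and should be tried on a disposable disk first. The helper name lvm_partition_keys is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: emit the answers typed in the sessions above:
# n (new), p (primary), 1 (partition number), two empty lines (default first
# and last cylinder), t (type), 8e (Linux LVM), w (write and exit).
lvm_partition_keys() {
    printf 'n\np\n1\n\n\nt\n8e\nw\n'
}

# Dry run: print the invocations instead of executing them.
for disk in /dev/sdc /dev/sdd /dev/sde; do
    echo "lvm_partition_keys | fdisk $disk"
done
```

Removing the echo and the quotes around the pipeline would run it for real, which rewrites the partition table of each listed disk.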

Check the disks after partitioning:
[root@HA1 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM

Disk /dev/sdb: 6442 MB, 6442450944 bytes
255 heads, 63 sectors/track, 783 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1         512      524272   8e  Linux LVM

Disk /dev/sdd: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         512      524272   8e  Linux LVM

Disk /dev/sde: 536 MB, 536870912 bytes
64 heads, 32 sectors/track, 512 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1         512      524272   8e  Linux LVM

Create the physical volumes:
[root@HA1 ~]# pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1
Physical volume "/dev/sdc1" successfully created
Physical volume "/dev/sdd1" successfully created
Physical volume "/dev/sde1" successfully created

Show the physical volumes:
[root@HA1 ~]# pvdisplay
--- Physical volume ---
PV Name               /dev/sda2
VG Name               VolGroup00
PV Size               9.90 GB / not usable 22.76 MB
Allocatable           yes (but full)
PE Size (KByte)       32768
Total PE              316
Free PE               0
Allocated PE          316
PV UUID               1zBHox-Dla7-0ozU-0IFp-Onl4-V7V2-R10XXW

"/dev/sdc1" is a new physical volume of "511.98 MB"
--- NEW Physical volume ---
PV Name               /dev/sdc1
VG Name
PV Size               511.98 MB
Allocatable           NO
PE Size (KByte)       0
Total PE              0
Free PE               0
Allocated PE          0
PV UUID               DwoEeZ-NmK5-ZDR6-qCmx-vJsw-7Wet-2qGako

"/dev/sdd1" is a new physical volume of "511.98 MB"
--- NEW Physical volume ---
PV Name               /dev/sdd1
VG Name
PV Size               511.98 MB
Allocatable           NO
PE Size (KByte)       0
Total PE              0
Free PE               0
Allocated PE          0
PV UUID               YfolqL-6Qlm-bUki-qWTJ-8zIW-zeJI-Ssjxln

"/dev/sde1" is a new physical volume of "511.98 MB"
--- NEW Physical volume ---
PV Name               /dev/sde1
VG Name
PV Size               511.98 MB
Allocatable           NO
PE Size (KByte)       0
Total PE              0
Free PE               0
Allocated PE          0
PV UUID               Rhdkyp-MBB6-UeTK-dmuP-6Dza-L69O-sW6eNv

Create the volume group:
[root@HA1 ~]# vgcreate dataVg /dev/sdc1 /dev/sdd1 /dev/sde1
Volume group "dataVg" successfully created

Create the logical volume:
[root@HA1 ~]# lvcreate --name dataLv --size 1G dataVg
Logical volume "dataLv" created

View the logical volumes:
[root@HA1 ~]# lvdisplay
--- Logical volume ---
LV Name                /dev/dataVg/dataLv
VG Name                dataVg
LV UUID                gXPZmP-c41N-Yeu8-mT8U-0sUx-Mu2X-pR1PyE
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                1.00 GB
Current LE             256
Segments               3
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:2

--- Logical volume ---
LV Name                /dev/VolGroup00/LogVol00
VG Name                VolGroup00
LV UUID                yTby3S-TYzd-x7fP-T8HJ-GOEg-lt7E-i90qZy
LV Write Access        read/write
LV Status              available
# open                 1
LV Size                8.88 GB
Current LE             284
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:0

--- Logical volume ---
LV Name                /dev/VolGroup00/LogVol01
VG Name                VolGroup00
LV UUID                bNfOaD-vcTc-hq4c-7Bd0-3a6S-wD0B-aFZMzM
LV Write Access        read/write
LV Status              available
# open                 1
LV Size                1.00 GB
Current LE             32
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:1

Format the logical volume:
[root@HA1 ~]# mkfs.ext3 /dev/dataVg/dataLv
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mount the logical volume on /data:
[root@HA1 ~]# mount /dev/dataVg/dataLv /data/

Mount the logical volume automatically at boot:
[root@HA1 ~]# vi /etc/fstab
/dev/dataVg/dataLv      /data                   ext3    defaults        0 0

Repeat the steps above on HA2.

II. Install MySQL and migrate the MySQL data to /data on HA1.

III. Install the MySQL LVM backup tool:
[root@HA1 ~]# wget http://search.cpan.org/CPAN/authors/id/S/SH/SHLOMIF/Config-IniFiles-2.54.tar.gz

[root@HA1 ~]# tar xzvf Config-IniFiles-2.54.tar.gz

[root@HA1 ~]# cd Config-IniFiles-2.54
[root@HA1 Config-IniFiles-2.54]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Config::IniFiles
[root@HA1 Config-IniFiles-2.54]# make
cp lib/Config/IniFiles.pm blib/lib/Config/IniFiles.pm
Manifying blib/man3/Config::IniFiles.3pm
[root@HA1 Config-IniFiles-2.54]# make install
Installing /usr/lib/perl5/site_perl/5.8.8/Config/IniFiles.pm
Installing /usr/share/man/man3/Config::IniFiles.3pm
Writing /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/Config/IniFiles/.packlist
Appending installation info to /usr/lib/perl5/5.8.8/i386-linux-thread-multi/perllocal.pod
[root@HA1 Config-IniFiles-2.54]# cd ..

[root@HA1 ~]# wget http://www.lenzg.net/mylvmbackup/mylvmbackup-0.13.tar.gz
[root@HA1 ~]# tar xzvf mylvmbackup-0.13.tar.gz
[root@HA1 ~]# cd mylvmbackup-0.13
[root@HA1 mylvmbackup-0.13]# make install
[root@HA1 mylvmbackup-0.13]# cd ..

Configure mylvmbackup:
[root@HA1 ~]# vi /etc/mylvmbackup.conf

[mysql]
user=root
password=
host=localhost
port=3306
socket=/data/mysql/mysql.sock
mycnf=/etc/my.cnf

#
# LVM-specific options
#
[lvm]
vgname=dataVg
lvname=dataLv
backuplv=backupLv
lvsize=0.45G

#
# File system specific options
#
[fs]
xfs=0
mountdir=/var/tmp/mylvmbackup/mnt/
backupdir=/var/tmp/mylvmbackup/backup/
relpath=

Be sure to adjust the configuration items highlighted in red above.

Create the following directories:
[root@HA1 ~]# mkdir -p  /var/tmp/mylvmbackup/backup
[root@HA1 ~]# mkdir -p  /var/tmp/mylvmbackup/mnt

Check the database (the employees database uses the InnoDB engine):
[root@HA1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
+--------------------+
3 rows in set (0.01 sec)

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show engines;
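+------------+---------+----------------------------------------------------------------+
| Engine     | Support | Comment                                                        |
+------------+---------+----------------------------------------------------------------+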
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys     |
| BerkeleyDB | YES     | Supports transactions and page-level locking                   |
| BLACKHOLE  | NO      | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE    | NO      | Example storage engine                                         |
| ARCHIVE    | NO      | Archive storage engine                                         |
| CSV        | NO      | CSV storage engine                                             |
| ndbcluster | NO      | Clustered, fault-tolerant, memory-based tables                 |
| FEDERATED  | NO      | Federated MySQL storage engine                                 |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          |
| ISAM       | NO      | Obsolete storage engine                                        |
+------------+---------+----------------------------------------------------------------+
12 rows in set (0.00 sec)

mysql> show tables;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments         |
| dept_emp            |
| dept_manager        |
| employees           |
| salaries            |
| titles              |
+---------------------+
6 rows in set (0.00 sec)

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (1.94 sec)

mysql> quit
Bye

Back up the database with an LVM snapshot:
[root@HA1 ~]# mylvmbackup
20091125 14:50:10 Info: Connecting to database...
20091125 14:50:10 Info: Flushing tables with read lock...   # lock the tables in preparation for the backup
20091125 14:50:10 Info: Taking position record into /tmp/mylvmbackup-backup-20091125_145009_mysql-odzMgs.pos...    # if binary logging is enabled, record the log position
20091125 14:50:10 Info: Running: lvcreate -s --size=0.45G --name=backupLv /dev/dataVg/dataLv
File descriptor 4 (socket:[21544]) leaked on lvcreate invocation. Parent PID 6062: /usr/bin/perl
Rounding up size to full physical extent 464.00 MB
Logical volume "backupLv" created
20091125 14:50:13 Info: DONE: taking LVM snapshot    # the LVM snapshot takes only about 3 s
20091125 14:50:13 Info: Unlocking tables...    # snapshot done, locks released; from here on the database is fully accessible again
20091125 14:50:13 Info: Disconnecting from database...
20091125 14:50:13 Info: Mounting snapshot...
20091125 14:50:13 Info: Running: mount -o rw /dev/dataVg/backupLv /var/tmp/mylvmbackup/mnt/backup
20091125 14:50:13 Info: DONE: mount snapshot
20091125 14:50:13 Info: Copying /tmp/mylvmbackup-backup-20091125_145009_mysql-odzMgs.pos to /var/tmp/mylvmbackup/mnt/backup-pos/backup-20091125_145009_mysql.pos...
20091125 14:50:13 Info: Copying /etc/my.cnf to /var/tmp/mylvmbackup/mnt/backup-pos/backup-20091125_145009_mysql_my.cnf...
20091125 14:50:13 Info: Taking actual backup...
20091125 14:50:13 Info: Creating tar archive /var/tmp/mylvmbackup/backup/backup-20091125_145009_mysql.tar.gz
20091125 14:50:13 Info: Running: cd '/var/tmp/mylvmbackup/mnt' ;'tar' cvf - backup/  backup-pos/backup-20091125_145009_mysql.pos backup-pos/backup-20091125_145009_mysql_my.cnf| gzip --stdout --verbose --best -> /var/tmp/mylvmbackup/backup/backup-20091125_145009_mysql.tar.gz.INCOMPLETE-54lIVbU
backup/
backup/lost+found/
backup/logs/
backup/logs/www.access.log
backup/logs/error.log
backup/backup/
backup/backup/cib.xml
backup/backup/ifcfg-lo:0
backup/mysql/
backup/mysql/ib_logfile0
tar: backup/mysql/mysql.sock: socket ignored
backup/mysql/employees/
backup/mysql/employees/departments.frm
backup/mysql/employees/dept_emp.frm
backup/mysql/employees/salaries.frm
backup/mysql/employees/employees.frm
backup/mysql/employees/db.opt
backup/mysql/employees/dept_manager.frm
backup/mysql/employees/titles.frm
backup/mysql/ib_logfile1
backup/mysql/mysql/
backup/mysql/mysql/help_category.MYD
backup/mysql/mysql/help_topic.MYI
backup/mysql/mysql/help_relation.MYD
backup/mysql/mysql/db.frm
backup/mysql/mysql/time_zone.frm
backup/mysql/mysql/time_zone.MYD
backup/mysql/mysql/time_zone_transition.MYI
backup/mysql/mysql/columns_priv.MYI
backup/mysql/mysql/tables_priv.frm
backup/mysql/mysql/host.MYD
backup/mysql/mysql/procs_priv.MYI
backup/mysql/mysql/proc.frm
backup/mysql/mysql/user.MYD
backup/mysql/mysql/db.MYI
backup/mysql/mysql/time_zone_name.MYI
backup/mysql/mysql/time_zone.MYI
backup/mysql/mysql/func.MYI
backup/mysql/mysql/help_keyword.MYI
backup/mysql/mysql/help_topic.MYD
backup/mysql/mysql/procs_priv.MYD
backup/mysql/mysql/db.MYD
backup/mysql/mysql/time_zone_name.MYD
backup/mysql/mysql/host.MYI
backup/mysql/mysql/time_zone_leap_second.frm
backup/mysql/mysql/time_zone_transition_type.MYD
backup/mysql/mysql/time_zone_transition_type.MYI
backup/mysql/mysql/help_relation.MYI
backup/mysql/mysql/time_zone_leap_second.MYI
backup/mysql/mysql/help_keyword.MYD
backup/mysql/mysql/user.frm
backup/mysql/mysql/func.MYD
backup/mysql/mysql/tables_priv.MYI
backup/mysql/mysql/tables_priv.MYD
backup/mysql/mysql/time_zone_transition.frm
backup/mysql/mysql/user.MYI
backup/mysql/mysql/help_category.frm
backup/mysql/mysql/procs_priv.frm
backup/mysql/mysql/columns_priv.MYD
backup/mysql/mysql/help_category.MYI
backup/mysql/mysql/help_keyword.frm
backup/mysql/mysql/time_zone_leap_second.MYD
backup/mysql/mysql/proc.MYI
backup/mysql/mysql/proc.MYD
backup/mysql/mysql/time_zone_transition_type.frm
backup/mysql/mysql/time_zone_transition.MYD
backup/mysql/mysql/func.frm
backup/mysql/mysql/time_zone_name.frm
backup/mysql/mysql/host.frm
backup/mysql/mysql/help_relation.frm
backup/mysql/mysql/help_topic.frm
backup/mysql/mysql/columns_priv.frm
backup/mysql/ibdata1
backup/html/
backup/html/www.baihe.com/
backup/html/www.baihe.com/test.html
backup/html/www.baihe.com/index.html
backup-pos/backup-20091125_145009_mysql.pos
backup-pos/backup-20091125_145009_mysql_my.cnf
64.0%
20091125 14:56:00 Info: DONE: create tar archive
20091125 14:56:01 Info: Cleaning up...
20091125 14:56:01 Info: Running: umount /var/tmp/mylvmbackup/mnt/backup
20091125 14:56:02 Info: DONE: Unmounting /var/tmp/mylvmbackup/mnt/backup
20091125 14:56:02 Info: LVM Usage stats:
20091125 14:56:02 Info:   LV       VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
20091125 14:56:02 Info:   backupLv dataVg swi-a- 464.00M dataLv   0.09
20091125 14:56:02 Info: Running: lvremove -f /dev/dataVg/backupLv
Logical volume "backupLv" successfully removed
20091125 14:56:03 Info: DONE: Removing snapshot
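
The "Rounding up size to full physical extent 464.00 MB" line in the log above follows from how LVM allocates space in whole physical extents. The short sketch below reproduces that arithmetic. It assumes a 4 MB extent size for dataVg, which is not stated in the post but is consistent with the lvdisplay output shown earlier (1.00 GB = 256 extents); the function name is made up for illustration.

```python
import math
from typing import Tuple


def round_up_to_extents(size_mb: float, pe_size_mb: int = 4) -> Tuple[int, int]:
    """Return (extent_count, allocated_mb) for a requested size in MB.

    pe_size_mb is an assumption (4 MB), inferred from 1.00 GB = 256 LE.
    """
    extents = math.ceil(size_mb / pe_size_mb)
    return extents, extents * pe_size_mb


if __name__ == "__main__":
    # lvsize=0.45G from /etc/mylvmbackup.conf, expressed in MB
    print(round_up_to_extents(0.45 * 1024))
    # the 1G data volume created with lvcreate --size 1G
    print(round_up_to_extents(1024))
```

With these inputs the requested 460.8 MB becomes 116 extents, i.e. the 464 MB that lvcreate reported.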

[root@HA1 ~]# cd /var/tmp/mylvmbackup/backup
You have new mail in /var/spool/mail/root
[root@HA1 backup]# ls
backup-20091125_145009_mysql.tar.gz

[root@HA1 backup]# scp backup-20091125_145009_mysql.tar.gz HA2:/root/
root@ha2's password:
backup-20091125_145009_mysql.tar.gz                                         100%   80MB 799.2KB/s   01:42

Verify the backup on HA2:
[root@HA2 data]# tar xzvf /root/backup-20091125_145009_mysql.tar.gz

[root@HA2 data]# ls
backup  backup-pos  lost+found
You have new mail in /var/spool/mail/root
[root@HA2 data]# cd backup
[root@HA2 backup]# ls
backup  html  logs  lost+found  mysql
[root@HA2 backup]# mv mysql/ ..

[root@HA2 backup]# cd ..

[root@HA2 data]# service mysqld start
Starting MySQL:                                            [  OK  ]
[root@HA2 data]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
+--------------------+
3 rows in set (0.00 sec)

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments         |
| dept_emp            |
| dept_manager        |
| employees           |
| salaries            |
| titles              |
+---------------------+
6 rows in set (0.00 sec)

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.58 sec)

mysql>

IV. Configure MySQL master-slave replication:

HA1 (master)

Configure MySQL:
[root@HA1 ~]# cat /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
user=mysql
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

log-bin=/data/mysql/log/mysql-bin.log
server-id=1

[mysqld_safe]
log-error=/data/mysql/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[mysql]
socket=/data/mysql/mysql.sock

If you changed the configuration file, restart MySQL.

Back up the database with an LVM snapshot:
[root@HA1 backup]# mylvmbackup

Copy the backup file to HA2:
[root@HA1 backup]# scp backup-20091125_155132_mysql.tar.gz HA2:/root/
root@ha2's password:
backup-20091125_155132_mysql.tar.gz                                         100%   80MB   1.2MB/s   01:07

Add a replication account on the master:
[root@HA1 backup]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.0.77-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> GRANT REPLICATION SLAVE ON *.* TO 'rep'@'192.168.10.%' IDENTIFIED BY 'slavepass';

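HA2 (slave)

Extract the master's backup archive into the slave's data directory:
[root@HA2 data]# tar xzvf /root/backup-20091125_155132_mysql.tar.gz

Move the data files into the MySQL data directory:
[root@HA2 data]# mv backup/mysql/ .

Check the binary log position recorded at backup time: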
[root@HA2 data]# cat backup-pos/backup-20091125_155132_mysql.pos
Master:File=mysql-bin.000001
Master:Position=244
Master:Binlog_Do_DB=
Master:Binlog_Ignore_DB=
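
The File and Position values in this .pos file are exactly what the CHANGE MASTER TO statement further down needs. As a rough sketch, the mapping could be scripted as follows. It relies only on the four "Master:Key=Value" lines shown above; the function names are hypothetical, and the statement is assembled by plain string formatting without any escaping, so it is meant for trusted values only.

```python
SAMPLE_POS = """\
Master:File=mysql-bin.000001
Master:Position=244
Master:Binlog_Do_DB=
Master:Binlog_Ignore_DB=
"""


def parse_pos(text):
    """Parse 'Master:Key=Value' lines into a dict keyed by Key."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Master:") or "=" not in line:
            continue  # ignore anything that is not a Master: entry
        key, _, value = line[len("Master:"):].partition("=")
        values[key] = value
    return values


def change_master_sql(pos, host, user, password):
    """Build a CHANGE MASTER TO statement (no escaping; trusted input only)."""
    return (
        "CHANGE MASTER TO\n"
        f"  MASTER_HOST='{host}',\n"
        f"  MASTER_USER='{user}',\n"
        f"  MASTER_PASSWORD='{password}',\n"
        f"  MASTER_LOG_FILE='{pos['File']}',\n"
        f"  MASTER_LOG_POS={int(pos['Position'])};"
    )


if __name__ == "__main__":
    pos = parse_pos(SAMPLE_POS)
    print(change_master_sql(pos, "192.168.10.1", "rep", "slavepass"))
```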

Edit the MySQL configuration file:
[root@HA2 data]# cat /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
user=mysql
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

server-id=2

[mysqld_safe]
log-error=/data/mysql/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[mysql]
socket=/data/mysql/mysql.sock

Start MySQL:
[root@HA2 log]# service mysqld start
Starting MySQL:                                            [  OK  ]

Configure the MySQL slave:
[root@HA2 log]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CHANGE MASTER TO
-> MASTER_HOST='192.168.10.1',
-> MASTER_USER='rep',
-> MASTER_PASSWORD='slavepass',
-> MASTER_LOG_FILE='mysql-bin.000001',
-> MASTER_LOG_POS=244;
Query OK, 0 rows affected (0.00 sec)

mysql> slave start;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.1
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 500
Relay_Log_File: mysqld-relay-bin.000003
Relay_Log_Pos: 637
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 500
Relay_Log_Space: 637
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)

mysql> quit
Bye
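
In the status output above, the fields that matter most are Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. A monitoring script might check them roughly as sketched below. The parser works on the vertical (\G) text format shown above; the function names and the lag threshold are assumptions made for this example, not part of MySQL.

```python
SAMPLE_STATUS = """\
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.1
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
"""


def parse_vertical(text):
    """Turn 'Key: Value' lines of \\G output into a dict."""
    row = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # row separator and footer lines carry no colon
        key, _, value = line.partition(":")
        row[key.strip()] = value.strip()
    return row


def replication_healthy(row, max_lag_seconds=60):
    """True if both slave threads run and the lag is within the threshold.

    max_lag_seconds is a hypothetical threshold chosen for illustration.
    """
    if row.get("Slave_IO_Running") != "Yes":
        return False
    if row.get("Slave_SQL_Running") != "Yes":
        return False
    lag = row.get("Seconds_Behind_Master", "")
    # the lag is reported as NULL when replication is not running
    return lag.isdigit() and int(lag) <= max_lag_seconds


if __name__ == "__main__":
    print(replication_healthy(parse_vertical(SAMPLE_STATUS)))
```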

Verify that replication works:
Insert a row on HA1:
[root@HA1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 62
Server version: 5.0.77-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> insert into employees values (66666666,'1982-10-17','Shi','Dongliang','M','2008-06-01') ;
Query OK, 1 row affected (0.03 sec)

Query on HA2 to check that the row was replicated:
[root@HA2 data]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from employees where emp_no=66666666;
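+----------+------------+------------+-----------+--------+------------+
| emp_no   | birth_date | first_name | last_name | gender | hire_date  |
+----------+------------+------------+-----------+--------+------------+
| 66666666 | 1982-10-17 | Shi        | Dongliang | M      | 2008-06-01 |
+----------+------------+------------+-----------+--------+------------+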
1 row in set (0.04 sec)

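Whether you use the MyISAM or the InnoDB storage engine, an LVM snapshot makes it easy to get a consistent MySQL backup. Once the tables are locked, taking the snapshot needs only a few seconds; the table locks are then released and the database is fully accessible again. What remains is to compress the backup and copy it to the slave for restoration (depending on the data size this can take a long time, but it has almost no further impact on the master).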

References:
LVM beginner's guide: http://www.howtoforge.com/linux_lvm
LVM (Logical Volume Manager) summary: http://www.chinaunix.net/jh/4/258443.html

Category: MySQL

LVM Snapshot Backup

December 7, 2009 admin

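Logical Volume Manager (LVM) can take a "snapshot" of any Logical Volume (LV), which yields a state-consistent backup of a partition.

When a backup runs against a live system, an application may be accessing a file or a database at that moment. The file is in one state when the backup starts and in another when it ends, so the backup is inconsistent, and restoring a database from such a backup will almost never succeed.

The usual workaround is to remount the partition read-only and then back up the data under table-level write locks, or even with the database stopped. All of these approaches undoubtedly hurt service availability badly. An LVM snapshot gives a consistent backup without affecting the server's availability.

Note that this snapshot method works only on LVM; it is not available for file systems that are not on LVM.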

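Snapshots can be implemented in several ways (see the link at the end of this article); here we describe the copy-on-write (CoW) implementation used by LVM snapshots.

When a snapshot is created, only the metadata of the data in the original volume is copied. No data is physically copied at creation time, so creating a snapshot is almost instantaneous. When a write is issued to the original volume, the snapshot tracks the changed blocks: the data about to be changed is copied into the space reserved for the snapshot before it is overwritten, which is why this mechanism is called copy-on-write.

Before a write touches a block, CoW moves the original data into the snapshot space, which guarantees that all data stays consistent with the moment the snapshot was created. For reads of the snapshot, a block that has not been modified is read directly from the original volume, while a block that has been modified is read from the copy stored in the snapshot.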

This changes the usual file I/O path: a CoW layer is inserted between the file system and the device driver, so the path becomes:
file I/O ---> filesystem ---> CoW ---> block I/O

[Figure: how CoW works]

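With the CoW implementation, a snapshot does not need to be as large as the original volume. Its size depends on only two things: how much block change is expected between creating and releasing the snapshot, and how frequently the data is updated. Once the snapshot space is filled with change records for the original volume's blocks, the snapshot is released immediately and becomes unusable, that is, invalid. So it is very important to finish whatever you need to do within the snapshot's lifetime. Of course, if the snapshot is as large as the original volume, or even larger, it can effectively live forever.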
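
The copy-on-write behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea (copy a block before its first overwrite, redirect snapshot reads of untouched blocks to the origin, invalidate the snapshot when its reserved space runs out); it is not how the device-mapper code is actually written, and the class names and the block-count capacity are invented for the example.

```python
class Snapshot:
    """Toy CoW snapshot: keeps original contents of overwritten blocks."""

    def __init__(self, origin, capacity):
        self.origin = origin
        self.capacity = capacity  # number of blocks the snapshot can hold
        self.saved = {}           # block index -> content at snapshot time
        self.valid = True

    def preserve(self, index):
        """Called before the origin overwrites a block."""
        if not self.valid or index in self.saved:
            return  # already invalid, or block was already copied once
        if len(self.saved) >= self.capacity:
            # reserved space exhausted: the snapshot becomes unusable
            self.valid = False
            self.saved.clear()
            return
        self.saved[index] = self.origin.blocks[index]

    def read(self, index):
        if not self.valid:
            raise OSError("snapshot has been invalidated")
        if index in self.saved:
            return self.saved[index]      # modified block: use the copy
        return self.origin.blocks[index]  # untouched block: read the origin


class Volume:
    """Toy origin volume made of a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self, capacity):
        snap = Snapshot(self, capacity)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            snap.preserve(index)  # copy the old data first, then overwrite
        self.blocks[index] = data


if __name__ == "__main__":
    vol = Volume(["a", "b", "c", "d"])
    snap = vol.snapshot(capacity=2)
    vol.write(0, "A")
    print(vol.blocks[0], snap.read(0), snap.read(1))
    vol.write(1, "B")
    vol.write(2, "C")  # third changed block exceeds the capacity of 2
    print(snap.valid)
```

Creating the snapshot copies nothing, which mirrors why snapshot creation is nearly instantaneous; the cost is paid later, one block at a time, on the first write to each block.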

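Besides backups, snapshots have many other uses:

1) Virtualization

With LVM2, snapshots do not have to be read-only. This means that after creating a snapshot you can mount it and read and write it like a regular block device.

Because popular virtualization systems (such as Xen, VMware, Qemu and KVM) can use block devices as guest images, you can create complete copies of these images and use them as needed, like virtual machines with a very small footprint. The benefits are fast deployment (creating a snapshot usually takes no more than a few seconds) and space savings (the guests share most of the data of the original image).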

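The setup steps are as follows:

1. Create a logical volume for the original image.
2. Install the guest virtual machine using this LV as its disk image.
3. Suspend this virtual machine. The memory image can be a regular file in which all the other snapshots are placed.
4. Create a read-write snapshot of the original LV.
5. Start a new virtual machine using the snapshot volume as its disk image. Modify the network/console settings if necessary.
6. Log in to the newly created virtual machine and change its network settings/hostname.

After these steps you can give users access to the newly created virtual machine. If you need another one, just repeat steps 4 to 6 (so there is no need to reinstall the virtual machine). You can also automate these steps with a script.

When you are done with a virtual machine, you can stop it and destroy the snapshot.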

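2) Data rollback

Operations on a production system call for great caution. Even if they were tested many times in a staging environment without problems, that does not guarantee success in production. So before such an operation we take a snapshot of the system; if the new operation goes wrong, we immediately roll back to the point in time at which the snapshot was created. You can also think of this as an extended use of backups.

Finally, a few examples to deepen the understanding of snapshots.

a) Create a 20M snapshot and run a few operations to watch CoW in action.

This example shows how to create and use a snapshot. Assume we create a 20M snapshot, which means that at most 20M of data may change during the snapshot's lifetime.

The following command creates /dev/vg/lvdata-sp for /dev/vg/lvdata: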

# lvcreate -L20M -s -n lvdata-sp /dev/vg/lvdata
Logical volume "lvdata-sp" created
Here the origin volume lvdata is 200 MB in size, while the snapshot's CoW space is 20 MB.

# lvdisplay /dev/vg/lvdata-sp

--- Logical volume ---

LV Name /dev/vg/lvdata-sp
VG Name vg
LV UUID Yl0fQU-Ve9T-lfmp-xJPq-Uwrd-RVVM-lDDVz0
LV Write Access read/write
LV snapshot status active destination for /dev/vg/lvdata
LV Status available
# open 1
LV Size 200.00 MB
Current LE 50
COW-table size 20.00 MB
COW-table LE 5
Allocated to snapshot 0.27%
Snapshot chunk size 8.00 KB
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0

The line we care about above is "Allocated to snapshot 0.27%", which means 99.73% of the snapshot space is still unused.

Let's create a 10M file on lvdata and look at this value again.

# mount /dev/vg/lvdata /media/lvdata/
# dd if=/dev/hda of=/media/lvdata/10M bs=1M count=10

10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.272393 seconds, 38.5 MB/s

# lvdisplay /dev/vg/lvdata-sp

--- Logical volume ---

LV Name /dev/vg/lvdata-sp
VG Name vg
LV UUID Yl0fQU-Ve9T-lfmp-xJPq-Uwrd-RVVM-lDDVz0
LV Write Access read/write
LV snapshot status active destination for /dev/vg/lvdata
LV Status available
# open 0
LV Size 200.00 MB
Current LE 50
COW-table size 20.00 MB
COW-table LE 5
Allocated to snapshot 51.02%
Snapshot chunk size 8.00 KB
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0

"Allocated to snapshot 51.02%" matches our expectation. The snapshot now has a little under 10M of space left. What happens if we create another 12M file on lvdata?

#dd if=/dev/hda of=/media/lvdata/12M bs=1M count=12

12+0 records in
12+0 records out
12582912 bytes (13 MB) copied, 0.288311 seconds, 43.6 MB/s
device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.

An error appears while the file is being created: the snapshot has been invalidated. Let's look at the snapshot volume's details.

# lvdisplay /dev/vg/lvdata-sp
/dev/vg/lvdata-sp: read failed after 0 of 4096 at 0: Input/output error

--- Logical volume ---

LV Name /dev/vg/lvdata-sp
VG Name vg
LV UUID Yl0fQU-Ve9T-lfmp-xJPq-Uwrd-RVVM-lDDVz0
LV Write Access read/write
LV snapshot status INACTIVE destination for /dev/vg/lvdata
LV Status available
# open 0
LV Size 200.00 MB
Current LE 50
COW-table size 20.00 MB
COW-table LE 5
Snapshot chunk size 8.00 KB
Segments 1
Allocation inherit
Read ahead sectors 0

The whole snapshot volume now returns I/O errors, and the snapshot status is "INACTIVE".
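
The numbers in this experiment can be checked with a little arithmetic. The sketch below assumes, as a simplification, that every megabyte written to the origin overwrites blocks that have not been copied yet and therefore consumes one megabyte of CoW space; bookkeeping overhead is ignored, which is presumably why the first estimate comes out slightly below the 51.02% that lvdisplay reported. The function name is made up for this example.

```python
def snapshot_fill_pct(cow_mb, used_pct, write_mb):
    """Estimate the snapshot fill percentage after writing write_mb to the origin.

    Simplifying assumption: each MB written costs one MB of CoW space.
    """
    used_mb = cow_mb * used_pct / 100.0 + write_mb
    return 100.0 * used_mb / cow_mb


if __name__ == "__main__":
    after_10m = snapshot_fill_pct(20, 0.27, 10)   # first dd, 10 MB
    after_12m = snapshot_fill_pct(20, 51.02, 12)  # second dd, 12 MB
    print(round(after_10m, 2), round(after_12m, 2))
    # anything above 100% cannot be stored, so the snapshot is invalidated
    print(after_12m > 100)
```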

Can it still be mounted?

# mount /dev/vg/lvdata-sp /media/snapshot/
mount: you must specify the filesystem type

#dmesg

Buffer I/O error on device dm-0, logical block 0
Buffer I/O error on device dm-0, logical block 1
Buffer I/O error on device dm-0, logical block 2
Buffer I/O error on device dm-0, logical block 3
Buffer I/O error on device dm-0, logical block 4
Buffer I/O error on device dm-0, logical block 5
Buffer I/O error on device dm-0, logical block 6
Buffer I/O error on device dm-0, logical block 7
Buffer I/O error on device dm-0, logical block 8
Buffer I/O error on device dm-0, logical block 9
hfs: unable to find HFS+ superblock

The dmesg errors show that the superblock information is gone as well.

Try to activate lvdata-sp:

# lvchange -ay /dev/vg/lvdata-sp

/dev/vg/lvdata-sp: read failed after 0 of 4096 at 0: Input/output error

Well, this snapshot has already been released, so all that is left is to delete it.

# lvremove /dev/vg/lvdata-sp

/dev/vg/lvdata-sp: read failed after 0 of 4096 at 0: Input/output error
Do you really want to remove active logical volume "lvdata-sp"? [y/n]: y
Logical volume "lvdata-sp" successfully removed

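b) Use a snapshot to back up a MySQL database (or another database) online

The procedure is: flush and lock the tables, create the snapshot, unlock, back up the data, and finally release the snapshot. This way MySQL is hardly interrupted at all.

FLUSH TABLES WITH READ LOCK;
\! lvcreate --size 100m --snapshot --name snap /dev/VolGroup01/LogVol00
UNLOCK TABLES;

Then do the actual backup work: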

mkdir /snap
mount /dev/VolGroup01/snap /snap
# This is where you back up whatever you need from /snap, e.g. rsync(1)
umount /snap
lvremove /dev/VolGroup01/snap
rmdir /snap
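
The ordering of these steps matters: the read lock only lasts as long as the MySQL session that took it, so the snapshot has to be created while that same session is still open (which is why lvcreate is run from inside the mysql client above). The sketch below only builds the ordered list of steps and prints it; it executes nothing. The function name and the step labels are hypothetical, and the defaults simply mirror the commands shown above.

```python
def snapshot_backup_plan(vg, lv, snap_name="snap", snap_size="100m",
                         mount_point="/snap"):
    """Return the backup steps as (where, command) pairs, in order."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        # these three steps must run within one open MySQL session
        ("mysql", "FLUSH TABLES WITH READ LOCK;"),
        ("shell", f"lvcreate --size {snap_size} --snapshot "
                  f"--name {snap_name} {origin}"),
        ("mysql", "UNLOCK TABLES;"),
        # from here on the database is no longer blocked
        ("shell", f"mkdir {mount_point}"),
        ("shell", f"mount {snap} {mount_point}"),
        ("todo", f"copy whatever you need out of {mount_point}"),
        ("shell", f"umount {mount_point}"),
        ("shell", f"lvremove {snap}"),
        ("shell", f"rmdir {mount_point}"),
    ]


if __name__ == "__main__":
    for where, command in snapshot_backup_plan("VolGroup01", "LogVol00"):
        print(f"[{where}] {command}")
```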

Reference:

http://andrew.sayya.org/blog/?p=294

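Category: Linux Basics

Exploring a Memcached HA Architecture

November 5, 2009 admin

magent is an open-source Memcached proxy server that can be used to experiment with high availability.

I. Installation steps:
1. Build and install libevent: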
wget http://monkey.org/~provos/libevent-1.4.9-stable.tar.gz
tar zxvf libevent-1.4.9-stable.tar.gz
cd libevent-1.4.9-stable/
./configure --prefix=/usr
make && make install
cd ../

2. Build and install Memcached:
wget http://danga.com/memcached/dist/memcached-1.2.6.tar.gz
tar zxvf memcached-1.2.6.tar.gz
cd memcached-1.2.6/
./configure --with-libevent=/usr
make && make install
cd ../

3. Build and install magent:
mkdir magent
cd magent/
wget http://memagent.googlecode.com/files/magent-0.5.tar.gz
tar zxvf magent-0.5.tar.gz
/sbin/ldconfig
sed -i "s#LIBS = -levent#LIBS = -levent -lm#g" Makefile
make
cp magent /usr/bin/magent
cd ../

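II. High-availability network architecture

[Figure: magent_memcached architecture diagram showing Server A and Server B]

Start two memcached processes, on ports 11211 and 11212:
memcached -m 1 -u root -d -l 127.0.0.1 -p 11211
memcached -m 1 -u root -d -l 127.0.0.1 -p 11212

Start two magent processes, on ports 10000 and 11000:
magent -u root -n 51200 -l 127.0.0.1 -p 10000 -s 127.0.0.1:11211 -b 127.0.0.1:11212
magent -u root -n 51200 -l 127.0.0.1 -p 11000 -s 127.0.0.1:11212 -b 127.0.0.1:11211
-s is the memcached server to write to; -b is the backup memcached server.
Note: in this test environment the setup is simulated with magent and memcached on different ports; in production, one magent and one memcached can be deployed as a pair on each of two servers.

In other words, a write through magent is stored in both memcached instances.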
[root@odb ~]# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
set key 0 0 8                       <--- set the value of key via port 10000
88888888
STORED
quit
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get key                     <--- key is read successfully on port 11211
VALUE key 0 8
88888888
END
quit
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 11212
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get key                     <--- key is read successfully on port 11212
VALUE key 0 8
88888888
END
quit
Connection closed by foreign host.
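
The lines typed into telnet above follow the memcached text protocol: a set request is a header line "set <key> <flags> <exptime> <bytes>" followed by the data, and a get reply is a "VALUE <key> <flags> <bytes>" line, the data, and a closing END. The sketch below builds and parses exactly these messages without opening any network connection; the helper names are made up for this example.

```python
def build_set(key, value, flags=0, exptime=0):
    """Build the bytes of a 'set' request, e.g. 'set key 0 0 8'."""
    data = value.encode() if isinstance(value, str) else value
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode()
    return header + data + b"\r\n"


def parse_get_response(raw):
    """Parse a 'get' reply into {key: data}; an empty dict means a miss."""
    result = {}
    pos = 0
    while True:
        end = raw.index(b"\r\n", pos)
        line = raw[pos:end]
        if line == b"END":
            return result
        tag, key, _flags, length = line.split()
        if tag != b"VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        start = end + 2
        size = int(length)
        result[key.decode()] = raw[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF


if __name__ == "__main__":
    print(build_set("key", "88888888"))
    print(parse_get_response(b"VALUE key 0 8\r\n88888888\r\nEND\r\n"))
```

To talk to a real server, the bytes from build_set would be sent over a TCP socket to the magent or memcached port; that part is left out here so the example stays self-contained.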

High-availability test:
[root@odb ~]# ps aux |grep -v grep |grep memcached
root     23455  0.0  0.0  5012 1796 ?        Ss   09:22   0:00 memcached -m 1 -u root -d -l 127.0.0.1 -p 11212
root     24950  0.0  0.0  4120 1800 ?        Ss   10:58   0:00 memcached -m 1 -u root -d -l 127.0.0.1 -p 11211
[root@odb ~]# ps aux |grep -v grep |grep 'magent -u'
root     25919  0.0  0.0  2176  484 ?        Ss   12:00   0:00 magent -u root -n 51200 -l 127.0.0.1 -p 10000 -s 127.0.0.1:11211 -b 127.0.0.1:11212
root     25925  0.0  0.0  3004  484 ?        Ss   12:00   0:00 magent -u root -n 51200 -l 127.0.0.1 -p 11000 -s 127.0.0.1:11212 -b 127.0.0.1:11211
[root@odb ~]# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
set stone 0 0 6                      <--- set the value of stone via port 10000
123456
STORED
quit
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 11000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
set shidl 0 0 6                      <--- set the value of shidl via port 11000
666666
STORED
get stone                      <--- stone is read successfully on port 11000
VALUE stone 0 6
123456
END
incr stone 2                      <--- stone is modified successfully on port 11000
123458
get stone
VALUE stone 0 6                     <--- reading stone on port 11000 confirms the change above
123458
END
get shidl                      <--- shidl is read successfully on port 11000
VALUE shidl 0 6
666666
END
quit                     <--- leave port 11000
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get stone                      <--- stone read on port 10000 shows the modified value
VALUE stone 0 6
123458
END
get shidl                      <--- shidl is read successfully on port 10000
VALUE shidl 0 6
666666
END
delete shidl                      <--- delete shidl via port 10000
DELETED
get shidl                      <--- the deletion of shidl is effective on port 10000
END
quit
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 11000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get shidl                      <--- port 11000 confirms that shidl was deleted
END
get stone                      <--- stone is read successfully on port 11000
VALUE stone 0 6
123458
END
quit
Connection closed by foreign host.

Simulated failure tests:

Kill the memcached on port 11211:

[root@odb ~]# kill -9 24950
[root@odb ~]# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get stone                      <--- stone can still be read on port 10000
VALUE stone 0 6
123458
END
quit
Connection closed by foreign host.
[root@odb ~]# telnet 127.0.0.1 11000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get stone                      <--- stone can still be read on port 11000
VALUE stone 0 6
123458
END
quit
Connection closed by foreign host.

Kill the magent on port 11000:

[root@odb ~]# kill -9 25925
[root@odb ~]# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get stone                      <--- stone can still be read on port 10000
VALUE stone 0 6
123458
END
quit
Connection closed by foreign host.
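
The behaviour seen in these tests (a write through magent lands in both memcached instances, and reads keep working after one memcached is killed) can be summed up in a toy model. The sketch below reproduces only the observed behaviour with in-memory stand-ins; it is not magent's actual implementation, and the class names FakeMemcached and ReplicatingProxy are invented for this example.

```python
class FakeMemcached:
    """In-memory stand-in for one memcached process."""

    def __init__(self):
        self.data = {}
        self.alive = True

    def set(self, key, value):
        if not self.alive:
            raise ConnectionError("server is down")
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError("server is down")
        return self.data.get(key)


class ReplicatingProxy:
    """Writes go to the primary (-s) and the backup (-b); reads fall back."""

    def __init__(self, primary, backup):
        self.servers = (primary, backup)

    def set(self, key, value):
        stored = False
        for server in self.servers:
            try:
                server.set(key, value)
                stored = True
            except ConnectionError:
                pass  # a dead server is skipped
        return stored

    def get(self, key):
        for server in self.servers:
            try:
                value = server.get(key)
            except ConnectionError:
                continue  # try the next server
            if value is not None:
                return value
        return None


if __name__ == "__main__":
    m11211, m11212 = FakeMemcached(), FakeMemcached()
    port_10000 = ReplicatingProxy(m11211, m11212)
    port_11000 = ReplicatingProxy(m11212, m11211)
    port_10000.set("stone", "123458")
    m11211.alive = False  # simulate kill -9 on the 11211 memcached
    print(port_10000.get("stone"), port_11000.get("stone"))
```

Running two proxies with the primary and backup roles swapped, as in the commands earlier in this post, means that losing either a memcached or a magent still leaves one working path to the data.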