Passive Virtual System on Check Point VSX ARPs using physical interface IP address instead of Cluster IP

I ran into an issue for which I could not find any useful information at first, so hopefully this post will help speed up someone else's troubleshooting.

Problem statement: The passive VS on VSX sends ARP requests for its default GW using the physical interface IP instead of the cluster IP, so no traffic flows from the passive Virtual System. If a static ARP entry is configured manually, everything works fine.
Consequence: The passive VS reports an error because it cannot connect out for AV and AB updates.
Setup: 2x Open Servers, running R80.10 VSX with VSLS.
Software level: R80.10 – Build 083 / HFA Take 121

Outbound interface
This is the interface the VS uses to communicate with everything external.

bond0.1011 Link encap:Ethernet HWaddr 48:DF:37:38:19:B9
inet addr: Bcast: Mask:

As the tcpdump on bond0.1011 shows, the VS ARPs using the physical address (intended for internal communication between the VSX hosts), and the default GW never responds to such an ARP. Therefore the ARP table is never populated with the IP/MAC of its default gateway.

10:54:27.155698 04:eb:40:2f:f7:92 > 01:00:0c:cc:cc:cd SNAP Unnumbered, ui, Flags [Command], length 50
10:54:27.252529 arp who-has tell
10:54:28.252612 arp who-has tell
10:54:29.157385 04:eb:40:2f:f7:92 > 01:00:0c:cc:cc:cd SNAP Un
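
To spot this symptom quickly in a longer capture, the ARP lines can be counted. The following is only a minimal sketch: arp_summary is a hypothetical helper name of my own, and it simply assumes that requests contain "who-has" and replies contain "is-at", as in the tcpdump output shown in this post.

```shell
# Hypothetical helper: summarise ARP requests vs. replies in tcpdump text output.
# Reads tcpdump lines on stdin; matching is case-insensitive.
arp_summary() {
    awk '
        { line = tolower($0) }
        line ~ /who-has/ { req++ }   # ARP request lines
        line ~ /is-at/   { rep++ }   # ARP reply lines
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example (assumes 20 packets are enough to see the pattern):
#   tcpdump -i bond0.1011 -nn -c 20 arp 2>/dev/null | arp_summary
```

A result with many requests and zero replies matches the behaviour described above.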

Cluster_hide_fold is enabled, as shown in fwd.elg after policy installation (fw ctl debug -m fw + filter)
:perform_cluster_hide_fold (true)
:perform_cluster_hide_fold (true)

With a static ARP entry configured as below, everything works as intended.
node2-fw-01:1> add arp static ipv4-address macaddress 78:72:5d:1c:17:01

tcpdump after static ARP
10:57:40.208071 IP > . ack 435797 win 137 <nop,nop,timestamp 434154076 3269636925>
10:57:40.225471 IP > P 23594:24424(830) ack 435797 win 137 <nop,nop,timestamp 434154093 3269636925>
10:57:40.268951 IP > . ack 436048 win 137 <nop,nop,timestamp 434154136 3269637025>
10:57:40.276747 IP > 43708+ A? (37)
10:57:40.326280 IP > 14476+ PTR? (45)

The Solution

The solution was found in the following SKs:
sk123712 – Traffic originated from Standby VS fails, except for traffic from the lowest VLAN interface.
This is due to a design change in how Check Point handles ARP in clustered environments, as described in sk111956 – ARP Forwarding in Check Point ClusterXL.

With the following parameter set to 1, all ARP requests are forwarded over the sync interface rather than over the interface concerned.

[Expert@node1-fw-01:0]# fw ctl get int fwha_arp_fwd_ccp_via_sync_if
fwha_arp_fwd_ccp_via_sync_if = 0
[Expert@node1-fw-01:0]# fw ctl set int fwha_arp_fwd_ccp_via_sync_if 1

[Expert@node2-fw-01:0]# fw ctl set int fwha_arp_fwd_ccp_via_sync_if 1
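
Since the parameter has to be set on both hosts, it can be handy to verify it in a script. This is a rough sketch only: param_is is a hypothetical helper of my own, and it assumes the "name = value" output format of fw ctl get int shown above.

```shell
# Hypothetical helper: read `fw ctl get int <name>` output on stdin and
# succeed (exit 0) only if the parameter is present with the wanted value.
param_is() {
    local name="$1" want="$2"
    awk -v n="$name" -v w="$want" '
        $1 == n && $2 == "=" { found = 1; ok = ($3 == w) }
        END { exit (found && ok) ? 0 : 1 }
    '
}

# Example, run on each VSX host:
#   fw ctl get int fwha_arp_fwd_ccp_via_sync_if | param_is fwha_arp_fwd_ccp_via_sync_if 1 \
#       && echo "ARP forwarding via sync is enabled"
```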

To make the change permanent, add the parameter to $FWDIR/boot/modules/fwkern.conf.
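
As a sketch of how that could be done without ending up with duplicate entries: persist_kernel_param below is a hypothetical helper of my own, not a Check Point tool. It assumes fwkern.conf holds one name=value pair per line with no spaces around the equals sign. Back up the file before trying anything like this.

```shell
# Hypothetical helper: persist a kernel parameter in a fwkern.conf-style file.
# Usage: persist_kernel_param <conf-file> <name> <value>
persist_kernel_param() {
    local conf="$1" name="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every existing line except an earlier setting of this parameter.
        grep -v "^${name}=" "$conf" > "$tmp" || true
    fi
    # Assumed format: name=value, one per line, no spaces around '='.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example, run on each VSX host:
#   persist_kernel_param "$FWDIR/boot/modules/fwkern.conf" fwha_arp_fwd_ccp_via_sync_if 1
```

Running it twice leaves a single entry, so it is safe to repeat on both hosts.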

tcpdump while the parameter is being applied to both VSX hosts

[Expert@node2-fw-01:0]# tcpdump -i bond0.1011 -nvvv 'arp'

13:20:26.171552 arp who-has tell
13:20:27.831631 arp who-has tell
13:20:28.831676 arp who-has tell
13:20:29.831700 arp who-has tell
13:20:31.173769 arp who-has tell
13:20:32.174376 arp who-has tell (Parameter changed on one host)
13:20:33.174400 arp who-has tell (Parameter changed on both hosts)
13:20:33.175001 arp reply is-at 78:72:5d:1c:17:01
13:20:44.161011 arp who-has tell
13:21:31.309374 arp reply is-at 78:72:5d:1c:17:01
13:21:44.467631 arp who-has tell
13:22:22.258694 arp reply is-at 78:72:5d:1c:17:01
13:22:44.764361 arp who-has tell
13:23:40.118178 arp reply is-at 78:72:5d:1c:17:01
13:23:45.052863 arp who-has tell

Hope this helps you with your troubleshooting!

