aboutsummaryrefslogtreecommitdiff
path: root/include/net/route.h
AgeCommit message (Collapse)AuthorFilesLines
2011-05-18ipv4: Pass explicit destination address to rt_bind_peer().Gravatar David S. Miller 1-2/+2
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Pass explicit destination address to rt_get_peer().Gravatar David S. Miller 1-1/+1
This will next trickle down to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Remove route key identity dependencies in ip_rt_get_source().Gravatar David S. Miller 1-1/+1
Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels.Gravatar David S. Miller 1-10/+9
First, make callers pass on-stack flowi4 to ip_route_output_gre() so they can get at the fully resolved flow key. Next, use that in ipgre_tunnel_xmit() to avoid the need to use rt->rt_{dst,src}. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03ipv4: Make caller provide on-stack flow key to ip_route_output_ports().Gravatar David S. Miller 1-6/+5
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03ipv4: Renamt struct rtable's rt_tos to rt_key_tos.Gravatar David S. Miller 1-1/+1
To more accurately reflect that it is purely a routing cache lookup key and is used in no other context. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Remove now superfluous code in ip_route_connect().Gravatar David S. Miller 1-2/+0
Now that output route lookups update the flow with source address et al. selections, the fl4->{saddr,daddr} assignments here are no longer necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Use caller's on-stack flowi as-is in output route lookups.Gravatar David S. Miller 1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27ipv4: Kill RTO_CONN.Gravatar David S. Miller 1-4/+0
It's not used by anything in the kernel, and defined in net/route.h so never exported to userspace. Therefore we can safely remove it. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27ipv4: Sanitize and simplify ip_route_{connect,newports}()Gravatar David S. Miller 1-32/+56
These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-24net: Remove __KERNEL__ cpp checks from include/netGravatar David S. Miller 1-4/+0
These header files are never installed to user consumption, so any __KERNEL__ cpp checks are superfluous. Projects should also not copy these files into their userland utility sources and try to use them there. If they insist on doing so, the onus is on them to sanitize the headers as needed. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22inet: constify ip headers and in6_addrGravatar Eric Dumazet 1-1/+2
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-07Merge branch 'master' of ↵Gravatar David S. Miller 1-2/+3
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/benet/be_main.c
2011-04-07ipv4: Fix "Set rt->rt_iif more sanely on output routes."Gravatar OGAWA Hirofumi 1-2/+3
Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more sanely on output routes.") breaks rt_is_{output,input}_route. This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0". To fix it, this does: 1) Add "int rt_route_iif;" to struct rtable 2) For input routes, always set rt_route_iif to same value as rt_iif 3) For output routes, always set rt_route_iif to zero. Set rt_iif as it is done currently. 4) Change rt_is_{output,input}_route() to test rt_route_iif Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31ipv4: Use flowi4_init_output() in net/route.hGravatar David S. Miller 1-36/+24
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-25route: Take the right src and dst addresses in ip_route_newportsGravatar Steffen Klassert 1-2/+2
When we set up the flow informations in ip_route_newports(), we take the address informations from the the rt_key_src and rt_key_dst fields of the rtable. They appear to be empty. So take the address informations from rt_src and rt_dst instead. This issue was introduced by commit 5e2b61f78411be25f0b84f97d5b5d312f184dfd1 ("ipv4: Remove flowi from struct rtable.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipv4: fix route deletion for IPs on many subnetsGravatar Julian Anastasov 1-0/+1
Alex Sidorenko reported for problems with local routes left after IP addresses are deleted. It happens when same IPs are used in more than one subnet for the device. Fix fib_del_ifaddr to restrict the checks for duplicate local and broadcast addresses only to the IFAs that use our primary IFA or another primary IFA with same address. And we expect the prefsrc to be matched when the routes are deleted because it is possible they to differ only by prefsrc. This patch prevents local and broadcast routes to be leaked until their primary IP is deleted finally from the box. As the secondary address promotion needs to delete the routes for all secondaries that used the old primary IFA, add option to ignore these secondaries from the checks and to assume they are already deleted, so that we can safely delete the route while these IFAs are still on the device list. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put fl4_* macros to struct flowi4 and use them again.Gravatar David S. Miller 1-7/+7
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Use flowi4 in public route lookup interfaces.Gravatar David S. Miller 1-59/+59
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Make flowi ports AF dependent.Gravatar David S. Miller 1-20/+23
Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put flowi_* prefix on AF independent members of struct flowiGravatar David S. Miller 1-18/+18
I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Create and use route lookup helpers.Gravatar David S. Miller 1-0/+48
The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04ipv4: Remove flowi from struct rtable.Gravatar David S. Miller 1-9/+13
The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark. The rest is unnecessary bloat, which amounts to nearly 50 bytes on 64-bit. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04ipv4: Use passed-in protocol in ip_route_newports().Gravatar David S. Miller 1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipv4: ip_route_output_key() is better as an inline.Gravatar David S. Miller 1-1/+5
This avoid a stack frame at zero cost. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipv4: Make output route lookup return rtable directly.Gravatar David S. Miller 1-29/+29
Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01xfrm: Handle blackhole route creation via afinfo.Gravatar David S. Miller 1-0/+1
That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Kill can_sleep arg to ip_route_output_flow()Gravatar David S. Miller 1-3/+3
This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01net: Add FLOWI_FLAG_CAN_SLEEP.Gravatar David S. Miller 1-0/+2
And set is in contexts where the route resolution can sleep. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"Gravatar David S. Miller 1-3/+3
Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Can final ip_route_connect() arg to boolean "can_sleep".Gravatar David S. Miller 1-2/+2
Since that's what the current vague "flags" thing means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-24ipv4: Rearrange how ip_route_newports() gets port keys.Gravatar David S. Miller 1-8/+11
ip_route_newports() is the only place in the entire kernel that cares about the port members in the routing cache entry's lookup flow key. Therefore the only reason we store an entire flow inside of the struct rtentry is for this one special case. Rewrite ip_route_newports() such that: 1) The caller passes in the original port values, so we don't need to use the rth->fl.fl_ip_{s,d}port values to remember them. 2) The lookup flow is constructed by hand instead of being copied from the routing cache entry's flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10inet: Create a mechanism for upward inetpeer propagation into routes.Gravatar David S. Miller 1-0/+1
If we didn't have a routing cache, we would not be able to properly propagate certain kinds of dynamic path attributes, for example PMTU information and redirects. The reason is that if we didn't have a routing cache, then there would be no way to lookup all of the active cached routes hanging off of sockets, tunnels, IPSEC bundles, etc. Consider the case where we created a cached route, but no inetpeer entry existed and also we were not asked to pre-COW the route metrics and therefore did not force the creation a new inetpeer entry. If we later get a PMTU message, or a redirect, and store this information in a new inetpeer entry, there is no way to teach that cached route about the newly existing inetpeer entry. The facilities implemented here handle this problem. First we create a generation ID. When we create a cached route of any kind, we remember the generation ID at the time of attachment. Any time we force-create an inetpeer entry in response to new path information, we bump that generation ID. The dst_ops->check() callback is where the knowledge of this event is propagated. If the global generation ID does not equal the one stored in the cached route, and the cached route has not attached to an inetpeer yet, we look it up and attach if one is found. Now that we've updated the cached route's information, we update the route's generation ID too. This clears the way for implementing PMTU and redirects directly in the inetpeer cache. There is absolutely no need to consult cached route information in order to maintain this information. At this point nothing bumps the inetpeer genids, that comes in the later changes which handle PMTUs and redirects using inetpeers. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27net: Pre-COW metrics for TCP.Gravatar David S. Miller 1-0/+4
TCP is going to record metrics for the connection, so pre-COW the route metrics at route cache entry creation time. This avoids several atomic operations that have to occur if we COW the metrics after the entry reaches global visibility. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26net: Implement read-only protection and COW'ing of metrics.Gravatar David S. Miller 1-0/+2
Routing metrics are now copy-on-write. Initially a route entry points it's metrics at a read-only location. If a routing table entry exists, it will point there. Else it will point at the all zero metric place-holder called 'dst_default_metrics'. The writeability state of the metrics is stored in the low bits of the metrics pointer, we have two bits left to spare if we want to store more states. For the initial implementation, COW is implemented simply via kmalloc. However future enhancements will change this to place the writable metrics somewhere else, in order to increase sharing. Very likely this "somewhere else" will be the inetpeer cache. Note also that this means that metrics updates may transiently fail if we cannot COW the metrics successfully. But even by itself, this patch should decrease memory usage and increase cache locality especially for routing workloads. In those cases the read-only metric copies stay in place and never get written to. TCP workloads where metrics get updated, and those rare cases where PMTU triggers occur, will take a very slight performance hit. But that hit will be alleviated when the long-term writable metrics move to a more sharable location. Since the metrics storage went from a u32 array of RTAX_MAX entries to what is essentially a pointer, some retooling of the dst_entry layout was necessary. Most importantly, we need to preserve the alignment of the reference count so that it doesn't share cache lines with the read-mostly state, as per Eric Dumazet's alignment assertion checks. The only non-trivial bit here is the move of the 'flags' member into the writeable cacheline. This is OK since we are always accessing the flags around the same moment when we made a modification to the reference count. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20ipv4: Flush per-ns routing cache more sanely.Gravatar David S. Miller 1-1/+1
Flush the routing cache only of entries that match the network namespace in which the purge event occurred. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-12-12ipv4: Don't pre-seed hoplimit metric.Gravatar David S. Miller 1-0/+11
Always go through a new ip4_dst_hoplimit() helper, just like ipv6. This allowed several simplifications: 1) The interim dst_metric_hoplimit() can go as it's no longer userd. 2) The sysctl_ip_default_ttl entry no longer needs to use ipv4_doint_and_flush, since the sysctl is not cached in routing cache metrics any longer. 3) ipv4_doint_and_flush no longer needs to be exported and therefore can be marked static. When ipv4_doint_and_flush_strategy was removed some time ago, the external declaration in ip.h was mistakenly left around so kill that off too. We have to move the sysctl_ip_default_ttl declaration into ipv4's route cache definition header net/route.h, because currently net/ip.h (where the declaration lives now) has a back dependency on net/route.h Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17net: use the macros defined for the members of flowiGravatar Changli Gao 1-7/+5
Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11ipv4: Make rt->fl.iif tests lest obscure.Gravatar David S. Miller 1-0/+10
When we test rt->fl.iif against zero, we're seeing if it's an output or an input route. Make that explicit with some helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11net: get rid of rtable->idevGravatar Eric Dumazet 1-2/+0
It seems idev field in struct rtable has no special purpose, but adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in current kernel). infiniband case is solved using dst.dev instead of idev->dev Removal of this field means routing without route cache is now using shared data, percpu data, and only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on routing test. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27tproxy: check for transparent flag in ip_route_newportsGravatar Ulrich Weber 1-0/+2
as done in ip_route_connect() Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10net-next: remove useless union keywordGravatar Changli Gao 1-4/+2
remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: implements ip_route_input_noref()Gravatar Eric Dumazet 1-1/+16
ip_route_input() is the version returning a refcounted dst, while ip_route_input_noref() returns a non refcounted one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16percpu: add __percpu sparse annotations to netGravatar Tejun Heo 1-1/+1
Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCHGravatar Eric W. Biederman 1-0/+1
The motivation for an additional notifier in batched netdevice notification (rt_do_flush) only needs to be called once per batch not once per namespace. For further batching improvements I need a guarantee that the netdevices are unregistered in order allowing me to unregister an all of the network devices in a network namespace at the same time with the guarantee that the loopback device is really and truly unregistered last. Additionally it appears that we moved the route cache flush after the final synchronize_net, which seems wrong and there was no explanation. So I have restored the original location of the final synchronize_net. Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04net: cleanup include/netGravatar Eric Dumazet 1-8/+4
This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: skb->rtable accessorGravatar Eric Dumazet 1-1/+1
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01ipv4: Conditionally enable transparent flow flag when connectingGravatar KOVACS Krisztian 1-1/+5
Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the transparent socket option set. This way we selectively enable certain connections with non-local source addresses to be routed. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01ipv4: Make inet_sock.h independent of route.hGravatar KOVACS Krisztian 1-0/+5
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif() usually require other route.h functionality anyway this patch moves inet_iif() to route.h. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27missing bits of net-namespace / sysctlGravatar Al Viro 1-2/+0
Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>