aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)AuthorFilesLines
2007-10-15[IPV4]: Uninline netfilter okfnsGravatar Patrick McHardy 3-4/+4
Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc can generate tail calls for some of the netfilter hook okfn invocations, so there is no need to inline the functions anymore. This caused huge code bloat since we ended up with one inlined version and one out-of-line version since we pass the address to nf_hook_slow. Before: text data bss dec hex filename 8997385 1016524 524652 10538561 a0ce41 vmlinux After: text data bss dec hex filename 8994009 1016524 524652 10535185 a0c111 vmlinux ------------------------------------------------------- -3376 All cases have been verified to generate tail-calls with and without netfilter. The okfns in ipmr and xfrm4_input still remain inline because gcc can't generate tail-calls for them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[NETFILTER]: Replace sk_buff ** with sk_buff *Gravatar Herbert Xu 44-484/+463
With all the users of the double pointers removed, this patch mops up by finally replacing all occurances of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroomGravatar Herbert Xu 4-56/+21
This patch replaces unnecessary uses of skb_copy, pskb_copy and skb_realloc_headroom by functions such as skb_make_writable and pskb_expand_head. This allows us to remove the double pointers later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[IPVS]: Replace local version of skb_make_writableGravatar Herbert Xu 6-50/+16
This patch removes the IPVS-specific version of skb_make_writable and replaces it with the netfilter one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[NETFILTER]: Do not copy skb in skb_make_writableGravatar Herbert Xu 11-16/+16
Now that all callers of netfilter can guarantee that the skb is not shared, we no longer have to copy the skb in skb_make_writable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[IPV4]: Change ip_defrag to return an integerGravatar Herbert Xu 4-33/+25
Now that ip_frag always returns the packet given to it on input, we can change it to return an integer indicating error instead. This patch does that and updates all its callers accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15[IPV4]: Make ip_defrag return the same packetGravatar Herbert Xu 1-21/+55
This patch is a bit of a hack. However it is worth it if you consider that this is the only reason why we have to carry around the struct sk_buff ** pointers in netfilter. It makes ip_defrag always return the packet that was given to it on input. It does this by cloning the packet and replacing its original contents with the head fragment if necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-14fix endianness bug in inet_lroGravatar Al Viro 1-1/+1
all uses of and almost all assignments to lro_desc->tcp_ack assume that it's net-endian; one converts net-endian to host-endian and sticks it in lro_desc->tcp_ack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14inet_lro: trivial endianness annotationsGravatar Al Viro 1-7/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11[TCP]: Limit processing lost_retrans loop to work-to-do casesGravatar Ilpo Järvinen 2-3/+13
This addition of lost_retrans_low to tcp_sock might be unnecessary, it's not clear how often lost_retrans worker is executed when there wasn't work to do. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: Fix lost_retrans loop vs fastpath problemsGravatar Ilpo Järvinen 1-15/+22
Detection implemented with lost_retrans must work also when fastpath is taken, yet most of the queue is skipped including (very likely) those retransmitted skb's we're interested in. This problem appeared when the hints got added, which removed a need to always walk over the whole write queue head. Therefore decicion for the lost_retrans worker loop entry must be separated from the sacktag processing more than it was necessary before. It turns out to be problematic to optimize the worker loop very heavily because ack_seqs of skb may have a number of discontinuity points. Maybe similar approach as currently is implemented could be attempted but that's becoming more and more complex because the trend is towards less skb walking in sacktag marker. Trying a simple work until all rexmitted skbs heve been processed approach. Maybe after(highest_sack_end_seq, tp->high_seq) checking is not sufficiently accurate and causes entry too often in no-work-to-do cases. Since that's not known, I've separated solution to that from this patch. Noticed because of report against a related problem from TAKANO Ryousei <takano@axe-inc.co.jp>. He also provided a patch to that part of the problem. This patch includes solution to it (though this patch has to use somewhat different placement). TAKANO's description and patch is available here: http://marc.info/?l=linux-netdev&m=119149311913288&w=2 ...In short, TAKANO's problem is that end_seq the loop is using not necessarily the largest SACK block's end_seq because the current ACK may still have higher SACK blocks which are later by the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: No need to re-count fackets_out/sacked_out at RTOGravatar Ilpo Järvinen 1-10/+16
Both sacked_out and fackets_out are directly known from how parameter. Since fackets_out is accurate, there's no need for recounting (sacked_out was previously unnecessarily counted in the loop anyway). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: Extract tcp_match_queue_to_sack from sacktag codeGravatar Ilpo Järvinen 1-19/+35
This is necessary for upcoming DSACK bugfix. Reduces sacktag length which is not very sad thing at all... :-) Notice that there's a need to handle out-of-mem at caller's place. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: Kill almost unused variable pcount from sacktagGravatar Ilpo Järvinen 1-6/+3
It's on the way for future cutting of that function. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark LGravatar Ilpo Järvinen 1-1/+1
This condition (plain R) can arise at least in recovery that is triggered after tcp_undo_loss. There isn't any reason why they should not be marked as lost, not marking makes in_flight estimator to return too large values. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11[TCP]: Add bytes_acked (ABC) clearing to FRTO tooGravatar Ilpo Järvinen 1-0/+2
I was reading tcp_enter_loss while looking for Cedric's bug and noticed bytes_acked adjustment is missing from FRTO side. Since bytes_acked will only be used in tcp_cong_avoid, I think it's safe to assume RTO would be spurious. During FRTO cwnd will be not controlled by tcp_cong_avoid and if FRTO calls for conventional recovery, cwnd is adjusted and the result of wrong assumption is cleared from bytes_acked. If RTO was in fact spurious, we did normal ABC already and can continue without any additional adjustments. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETLINK]: fib_frontend build fixesGravatar David S. Miller 1-11/+5
1) fibnl needs to be declared outside of config ifdefs, and also should not be explicitly initialized to NULL 2) nl_fib_input() args are wrong for netlink_kernel_create() input method Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: make netlink user -> kernel interface synchroniousGravatar Denis V. Lunev 3-24/+14
This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[INET]: local port range robustnessGravatar Stephen Hemminger 5-19/+98
Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move IP protocol setting from transforms into xfrm4_input.cGravatar Herbert Xu 5-9/+15
This patch makes the IPv4 x->type->input functions return the next protocol instead of setting it directly. This is identical to how we do things in IPv6 and will help us merge common code on the input path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move IP length/checksum setting out of transformsGravatar Herbert Xu 7-37/+12
This patch moves the setting of the IP length and checksum fields out of the transforms and into the xfrmX_output functions. This would help future efforts in merging the transforms themselves. It also adds an optimisation to ipcomp due to the fact that the transport offset is guaranteed to be zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdrGravatar Herbert Xu 3-15/+15
This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since they're identical to the IPv4 versions. Duplicating them would only create problems for ourselves later when we need to add things like extended sequence numbers. I've also added transport header type conversion headers for these types which are now used by the transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Use IPv6 calling convention as the convention for x->mode->outputGravatar Herbert Xu 6-32/+26
The IPv6 calling convention for x->mode->output is more general and could help an eventual protocol-generic x->type->output implementation. This patch adopts it for IPv4 as well and modifies the IPv4 type output functions accordingly. It also rewrites the IPv6 mac/transport header calculation to be based off the network header where practical. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Set skb->data to payload in x->mode->outputGravatar Herbert Xu 7-10/+11
This patch changes the calling convention so that on entry from x->mode->output and before entry into x->type->output skb->data will point to the payload instead of the IP header. This is essentially a redistribution of skb_push/skb_pull calls with the aim of minimising them on the common path of tunnel + ESP. It'll also let us use the same calling convention between IPv4 and IPv6 with the next patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC] esp: Remove NAT-T checksum invalidation for BEETGravatar Herbert Xu 1-2/+1
I pointed this out back when this patch was first proposed but it looks like it got lost along the way. The checksum only needs to be ignored for NAT-T in transport mode where we lose the original inner addresses due to NAT. With BEET the inner addresses will be intact so the checksum remains valid. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Separate lost_retrans loop into own functionGravatar Ilpo Järvinen 1-37/+43
Follows own function for each task principle, this is really somewhat separate task being done in sacktag. Also reduces indentation. In addition, added ack_seq local var to break some long lines & fixed coding style things. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETFILTER]: Make netfilter code use the seq_open_privateGravatar Pavel Emelyanov 2-45/+6
Just switch to the consolidated calls. ipt_recent() has to initialize the private, so use the __seq_open_private() helper. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make core networking code use seq_open_privateGravatar Pavel Emelyanov 8-200/+22
This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move state lock into x->type->outputGravatar Herbert Xu 2-3/+14
This patch releases the lock on the state before calling x->type->output. It also adds the lock to the spots where they're currently needed. Most of those places (all except mip6) are expected to disappear with async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Store IPv6 nh pointer in mac_header on outputGravatar Herbert Xu 1-1/+1
Current the x->mode->output functions store the IPv6 nh pointer in the skb network header. This is inconvenient because the network header then has to be fixed up before the packet can leave the IPsec stack. The mac header field is unused on output so we can use that to store this instead. This patch does that and removes the network header fix-up in xfrm_output. It also uses ipv6_hdr where appropriate in the x->type->output functions. There is also a minor clean-up in esp4 to make it use the same code as esp6 to help any subsequent effort to merge the two. Lastly it kills two redundant skb_set_* statements in BEET that were simply copied over from transport mode. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move output replay code into xfrm_outputGravatar Herbert Xu 2-4/+4
The replay counter is one of only two remaining things in the output code that requires a lock on the xfrm state (the other being the crypto). This patch moves it into the generic xfrm_output so we can remove the lock from the transforms themselves. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move common output code to xfrm_outputGravatar Herbert Xu 1-36/+4
Most of the code in xfrm4_output_one and xfrm6_output_one are identical so this patch moves them into a common xfrm_output function which will live in net/xfrm. In fact this would seem to fix a bug as on IPv4 we never reset the network header after a transform which may upset netfilter later on. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC] ah: Remove keys from ah_data structureGravatar Herbert Xu 1-7/+2
The keys are only used during initialisation so we don't need to carry them in esp_data. Since we don't have to allocate them again, there is no need to place a limit on the authentication key length anymore. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC] esp: Remove keys from esp_data structureGravatar Herbert Xu 1-11/+5
The keys are only used during initialisation so we don't need to carry them in esp_data. Since we don't have to allocate them again, there is no need to place a limit on the authentication key length anymore. This patch also kills the unused auth.icv member. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: sparse warning fixesGravatar Stephen Hemminger 6-11/+10
Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: "Annotate" another fackets_out state resetGravatar Ilpo Järvinen 1-1/+2
This should no longer be necessary because fackets_out is accurate. It indicates bugs elsewhere, thus report it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Fix two off-by-one errors in fackets_out adjusting logicGravatar Ilpo Järvinen 1-2/+4
1) Passing wrong skb to tcp_adjust_fackets_out could corrupt fastpath_cnt_hint as tcp_skb_pcount(next_skb) is not included to it if hint points exactly to the next_skb (it's lagging behind, see sacktag). 2) When fastpath_skb_hint is put backwards to avoid dangling skb reference, the skb's pcount must also be removed from count (not included like above). Reported by Cedric Le Goater <legoater@free.fr> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Wrap-safed reordering detection FRTO checkGravatar Ilpo Järvinen 1-0/+3
In case somebody has a suggestion about a better place for this check, which must guarantee execution "early enough" (i.e, before the wrap can occur), I'm very open to them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Update comment of SACK block validatorGravatar Ilpo Järvinen 1-2/+9
Just came across what RFC2018 states about generation of valid SACK blocks in case of reneging. Alter comment a bit to point out clearly. IMHO, there isn't any reason to change code because the validation is there for a purpose (counters will inform user about decision TCP made if this case ever surfaces). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: fix comments that got messed up during code moveGravatar Ilpo Järvinen 1-2/+6
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: No fackets_out/highest_sack tuning when SACK isn't enabledGravatar Ilpo Järvinen 1-3/+4
This was found due to bug report from Cedric Le Goater though it turned this turned out to be unrelated bug. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETFILTER]: ctnetlink: use netlink policyGravatar Patrick McHardy 2-13/+9
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETFILTER]: nfnetlink: rename functions containing 'nfattr'Gravatar Patrick McHardy 7-22/+22
There is no struct nfattr anymore, rename functions to 'nlattr'. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETFILTER]: nfnetlink: convert to generic netlink attribute functionsGravatar Patrick McHardy 3-33/+33
Get rid of the duplicated rtnetlink macros and use the generic netlink attribute functions. The old duplicated stuff is moved to a new header file that exists just for userspace. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Move hardware header operations out of netdevice.Gravatar Stephen Hemminger 3-7/+14
Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Wrap hard_header_parseGravatar Stephen Hemminger 1-4/+2
Wrap the hard_header_parse function to simplify next step of header_ops conversion. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Wrap netdevice hardware header creation.Gravatar Stephen Hemminger 2-4/+3
Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make the loopback device per network namespace.Gravatar Eric W. Biederman 2-10/+10
This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_devGravatar Eric W. Biederman 3-6/+10
Now that multiple loopback devices are becoming possible it makes the code a little cleaner and more maintainable to test if a deivice is th a loopback device by testing dev->flags & IFF_LOOPBACK instead of dev == loopback_dev. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4]: Remove unnecessary test for the loopback device from inetdev_destroyGravatar Eric W. Biederman 1-2/+0
Currently we never call unregister_netdev for the loopback device so it is impossible for us to reach inetdev_destroy with the loopback device. So the test in inetdev_destroy is unnecessary. Further when testing with my network namespace patches removing unregistering the loopback device and calling inetdev_destroy works fine so there appears to be no reason for avoiding unregistering the loopback device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>